Normalization and subtraction: two approaches to facilitate gene discovery

Genome Res. 1996 Sep;6(9):791-806. doi: 10.1101/gr.6.9.791.


Large-scale sequencing of cDNAs randomly picked from libraries has proven to be a very powerful approach for discovering (putatively) expressed sequences that, once mapped, may greatly expedite the identification and cloning of human disease genes. However, the integrity of the data and the pace at which novel sequences can be identified depend to a great extent on the cDNA libraries that are used. In a typical cell, the mRNAs of the prevalent and intermediate frequency classes together comprise as much as 50-65% of the total mRNA mass but represent no more than 1000-2000 different mRNAs. Redundant identification of mRNAs of these two frequency classes is therefore destined to become overwhelming relatively early in any such random gene discovery program, seriously compromising its cost-effectiveness. With the goal of facilitating such efforts, we previously developed a method to construct directionally cloned normalized cDNA libraries and applied it to generate infant brain (INIB) and fetal liver/spleen (INFLS) libraries, from which a total of 45,192 and 86,088 expressed sequence tags, respectively, have been derived. While improving the representation of the longest cDNAs in our libraries, we developed three additional methods to normalize cDNA libraries and generated over 35 libraries, most of which have been contributed to our Integrated Molecular Analysis of Genomes and Their Expression (IMAGE) Consortium and thus distributed widely and used for sequencing and mapping. In an attempt to facilitate the process of gene discovery further, we have also developed a subtractive hybridization approach designed specifically to eliminate (or significantly reduce the representation of) large pools of arrayed and (mostly) sequenced clones from normalized libraries that have yet to be surveyed or have been only partly surveyed.
Here we present a detailed description and a comparative analysis of four methods that we developed and used to generate normalized cDNA libraries from human (15), mouse (3), rat (2), and the parasite Schistosoma mansoni (1). In addition, we describe the construction and preliminary characterization of a subtracted liver/spleen library (INFLS-SI) that resulted from the elimination (or reduction of representation) of ~5000 INFLS-IMAGE clones from the INFLS library.
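
The redundancy argument in the abstract can be illustrated with a small calculation. The sketch below is not the authors' analysis. It assumes a two-class library in which 1500 abundant and intermediate mRNAs make up 60% of the clones (both values lie within the ranges quoted in the abstract), and a purely hypothetical 15,000 rare mRNAs make up the remaining 40%. It compares this with an idealized normalized library in which every mRNA is equally represented, which is a simplification of what normalization achieves in practice. All function and variable names are illustrative.

```python
def expected_distinct(n_picks, classes):
    """Expected number of distinct mRNAs seen after n_picks random clones.

    classes is a list of (n_species, mass_fraction) pairs; every species
    within a class is assumed to be equally abundant.
    """
    total = 0.0
    for n_species, mass_fraction in classes:
        p = mass_fraction / n_species  # chance that one pick hits a given species
        total += n_species * (1.0 - (1.0 - p) ** n_picks)
    return total


def redundancy(n_picks, classes):
    """Expected fraction of picks that re-identify an already seen mRNA."""
    return 1.0 - expected_distinct(n_picks, classes) / n_picks


# Assumption: 1500 species at 60% of mass (within the abstract's ranges),
# plus a hypothetical 15,000 rare species sharing the remaining 40%.
SKEWED = [(1500, 0.60), (15000, 0.40)]

# Assumption: an idealized normalized library, all 16,500 species equal.
NORMALIZED = [(16500, 1.0)]

if __name__ == "__main__":
    for n in (1000, 5000, 20000):
        print(
            n,
            round(expected_distinct(n, SKEWED)),
            round(expected_distinct(n, NORMALIZED)),
            round(redundancy(n, SKEWED), 3),
            round(redundancy(n, NORMALIZED), 3),
        )
```

Under these assumptions, the skewed library always yields fewer distinct mRNAs than the equalized one for the same number of sequenced clones, which is the cost-effectiveness problem that normalization is meant to reduce. The size of the gap depends entirely on the assumed rare-class numbers.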

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adult
  • Animals
  • Base Sequence
  • Cloning, Molecular*
  • DNA Primers
  • DNA, Complementary
  • Female
  • Gene Library*
  • Genetic Techniques*
  • Humans
  • Mice
  • Molecular Sequence Data
  • Multiple Sclerosis / genetics
  • Plasmids
  • Polymerase Chain Reaction
  • RNA, Messenger / genetics
  • Rats
  • Schistosoma mansoni / genetics


Substances

  • DNA Primers
  • DNA, Complementary
  • RNA, Messenger

Associated data

  • GENBANK/AA817666
  • GENBANK/AA817667
  • GENBANK/AA817668
  • GENBANK/AA817669
  • GENBANK/AA817670
  • GENBANK/AA817671
  • GENBANK/AA817672
  • GENBANK/AA817673
  • GENBANK/AA817674
  • GENBANK/AA817675
  • GENBANK/AA817676
  • GENBANK/AA817677
  • GENBANK/AA817678
  • GENBANK/AA817679
  • GENBANK/AA817680
  • GENBANK/AA817681
  • GENBANK/AA817682
  • GENBANK/AA817683
  • GENBANK/AA817685
  • GENBANK/AA817686
  • GENBANK/AA817687
  • GENBANK/AA817688
  • GENBANK/AA817689
  • GENBANK/AA817690
  • GENBANK/AA817692
  • GENBANK/AA817693
  • GENBANK/AA817694