The significance of digital gene expression profiles

Genome Res. 1997 Oct;7(10):986-95. doi: 10.1101/gr.7.10.986.


Genes differentially expressed in different tissues, during development, or during specific pathologies are of foremost interest to both basic and pharmaceutical research. "Transcript profiles" or "digital Northerns" are generated routinely by partially sequencing thousands of randomly selected clones from relevant cDNA libraries. Differentially expressed genes can then be detected from variations in the counts of their cognate sequence tags. Here we present the first systematic study on the influence of random fluctuations and sampling size on the reliability of this kind of data. We establish a rigorous significance test and demonstrate its use on publicly available transcript profiles. The theory links the threshold of selection of putatively regulated genes (e.g., the number of pharmaceutical leads) to the fraction of false positive clones one is willing to risk. Our results delineate more precisely and extend the limits within which digital Northern data can be used.
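The abstract does not state the test itself. As a minimal sketch, assume the closed form usually quoted from this paper: given x tags for a gene among N1 clones in one library, the probability of seeing y tags among N2 clones in a second library is p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)]. The function names below are hypothetical and are not taken from the authors' software.

```python
from math import exp, lgamma, log


def tag_count_pmf(y, x, n1, n2):
    """p(y | x): probability of y tags in library 2 (n2 clones sampled)
    given x tags for the same gene in library 1 (n1 clones sampled).

    Assumed closed form, evaluated in log space so large counts do not overflow.
    """
    r = n2 / n1
    log_p = (
        y * log(r)
        + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
        - (x + y + 1) * log(1 + r)
    )
    return exp(log_p)


def upper_tail(y, x, n1, n2):
    """P(Y >= y | x): chance of a count at least as large as y arising from
    sampling fluctuation alone, computed as 1 minus the sum of p(k | x) for k < y."""
    below = sum(tag_count_pmf(k, x, n1, n2) for k in range(y))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    # Equal-sized libraries: a gene unseen in library 1 but seen 5 times in library 2.
    print(upper_tail(5, 0, 1000, 1000))
```

Under this form, choosing a cutoff on the tail probability is what ties the number of genes selected as putatively regulated to the fraction of false positives one accepts, which is the trade-off the abstract describes.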

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Actins / genetics
  • Bayes Theorem
  • Brain
  • Confidence Intervals
  • DNA, Complementary / genetics*
  • Dimethyl Sulfoxide / pharmacology
  • Gene Expression*
  • Humans
  • Liver
  • Models, Genetic*
  • Models, Statistical*
  • Pancreas
  • Probability
  • Sample Size
  • Tetradecanoylphorbol Acetate / pharmacology
  • Tumor Cells, Cultured / drug effects

Substances

  • Actins
  • DNA, Complementary
  • Tetradecanoylphorbol Acetate
  • Dimethyl Sulfoxide