Using the transcriptome to annotate the genome

Nat Biotechnol. 2002 May;20(5):508-12. doi: 10.1038/nbt0502-508.


A remaining challenge for the human genome project involves the identification and annotation of expressed genes. The public and private sequencing efforts have identified approximately 15,000 sequences that meet stringent criteria for genes, such as correspondence with known genes from humans or other species, and have made approximately 10,000-20,000 additional gene predictions of lower confidence, supported by various types of in silico evidence, including homology studies, domain searches, and ab initio gene predictions. These computational methods have limitations, both because they are unable to identify a significant fraction of genes and exons and because they are unable to provide definitive evidence about whether a hypothetical gene is actually expressed. Because the in silico approaches identified fewer genes than anticipated, we wondered whether high-throughput experimental analyses could be used to provide evidence for the expression of hypothetical genes and to reveal previously undiscovered genes. We describe here the development of such a method, called long serial analysis of gene expression (LongSAGE), an adaptation of the original SAGE approach, that can be used to rapidly identify novel genes and exons.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • DNA, Complementary / metabolism
  • Genetic Techniques*
  • Genome*
  • Human Genome Project*
  • Humans
  • Models, Genetic
  • RNA, Messenger / metabolism*
  • Reverse Transcriptase Polymerase Chain Reaction
  • Software

Substances

  • DNA, Complementary
  • RNA, Messenger