Comparing vertebrate whole-genome shotgun reads to the human genome

Genome Res. 2001 Nov;11(11):1807-16. doi: 10.1101/gr.203601.

Abstract

Multi-species sequence comparisons are a very efficient way to reveal conserved genes. Because sequence finishing is expensive and time consuming, many genome sequences are likely to stay incomplete. A challenge is to use these fragmented data for understanding the human genome. Methods for using cross-species whole-genome shotgun sequence (WGS) for genome annotation are described in this paper. About one-half million high-quality rat WGS reads (covering 7.5% of the rat genome) generated at the Baylor College of Medicine Human Genome Sequencing Center were compared with the human genome. Using computer-generated random reads as a negative control, a set of parameters was determined for reliable interpretation of BLAST search results. About 10% of the rat reads contain regions that are conserved in the human genomic sequence and about one-third of these include known gene-coding regions. Mapping the conserved regions to human chromosomes showed a 23-fold enrichment for coding regions compared with noncoding regions. This approach can also be applied to other mammalian genomes for gene finding. These data predicted approximately 42,500 genes in the human genome, slightly more than reported previously.
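The abstract does not spell out how the random-read negative control was turned into BLAST parameters. The following is a minimal Python sketch of the general idea under stated assumptions: score random reads against a reference, then pick the lowest score cutoff that keeps the false-positive rate among those random reads under a target. The names `best_ungapped_score`, `random_read`, and `calibrate_threshold` are hypothetical; the ungapped +1/-2 scoring is a simplified stand-in for a BLAST score, not the paper's parameters; the null model (independent bases at a fixed GC fraction), the 41% GC value, and the 1% false-positive target are illustrative assumptions.

```python
import random


def best_ungapped_score(read: str, ref: str, match: int = 1, mismatch: int = -2) -> int:
    """Highest-scoring ungapped local alignment between read and ref.

    Scans every diagonal with a Kadane-style running sum that resets to
    zero when it goes negative. A toy stand-in for a BLAST HSP score.
    """
    n, m = len(read), len(ref)
    best = 0
    for offset in range(-(n - 1), m):  # offset = ref index minus read index
        run = 0
        for i in range(max(0, -offset), min(n, m - offset)):
            run += match if read[i] == ref[i + offset] else mismatch
            run = max(run, 0)
            best = max(best, run)
    return best


def random_read(length: int, gc: float, rng: random.Random) -> str:
    """Random read with independent bases at a given GC fraction (assumed null model)."""
    at_w, gc_w = (1 - gc) / 2, gc / 2
    return "".join(rng.choices("ACGT", weights=[at_w, gc_w, gc_w, at_w], k=length))


def calibrate_threshold(ref: str, n_reads: int, read_len: int, gc: float,
                        max_fp_rate: float, seed: int = 0) -> int:
    """Smallest score T such that at most max_fp_rate of random reads score >= T."""
    rng = random.Random(seed)
    null_scores = [best_ungapped_score(random_read(read_len, gc, rng), ref)
                   for _ in range(n_reads)]
    for t in range(read_len + 1):
        if sum(s >= t for s in null_scores) / n_reads <= max_fp_rate:
            return t
    # With match = 1 no read can score above read_len, so this cutoff admits no random read.
    return read_len + 1


if __name__ == "__main__":
    rng = random.Random(42)
    reference = random_read(300, 0.41, rng)  # hypothetical stand-in for a human segment
    threshold = calibrate_threshold(reference, n_reads=100, read_len=50,
                                    gc=0.41, max_fp_rate=0.01)
    conserved_read = reference[100:150]  # exact copy, so it should pass the cutoff
    print(threshold, best_ungapped_score(conserved_read, reference) >= threshold)
```

Real reads would then be kept as "conserved" only if their score reaches the calibrated cutoff. Calibrating on simulated reads rather than fixing a cutoff by hand ties the threshold to an explicit false-positive rate for the chosen null model.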

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Base Sequence
  • Computational Biology / statistics & numerical data
  • Conserved Sequence
  • Databases, Genetic / statistics & numerical data
  • Expressed Sequence Tags
  • Genome*
  • Genome, Human*
  • Heterochromatin / genetics
  • Humans
  • Mice
  • Molecular Sequence Data
  • Rats
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / statistics & numerical data
  • Sequence Homology, Nucleic Acid
  • Species Specificity
  • Transcription, Genetic / genetics

Substances

  • Heterochromatin