Advances in the Exon-Intron Database (EID)

Brief Bioinform. 2006 Jun;7(2):178-85. doi: 10.1093/bib/bbl003. Epub 2006 Mar 9.

Abstract

Investigation of exon-intron gene structures is a non-trivial task because of the enormous expansion of eukaryotic genomes, the great variety of gene forms, and imperfections in sequence data. A number of available information systems on various gene characteristics complement each other and are indispensable for many genomic studies. Among them, the Exon-Intron Database (EID) is a good choice for large-scale computational examination of exon/intron structure and splicing. It has many internal filters that control for sequence quality, consistency of gene descriptions, conformance to standards, and possible errors. Recent innovations in EID are described. The collection of exons and introns has been extended beyond coding regions, and current versions of EID also contain data on the untranslated regions of gene sequences. Intronless genes are included as a special part of EID. For species with entirely sequenced genomes, species-specific databases have been generated. A novel Mammalian Orthologous Intron Database (MOID) has been introduced, which includes the full set of introns in orthologous genes whose positions relative to the reading frame are the same across species. Examples of statistical analyses of gene sequences using EID are provided. We present the latest data on our comparison of intron positions in 11,025 orthologous genes of human, mouse and rat, and find no convincing cases of intron gain. We discuss relevant data-quality issues of genomic databases. In particular, 5% of genes in genomic databases contain internal stop codons, which reflects a combination of biological causes and errors in sequence annotation. The EID is freely available at www.meduohio.edu/bioinfo/eid/.
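The abstract reports that about 5% of genes in genomic databases contain internal stop codons. The sketch below illustrates one way such a check could be done. It is not EID's actual filter. It assumes a spliced coding sequence (CDS) given as a DNA string that starts at the first codon, and it assumes the standard genetic code. The function names `internal_stop_positions` and `has_internal_stop` are hypothetical.

```python
# Stop codons of the standard genetic code (assumption: alternative codes
# differ, e.g. TGA encodes Trp in vertebrate mitochondria).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> list[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons that occur
    before the final codon of a coding sequence.

    Hypothetical helper for illustration; not part of EID.
    """
    seq = cds.upper().replace("U", "T")  # accept lowercase and RNA input
    if len(seq) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    positions = []
    # Scan codon by codon but skip the last codon: a terminal stop is
    # expected and is not counted as an internal stop.
    for i in range(0, len(seq) - 3, 3):
        if seq[i:i + 3] in STOP_CODONS:
            positions.append(i)
    return positions


def has_internal_stop(cds: str) -> bool:
    """True if the CDS contains at least one in-frame internal stop codon."""
    return bool(internal_stop_positions(cds))


if __name__ == "__main__":
    print(internal_stop_positions("ATGTAAGGCTGA"))  # [3]
    print(has_internal_stop("ATGGCCTAA"))           # False
```

A flagged gene is not necessarily an annotation error. For example, in selenoprotein genes an in-frame TGA is recoded as selenocysteine, so a check like this one only marks candidates for closer inspection.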

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • Base Sequence
  • Chromosome Mapping / methods*
  • DNA, Recombinant / genetics
  • Database Management Systems*
  • Databases, Genetic*
  • Documentation / methods
  • Exons / genetics*
  • Information Storage and Retrieval / methods*
  • Introns / genetics*
  • Molecular Sequence Data
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*
  • User-Computer Interface

Substances

  • DNA, Recombinant