The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes

Y Lee; J Tsai; S Sunkara; S Karamycheva; G Pertea; R Sultana; V Antonescu; A Chan; F Cheung; J Quackenbush

doi:10.1093/nar/gki064

Nucleic Acids Res. 2005 Jan 1;33(Database issue):D71-4. doi: 10.1093/nar/gki064.

Authors

Y Lee¹, J Tsai, S Sunkara, S Karamycheva, G Pertea, R Sultana, V Antonescu, A Chan, F Cheung, J Quackenbush

Affiliation

¹ The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA. dlee@tigr.org

Abstract

Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.
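The abstract describes a two-stage construction: cluster the input sequences, then assemble each cluster into a consensus. The sketch below is a toy illustration of that cluster-then-assemble idea only. It is not the TGI protocol. It assumes that two sequences belong in the same cluster when they share an exact k-mer, with links followed transitively through union-find. It also substitutes a naive greedy suffix/prefix merge for a real assembler. The function names, `k`, and `min_len` are hypothetical. Real ESTs would additionally need error-tolerant alignment and reverse-complement handling.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class UnionFind:
    """Disjoint-set forest used to link sequences transitively."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(seqs, k=6):
    """Toy single-linkage clustering (assumption): sequences sharing any exact
    k-mer are linked, and links are followed transitively."""
    uf = UnionFind(len(seqs))
    first_seen = {}  # k-mer -> index of the first sequence that contained it
    for i, s in enumerate(seqs):
        for km in kmers(s, k):
            if km in first_seen:
                uf.union(first_seen[km], i)
            else:
                first_seen[km] = i
    groups = defaultdict(list)
    for i in range(len(seqs)):
        groups[uf.find(i)].append(i)
    return sorted(groups.values())


def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=5):
    """Toy stand-in for an assembler: repeatedly merge the pair of contigs with
    the longest exact suffix/prefix overlap until no overlap >= min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read.
    contigs = [c for c in contigs if not any(c != d and c in d for d in contigs)]
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps; leave contigs separate
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)] + [merged]
    return contigs


if __name__ == "__main__":
    # Hypothetical reads: three overlapping fragments of one transcript plus an unrelated read.
    ests = ["ATGCGTACGTTAGC", "GTTAGCCGATAGCT", "GATAGCTTACG", "CCCAAACCCAAACCC"]
    clusters = cluster(ests, k=6)
    print(clusters)  # → [[0, 1, 2], [3]]
    consensus = [greedy_assemble([ests[i] for i in c]) for c in clusters]
    print(consensus)  # → [['ATGCGTACGTTAGCCGATAGCTTACG'], ['CCCAAACCCAAACCC']]
```

Clustering before assembly keeps the quadratic overlap search confined to small groups of related sequences rather than the whole collection, which is one reason a two-stage design is attractive.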

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Animals
Base Sequence
Consensus Sequence
Databases, Genetic* / trends
Eukaryotic Cells / metabolism
Expressed Sequence Tags / chemistry*
Genome
Genomics*
Humans
Internet
Sequence Analysis, DNA
Software