A tool for analyzing and annotating genomic sequences
- PMID: 9403056
- DOI: 10.1006/geno.1997.4984
Abstract
We describe a tool for analyzing and annotating large genomic sequences containing introns. The analysis and annotation tool (AAT) includes two sets of programs, one for comparing the query sequence with a protein database and the other for comparing the query with a cDNA database. Each set contains a fast database search program and a rigorous alignment program. The database search program quickly identifies regions of the query sequence that are similar to a database sequence. The alignment program then constructs an optimal alignment between each such region and the matching database sequence, and reports the coordinates of exons in the query sequence. Pairwise alignments of the query sequence with protein and cDNA database sequences are combined into multiple sequence alignments, which provide a view of all protein and cDNA sequences matching a query region. On a data set of 570 DNA sequences, AAT identified 94% of coding nucleotides correctly and 74% of exons exactly. Results of analyzing a human BAC sequence with AAT are also presented. AAT reduces the labor-intensive work of locating the exons of the query sequence and improves the process of defining intron-exon boundaries by using the wealth of available protein and cDNA data.
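The two accuracy figures above are measured at different levels: individual coding nucleotides and whole exons. The sketch below shows how such figures are commonly computed, assuming exons are given as 1-based inclusive (start, end) intervals on the query and that both figures are sensitivities (the share of annotated features recovered). The abstract does not state AAT's exact definitions, and the function names here are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical representation: an exon is a 1-based inclusive (start, end) interval on the query.
Exon = Tuple[int, int]


def covered_positions(exons: List[Exon]) -> Set[int]:
    """Return the set of query positions covered by at least one exon."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_sensitivity(predicted: List[Exon], actual: List[Exon]) -> float:
    """Fraction of annotated coding nucleotides that fall inside some predicted exon."""
    actual_pos = covered_positions(actual)
    if not actual_pos:
        return 0.0
    return len(actual_pos & covered_positions(predicted)) / len(actual_pos)


def exact_exon_sensitivity(predicted: List[Exon], actual: List[Exon]) -> float:
    """Fraction of annotated exons predicted with both boundaries exactly right."""
    if not actual:
        return 0.0
    predicted_set = set(predicted)
    return sum(1 for exon in actual if exon in predicted_set) / len(actual)


# Toy example with made-up coordinates (not data from the paper):
# the second predicted exon starts 10 bases late.
actual = [(101, 200), (401, 500), (801, 900)]
predicted = [(101, 200), (411, 500), (801, 900)]
print(nucleotide_sensitivity(predicted, actual))  # 290 of 300 coding nucleotides recovered
print(exact_exon_sensitivity(predicted, actual))  # 2 of 3 exons exact
```

Under these definitions, a predicted exon that misses one boundary by a few bases still scores well at the nucleotide level but counts as a complete miss at the exact-exon level, which is consistent with the exon figure (74%) being lower than the nucleotide figure (94%).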
Similar articles
- [Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes]. Yi Chuan Xue Bao. 2004 May;31(5):431-43. PMID: 15478601. In Chinese.
- Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring. J Mol Biol. 2000 Apr 14;297(5):1075-85. doi: 10.1006/jmbi.2000.3641. PMID: 10764574.
- Database similarity searches. Methods Mol Biol. 2008;484:361-78. doi: 10.1007/978-1-59745-398-1_24. PMID: 18592192.
- Advances in the Exon-Intron Database (EID). Brief Bioinform. 2006 Jun;7(2):178-85. doi: 10.1093/bib/bbl003. Epub 2006 Mar 9. PMID: 16772261. Review.
- Finding homologs to nucleic acid or protein sequences using the framesearch program. Curr Protoc Bioinformatics. 2002 Aug;Chapter 3:Unit 3.2. doi: 10.1002/0471250953.bi0302s00. PMID: 18792937. Review.
Cited by
- Whole genome shotgun sequencing of Brassica oleracea and its application to gene discovery and annotation in Arabidopsis. Genome Res. 2005 Apr;15(4):487-95. doi: 10.1101/gr.3176505. PMID: 15805490.
- Cloning and sequencing of cDNAs for hypothetical genes from chromosome 2 of Arabidopsis. Plant Physiol. 2002 Dec;130(4):2118-28. doi: 10.1104/pp.010207. PMID: 12481096.
- The TIGR Maize Database. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D771-6. doi: 10.1093/nar/gkj072. PMID: 16381977.
- GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level. DNA Res. 2023 Aug 1;30(4):dsad017. doi: 10.1093/dnares/dsad017. PMID: 37478310.
- The P10K database: a data portal for the protist 10 000 genomes project. Nucleic Acids Res. 2024 Jan 5;52(D1):D747-D755. doi: 10.1093/nar/gkad992. PMID: 37930867.
