Gene recognition in eukaryotic DNA by comparison of genomic sequences

P S Novichkov; M S Gelfand; A A Mironov

doi:10.1093/bioinformatics/17.11.1011


Bioinformatics. 2001 Nov;17(11):1011-8. doi: 10.1093/bioinformatics/17.11.1011.

Authors

P S Novichkov¹, M S Gelfand, A A Mironov

Affiliation

¹ State Scientific Center GosNIIGenetika, 1 Dorozhny pr. 1, Moscow 113545, Russia.

PMID: 11724729

Abstract

Motivation: Sequencing of complete eukaryotic genomes and of large syntenic genome fragments makes it possible to use comparison of genomic sequences for gene recognition.

Results: This paper describes a spliced alignment algorithm that aligns candidate exon chains of two homologous genomic sequence fragments from different species. The algorithm is implemented in the Pro-Gen software. Unlike other algorithms, Pro-Gen does not assume that the exon-intron structure is conserved. Amino acid sequences obtained by formal translation of candidate exons are aligned instead of nucleotide sequences, which allows comparisons between distantly related species. The algorithm was tested on samples of human-mammal (mouse), human-vertebrate (Xenopus) and human-invertebrate (Drosophila) gene pairs. Surprisingly, the best results, 97-98% correlation between the actual and predicted genes, were obtained for the more distant comparisons, whereas the correlation on the human-mouse sample was only 93%. The latter value increases to 95% if conservation of the exon-intron structure is assumed. The lower human-mouse accuracy is caused by extensive sequence conservation in the non-coding regions of human and mouse genes, probably due to regulatory elements.
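The central idea of spliced alignment, choosing among overlapping candidate exons the chain whose translation aligns best to a homologous sequence, can be illustrated with a small dynamic-programming sketch. This is a simplification and not Pro-Gen's implementation: it assumes the homolog is already available as a protein (`target`), that each candidate exon is a frame-preserving interval whose length is a multiple of three, and it uses hypothetical toy scores (`MATCH`, `MISMATCH`, `GAP`) rather than a substitution matrix. Pro-Gen itself aligns candidate exon chains from both genomic sequences. The names `translate` and `spliced_align` are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

MATCH, MISMATCH, GAP = 2, -1, -2  # hypothetical toy scores, not Pro-Gen's


def translate(dna):
    """Formal translation in frame 0; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def spliced_align(genome, exons, target):
    """Return (score, chain) for the best chain of non-overlapping candidate
    exons (half-open (start, end) intervals) whose concatenated translation
    globally aligns to the protein `target`; `chain` lists exon indices."""
    m = len(target)
    order = sorted(range(len(exons)), key=lambda e: exons[e][0])
    last_row = {}              # exon -> DP row after its final residue
    best = (GAP * m, ())       # empty chain: every target residue is a gap
    for e in order:
        start, end = exons[e]
        # Row k = 0: the chain either starts at exon e or extends a chain
        # whose last exon ends before e begins.
        row = [(GAP * j, (e,)) for j in range(m + 1)]
        for h, prev in last_row.items():
            if exons[h][1] <= start:
                row = [max(r, (s, c + (e,))) for r, (s, c) in zip(row, prev)]
        # Standard global-alignment recurrence over the residues of exon e.
        for aa in translate(genome[start:end]):
            new = [(row[0][0] + GAP, row[0][1])]
            for j in range(1, m + 1):
                sub = MATCH if aa == target[j - 1] else MISMATCH
                new.append(max((row[j - 1][0] + sub, row[j - 1][1]),
                               (row[j][0] + GAP, row[j][1]),
                               (new[j - 1][0] + GAP, new[j - 1][1])))
            row = new
        last_row[e] = row
        best = max(best, row[m])
    return best


# Toy example: exons 0 and 1 encode "MK" and "WV"; exons 2 and 3 are decoys.
genome = "ATGAAA" + "GTAAGT" + "TGGGTT" + "CCCCCC"
exons = [(0, 6), (12, 18), (18, 24), (3, 15)]
print(spliced_align(genome, exons, "MKWV"))  # -> (8, (0, 1))
```

Each DP cell carries the chain alongside its score only to keep the sketch short; a practical implementation would store back-pointers instead and would also handle exon boundaries that split codons, which this toy ignores.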

Availability: Pro-Gen v. 3.0 is available to academic researchers free of charge at http://www.anchorgen.com/pro_gen/pro_gen.html.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Animals
Computational Biology
DNA / genetics*
Drosophila / genetics
Eukaryotic Cells
Exons
Genome*
Humans
Mice
Sequence Alignment / statistics & numerical data
Software*
Xenopus / genetics

Substances

DNA