Gene recognition in eukaryotic DNA by comparison of genomic sequences

Bioinformatics. 2001 Nov;17(11):1011-8. doi: 10.1093/bioinformatics/17.11.1011.

Abstract

Motivation: Sequencing of complete eukaryotic genomes and large syntenic fragments of genomes makes it possible to apply genomic comparison for gene recognition.

Results: This paper describes a spliced alignment algorithm that aligns candidate exon chains of two homologous genomic sequence fragments from different species. The algorithm is implemented in the Pro-Gen software. Unlike other algorithms, Pro-Gen does not assume conservation of the exon-intron structure. Amino acid sequences obtained by the formal translation of candidate exons are aligned instead of nucleotide sequences, which allows for distant comparisons. The algorithm was tested on a sample of human-mammal (mouse), human-vertebrate (Xenopus) and human-invertebrate (Drosophila) gene pairs. Surprisingly, the best results, 97-98% correlation between the actual and predicted genes, were obtained for the more distant comparisons, whereas the correlation on the human-mouse sample was only 93%. The latter value increases to 95% if conservation of the exon-intron structure is assumed. The lower human-mouse correlation is caused by a large amount of sequence conservation in non-coding regions of the human and mouse genes, probably due to regulatory elements.
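
The idea of comparing formally translated candidate exons rather than their nucleotide sequences can be illustrated with a minimal Python sketch. It is not the Pro-Gen implementation: the function names (translate, global_alignment_score), the simple match/mismatch/gap scores, and the plain global alignment are all assumptions chosen for brevity, and the sketch scores a single pair of exons rather than chains of candidate exons.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Formally translate a candidate exon in the given reading frame (0, 1 or 2).

    Codons containing characters other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with linear gap penalties.

    The scoring values are illustrative assumptions; a real tool would
    normally use an amino acid substitution matrix instead.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Two hypothetical exon fragments that differ only at synonymous positions.
exon_a = "ATGGCCAAA"
exon_b = "ATGGCTAAG"
protein_a = translate(exon_a)
protein_b = translate(exon_b)
protein_score = global_alignment_score(protein_a, protein_b)
```

In this toy case the two fragments differ at two nucleotide positions but translate to the same peptide, which is the kind of signal that makes protein-level comparison usable across larger evolutionary distances.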

Availability: Pro-Gen v. 3.0 is available to academic researchers free of charge at http://www.anchorgen.com/pro_gen/pro_gen.html.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology
  • DNA / genetics*
  • Drosophila / genetics
  • Eukaryotic Cells
  • Exons
  • Genome*
  • Humans
  • Mice
  • Sequence Alignment / statistics & numerical data
  • Software*
  • Xenopus / genetics

Substances

  • DNA