Finding prokaryotic genes by the 'frame-by-frame' algorithm: targeting gene starts and overlapping genes

Bioinformatics. 1999 Nov;15(11):874-86. doi: 10.1093/bioinformatics/15.11.874.


Motivation: Tightly packed prokaryotic genes frequently overlap with each other. This feature, rarely seen in eukaryotic DNA, makes detection of translation initiation sites and, therefore, exact predictions of prokaryotic genes notoriously difficult. Improving the accuracy of precise gene prediction in prokaryotic genomic DNA remains an important open problem.

Results: A software program implementing a new algorithm that uses a uniform Hidden Markov Model for prokaryotic gene prediction was developed. The algorithm analyzes a given DNA sequence in each of the six possible global reading frames independently. Twelve complete prokaryotic genomes were analyzed using the new tool. Two measures were assessed by comparison with existing annotation: the accuracy of gene finding (predicting the locations of protein-coding ORFs) and the accuracy of precise gene prediction (detecting the whole gene, including the translation initiation codon). In terms of gene finding, the program performs at least as well as previously developed tools such as GeneMark and GLIMMER. In terms of precise gene prediction, the new program was shown to be more accurate, by several percentage points, than earlier tools such as GeneMark.hmm, ECOPARSE and ORPHEUS. The test results also indicated possible systematic bias in start codon annotation in several early sequenced prokaryotic genomes.
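To make the idea of six global reading frames concrete, the following is a minimal Python sketch that scans each frame independently for open reading frames. It is not the Hidden Markov Model described in the abstract: it only enumerates ORFs by start and stop codons. The function names, the start codon set (ATG, GTG, TTG) and the `min_codons` threshold are assumptions made for illustration.

```python
# Illustrative six-frame ORF scan; NOT the paper's HMM-based algorithm.
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumed set of prokaryotic start codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset, min_codons=2):
    """Yield half-open (start, end) index pairs of ORFs in one frame.

    Each ORF runs from the first start codon seen after the previous
    in-frame stop up to and including the next in-frame stop codon.
    """
    start = None
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOPS:
            if start is not None and (i + 3 - start) // 3 >= min_codons:
                yield start, i + 3
            start = None
        elif start is None and codon in STARTS:
            start = i


def six_frame_orfs(seq, min_codons=2):
    """Scan the three forward and three reverse frames independently.

    Returns tuples (strand, frame_offset, start, end), with start and end
    given as half-open indices on the forward strand.
    """
    seq = seq.upper()
    n = len(seq)
    rc = reverse_complement(seq)
    found = []
    for offset in range(3):
        for s, e in orfs_in_frame(seq, offset, min_codons):
            found.append(("+", offset, s, e))
        for s, e in orfs_in_frame(rc, offset, min_codons):
            # Map reverse-strand indices back to forward-strand coordinates.
            found.append(("-", offset, n - e, n - s))
    return found


if __name__ == "__main__":
    for orf in six_frame_orfs("ATGAAATAA"):
        print(orf)
```

Because each frame is scanned on its own, ORFs in different frames or on opposite strands can overlap freely in the output. This sketch always takes the first start codon after a stop (the longest ORF), which is exactly the kind of simplification that makes exact start prediction hard; choosing among candidate starts is the part a statistical model has to address.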

Availability: The new gene-finding program can be accessed through the Web site:


Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms*
  • Bacteria / genetics*
  • Codon, Initiator / genetics*
  • Computational Biology / methods
  • Databases, Factual
  • Evaluation Studies as Topic
  • Genes, Bacterial / genetics*
  • Genes, Overlapping / genetics*
  • Genome, Bacterial
  • Models, Genetic
  • Open Reading Frames / genetics
  • Protein Biosynthesis
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA*
  • Software Validation

