Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER

J Biosci. 2002 Feb;27(1 Suppl 1):7-14. doi: 10.1007/BF02703679.

Abstract

We compare the annotation of three complete genomes using the ab initio methods of gene identification GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, has been made using GeneMark. We find a number of novel genes that are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark but are not identified by either of the non-consensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient non-consensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have benchmarked this program as well as GLIMMER using three complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan than for GLIMMER, but in a significant number of cases the two techniques give similar results. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail.
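The abstract names the Fourier measure without defining it. As a rough illustration only, the sketch below assumes the commonly described period-3 formulation: each base is turned into a 0/1 indicator sequence, the spectral power at frequency 1/3 is summed over the four bases, and that power is compared with the average power over all nonzero frequencies. The function names `power_at_frequency` and `period3_peak_ratio`, the Parseval-based normalization, and the trimming to a multiple of three are assumptions made for this example and are not taken from GeneScan itself.

```python
import cmath


def power_at_frequency(seq, k):
    """Spectral power at DFT index k, summed over the A, C, G, T indicator sequences."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        s = 0j
        for j, ch in enumerate(seq):
            if ch == base:
                s += cmath.exp(-2j * cmath.pi * k * j / n)
        total += abs(s) ** 2
    return total


def period3_peak_ratio(seq):
    """Ratio of the power at f = 1/3 to the mean power over nonzero frequencies.

    Hypothetical helper for illustration; the normalization is an assumption.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim so that N/3 is an integer DFT index
    if n < 3:
        raise ValueError("sequence must contain at least three bases")
    seq = seq[:n]

    peak = power_at_frequency(seq, n // 3)

    # Parseval: for a 0/1 indicator with n_b ones, the power summed over all
    # N frequencies is N * n_b, and the k = 0 term alone is n_b ** 2.
    counts = [seq.count(base) for base in "ACGT"]
    nonzero_power = n * sum(counts) - sum(c * c for c in counts)
    if nonzero_power == 0:
        return 0.0  # e.g. a homopolymer has no power away from k = 0
    mean_power = nonzero_power / (n - 1)
    return peak / mean_power


if __name__ == "__main__":
    print(period3_peak_ratio("ATG" * 50))
    print(period3_peak_ratio("ACGT" * 30))
```

A sequence with strong three-base periodicity, such as a repeated codon, scores far above 1, whereas a sequence with no power at that frequency scores near 0. A window-based gene finder built on this idea would slide such a ratio along the genome and apply a threshold, whose value would have to be chosen empirically.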

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Campylobacter jejuni / genetics
  • Computational Biology*
  • DNA, Bacterial / genetics
  • Databases, Nucleic Acid
  • Fourier Analysis
  • Genes, Bacterial
  • Genome, Bacterial*
  • Haemophilus influenzae / genetics
  • Helicobacter pylori / genetics
  • Mathematics
  • Sensitivity and Specificity
  • Sequence Analysis, DNA* / instrumentation
  • Sequence Analysis, DNA* / methods

Substances

  • DNA, Bacterial