GS-Finder: a program to find bacterial gene start sites with a self-training method

Int J Biochem Cell Biol. 2004 Mar;36(3):535-44. doi: 10.1016/j.biocel.2003.08.013.


In this paper, a self-training method is proposed to recognize translation start sites in bacterial genomes without prior knowledge of the rRNA of the genomes concerned. Many biologically meaningful features are incorporated, including mononucleotide distribution patterns near the start codon, the start codon itself, the coding potential and the distance from the leftmost start codon to the candidate start codon. The proposed method correctly predicts 92% of the translation start sites of 195 experimentally confirmed Escherichia coli CDSs, 96% of 58 reliable Bacillus subtilis CDSs and 82% of 140 reliable Synechocystis CDSs. The self-training method may also be used to relocate the translation start sites of putative CDSs predicted by gene-finding programs. Post-processing with this method markedly improves the gene start predictions of some gene-finding programs; for example, the accuracy of Glimmer 2.02 gene start prediction increases from 63% to 91% for 832 reliable E. coli CDSs. An open source computer program implementing the method, GS-Finder, is freely available for academic purposes from
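The abstract names the feature types but not the model or training procedure, so the following is only a minimal sketch of what a self-training start-site picker could look like. It is not GS-Finder's actual algorithm. Assumptions (all hypothetical): each record is an `(seq, offset)` pair whose ORF is in frame from `offset` (the leftmost candidate start) to its stop codon, with at least one candidate; candidates are in-frame ATG/GTG/TTG codons; the upstream mononucleotide pattern is a position-specific count matrix scored against whole-input base composition; the start codon and binned distance from the leftmost start are log-odds terms against all candidates. The coding-potential feature is omitted, and `WINDOW`, `DIST_BIN`, `MAX_BIN` and every function name are invented for illustration.

```python
import math
from collections import Counter

START_CODONS = ("ATG", "GTG", "TTG")
ALPHABET = "ACGTN"
WINDOW = 15    # hypothetical: upstream bases scored for each candidate start
DIST_BIN = 10  # hypothetical: distance bin width, in codons
MAX_BIN = 5    # hypothetical: distances beyond 50 codons share one bin


def candidates(seq, offset):
    """In-frame start codons from `offset` (the leftmost candidate) to the 3' end."""
    return [i for i in range(offset, len(seq) - 2, 3) if seq[i:i + 3] in START_CODONS]


def features(seq, pos, leftmost):
    """Upstream window, start codon and binned distance from the leftmost start."""
    window = seq[max(0, pos - WINDOW):pos].rjust(WINDOW, "N")
    dist = min((pos - leftmost) // 3 // DIST_BIN, MAX_BIN)
    return window, seq[pos:pos + 3], dist


def smoothed(counts, key, n_keys):
    """Laplace-smoothed probability of `key` under a Counter."""
    return (counts[key] + 1) / (sum(counts.values()) + n_keys)


def train(all_feats, chosen, background):
    """Fit foreground counts on the currently chosen starts of every record."""
    pwm = [Counter() for _ in range(WINDOW)]
    fg_cod, bg_cod, fg_dist, bg_dist = Counter(), Counter(), Counter(), Counter()
    for feats, k in zip(all_feats, chosen):
        for i, (window, codon, dist) in enumerate(feats):
            bg_cod[codon] += 1  # codon/distance background: all candidates
            bg_dist[dist] += 1
            if i == k:
                fg_cod[codon] += 1
                fg_dist[dist] += 1
                for j, base in enumerate(window):
                    pwm[j][base] += 1
    return pwm, background, fg_cod, bg_cod, fg_dist, bg_dist


def score(model, feat):
    """Sum of log-odds terms; higher means more start-like."""
    pwm, background, fg_cod, bg_cod, fg_dist, bg_dist = model
    window, codon, dist = feat
    n = len(ALPHABET)
    s = sum(math.log(smoothed(pwm[j], b, n) / smoothed(background, b, n))
            for j, b in enumerate(window))
    s += math.log(smoothed(fg_cod, codon, len(START_CODONS))
                  / smoothed(bg_cod, codon, len(START_CODONS)))
    s += math.log(smoothed(fg_dist, dist, MAX_BIN + 1)
                  / smoothed(bg_dist, dist, MAX_BIN + 1))
    return s


def self_train(records, max_iter=20):
    """records: (seq, offset) pairs; returns one predicted start position per record."""
    background = Counter("".join(seq for seq, _ in records))
    all_pos = [candidates(seq, off) for seq, off in records]
    all_feats = [[features(seq, p, pos[0]) for p in pos]
                 for (seq, _), pos in zip(records, all_pos)]
    chosen = [0] * len(records)  # initialise with the leftmost start (longest ORF)
    for _ in range(max_iter):
        model = train(all_feats, chosen, background)
        new = [max(range(len(f)), key=lambda i: score(model, f[i])) for f in all_feats]
        if new == chosen:  # converged: no record changed its start
            break
        chosen = new
    return [pos[k] for pos, k in zip(all_pos, chosen)]
```

The loop alternates between fitting the models on the current choices and re-picking the best-scoring candidate per ORF, in the style of hard-assignment (Viterbi-like) training. Starting from the leftmost start is a common heuristic and only works if most leftmost starts are correct, so that the upstream signal can be learned from them and then applied to ORFs whose leftmost start is a decoy.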

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Codon, Initiator*
  • Databases, Genetic
  • Genome, Bacterial*
  • Nucleotides / genetics
  • Predictive Value of Tests
  • Protein Biosynthesis*
  • Software*