Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis

Proc Int Conf Intell Syst Mol Biol. 1997:5:226-33.


Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role. This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known. In this paper, we employ artificial neural networks to predict which AUG triplet in an mRNA sequence is the start codon. The trained networks correctly classified 88% of Arabidopsis and 85% of vertebrate AUG triplets. We find that our trained neural networks use a combination of local start codon context and global sequence information. Furthermore, analysis of false predictions shows that AUGs in frame with the actual start codon are more frequently selected than out-of-frame AUGs, suggesting that our networks use reading frame detection. A number of conflicts between neural network predictions and database annotations are analysed in detail, leading to identification of possible database errors.
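The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows how every AUG triplet in a sequence might be enumerated, how a local context window might be turned into binary network inputs, and how a candidate's reading frame relative to an annotated start codon can be checked. The function names, the flank width, and the one-hot encoding are assumptions made for illustration, since the abstract does not specify them. No network is trained here.

```python
from typing import List

BASES = "ACGT"


def normalise(seq: str) -> str:
    """Upper-case the sequence and write it in the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")


def find_aug_candidates(seq: str) -> List[int]:
    """Return the 0-based position of every AUG (ATG) triplet, in any frame."""
    s = normalise(seq)
    return [i for i in range(len(s) - 2) if s[i:i + 3] == "ATG"]


def context_window(seq: str, pos: int, flank: int = 10) -> str:
    """Return `flank` bases on each side of the triplet at `pos`.

    The flank width is a hypothetical parameter; positions that fall
    outside the sequence are padded with 'N'.
    """
    s = normalise(seq)
    left = s[max(0, pos - flank):pos].rjust(flank, "N")
    right = s[pos + 3:pos + 3 + flank].ljust(flank, "N")
    return left + s[pos:pos + 3] + right


def one_hot(window: str) -> List[int]:
    """Encode each base as four binary inputs; 'N' becomes all zeros."""
    vec: List[int] = []
    for base in window:
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


def in_frame(pos: int, annotated_start: int) -> bool:
    """True if the candidate shares a reading frame with the annotated start."""
    return (pos - annotated_start) % 3 == 0


if __name__ == "__main__":
    mrna = "GCCAUGGCGAUGAAAUGA"  # toy sequence, not real data
    annotated_start = 3          # assumed annotation for the toy sequence
    for pos in find_aug_candidates(mrna):
        window = context_window(mrna, pos, flank=4)
        print(pos, window, in_frame(pos, annotated_start), len(one_hot(window)))
```

In a sketch like this, each encoded window would be the input to a classifier that scores the candidate as start codon or not. The `in_frame` check is the kind of bookkeeping needed to compare false predictions at in-frame and out-of-frame AUGs, as in the error analysis mentioned in the abstract.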

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Angiotensinogen / genetics
  • Animals
  • Binding Sites / genetics
  • Codon, Initiator / genetics
  • Databases, Factual
  • Eukaryotic Cells
  • Evaluation Studies as Topic
  • Gene Expression*
  • Genome*
  • Genome, Human
  • Humans
  • Molecular Sequence Data
  • Neural Networks, Computer*
  • Peptide Chain Initiation, Translational*
  • Protein Sorting Signals / genetics


Substances

  • Codon, Initiator
  • Protein Sorting Signals
  • Angiotensinogen