Presence of ATG triplets in 5' untranslated regions of eukaryotic cDNAs correlates with a 'weak' context of the start codon

Bioinformatics. 2001 Oct;17(10):890-900. doi: 10.1093/bioinformatics/17.10.890.


Motivation: The context of the start codon (typically, AUG) and the features of the 5' untranslated regions (5' UTRs) are important for understanding translation regulation in eukaryotic mRNAs and for accurate prediction of the coding region in genomic and cDNA sequences. The presence of AUG triplets in 5' UTRs (upstream AUGs) might affect the initiation rate and, in the context of gene prediction, could reduce the accuracy of the identification of the authentic start. To reveal potential connections between the presence of upstream AUGs and other features of 5' UTRs, such as their length and the start codon context, we undertook a systematic analysis of the available eukaryotic 5' UTR sequences.

Results: We show that a large fraction of 5' UTRs in the available cDNA sequences, 15-53% depending on the organism, contain upstream ATGs. A negative correlation was observed between the information content of the translation start signal and the length of the 5' UTR. Similarly, a negative correlation exists between the 'strength' of the start context and the number of upstream ATGs. Typically, cDNAs containing long 5' UTRs with multiple upstream ATGs have a 'weak' start context, and in contrast, cDNAs containing short 5' UTRs without ATGs have 'strong' starts. These counter-intuitive results may be interpreted in terms of upstream AUGs having an important role in the regulation of translation efficiency by ensuring a low basal translation level via double negative control and creating the potential for additional regulatory mechanisms. One such mechanism, supported by experimental studies of some mRNAs, involves removal of the AUG-containing portion of the 5' UTR by alternative splicing.
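The two quantities compared above, the number of upstream ATGs and the 'strength' of the start context, can be illustrated with a minimal sketch. It is not the ATG_EVALUATOR method: the abstract measures the start signal by information content, whereas this sketch assumes a simplified Kozak-style rule (a purine at position -3 and a G at position +4) as a stand-in. The function names `count_upstream_atgs` and `start_context_class` and the example sequences are hypothetical.

```python
def count_upstream_atgs(utr5: str) -> int:
    """Count ATG triplets in a 5' UTR, in any reading frame."""
    s = utr5.upper().replace("U", "T")
    return sum(1 for i in range(len(s) - 2) if s[i:i + 3] == "ATG")


def start_context_class(cdna: str, start: int) -> str:
    """Classify the context of the ATG at 0-based index `start`.

    Assumption (not from the abstract): a simplified Kozak-style rule.
    'strong' needs a purine at -3 and a G at +4, 'adequate' needs one
    of the two, and 'weak' has neither.
    """
    s = cdna.upper().replace("U", "T")
    if s[start:start + 3] != "ATG":
        raise ValueError("no ATG at the given start position")
    # Position -3 is three bases before the A of ATG; +4 follows the G.
    minus3 = s[start - 3] if start >= 3 else ""
    plus4 = s[start + 3] if start + 3 < len(s) else ""
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]


# Hypothetical example sequences, constructed for illustration only.
long_utr = "GGATGCCATGTTTTTT"   # contains two upstream ATGs
short_utr = "GCCGCCACC"         # contains no upstream ATG

weak_cdna = long_utr + "ATGCAA"
strong_cdna = short_utr + "ATGGCG"

print(count_upstream_atgs(long_utr), start_context_class(weak_cdna, len(long_utr)))
print(count_upstream_atgs(short_utr), start_context_class(strong_cdna, len(short_utr)))
```

Applied to a set of annotated cDNAs, per-sequence values of this kind could be tabulated against 5' UTR length to look for the negative correlations reported above.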

Availability: An ATG_EVALUATOR program is available upon request or at


Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 5' Untranslated Regions*
  • Alternative Splicing
  • Animals
  • Base Composition
  • Base Sequence
  • Codon, Initiator / genetics*
  • Computational Biology
  • DNA, Complementary / genetics*
  • Humans
  • Linear Models
  • Models, Genetic
  • RNA, Messenger / genetics
  • Sequence Analysis, DNA / statistics & numerical data
  • Software


Substances

  • 5' Untranslated Regions
  • Codon, Initiator
  • DNA, Complementary
  • RNA, Messenger