Nonrandom tripeptide sequence distributions at protein carboxyl termini

Genome Res. 2003 Apr;13(4):617-23. doi: 10.1101/gr.667603.

Abstract

The availability of complete genome sequences enables the statistical analysis of sequence features without significant database-imposed bias. The carboxyl termini of proteins often contain regions associated with protein targeting and enhanced translational termination. We analyzed the frequency of occurrence of C-terminal tripeptides in representative archaeal, bacterial, and eukaryotic genomes. The sequence distribution in prokaryotic genomes nearly matches that generated by the randomization of the observed tripeptide set. In contrast, eukaryotic genomes contain large numbers of overrepresented sequences. Some of these correspond to highly repeated sequences from either duplicated endogenous genes or transposon open reading frames. Gratifyingly, others represent previously known targeting signals or sequences associated with an increase in translational termination efficiency. However, a number of overrepresented tripeptides have not been previously noted and may represent novel functional sequences. For example, the sequence XSS may enhance translational termination efficiency in plants, whereas FWC may be a targeting or processing signal for certain amino acid permeases in yeast.
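The comparison described above, observed C-terminal tripeptide counts against counts from a randomized version of the same tripeptide set, can be illustrated with a minimal sketch. The abstract does not specify the randomization procedure, so this example assumes one plausible choice: shuffling the residues at each of the three positions independently across all observed tripeptides, which preserves per-position amino acid composition. The function names, the number of shuffles, and the input format (a list of protein sequence strings) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def cterm_tripeptides(proteins):
    """Return the last three residues of every protein that has at least three."""
    return [p[-3:] for p in proteins if len(p) >= 3]


def shuffle_positions(tripeptides, rng):
    """Assumed null model: shuffle each of the three positions independently
    across the whole set, keeping per-position residue composition fixed."""
    columns = [list(col) for col in zip(*tripeptides)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(residues) for residues in zip(*columns)]


def compare_to_null(proteins, n_shuffles=200, seed=0):
    """Map each observed tripeptide to (observed count, mean count under the null)."""
    rng = random.Random(seed)
    observed = cterm_tripeptides(proteins)
    observed_counts = Counter(observed)

    # Accumulate counts over repeated randomizations of the observed set.
    null_totals = Counter()
    for _ in range(n_shuffles):
        null_totals.update(shuffle_positions(observed, rng))

    return {
        tri: (count, null_totals[tri] / n_shuffles)
        for tri, count in observed_counts.items()
    }


if __name__ == "__main__":
    # Toy sequences for illustration only, not real genome data.
    toy = ["MAAAKDEL", "MKDEL", "MTTSKL", "MQQSKL", "MG"]
    for tri, (obs, null_mean) in sorted(compare_to_null(toy).items()):
        print(tri, obs, round(null_mean, 2))
```

A tripeptide whose observed count is well above its mean count under the null would be a candidate overrepresented sequence. A real analysis would run on a complete proteome and would need a significance criterion, which this sketch omits.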

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Animals
  • Arabidopsis Proteins / chemistry
  • Archaeal Proteins / chemistry
  • Caenorhabditis elegans Proteins / chemistry
  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Databases, Protein / statistics & numerical data
  • Escherichia coli Proteins / chemistry
  • Humans
  • Molecular Sequence Data
  • Oligopeptides / chemistry*
  • Peptide Fragments / chemistry*
  • Protein Structure, Tertiary
  • Saccharomyces cerevisiae Proteins / chemistry
  • Statistical Distributions*

Substances

  • Arabidopsis Proteins
  • Archaeal Proteins
  • Caenorhabditis elegans Proteins
  • Escherichia coli Proteins
  • Oligopeptides
  • Peptide Fragments
  • Saccharomyces cerevisiae Proteins