Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats

Genomics. 2007 Mar;89(3):316-25. doi: 10.1016/j.ygeno.2006.11.011. Epub 2006 Dec 28.


Single-amino-acid tandem repeats are very common in mammalian proteins, but their function and evolution remain poorly understood. Here we investigate how the variability and prevalence of amino acid repeats are related to the evolutionary constraints operating on the proteins. We find a significant positive correlation between repeat size difference and protein nonsynonymous substitution rate in human and mouse orthologous genes. This association is observed for all the common amino acid repeat types. It indicates that rapid diversification of repeat structures, involving both trinucleotide slippage and nucleotide substitutions, preferentially occurs in proteins subject to low selective constraints. Strikingly, however, we also observe a significant negative correlation between the number of repeats in a protein and the gene nonsynonymous substitution rate, particularly for glutamine, glycine, and alanine repeats. This implies that proteins subject to strong selective constraints tend to contain an unexpectedly high number of repeats, which are generally well conserved between the two species. This is consistent with a role for selection in the maintenance of a significant number of repeats. Analysis of the codon structure of the sequences encoding the repeats shows that codon purity is associated with high interspecific variability in repeat size. Interestingly, polyalanine and polyglutamine repeats associated with disease show distinctive features in both the degree of repeat conservation and the selective constraints acting on the protein sequence.
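The abstract does not state how repeats were detected or how codon purity was measured. The following Python sketch illustrates the two basic operations under stated assumptions: a repeat is taken to be a run of at least six identical residues (a hypothetical threshold), and codon purity is taken to be the fraction of the repeat's codons that match its single most frequent codon (an assumed definition). The function names `find_aa_repeats` and `codon_purity` are hypothetical and do not come from the study.

```python
import re
from collections import Counter

# Hypothetical threshold: the minimum repeat length used in the study is not given here.
MIN_REPEAT_LEN = 6


def find_aa_repeats(protein, min_len=MIN_REPEAT_LEN):
    """Return (residue, start, length) for each maximal run of one amino acid
    whose length is at least min_len. Start positions are 0-based."""
    repeats = []
    # (.)\1* matches each maximal run of identical characters.
    for m in re.finditer(r"(.)\1*", protein):
        run = m.group(0)
        if len(run) >= min_len:
            repeats.append((run[0], m.start(), len(run)))
    return repeats


def codon_purity(cds, start, length):
    """Assumed definition: fraction of the repeat's codons equal to its most
    frequent codon. `cds` must be in frame, with codon i encoding residue i."""
    if len(cds) < 3 * (start + length):
        raise ValueError("coding sequence is shorter than the repeat region")
    codons = [cds[3 * i:3 * i + 3] for i in range(start, start + length)]
    most_common_count = Counter(codons).most_common(1)[0][1]
    return most_common_count / length


# Toy example: a polyglutamine run of 8 encoded by 6 CAG and 2 CAA codons.
protein = "MK" + "Q" * 8 + "PA"
cds = "ATGAAA" + "CAGCAGCAGCAA" * 2 + "CCTGCT"

for residue, start, length in find_aa_repeats(protein):
    print(residue, start, length, codon_purity(cds, start, length))
```

On the toy input, the sketch reports one glutamine repeat at position 2 of length 8 with a purity of 0.75. An interrupted codon structure (mixed CAG/CAA) lowers purity, which is the property the abstract links to lower interspecific variability in repeat size.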

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Conserved Sequence
  • DNA, Complementary
  • Evolution, Molecular*
  • Humans
  • Mice
  • Point Mutation
  • Proteins / chemistry*
  • Proteins / genetics*
  • Repetitive Sequences, Amino Acid*
  • Selection, Genetic
  • Trinucleotide Repeats


Substances

  • DNA, Complementary
  • Proteins