Deep conservation of human protein tandem repeats within the eukaryotes

Mol Biol Evol. 2014 May;31(5):1132-48. doi: 10.1093/molbev/msu062. Epub 2014 Feb 3.


Tandem repeats (TRs) are a major element of protein sequences in all domains of life. They are particularly abundant in mammals, where by conservative estimates one in three proteins contain a TR. High generation-scale duplication and deletion rates were reported for nucleic TR units. However, it is not known whether protein TR units can also be frequently lost or gained providing a source of variation for rapid adaptation of protein function, or alternatively, tend to have conserved TR unit configurations over long evolutionary times. To obtain a systematic picture, we performed a proteome-wide analysis of the mode of evolution for human protein TRs. For this purpose, we propose a novel method for the detection of orthologous TRs based on circular profile hidden Markov models. For all detected TRs, we reconstructed bispecies TR unit phylogenies across 61 eukaryotes ranging from human to yeast. Moreover, we performed additional analyses to correlate functional and structural annotations of human TRs with their mode of evolution. Surprisingly, we find that the vast majority of human TRs are ancient, with TR unit number and order preserved intact since distant speciation events. For example, ≥ 61% of all human TRs have been strongly conserved at least since the root of all mammals, approximately 300 Ma. Further, we find no human protein TR that shows evidence for strong recent duplications and deletions. The results are in contrast to the high generation-scale mutability of nucleic TRs. Presumably, most protein TRs fold into stable and conserved structures that are indispensable for the function of the TR-containing protein. All of our data and results are available for download from

Keywords: conservation; phylogenetic analysis; protein evolution; tandem repeats.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Substitution
  • Animals
  • Conserved Sequence
  • Eukaryota / chemistry*
  • Eukaryota / genetics*
  • Evolution, Molecular*
  • Exons
  • Genome, Human
  • Humans
  • Markov Chains
  • Models, Genetic
  • Phylogeny
  • Proteins / chemistry*
  • Proteins / genetics*
  • Proteome / chemistry
  • Proteome / genetics
  • Tandem Repeat Sequences*
  • Time Factors


  • Proteins
  • Proteome