Amino acid repeats and the structure and evolution of proteins

Genome Dyn. 2007;3:119-130. doi: 10.1159/000107607.


Many proteins contain repeats or runs of a single amino acid. The pathogenicity of some repeat expansions has fueled proteomic, genomic and structural explorations of homopolymeric runs, not only in humans but also in a wide variety of other organisms. Other types of repetitive amino acid structures exhibit more complex patterns than homopeptides. Irrespective of their precise organization, repetitive sequences are defined as low-complexity or simple sequences, because one or a few residues are particularly abundant. Prokaryotes show a relatively low frequency of simple sequences compared with eukaryotes. In eukaryotes, the percentage of proteins containing homopolymeric runs varies greatly from one group to another. For instance, within vertebrates, amino acid repeat frequency is much higher in mammals than in amphibians, birds or fishes. For some repeats, this is correlated with the GC-richness of the regions containing the corresponding genes. Homopeptides tend to occur in disordered regions of transcription factors or developmental proteins. They can trigger the formation of protein aggregates, particularly in 'disease' proteins. Simple sequences seem to evolve more rapidly than the rest of the protein or gene and may have a functional impact. They are therefore good candidates for promoting rapid evolutionary change. All these facets of homopolymeric runs are explored in this review.
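As an illustration of the two notions used in the abstract, a homopolymeric run and a low-complexity sequence, the following Python sketch scans a protein sequence for runs of a single residue and computes the Shannon entropy of its residue composition. It is not taken from the review: the function names, the minimum run length of 5 and the use of entropy as a complexity measure are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from math import log2


def homopolymer_runs(seq, min_len=5):
    """Return (residue, start, length) for each run of one amino acid.

    min_len=5 is an illustrative threshold (an assumption), not a value
    taken from the review. Start positions are 0-based.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # (.) captures one residue; \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(seq)]


def shannon_entropy(seq):
    """Shannon entropy (bits) of the residue composition of seq.

    Low values mean that one or a few residues dominate, which is one
    simple (assumed) way to quantify 'low complexity'.
    """
    n = len(seq)
    counts = Counter(seq)
    return -sum(c / n * log2(c / n) for c in counts.values())


# Hypothetical toy sequence containing a run of seven glutamines.
toy = "MK" + "Q" * 7 + "AL"
runs = homopolymer_runs(toy)
entropy = shannon_entropy(toy)
```

For the toy sequence, `runs` holds a single glutamine run starting at position 2, and `entropy` is lower than it would be for a sequence of the same length with evenly distributed residues.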

Publication types

  • Review

MeSH terms

  • Animals
  • Base Composition
  • Evolution, Molecular*
  • Humans
  • Open Reading Frames / genetics
  • Peptides / chemistry
  • Proteins / chemistry*
  • Proteins / genetics*
  • Repetitive Sequences, Amino Acid*


Substances

  • Peptides
  • Proteins