Role of low-complexity sequences in the formation of novel protein coding sequences

Mol Biol Evol. 2012 Mar;29(3):883-6. doi: 10.1093/molbev/msr263. Epub 2011 Oct 31.


Low-complexity sequences are extremely abundant in eukaryotic proteins for reasons that remain unclear. One hypothesis is that they contribute to the formation of novel coding sequences, facilitating the generation of novel protein functions. Here, we test this hypothesis by examining the content of low-complexity sequences in proteins of different age. We show that recently emerged proteins contain more low-complexity sequences than older proteins and that these sequences often form functional domains. These data are consistent with the idea that low-complexity sequences may play a key role in the emergence of novel genes.
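The abstract does not say how low-complexity content was measured, so the following is only a minimal sketch of one common way to quantify it: the Shannon entropy of residue composition in a sliding window. It is a simplified stand-in for dedicated low-complexity filters, not the authors' method. The function names, the window length, and the entropy cutoff are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_fraction(seq, window=12, max_entropy=2.2):
    """Fraction of residues covered by at least one low-entropy window.

    `window` and `max_entropy` are hypothetical defaults chosen for
    illustration; they are not taken from the paper.
    """
    seq = seq.upper()
    if len(seq) < window:
        return 0.0
    covered = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        # Mark every residue in a window whose composition is repetitive.
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            for j in range(i, i + window):
                covered[j] = True
    return sum(covered) / len(seq)
```

Under these assumptions, one could compute this fraction for each protein and then compare its distribution between age classes, for example recently emerged versus ancient proteins, which is the kind of comparison the abstract describes.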

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs / genetics*
  • Amino Acid Sequence
  • Base Composition
  • Computational Biology
  • Evolution, Molecular*
  • Humans
  • Models, Genetic*
  • Phylogeny
  • Proteins / genetics*
  • Species Specificity


Substances

  • Proteins