Sequence space and the ongoing expansion of the protein universe

Nature. 2010 Jun 17;465(7300):922-6. doi: 10.1038/nature09105. Epub 2010 May 19.


The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
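
The comparison the abstract draws between homologous and random sequences can be made concrete with a minimal sketch. This is not the authors' method, which the abstract does not describe in detail. The sketch assumes the common definition of fractional identity over gap-free columns of a pairwise alignment, and a random baseline equal to the sum of squared amino-acid frequencies. The function names, the uniform frequencies, and the toy sequences are hypothetical and chosen only for illustration.

```python
from typing import Mapping

GAP = "-"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fractional_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of gap-free aligned columns in which both residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != GAP and b != GAP]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return matches / len(columns)


def random_identity(frequencies: Mapping[str, float]) -> float:
    """Expected identity of two independent random sequences.

    Two residues drawn independently match with probability sum(f_i ** 2).
    """
    return sum(f * f for f in frequencies.values())


# Assumption: uniform amino-acid usage. Real proteomes are not uniform,
# which raises the random baseline somewhat above 1/20.
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
baseline = random_identity(uniform)

# Hypothetical aligned pair, not data from the paper.
toy_a = "MKV-LAGHT"
toy_b = "MRVQLA-HS"
identity = fractional_identity(toy_a, toy_b)

print(f"observed identity: {identity:.3f}")
print(f"random baseline:   {baseline:.3f}")
```

In this framing, two homologs that are still diverging have an identity above the random baseline that keeps decreasing over time. Actual baselines depend on the alignment procedure and on the amino-acid composition, both of which this sketch ignores.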

MeSH terms

  • Amino Acid Sequence
  • Amino Acid Substitution
  • Amino Acids / chemistry
  • Evolution, Molecular*
  • Genetic Variation*
  • Molecular Sequence Data
  • Mutation
  • Prokaryotic Cells
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Selection, Genetic / genetics
  • Sequence Analysis, Protein
  • Sequence Homology, Amino Acid

Substances

  • Amino Acids
  • Proteins