The structure of the protein universe and genome evolution

Nature. 2002 Nov 14;420(6912):218-23. doi: 10.1038/nature01256.


Despite the practically unlimited number of possible protein sequences, the number of basic shapes in which proteins fold seems not only to be finite, but also to be relatively small, with probably no more than 10,000 folds in existence. Moreover, the distribution of proteins among these folds is highly non-homogeneous: some folds and superfamilies are extremely abundant, but most are rare. Protein folds and families encoded in diverse genomes show similar size distributions with notable mathematical properties, which also extend to the number of connections between domains in multidomain proteins. All these distributions follow asymptotic power laws, such as have been identified in a wide variety of biological and physical systems, and which are typically associated with scale-free networks. These findings suggest that genome evolution is driven by extremely general mechanisms based on the preferential attachment principle.
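As an illustration of how preferential attachment can yield a few very large families and many rare ones, the following is a minimal sketch of a generic Simon/Yule-style toy process. It is not the model analyzed in the review. The function names (`simulate_family_sizes`, `size_histogram`) and the parameters `n_genes`, `p_new`, and `seed` are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def simulate_family_sizes(n_genes, p_new, seed=0):
    """Toy preferential-attachment process (an assumption, not the paper's model).

    Each new gene founds a new family with probability p_new; otherwise it
    joins an existing family chosen with probability proportional to that
    family's current size.
    """
    rng = random.Random(seed)
    membership = [0]  # membership[g] is the family id of gene g
    n_families = 1
    while len(membership) < n_genes:
        if rng.random() < p_new:
            # Innovation: the new gene starts its own family.
            membership.append(n_families)
            n_families += 1
        else:
            # Copying the family of a uniformly random existing gene selects
            # a family with probability proportional to its size.
            membership.append(rng.choice(membership))
    return sorted(Counter(membership).values(), reverse=True)


def size_histogram(sizes):
    """Map each family size to the number of families having that size."""
    return dict(sorted(Counter(sizes).items()))


if __name__ == "__main__":
    sizes = simulate_family_sizes(n_genes=20000, p_new=0.1, seed=1)
    hist = size_histogram(sizes)
    print("families:", len(sizes), "largest:", sizes[0])
    for size in list(hist)[:5]:
        print(size, hist[size])
```

Plotting the histogram on log-log axes would be one way to inspect how closely the tail of this toy distribution approaches a straight line; the parameter values here are arbitrary and not fitted to any genome.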

Publication types

  • Review

MeSH terms

  • Databases, Protein
  • Evolution, Molecular*
  • Genome*
  • Models, Genetic
  • Protein Folding*
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics
  • Proteome
  • Proteomics


Substances

  • Proteins
  • Proteome