Evolution of function in protein superfamilies, from a structural perspective

J Mol Biol. 2001 Apr 6;307(4):1113-43. doi: 10.1006/jmbi.2001.4513.


The recent growth in protein databases has revealed the functional diversity of many protein superfamilies. We have assessed the functional variation of homologous enzyme superfamilies containing two or more enzymes, as defined by the CATH protein structure classification, by way of the Enzyme Commission (EC) scheme. When sequence and structure information are combined to identify relatives, the majority of superfamilies display variation in enzyme function, with 25 % of superfamilies in the PDB having members of different enzyme types. We determined the extent of functional similarity at different levels of sequence identity for 486,000 homologous pairs (enzyme/enzyme and enzyme/non-enzyme), with structural and sequence relatives included. For single and multi-domain proteins, variation in EC number is rare above 40 % sequence identity, and above 30 %, the first three digits may be predicted with an accuracy of at least 90 %. For more distantly related proteins sharing less than 30 % sequence identity, functional variation is significant, and below this threshold, structural data are essential for understanding the molecular basis of observed functional differences. To explore the mechanisms for generating functional diversity during evolution, we have studied in detail 31 diverse structural enzyme superfamilies for which structural data are available. A large number of variations and peculiarities are observed, from the atomic level through to gross structural rearrangements. Almost all superfamilies exhibit functional diversity generated by local sequence variation and domain shuffling. Commonly, substrate specificity is diverse across a superfamily, whilst the reaction chemistry is maintained. In many superfamilies, the position of catalytic residues may vary despite playing equivalent functional roles in related proteins. The implications of functional diversity within superfamilies for the structural genomics projects are discussed.
More detailed information on these superfamilies is available at http://www.biochem.ucl.ac.uk/bsm/FAM-EC/.
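
As an illustration of what agreement in "the first three digits" of an EC number means, the following is a minimal sketch, not the authors' method. It assumes EC numbers are given as dot-separated strings and treats "-" as an unassigned level; the function name `shared_ec_levels` is hypothetical.

```python
def shared_ec_levels(ec_a: str, ec_b: str) -> int:
    """Count how many leading EC levels two EC numbers share (0 to 4)."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        # A "-" marks an unassigned level (assumption), so stop counting there.
        if a == "-" or b == "-" or a != b:
            break
        shared += 1
    return shared


# Same first three digits, different fourth digit.
print(shared_ec_levels("3.4.21.4", "3.4.21.5"))  # 3
# Different enzyme class at the first digit.
print(shared_ec_levels("1.1.1.1", "2.7.1.1"))  # 0
```

Under this sketch, a pair counts as conserved to three digits when the function returns 3 or 4.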

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Catalysis
  • Conserved Sequence
  • Databases as Topic
  • Enzymes / chemistry
  • Enzymes / classification
  • Enzymes / metabolism
  • Evolution, Molecular*
  • Metals / metabolism
  • Models, Molecular
  • Multigene Family
  • Mutation
  • Protein Binding
  • Protein Structure, Quaternary
  • Protein Structure, Tertiary
  • Protein Subunits
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / metabolism*
  • Repetitive Sequences, Amino Acid
  • Sequence Homology, Amino Acid
  • Structure-Activity Relationship
  • Substrate Specificity


Substances

  • Enzymes
  • Metals
  • Protein Subunits
  • Proteins