Structure-based identification and clustering of protein families and superfamilies

J Comput Aided Mol Des. 1994 Feb;8(1):5-27. doi: 10.1007/BF00124346.


We describe an approach to protein structure comparison designed to detect distantly related proteins of similar fold. The procedure must be flexible enough to accommodate the elasticity of protein folds without losing specificity. Protein structures are represented as a series of secondary structure elements, where for each element a local environment describes its relations with the elements that surround it. Secondary structures are then aligned by comparing their features and local environments. The procedure is illustrated with searches of a database of 468 protein structures to identify proteins of similar topology to porcine pepsin, porphobilinogen deaminase and serum amyloid P-component. In all cases the searches correctly identify protein structures of similar fold to the search proteins. Multiple cross-comparisons of protein structures allow the clustering of proteins of similar fold. This is exemplified with a clustering of alpha/beta- and beta-class protein structures. We discuss applications of the comparison and clustering of three-dimensional protein structures to comparative modelling and structure-based protein design.
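The abstract does not give the scoring scheme or the alignment algorithm, so the following is only a minimal sketch of the general idea of aligning secondary structure elements by their features and local environments. Everything in it is an assumption made for illustration: the `SSE` record, the reduction of a local environment to the types of neighbouring elements, the `element_score` function, the gap penalty, and the use of global dynamic programming are hypothetical and are not the authors' published method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SSE:
    """Hypothetical secondary structure element (not the paper's data model)."""
    kind: str                    # "H" for helix, "E" for strand
    length: int                  # number of residues in the element
    neighbours: Tuple[str, ...]  # kinds of spatially adjacent elements,
                                 # a crude stand-in for a "local environment"


def element_score(a: SSE, b: SSE) -> float:
    """Assumed similarity score: element type, length, and neighbour overlap."""
    if a.kind != b.kind:
        return -1.0  # penalise pairing a helix with a strand
    length_term = min(a.length, b.length) / max(a.length, b.length)
    # Multiset overlap of neighbour kinds, normalised to the range [0, 1].
    shared = sum(min(a.neighbours.count(k), b.neighbours.count(k))
                 for k in ("H", "E"))
    largest = max(len(a.neighbours), len(b.neighbours), 1)
    return length_term + shared / largest


def align(p: List[SSE], q: List[SSE],
          gap: float = -0.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment of two element lists by dynamic programming.

    Returns the total score and the list of matched index pairs (i, j).
    """
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + element_score(p[i - 1], q[j - 1]),
                dp[i - 1][j] + gap,   # element of p left unmatched
                dp[i][j - 1] + gap,   # element of q left unmatched
            )
    # Trace back from the bottom-right cell to recover the matched pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + element_score(p[i - 1], q[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Toy example: the second "protein" has one extra helix inserted.
query = [SSE("E", 5, ("E",)), SSE("H", 10, ("E", "E")), SSE("E", 6, ("H",))]
target = [query[0], SSE("H", 8, ("H",)), query[1], query[2]]
score, pairs = align(query, target)
```

In this toy case the extra helix in `target` is skipped with a single gap and the three shared elements are paired. Scores of this kind, computed for all pairs of structures in a database, could be converted to distances and passed to any standard hierarchical clustering routine; which clustering procedure the authors used is not stated in the abstract.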

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Animals
  • Databases, Factual
  • Drug Design
  • Endopeptidases / chemistry
  • Hydrogen Bonding
  • Hydroxymethylbilane Synthase / chemistry
  • Models, Molecular*
  • Molecular Sequence Data
  • Pepsin A / chemistry
  • Protein Conformation
  • Protein Folding
  • Protein Structure, Secondary*
  • Proteins / chemistry*
  • Proteins / classification*
  • Retroviridae Proteins / chemistry
  • Serum Amyloid P-Component / chemistry
  • Software
  • Swine


Substances

  • Proteins
  • Retroviridae Proteins
  • Serum Amyloid P-Component
  • Hydroxymethylbilane Synthase
  • Endopeptidases
  • Pepsin A