An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins

Protein Sci. 1995 Mar;4(3):506-20. doi: 10.1002/pro.5560040317.


With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for the construction of databases. Domains are considered to be clusters of secondary structure elements. Thus, helices and strands are first clustered using intersecondary structural distances between Cα positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables their automatic identification and classification as disjoint, interacting, or conjoint domains. Application to a database of 83 protein families and 18 unique structures shows that the approach provides an effective delineation of domain boundaries and identifies those proteins that can be considered as a single domain. A quantitative estimate of the interaction between domains is also proposed. The database of protein domains is a useful tool for understanding protein folding, for recognizing protein folds, and for understanding structure-activity relationships.
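
The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the distance measure (mean pairwise distance between Cα coordinates of two elements), the average-linkage merge rule, the function names `sse_distance` and `cluster_sses`, and the toy coordinates are all assumptions made for illustration. The disjoint factor is not reproduced here because the abstract does not define it.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def sse_distance(a, b):
    """Mean pairwise distance between the C-alpha coordinates of two
    secondary structure elements (an assumed measure, for illustration)."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def cluster_sses(sses):
    """Average-linkage agglomerative clustering of secondary structure elements.

    sses maps an element name to a list of (x, y, z) C-alpha coordinates.
    Returns the merges in order as (members_a, members_b, distance) tuples,
    which is the information a dendrogram would display.
    """
    names = list(sses)
    # Pairwise distances between individual elements, computed once.
    base = {
        frozenset(pair): sse_distance(sses[pair[0]], sses[pair[1]])
        for pair in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average of the element-to-element distances across two clusters.
        total = sum(base[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters and record the merge height.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical toy input: two compact groups of elements about 30 Å apart.
toy = {
    "helix_A": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "strand_B": [(0.0, 5.0, 0.0), (1.5, 5.0, 0.0)],
    "helix_C": [(30.0, 0.0, 0.0), (31.5, 0.0, 0.0)],
    "strand_D": [(30.0, 5.0, 0.0), (31.5, 5.0, 0.0)],
}

for members_a, members_b, height in cluster_sses(toy):
    print(members_a, members_b, round(height, 2))
```

In this toy case the two nearby pairs merge at small distances and the final merge joins the two groups at a much larger distance. A large gap of this kind in a dendrogram is what would suggest two separate domains. How the paper turns such a gap into a decision is governed by its disjoint factor, which this sketch does not model.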

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Aspartic Acid Endopeptidases / chemistry
  • Calmodulin / chemistry
  • Cluster Analysis*
  • Databases, Factual
  • Hydroxymethylbilane Synthase / chemistry
  • Models, Chemical
  • Models, Molecular
  • Papain / chemistry
  • Porins / chemistry
  • Protein Structure, Secondary*
  • Protein Structure, Tertiary*
  • Sequence Alignment

Substances

  • Calmodulin
  • Porins
  • Hydroxymethylbilane Synthase
  • Papain
  • Aspartic Acid Endopeptidases
  • Endothia aspartic proteinase