Identification of homology in protein structure classification

Nat Struct Biol. 2001 Nov;8(11):953-7. doi: 10.1038/nsb1101-953.

Abstract

Structural biology and structural genomics are expected to produce many three-dimensional protein structures in the near future. Each new structure raises questions about its function and evolution. Correct functional and evolutionary classification of a new structure is difficult for distantly related proteins and error-prone when it relies on simple statistical scores based on sequence or structure similarity. Here we present an accurate numerical method for the identification of evolutionary relationships (homology). The method is based on the principle that natural selection maintains structural and functional continuity within a diverging protein family. The problem of different rates of structural divergence between different families is solved in two steps: structural similarities are first used to produce a global map of folds in protein space, and fold neighborhoods are then subdivided into superfamilies based on functional similarities. In a validation test against a classification by human experts (SCOP), 77% of homologous pairs were identified with 92% reliability. The method is fully automated, allowing fast, self-consistent and complete classification of large numbers of protein structures. In particular, discriminating between analogy and homology among close structural neighbors will lead to functional predictions while avoiding overprediction.
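The abstract reports the validation result as a fraction of homologous pairs identified, together with a reliability, but does not spell out how these are computed. The sketch below shows one common pair-based reading, offered as an assumption rather than as the authors' procedure: coverage is the share of reference (expert-assigned) homologous pairs that the automated classification also places together, and reliability is the share of predicted pairs that the reference confirms. The function names, domain identifiers and group labels are hypothetical toy values.

```python
from itertools import combinations


def same_group_pairs(labels):
    """Return all unordered pairs of items that share a group label.

    `labels` maps an item identifier to its group (e.g. a superfamily).
    """
    groups = {}
    for item, label in labels.items():
        groups.setdefault(label, []).append(item)
    pairs = set()
    for members in groups.values():
        # Sort so each unordered pair has one canonical representation.
        pairs.update(combinations(sorted(members), 2))
    return pairs


def coverage_and_reliability(predicted, reference):
    """Pair-based agreement between two classifications (assumed definitions).

    coverage    = shared pairs / reference pairs
    reliability = shared pairs / predicted pairs
    """
    pred_pairs = same_group_pairs(predicted)
    ref_pairs = same_group_pairs(reference)
    shared = pred_pairs & ref_pairs
    coverage = len(shared) / len(ref_pairs) if ref_pairs else 0.0
    reliability = len(shared) / len(pred_pairs) if pred_pairs else 0.0
    return coverage, reliability


# Toy data: five hypothetical domains, two reference superfamilies.
reference = {"d1": "sfA", "d2": "sfA", "d3": "sfA", "d4": "sfB", "d5": "sfB"}
# A hypothetical automated clustering that misplaces d3.
predicted = {"d1": "c1", "d2": "c1", "d3": "c2", "d4": "c2", "d5": "c2"}

coverage, reliability = coverage_and_reliability(predicted, reference)
print(f"coverage={coverage:.2f} reliability={reliability:.2f}")
```

Under these definitions, a method that merges too many fold neighbors into one superfamily loses reliability (overprediction), while one that splits true superfamilies loses coverage.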

MeSH terms

  • Automation / methods
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / classification
  • Bacterial Proteins / metabolism
  • Calibration
  • Computational Biology / methods*
  • Databases, Genetic
  • Evolution, Molecular
  • Fungal Proteins / chemistry
  • Fungal Proteins / classification
  • Fungal Proteins / metabolism
  • Internet
  • Neural Networks, Computer
  • Protein Conformation
  • Protein Folding*
  • Proteins / chemistry*
  • Proteins / classification*
  • Proteins / metabolism
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Homology, Amino Acid*
  • Structure-Activity Relationship

Substances

  • Bacterial Proteins
  • Fungal Proteins
  • Proteins