Fast structure alignment for protein databank searching

Proteins. 1992 Oct;14(2):139-67. doi: 10.1002/prot.340140203.

Abstract

A fast method is described for searching and analyzing the protein structure databank. It uses secondary structure followed by residue matching to compare protein structures and is developed from a previous structural alignment method based on dynamic programming. Linear representations of secondary structures are derived and their features compared to identify equivalent elements in two proteins. The secondary structure alignment then constrains the residue alignment, which compares only residues within aligned secondary structures and with similar buried areas and torsional angles. The initial secondary structure alignment improves accuracy and provides a means of filtering out unrelated proteins before the slower residue alignment stage. It is possible to search or sort the protein structure databank very quickly using just secondary structure comparisons. A search through 720 structures with a probe protein of 10 secondary structures required 1.7 CPU hours on a Sun 4/280. Alternatively, combined secondary structure and residue alignments, with a cutoff on the secondary structure score to remove pairs of unrelated proteins from further analysis, took 10.1 CPU hours. The method was applied in searches on different classes of proteins and to cluster a subset of the databank into structurally related groups. Relationships were consistent with known families of protein structure.
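The two-stage idea in the abstract (align secondary structure elements by dynamic programming, then use a score cutoff to discard unrelated proteins before the slower residue stage) can be illustrated with a minimal sketch. It is not the authors' implementation. The `Element` type, the `element_score` function, the absence of a gap penalty, and the cutoff value are all assumptions made for illustration, since the abstract does not specify the features compared or the scoring scheme. The residue-level stage is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical linear description of one secondary structure element."""
    kind: str    # "H" for helix, "E" for strand
    length: int  # number of residues in the element


def element_score(a, b):
    """Assumed similarity: 0 for different types, else 10 minus the length difference."""
    if a.kind != b.kind:
        return 0.0
    return max(0.0, 10.0 - abs(a.length - b.length))


def align_elements(p, q):
    """Order-preserving alignment of two element lists by dynamic programming.

    Returns (score, pairs), where pairs holds (index_in_p, index_in_q) tuples.
    Unmatched elements carry no penalty in this sketch.
    """
    n, m = len(p), len(q)
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j],                                       # skip p[i-1]
                S[i][j - 1],                                       # skip q[j-1]
                S[i - 1][j - 1] + element_score(p[i - 1], q[j - 1]),  # match
            )
    # Trace back from the corner to recover the matched element pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = element_score(p[i - 1], q[j - 1])
        if s > 0 and S[i][j] == S[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return S[n][m], pairs


def search(probe, databank, cutoff):
    """Score the probe against every entry and keep those at or above the cutoff.

    Only the surviving entries would be passed on to a residue-level alignment.
    """
    hits = []
    for name, elements in databank.items():
        score, pairs = align_elements(probe, elements)
        if score >= cutoff:
            hits.append((name, score, pairs))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy data: names and element lists are invented for the example.
probe = [Element("H", 10), Element("E", 5), Element("H", 12)]
databank = {
    "related": [Element("H", 11), Element("E", 5), Element("E", 6), Element("H", 12)],
    "unrelated": [Element("E", 4), Element("E", 4)],
}
hits = search(probe, databank, cutoff=20.0)
```

With this toy data only the entry named "related" passes the cutoff, and its matched pairs would define which elements a residue-level comparison is restricted to. In a real search the cutoff trades speed against the risk of discarding distant relatives.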

Publication types

  • Comparative Study

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Databases, Factual*
  • Humans
  • Information Storage and Retrieval*
  • Molecular Sequence Data
  • Protein Structure, Secondary*
  • Reference Standards
  • Sequence Alignment*
  • Sequence Homology, Amino Acid
  • Software
  • Time Factors