Revealing divergent evolution, identifying circular permutations and detecting active-sites by protein structure comparison

BMC Struct Biol. 2006 Sep 2:6:18. doi: 10.1186/1472-6807-6-18.

Abstract

Background: Protein structure comparison is one of the most important problems in computational biology and plays a key role in protein structure prediction, fold family classification, motif finding, phylogenetic tree reconstruction and protein docking.

Results: We propose a novel method for comparing protein structures accurately and efficiently. The method can be used not only to reveal divergent evolution, but also to identify circular permutations and to detect active sites. Specifically, we define structure alignment as a multi-objective optimization problem, i.e., maximizing the number of aligned atoms while minimizing their root mean square distance. By controlling a single distance-related parameter, we can in theory obtain a variety of optimal alignments corresponding to different optimal matching patterns, ranging from a large matching portion to a small one. The number of variables in our algorithm increases almost linearly with the number of atoms in the protein pair. In addition to a solid theoretical background, numerical experiments demonstrate a significant improvement of our approach over existing methods in terms of quality and efficiency. In particular, we show that divergent evolution, circular permutations and active sites (or structural motifs) can be identified by our method. The software SAMO is available upon request from the authors, or from http://zhangroup.aporc.org/bioinfo/samo/ and http://intelligent.eic.osaka-sandai.ac.jp/chenen/samo.htm.

Conclusion: A novel formulation is proposed to accurately align protein structures in the framework of multi-objective optimization, based on a sequence order-independent strategy. A fast and accurate algorithm based on bipartite matching is developed by exploiting the special features of this formulation. Convergence of the computation is shown in experiments and is also proven theoretically.
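
The trade-off described in the abstract, more aligned atoms versus a smaller root mean square distance under a single distance-related parameter, can be illustrated with a small sketch. This is not the SAMO algorithm: it assumes the two structures are already superposed, uses a plain distance cutoff `d0` as a stand-in for the paper's parameter, and finds a maximum-cardinality bipartite matching with a textbook augmenting-path routine rather than the authors' formulation. The function names and toy coordinates are hypothetical.

```python
import math

def pairs_within(a, b, d0):
    """For each atom i of a, list the atoms j of b lying within distance d0."""
    return [[j for j, q in enumerate(b) if math.dist(p, q) <= d0] for p in a]

def max_matching(adj, n_b):
    """Maximum-cardinality bipartite matching by augmenting paths."""
    match_b = [-1] * n_b  # match_b[j] = atom i of a matched to j, or -1

    def try_augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            # Take j if it is free, or if its current partner can move elsewhere.
            if match_b[j] == -1 or try_augment(match_b[j], seen):
                match_b[j] = i
                return True
        return False

    for i in range(len(adj)):
        try_augment(i, set())
    return sorted((i, j) for j, i in enumerate(match_b) if i != -1)

def align(a, b, d0):
    """Return the matched (i, j) pairs and their RMSD for cutoff d0."""
    pairs = max_matching(pairs_within(a, b, d0), len(b))
    if not pairs:
        return pairs, 0.0
    sq = sum(math.dist(a[i], b[j]) ** 2 for i, j in pairs)
    return pairs, math.sqrt(sq / len(pairs))

# Toy coordinates (assumed, already superposed). B_ATOMS lists the same
# four positions in a different order, so the matching cannot rely on
# sequence order.
A_ATOMS = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
B_ATOMS = [(8.0, 0.5, 0.0), (12.0, 2.0, 0.0), (0.0, 0.1, 0.0), (4.0, 0.0, 0.2)]

if __name__ == "__main__":
    for d0 in (0.3, 1.0, 2.5):
        pairs, rmsd = align(A_ATOMS, B_ATOMS, d0)
        print(d0, len(pairs), round(rmsd, 3), pairs)
```

Raising `d0` admits more atom pairs at the cost of a larger RMSD, which mirrors the range from small to large matching portions mentioned above. A full method would also re-optimize the rigid superposition and choose the lowest-RMSD matching among those of equal size, neither of which this sketch attempts.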

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Binding Sites*
  • Chromosome Aberrations
  • Computational Biology
  • Cysteine Endopeptidases / chemistry
  • Databases, Protein
  • Evolution, Molecular*
  • Gene Duplication*
  • Models, Molecular
  • Models, Theoretical
  • Molecular Sequence Data
  • Protein Conformation*
  • Sequence Homology, Amino Acid*
  • Structural Homology, Protein
  • Trypsin / chemistry

Substances

  • Trypsin
  • Cysteine Endopeptidases
  • actinidain