Consistency analysis of similarity between multiple alignments: prediction of protein function and fold structure from analysis of local sequence motifs

J Mol Biol. 2001 Mar 30;307(3):939-49. doi: 10.1006/jmbi.2001.4466.

Abstract

A new method was developed to analyze the similarity between multiply aligned protein motifs (blocks). It identifies sets of consistently aligned blocks. These are found to be protein regions of similar function and structure that appear in different contexts. For example, the Rossmann fold ligand-binding region is found to be similar to TIM-barrel and methylase regions, various protein families are predicted to have a TIM-barrel fold, and the structural relation between the ClpP protease and crotonase folds is identified from their sequences. Besides identifying local structural features, sequence similarity across short regions (fewer than 20 amino acid residues) also predicts structural similarity of whole domains (folds) a few hundred amino acid residues long. Most of these relations could not be identified by other advanced sequence-to-sequence or sequence-to-multiple-alignment comparisons. We describe the method (termed CYRCA), present examples of our findings, and discuss their implications.
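
The abstract does not spell out how "consistently aligned" is decided, so the following is only a minimal sketch of one plausible reading, not the published CYRCA procedure. It assumes each pairwise block-to-block alignment can be summarized by a single integer offset, and it calls three blocks consistent when the offsets agree around the triangle (A to B plus B to C equals A to C). The function name, the block names, and the offset values are all hypothetical.

```python
from itertools import combinations


def consistent_triples(offsets):
    """Return block triples whose pairwise alignment offsets agree.

    offsets maps (a, b) to the integer offset of block b relative to
    block a. This representation is an assumption made for illustration;
    the abstract does not describe the actual data model.
    """
    # Make the lookup symmetric: offset(b, a) is the negative of offset(a, b).
    full = {}
    for (a, b), d in offsets.items():
        full[(a, b)] = d
        full[(b, a)] = -d

    blocks = sorted({name for pair in offsets for name in pair})
    result = []
    for a, b, c in combinations(blocks, 3):
        # All three pairwise alignments must exist to test the triangle.
        if (a, b) in full and (b, c) in full and (a, c) in full:
            # Consistency: going a -> b -> c lands at the same offset as a -> c.
            if full[(a, b)] + full[(b, c)] == full[(a, c)]:
                result.append((a, b, c))
    return result


# Hypothetical pairwise alignments between four blocks.
example = {
    ("blockA", "blockB"): 2,
    ("blockB", "blockC"): -1,
    ("blockA", "blockC"): 1,
    ("blockA", "blockD"): 0,
    ("blockB", "blockD"): 5,
}

print(consistent_triples(example))
```

In this toy input only blockA, blockB, and blockC form a consistent set; blockD is aligned to two of them, but with offsets that contradict each other. A real analysis would also need alignment scores and a significance threshold, which this sketch leaves out.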

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adenosine Triphosphatases / chemistry
  • Adenosine Triphosphatases / metabolism
  • Algorithms
  • Amino Acid Motifs
  • Automation
  • Binding Sites
  • Computational Biology / methods*
  • Databases as Topic
  • Endopeptidase Clp
  • Enoyl-CoA Hydratase / chemistry
  • Enoyl-CoA Hydratase / metabolism
  • Internet
  • Ligands
  • Models, Molecular
  • Protein Binding
  • Protein Folding*
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Sequence Alignment*
  • Serine Endopeptidases / chemistry
  • Serine Endopeptidases / metabolism
  • Software
  • Structure-Activity Relationship

Substances

  • Ligands
  • Proteins
  • Serine Endopeptidases
  • Endopeptidase Clp
  • Adenosine Triphosphatases
  • Enoyl-CoA Hydratase