Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks

Proteins. 2000 Nov 15;41(3):271-87. doi: 10.1002/1097-0134(20001115)41:3<271::aid-prot10>3.0.co;2-z.

Abstract

Using an unsupervised cluster analyzer, we identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα atoms ("protein blocks"). The dependence between successive blocks is explicitly taken into account. A Bayesian approach based on the relationship between protein blocks and amino acid propensities is used for prediction and yields a success rate close to 35%. Dividing the sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. The accuracy exceeds 75% when the four highest-ranked predicted protein blocks are kept at each site of the protein. In addition, two strategies are proposed: the first determines the number of protein blocks needed at each site to meet a user-fixed prediction accuracy; the second identifies the protein sites that can be predicted with a user-fixed number of blocks and a chosen accuracy. Applied to the ubiquitin conjugating enzyme (an alpha/beta protein), the second strategy shows that 91% of the sites can be predicted with an accuracy greater than 77% when only three blocks per site are considered. The proposed prediction strategies improve our knowledge of sequence-structure dependence and should be very useful in ab initio protein modelling.
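
The Bayesian ranking and the "keep the first k blocks" evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each block has a table of amino acid probabilities per window position, that window positions are conditionally independent given the block, and that accuracy is counted as the fraction of sites whose true block is among the k highest-ranked predictions. The names `rank_blocks`, `top_k_accuracy`, `occurrence`, and `prior`, and the two-block, two-residue toy data, are hypothetical and chosen only for illustration.

```python
import math


def rank_blocks(window, occurrence, prior):
    """Rank blocks by posterior P(block | window), highest first.

    occurrence[b][j][aa] is an assumed table: P(aa at window position j | block b).
    prior[b] is P(block b).
    Assumption: window positions are independent given the block.
    """
    log_post = {}
    for block, p_block in prior.items():
        lp = math.log(p_block)
        for j, aa in enumerate(window):
            lp += math.log(occurrence[block][j][aa])
        log_post[block] = lp
    # Normalize in log space (Bayes' rule denominator) for numerical stability.
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    post = {b: math.exp(v - top) / norm for b, v in log_post.items()}
    return sorted(post.items(), key=lambda kv: kv[1], reverse=True)


def top_k_accuracy(ranked_per_site, true_blocks, k):
    """Fraction of sites whose true block is among the k highest-ranked blocks."""
    hits = 0
    for ranked, true_block in zip(ranked_per_site, true_blocks):
        if true_block in [b for b, _ in ranked[:k]]:
            hits += 1
    return hits / len(true_blocks)


# Toy data (hypothetical): two blocks, windows of two residues, alphabet {A, G}.
prior = {"a": 0.5, "b": 0.5}
occurrence = {
    "a": [{"A": 0.8, "G": 0.2}, {"A": 0.5, "G": 0.5}],
    "b": [{"A": 0.2, "G": 0.8}, {"A": 0.5, "G": 0.5}],
}
windows = ["AG", "GA", "AA"]
true_blocks = ["a", "a", "b"]

ranked_per_site = [rank_blocks(w, occurrence, prior) for w in windows]
acc_top1 = top_k_accuracy(ranked_per_site, true_blocks, k=1)
acc_top2 = top_k_accuracy(ranked_per_site, true_blocks, k=2)
```

With the real alphabet, `prior` would hold 16 entries and `occurrence` would be estimated from a structural database. Raising `k` trades specificity for accuracy, which is the trade-off the two strategies in the abstract let the user control.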

MeSH terms

  • Artificial Intelligence
  • Bayes Theorem*
  • Cluster Analysis
  • Computer Simulation*
  • Databases, Factual
  • Forecasting
  • Ligases
  • Models, Molecular*
  • Neural Networks, Computer
  • Peptide Fragments / chemistry*
  • Peptide Fragments / classification
  • Protein Conformation*
  • Protein Structure, Secondary
  • Ubiquitins / metabolism

Substances

  • Peptide Fragments
  • Ubiquitins
  • Ligases