Enhanced prediction of conformational flexibility and phosphorylation in proteins

Adv Exp Med Biol. 2010:680:307-19. doi: 10.1007/978-1-4419-5913-3_35.

Abstract

Many sequence-based predictors of structural and functional properties of proteins have been developed in the past. In this study, we developed new methods for predicting measures of conformational flexibility in proteins, including X-ray structure-derived temperature (B-) factors and the variance within NMR structural ensembles, as effectively measured by the solvent accessibility standard deviations (SASDs). We further tested whether these predicted measures of conformational flexibility, in crystal lattices and in solution, respectively, can be used to improve the prediction of phosphorylation in proteins. Phosphorylation is a common post-translational modification that modulates protein function, e.g., by affecting interactions and conformational flexibility of phosphorylated sites. Using robust epsilon-insensitive support vector regression (ε-SVR) models, we assessed two specific representations of protein sequences: one based on the position-specific scoring matrices (PSSMs) derived from multiple sequence alignments, and an augmented representation that incorporates real-valued relative solvent accessibility and secondary structure predictions (RSA/SS) as additional measures of local structural propensities. We showed that a combination of PSSMs and real-valued RSA/SS predictions provides systematic improvements in the accuracy of both B-factor and SASD prediction. These intermediate predictions were subsequently combined into an enhanced predictor of phosphorylation that was shown to significantly outperform methods based on PSSMs alone. We would like to stress that, to the best of our knowledge, this is the first example of using NMR structure-based measures of conformational flexibility in solution, predicted from sequence, to predict other properties of proteins.
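To make the quantities above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the SASD of a residue is the standard deviation of its relative solvent accessibility across the models of an NMR ensemble (the population form is used here; the abstract does not say which form was applied). It also assumes that the augmented representation is obtained by appending per-residue RSA/SS predictions to each PSSM row and collecting rows in a sliding window around the residue of interest. The window scheme, zero padding, and the names `sasd_per_residue`, `augment_rows`, and `window_features` are illustrative assumptions.

```python
from statistics import pstdev


def sasd_per_residue(rsa_by_model):
    """Per-residue standard deviation of RSA across NMR models.

    rsa_by_model: list of models, each a list with one RSA value per residue.
    Assumption: population standard deviation over the ensemble.
    """
    n_res = len(rsa_by_model[0])
    if any(len(model) != n_res for model in rsa_by_model):
        raise ValueError("all models must cover the same residues")
    # zip(*...) transposes models x residues into residues x models.
    return [pstdev(values) for values in zip(*rsa_by_model)]


def augment_rows(pssm_rows, rsa_ss_rows):
    """Append predicted RSA/SS values to each PSSM row (augmented representation)."""
    if len(pssm_rows) != len(rsa_ss_rows):
        raise ValueError("PSSM and RSA/SS predictions must have equal length")
    return [list(p) + list(r) for p, r in zip(pssm_rows, rsa_ss_rows)]


def window_features(rows, center, half_width, pad_value=0.0):
    """Concatenate rows in a window around `center` (hypothetical scheme).

    Positions beyond either end of the sequence are filled with pad_value.
    """
    n_cols = len(rows[0])
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(rows):
            features.extend(rows[i])
        else:
            features.extend([pad_value] * n_cols)
    return features


# Toy ensemble: 3 models, 3 residues; the middle residue is rigid.
ensemble = [[0.10, 0.50, 0.90],
            [0.30, 0.50, 0.70],
            [0.20, 0.50, 0.80]]
sasd = sasd_per_residue(ensemble)
```

A feature vector built this way could be passed to any regression model, with the SASD or B-factor of the central residue as the target. The ε-SVR named in the abstract is one such model and is not reimplemented here.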
Phosphorylation prediction methods typically employ a two-class classification approach, with the limitation that the set of negative examples used for training may include sites that are phosphorylated but not yet known to be. While one-class classification techniques have been considered in the past as a solution to this problem, their performance has not been systematically compared to that of two-class techniques. In this study, we developed and compared one- and two-class support vector machine (SVM)-based predictors for several commonly used sets of attributes. These predictors are being made available at http://sable.cchmc.org/.
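The labelling issue described above can be illustrated with a short Python sketch showing how the two kinds of training set differ. It is an illustration under stated assumptions rather than the procedure used in the study. Candidate sites are restricted to serine, threonine, and tyrosine residues, which is a common convention not stated in the abstract, and the function name `build_training_sets` is hypothetical.

```python
def build_training_sets(sequence, known_sites, residues="STY"):
    """Assemble site labels for two-class and one-class training.

    sequence: protein sequence as a string of one-letter codes.
    known_sites: set of 0-based positions annotated as phosphorylated.
    residues: candidate residue types (assumption: Ser, Thr, Tyr).
    """
    candidates = [i for i, aa in enumerate(sequence) if aa in residues]
    positives = [i for i in candidates if i in known_sites]
    # Unannotated sites: treated as negatives by a two-class method, although
    # some of them may be phosphorylated without that being known yet.
    unlabeled = [i for i in candidates if i not in known_sites]

    two_class = [(i, 1) for i in positives] + [(i, 0) for i in unlabeled]
    # A one-class method is trained on the annotated positives only.
    one_class = list(positives)
    return two_class, one_class


two_class, one_class = build_training_sets("MSTAYKS", {1})
```

In this toy sequence, the two-class set labels three unannotated S/T/Y positions as negatives, while the one-class set contains only the single annotated site and makes no claim about the rest.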

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Computational Biology
  • Crystallography, X-Ray
  • Databases, Protein
  • Nuclear Magnetic Resonance, Biomolecular
  • Phosphorylation
  • Protein Conformation*
  • Proteins / chemistry*
  • Sequence Alignment

Substances

  • Proteins