Fragment-based local statistical potentials derived by combining an alphabet of protein local structures with secondary structures and solvent accessibilities

Proteins. 2009 Mar;74(4):820-36. doi: 10.1002/prot.22191.


General and transferable statistical potentials to quantify the compatibility between local structures and local sequences of peptide fragments in proteins were derived. In the derivation, structure clusters of fragments are obtained by clustering five-residue fragments in native proteins based on their conformations represented by a local structure alphabet (de Brevern et al., Proteins 2000;41:271-287), secondary structure states, and solvent accessibilities. On the basis of the native sequences of the structurally clustered fragments, the probabilities of different amino acid sequences were estimated for each structure cluster. From the sequence probabilities, statistical energies as a function of sequence for a given structure were directly derived. The same sequence probabilities were employed in a database-matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, we provide an integrated approach in which local conformations and local environments are treated jointly; structures are treated in units of fragments instead of individual residues, so that coupling between the conformations of adjacent residues is included; and strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudosequence design, and local structure prediction, the potentials performed at least comparably to, and in most cases better than, a number of existing models applicable to the same contexts. These results indicate the advantages of such an integrated approach for deriving local potentials and suggest that the statistical potentials derived here are applicable to sequence design and structure prediction.
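
As a rough illustration of how a statistical energy can be obtained from sequence probabilities estimated per structure cluster, the following minimal Python sketch may help. It is not the authors' method: the cluster names, toy fragment sequences, pseudocount, position-independent probability model, and background reference state are all assumptions made for this example, and the abstract does not specify how the probabilities were actually estimated. The energy used here is the common inverse-Boltzmann form, E = -sum over positions of ln(P_cluster(a) / P_background(a)), in arbitrary units.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FRAGMENT_LENGTH = 5  # five-residue fragments, as in the abstract


def position_probabilities(sequences, pseudocount=1.0):
    """Per-position amino acid probabilities for one structure cluster.

    Assumption: positions are treated as independent and smoothed with a
    flat pseudocount; the published potentials may use a different estimator.
    """
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    probs = []
    for pos in range(FRAGMENT_LENGTH):
        counts = Counter(seq[pos] for seq in sequences)
        probs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return probs


def background_probabilities(clusters, pseudocount=1.0):
    """Amino acid frequencies pooled over all clusters (assumed reference state)."""
    counts = Counter(aa for seqs in clusters.values() for seq in seqs for aa in seq)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def sequence_energy(sequence, cluster_probs, background):
    """Statistical energy of a sequence given a structure cluster (lower is better)."""
    return -sum(
        math.log(cluster_probs[pos][aa] / background[aa])
        for pos, aa in enumerate(sequence)
    )


# Hypothetical toy data: two structure clusters with made-up native sequences.
clusters = {
    "helix_buried": ["ALLKA", "AELLK", "LLKAL", "ALEAL"],
    "strand_exposed": ["TVKVT", "KVTVE", "TVEVK", "VKVTV"],
}

background = background_probabilities(clusters)
models = {name: position_probabilities(seqs) for name, seqs in clusters.items()}

# Score one query fragment against every cluster and pick the lowest energy.
energies = {
    name: sequence_energy("ALLKA", model, background)
    for name, model in models.items()
}
best_cluster = min(energies, key=energies.get)
```

Scoring one sequence against every cluster, as in the last lines, loosely mirrors the fragment-threading setting mentioned in the abstract; a realistic implementation would need far larger fragment sets and a treatment of overlapping fragments, which this sketch omits.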

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Databases, Protein
  • Markov Chains
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Solvents / chemistry


Substances

  • Proteins
  • Solvents