Ultra-fast evaluation of protein energies directly from sequence

PLoS Comput Biol. 2006 Jun 16;2(6):e63. doi: 10.1371/journal.pcbi.0020063. Epub 2006 Jun 16.


The structure, function, stability, and many other properties of a protein in a fixed environment are fully specified by its sequence, but in a manner that is difficult to discern. We present a general approach for rapidly mapping sequences directly to their energies on a pre-specified rigid backbone, an important sub-problem in computational protein design and in some methods for protein structure prediction. The cluster expansion (CE) method that we employ can, in principle, be extended to model any computable or measurable protein property directly as a function of sequence. Here we show how CE can be applied to the problem of computational protein design, and use it to derive excellent approximations of physical potentials. The approach provides several attractive advantages. First, following a one-time derivation of a CE expansion, the amount of time necessary to evaluate the energy of a sequence adopting a specified backbone conformation is reduced by a factor of 10(7) compared to standard full-atom methods for the same task. Second, the agreement between two full-atom methods that we tested and their CE sequence-based expressions is very high (root mean square deviation 1.1-4.7 kcal/mol, R2 = 0.7-1.0). Third, the functional form of the CE energy expression is such that individual terms of the expansion have clear physical interpretations. We derived expressions for the energies of three classic protein design targets-a coiled coil, a zinc finger, and a WW domain-as functions of sequence, and examined the most significant terms. Single-residue and residue-pair interactions are sufficient to accurately capture the energetics of the dimeric coiled coil, whereas higher-order contributions are important for the two more globular folds. For the task of designing novel zinc-finger sequences, a CE-derived energy function provides significantly better solutions than a standard design protocol, in comparable computation time. Given these advantages, CE is likely to find many uses in computational structural modeling.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computer Simulation
  • Computer Systems
  • Energy Transfer*
  • Models, Chemical*
  • Models, Molecular*
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / classification
  • Sequence Analysis, Protein / methods*
  • Structure-Activity Relationship


  • Proteins