Automatic protein design with all atom force-fields by exact and heuristic optimization

J Mol Biol. 2000 Aug 18;301(3):713-36. doi: 10.1006/jmbi.2000.3984.


A fully automatic procedure for predicting the amino acid sequences compatible with a given target structure is described. It is based on the CHARMM package, and uses an all-atom force field and rotamer libraries to describe and evaluate side-chain types and conformations. Sequences are ranked by a quantity akin to the free energy of folding, which incorporates hydration effects. Exact (Branch and Bound) and heuristic optimisation procedures are used to identify high-scoring sequences from an astronomical number of possibilities. These sequences include the minimum free energy sequence, as well as all amino acid sequences whose free energy lies within a specified window from the minimum. Several applications of our procedure are illustrated. Prediction of side-chain conformations for a set of ten proteins yields results comparable to those of established side-chain placement programs. Applications to sequence optimisation comprise the re-design of the protein cores of the c-Crk SH3 domain, the B1 domain of protein G and Ubiquitin, and of surface residues of the SH3 domain. In all calculations, no restrictions are imposed on the amino acid composition, and identical parameter settings are used for core and surface residues. The best scoring sequences for the protein cores are virtually identical to wild-type: they feature no more than one to three mutations in a total of 11-16 variable positions. Tests suggest that this is due to the balance between the various contributions in the force field rather than to an overwhelming influence of packing constraints. The effectiveness of our force field is further supported by the sequence predictions for surface residues of the SH3 domain. More mutations are predicted than in the core, seemingly in order to optimise the network of complementary interactions between polar and charged groups. This appears to be an important energetic requirement in the absence of the partner molecules with which the SH3 domain interacts, which were not included in the calculations. Finally, a detailed comparison between the sequences generated by the heuristic and exact optimisation algorithms calls for caution concerning the efficiency of heuristic procedures in exploring sequence space.
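To make the exact search concrete, the following is a minimal Python sketch of a branch-and-bound enumeration that returns the minimum-energy assignment together with every assignment whose energy lies within a specified window of that minimum. It is an illustration only, not the CHARMM-based procedure of the paper. It assumes, which the abstract does not state, that the score can be written as a sum of per-position and pairwise terms stored in lookup tables. The names `self_e`, `pair_e`, `lower_bound` and `branch_and_bound` are hypothetical, and each "state" stands for one amino acid type in one rotamer at a given position.

```python
from math import inf


def total_energy(assign, self_e, pair_e):
    """Energy of the first len(assign) positions (self plus pairwise terms)."""
    n = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_e[(i, j)][assign[i]][assign[j]]
    return e


def lower_bound(partial, self_e, pair_e):
    """Optimistic bound on the energy of any completion of `partial`."""
    n, k = len(self_e), len(partial)
    bound = total_energy(partial, self_e, pair_e)  # exact for assigned part
    for j in range(k, n):
        best = inf
        for t in range(len(self_e[j])):
            e = self_e[j][t]
            # exact interaction with positions that are already assigned
            e += sum(pair_e[(i, j)][partial[i]][t] for i in range(k))
            # best-case interaction with later, still unassigned positions
            e += sum(min(pair_e[(j, m)][t]) for m in range(j + 1, n))
            best = min(best, e)
        bound += best
    return bound


def branch_and_bound(self_e, pair_e, window=0):
    """Return sorted (energy, assignment) pairs within `window` of the minimum."""
    n = len(self_e)
    best = inf
    found = []

    def recurse(partial):
        nonlocal best
        # prune: no completion can fall inside the current energy window
        if lower_bound(partial, self_e, pair_e) > best + window:
            return
        if len(partial) == n:
            e = total_energy(partial, self_e, pair_e)
            found.append((e, tuple(partial)))
            best = min(best, e)
            return
        for t in range(len(self_e[len(partial)])):
            recurse(partial + [t])

    recurse([])
    # the running minimum may have improved after early leaves were stored
    return sorted((e, a) for e, a in found if e <= best + window)


if __name__ == "__main__":
    # toy problem: two positions with two states each (made-up numbers)
    self_e = [[0, 1], [0, 2]]
    pair_e = {(0, 1): [[3, 0], [0, 0]]}
    for energy, states in branch_and_bound(self_e, pair_e, window=1):
        print(energy, states)
```

Pruning against the running minimum plus the window is safe because the minimum can only decrease during the search, so a branch discarded early would also be discarded at the end. A heuristic search has no such guarantee, which is why an exact enumeration of this kind is a useful reference when judging how well a heuristic explores sequence space.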

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computer Simulation
  • Models, Molecular
  • Models, Statistical
  • Mutation
  • Nerve Tissue Proteins / chemistry
  • Protein Conformation
  • Protein Folding
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proto-Oncogene Proteins / chemistry
  • Proto-Oncogene Proteins c-crk
  • Sequence Analysis / methods*
  • Software
  • Thermodynamics
  • Ubiquitins / chemistry


Substances

  • G-substrate
  • Nerve Tissue Proteins
  • Proto-Oncogene Proteins
  • Proto-Oncogene Proteins c-crk
  • Ubiquitins