Identifying and reducing error in cluster-expansion approximations of protein energies

Seungsoo Hahn; Orr Ashenberg; Gevorg Grigoryan; Amy E Keating

doi:10.1002/jcc.21585

J Comput Chem. 2010 Dec;31(16):2900-14. doi: 10.1002/jcc.21585.

Authors

Seungsoo Hahn¹, Orr Ashenberg, Gevorg Grigoryan, Amy E Keating

Affiliation

¹ Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA.

PMID: 20602445

Abstract

Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher-order terms, if required. The accuracy of the sequence-based expansion is monitored and improved using cross-validation testing and iterative inclusion of additional clusters. As a trade-off for evaluation speed, the cluster-expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher-order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence-stability relationship for several protein structures: coiled-coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin-1 and endophilin-1 as examples where the expanded pseudo-energies are obtained from experiments. Our open-source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design.
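
The steps described in the abstract (generate a training set, fit cluster terms, check accuracy by cross-validation, add higher-order clusters if needed) can be illustrated with a minimal sketch. This is not the authors' Cluster Expansion Version 1.0 software. It assumes a hypothetical four-letter alphabet, four designable sites, a randomly generated pairwise "toy energy" standing in for a real energy function, and ordinary least squares as the fitting procedure. All names (`ALPHABET`, `N_SITES`, `make_toy_energy`, `features`, `cv_rmse`) are invented for illustration.

```python
import itertools
import numpy as np

ALPHABET = "AVLE"  # hypothetical reduced alphabet; index 0 is the reference residue
N_SITES = 4        # hypothetical number of designable positions


def make_toy_energy(rng):
    """Return a stand-in pairwise energy function (NOT a real force field)."""
    m = len(ALPHABET)
    single = rng.normal(size=(N_SITES, m))
    pair = rng.normal(size=(N_SITES, N_SITES, m, m))

    def energy(seq):
        e = sum(single[i, seq[i]] for i in range(N_SITES))
        e += sum(pair[i, j, seq[i], seq[j]]
                 for i, j in itertools.combinations(range(N_SITES), 2))
        return e

    return energy


def features(seq, max_order):
    """Indicator cluster functions: constant, point, and (optionally) pair clusters."""
    m = len(ALPHABET)
    row = [1.0]
    for i in range(N_SITES):
        row += [float(seq[i] == a) for a in range(1, m)]
    if max_order >= 2:
        for i, j in itertools.combinations(range(N_SITES), 2):
            row += [float(seq[i] == a and seq[j] == b)
                    for a in range(1, m) for b in range(1, m)]
    return row


def cv_rmse(seqs, energies, max_order, k=5):
    """k-fold cross-validated RMSE of a least-squares cluster-expansion fit."""
    X = np.array([features(s, max_order) for s in seqs])
    y = np.asarray(energies)
    all_idx = np.arange(len(seqs))
    sq_err = []
    for held_out in np.array_split(all_idx, k):
        train = np.setdiff1d(all_idx, held_out)
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        sq_err.extend((X[held_out] @ coef - y[held_out]) ** 2)
    return float(np.sqrt(np.mean(sq_err)))


rng = np.random.default_rng(0)
energy = make_toy_energy(rng)

# Training set: 200 distinct sequences drawn from the 4**4 = 256 possible ones.
all_seqs = np.array(list(itertools.product(range(len(ALPHABET)), repeat=N_SITES)))
seqs = all_seqs[rng.permutation(len(all_seqs))[:200]]
energies = [energy(s) for s in seqs]

rmse_point = cv_rmse(seqs, energies, max_order=1)
rmse_pair = cv_rmse(seqs, energies, max_order=2)
print(f"point clusters only:   CV RMSE = {rmse_point:.3g}")
print(f"point + pair clusters: CV RMSE = {rmse_pair:.3g}")
```

Because the toy energy is pairwise by construction, adding pair clusters should drive the cross-validated error to roughly numerical precision, whereas the point-only expansion leaves a residual. With a real energy function the pair-level fit would generally not be exact, which is where the error analysis described in the abstract becomes relevant.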

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Bacteriophage T4 / enzymology
Models, Chemical*
Models, Molecular
Muramidase / chemistry
Protein Structure, Tertiary
Proteins / chemistry*

Substances

Proteins
Muramidase