Biophys J. 2005 Jan;88(1):156-71.
doi: 10.1529/biophysj.104.042044. Epub 2004 Sep 17.

Computational Protein Design Is a Challenge for Implicit Solvation Models

Free PMC article

Alfonso Jaramillo et al. Biophys J.

Abstract

Increasingly complex schemes for representing solvent effects implicitly are being used in computational analyses of biological macromolecules. These schemes speed up calculations by orders of magnitude and are assumed to compromise little on the essential features of solvation. In this work we examine this assumption. Five implicit solvation models, including a surface area-based empirical model, two models that approximate the generalized Born treatment, and a finite-difference Poisson-Boltzmann method, are challenged in situations that differ from those in which the models were calibrated. Such situations arise in automatic protein design procedures, whose job is to select, from a large number of alternatives, sequences that stabilize a given protein 3D structure. To this end we evaluate the energetic cost of burying amino acids in thousands of environments with different solvent exposures, drawn both from decoys built with random sequences and from native protein crystal structures. In addition, we perform actual sequence design calculations. Except for the crudest surface area-based procedure, all the tested models tend to favor the burial of polar amino acids in the protein interior over that of nonpolar ones, a behavior that leads to poor performance in protein design calculations. We show, on the other hand, that three of the examined models are nonetheless capable of discriminating between the native fold and many nonnative alternatives, a test commonly used to validate force fields. We conclude that protein design is a particularly challenging test for implicit solvation models because it requires accurate estimates of the solvation contribution of individual residues. This contrasts with native recognition, which depends less on solvation and more on other nonbonded contributions.

Figures

FIGURE 1
Thermodynamic cycle for calculating the contribution of an amino acid side chain to the folding free energy of a decoy structure. ΔGfolding is the contribution of the considered residue (backbone and side chain) to the free energy of folding of the protein (here the decoy). ΔG(BB)folding is the contribution of the backbone of the considered residue to the folding free energy of the decoy. ΔGw-solv(SC) is the free energy cost of introducing the side chain into the water solvent. ΔGd-solv(SC) is the free energy cost of introducing the same side chain into the decoy structure. This cost includes the interaction energy of the side chain with the surrounding residues in the decoy as well as the cost of burying side-chain atoms and surrounding decoy atoms.
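Read as a closed cycle, the four quantities in this legend can be tied together by a single balance. The relation below is only a sketch of that closure. It assumes that the two folding steps (backbone only, and backbone plus side chain) form one pair of opposite legs and that the two side-chain insertions (in water and in the decoy) form the other pair. The figure itself is not reproduced here, so this arrangement of the legs is an assumption rather than the authors' stated equation.

```latex
\Delta G_{\text{folding}}
  = \Delta G^{(\mathrm{BB})}_{\text{folding}}
  + \Delta G_{\text{d-solv}}(\mathrm{SC})
  - \Delta G_{\text{w-solv}}(\mathrm{SC})
```

Under that assumption, the side-chain contribution to folding is the difference between the cost of inserting the side chain into the decoy and the cost of inserting it into water.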
FIGURE 2
Contributions of individual amino acids to the folding free energy (kcal/mol) of proteinlike decoys, as a function of their solvent accessibility, computed with the EAS solvation model. (a) Energy of the Val side chain versus its SA for 4018 random environments. (b) Energy of the Thr side chain versus its SA for 4174 random environments. (c) Energy of the Lys side chain versus its SA for 4176 random environments. The energy values were computed as indicated in Fig. 1 and described in the text. The SA is defined as the ratio of the side-chain ASA in the decoy over its ASA when it is completely solvated.
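The SA definition used in these legends is a simple ratio, which the following minimal Python sketch illustrates. The function name and the ASA values are hypothetical and chosen only for illustration; how the ASA values themselves are obtained is outside the scope of the sketch.

```python
def relative_accessibility(asa_in_decoy: float, asa_fully_solvated: float) -> float:
    """Return the side-chain SA: ASA in the decoy divided by ASA when fully solvated."""
    if asa_fully_solvated <= 0.0:
        raise ValueError("the fully solvated reference ASA must be positive")
    return asa_in_decoy / asa_fully_solvated


# Hypothetical ASA values (square angstroms), for illustration only.
sa = relative_accessibility(asa_in_decoy=29.0, asa_fully_solvated=116.0)
print(f"SA = {sa:.2f}")  # prints "SA = 0.25"
```

An SA of 0 corresponds to a completely buried side chain and an SA of 1 to a fully exposed one.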
FIGURE 3
Contributions of individual amino acids to the folding free energy (kcal/mol) of proteinlike decoys, as a function of their solvent accessibility, computed with the EEF1 solvation model. (a) Energy of the Val side chain versus its SA for 4018 random environments. (b) Energy of the Thr side chain versus its SA for 4174 random environments. (c) Energy of the Lys side chain versus its SA for 4176 random environments. The energy values were computed as indicated in Fig. 1 and described in the text. The SA is defined as the ratio of the side-chain ASA in the decoy over its ASA when it is completely solvated.
FIGURE 4
Contributions of individual amino acids to the folding free energy (kcal/mol) of proteinlike decoys, as a function of their solvent accessibility, computed with the ACE solvation model. (a) Energy of the Val side chain versus its SA for 4018 random environments. (b) Energy of the Thr side chain versus its SA for 4174 random environments. (c) Energy of the Lys side chain versus its SA for 4176 random environments. The energy values were computed as indicated in Fig. 1 and described in the text. The SA is defined as the ratio of the side-chain ASA in the decoy over its ASA when it is completely solvated.
FIGURE 5
Contributions of individual amino acids to the folding free energy (kcal/mol) of proteinlike decoys, as a function of their solvent accessibility, computed using the generalized Born implementation of Lee et al. (2002). (a) Energy of the Val side chain versus its SA for 4018 random environments. (b) Energy of the Thr side chain versus its SA for 4174 random environments. (c) Energy of the Lys side chain versus its SA for 4176 random environments. The energy values were computed as indicated in Fig. 1 and described in the text. The SA is defined as the ratio of the side-chain ASA in the decoy over its ASA when it is completely solvated.
FIGURE 6
Contributions of individual amino acids to the folding free energy (kcal/mol) of proteinlike decoys, as a function of their solvent accessibility, computed using FDPB electrostatics and a surface area-dependent hydrophobic term. (a) Energy of the Val side chain versus its SA for 4018 random environments. (b) Energy of the Thr side chain versus its SA for 4174 random environments. (c) Energy of the Lys side chain versus its SA for 4176 random environments. The energy values were computed as indicated in Fig. 1 and described in the text. The SA is defined as the ratio of the side-chain ASA in the decoy over its ASA when it is completely solvated.
FIGURE 7
Transfer free energies of amino acids from water to the protein interior, computed using the five different implicit solvation models analyzed in this study. The transfer free energy ΔGtransfer(A) was computed from Gaccessible(A) and Gburied(A) [formula image], where Gaccessible(A) is the average free energy of amino acid A when its accessibility to solvent is >80%, and Gburied(A) is the average free energy of the same amino acid when it is completely buried (<1% accessibility). Dark circles represent average values and bars represent standard deviations. (a) ΔGtransfer computed with the EAS solvation model. (b) ΔGtransfer computed with the EEF1 solvation model. (c) ΔGtransfer computed with the ACE solvation model. (d) ΔGtransfer computed with the GBMV solvation model. (e) ΔGtransfer computed using FDPB and a surface area-dependent hydrophobic term (see Methods).
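The averaging described in this legend can be sketched in a few lines of Python. The legend's formula survives only as an image, so the sign convention used here (buried minus accessible, matching a transfer from water to the protein interior) is an assumption. The function name, the input layout, and the sample values are hypothetical.

```python
from statistics import mean


def transfer_free_energy(samples, accessible_cutoff=0.80, buried_cutoff=0.01):
    """Estimate the transfer free energy for one amino acid type.

    samples: iterable of (sa, energy) pairs, where sa is the relative solvent
    accessibility (0 to 1) and energy is in kcal/mol.
    Returns (dg_transfer, g_accessible, g_buried).
    """
    samples = list(samples)
    # Environments with >80% accessibility, as stated in the legend.
    accessible = [energy for sa, energy in samples if sa > accessible_cutoff]
    # Environments with <1% accessibility, as stated in the legend.
    buried = [energy for sa, energy in samples if sa < buried_cutoff]
    if not accessible or not buried:
        raise ValueError("need at least one accessible and one buried environment")
    g_accessible = mean(accessible)
    g_buried = mean(buried)
    # ASSUMPTION: sign taken as buried minus accessible (water -> interior).
    return g_buried - g_accessible, g_accessible, g_buried


# Hypothetical (SA, energy) pairs, for illustration only.
example = [(0.95, -1.0), (0.85, -2.0), (0.50, -3.0), (0.0, -4.0), (0.005, -6.0)]
dg, g_acc, g_bur = transfer_free_energy(example)
print(dg, g_acc, g_bur)  # prints "-3.5 -1.5 -5.0"
```

Environments with intermediate accessibility (here the pair with SA 0.50) contribute to neither average. The standard deviations shown as bars in the figure could be obtained from the same two lists with `statistics.stdev`.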
FIGURE 8
Transfer free energies of amino acids from water to the protein interior, in 362 high-resolution protein crystal structures deposited in the PDB. The energies were computed using three of the implicit solvation models analyzed in this study. The transfer free energies were computed as detailed in the legend of Fig. 7. Shown are the average values (dark circles) and corresponding standard deviations (bars). (a) ΔGtransfer computed with the EAS solvation model. (b) ΔGtransfer computed with the EEF1 solvation model. (c) ΔGtransfer computed with the GBMV solvation model.
FIGURE 9
Profiles of the designed sequences computed by DESIGNER for the homeodomain protein (RCSB-PDB code 1enh), using the EAS model (a) and the EEF1 model (b), respectively. The first row lists the residue number. The second and third rows list the wild-type sequence and the consensus-designed sequence (the most probable amino acid at each position along the polypeptide), respectively, using the one-letter amino acid code. Subsequent rows list the amino acids that occur with a frequency >10%. Buried positions (those with a solvent-accessible surface area of <25% in the native structure) are colored red in the wild-type sequence. Designs with the EAS model produced a total of 104 sequences; those with the EEF1 model produced 186 sequences.
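A profile of this kind can be derived from a set of equal-length designed sequences as sketched below. This is a minimal illustration and not the DESIGNER program's own code. The function name, the tie-breaking rule (alphabetical order among equally frequent residues), and the four short sequences are all assumptions made for the example.

```python
from collections import Counter


def sequence_profile(sequences, min_frequency=0.10):
    """Return (consensus, frequent) for a list of equal-length sequences.

    consensus: the most frequent amino acid at each position.
    frequent: per position, the amino acids whose frequency exceeds
    min_frequency, ordered from most to least frequent.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    consensus = []
    frequent = []
    for column in zip(*sequences):  # iterate over alignment positions
        counts = Counter(column)
        total = len(column)
        # Rank by descending count; ties are broken alphabetically (assumption).
        ranked = sorted(counts, key=lambda aa: (-counts[aa], aa))
        consensus.append(ranked[0])
        frequent.append([aa for aa in ranked if counts[aa] / total > min_frequency])
    return "".join(consensus), frequent


# Hypothetical designed sequences, for illustration only.
designs = ["LKV", "LKI", "LRV", "MKV"]
consensus, frequent = sequence_profile(designs)
print(consensus)  # prints "LKV"
print(frequent)   # prints "[['L', 'M'], ['K', 'R'], ['V', 'I']]"
```

With the 104 or 186 designed sequences mentioned in the legend, the same 10% threshold would drop residues that appear only sporadically at a position.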
FIGURE 10
Arrangements of amino acid side chains in the core of the minimum energy-designed protein and the wild-type protein for the homeodomain protein, using the EAS model (a) and the EEF1 model (b), respectively. The side chains in the wild-type structures are colored yellow; those of the designed structures are colored using the CPK convention. The sequence and structure designed using the EAS model are visibly more nativelike than those designed using the EEF1 model. In the latter structure, several buried hydrophobic residues are replaced by polar ones.
FIGURE 11
Arrangements of amino acid side chains on the surface of minimum energy-designed and wild-type homeodomain proteins. The minimum energy-designed proteins were computed using the EEF1 and EAS models, respectively. (a) Minimum energy-designed protein using the EEF1 solvation model. (b) Minimum energy-designed protein using the EAS solvation model. (c) Wild-type homeodomain crystal structure (PDB-RCSB code 1enh).
FIGURE 12
Distinguishing between nativelike and misfolded structures using various implicit solvation models. (a) Distributions of protein energies computed using the EAS solvation model. (b) Distributions of protein energies computed using the EEF1 solvation model. (c) Distributions of protein energies computed using the ACE solvation model. Each plot displays four different distributions. The red bars represent the energies computed when the natural sequences of the SH3 domain are mounted onto the backbone of the c-Crk SH3 domain (PDB-RCSB code 1cka). The green bars represent the energies of the natural SH3 domain sequences mounted onto the engrailed homeodomain backbone (PDB-RCSB code 1enh). The blue and yellow bars represent the energies of random sequences (see text for details) mounted onto the SH3 (1cka) backbone and the homeodomain (1enh) backbone, respectively.
