PLoS One. 2017 May 26;12(5):e0178272.
doi: 10.1371/journal.pone.0178272. eCollection 2017.

Sequence statistics of tertiary structural motifs reflect protein stability

Free PMC article

Fan Zheng et al. PLoS One.

Abstract

The Protein Data Bank (PDB) has been a key resource for learning general rules of sequence-structure relationships in proteins. Quantitative insights have been gained by defining geometric descriptors of structure (e.g., distances, dihedral angles, solvent exposure, etc.) and observing their distributions and sequence preferences. Here we argue that as the PDB continues to grow, it may become unnecessary to reduce structure into a set of elementary descriptors. Instead, it could be possible to deduce quantitative sequence-structure relationships in the context of precisely-defined complex structural motifs by mining the PDB for closely matching backbone geometries. To validate this idea, we turned to the task of predicting changes in protein stability upon amino-acid substitution, a difficult problem of broad significance. We defined non-contiguous tertiary motifs (TERMs) around a protein site of interest and extracted sequence preferences from ensembles of closely-matching substructures in the PDB to predict mutational stability changes at the site, ΔΔGm. We demonstrate that these ensemble statistics predict ΔΔGm on par with state-of-the-art statistical and machine-learning methods on large thermodynamic datasets, and outperform these, along with a leading structure-based modeling approach, when tested in the context of unbiased diverse mutations. Further, we show that the performance of the TERM-based method is directly related to the amount of available relevant structural data, automatically improving with the growing PDB. This provides a means of estimating prediction accuracy. Our results clearly demonstrate that: 1) statistics of non-contiguous structural motifs in the PDB encode fundamental sequence-structure relationships related to protein thermodynamic stability, and 2) the PDB is now large enough that such statistics are already useful in practice, with their accuracy expected to continue increasing as the database grows. These observations suggest new ways of using structural data towards addressing problems of computational structural biology.
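
The general idea of turning sequence statistics from a structural ensemble into a stability estimate can be illustrated with a small sketch. This is not the authors' TERM-ΔΔG model: it ignores pair preferences, sub-TERM decomposition, and the structural search itself, and simply assumes that the amino acids observed at the site of interest across matching substructures are already available. The function names, the pseudocount, and the uniform background are hypothetical choices made for illustration; the score is a unitless log-odds difference, not a value in kcal/mol.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pseudo_energies(observed, background, pseudocount=1.0):
    """Per-residue pseudo-energy: -ln(observed frequency / background frequency).

    `observed` is a string of amino acids seen at one position across an
    ensemble of matches (hypothetical input); `background` maps each amino
    acid to its expected frequency. The pseudocount avoids log(0).
    """
    counts = Counter(observed)
    total = len(observed) + pseudocount * len(AMINO_ACIDS)
    energies = {}
    for aa in AMINO_ACIDS:
        freq = (counts[aa] + pseudocount) / total
        energies[aa] = -math.log(freq / background[aa])
    return energies

def predict_ddg(observed, background, wild_type, mutant):
    """Unitless stability-change score; positive means the mutant is
    less favored than the wild type in the ensemble (destabilizing)."""
    energies = pseudo_energies(observed, background)
    return energies[mutant] - energies[wild_type]

# Toy ensemble (made-up data): the position is mostly isoleucine.
uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
matches = "I" * 60 + "V" * 25 + "L" * 10 + "A" * 5
score = predict_ddg(matches, uniform, "I", "A")
print(f"I->A score: {score:.3f}")
```

With a single position and a shared background, the background term cancels and the score reduces to the log ratio of smoothed counts, which is why richer ensembles (more matches) give less noisy estimates.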

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. TERM-based ΔΔGm prediction.
Procedural flow is indicated with arrows, starting from the top left. Given a structure of the protein of interest, a TERM is defined around the mutated position (green sphere) to include any potentially contacting positions (yellow spheres) and flanking backbone segments (white sticks and ribbon). The TERM is next decomposed into sub-TERMs, i.e., substructures containing a subset of the contacting positions and flanking segments. Structural ensembles for each sub-TERM are generated by searching the PDB for close structural matches using MASTER [47]. Finally, sequences from the matching ensembles of all sub-TERMs (and the original TERM, data permitting) are used to extract positional and pair amino-acid preferences to predict ΔΔGm.
Fig 2. The performance of TERM-ΔΔG2 on S2648.
Predicted and measured ΔΔGm values are plotted on the X- and Y-axes, respectively. Color represents point cloud density. The least-squares regression line is shown with dashes.
Fig 3. The role of multi-contact ensembles in ΔΔGm prediction, on the example of 1RIS_AI8A.
(A) and (B) correspond to models TERM-ΔΔG1 and TERM-ΔΔG2, respectively. The mutated position is shown in yellow and all its contacting positions (9 in total) are shown in cyan. Values of estimated sEP and pEPs are shown in red and blue, respectively. The experimental ΔΔGm for the mutation is 3.56 kcal/mol (destabilizing).
Fig 4. The performance of different methods on the S699 set.
Data in each panel are shown in the same manner as in Fig 2, with the panel title indicating the prediction method used.
Fig 5. Abundance of structural information is critical to performance of prediction.
(A) The distribution of ubiquity for mutations in the S2648 set. Quartile boundaries are labeled as dashed lines. (B) Performance of prediction on the four subgroups, from low ubiquity (group 1) to high ubiquity (group 4). The same representation is used here as in Fig 2.
Fig 6. Prediction performance increases with the size of the structural database.
The model represented by each curve is indicated in the legend. For each level of subsampling, three samples were generated, with error bars showing the standard deviations among the three trials for each experiment. The functional form used in fitting is shown in the upper-left corner. The numbers on the right side of each curve indicate the corresponding best-fit plateau values (i.e., parameter a).
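
The evaluation described in the captions for Figs 2 and 5 (a least-squares line through predicted versus measured ΔΔGm, and performance broken down by ubiquity quartile) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the helper names and all numbers are made up, and "ubiquity" is treated simply as one number per mutation, since its exact definition is not given in this section.

```python
import math

def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares fit y = slope*x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def quartile_groups(values):
    """Assign each item to group 1 (lowest quarter by rank) through 4 (highest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = 1 + rank * 4 // len(values)
    return groups

# Made-up example: predictions on the X-axis, measurements on the Y-axis.
predicted = [0.0, 1.0, 2.0, 3.0]
measured = [0.5, 1.5, 2.5, 3.5]
slope, intercept = least_squares_line(predicted, measured)
r = pearson_r(predicted, measured)

# Made-up ubiquity values for eight hypothetical mutations.
ubiquity = [10, 200, 50, 800, 5, 120, 300, 70]
groups = quartile_groups(ubiquity)
```

Computing `pearson_r` separately within each of the four groups would give the per-quartile comparison that Fig 5B presents graphically.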

References

    1. Christ CD, Mark AE, Van Gunsteren WF. Basic ingredients of free energy calculations: a review. Journal of computational chemistry. 2010;31(8):1569–1582. doi: 10.1002/jcc.21450
    2. Woo HJ, Roux B. Calculation of absolute protein-ligand binding free energy from computer simulations. Proc Natl Acad Sci U S A. 2005 May;102(19):6825–30. doi: 10.1073/pnas.0409005102
    3. Grigoryan G. Absolute free energies of biomolecules from unperturbed ensembles. Journal of computational chemistry. 2013;34(31):2726–2741. doi: 10.1002/jcc.23448
    4. Skolnick J. In quest of an empirical potential for protein structure prediction. Current opinion in structural biology. 2006;16(2):166–171. doi: 10.1016/j.sbi.2006.02.004
    5. Li Z, Yang Y, Zhan J, Dai L, Zhou Y. Energy Functions in De Novo Protein Design: Current Challenges and Future Prospects. Annual Review of Biophysics. 2013 May;42(1):315–335. doi: 10.1146/annurev-biophys-083012-130315