Sequence statistics of tertiary structural motifs reflect protein stability
- PMID: 28552940
- PMCID: PMC5446159
- DOI: 10.1371/journal.pone.0178272
Abstract
The Protein Data Bank (PDB) has been a key resource for learning general rules of sequence-structure relationships in proteins. Quantitative insights have been gained by defining geometric descriptors of structure (e.g., distances, dihedral angles, solvent exposure) and observing their distributions and sequence preferences. Here we argue that as the PDB continues to grow, it may become unnecessary to reduce structure into a set of elementary descriptors. Instead, it could be possible to deduce quantitative sequence-structure relationships in the context of precisely defined complex structural motifs by mining the PDB for closely matching backbone geometries. To validate this idea, we turned to the task of predicting changes in protein stability upon amino-acid substitution, a difficult problem of broad significance. We defined non-contiguous tertiary motifs (TERMs) around a protein site of interest and extracted sequence preferences from ensembles of closely matching substructures in the PDB to predict mutational stability changes at the site, ΔΔGm. We demonstrate that these ensemble statistics predict ΔΔGm on par with state-of-the-art statistical and machine-learning methods on large thermodynamic datasets, and outperform these, along with a leading structure-based modeling approach, when tested in the context of unbiased diverse mutations. Further, we show that the performance of the TERM-based method is directly related to the amount of available relevant structural data, automatically improving with the growing PDB. This provides a means of estimating prediction accuracy. Our results clearly demonstrate that: 1) statistics of non-contiguous structural motifs in the PDB encode fundamental sequence-structure relationships related to protein thermodynamic stability, and 2) the PDB is now large enough that such statistics are already useful in practice, with their accuracy expected to continue increasing as the database grows. These observations suggest new ways of using structural data towards addressing problems of computational structural biology.
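To make the idea of "sequence preferences from ensembles of closely matching substructures" concrete, the following is a minimal sketch and not the authors' scoring function, which the abstract does not specify. It assumes the structural search has already been done and that the amino acid observed at the site of interest in each matching substructure is available as a list of one-letter codes. The function names, the pseudocount, and the sign convention (positive means the substitution is disfavored relative to wild type) are all assumptions made for illustration, and the score is a unitless log-ratio rather than a calibrated free energy.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_log_frequencies(match_residues, pseudocount=1.0):
    """Log-frequency of each amino acid at one site across structural matches.

    match_residues: iterable of one-letter codes, one per matching
    substructure (hypothetical input; the structural search is not shown).
    A pseudocount keeps unobserved amino acids from producing log(0).
    """
    counts = Counter(r for r in match_residues if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {
        aa: math.log((counts[aa] + pseudocount) / total)
        for aa in AMINO_ACIDS
    }


def pseudo_ddg(match_residues, wild_type, mutant, pseudocount=1.0):
    """Unitless stability-change score from ensemble statistics.

    Assumed convention: positive values mean the mutant residue is seen
    less often than the wild-type residue in matching substructures.
    """
    log_freq = site_log_frequencies(match_residues, pseudocount)
    return log_freq[wild_type] - log_freq[mutant]


if __name__ == "__main__":
    # Toy ensemble: residues found at the site in ten matching substructures.
    matches = list("LLLLLLAAIV")
    print(pseudo_ddg(matches, wild_type="L", mutant="A"))
```

Because the score depends only on counts, it becomes less noisy as more matches are found, which is in line with the abstract's observation that performance tracks the amount of relevant structural data.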
Similar articles
- Tertiary alphabet for the observable protein structural universe. Proc Natl Acad Sci U S A. 2016 Nov 22;113(47):E7438-E7447. doi: 10.1073/pnas.1607178113. Epub 2016 Nov 3. PMID: 27810958 Free PMC article.
- Tertiary structural propensities reveal fundamental sequence/structure relationships. Structure. 2015 May 5;23(5):961-971. doi: 10.1016/j.str.2015.03.015. Epub 2015 Apr 23. PMID: 25914055
- Protein structural motifs in prediction and design. Curr Opin Struct Biol. 2017 Jun;44:161-167. doi: 10.1016/j.sbi.2017.03.012. Epub 2017 Apr 28. PMID: 28460216 Free PMC article. Review.
- Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles. Proteins. 2014 May;82(5):771-84. doi: 10.1002/prot.24457. Epub 2013 Nov 22. PMID: 24174277
- Machine learning methods for protein structure prediction. IEEE Rev Biomed Eng. 2008;1:41-9. doi: 10.1109/RBME.2008.2008239. PMID: 22274898 Review.
Cited by
- Mega-scale experimental analysis of protein folding stability in biology and design. Nature. 2023 Aug;620(7973):434-444. doi: 10.1038/s41586-023-06328-6. Epub 2023 Jul 19. PMID: 37468638 Free PMC article.
- A Conserved Local Structural Motif Controls the Kinetics of PTP1B Catalysis. J Chem Inf Model. 2023 Jul 10;63(13):4115-4124. doi: 10.1021/acs.jcim.3c00286. Epub 2023 Jun 28. PMID: 37378552 Free PMC article.
- Neural network-derived Potts models for structure-based protein design using backbone atomic coordinates and tertiary motifs. Protein Sci. 2023 Feb;32(2):e4554. doi: 10.1002/pro.4554. PMID: 36564857 Free PMC article.
- Data-driven computational protein design. Curr Opin Struct Biol. 2021 Aug;69:63-69. doi: 10.1016/j.sbi.2021.03.009. Epub 2021 Apr 25. PMID: 33910104 Free PMC article. Review.
- Structural analysis of cross α-helical nanotubes provides insight into the designability of filamentous peptide nanomaterials. Nat Commun. 2021 Jan 18;12(1):407. doi: 10.1038/s41467-020-20689-w. PMID: 33462223 Free PMC article.
