Protein 3D structure computed from evolutionary sequence variation
- PMID: 22163331
- PMCID: PMC3233603
- DOI: 10.1371/journal.pone.0028766
Abstract
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.
In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.
We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
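The abstract describes inferring residue pair couplings from a maximum entropy model constrained by alignment statistics, then treating the top-scoring pairs as predicted contacts. It gives no equations, so the sketch below is an illustration under stated assumptions, not the EVfold implementation. It uses one common approximation for such a model, the mean-field approach, in which the couplings are taken as minus the inverse of the connected correlation matrix of the alignment. Simplifications (all assumptions): a 21-letter alphabet (gap plus 20 amino acids), no sequence reweighting, a pseudocount of 0.5, and a pair score built from the Frobenius norm of each coupling block with the average product correction (APC). That score is swapped in for illustration and is not claimed to be the one used in the paper. The names `ALPHABET`, `encode`, `mean_field_couplings`, `frobenius_apc` and `top_pairs` are hypothetical.

```python
import numpy as np

# Assumed alphabet: gap symbol plus the 20 standard amino acids (q = 21).
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: k for k, a in enumerate(ALPHABET)}


def encode(msa):
    """Map equal-length aligned sequences to an integer array of shape (N, L).

    Characters outside ALPHABET are treated as gaps (index 0).
    """
    return np.array([[AA_INDEX.get(c, 0) for c in seq] for seq in msa], dtype=int)


def mean_field_couplings(msa, pseudocount=0.5):
    """Mean-field estimate of pair couplings e_ij(a, b) of a max-ent (Potts) model.

    Simplifications: no sequence reweighting, and the last state is dropped as
    the gauge reference, so each returned block is (q-1) x (q-1).
    """
    X = encode(msa)
    n, L = X.shape
    q = len(ALPHABET)
    onehot = np.zeros((n, L, q))
    onehot[np.arange(n)[:, None], np.arange(L)[None, :], X] = 1.0

    # Empirical single-site and pair frequencies.
    fi = onehot.mean(axis=0)                                  # (L, q)
    fij = np.einsum("nia,njb->iajb", onehot, onehot) / n      # (L, q, L, q)

    # Mix with the uniform distribution; this keeps the covariance invertible.
    fi = (1 - pseudocount) * fi + pseudocount / q
    fij = (1 - pseudocount) * fij + pseudocount / q**2
    for i in range(L):
        fij[i, :, i, :] = np.diag(fi[i])   # diagonal blocks match f_i exactly

    # Connected correlations C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b).
    C = fij - np.einsum("ia,jb->iajb", fi, fi)
    C = C[:, : q - 1, :, : q - 1].reshape(L * (q - 1), L * (q - 1))

    # Mean-field approximation: couplings are minus the inverse correlation matrix.
    return -np.linalg.inv(C).reshape(L, q - 1, L, q - 1)


def frobenius_apc(e):
    """Score pairs by the Frobenius norm of each coupling block, then apply APC.

    Swapped-in score for illustration. Returns a symmetric (L, L) matrix.
    """
    L, q1 = e.shape[0], e.shape[1]
    # Re-insert the dropped reference state as zeros, then move to the zero-sum
    # gauge so block norms do not depend on which state was dropped.
    full = np.zeros((L, q1 + 1, L, q1 + 1))
    full[:, :q1, :, :q1] = e
    full = (full - full.mean(axis=1, keepdims=True) - full.mean(axis=3, keepdims=True)
            + full.mean(axis=(1, 3), keepdims=True))
    F = np.sqrt((full ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(F, 0.0)
    # Average product correction: subtract the background F_i. * F_.j / F_..
    row_mean = F.sum(axis=1) / (L - 1)
    overall = F.sum() / (L * (L - 1))
    S = F - np.outer(row_mean, row_mean) / overall
    np.fill_diagonal(S, 0.0)
    return S


def top_pairs(S, n_top=10, min_sep=5):
    """Return the n_top highest-scoring (i, j, score) with j - i >= min_sep."""
    L = S.shape[0]
    pairs = [(S[i, j], i, j) for i in range(L) for j in range(i + min_sep, L)]
    pairs.sort(reverse=True)
    return [(i, j, float(s)) for s, i, j in pairs[:n_top]]
```

For a given alignment, `top_pairs(frobenius_apc(mean_field_couplings(msa)))` lists candidate contacts. The `min_sep` filter (an assumed default) skips residues close in sequence, which are near each other in space trivially. Predicted pairs could then be compared with distances in an observed structure.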
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
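The accuracy figures in the abstract are reported as Cα-RMSD: the root-mean-square deviation between corresponding Cα atoms of the computed and observed structures after optimal rigid superposition. The abstract does not say how the superposition was done. A minimal sketch, assuming the standard Kabsch algorithm and coordinates already paired residue by residue, follows; the function name `kabsch_rmsd` is hypothetical. To score only part of a protein (such as the two-thirds of residues mentioned), pass only those rows.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired (N, 3) coordinate sets after optimal rigid superposition.

    Row k of P and row k of Q must describe the same residue (e.g. both C-alpha atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # Kabsch: the optimal rotation comes from the SVD of the 3x3 covariance matrix.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Rotating and translating a coordinate set and comparing it with the original gives an RMSD of essentially zero, which is a quick sanity check.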
Similar articles
- Three-dimensional structures of membrane proteins from genomic sequencing. Cell. 2012 Jun 22;149(7):1607-21. doi: 10.1016/j.cell.2012.04.012. Epub 2012 May 10. PMID: 22579045.
- Protein Structure from Experimental Evolution. Cell Syst. 2020 Jan 22;10(1):15-24.e5. doi: 10.1016/j.cels.2019.11.008. Epub 2019 Dec 11. PMID: 31838147.
- All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences. Proc Natl Acad Sci U S A. 2015 Apr 28;112(17):5413-8. doi: 10.1073/pnas.1419956112. Epub 2015 Apr 9. PMID: 25858953.
- A Hybrid Approach for Protein Structure Determination Combining Sparse NMR with Evolutionary Coupling Sequence Data. Adv Exp Med Biol. 2018;1105:153-169. doi: 10.1007/978-981-13-2200-6_10. PMID: 30617828. Review.
- A case for evolutionary genomics and the comprehensive examination of sequence biodiversity. Mol Biol Evol. 2000 Dec;17(12):1776-88. doi: 10.1093/oxfordjournals.molbev.a026278. PMID: 11110893. Review.
Cited by
- Assessing the role of evolutionary information for enhancing protein language model embeddings. Sci Rep. 2024 Sep 5;14(1):20692. doi: 10.1038/s41598-024-71783-8. PMID: 39237735.
- Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci. 2021 May 11;8:643752. doi: 10.3389/fmolb.2021.643752. eCollection 2021. PMID: 34046429. Review.
- Network deconvolution as a general method to distinguish direct dependencies in networks. Nat Biotechnol. 2013 Aug;31(8):726-33. doi: 10.1038/nbt.2635. Epub 2013 Jul 14. PMID: 23851448.
- Overcoming Immunological Challenges Limiting Capsid-Mediated Gene Therapy With Machine Learning. Front Immunol. 2021 Apr 27;12:674021. doi: 10.3389/fimmu.2021.674021. eCollection 2021. PMID: 33986759.
- Correlated rigid modes in protein families. Phys Biol. 2016 Apr 11;13(2):025003. doi: 10.1088/1478-3975/13/2/025003. PMID: 27063781.