Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era
- PMID: 24009338
- PMCID: PMC3785744
- DOI: 10.1073/pnas.1314045110
Erratum in
- Proc Natl Acad Sci U S A. 2013 Nov 12;110(46):18734
Abstract
Recently developed methods have shown considerable promise in predicting residue-residue contacts in protein 3D structures using evolutionary covariance information. However, these methods require large numbers of evolutionarily related sequences to robustly assess the extent of residue covariation, and the larger the protein family, the more likely that contact information is unnecessary because a reasonable model can be built based on the structure of a homolog. Here we describe a method that integrates sequence coevolution and structural context information using a pseudolikelihood approach, allowing more accurate contact predictions from fewer homologous sequences. We rigorously assess the utility of predicted contacts for protein structure prediction using large and representative sequence and structure databases from recent structure prediction experiments. We find that contact predictions are likely to be accurate when the number of aligned sequences (with sequence redundancy reduced to 90%) is greater than five times the length of the protein, and that accurate predictions are likely to be useful for structure modeling if the aligned sequences are more similar to the protein of interest than to the closest homolog of known structure. These conditions are currently met by 422 of the protein families collected in the Pfam database.
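The two conditions stated in the abstract can be read as a simple screening rule. The sketch below is only an illustration of that rule, not the authors' code: the class `FamilyStats`, its field names, and the idea of passing the two similarity values as precomputed numbers are all assumptions made here, since the abstract does not say how similarity is measured.

```python
from dataclasses import dataclass


@dataclass
class FamilyStats:
    """Summary of one alignment. All field names are hypothetical."""
    n_sequences: int       # aligned sequences after filtering to 90% redundancy
    protein_length: int    # number of residues in the protein of interest
    sim_to_target: float   # similarity of the aligned sequences to the protein of interest
    sim_to_homolog: float  # similarity to the closest homolog of known structure


def likely_accurate(stats: FamilyStats, factor: float = 5.0) -> bool:
    """Abstract's first condition: more than 5 x length aligned sequences."""
    if stats.protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return stats.n_sequences > factor * stats.protein_length


def likely_useful(stats: FamilyStats) -> bool:
    """Abstract's second condition, applied only to accurate predictions.

    Assumes both similarity values are on the same scale (higher = more similar).
    """
    return likely_accurate(stats) and stats.sim_to_target > stats.sim_to_homolog


if __name__ == "__main__":
    example = FamilyStats(n_sequences=600, protein_length=100,
                          sim_to_target=0.8, sim_to_homolog=0.5)
    print(likely_accurate(example), likely_useful(example))
```

For a 100-residue protein, the first check passes only with more than 500 aligned sequences. The similarity comparison is left to the caller because any concrete metric would be a guess.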
Keywords: Markov random field; maximum-entropy model; protein coevolution.
Conflict of interest statement
The authors declare no conflict of interest.
Figures
[…] predicts GREMLINΔ: GREMLINΔ versus structural similarity of homolog to native structure computed by TM-align (14) (for homologs of all targets with high-resolution crystal structures < 2.1 Å). When […] (blue bars), GREMLINΔ is rarely better than random (green bars, constructed by pooling 100 permutations of predicted scores for each target). When […] (red bars), GREMLINΔ is significantly positive and contact scores successfully discriminate between native and homology model even when the homolog is likely to be from the same fold (similarity […]). Error bars show mean and SD of distributions in all cases.
[…] to the closest protein of known structure is shown in the lower panel. In cases where the difference in profiles is large ([…]: right bar in each group, Lower), these predictions are likely to improve on comparative models.
Similar articles
- De novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS One. 2014 Mar 17;9(3):e92197. doi: 10.1371/journal.pone.0092197. eCollection 2014. PMID: 24637808. Free PMC article.
- Protein structure determination using metagenome sequence data. Science. 2017 Jan 20;355(6322):294-298. doi: 10.1126/science.aah4043. PMID: 28104891. Free PMC article.
- Prediction of Structures and Interactions from Genome Information. Adv Exp Med Biol. 2018;1105:123-152. doi: 10.1007/978-981-13-2200-6_9. PMID: 30617827. Review.
- Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci U S A. 2011 Dec 6;108(49):E1293-301. doi: 10.1073/pnas.1111471108. Epub 2011 Nov 21. PMID: 22106262. Free PMC article.
- Prediction of contacts from correlated sequence substitutions. Curr Opin Struct Biol. 2013 Jun;23(3):473-9. doi: 10.1016/j.sbi.2013.04.001. Epub 2013 May 14. PMID: 23680395. Review.
Cited by
- Effect of Leu277 on Disproportionation and Hydrolysis Activity in Bacillus stearothermophilus NO2 Cyclodextrin Glucosyltransferase. Appl Environ Microbiol. 2021 May 26;87(12):e0315120. doi: 10.1128/AEM.03151-20. Epub 2021 May 26. PMID: 33837009. Free PMC article.
- Assessing the functional roles of coevolving PHD finger residues. Protein Sci. 2024 Jul;33(7):e5065. doi: 10.1002/pro.5065. PMID: 38923615.
- Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc Natl Acad Sci U S A. 2016 Oct 25;113(43):12186-12191. doi: 10.1073/pnas.1607570113. Epub 2016 Oct 11. PMID: 27729520. Free PMC article.
- Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J. 2022 Sep 19;20:5316-5341. doi: 10.1016/j.csbj.2022.08.070. eCollection 2022. PMID: 36212542. Free PMC article. Review.
- LcSAO1, an Unconventional DOXB Clade 2OGD Enzyme from Ligusticum chuanxiong Catalyzes the Biosynthesis of Plant-Derived Natural Medicine Butylphthalide. Int J Mol Sci. 2023 Dec 13;24(24):17417. doi: 10.3390/ijms242417417. PMID: 38139246. Free PMC article.
References
- Tress ML, Valencia A. Predicted residue–residue contacts can help the scoring of 3D models. Proteins Struct Funct Bioinf. 2010;78(8):1980–1991.
- Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: Precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics. 2012;28(2):184–190.
- Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ. Learning generative models for protein fold families. Proteins Struct Funct Bioinf. 2011;79(4):1061–1078.
