Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13
- PMID: 31407406
- PMCID: PMC6851483
- DOI: 10.1002/prot.25798
Abstract
We report the residue-residue contact prediction results in the CASP13 experiment of a new pipeline built purely on learning from coevolutionary features. For a query sequence, the pipeline first collects multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based search tools. Three profile matrices, built on covariance, precision (inverse covariance), and pseudolikelihood maximization, respectively, are then computed from the MSAs and used as input features to a deep residual convolutional neural network for contact-map training and prediction. Two ensembling strategies were proposed to integrate these matrix features, one through end-to-end training and one through stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that have no homologous templates in the PDB, TripletRes and ResTriplet achieved comparable results, with average accuracies of 0.640 and 0.646, respectively, for the top L/5 long-range predictions (where L is the sequence length); 71% and 74% of the cases, respectively, had an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline derives from its sensitive MSA construction and its advanced strategies for ensembling coevolutionary features. Domain splitting was also found to improve contact prediction performance. Nevertheless, contact predictions for tail regions, which often contain many alignment gaps, and for targets with few homologous sequences remain suboptimal. Developing new approaches in which the model is trained specifically on these regions and targets might help address these problems.
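To make the covariance and precision inputs more concrete, here is a minimal pure-Python sketch of how such matrices can be computed from an MSA. It is not the TripletRes/ResTriplet implementation. Assumptions: a 21-state alphabet (20 amino acids plus gap), equal-length aligned sequences, unweighted frequencies (practical pipelines usually down-weight redundant sequences), a hypothetical shrinkage constant `shrink` added to the diagonal so the covariance can be inverted, and a hypothetical `block_norms` helper that collapses each 21 x 21 block to one number purely for inspection. The pseudolikelihood feature is not shown.

```python
import math

# Assumed 21-state alphabet: 20 amino acids plus the gap symbol (q = 21).
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
Q = len(ALPHABET)


def one_hot_rows(msa):
    """For each aligned sequence, list the indices of its 'on' one-hot features.

    Feature index i * Q + a is on when alignment column i holds letter a.
    Letters outside ALPHABET are mapped to the gap state (an assumption).
    """
    rows = []
    for seq in msa:
        idx = []
        for i, ch in enumerate(seq.upper()):
            a = ALPHABET.find(ch)
            idx.append(i * Q + (a if a >= 0 else Q - 1))
        rows.append(idx)
    return rows


def covariance(msa):
    """C[(i,a),(j,b)] = f_ij(a,b) - f_i(a) * f_j(b), with unweighted frequencies."""
    rows = one_hot_rows(msa)
    n, d = len(rows), len(msa[0]) * Q
    mean = [0.0] * d
    C = [[0.0] * d for _ in range(d)]
    for idx in rows:
        for p in idx:
            mean[p] += 1.0 / n  # single-site frequency f_i(a)
            for r in idx:
                C[p][r] += 1.0 / n  # pair frequency f_ij(a, b)
    for p in range(d):
        for r in range(d):
            C[p][r] -= mean[p] * mean[r]
    return C


def invert(M):
    """Gauss-Jordan inversion with partial pivoting (adequate for small demos)."""
    n = len(M)
    # Augment M with the identity matrix: [M | I].
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def precision(msa, shrink=0.1):
    """Inverse of (C + shrink * I); the shrinkage makes C invertible."""
    C = covariance(msa)
    for k in range(len(C)):
        C[k][k] += shrink
    return invert(C)


def block_norms(M, L):
    """Collapse each Q x Q block (i, j) of M to its Frobenius norm, giving an L x L map."""
    return [
        [math.sqrt(sum(M[i * Q + a][j * Q + b] ** 2 for a in range(Q) for b in range(Q)))
         for j in range(L)]
        for i in range(L)
    ]
```

For a toy alignment such as `["AD", "AD", "GE", "GE"]`, the two columns co-vary perfectly, so `covariance(msa)` has the entry 0.25 at block (0, 1), row `A`, column `D`, and `block_norms(C, 2)[0][1]` is 0.5. Keeping the full per-pair blocks rather than a single score per pair mirrors the idea of feeding raw coevolutionary matrices to a network, but the exact feature layout used by the authors is not specified here.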
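The headline accuracies are precisions of the top L/5 long-range predictions. The sketch below shows how such a score can be computed, assuming the usual CASP conventions: two residues are in contact when their Cβ atoms (Cα for glycine) are closer than 8 Å, and a pair is long-range when its sequence separation is at least 24. The function names, the `L // 5` rounding, and dividing by the number of available candidates when fewer than L/5 long-range pairs exist are assumptions of this sketch, not the official assessment code.

```python
import math


def contacts_from_coords(cb, cutoff=8.0, min_sep=24):
    """True contacts: pairs (i, j), i < j, j - i >= min_sep, CB-CB distance < cutoff (Å)."""
    L = len(cb)
    return {
        (i, j)
        for i in range(L)
        for j in range(i + min_sep, L)
        if math.dist(cb[i], cb[j]) < cutoff
    }


def top_long_range_precision(probs, true_contacts, L, frac=5, min_sep=24):
    """Fraction of true contacts among the top L // frac long-range predictions.

    probs: dict mapping (i, j), i < j (0-based), to a predicted contact probability.
    """
    k = max(1, L // frac)
    # Keep only long-range pairs, then rank them by predicted probability.
    candidates = [(p, pair) for pair, p in probs.items() if pair[1] - pair[0] >= min_sep]
    candidates.sort(key=lambda t: t[0], reverse=True)
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)
```

For example, with L = 50 the score is taken over the 10 highest-ranked pairs separated by at least 24 residues; a confident prediction for a short-range pair such as (0, 5) is ignored even if it is correct.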
Keywords: CASP; coevolution analysis; contact-map prediction; deep learning; protein folding.
© 2019 Wiley Periodicals, Inc.
Similar articles
- Deep-learning contact-map guided protein structure prediction in CASP13. Proteins. 2019 Dec;87(12):1149-1164. doi: 10.1002/prot.25792. Epub 2019 Aug 14. PMID: 31365149. Free PMC article.
- Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol. 2021 Mar 26;17(3):e1008865. doi: 10.1371/journal.pcbi.1008865. eCollection 2021 Mar. PMID: 33770072. Free PMC article.
- Analysis of distance-based protein structure prediction by deep learning in CASP13. Proteins. 2019 Dec;87(12):1069-1081. doi: 10.1002/prot.25810. Epub 2019 Sep 13. PMID: 31471916.
- Deep Learning-Based Advances in Protein Structure Prediction. Int J Mol Sci. 2021 May 24;22(11):5553. doi: 10.3390/ijms22115553. PMID: 34074028. Free PMC article. Review.
- Toward the solution of the protein structure prediction problem. J Biol Chem. 2021 Jul;297(1):100870. doi: 10.1016/j.jbc.2021.100870. Epub 2021 Jun 11. PMID: 34119522. Free PMC article. Review.
Cited by
- Recent Progress of Protein Tertiary Structure Prediction. Molecules. 2024 Feb 13;29(4):832. doi: 10.3390/molecules29040832. PMID: 38398585. Free PMC article. Review.
- Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins. 2023 Dec;91(12):1684-1703. doi: 10.1002/prot.26585. Epub 2023 Aug 31. PMID: 37650367.
- Microbiome-based enrichment pattern mining has enabled a deeper understanding of the biome-species-function relationship. Commun Biol. 2023 Apr 10;6(1):391. doi: 10.1038/s42003-023-04753-x. PMID: 37037946. Free PMC article.
- Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks. Nat Commun. 2022 Nov 15;13(1):6963. doi: 10.1038/s41467-022-34600-2. PMID: 36379943. Free PMC article.
- Protein Function Analysis through Machine Learning. Biomolecules. 2022 Sep 6;12(9):1246. doi: 10.3390/biom12091246. PMID: 36139085. Free PMC article. Review.