Proteins. 2019 Dec;87(12):1082-1091.
doi: 10.1002/prot.25798. Epub 2019 Aug 22.

Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13

Yang Li et al. Proteins. 2019 Dec.

Abstract

We report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction. Two ensembling strategies have been proposed to integrate the matrix features through end-to-end training and stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that do not have homologous templates in the PDB, TripletRes and ResTriplet generated comparable results with an average accuracy of 0.640 and 0.646, respectively, for the top L/5 long-range predictions, where 71% and 74% of the cases have an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline is due to the sensitive MSA construction and the advanced strategies for coevolutionary feature ensembling. Domain splitting was also found to help enhance the contact prediction performance. Nevertheless, contact models for tail regions, which often involve a high number of alignment gaps, and for targets with few homologous sequences are still suboptimal. Development of new approaches where the model is specifically trained on these regions and targets might help address these problems.
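The headline metric in the abstract, accuracy of the top L/5 long-range predictions, can be illustrated with a minimal sketch. It assumes the common CASP convention that "long-range" means a sequence separation of at least 24 residues, and that `pred` and `native` are L x L arrays holding predicted contact scores and binary native contacts. The function name and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def top_fraction_long_range_precision(pred, native, frac=5, min_sep=24):
    """Precision of the top L/frac predicted long-range contacts.

    pred    : (L, L) array of predicted contact scores (higher = more likely)
    native  : (L, L) array of 0/1 native contacts
    frac    : 5 gives top L/5, 1 gives top L
    min_sep : assumed long-range cutoff, |i - j| >= 24
    """
    L = pred.shape[0]
    # Upper-triangle residue pairs whose sequence separation is >= min_sep.
    i, j = np.triu_indices(L, k=min_sep)
    k = max(1, L // frac)
    # Indices of the k highest-scoring long-range pairs.
    top = np.argsort(-pred[i, j], kind="stable")[:k]
    # Fraction of those pairs that are true native contacts.
    return float(np.mean(native[i[top], j[top]]))
```

Under this definition, a value of 0.640 would mean that, averaged over targets, 64% of the L/5 highest-scoring long-range pairs are true contacts.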

Keywords: CASP; coevolution analysis; contact-map prediction; deep learning; protein folding.

Figures

Figure 1.
The pipeline of TripletRes and ResTriplet for contact-map prediction in CASP13.
Figure 2.
Illustration of the effect of MSAs on the performance of TripletRes and ResTriplet. (A) and (B) Comparison of top L long-range contact prediction results using MSAs by DeepMSA versus those by the routine HHblits search for TripletRes and ResTriplet, respectively. (C) and (D) Precision of long-range top L/5 contact prediction versus Nf of MSAs for TripletRes and ResTriplet, respectively.
Figure 3.
Mean precisions of long-range top L and top L/5 contacts of TripletRes and ResTriplet on FM targets, compared to the predictors trained on the component features from the covariance matrix feature (COV), the precision matrix feature (PRE) and the coupling matrix of the inverse Potts model feature (PLM).
Figure 4.
Comparison of precisions of long-range top L contact predictions with domain partitioning versus those without using domain partitioning. (A) TripletRes; (B) ResTriplet.
Figure 5.
An illustrative example of CASP13 domain T0957s-D1 showing false positive contact prediction in the N-terminal tail region due to the higher number of gaps in the alignment. (A) Bar plot of the number of gaps along the query sequence. (B) Contacts from the native structure (lower-right triangle section) versus predicted contacts by ResTriplet (upper-left section) where gray circles and black squares denote false and true positive predictions respectively. (C) 3D experimental structure of the T0957s-D1 with the N-terminal tail marked in black.
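The covariance (COV) and precision (PRE) matrix features named in the abstract and in the Figure 3 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the 21-letter alphabet, the absence of sequence weighting, and the plain ridge term used to make the covariance invertible are all simplifications chosen here, and the function names are hypothetical. The pseudolikelihood (PLM) feature is not shown.

```python
import numpy as np

# Assumed encoding: 20 amino acids plus one gap state.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"


def one_hot_msa(msa):
    """Encode an MSA (list of equal-length strings) as an (N, L*21) matrix."""
    index = {a: k for k, a in enumerate(ALPHABET)}
    n, length, q = len(msa), len(msa[0]), len(ALPHABET)
    x = np.zeros((n, length, q))
    for s, seq in enumerate(msa):
        for p, a in enumerate(seq):
            # Letters outside the alphabet are treated as gaps (assumption).
            x[s, p, index.get(a, q - 1)] = 1.0
    return x.reshape(n, length * q)


def cov_and_precision_features(msa, ridge=1.0):
    """Return (L, L, 441) covariance and precision feature tensors."""
    x = one_hot_msa(msa)
    n = x.shape[0]
    length, q = len(msa[0]), len(ALPHABET)

    # Covariance between every (position, residue type) pair of columns.
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / n

    # Precision matrix: inverse of the ridge-regularized covariance.
    # The ridge value is an illustrative choice, not a published setting.
    prec = np.linalg.inv(cov + ridge * np.eye(length * q))

    def to_pair_tensor(m):
        # (L*q, L*q) -> (L, L, q*q): one 441-channel vector per residue pair.
        return m.reshape(length, q, length, q).transpose(0, 2, 1, 3).reshape(
            length, length, q * q
        )

    return to_pair_tensor(cov), to_pair_tensor(prec)
```

Each output has one 441-channel vector per residue pair, which is the kind of image-like tensor a residual convolutional network can take as input.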
