Nat Commun. 2022 Nov 15;13(1):6963.
doi: 10.1038/s41467-022-34600-2.

Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks


Zhiye Guo et al. Nat Commun.

Abstract

Residue-residue distance information is useful for predicting the tertiary structures of protein monomers or the quaternary structures of protein complexes. Many deep learning methods have been developed to accurately predict intra-chain residue-residue distances in monomers, but few can accurately predict inter-chain residue-residue distances in complexes. To address this gap, we develop CDPred (Complex Distance Prediction), a deep learning method based on a 2D attention-powered residual network. On two homodimer test datasets, CDPred achieves top L/5 inter-chain contact precisions of 60.94% and 42.93%, respectively (L: length of the monomer in the homodimer), substantially higher than DeepHomo (37.40% and 23.08%) and GLINTER (48.09% and 36.74%). On two heterodimer test datasets, CDPred's top Ls/5 inter-chain contact precision (Ls: length of the shorter monomer in the heterodimer) is 47.59% and 22.87%, respectively, surpassing GLINTER's 23.24% and 13.49%. Moreover, CDPred's predictions are complementary to those of AlphaFold2-multimer.
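The headline metric above is top-k inter-chain contact precision: rank all inter-chain residue pairs by predicted contact probability, keep the top k (k = L/5 for homodimers, Ls/5 for heterodimers), and report the fraction that are true contacts. The sketch below assumes the contact probabilities and true contact labels are already available as nested lists (rows index chain A, columns index chain B). Rounding L/5 down with a minimum of one pair, and the function names, are our own illustrative choices rather than details taken from the paper.

```python
from typing import List, Sequence


def top_k_precision(
    prob: Sequence[Sequence[float]],
    true_contact: Sequence[Sequence[bool]],
    k: int,
) -> float:
    """Fraction of the k highest-probability residue pairs that are true contacts.

    prob[i][j] is the predicted contact probability between residue i of
    chain A and residue j of chain B; true_contact[i][j] is the true label.
    """
    pairs: List[tuple] = [
        (prob[i][j], i, j) for i in range(len(prob)) for j in range(len(prob[i]))
    ]
    # Rank every inter-chain pair by predicted probability, highest first.
    pairs.sort(key=lambda t: t[0], reverse=True)
    top = pairs[:k]
    hits = sum(1 for _, i, j in top if true_contact[i][j])
    return hits / len(top)


def top_fraction_precision(
    prob: Sequence[Sequence[float]],
    true_contact: Sequence[Sequence[bool]],
    divisor: int = 5,
) -> float:
    """Top L/5 (homodimer) or top Ls/5 (heterodimer) precision.

    The shorter of the two map dimensions is Ls; for a homodimer both equal L.
    """
    ls = min(len(prob), len(prob[0]))
    k = max(1, ls // divisor)  # assumption: round down, never fewer than 1 pair
    return top_k_precision(prob, true_contact, k)


if __name__ == "__main__":
    prob = [[0.9, 0.1], [0.2, 0.8]]
    truth = [[True, False], [False, False]]
    print(top_k_precision(prob, truth, k=2))  # 0.5
```

Ranking the full flattened map, rather than thresholding probabilities, is what makes precision comparable across methods: every method is judged on the same number of pairs per target.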


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Histogram of the precision of the top L/10 contact predictions for the heterodimers in the HeteroTest2 dataset.
The x-axis shows four precision intervals spanning 0 to 100%, and the y-axis shows the number of heterodimers whose contact precision falls in each interval. The four intervals contain 40, 2, 1, and 12 heterodimers, respectively.
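Fig. 1 groups targets by precision into four intervals. Assuming the intervals are equal-width (0 to 25%, 25 to 50%, 50 to 75%, 75 to 100%), which the caption does not state explicitly, the counting step can be sketched as follows; the function name is hypothetical.

```python
from typing import Iterable, List


def bin_precisions(precisions: Iterable[float], n_bins: int = 4) -> List[int]:
    """Count targets whose precision (a fraction in [0, 1]) falls in each of
    n_bins equal-width intervals; a precision of exactly 1.0 goes into the
    last interval rather than a non-existent extra bin."""
    counts = [0] * n_bins
    for p in precisions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"precision out of range: {p}")
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts
```

Applied to the per-target precisions of HeteroTest2, the returned counts would sum to the 55 heterodimers shown in the figure (40 + 2 + 1 + 12).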
Fig. 2. Comparison between CDPred_PLM (blue) and CDPred_ESM (orange) on four different test datasets.
The y-axis is the top L/10 contact prediction precision, and the x-axis lists the four test datasets.
Fig. 3. Comparison between using AlphaFold-predicted tertiary structures (blue) and true tertiary structures (yellow) to generate the intra-chain distance maps used as input for predicting inter-chain distance maps on the four datasets.
Top L/2 contact prediction precision on the datasets is reported.
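Fig. 3 compares intra-chain distance maps derived from AlphaFold-predicted versus true tertiary structures. Given one representative 3D coordinate per residue (commonly the Cβ atom, with Cα for glycine; the caption does not say which atom CDPred uses, so this is an assumption), the map is the matrix of pairwise Euclidean distances. A minimal sketch:

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def intra_chain_distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """Return the L x L matrix of Euclidean distances (in the units of the
    input coordinates, typically Å) between the residues of one chain."""
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # requires Python 3.8+
            dmap[i][j] = dmap[j][i] = d  # distance maps are symmetric
    return dmap
```

The same function applies to either coordinate set; only the source of the coordinates (predicted or experimental) differs between the two bars in the figure.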
Fig. 4. Inter-chain contact prediction precision plotted against average contact probability.
The y-axis is the precision of top L/5 inter-chain contact predictions made by CDPred for a target, and the x-axis is the average probability of the top L/5 contact predictions for the target. Each point represents a dimer target in the four test datasets (HomoTest1, HomoTest2, HeteroTest1 and HeteroTest2).
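The x-axis of Fig. 4 is the mean predicted probability of a target's top L/5 contacts, which can serve as a per-target confidence score. The sketch below computes that average and, as an illustration of how one might quantify the trend in the plot, a Pearson correlation across targets. The correlation is our addition, not a statistic reported in the caption, and both function names are hypothetical.

```python
import math
from typing import Sequence


def mean_top_k_probability(prob: Sequence[Sequence[float]], k: int) -> float:
    """Average of the k largest predicted inter-chain contact probabilities."""
    flat = sorted((p for row in prob for p in row), reverse=True)
    top = flat[:k]
    return sum(top) / len(top)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between per-target confidence and precision."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Each target contributes one (mean probability, precision) point, matching one dot in the plot.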
Fig. 5. CDPred's prediction for homodimer T0991, which has a shallow MSA.
a The intra-chain distance map of the monomer predicted by AlphaFold. b The true intra-chain distance map of the monomer. c The inter-chain contact map predicted by CDPred. d The true inter-chain contact map.
Fig. 6. Overview of the CDPred architecture.
CDPred simultaneously uses tertiary structural information (intra-chain distance maps of the monomers), sequential information (PSSM), and residue-residue co-evolutionary information (co-evolutionary scores calculated by CCMpred and attention maps from the MSA Transformer) as input to predict inter-chain distance maps. The input for a homodimer has dimension L × L × 186 (L: length of the monomer sequence), while the input for a heterodimer has dimension (L1 + L2) × (L1 + L2) × 186 (L1 and L2: lengths of the two different monomers in the heterodimer). Each of the two output matrices has the same spatial dimensions as the input but a different number of channels: the output layer has 42 channels, storing the predicted probability that the distance falls in each of 42 distance bins. The two output matrices represent two kinds of predicted inter-chain distance maps.
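The tensor shapes in the caption can be made concrete with a short sketch. It assumes (our assumption; the caption does not specify the layout) that in the heterodimer case chain A occupies indices 0 to L1 - 1 and chain B indices L1 to L1 + L2 - 1, so the inter-chain prediction is the off-diagonal L1 × L2 block. It also shows how a 42-bin distance distribution could be collapsed into a contact probability by summing the mass below a distance threshold; the bin edges and threshold are hypothetical parameters, since the caption does not give them.

```python
from typing import List, Optional, Sequence, Tuple

NUM_INPUT_CHANNELS = 186  # input feature channels, per the caption
NUM_DISTANCE_BINS = 42  # output channels (distance bins), per the caption


def input_shape(l1: int, l2: Optional[int] = None) -> Tuple[int, int, int]:
    """L x L x 186 for a homodimer; (L1 + L2) x (L1 + L2) x 186 for a heterodimer."""
    n = l1 if l2 is None else l1 + l2
    return (n, n, NUM_INPUT_CHANNELS)


def output_shape(l1: int, l2: Optional[int] = None) -> Tuple[int, int, int]:
    """Same spatial size as the input, but with 42 distance-bin channels."""
    n, m, _ = input_shape(l1, l2)
    return (n, m, NUM_DISTANCE_BINS)


def inter_chain_block(dist_map: Sequence[Sequence[object]], l1: int, l2: int) -> List[list]:
    """Extract the chain-A x chain-B block (rows 0..L1-1, cols L1..L1+L2-1).

    Assumes chain A comes first in the concatenated (L1 + L2) indexing.
    Each cell may hold a 42-element probability vector.
    """
    return [list(row[l1:l1 + l2]) for row in dist_map[:l1]]


def contact_probability(
    bin_probs: Sequence[float],
    bin_upper_edges: Sequence[float],
    threshold: float,
) -> float:
    """Probability that the distance is <= threshold: the summed mass of
    every bin whose upper edge does not exceed the threshold."""
    if len(bin_probs) != len(bin_upper_edges):
        raise ValueError("one upper edge per bin is required")
    return sum(p for p, upper in zip(bin_probs, bin_upper_edges) if upper <= threshold)
```

For a homodimer no slicing is needed: per the caption, the L × L output is itself the inter-chain map.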
