Integration of pre-trained protein language models into geometric deep learning networks
- PMID: 37626165
- PMCID: PMC10457366
- DOI: 10.1038/s42003-023-05133-1
Abstract
Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained by the limited quantity of available structural data. Meanwhile, protein language models trained on large collections of 1D sequences have shown capabilities that grow with scale across a broad range of applications. Several previous studies have combined these two protein modalities to improve the representational power of geometric neural networks, but they do not provide a comprehensive understanding of the benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate them on a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that incorporating the knowledge of protein language models enhances the capacity of geometric networks by a significant margin and can be generalized to complex tasks.
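The abstract does not spell out how the language-model knowledge enters the geometric network. A minimal sketch, assuming the simplest option (concatenating each residue's language-model embedding with its structural node features, then running one distance-based message-passing step over a k-nearest-neighbour residue graph), is given here. All function names, dimensions, and weights are hypothetical; random arrays stand in for real embeddings and coordinates, and the layer is a simplified invariant message-passing step, not any of the specific geometric networks evaluated in the paper.

```python
import numpy as np


def fuse_node_features(plm_emb, struct_feat):
    """Concatenate per-residue language-model embeddings with structural node features."""
    if plm_emb.shape[0] != struct_feat.shape[0]:
        raise ValueError("embedding and structure features must cover the same residues")
    return np.concatenate([plm_emb, struct_feat], axis=-1)


def knn_edges(coords, k):
    """Connect each residue (e.g. its C-alpha atom) to its k nearest neighbours."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)           # exclude self-loops
    nbrs = np.argsort(dist, axis=1)[:, :k]   # k closest residues per row
    src = np.repeat(np.arange(len(coords)), k)
    dst = nbrs.ravel()
    return src, dst


def message_passing(h, coords, src, dst, w_msg, w_upd):
    """One distance-based message-passing step; invariant to rotation and translation."""
    # Squared inter-residue distance is the only geometric input, so the output
    # does not change when the whole structure is rotated or translated.
    d2 = np.sum((coords[src] - coords[dst]) ** 2, axis=-1, keepdims=True)
    msg = np.tanh(np.concatenate([h[src], h[dst], d2], axis=-1) @ w_msg)
    # Each residue sums the messages from its k neighbours (unbuffered scatter-add).
    agg = np.zeros((h.shape[0], msg.shape[1]))
    np.add.at(agg, src, msg)
    return np.tanh(np.concatenate([h, agg], axis=-1) @ w_upd)


rng = np.random.default_rng(0)
n_res, d_plm, d_struct, d_hidden, k = 20, 32, 6, 16, 4

# Random stand-ins: real inputs would be PLM embeddings and residue coordinates.
plm_emb = rng.normal(size=(n_res, d_plm))
struct_feat = rng.normal(size=(n_res, d_struct))
coords = rng.normal(size=(n_res, 3)) * 5.0

h = fuse_node_features(plm_emb, struct_feat)
d_in = h.shape[1]
w_msg = rng.normal(size=(2 * d_in + 1, d_hidden)) * 0.1
w_upd = rng.normal(size=(d_in + d_hidden, d_hidden)) * 0.1

src, dst = knn_edges(coords, k)
out = message_passing(h, coords, src, dst, w_msg, w_upd)
print(out.shape)  # (20, 16)

# Sanity check: a random orthogonal transform plus a shift leaves the output unchanged.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
out_moved = message_passing(h, coords @ q.T + 5.0, src, dst, w_msg, w_upd)
print(np.allclose(out, out_moved))  # True
```

Concatenation keeps the geometric layer unchanged apart from its input width, which is why it is a natural first choice for plugging sequence knowledge into an existing structure model; more elaborate fusion schemes are possible but are not assumed here.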
© 2023. Springer Nature Limited.
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Depressing time: Waiting, melancholia, and the psychoanalytic practice of care. In: Kirtsoglou E, Simpson B, editors. The Time of Anthropology: Studies of Contemporary Chronopolitics. Abingdon: Routledge; 2020. Chapter 5. PMID: 36137063. Review.
- On the objectivity, reliability, and validity of deep learning enabled bioimage analyses. Elife. 2020 Oct 19;9:e59780. doi: 10.7554/eLife.59780. PMID: 33074102.
- Qualitative evidence synthesis informing our understanding of people's perceptions and experiences of targeted digital communication. Cochrane Database Syst Rev. 2019 Oct 23;10(10):ED000141. doi: 10.1002/14651858.ED000141. PMID: 31643081.
- Australia in 2030: what is our path to health for all? Med J Aust. 2021 May;214 Suppl 8:S5-S40. doi: 10.5694/mja2.51020. PMID: 33934362.
- Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis. Cochrane Database Syst Rev. 2023 Sep 8;9(9):CD013606. doi: 10.1002/14651858.CD013606.pub2. PMID: 37681561. Review.
Cited by
- Pair-EGRET: enhancing the prediction of protein-protein interaction sites through graph attention networks and protein language models. Bioinformatics. 2024 Oct 1;40(10):btae588. doi: 10.1093/bioinformatics/btae588. PMID: 39360982.
- SSEmb: A joint embedding of protein sequence and structure enables robust variant effect predictions. Nat Commun. 2024 Nov 7;15(1):9646. doi: 10.1038/s41467-024-53982-z. PMID: 39511177.
- EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. Nucleic Acids Res. 2024 Mar 21;52(5):e27. doi: 10.1093/nar/gkae039. PMID: 38281252.
- Pairing interacting protein sequences using masked language modeling. Proc Natl Acad Sci U S A. 2024 Jul 2;121(27):e2311887121. doi: 10.1073/pnas.2311887121. Epub 2024 Jun 24. PMID: 38913900.
- Protein language model-embedded geometric graphs power inter-protein contact prediction. Elife. 2024 Apr 2;12:RP92184. doi: 10.7554/eLife.92184. PMID: 38564241.
