Commun Biol. 2023 Aug 25;6(1):876. doi: 10.1038/s42003-023-05133-1.

Integration of pre-trained protein language models into geometric deep learning networks

Fang Wu et al. Commun Biol.

Abstract

Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained by the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequence data have shown burgeoning capabilities with scale across a broad range of applications. Several preceding studies combine these protein modalities to promote the representation power of geometric neural networks, but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate them on a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that incorporating protein language models' knowledge enhances geometric networks' capacity by a significant margin and generalizes to complex tasks.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Illustration of our framework for strengthening geometric graph neural networks (GGNNs) with knowledge from protein language models.
The protein sequence is first fed into a pre-trained protein language model to extract per-residue representations, which are then used as node features in 3D protein graphs for GGNNs.
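The pipeline in Fig. 1 can be sketched in a few lines: embed the sequence residue-by-residue, build a 3D residue graph from coordinates, and attach the embeddings as node features. This is a minimal, hedged sketch: `embed_sequence` is a deterministic mock standing in for a real pre-trained PLM (e.g. an ESM model), the coordinates are random stand-ins for C-alpha positions, and the k-nearest-neighbour graph construction is one common choice, not necessarily the paper's exact recipe.

```python
import numpy as np

def embed_sequence(seq, dim=32):
    """Mock stand-in for a pre-trained protein language model: returns one
    embedding vector per residue. A real pipeline would call the PLM here;
    this deterministic mock just keeps the sketch runnable."""
    rng = np.random.default_rng(sum(map(ord, seq)))
    return rng.normal(size=(len(seq), dim))

def knn_residue_graph(coords, k=3):
    """Connect each residue to its k nearest neighbours in 3D space."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # no self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]
    return [(i, int(j)) for i in range(len(coords)) for j in nbrs[i]]

seq = "MKTAYIAK"                                            # toy 8-residue sequence
coords = np.random.default_rng(0).normal(size=(len(seq), 3))  # mock 3D coordinates
node_features = embed_sequence(seq)     # per-residue PLM features -> node features
edges = knn_residue_graph(coords, k=3)  # 3D residue graph consumed by the GGNN
print(node_features.shape, len(edges))
```

A GGNN would then message-pass over `edges` with `node_features` as the initial residue states, which is the integration point the framework describes.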
Fig. 2
Fig. 2. Ablation studies.
a Results on protein-protein interface (PPI) prediction with and without PLMs. b Performance of GGNNs on model quality assessment (MQA) with ESM-2 models at different scales.
Fig. 3
Fig. 3. Illustration of the sequence recovery problem.
a Protein residue graph construction. Here we draw graphs in 2D for better visualization but study 3D graphs for GGNNs. b Two sequence recovery tasks. The first requires GGNNs to predict the absolute position index of each residue in the protein sequence. The second aims to predict the minimum distance of each amino acid to the two ends of the protein sequence.
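The two probing targets described in Fig. 3b are simple functions of sequence position, which the following sketch makes concrete (the function name `recovery_targets` is illustrative, not from the paper):

```python
def recovery_targets(length):
    """Targets for the two sequence recovery tasks:
    (1) the absolute position index of each residue, and
    (2) the minimum distance of each residue to either end of the sequence."""
    positions = list(range(length))
    end_distances = [min(i, length - 1 - i) for i in range(length)]
    return positions, end_distances

pos, dist = recovery_targets(7)
print(pos)   # [0, 1, 2, 3, 4, 5, 6]
print(dist)  # [0, 1, 2, 3, 2, 1, 0]
```

A model that recovers these targets from structure alone demonstrably encodes sequential position information, which is what the probing tasks are designed to test.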
