SPIN-CGNN: Improved fixed backbone protein design with contact map-based graph construction and contact graph neural network

Xing Zhang; Hongmei Yin; Fei Ling; Jian Zhan; Yaoqi Zhou

doi:10.1371/journal.pcbi.1011330

PLoS Comput Biol. 2023 Dec 7;19(12):e1011330. doi: 10.1371/journal.pcbi.1011330. eCollection 2023 Dec.

Authors

Xing Zhang¹,², Hongmei Yin², Fei Ling¹, Jian Zhan², Yaoqi Zhou²

Affiliations

¹ School of Biology and Biological Engineering, South China University of Technology, Guangzhou, People's Republic of China.
² Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, People's Republic of China.

Abstract

Recent advances in deep learning have significantly improved the ability to infer protein sequences directly from protein structures for fixed-backbone design. The methods have evolved from the early use of multi-layer perceptrons to convolutional neural networks, transformers, and graph neural networks (GNNs). However, the conventional approach of constructing a K-nearest-neighbors (KNN) graph for a GNN has limited the utilization of edge information, which plays a critical role in network performance. Here we introduce SPIN-CGNN, which uses protein contact maps to define nearest neighbors. Together with auxiliary edge updates and selective kernels, SPIN-CGNN provided refolding ability by AlphaFold2 comparable to that of current state-of-the-art techniques, but a significant improvement over them in terms of sequence recovery, perplexity, deviation from the amino-acid compositions of native sequences, conservation of hydrophobic positions, and low complexity regions, in tests on unseen structures, "hallucinated" structures, and structures generated by diffusion models. The results suggest that low complexity regions in sequences designed by deep learning, particularly for generated structures, remain to be improved when compared with native sequences.
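
The contrast between a KNN graph and a contact map-based graph can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of one coordinate per residue, and the distance cutoff are all assumptions made for illustration, since the abstract does not state how contacts are defined. The sketch only shows the structural difference: a KNN graph gives every residue exactly k neighbors regardless of distance, whereas a contact graph keeps only pairs closer than a cutoff, so the number of neighbors varies per residue.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (N, N) Euclidean distance matrix for (N, 3) coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def knn_edges(coords, k):
    """Directed edges (i, j) where j is one of the k nearest residues to i."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)  # exclude self-edges
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return {(i, int(j)) for i in range(len(coords)) for j in neighbors[i]}


def contact_edges(coords, cutoff):
    """Directed edges (i, j) for residue pairs closer than `cutoff`.

    The cutoff value is a hypothetical parameter, not taken from the paper.
    """
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)  # exclude self-edges
    src, dst = np.nonzero(dist < cutoff)
    return set(zip(src.tolist(), dst.tolist()))


# Toy example: three residues close together and one far away.
toy_coords = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
)
knn = knn_edges(toy_coords, k=2)
contacts = contact_edges(toy_coords, cutoff=1.5)
```

In this toy case the KNN graph still connects the distant fourth residue to two neighbors, while the contact graph leaves it isolated and keeps only the edges between adjacent residues.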

Copyright: © 2023 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

MeSH terms

Amino Acid Sequence
Amino Acids*
Cluster Analysis
Diffusion
Neural Networks, Computer*

Substances

Amino Acids

Grants and funding

This work was supported by the National Key Research and Development Program of China [NO.2021YFF1200400] (XZ, HY, JZ, YZ) and the Major Program of Shenzhen Bay Laboratory [S201101001] (XZ, HY, JZ, YZ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.