ProALIGN: Directly Learning Alignments for Protein Structure Prediction via Exploiting Context-Specific Alignment Motifs

Lupeng Kong; Fusong Ju; Wei-Mou Zheng; Jianwei Zhu; Shiwei Sun; Jinbo Xu; Dongbo Bu

doi:10.1089/cmb.2021.0430

J Comput Biol. 2022 Feb;29(2):92-105. doi: 10.1089/cmb.2021.0430. Epub 2022 Jan 21.

Authors

Lupeng Kong¹ ² ³, Fusong Ju¹ ², Wei-Mou Zheng⁴, Jianwei Zhu⁵, Shiwei Sun¹ ², Jinbo Xu³, Dongbo Bu¹ ²

Affiliations

¹ Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
² University of Chinese Academy of Sciences, Beijing, China.
³ Toyota Technological Institute, Chicago, Illinois, USA.
⁴ Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China.
⁵ Microsoft Research Asia, Beijing, China.

Abstract

Template-based modeling (TBM), which includes homology modeling and protein threading, is one of the most reliable techniques for protein structure prediction. It predicts a protein's structure by building an alignment between the query sequence and templates with solved structures. Building the optimal sequence-template alignment remains very challenging, however, especially when only distantly related templates are available. Here we report a novel deep learning approach, ProALIGN, that predicts much more accurate sequence-template alignments. Just as protein sequences consist of sequence motifs, protein alignments are composed of frequently occurring alignment motifs with characteristic patterns. Alignment motifs are context-specific: their characteristic patterns are tightly related to the sequence contexts of the aligned regions. Inspired by this observation, we represent a protein alignment as a binary matrix (in which 1 denotes an aligned residue pair) and use a deep convolutional neural network to predict the optimal alignment from the query protein and its template. The trained neural network implicitly but effectively encodes an alignment scoring function, which reduces the inaccuracies of the handcrafted scoring functions widely used by current threading approaches. For a query protein and a template, we apply the neural network to infer the likelihoods of all possible residue pairs jointly, in a single pass, which allows the correlations among multiple residues to be taken into account. We then construct the alignment with maximum likelihood and finally build a structure model according to that alignment. Tested on three independent data sets with a total of 6688 protein alignment targets and on 80 CASP13 TBM targets, our method achieved much better alignments and 3D structure models than existing methods, including HHpred, CNFpred, CEthreader, and DeepThreader. These results clearly demonstrate the effectiveness of exploiting context-specific alignment motifs by deep learning for protein threading.
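
The two representational ideas in the abstract, encoding an alignment as a binary matrix and picking the maximum-likelihood alignment from per-pair likelihoods, can be illustrated with a small sketch. This is not the authors' implementation: the function names are hypothetical, the probability matrix is a toy stand-in for a network's output, gaps are unpenalized, and the cells of the matrix are assumed independent, so that maximizing the likelihood reduces to maximizing the summed log-odds of the aligned pairs along an order-preserving path.

```python
import math


def alignment_to_matrix(pairs, query_len, template_len):
    """Encode an alignment as a binary matrix: 1 marks an aligned residue pair."""
    matrix = [[0] * template_len for _ in range(query_len)]
    for i, j in pairs:
        matrix[i][j] = 1
    return matrix


def log_odds(p, eps=1e-6):
    """log(p / (1 - p)), clipped so that p = 0 or p = 1 stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) - math.log(1.0 - p)


def max_likelihood_alignment(prob):
    """Dynamic program over order-preserving alignments.

    prob[i][j] is the (assumed) predicted probability that query residue i
    aligns with template residue j. Under the independence assumption, the
    best alignment maximizes the sum of log-odds over its aligned pairs.
    """
    n, m = len(prob), len(prob[0])
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + log_odds(prob[i - 1][j - 1]), "pair"),
                (score[i - 1][j], "skip_query"),      # query residue i left unaligned
                (score[i][j - 1], "skip_template"),   # template residue j left unaligned
            ]
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])

    # Trace back from the bottom-right corner, collecting aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if move[i][j] == "pair":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "skip_query":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, score[n][m]


# Toy example: 3 query residues, 4 template residues.
toy_prob = [
    [0.90, 0.05, 0.05, 0.05],
    [0.05, 0.90, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.90],
]
toy_pairs, toy_score = max_likelihood_alignment(toy_prob)
print(toy_pairs)  # [(0, 0), (1, 1), (2, 3)]
print(alignment_to_matrix(toy_pairs, 3, 4))
```

In the toy example, template residue 2 is left unaligned, which corresponds to a gap in the query. A practical threading pipeline would typically add gap penalties and other constraints that this sketch omits.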

Keywords: deep learning and protein threading; protein alignment; protein structure prediction.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Amino Acid Motifs
Amino Acid Sequence
Computational Biology
Deep Learning*
Models, Molecular
Neural Networks, Computer
Protein Conformation
Proteins / chemistry*
Proteins / genetics
Sequence Alignment / statistics & numerical data*
Sequence Analysis, Protein / statistics & numerical data
Software

Substances

Proteins

Grants and funding

R01 GM089753/GM/NIGMS NIH HHS/United States