A deep learning model for plant lncRNA-protein interaction prediction with graph attention

Mol Genet Genomics. 2020 Sep;295(5):1091-1102. doi: 10.1007/s00438-020-01682-w. Epub 2020 May 15.

Abstract

Long non-coding RNAs (lncRNAs) play a broad spectrum of distinctive regulatory roles through interactions with proteins. However, only a few plant lncRNAs have been experimentally characterized. We propose GPLPI, a graph representation learning method, to predict plant lncRNA-protein interactions (LPIs) from sequence and structural information. GPLPI employs a generative model based on long short-term memory (LSTM) with graph attention. Evolutionary features are extracted using frequency chaos game representation (FCGR). Manifold regularization and L2-norm regularization are adopted to obtain discriminative feature representations and to mitigate overfitting. The model incorporates locality-preserving and reconstruction constraints, which improve its generalization ability. Finally, potential interactions between lncRNAs and proteins are predicted by integrating CatBoost with regularized logistic regression optimized by the L-BFGS algorithm. The method is trained and tested on Arabidopsis thaliana and Zea mays datasets, on which GPLPI achieves accuracies of 85.76% and 91.97%, respectively. The results show that our method consistently outperforms other state-of-the-art methods.
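The abstract names frequency chaos game representation (FCGR) as the source of evolutionary features but does not give its settings. The following is a minimal sketch of the standard FCGR construction for an RNA sequence, not GPLPI's own code. The function name `fcgr`, the corner layout A=(0,0), C=(0,1), G=(1,1), U=(1,0), the default k=3, and normalization by the number of valid k-mers are all assumptions made for illustration. How GPLPI encodes protein sequences is not stated here and is not covered.

```python
from typing import Dict, List, Tuple

# Assumed corner layout (CGR papers differ): each base is an (x_bit, y_bit)
# corner of the unit square.
CORNERS: Dict[str, Tuple[int, int]] = {
    "A": (0, 0), "C": (0, 1), "G": (1, 1), "U": (1, 0),
}


def fcgr(seq: str, k: int = 3) -> List[List[float]]:
    """Return a 2^k x 2^k frequency chaos game representation of an RNA sequence.

    In CGR each base moves the current point halfway toward its corner, so
    after k steps the point lies in a grid cell fixed by the last k bases:
    the most recent base selects the coarsest half, the earliest the finest.
    Counting k-mers per cell and dividing by the total yields frequencies.
    """
    size = 1 << k
    grid = [[0.0] * size for _ in range(size)]
    seq = seq.upper().replace("T", "U")  # also accept DNA-style input
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers with N or other ambiguity codes
        col = row = 0
        for j, base in enumerate(kmer):
            x_bit, y_bit = CORNERS[base]
            col |= x_bit << j  # base j adds 2^j to the cell index
            row |= y_bit << j
        grid[row][col] += 1
        total += 1
    if total:
        grid = [[count / total for count in r] for r in grid]
    return grid
```

For example, `fcgr("ACGU", k=2)` places the three 2-mers AC, CG and GU in three distinct cells of a 4 × 4 grid, each with frequency 1/3. Flattening the matrix gives a vector of 4^k values whatever the sequence length, which is why FCGR is convenient as fixed-size input to a downstream model.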

Keywords: Deep learning; Graph attention; Interaction; Prediction; Protein; lncRNA.

MeSH terms

  • Algorithms
  • Arabidopsis / metabolism
  • Computational Biology / methods*
  • Deep Learning
  • Logistic Models
  • Models, Molecular
  • Plant Proteins / chemistry
  • Plant Proteins / metabolism*
  • Plants / metabolism*
  • RNA, Long Noncoding / chemistry
  • RNA, Long Noncoding / metabolism*
  • RNA, Plant / chemistry
  • RNA, Plant / metabolism
  • Zea mays / metabolism

Substances

  • Plant Proteins
  • RNA, Long Noncoding
  • RNA, Plant