PRPI-SC: an ensemble deep learning model for predicting plant lncRNA-protein interactions

BMC Bioinformatics. 2021 Aug 24;22(Suppl 3):415. doi: 10.1186/s12859-021-04328-9.

Abstract

Background: Plant long non-coding RNAs (lncRNAs) play vital roles in many biological processes, mainly through interactions with RNA-binding proteins (RBPs). A fundamental step toward understanding the function of lncRNAs is to identify which types of proteins interact with them. However, determining the models or rules that govern these interactions remains a major challenge when computationally estimating the types of interacting RBPs.

Results: In this study, we propose PRPI-SC, an ensemble deep learning model that predicts plant lncRNA-protein interactions by combining a stacked denoising autoencoder and a convolutional neural network, using sequence and structural information. PRPI-SC predicts interactions between lncRNAs and proteins from the k-mer features of RNAs and proteins. Experiments showed good results on the Arabidopsis thaliana and Zea mays datasets (ATH948 and ZEA22133), with accuracies of 88.9% and 82.6%, respectively. PRPI-SC also performed well on some public RNA-protein interaction datasets.
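
As an illustration of the k-mer features mentioned above, the following minimal Python sketch turns a sequence into a normalized k-mer frequency vector. It is not the authors' implementation: the function name `kmer_frequencies`, the choice of k, the plain four-letter RNA alphabet, and the normalization are all assumptions made for the example, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, alphabet, k):
    """Return a normalized k-mer frequency vector (hypothetical helper).

    The vector has len(alphabet) ** k entries, ordered as produced by
    itertools.product over the given alphabet. Windows containing
    characters outside the alphabet are ignored.
    """
    # Enumerate every possible k-mer so the vector has a fixed length.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count each overlapping window of length k in the sequence.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Normalize by the number of windows that are valid k-mers.
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


# Example: 2-mer features of a short RNA sequence (k=2 is an arbitrary choice).
rna_vec = kmer_frequencies("ACGUACGU", "ACGU", 2)
print(len(rna_vec))
```

The same function could be applied to a protein sequence by passing an amino-acid alphabet. Fixed-length vectors of this kind are the sort of input an autoencoder or convolutional network can consume.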

Conclusions: PRPI-SC accurately predicts interactions between plant lncRNAs and proteins, which can guide studies of the function and expression of plant lncRNAs. PRPI-SC also generalizes well, with good predictive performance on non-plant data.

Keywords: Convolutional neural network; Stacked denoising autoencoder; k-Mer; lncRNA-protein.

MeSH terms

  • Computational Biology
  • Deep Learning*
  • Neural Networks, Computer
  • RNA, Long Noncoding* / genetics
  • RNA-Binding Proteins

Substances

  • RNA, Long Noncoding
  • RNA-Binding Proteins