DeepciRGO: functional prediction of circular RNAs through hierarchical deep neural networks using heterogeneous network features

BMC Bioinformatics. 2020 Nov 12;21(1):519. doi: 10.1186/s12859-020-03748-3.

Abstract

Background: Circular RNAs (circRNAs) are special noncoding RNA molecules with closed loop structures. Compared with traditional linear RNAs, circRNAs are more stable and less easily degraded. Many studies have shown that circRNAs are involved in the regulation of various diseases and cancers. Determining the functions of circRNAs in mammalian cells is of great significance for revealing their mechanisms of action in physiological and pathological processes and for the diagnosis and treatment of diseases. However, determining the functions of circRNAs on a large scale is a challenging task because of the high experimental costs.

Results: In this paper, we present a hierarchical deep learning model, DeepciRGO, which can effectively predict gene ontology functions of circRNAs. We build a heterogeneous network containing circRNA co-expressions, protein-protein interactions and protein-circRNA interactions. The topology features of proteins and circRNAs are calculated across the heterogeneous network using a novel representation learning approach, HIN2Vec. Then, a deep multi-label hierarchical classification model is trained with the topology features to predict the biological process function in the gene ontology for each circRNA. In particular, we manually curated a benchmark dataset, circRNA2GO-62, containing 185 GO annotations for 62 circRNAs. DeepciRGO achieves promising performance on the circRNA2GO-62 dataset with a maximum F-measure of 0.412, a recall score of 0.400, and an accuracy of 0.425, which are significantly better than those of other state-of-the-art RNA function prediction methods. In addition, we demonstrate the considerable potential of integrating multiple interaction and association networks.
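
As an illustration of the maximum F-measure reported above, the following is a minimal sketch of how such a score is commonly computed for multi-label GO term prediction. It assumes the CAFA-style definition (precision averaged over circRNAs with at least one prediction, recall averaged over all annotated circRNAs, maximized over score thresholds); the abstract does not state which variant the authors used. The function name `f_max`, the toy identifiers `circA` and `circB`, and the placeholder GO labels are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def f_max(
    y_true: Dict[str, Set[str]],
    y_score: Dict[str, Dict[str, float]],
    thresholds: Optional[Iterable[float]] = None,
) -> float:
    """Maximum F-measure over score thresholds (assumed CAFA-style definition).

    y_true maps each circRNA to its set of annotated GO terms.
    y_score maps each circRNA to {GO term: predicted score in [0, 1]}.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions = []  # one entry per circRNA with at least one prediction
        recalls = []     # one entry per circRNA with at least one true term
        for rna, true_terms in y_true.items():
            if not true_terms:
                continue  # recall is undefined without annotations
            predicted = {go for go, s in y_score.get(rna, {}).items() if s >= t}
            true_positives = len(predicted & true_terms)
            if predicted:
                precisions.append(true_positives / len(predicted))
            recalls.append(true_positives / len(true_terms))

        if not precisions or not recalls:
            continue  # nothing predicted at this threshold
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data for illustration only; not taken from circRNA2GO-62.
toy_true = {"circA": {"GO:1", "GO:2"}, "circB": {"GO:3"}}
toy_score = {
    "circA": {"GO:1": 0.9, "GO:2": 0.4, "GO:4": 0.6},
    "circB": {"GO:3": 0.8, "GO:1": 0.3},
}
toy_fmax = f_max(toy_true, toy_score)
```

Because the score is maximized over thresholds, it summarizes the best achievable trade-off between precision and recall rather than performance at a single fixed cutoff.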

Conclusions: DeepciRGO will be a useful tool for accurately annotating circRNAs. The experimental results show that integrating multi-source data can help to improve the predictive performance of DeepciRGO. Moreover, the model can also incorporate RNA structure and sequence information to further optimize predictive performance.

Keywords: Gene ontology; HIN2Vec; Multi-label hierarchical classification; Representation learning.

MeSH terms

  • Cell Line, Tumor
  • Cell Movement
  • Cell Proliferation
  • DNA Helicases / genetics
  • Gene Ontology
  • Genetic Loci
  • Humans
  • Neural Networks, Computer*
  • Protein Interaction Maps / genetics
  • RNA, Circular / metabolism*
  • Ubiquitin-Protein Ligases / genetics

Substances

  • RNA, Circular
  • SHPRH protein, human
  • Ubiquitin-Protein Ligases
  • DNA Helicases