Improving circRNA-disease association prediction by sequence and ontology representations with convolutional and recurrent neural networks

Bioinformatics. 2021 Apr 5;36(24):5656-5664. doi: 10.1093/bioinformatics/btaa1077.

Abstract

Motivation: Emerging studies indicate that circular RNAs (circRNAs) are widely involved in the progression of human diseases. Owing to their stable structure, circRNAs are promising diagnostic and prognostic biomarkers for diseases. However, the experimental verification of circRNA-disease associations is expensive and limited to a small scale. Effective computational methods for predicting potential circRNA-disease associations are therefore urgently needed. Although several models have been proposed, their over-reliance on known associations and their lack of features describing biological function mean that precise prediction remains challenging.

Results: In this study, we propose a method for predicting CircRNA-disease associations based on sequence and ontology representations, named CDASOR, with convolutional and recurrent neural networks. For circRNA sequences, we encode them as continuous k-mers, obtain low-dimensional vectors of the k-mers, extract their local feature vectors with a 1D CNN and learn their long-term dependencies with a bi-directional long short-term memory network. For diseases, we serialize the disease ontology into sentences that preserve the ontology hierarchy, obtain low-dimensional vectors for disease ontology terms and capture the dependencies among terms. Furthermore, we learn association patterns of circRNAs and diseases from known circRNA-disease associations with neural networks. These steps yield high-level representations of circRNAs and diseases that are informative for improving prediction. The experimental results show that CDASOR provides accurate predictions. By incorporating features of biological function, CDASOR also achieves strong predictions in the de novo test. In addition, 6 of the top-10 predicted results are verified by the published literature in the case studies.
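
As an illustration of the first sequence step only, encoding a circRNA sequence as continuous (overlapping) k-mers might look like the minimal Python sketch below. It is not taken from the CDASOR code base: the function names `kmers`, `build_vocab` and `encode`, the choice k = 3, the stride of 1 and the reserved padding index 0 are all assumptions made for the example. The resulting integer indices are the kind of input an embedding layer could map to low-dimensional vectors before a 1D CNN and a bi-directional LSTM.

```python
from typing import Dict, Iterable, List


def kmers(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a sequence into overlapping k-mers (k and stride are assumed values)."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer index to each distinct k-mer; index 0 is kept for padding."""
    vocab: Dict[str, int] = {"<pad>": 0}
    for tokens in token_lists:
        for token in tokens:
            # setdefault only assigns a new index the first time a k-mer is seen.
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map k-mers to the indices an embedding layer would consume."""
    return [vocab[token] for token in tokens]


# Toy sequence, purely illustrative (not a real circRNA).
tokens = kmers("AUGGCU", k=3)
vocab = build_vocab([tokens])
indices = encode(tokens, vocab)
```

With a stride of 1, consecutive k-mers overlap by k - 1 nucleotides, so a sequence of length n yields n - k + 1 tokens.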

Availability and implementation: The code and data of CDASOR are freely available at https://github.com/BioinformaticsCSU/CDASOR.