Annotating Diseases Using Human Phenotype Ontology Improves Prediction of Disease-Associated Long Non-coding RNAs

J Mol Biol. 2018 Jul 20;430(15):2219-2230. doi: 10.1016/j.jmb.2018.05.006. Epub 2018 May 24.

Abstract

Recently, many long non-coding RNAs (lncRNAs) have been identified and their biological function has been characterized; however, our understanding of their underlying molecular mechanisms related to disease is still limited. To overcome the limitation in experimentally identifying disease-lncRNA associations, computational methods have been proposed as a powerful tool to predict such associations. These methods are usually based on the similarities between diseases or lncRNAs since it was reported that similar diseases are associated with functionally similar lncRNAs. Therefore, prediction performance is highly dependent on how well the similarities can be captured. Previous studies have calculated the similarity between two diseases by mapping exactly each disease to a single Disease Ontology (DO) term, and then use a semantic similarity measure to calculate the similarity between them. However, the problem of this approach is that a disease can be described by more than one DO terms. Until now, there is no annotation database of DO terms for diseases except for genes. In contrast, Human Phenotype Ontology (HPO) is designed to fully annotate human disease phenotypes. Therefore, in this study, we constructed disease similarity networks/matrices using HPO instead of DO. Then, we used these networks/matrices as inputs of two representative machine learning-based and network-based ranking algorithms, that is, regularized least square and heterogeneous graph-based inference, respectively. The results showed that the prediction performance of the two algorithms on HPO-based is better than that on DO-based networks/matrices. In addition, our method can predict 11 novel cancer-associated lncRNAs, which are supported by literature evidence.

Keywords: Disease Ontology; Human Phenotype Ontology; disease-associated lncRNA; ranking algorithms; semantic similarity.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Databases, Genetic
  • Disease / classification
  • Disease / genetics*
  • Gene Ontology*
  • Gene Regulatory Networks
  • Humans
  • Molecular Sequence Annotation*
  • Neoplasms / genetics
  • Phenotype
  • RNA, Long Noncoding / genetics*
  • Reproducibility of Results

Substances

  • RNA, Long Noncoding