EGBMMDA: Extreme Gradient Boosting Machine for MiRNA-Disease Association prediction

Cell Death Dis. 2018 Jan 5;9(1):3. doi: 10.1038/s41419-017-0003-x.

Abstract

An increasing number of studies have identified associations between microRNAs (miRNAs) and human diseases, and discovering new ones is an ongoing process in medical laboratories. To improve experimental productivity, researchers computationally infer potential associations from biological data and select the most promising candidates for experimental verification. Predicting potential miRNA-disease associations has therefore become a research area of growing importance. This paper presents the Extreme Gradient Boosting Machine for MiRNA-Disease Association (EGBMMDA) prediction model, which integrates miRNA functional similarity, disease semantic similarity, and known miRNA-disease associations. Statistical measures, graph theoretical measures, and matrix factorization results were calculated for each miRNA-disease pair and used to form an informative feature vector. The vectors for known associated pairs obtained from the HMDD v2.0 database were used to train a regression tree under the gradient boosting framework. EGBMMDA was the first decision tree learning-based model used for predicting miRNA-disease associations. AUCs of 0.9123 and 0.8221 in global and local leave-one-out cross-validation, respectively, demonstrated the model's reliable performance. Moreover, an AUC of 0.9048 ± 0.0012 in fivefold cross-validation confirmed its stability. We carried out three different types of case studies predicting potential miRNAs related to Colon Neoplasms, Lymphoma, Prostate Neoplasms, Breast Neoplasms, and Esophageal Neoplasms. The results indicated that 98%, 90%, 98%, 100%, and 98% of the top 50 predictions for the five diseases, respectively, were confirmed by experiments. Therefore, EGBMMDA appears to be a useful computational resource for miRNA-disease association prediction.
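
To make the general idea concrete, the following is a minimal sketch of training boosted regression trees on per-pair feature vectors and scoring the result with AUC. It is not the authors' implementation. It uses plain squared-loss gradient boosting with depth-1 trees (stumps) rather than the regularized extreme gradient boosting objective. The two toy features, the 0/1 labels (including the use of negative examples), the hyperparameters, and all function names are assumptions made purely for illustration.

```python
# Illustrative sketch only: squared-loss gradient boosting with regression
# stumps, NOT the regularized XGBoost objective. All names and data are
# hypothetical.

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    if best is None:
        # No split is possible (all rows identical): fall back to a constant.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_boosting(X, y, n_rounds=20, learning_rate=0.3):
    """Fit an additive model: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy miRNA-disease pairs: [hypothetical similarity score, hypothetical degree count].
X = [[0.9, 3], [0.8, 5], [0.7, 4], [0.2, 1], [0.3, 2], [0.1, 0]]
y = [1, 1, 1, 0, 0, 0]  # 1 = known association, 0 = assumed negative sample

model = fit_boosting(X, y)
scores = [predict(model, row) for row in X]
print("training AUC:", auc(scores, y))
```

In a leave-one-out setting, the same `auc` function would be applied to scores produced for held-out pairs rather than to the training pairs used here.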

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Area Under Curve
  • Colonic Neoplasms / genetics
  • Colonic Neoplasms / pathology*
  • Computational Biology / methods*
  • Databases, Factual
  • Esophageal Neoplasms / genetics
  • Esophageal Neoplasms / pathology*
  • Genetic Predisposition to Disease
  • Humans
  • Male
  • MicroRNAs / metabolism*
  • Prostatic Neoplasms / genetics
  • Prostatic Neoplasms / pathology*
  • ROC Curve

Substances

  • MicroRNAs