Time-resolved evaluation of compound repositioning predictions on a text-mined knowledge network

BMC Bioinformatics. 2019 Dec 11;20(1):653. doi: 10.1186/s12859-019-3297-0.

Abstract

Background: Computational compound repositioning has the potential to identify new uses for existing drugs, and new algorithms and data source aggregation strategies provide ever-improving results as measured by in silico metrics. However, even with these advances, the number of compounds successfully repositioned via computational screening remains low. New strategies for algorithm evaluation that more accurately reflect the repositioning potential of a compound could provide a better target for future optimizations.

Results: Using a text-mined database, we applied a previously described network-based computational repositioning algorithm, which yielded strong results under cross-validation, averaging 0.95 AUROC on test-set indications. However, to better approximate a real-world scenario, we built a time-resolved evaluation framework. At various time points, we built networks corresponding to prior knowledge for use as a training set, and then predicted on a test set composed of indications that were subsequently described. Under this framework, performance dropped markedly, peaking with the 1985 network at an AUROC of 0.797. Examining performance reductions due to removal of specific types of relationships highlighted the importance of drug-drug and disease-disease similarity metrics. Using data from future time points, we demonstrate that further acquisition of these kinds of data may help improve computational results.
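The time-resolved split described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Edge` record, the `time_split` and `auroc` helpers, the edge-type names, and the toy data are all hypothetical, and the sketch assumes each relationship carries the year it was first reported. Edges known at the cutoff year form the training network, indications first described afterwards form the test set, and AUROC is computed as the fraction of positive-negative pairs ranked correctly.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical record for one relationship in the knowledge network."""
    source: str
    target: str
    kind: str   # relationship type, e.g. "TREATS" (assumed label)
    year: int   # year the relationship was first reported (assumed field)


def time_split(edges, cutoff_year, indication_kind="TREATS"):
    """Split edges into a training network and a set of future indications.

    Training: every edge known on or before the cutoff year.
    Test: only indication edges first described after the cutoff year.
    """
    train = [e for e in edges if e.year <= cutoff_year]
    test = [e for e in edges
            if e.year > cutoff_year and e.kind == indication_kind]
    return train, test


def auroc(scores, labels):
    """Rank-based AUROC: chance that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented purely for illustration.
edges = [
    Edge("drug_a", "disease_x", "TREATS", 1975),
    Edge("drug_a", "gene_1", "INHIBITS", 1980),
    Edge("gene_1", "disease_y", "ASSOCIATED_WITH", 1983),
    Edge("drug_a", "disease_y", "TREATS", 1992),
    Edge("drug_b", "disease_x", "TREATS", 1999),
]

train, test = time_split(edges, cutoff_year=1985)
```

In contrast to cross-validation, where indications are withheld at random, the test indications here are by construction absent from the training network at the cutoff year. Repeating the split at several cutoff years would give one AUROC per network.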

Conclusions: Evaluating a repositioning algorithm using indications unknown to the input network better tunes its ability to find emerging drug indications, rather than those that have been randomly withheld. Focusing efforts on improving algorithmic performance in a time-resolved paradigm may further improve computational repositioning predictions.

Keywords: Compound repositioning; Drug central; Heterogeneous network; Machine learning; Semantic Medline database; Semantic network; Unified medical language system.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Data Mining*
  • Disease
  • Drug Repositioning*
  • Humans
  • Knowledge Bases*
  • Machine Learning
  • Reproducibility of Results
  • Time Factors