Non-Negative Matrix Factorization for Drug Repositioning: Experiments with the repoDB Dataset

AMIA Annu Symp Proc. 2020 Mar 4;2019:238-247. eCollection 2019.


Computational methods for drug repositioning are gaining mainstream attention with the availability of experimental gene expression datasets and manually curated relational information in knowledge bases. When building repurposing tools, a fundamental limitation is the lack of gold standard datasets that contain realistic true negative examples of drug-disease pairs that were shown to be non-indications. To address this gap, the repoDB dataset was created in 2017 as a first-of-its-kind realistic resource to benchmark drug repositioning methods: its positive examples are drawn from FDA-approved indications and its negative examples are derived from failed clinical trials. In this paper, we present the first effort for repositioning that directly tests against repoDB instances. By using hand-curated drug-disease indications from the UMLS Metathesaurus and automatically extracted relations from the SemMedDB database, we employ non-negative matrix factorization (NMF) methods to recover repoDB positive indications. Among recoverable approved indications, our NMF methods achieve 96% recall with 80% precision, providing further evidence that hand-curated knowledge and matrix completion methods can be exploited for hypothesis generation.
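
The abstract does not say which NMF variant or matrix construction was used, so the following is only a generic sketch of how matrix completion by NMF can surface candidate indications. It assumes a binary drug-by-disease matrix and the standard Lee-Seung multiplicative updates for the Frobenius objective; the toy matrix, the function name `nmf`, the rank, and the iteration count are all invented for illustration and are not taken from repoDB, UMLS, or SemMedDB.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (drugs x diseases) as W @ H
    using Lee-Seung multiplicative updates (Frobenius objective)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # drug factors, kept non-negative
    H = rng.random((rank, m)) + eps   # disease factors, kept non-negative
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 1 = known indication, 0 = unknown.
# Row 2 resembles rows 0 and 1 but lacks an entry in column 1.
V = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

W, H = nmf(V, rank=2)
scores = W @ H  # reconstructed matrix; larger values suggest likelier pairs

# Rank the unknown (zero) entries by their reconstructed score.
rows, cols = np.where(V == 0)
candidates = sorted(
    ((float(scores[i, j]), int(i), int(j)) for i, j in zip(rows, cols)),
    reverse=True,
)
for score, drug, disease in candidates[:3]:
    print(f"drug {drug} / disease {disease}: score {score:.3f}")
```

In a setup like this, high-scoring unknown pairs would be treated as repositioning hypotheses and compared against a held-out benchmark to compute precision and recall.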

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Databases, Factual*
  • Datasets as Topic*
  • Drug Repositioning*
  • Knowledge Bases
  • Mathematical Concepts
  • Unified Medical Language System