Discovery and explanation of drug-drug interactions via text mining

Pac Symp Biocomput. 2012:410-21.


Drug-drug interactions (DDIs) can occur when two drugs interact with the same gene product. Most available information about gene-drug relationships is contained within the scientific literature, but it is dispersed over a large number of publications, with thousands of new publications added each month. In this setting, automated text mining is an attractive solution for identifying gene-drug relationships and aggregating them to predict novel DDIs. In previous work, we have shown that gene-drug interactions can be extracted from Medline abstracts with high fidelity: we extract not only the genes and drugs, but also the type of relationship expressed in individual sentences (e.g., metabolize, inhibit, activate and many others). We normalize these relationships and map them to a standardized ontology. In this work, we hypothesize that we can combine these normalized gene-drug relationships, drawn from a very broad and diverse literature, to infer DDIs. Using a training set of established DDIs, we have trained a random forest classifier to score potential DDIs based on the features of the normalized assertions extracted from the literature that relate two drugs to a gene product. The classifier recognizes the combinations of relationships, drugs and genes that are most associated with the gold standard DDIs, correctly identifying 79.8% of assertions relating interacting drug pairs and 78.9% of assertions relating noninteracting drug pairs. Most significantly, because our text processing method captures the semantics of individual gene-drug relationships, we can construct mechanistic pharmacological explanations for the newly proposed DDIs. We show how our classifier can be used to explain known DDIs and to uncover new DDIs that have not yet been reported.
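
The idea of combining gene-drug assertions that share a gene product can be illustrated with a minimal Python sketch. It covers only the candidate-generation step, not the random forest scoring or the text extraction. The triples, the relationship labels, and the function name `candidate_pairs` are hypothetical placeholders chosen for illustration; they are not data or code from the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical normalized assertions: (drug, relationship, gene).
# Placeholder names only; these are not assertions extracted in the paper.
assertions = [
    ("drugA", "metabolized_by", "geneX"),
    ("drugB", "inhibits", "geneX"),
    ("drugC", "activates", "geneY"),
    ("drugA", "metabolized_by", "geneY"),
]


def candidate_pairs(assertions):
    """Pair up assertions that relate two different drugs to the same gene."""
    # Group (drug, relationship) assertions by the gene they mention.
    by_gene = defaultdict(list)
    for drug, relation, gene in assertions:
        by_gene[gene].append((drug, relation))

    candidates = []
    for gene, items in sorted(by_gene.items()):
        # Every pair of assertions on one gene is a candidate DDI record.
        for (d1, r1), (d2, r2) in combinations(sorted(items), 2):
            if d1 != d2:  # skip two assertions about the same drug
                candidates.append(
                    {"drugs": (d1, d2), "gene": gene, "relations": (r1, r2)}
                )
    return candidates


for record in candidate_pairs(assertions):
    print(record)
```

Each resulting record (two drugs, the shared gene, and the two relationship types) is the kind of feature combination that a classifier trained on known DDIs could then score. The shared gene and relationship types also give a starting point for a mechanistic explanation of a proposed interaction.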

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Aryl Hydrocarbon Hydroxylases / genetics
  • Aryl Hydrocarbon Hydroxylases / metabolism
  • Computational Biology
  • Cytochrome P-450 CYP2C9
  • Cytochrome P-450 CYP3A / genetics
  • Cytochrome P-450 CYP3A / metabolism
  • Data Mining / methods*
  • Drug Interactions*
  • Humans
  • Knowledge Bases
  • Pharmacogenetics / statistics & numerical data
  • Verapamil / metabolism
  • Warfarin / metabolism

Substances

  • Warfarin
  • Verapamil
  • CYP2C9 protein, human
  • Cytochrome P-450 CYP2C9
  • Aryl Hydrocarbon Hydroxylases
  • Cytochrome P-450 CYP3A
  • CYP3A4 protein, human