Prediction and evaluation of combination pharmacotherapy using natural language processing, machine learning and patient electronic health records

J Biomed Inform. 2022 Sep:133:104164. doi: 10.1016/j.jbi.2022.104164. Epub 2022 Aug 17.

Abstract

Combination pharmacotherapy targets key disease pathways in a synergistic or additive manner and has high potential in treating complex diseases. Computational methods have been developed to identify combination pharmacotherapy by analyzing large amounts of biomedical data. Existing computational approaches are often underpowered due to their reliance on our limited understanding of disease mechanisms. On the other hand, observable phenotypic inter-relationships among thousands of diseases often reflect their shared genetic and molecular underpinnings, and therefore offer unique opportunities to design computational models that discover novel combination therapies by automatically transferring knowledge among phenotypically related diseases. We developed a novel phenome-driven drug discovery system, named TuSDC, which leverages knowledge of existing drug combinations, disease comorbidities, and disease treatments of thousands of disease and drug entities extracted from over 31.5 million biomedical research articles using natural language processing techniques. TuSDC predicts combination pharmacotherapy by extracting representations of diseases and drugs using tensor factorization approaches. In external validation, TuSDC achieved an average precision of 0.77 for top-ranked candidates, outperforming a state-of-the-art mechanism-based method for discovering drug combinations in treating hypertension. We evaluated top-ranked anti-hypertension drug combinations using electronic health records of 84.7 million unique patients and showed that a novel drug combination, hydrochlorothiazide-digoxin, was associated with significantly lower hazards of subsequent hypertension as compared to the monotherapies hydrochlorothiazide alone (HR: 0.769, 95% CI [0.732, 0.807]) and digoxin alone (HR: 0.857, 95% CI [0.785, 0.936]). Data-driven informatics analyses reveal that the renin-angiotensin system is involved in the synergistic interactions of hydrochlorothiazide and digoxin in regulating hypertension. The prediction model's code (PyTorch version 1.5) is available at http://nlp.case.edu/public/data/TuSDC/.
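
The abstract does not specify which tensor factorization TuSDC uses, so the following is only a minimal illustrative sketch of the general idea, not the authors' model. It assumes a CP-style (rank-R) factorization of a drug x drug x disease tensor, with one shared embedding table for both drug slots so that the score of a pair does not depend on drug order. The function names (cp_score, fit_cp), the toy triples, and all hyperparameters are hypothetical, and plain Python is used here instead of the PyTorch implementation referenced above.

```python
import random


def cp_score(drug_emb, disease_emb, i, j, k):
    """CP-style score for (drug i, drug j, disease k).

    Computes sum_r A[i][r] * A[j][r] * B[k][r], where A holds drug
    embeddings and B holds disease embeddings (assumed model form).
    """
    return sum(a * b * c
               for a, b, c in zip(drug_emb[i], drug_emb[j], disease_emb[k]))


def fit_cp(triples, n_drugs, n_diseases, rank=2, lr=0.1, epochs=300, seed=0):
    """Fit embeddings by per-example gradient descent on 0.5 * error**2.

    triples: list of (drug_i, drug_j, disease_k, label) with drug_i != drug_j.
    Returns (drug_emb, disease_emb, losses), where losses[e] is the mean
    squared error accumulated while sweeping the data in epoch e.
    """
    rng = random.Random(seed)
    drug_emb = [[rng.uniform(0.1, 0.5) for _ in range(rank)]
                for _ in range(n_drugs)]
    disease_emb = [[rng.uniform(0.1, 0.5) for _ in range(rank)]
                   for _ in range(n_diseases)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for i, j, k, y in triples:
            err = cp_score(drug_emb, disease_emb, i, j, k) - y
            total += err * err
            for r in range(rank):
                # Read all three factors before updating any of them.
                a, b, c = drug_emb[i][r], drug_emb[j][r], disease_emb[k][r]
                drug_emb[i][r] -= lr * err * b * c
                drug_emb[j][r] -= lr * err * a * c
                disease_emb[k][r] -= lr * err * a * b
        losses.append(total / len(triples))
    return drug_emb, disease_emb, losses


# Hypothetical toy data: label 1.0 marks a known combination for a disease,
# label 0.0 marks a pair treated as a negative example.
TOY_TRIPLES = [(0, 1, 0, 1.0), (0, 2, 0, 0.0), (1, 2, 1, 1.0), (0, 1, 1, 0.0)]

drug_emb, disease_emb, losses = fit_cp(TOY_TRIPLES, n_drugs=3, n_diseases=2)
```

After fitting, unseen (drug, drug, disease) triples could be ranked by cp_score to propose candidate combinations. In this sketch, the learned embeddings play the role of the disease and drug representations mentioned in the abstract.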

Keywords: Combination pharmacotherapy; Hypertension; Retrospective cohort study; Tensor factorization.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Digoxin
  • Drug Combinations
  • Electronic Health Records
  • Humans
  • Hydrochlorothiazide
  • Hypertension* / drug therapy
  • Machine Learning
  • Natural Language Processing*
  • Phenotype

Substances

  • Drug Combinations
  • Hydrochlorothiazide
  • Digoxin