Understanding the downstream consequences of pharmacologically targeted proteins is essential to drug design. Current approaches investigate molecular effects under tissue-naïve assumptions, yet many target proteins have tissue-specific expression. A systematic study connecting drugs to target pathways in in vivo human tissues is therefore needed. We introduced a data-driven method that integrates drug-target relationships with gene expression, protein-protein interaction, and pathway annotation data. We applied our method to four independent genome-wide expression datasets and built 467,396 connections between 1,034 drugs and 954 pathways in 259 human tissues or cell lines. We validated our results using data from L1000 and the Pharmacogenomics Knowledgebase (PharmGKB) and observed high precision and recall. We predicted previously unknown anticoagulant effects of 22 compounds, tested them experimentally, and used clinical data to validate these effects retrospectively. Our systematic study provides a better understanding of the cellular response to drugs and can be applied to many research topics in systems pharmacology.
© 2018 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
Tissue‐specificity of distinct target classes in four datasets.
(a) Pie chart showing the proportion of 8 protein classes among all 5,016 drug‐target protein pairs from DrugBank. (b–e) Boxplots showing the tissue‐specificity of distinct target classes in four datasets. The tissue‐specificity of a target protein is defined as the proportion of tissues in which the target is highly expressed relative to the median of all genes. To account for variation in the absolute expression of different genes, the expression of each gene is normalized by its baseline level. Each box on the Y‐axis represents one target class. The X‐axis shows the tissue‐specificity of proteins belonging to that target class. “All_drug_targets” represents the combination of all target classes. “All_genes” represents all genes in the human genome.
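The tissue‐specificity score can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the caption does not define the baseline, so the sketch assumes it is the gene's median expression across tissues, and it treats a gene as highly expressed in a tissue when its normalized value exceeds the median normalized value of all genes in that tissue. The function name `tissue_specificity` and the toy data are hypothetical.

```python
from statistics import median


def tissue_specificity(expr: dict[str, list[float]]) -> dict[str, float]:
    """Score each gene by the proportion of tissues in which it is highly expressed.

    expr maps gene -> expression values, one per tissue, in the same tissue order for every gene.
    """
    # Assumption: the baseline is the gene's median expression across tissues.
    normalized = {}
    for gene, values in expr.items():
        baseline = median(values)
        normalized[gene] = [v / baseline if baseline > 0 else 0.0 for v in values]

    n_tissues = len(next(iter(expr.values())))
    # Median normalized expression of all genes within each tissue.
    tissue_medians = [
        median(vals[t] for vals in normalized.values()) for t in range(n_tissues)
    ]
    # Assumption: "highly expressed" means above the all-gene median in that tissue.
    return {
        gene: sum(vals[t] > tissue_medians[t] for t in range(n_tissues)) / n_tissues
        for gene, vals in normalized.items()
    }


# Hypothetical toy data: three genes measured in three tissues.
scores = tissue_specificity({"A": [10, 1, 1], "B": [1, 1, 1], "C": [2, 2, 2]})
print(scores)
```

Normalizing by a per-gene baseline before comparing against the all-gene median is what keeps a uniformly high-abundance gene (like "C" above) from being scored as highly expressed everywhere.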
Workflow of Drugs to target pAthways by the Tissue Expression (DATE). A drug is first mapped to its target proteins using DrugBank. Then, tissue expression data are used to find the target proteins that are highly expressed in each tissue. Next, one of two processes is followed, depending on whether the target protein is a G‐protein coupled receptor (GPCR) or not, because GPCRs do not participate in cellular activities directly but pass signals to transducers. GPCRs are connected to downstream pathways using our previously developed method for predicting GPCR downstream signaling pathways using tissue expression (GOTE). In GOTE, the target GPCR is first mapped to the highly expressed transducers (G‐proteins or β‐arrestins) in the tissue. Then, for each transducer, a list of tissue‐specific binding proteins is obtained by combining BioGRID protein‐protein interaction data with the tissue expression of the binding proteins. Pathway enrichment analysis is then performed on the tissue‐specific binding proteins of each transducer using Fisher's exact test. For each pathway, the Z‐scores of all G‐proteins (or β‐arrestins) are combined into a single Z‐score using Stouffer's method. Finally, pathways with significant Z‐scores are connected to the drug in the tissue as G‐protein dependent pathways (GDPs; those associated with G‐proteins) or G‐protein independent pathways (GIPs; those associated with β‐arrestins). Non‐GPCRs are first connected to their annotated pathways. Then an expression Z‐score is calculated for each annotated pathway to determine whether the pathway is highly expressed in the tissue, and pathways with significant Z‐scores are connected to the drug in the tissue as non‐GPCR target pathways (NGPs).
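The enrichment‐and‐combination step of GOTE can be illustrated with a standard‐library Python sketch. It assumes a one‐sided Fisher's exact test (the upper tail of the hypergeometric distribution), converts each transducer's P value to a Z‐score with the inverse normal CDF, and combines the Z‐scores with the unweighted Stouffer formula Z = sum(Z_i) / sqrt(k). These choices, the function names, and the gene sets are illustrative assumptions; the caption does not specify the test direction, weighting, or significance threshold.

```python
from math import comb, sqrt
from statistics import NormalDist


def fisher_enrichment_p(binders: set[str], pathway: set[str], universe: set[str]) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail) for pathway over-representation."""
    N = len(universe)                      # all genes considered
    K = len(pathway & universe)            # pathway members in the universe
    n = len(binders & universe)            # tissue-specific binding proteins
    k = len(binders & pathway & universe)  # binding proteins that are pathway members
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def p_to_z(p: float) -> float:
    """Convert a one-sided P value to a Z-score; clamping avoids infinite values."""
    p = min(max(p, 1e-300), 1 - 1e-12)
    return -NormalDist().inv_cdf(p)


def stouffer(z_scores: list[float]) -> float:
    """Unweighted Stouffer combination of per-transducer Z-scores."""
    return sum(z_scores) / sqrt(len(z_scores))


# Hypothetical example: one pathway tested against the binding partners of two G-proteins.
universe = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
binders_by_transducer = {
    "G-protein 1": {"g0", "g1", "g2", "g10"},
    "G-protein 2": {"g3", "g4", "g11", "g12"},
}
z_scores = [
    p_to_z(fisher_enrichment_p(b, pathway, universe)) for b in binders_by_transducer.values()
]
combined_z = stouffer(z_scores)
print(round(combined_z, 3))
```

Combining per‐transducer Z‐scores rather than pooling all binding proteins into one list lets a pathway supported by several G‐proteins accumulate evidence, while a single weak transducer is diluted by the sqrt(k) denominator.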
Visualization of drug‐pathway‐tissue connections built from expression datasets derived from normal human tissues.
(a–d) Consistency of results in the three datasets derived from normal human tissues: U133A (microarray), HPM_PRT (mass spectrometry), and GTEx (sequencing). The Venn diagrams show the number of drug‐pathway (a) or drug‐pathway‐tissue (b) connections shared among the three datasets. In (c) and (d), pairwise comparisons were performed among the three datasets. The bar plot shows the average Jaccard similarity (X‐axis) of pathways (c) or pathway‐tissues (d) connected to each drug. “Random” represents the null distribution generated by randomly assigning pathways (c) or pathway‐tissues (d) to each drug. The error bars indicate the 95% confidence interval of the average, calculated by bootstrap. (e) Heatmap showing the tissue‐specificity of distinct Anatomical Therapeutic Chemical (ATC) classification system drug classes (in the U133A dataset). Each column represents an ATC drug class, and each row represents a tissue. Each cell is colored purple or white depending on whether drugs in the ATC class are connected to the tissue. The intensity of purple is proportional to the tissue‐specificity score. (f,g) Heatmaps showing the enrichment of pathway categories by drug class, defined either by ATC code (f) or by the class of target proteins (g). Each column represents a drug class, and each row represents a Reactome pathway category. Each cell is colored from white to purple in proportion to the percentage of drug‐pathway connections (in the HPM_PRT dataset) that belong to the corresponding drug class and pathway category. An asterisk “*” in a cell indicates that the pathway category is significantly enriched in the drug class by Fisher's exact test (false discovery rate <0.01).
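The per‐drug Jaccard comparison in panels (c) and (d) and its bootstrap confidence interval can be sketched as follows. The sketch assumes that each dataset maps a drug to its set of connected pathways (or pathway‐tissue pairs), that only drugs present in both datasets are compared, and that the random null keeps each drug's number of pathways while drawing them uniformly from all pathways. The caption does not state how the null or the bootstrap was parameterized, and the function names and toy data are hypothetical. The 95% interval uses the simple percentile bootstrap.

```python
import random
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def per_drug_jaccard(ds1: dict[str, set], ds2: dict[str, set]) -> list[float]:
    """Jaccard similarity of each drug's connections in two datasets (shared drugs only)."""
    return [jaccard(ds1[d], ds2[d]) for d in sorted(ds1.keys() & ds2.keys())]


def bootstrap_ci(values: list[float], n_boot: int = 1000, alpha: float = 0.05, seed: int = 0):
    """Mean with a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(values, k=len(values))) for _ in range(n_boot))
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return mean(values), lower, upper


def random_null(ds: dict[str, set], all_pathways: list, seed: int = 0) -> dict[str, set]:
    """Assumed null: keep each drug's pathway count, but draw pathways uniformly at random."""
    rng = random.Random(seed)
    return {d: set(rng.sample(all_pathways, len(p))) for d, p in ds.items()}


# Hypothetical toy datasets.
u133a = {"drugA": {"p1", "p2", "p3"}, "drugB": {"p4"}}
gtex = {"drugA": {"p2", "p3"}, "drugB": {"p4", "p5"}}
observed = bootstrap_ci(per_drug_jaccard(u133a, gtex))
pathways = [f"p{i}" for i in range(1, 51)]
null = bootstrap_ci(
    per_drug_jaccard(random_null(u133a, pathways, 1), random_null(gtex, pathways, 2))
)
print(observed, null)
```

Preserving each drug's pathway count in the null matters: drugs with many connections would otherwise show inflated chance overlap relative to the observed data.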
Correlation between drug identities and target pathways.
(a–h) Line graphs showing that the similarity of target proteins (pink line) or target pathways (green line) increases as the similarity of drug identity, either chemical structure (a–d) or indication (e–h), increases. Pairwise similarity of chemical structure or indication was calculated among all drugs and grouped into 10 (chemical structure) or 6 (indication) bins on the X‐axis. The Y‐axis shows the average target similarity (log‐transformed in a–d) of all drug pairs in each bin. The error bars indicate the 95% confidence interval of the average, calculated by bootstrap. On the X‐axis, “*” indicates that the drug pairs in the bin have higher similarity of target pathways than of target proteins (P < 0.05). (i–l) Receiver operating characteristic (ROC) curves showing the performance of classifiers trained with target proteins or pathways as features to predict four common adverse events caused by modern drugs: gastrointestinal bleeding (i), acute kidney failure (j), acute liver failure (k), and myocardial infarction (l). Each plot shows the ROC curves of five classifiers along with their area under the ROC curve (AUROC) values: four classifiers using the target pathways of drugs derived from the four datasets as features (U133A: red; NCI60: green; HPM_PRT: yellow; GTEx: blue), and one classifier using the target proteins of drugs as features (gray).
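The AUROC values in panels (i–l) summarize how well a classifier's scores rank drugs that cause an adverse event above drugs that do not. The sketch below computes AUROC through its equivalence with the Mann‐Whitney U statistic and shows one assumed way to encode drugs as binary feature vectors of target pathways (or target proteins). It does not reproduce the authors' classifiers, which the caption does not specify; the function names and data are hypothetical.

```python
def binary_features(drug_to_items: dict[str, set[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Encode each drug as a 0/1 vector over all target pathways (or target proteins)."""
    vocab = sorted(set().union(*drug_to_items.values()))
    return vocab, {d: [int(v in items) for v in vocab] for d, items in drug_to_items.items()}


def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: classifier scores for four drugs, label 1 = causes the adverse event.
vocab, features = binary_features({"d1": {"pathway_x"}, "d2": {"pathway_x", "pathway_y"}})
print(vocab, features)
print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise form is quadratic in the number of drugs, which is acceptable for a sketch; a rank‐based formula gives the same value more efficiently on large drug sets.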
Validation of drug‐pathway‐tissue connections built by Drugs to target pAthways by the Tissue Expression (DATE).
(a,b) Validation of drug‐pathway‐tissue connections (NCI60) using a reference standard created from L1000 drug‐induced expression data. A positive standard drug‐pathway‐tissue connection was defined as a significant change in pathway expression after drug treatment in the tissue. The bar plots show the average precision (a) and recall (b) of validated drugs, where Precision = TP/(TP+FP) and Recall = TP/(TP+FN), with TP, FP, and FN denoting true positives, false positives, and false negatives. “Random” represents the null distribution generated by randomly assigning pathway‐tissues to each drug. The error bars indicate the 95% confidence interval of the average, calculated by bootstrap. (c,d) Validation of drug‐pathway connections using a reference standard from the Pharmacogenomics Knowledgebase (PharmGKB), which provides mappings between drugs and pharmacodynamic and pharmacokinetic pathways. The bar plots show the average precision (c) and recall (d) of validated drugs. “Combined” represents all drug‐pathway connections from the four datasets. “Recurrent” represents drug‐pathway connections that appear in at least two datasets. “Random” represents the null distribution generated by randomly assigning pathway‐tissues to each drug. The error bars indicate the 95% confidence interval of the average, calculated by bootstrap.
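The per‐drug precision and recall, together with the “Combined” and “Recurrent” connection sets, can be sketched as below. The sketch assumes that each dataset and the reference standard map a drug to a set of pathways, and that a drug counts as validated when it appears in both the predictions and the reference standard. These assumptions and all names are hypothetical, not taken from the authors' code.

```python
from collections import Counter
from statistics import mean


def precision_recall(predicted: set, reference: set) -> tuple[float, float]:
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN) for one drug."""
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision_recall(pred: dict[str, set], ref: dict[str, set]) -> tuple[float, float]:
    """Average per-drug precision and recall over drugs found in both mappings."""
    drugs = sorted(pred.keys() & ref.keys())
    pairs = [precision_recall(pred[d], ref[d]) for d in drugs]
    return mean(p for p, _ in pairs), mean(r for _, r in pairs)


def combined(datasets: list[dict[str, set]]) -> dict[str, set]:
    """Union of drug-pathway connections across datasets."""
    out: dict[str, set] = {}
    for ds in datasets:
        for drug, pathways in ds.items():
            out.setdefault(drug, set()).update(pathways)
    return out


def recurrent(datasets: list[dict[str, set]], min_datasets: int = 2) -> dict[str, set]:
    """Connections that appear in at least `min_datasets` datasets."""
    counts = Counter((d, p) for ds in datasets for d, ps in ds.items() for p in ps)
    out: dict[str, set] = {}
    for (drug, pathway), c in counts.items():
        if c >= min_datasets:
            out.setdefault(drug, set()).add(pathway)
    return out


# Hypothetical predictions from two datasets and a reference standard.
ds1 = {"drugA": {"p1", "p2", "p3"}}
ds2 = {"drugA": {"p1", "p4"}}
reference = {"drugA": {"p1", "p2", "p5"}}
print(average_precision_recall(combined([ds1, ds2]), reference))
print(average_precision_recall(recurrent([ds1, ds2]), reference))
```

On this toy input the union raises recall at the cost of precision, while requiring recurrence does the opposite, which is the trade‐off the “Combined” and “Recurrent” bars are meant to expose.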
Experimental validation of drugs predicted with anticoagulation activity.
(a) Boxplot with jitter showing the coagulation activity of 6 groups: (1) positive control: argatroban; (2) Reactome: 26 predicted drugs connected to Reactome pathways; (3) Pharmacogenomics Knowledgebase (PharmGKB): 41 predicted drugs connected to PharmGKB pathways; (4) tissue‐naïve: 50 drugs that can be predicted by tissue‐naïve methods but not by DATE; (5) others: the other 325 unpredicted drugs; and (6) negative control: DMSO. The Y‐axis shows the coagulation activity of drugs, represented by the “maximum ratio” score. A red dashed line at 0.775 on the Y‐axis marks the threshold of significant anticoagulation activity. The proportion of compounds with significant anticoagulation activity (maximum ratio <0.775) in each group is shown in red on the X‐axis. (b) Boxplot with jitter showing the coagulation activity of three groups: (1) positive control: argatroban; (2) 22 newly predicted compounds that were not screened for coagulation activity in (a); and (3) negative control: DMSO. (c) Table of predicted drugs with significant anticoagulation activity in (a) and (b). Only the top five drugs in each group (PharmGKB, Reactome, and the 22 compounds) are shown. Full results can be found in Supplementary Table S10. P, P value; RR, reporting ratio; NA, compound was not studied in SIDER or OFFSIDES. Because OFFSIDES provides a P value for each drug‐side effect pair whereas SIDER provides only the mapping between them, compounds reported only in SIDER have blank entries in the last two columns.
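The red proportions in panel (a) are simple threshold counts: the fraction of compounds in a group whose maximum ratio falls below 0.775. A minimal sketch follows; the group names and maximum‐ratio values are hypothetical and used purely for illustration, not data from the screen.

```python
ANTICOAGULATION_THRESHOLD = 0.775  # maximum-ratio cutoff stated in the caption


def fraction_anticoagulant(
    max_ratios: list[float], threshold: float = ANTICOAGULATION_THRESHOLD
) -> float:
    """Proportion of compounds with significant anticoagulation activity (maximum ratio < threshold)."""
    if not max_ratios:
        raise ValueError("group has no compounds")
    return sum(r < threshold for r in max_ratios) / len(max_ratios)


def summarize_groups(groups: dict[str, list[float]]) -> dict[str, float]:
    """Apply the threshold to every screening group."""
    return {name: fraction_anticoagulant(values) for name, values in groups.items()}


# Hypothetical maximum-ratio values, not data from the screen.
example = {"group_1": [0.60, 0.80, 0.70, 0.95], "group_2": [0.90, 0.85, 0.99]}
print(summarize_groups(example))
```

The comparison is strict (`<`), matching the caption's definition of significant activity as a maximum ratio below 0.775.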