Cell. 2016 Jul 28;166(3):740-754.
doi: 10.1016/j.cell.2016.06.017. Epub 2016 Jul 7.

A Landscape of Pharmacogenomic Interactions in Cancer

Free PMC article

Francesco Iorio et al. Cell.

Abstract

Systematic studies of cancer genomes have provided unprecedented insights into the molecular nature of cancer. Using this information to guide the development and application of therapies in the clinic is challenging. Here, we report how cancer-driven alterations identified in 11,289 tumors from 29 tissues (integrating somatic mutations, copy number alterations, DNA methylation, and gene expression) can be mapped onto 1,001 molecularly annotated human cancer cell lines and correlated with sensitivity to 265 drugs. We find that cell lines faithfully recapitulate oncogenic alterations identified in tumors, find that many of these associate with drug sensitivity/resistance, and highlight the importance of tissue lineage in mediating drug response. Logic-based modeling uncovers combinations of alterations that sensitize to drugs, while machine learning demonstrates the relative importance of different data types in predicting drug response. Our analysis and datasets are rich resources to link genotypes with cellular phenotypes and to identify therapeutic options for selected cancer sub-populations.

Figures

Figure 1
Overview of Data and Analyses (A) Publicly available genomic data for a large cohort of primary tumors were analyzed to identify clinically relevant features called cancer functional events (CFEs). (B) A panel of 1,001 genomically characterized human cancer cell lines. (C) The catalog of CFEs from patient tumors was used to filter the set of molecular alterations identified in cell lines and subsequently was used for pharmacogenomic modeling. (D) Cancer cell lines were screened for differential sensitivity against 265 anti-cancer compounds. (E) The resultant datasets were used for pharmacogenomic modeling. See also Figure S1 and Table S1.
Figure 2
Representation of Cancer Functional Events in Cancer Cell Lines (A) First bar chart: the percentage coverage of cancer functional events (CFEs) in the pan-cancer dataset occurring in at least one cell line. Coverage for each class of CFEs individually and when combined is shown. Second bar chart: the median coverage by cancer type of frequently occurring (>5% of tumor samples) cancer-specific CFEs in at least one cell line. The solid line indicates coverage of CFEs occurring in >2 cell lines. Third bar chart: coverage in each cancer type of frequently occurring cancer genes (CGs). Missing cancer genes are grouped by the level of evidence supporting their classification as a cancer gene. The number of cell lines for each cancer-type and the full name of each cancer-type and associated acronym are shown. (B) Matrix of Pearson correlations of CFE frequency between cell lines and patient tumors for each cancer-type and class of CFEs. Box and whisker plots show the correlations of CFEs within the same (on-diagonal) and between different (off-diagonal) cancer-types. See also Figure S2, Table S2, and Data S1.
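The comparison in panel (B) rests on the Pearson correlation between two frequency profiles. A minimal sketch follows, assuming each profile is a list of per-CFE frequencies in the same CFE order; the function name `pearson` and the frequency values are hypothetical illustrations, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical CFE frequencies (fraction of samples carrying each CFE),
# listed in the same CFE order for cell lines and for primary tumors.
cell_line_freq = [0.40, 0.10, 0.25, 0.05]
tumor_freq = [0.35, 0.12, 0.30, 0.02]

r = pearson(cell_line_freq, tumor_freq)
print(f"Pearson r = {r:.3f}")
```

Repeating this for every pair of cancer types would give a matrix like the one in panel (B), with same-type pairs on the diagonal.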
Figure 3
Comparative Analysis of Pathway Alterations and Global CFE Signatures in Cell Lines and Tumors (A) Concordance of CFEs in cancer-associated pathways between cell lines and tumors. (B) Enrichments of the dominant CFE type across four global classes. (C) Classification of primary tumors and cell lines from each cancer type into global classes based on CFEs. Segment lengths are the percentage of samples (cell lines or primary tumors) falling within each global class. For primary tumors, results are compared to published classifications (Ciriello et al., 2013) (top diagram), and for cell lines, the comparison is with primary tumors from the same cancer type (bottom diagram). The classification of concordance is based on the identity of the predominant class of CFEs. See also Figure S3, Table S3, and Data S1.
Figure 4
Pharmacogenomic Modeling of Drug Sensitivity (A) Pan-cancer and cancer-specific ANOVA analyses for statistically significant interactions between differential drug sensitivity and CFEs. Cancer-specific interactions are divided into those identified in a single or multiple cancer-specific analyses. (B) A summary of established pharmacogenomic interactions detected in this analysis including a subset of clinically approved markers. The total number of significant and significant large-effect interactions for each cancer type is provided. Testable interactions that were validated on the CTRP datasets are also indicated. (C) Volcano plot with effect size (x axis) and significance (y axis) of large-effect cancer-specific pharmacogenomic interactions. Each circle corresponds to a significant CFE-drug interaction. Circle size is proportional to the number of altered cell lines, and the color indicates cancer type. A subset of interactions is labeled with drug name, target (italics), and name of the associated CFE (bold). (D) Examples of cancer-specific pharmacogenomic interactions identified by our systematic ANOVA. Each circle represents the IC50 of an individual cell line. The co-incident resistance-associated EGFR T790M mutation is labeled. See also Figure S4 and Table S4.
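To make the two axes of the volcano plot concrete, the sketch below tests one CFE-drug pair with a plain two-group one-way ANOVA and reports Cohen's d as the effect size. This is a simplification under stated assumptions: the published analysis may include covariates that are omitted here, the function name `anova_two_groups` is hypothetical, and the log IC50 values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def anova_two_groups(altered, wild_type):
    """Return (F statistic, Cohen's d) for altered vs. wild-type cell lines.

    With two groups the between-group degrees of freedom equal 1, so F is
    the between-group sum of squares over the within-group mean square.
    """
    n1, n2 = len(altered), len(wild_type)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two cell lines")
    m1, m2 = mean(altered), mean(wild_type)
    grand_mean = (sum(altered) + sum(wild_type)) / (n1 + n2)
    ss_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    ss_within = (n1 - 1) * variance(altered) + (n2 - 1) * variance(wild_type)
    ms_within = ss_within / (n1 + n2 - 2)
    if ms_within == 0:
        raise ValueError("no within-group variation")
    f_stat = ss_between / ms_within
    cohens_d = (m1 - m2) / sqrt(ms_within)  # pooled standard deviation
    return f_stat, cohens_d


# Hypothetical log IC50 values for one drug.
altered = [-2.1, -1.8, -2.5, -1.9]        # cell lines carrying the CFE
wild_type = [0.4, 0.9, 0.1, 0.6, 0.8]     # cell lines without the CFE

f_stat, d = anova_two_groups(altered, wild_type)
print(f"F = {f_stat:.1f}, Cohen's d = {d:.2f}")
```

A negative d here means that altered cell lines have lower IC50 values, that is, the CFE is associated with sensitivity. Converting F to a p-value needs the F distribution, which the standard library does not provide, so that step is left out.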
Figure 5
Logic Models of CFEs Explain Drug Sensitivity (A) The number of predictive LOBICO models from the pan-cancer and cancer-specific analyses. The number of cell lines for each cancer type is given in brackets. (B) Optimal model complexity for each of the predictive logic models. (C) Strong AND/OR model combinations involving clinically approved drugs from the pan-cancer and cancer-specific analyses. Each arrow goes from the precision (x axis) and recall (y axis) of the single-predictor model to that of the logic combination. The arrow color reflects cancer type, and drug names and nominal targets (italics) are shown. (D) Distribution of IC50 values of all cell lines (gray) in response to Trametinib with respect to the KRAS mutant single-predictor model (red line) and the KRAS OR BRAF mutant combination (blue line). The dashed line is the IC50 threshold used to classify cell lines as sensitive and resistant. The inset table shows the number of cell lines classified as sensitive or resistant for each model and the associated precision (pr.) and recall (re.). (E) Response of HNSC cell lines to Afatinib with respect to EGFR amplification and the combination of EGFR amplification OR a SMAD4 mutation. (F) Response of BRCA cell lines to Lapatinib with respect to lack of the FAT1/IRF2 deletion and the logical TP53 mutant AND lack of the FAT1/IRF2 deletion combination. See also Figure S5, Table S5, and Data S1.
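The arrows in panel (C) compare the precision and recall of a single-predictor model with those of a logic combination. The sketch below reproduces that comparison for a KRAS versus KRAS OR BRAF rule on a toy panel. It only evaluates given rules and does not implement LOBICO's model inference; the six cell lines and their labels are invented for illustration.

```python
def precision_recall(predicted, actual):
    """Precision and recall of boolean sensitivity predictions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical panel: mutation status and observed sensitivity
# (sensitive = IC50 below the classification threshold).
cell_lines = [
    {"KRAS": True, "BRAF": False, "sensitive": True},
    {"KRAS": False, "BRAF": True, "sensitive": True},
    {"KRAS": False, "BRAF": True, "sensitive": True},
    {"KRAS": True, "BRAF": False, "sensitive": False},
    {"KRAS": False, "BRAF": False, "sensitive": False},
    {"KRAS": False, "BRAF": False, "sensitive": True},
]

actual = [c["sensitive"] for c in cell_lines]
single = [c["KRAS"] for c in cell_lines]              # single-predictor model
combo = [c["KRAS"] or c["BRAF"] for c in cell_lines]  # KRAS OR BRAF

print("KRAS only:    pr=%.2f re=%.2f" % precision_recall(single, actual))
print("KRAS OR BRAF: pr=%.2f re=%.2f" % precision_recall(combo, actual))
```

In this toy panel the OR combination raises both precision (0.50 to 0.75) and recall (0.25 to 0.75), which corresponds to an arrow pointing up and to the right in panel (C).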
Figure 6
Predictive Ability of Combinations of Molecular Data Types (A) Predictive performances of individual pan-cancer pharmacogenomic models using elastic net modeling and the indicated single data types. Selected outlier predictive models are labeled. (B) The number of molecular data types included in the best-performing models (lead models) across the pan-cancer and cancer-specific analyses. The best-performing models use combinations of multiple data types. Absolute counts of best performing models are given. (C) Absolute counts of lead models from the pan-cancer and cancer-specific analyses and the number of molecular data types used in the models. (D) A heat map of the percentage of lead models identified in the pan-cancer and cancer-specific analyses incorporating different combinations of molecular data types. (E) Absolute count of lead models identified in pan-cancer and cancer-specific analyses incorporating different combinations of molecular data types. Data types are ordered from most (top) to least (bottom) predictive in the cancer-specific analysis. See also Figure S6 and Table S6.
Figure 7
A Precision Medicine Landscape (A) Percentages of primary tumor samples for each cancer type harboring a sensitivity marker to a given compound and the cumulative percentage of patients for all compounds. (B) Percentages of primary tumors whose genomic features satisfy the logic model for sensitivity for a given drug. Corresponding logic circuits are shown to the right of the bars. See also Table S7.
Figure S1
Screened Compound Duplicates, Related to Figure 1 Histograms, scatter plots and Pearson correlation scores between IC50 profiles for 7 compounds screened in biological duplicates. In all cases replicate data were generated at least one year apart. Superimposed on each scatter plot is a contingency table (and a corresponding Fisher exact test p-value) showing consistency of sensitive (IC50 ≤ maximal tested concentration) and resistant (IC50 > maximal tested concentration) cell lines across replicates.
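The contingency table and Fisher exact test described in this caption can be sketched as follows. The sensitive/resistant rule is taken from the caption; the function names, the IC50 values, and the single shared maximal concentration are hypothetical, and the two-sided p-value uses the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb


def contingency(rep1, rep2, max_conc):
    """2x2 counts of sensitive (index 0) / resistant (index 1) calls.

    Rows index the call in replicate 1, columns the call in replicate 2.
    A cell line is sensitive when IC50 <= max_conc, resistant otherwise.
    """
    table = [[0, 0], [0, 0]]
    for ic50_a, ic50_b in zip(rep1, rep2):
        i = 0 if ic50_a <= max_conc else 1
        j = 0 if ic50_b <= max_conc else 1
        table[i][j] += 1
    return table


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact test p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical IC50 values for six cell lines screened twice.
rep1 = [0.5, 2.0, 12.0, 0.8, 15.0, 20.0]
rep2 = [0.7, 1.5, 11.0, 1.2, 18.0, 9.0]
table = contingency(rep1, rep2, max_conc=10.0)
p_value = fisher_exact_two_sided(table)
print(table, round(p_value, 3))
```

With only six cell lines the test has little power; the example is meant to show the mechanics, not a realistic result.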
Figure S2
Cancer Functional Events on Cancer Cell Lines, Related to Figure 2 (A) Status of 1,273 Cancer Functional Events (CFEs) identified from primary tumor data in 1,001 cancer cell lines. Each column is a cell line, colors at the top indicate different cancer types, and each row is a CFE. The heatmap is horizontally divided in three parts with (i) high confidence cancer driver genes; (ii) focal recurrently aberrant copy number segments and (iii) informative CpG islands. A white space denotes absence of the functional events, whereas presence is indicated using the color schemes in the adjacent legends. (B) Number of cancer-specific CFEs occurring in at least one cell line from the corresponding tissue, across the three molecular data types. Box plots on the right show the frequency of the missing CFEs in the primary tumors for each cancer type. Percentages of missing cancer genes for each cancer type are grouped based on their confidence (i.e., A = more than two signals of positive selection, B = two signals of positive selection, C = one signal of positive selection). (C) Example of CFE frequency scatter plot for COAD/READ. Each circle is a CFE whose occurrence frequencies across cell lines and primary tumors are given by its coordinates on the x and y axes, respectively. Different CFE types are indicated by color and corresponding correlation scores are reported in the inset. (D) Nearest neighbor analysis for similarities among cell lines and primary tumors based on frequency profiles accounting for all the CFEs. The proximity of two points is proportional to the correlation across the two corresponding CFE frequency profiles. A line connects a point to its closest neighbor (indicated by the small black dot). (E) Performance of a k-nearest-neighbor classifier based on a comprehensive correlation distance between cell lines and primary tumors, accounting for all the CFEs.
Figure S3
Enrichment of Cancer Functional Events in Global Signatures, Related to Figure 3 Enrichment analysis for global signatures of cancer functional events (CFEs) across different molecular data types identifies 4 classes of CFEs and cell-line/primary-tumor samples (on different rows). Pie charts on the left indicate the proportions of individually enriched CFE data types within each class (orange color indicates generic RACSs, both amplified and deleted). Bar diagrams on the right indicate, for each class and each CFE data type (on different columns), enrichment results for individual cancer functional events. Selected CFEs are highlighted.
Figure S4
ANOVA Result Summaries, Down-Sampled ANOVA Result Summaries, and ANOVA Validation Using CCLE and CTRP Datasets, Related to Figure 4 (A) The number of statistically significant CFE-drug interactions for each cancer type. (B) Example of ANOVA down-sampling analysis outcomes. Each point is a tested drug-CFE interaction, with position on the x and y axes indicating significance and effect size, respectively. The vertical line corresponds to the significance level p = 0.05. The effect size increment observed in the BRCA-specific ANOVA is more evident and less variable than that observed in the down-sampled pan-cancer ANOVA. (C) Effect-size variation for 4 different levels of statistical significance (indicated by the 4 groups of three box-plots) across pan-cancer, down-sampled, and cancer-specific ANOVAs. Each plot refers to a different cancer type (as indicated also by different colors). The effect size increment with respect to the pan-cancer analyses is consistently and significantly greater in the cancer-specific analyses than the down-sampled pan-cancer analyses. The total numbers of significant interactions (and the same value averaged across the sub-sampling simulations) according to the p-value threshold under consideration are reported. (D) Number of significant (dashed lines) and significant large-effect (solid lines) pharmacogenomic interactions identified across 18 cancer-specific ANOVAs (using the whole panel of cell lines) that are retained in simulated down-sampled cancer-specific ANOVAs involving 500, 300, 160 and 60 cell lines. A missing dot means that, for the cancer type under consideration, a cancer-specific analysis is not possible due to reduced sample sizes. (E) Average number of significant large-effect pharmacogenomic interactions identified across 18 cancer-specific ANOVAs (using the whole panel of cell lines) that are identifiable in simulated down-sampled cancer-specific ANOVAs. (F) Number of significant (top plot) and significant large-effect (bottom plot) pan-cancer pharmacogenomic interactions that are identifiable in simulated down-sampled pan-cancer ANOVAs. (G) Proportions of cancer functional event (CFE) types involved in significant pharmacogenomic interactions for each cancer type. (H) Percentage of drugs involved in at least one significant CFE-drug interaction (pan-cancer or cancer-specific) across drugs classified into cancer associated pathways and processes. (I) Pathway-centric overview of the identified pharmacogenomic interactions. Cells are color-coded according to corresponding –log10 p-values. Compounds are identified by the nominal therapeutic target. (J) ANOVA results on overlapping GDSC-CCLE datasets. Each circle represents a drug-CFE association. The y axis shows the signed log10 p-values of the identified interactions on the CCLE and the x axis those on the GDSC. Markers highlighted in red or green are significant in both studies. FET: Fisher exact test of consistency of marker behavior on all or only significant associations. A subset of associations is labeled with cancer-type, drug name, drug target (italics) and associated CFE (bold text). (K) ANOVA results on overlapping GDSC-CTRP datasets.
Figure S5
LOBICO Performance and Validation of LOBICO Models on CTRP, Related to Figure 5 (A) Pearson correlation of SPEED pathway activity scores across all cell lines using the original publication cutoffs (left) and our optimized cutoffs (right). (B) Multi-predictor models outperform single predictor models. Scatter plot with the 5-fold cross-validation (CV) error for single predictor models (x axis) and the best (lowest CV error) multi-predictor model (y axis) averaged across 10 repeats for the cancer-specific datasets and 5 repeats for the pan-cancer dataset. Each point represents one of the 390 predictive logic models. The CV errors for the pan-cancer dataset (n = 182) are on the left; the CV errors from the 18 cancer-specific datasets (n = 208) are on the right. (C) CV errors across cancer types and drug classes. Left: Number of drugs for which LOBICO was run, i.e., the drugs with 5 or more sensitive cell lines, number of drugs where a predictive model was inferred, and number of drugs where the predictive model was a multi-predictor model, for the pan-cancer and each cancer-specific analysis. Center: CV error averaged across all drugs in a drug class (columns) for which LOBICO was run on the pan-cancer or cancer-specific dataset (rows). Grey indicates that no LOBICO models were run for the drugs in a drug class. (D) Feature importance scores across data types: Normalized feature importance (FI) scores for each cancer type grouped into four categories (amplified RACSs; deleted RACSs; mutations in CGs; SPEED pathway activity). These scores were averaged across the drugs for which the LOBICO analysis was performed. (E) t test p-values for LOBICO models on GDSC and CTRP. The scatter plot depicts the −log10 p-values for t tests that quantify the difference between cell lines predicted to be sensitive and resistant according to LOBICO. The x axis depicts p-values for the difference between these two groups based on the IC50s within GDSC. The y axis depicts p-values for the difference between these two groups based on the AUCs within CTRP. Drugs with a p-value lower than 10⁻⁷ are annotated. (F) t test p-values on GDSC and CTRP for predictive LOBICO models. The scatter plot depicts the −log10 p-values for t tests that quantify the difference between cell lines predicted to be sensitive and resistant according to LOBICO. The 43 drugs are sorted based on the t test p-value derived from the GDSC IC50s. P-values are considered significant at p < 0.023 (1/43).
Figure S6
Predictive Ability Assessment of Individual Molecular Feature Layers and Layer Combinations, Related to Figure 6 (A) Predictive performance (Pearson correlation of predicted versus observed IC50 values) of tissue label versus other feature layers in pan-cancer analysis with Elastic Net. (B) Percentages of all the predictive models (R pan-cancer ≥ 0.2 and R cancer-specific ≥ 0.25) across different cancer types and molecular data type. Absolute counts of best performing models are indicated above the bars. (C) Absolute counts of pan-cancer and cancer-specific models separated by number of feature layers. (D) Heatmap split by cancer types and possible feature combination, showing the percentage of all predictive models. (E) Count of all predictive models by data type combination separated in pan-cancer and cancer-specific analysis. (F) Comparison of Random Forests versus Elastic Net performances in the pan-cancer analysis. (G) Deriving pan-cancer threshold of predictive models by fitting a mixed Gaussian distribution across all built models, while assuming that one distribution is informative and the other one is not. A model is considered predictive if the ratio of informative to non-informative is at least 9, resulting in a minimal Pearson correlation of ∼0.2. Pan-cancer models achieve high performances due to tissue bias. (H) Deriving cancer-specific threshold in the same manner as for pan-cancer, resulting in a minimal Pearson correlation of ∼0.25. Negative correlations result from overfitting and sample sizes that are too small.
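The thresholding rule in panels (G) and (H) can be sketched as follows, under one reading of the caption: the "ratio of informative to non-informative" is taken to be the ratio of the two weighted Gaussian component densities at a given Pearson correlation. The mixture parameters below are hypothetical and are not the fitted values from the study, so the resulting cutoff is illustrative only. The sketch also assumes the mixture has already been fitted and that the ratio crosses the cutoff once between the two component means.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and std. dev. sd."""
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))


def density_ratio(r, informative, noise):
    """Weighted informative density over weighted non-informative density.

    Each component is a (weight, mean, std. dev.) tuple.
    """
    w1, mu1, sd1 = informative
    w0, mu0, sd0 = noise
    return (w1 * normal_pdf(r, mu1, sd1)) / (w0 * normal_pdf(r, mu0, sd0))


def min_predictive_correlation(informative, noise, min_ratio=9.0, tol=1e-9):
    """Bisect between the component means for the point where ratio == min_ratio."""
    lo, hi = noise[1], informative[1]
    if density_ratio(lo, informative, noise) >= min_ratio:
        raise ValueError("ratio already exceeds the cutoff at the noise mean")
    if density_ratio(hi, informative, noise) < min_ratio:
        raise ValueError("ratio never reaches the cutoff below the informative mean")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if density_ratio(mid, informative, noise) >= min_ratio:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical fitted components: (weight, mean, standard deviation).
noise = (0.6, 0.0, 0.1)         # non-informative models
informative = (0.4, 0.4, 0.1)   # informative models

threshold = min_predictive_correlation(informative, noise)
print(f"minimal predictive Pearson correlation ~ {threshold:.3f}")
```

A ratio of 9 corresponds to a posterior probability of 0.9 that a model belongs to the informative component, which is one way to motivate the cutoff.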

References

    1. Babur Ö., Gönen M., Aksoy B.A., Schultz N., Ciriello G., Sander C., Demir E. Systematic identification of cancer driving signaling pathways based on mutual exclusivity of genomic alterations. Genome Biol. 2015;16:45.
    2. Barretina J., Caponigro G., Stransky N., Venkatesan K., Margolin A.A., Kim S., Wilson C.J., Lehár J., Kryukov G.V., Sonkin D. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607.
    3. Basu A., Bodycombe N.E., Cheah J.H., Price E.V., Liu K., Schaefer G.I., Ebright R.Y., Stewart M.L., Ito D., Wang S. An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules. Cell. 2013;154:1151–1161.
    4. Cancer Cell Line Encyclopedia Consortium; Genomics of Drug Sensitivity in Cancer Consortium. Pharmacogenomic agreement between two cancer cell line data sets. Nature. 2015;528:84–87.
    5. Chapman P.B., Hauschild A., Robert C., Haanen J.B., Ascierto P., Larkin J., Dummer R., Garbe C., Testori A., Maio M., BRIM-3 Study Group. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N. Engl. J. Med. 2011;364:2507–2516.
