Comparative Study

Pharmacogenomic agreement between two cancer cell line data sets

Cancer Cell Line Encyclopedia Consortium et al. Nature. .

Abstract

Large cancer cell line collections broadly capture the genomic diversity of human cancers and provide valuable insight into anti-cancer drug response. Here we show substantial agreement and biological consilience between drug sensitivity measurements and their associated genomic predictors from two publicly available large-scale pharmacogenomics resources: The Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer databases.

Figures

Extended Data Figure 1. Comparison of pharmacologic data from the CCLE and GDSC studies.
Scatter plots (blue dots) represent the drug sensitivity measured as the Area Under the dose-response Curve (a) and IC50 (b) in overlapping cell lines between CCLE and GDSC studies. For this analysis, IC50 values for insensitive compounds were set to the highest concentration tested in both datasets. The number of overlapping cell lines n for each drug is indicated, as well as the Pearson correlation coefficient R and p-value. In this representation, lower values denote insensitive cell lines. The full distribution of sensitivity values for each drug and study is depicted as ‘violin plots’ (green: CCLE; purple: GDSC) and accounts for all tested cell lines, as opposed to the overlapping set; the grey dot represents the median, thick black line represents the first to third quartile range, and shape of the plot represents the kernel density of the distribution.
Extended Data Figure 2. Power analysis of Spearman and Pearson correlation tests.
a, Example of a clear signal that appears in only 2% (20 out of 1000) of data points, using synthetic data. The Spearman statistic completely fails to detect such a signal, which is typical for selective cancer therapeutics. b-c, Expected Spearman and Pearson correlation coefficients between the two datasets, assuming different percentages of drug-sensitive cell lines (alpha = 2%, 5%, 10% and 50%) and different numbers of overlapping cell lines. The error bars depict +/- one standard deviation. d-e, Estimated statistical power for Spearman and Pearson correlation tests using a p-value cutoff of 0.05 for rejecting the null hypothesis. This analysis was done using synthetic data as described in the Methods section.
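The contrast described in panel a can be illustrated with a small simulation. The sketch below is not the procedure from the Methods section: the noise level, the signal range, and the helper names (`simulate`, `pearson`, `spearman`) are assumptions chosen only to reproduce the 20-out-of-1000 setting with the Python standard library.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(n_lines=1000, n_sensitive=20, seed=0):
    """Two noisy 'studies' sharing a real signal in a few sensitive lines.

    Assumed model: insensitive lines have a true response of 0, sensitive
    lines a true response drawn from [2, 4]; each study adds independent
    Gaussian noise (sd 0.1).
    """
    rng = random.Random(seed)
    study_a, study_b = [], []
    for i in range(n_lines):
        signal = rng.uniform(2.0, 4.0) if i < n_sensitive else 0.0
        study_a.append(signal + rng.gauss(0.0, 0.1))
        study_b.append(signal + rng.gauss(0.0, 0.1))
    return study_a, study_b


a, b = simulate()
r_pearson = pearson(a, b)
r_spearman = spearman(a, b)
print(f"Pearson  R = {r_pearson:.2f}")
print(f"Spearman R = {r_spearman:.2f}")
```

Under these assumptions the Pearson coefficient is dominated by the few sensitive lines, whereas the rank-based Spearman coefficient is dominated by the uncorrelated noise among the many insensitive lines.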
Extended Data Figure 3. Waterfall analysis for categorization of cell lines.
a, Schematic of the waterfall analysis methodology and example of the outcome for PLX4720. b, Consistency in cell line sensitivity categorization for all drugs. The waterfall method using all available data was used to determine thresholds between “sensitive” and “resistant” cell lines (blue). Alternatively, a 1 µM threshold was used (green).
Extended Data Figure 4. Overlap in ANOVA genomic correlates of drug sensitivity.
Volcano plots showing analysis of variance (ANOVA) outcomes using drug responses from the CCLE (left panels: a, c) or GDSC (right panels: b, d) dataset for the overlapping set of cell lines, and the mutational status of 71 cancer genes from the GDSC. a-b, Analyses using AUC values. c-d, Analyses using IC50 values. Points represent drug–gene interactions (with sizes proportional to the number of screened mutant cell lines). Positions on the x-axis indicate effect size magnitudes: negative values (green circles) indicate mutations associated with increased sensitivity, positive values (red circles) indicate mutations associated with increased resistance. Positions on the y-axis indicate association significance (corrected p-values), and the horizontal dashed line indicates the significance threshold (FDR 20%). The corresponding drug name, target(s) and cancer gene are reported for a subset of therapeutically relevant interactions.
Extended Data Figure 5. Consistency of drug sensitivity/tissue-of-origin associations between the CCLE and GDSC datasets.
Each point is a tested association between drug response and a given cell line’s tissue of origin. Positions of the points on the two axes correspond to the 'signed log q-values' of the corresponding tests in the two datasets. Point labels indicate drug names and targets (in italics) and the tested tissue (in parentheses). The sign indicates the effect of the marker (neg = increased sensitivity, pos = increased resistance) and the magnitude indicates the log p-value of the corresponding t-test, after correcting for multiple hypothesis testing. Fisher exact test p-values for independence of the rows and columns of the contingency table determined by the sign and significance of the associations are also reported (over all tests and for significant associations only, respectively).
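One way to compute a 'signed log q-value' of the kind plotted here is sketched below. The caption does not name the multiple-testing correction or the log base, so the Benjamini-Hochberg procedure, the base-10 logarithm, the function names and the example numbers are all assumptions made for illustration.

```python
import math


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def signed_log_q(effect, q_value):
    """-log10(q) carrying the sign of the effect.

    Negative: marker associated with increased sensitivity.
    Positive: marker associated with increased resistance.
    """
    return math.copysign(-math.log10(q_value), effect)


# Hypothetical effect sizes and raw p-values for four drug/tissue tests.
effects = [-1.2, 0.8, -0.5, 0.1]
p_values = [0.01, 0.04, 0.03, 0.20]

q_values = benjamini_hochberg(p_values)
scores = [signed_log_q(e, q) for e, q in zip(effects, q_values)]
for e, q, s in zip(effects, q_values, scores):
    print(f"effect={e:+.1f}  q={q:.4f}  signed log q={s:+.3f}")
```

Plotting the scores from one dataset against those from the other then places concordant associations in the lower-left and upper-right quadrants.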
Extended Data Figure 6. Comparison of genomic features selected by Elastic Net between the CCLE and GDSC datasets.
a, Consistency in predictors of response identified by elastic net regression across 21,013 genomic features (copy number variations, mRNA expression and sequence variants). Statistical significance of the number of genomic features identified in common (χ2 test) using the GDSC and CCLE drug sensitivity datasets. Only drugs where features were found in both studies are represented. b, Corresponding contingency tables. Out of the 4,957 drug/gene associations with nonzero elastic net weight coefficients, only one divergent result was found (weight coefficients with opposite signs), corresponding to a feature with the lowest possible frequency (nonzero coefficient in 1 out of 100 bootstrap trials in the elastic net analysis).
Extended Data Figure 7. Comparison of genomic feature-drug associations in the CCLE and GDSC datasets.
Ridge regression coefficients for all drugs with successful elastic net regression in the indicated dataset are plotted using either overlapping (a) or all available (b) cell lines. To select cell line features, elastic net regression was performed using the indicated dataset. Then, ridge regression was performed on each dataset using the selected features. For plotting, the weights associated with the features were multiplied by the standard deviation of the features, as in Garnett et al., and then standardized per drug. Color scale indicates the number of times a feature is selected in 100 independent runs of the elastic net. Green and red coloring indicate features associated with sensitivity and resistance, respectively.
Extended Data Figure 8. Agreement in genomic predictors of drug response identified by elastic net regression in the GDSC and CCLE studies.
Elastic net (EN) selection of genomic features was performed on the indicated dataset, and their effects were computed using a non-selective regression (ridge). The total number of features selected by EN is reported above the bars. The number of cell lines used in the regression is in parentheses on the x-axis. Consistency is reported as the proportion of features with the same overall direction of effect (association with sensitivity or resistance): proportion of features with the same sign, using either the cosine correlation, which takes into account the sign associated with the features, or the Pearson correlation, which does not.
Extended Data Figure 9. Gene expression correlates of drug response identified in Haibe-Kains et al. have better agreement when using more stringent FDR cutoffs.
a, Scatter plots of the IC50-based gene-drug association statistic (column “stat” in Haibe-Kains et al. suppl. Data sets 2 and 3 and Figure S6) with FDR between 0 and 0.01 (purple), 0.01 and 0.05 (cyan), and 0.05 and 0.2 (green). In each panel, the two black lines intersect at the origin and define the agreement quadrants (top right and bottom left quadrants). b, Proportion of genes in the agreement quadrants (same sign between the two studies). c, Additional measures of agreement between the two studies. Agreement measures increase with more stringent FDR cutoffs, suggesting that false discovery drives agreement down. Uncentered measures (cosine correlation, uncentered covariance, agreement quadrant proportion) yield better agreement between the studies (see Supplementary Text for details).
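The difference between centered and uncentered agreement measures can be seen in a toy example. The sketch below uses made-up association statistics, and the function names are hypothetical; it shows only that the Pearson correlation is the cosine correlation computed after subtracting each vector's mean, which discards the shared sign that the uncentered measures retain.

```python
import math


def cosine_correlation(x, y):
    """Uncentered (cosine) correlation: dot product over the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine correlation of mean-centered vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine_correlation(
        [a - mean_x for a in x],
        [b - mean_y for b in y],
    )


def agreement_quadrant_proportion(x, y):
    """Fraction of pairs with the same sign in both studies."""
    return sum(1 for a, b in zip(x, y) if a * b > 0) / len(x)


# Made-up statistics for four genes: all positive in both studies,
# but varying in opposite directions around their means.
stats_study_a = [2.0, 2.5, 3.0, 2.2]
stats_study_b = [3.0, 2.1, 2.4, 2.9]

cos_r = cosine_correlation(stats_study_a, stats_study_b)
pearson_r = pearson_correlation(stats_study_a, stats_study_b)
quadrant = agreement_quadrant_proportion(stats_study_a, stats_study_b)
print(f"cosine={cos_r:.2f}  Pearson={pearson_r:.2f}  same-sign={quadrant:.2f}")
```

Here every gene has the same direction of effect in both studies, so the cosine correlation and the quadrant proportion are high, while the Pearson coefficient is negative.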
Extended Data Figure 10. Example of a significant change in observed correlation upon addition of a few sensitive cell lines.
For Lapatinib sensitivity data, there are 86 overlapping cell lines between the CCLE and GDSC datasets. The left panel is an excerpt from Haibe-Kains et al. Figure 2 comparing the sensitivity data for Lapatinib between the two datasets. The right panel shows the two sensitive cell lines (BT-474 and NCI-H1648) that were missed in the Haibe-Kains et al. analysis. The inclusion of these two cell lines drastically changes the observed Pearson correlation (from 0.25 to 0.53). This is consistent with the simulation results (Extended Data Figure 2), which show high variability in the observed Pearson correlation for low sample numbers.
Figure 1. Comparison of pharmacologic data from the CCLE and GDSC studies.
a, Overlap of datasets. b-c, Comparison of drug sensitivity (AUC) measured in (n) overlapping cell lines between the studies for drugs with good (b) or poor (c) correlation. R: Pearson correlation coefficient; p: p-value. Violin plots: distribution of sensitivity values for all lines in each study. Grey dot: median; black line: interquartile range; shape: kernel density of the distribution. d-e, Correlation coefficients between GDSC and CCLE datasets. x-axis: Spearman, Haibe-Kains et al.; y-axis: Pearson, present analysis. Dot sizes are proportional to the number of overlapping cell lines. Dots above the dashed y=x line denote an improved correlation compared to Haibe-Kains et al. f, Comparison of Cohen’s kappa coefficients testing the studies’ agreement in Haibe-Kains et al. (x-axis) and the present study (y-axis) for sensitivity/resistance calling using a waterfall plot analysis.
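For readers unfamiliar with the statistic in panel f, Cohen's kappa compares the observed agreement between two sets of calls with the agreement expected by chance. The sketch below is a generic illustration with hypothetical sensitive ("S") and resistant ("R") calls, not the waterfall-based calling used in either study.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("calls must be non-empty and of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of cell lines given the same call.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of the marginal label frequencies.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both studies use one identical label
        # throughout; returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls for ten overlapping cell lines.
study_a = ["S", "S", "R", "R", "R", "R", "R", "R", "R", "R"]
study_b = ["S", "R", "R", "R", "R", "R", "R", "R", "R", "R"]

kappa = cohens_kappa(study_a, study_b)
print(f"kappa = {kappa:.3f}")
```

In this example the two studies agree on 9 of 10 lines, but because most lines are resistant in both, chance agreement is 0.74 and kappa is about 0.62.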
Figure 2. Consistency of drug sensitivity prediction markers between the CCLE and GDSC datasets.
a, ANOVA on overlapping dataset (1-AUC). Coordinates: 'signed log q-values'. Negative sign: gene associated with increased sensitivity; positive: increased resistance. Distance from 0: q-value. Fisher ET: Fisher exact test of consistency of marker behavior on all or only significant associations. Markers in grey are not significant; highlighted markers are significant in both studies. b-d, Elastic net and ridge regression analysis. b, Analytical strategy. c, Proportion of genomic features with consistent effect on drug response in both studies (total number of features tested displayed above the bar and number of cell lines indicated in parentheses). d, Ridge regression using predictors selected by elastic net. Contrast: frequency of selection in 100 independent elastic net runs. Green and red: association with sensitivity or resistance, respectively.

Similar articles

  • Evaluating the consistency of large-scale pharmacogenomic studies.
    Rahman R, Dhruba SR, Matlock K, De-Niz C, Ghosh S, Pal R. Brief Bioinform. 2019 Sep 27;20(5):1734-1753. doi: 10.1093/bib/bby046. PMID: 31846027. Review.
  • The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.
    Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehár J, Kryukov GV, Sonkin D, Reddy A, Liu M, Murray L, Berger MF, Monahan JE, Morais P, Meltzer J, Korejwa A, Jané-Valbuena J, Mapa FA, Thibault J, Bric-Furlong E, Raman P, Shipway A, Engels IH, Cheng J, Yu GK, Yu J, Aspesi P Jr, de Silva M, Jagtap K, Jones MD, Wang L, Hatton C, Palescandolo E, Gupta S, Mahan S, Sougnez C, Onofrio RC, Liefeld T, MacConaill L, Winckler W, Reich M, Li N, Mesirov JP, Gabriel SB, Getz G, Ardlie K, Chan V, Myer VE, Weber BL, Porter J, Warmuth M, Finan P, Harris JL, Meyerson M, Golub TR, Morrissey MP, Sellers WR, Schlegel R, Garraway LA. Nature. 2012 Mar 28;483(7391):603-7. doi: 10.1038/nature11003. PMID: 22460905.
  • Integrating heterogeneous drug sensitivity data from cancer pharmacogenomic studies.
    Pozdeyev N, Yoo M, Mackie R, Schweppe RE, Tan AC, Haugen BR. Oncotarget. 2016 Aug 9;7(32):51619-51625. doi: 10.18632/oncotarget.10010. PMID: 27322211.
  • Inconsistency in large pharmacogenomic studies.
    Haibe-Kains B, El-Hachem N, Birkbak NJ, Jin AC, Beck AH, Aerts HJ, Quackenbush J. Nature. 2013 Dec 19;504(7480):389-93. doi: 10.1038/nature12831. Epub 2013 Nov 27. PMID: 24284626.
  • Cancer pharmacogenomics: implications on ethnic diversity and drug response.
    Patel JN. Pharmacogenet Genomics. 2015 May;25(5):223-30. doi: 10.1097/FPC.0000000000000134. PMID: 25751395. Review.
