


The systematic translation of cancer genomic data into knowledge of tumour biology and therapeutic possibilities remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacological annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacological profiles for 24 anticancer drugs across 479 of the cell lines, this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Together, our results indicate that large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of 'personalized' therapeutic regimens.

Conflict of interest statement

Competing financial interests

Multiple authors are employees of Novartis, Inc., as noted in the affiliations. T.R.G., M.M., and L.A.G. are consultants for and equity holders in Foundation Medicine, Inc. M.M. and L.A.G. are consultants for and receive sponsored research from Novartis, Inc.


Figure 1. The Cancer Cell Line Encyclopedia (CCLE)
a. Distribution of cancer types in the CCLE by lineage. b. Comparison of DNA copy-number profiles (GISTIC G-scores) between cell lines and primary tumors. The diagonal of the heatmap shows the Pearson correlation between corresponding sample types. Because cell lines and tumors are separate datasets, the correlation matrix is asymmetric: the top left shows how well the tumor features correlate with the average of the cell lines in a lineage, and the bottom right shows the converse. c. Comparison of mRNA expression profiles between cell lines and primary tumors. For each tumor type, the log-fold-change of the 5,000 most variable genes is calculated between that tumor type and all others. Pearson correlations between tumor type fold-changes from primary tumors and cell lines are shown as a heatmap. d. Comparison of point mutation frequencies between cell lines and primary tumors in COSMIC (v56), restricted to genes that are well represented in both sample sets but excluding TP53, which is highly prevalent in most tumor types. Pairwise Pearson correlations are shown as a heatmap. *The correlations of esophageal, liver, and head and neck cancer mutation frequencies are restored when TP53 is included.
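The comparison in panel c can be illustrated with a small sketch. It assumes expression values are already log2-transformed, so that a log-fold-change is a difference of means; the toy data, the three-gene panel, and the function names `log_fold_change` and `pearson` are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean

def log_fold_change(expr_by_type, target):
    """Per-gene mean log2 expression in `target` minus the mean over all other types."""
    in_group = expr_by_type[target]
    others = [s for t, samples in expr_by_type.items() if t != target for s in samples]
    n_genes = len(in_group[0])
    return [
        mean(s[g] for s in in_group) - mean(s[g] for s in others)
        for g in range(n_genes)
    ]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy data: log2 expression of three genes per sample.
tumors = {
    "lung": [[5.0, 1.0, 3.0], [5.2, 0.8, 3.1]],
    "skin": [[1.0, 6.0, 3.0], [1.2, 5.8, 2.9]],
}
cell_lines = {
    "lung": [[4.5, 1.5, 3.2]],
    "skin": [[1.5, 5.5, 3.0]],
}

# One correlation per (tumor type, cell-line type) pair, i.e. one heatmap cell each.
types = sorted(tumors)
corr = {
    (t, c): pearson(log_fold_change(tumors, t), log_fold_change(cell_lines, c))
    for t in types
    for c in types
}
for (t, c), r in corr.items():
    print(f"tumor {t} vs cell line {c}: r = {r:+.3f}")
```

In this toy setting the matched pairs correlate strongly and the mismatched pairs anti-correlate, which is the pattern a heatmap with a strong diagonal would display.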
Figure 2. Predictive modeling of pharmacologic sensitivity using CCLE genomic data
a. Drug responses for panobinostat (green) and PLX4720 (orange/purple) represented by the high-concentration effect level (Amax) and transitional concentration (EC50) for a sigmoidal fit to the response curve (b). c. Elastic net regression modeling of genomic features that predict sensitivity to PD-0325901. The bottom curve indicates drug response, measured as the area over the dose-response curve (activity area), for each cell line. The central heatmap shows the CCLE features in the model (continuous z-score for expression and copy-number, dark red for discrete mutation calls), across all cell lines (x-axis). Bar plot (left): weight of the top predictive features for sensitivity (bottom) or insensitivity (top). Parentheses indicate features present in >80% of models after bootstrapping. d. Specificity and sensitivity (ROC curves) of cross-validated categorical models predicting the response to a MEK inhibitor, PD-0325901 (activity area). Mean true positive rate and standard deviation (n=5) are shown when models are built using all lines (“Global categorical model” in blue and orange), or within only melanoma lines (green). e. Activity area values for LBH589 (panobinostat) between cell lines derived from hematopoietic (n=61) and solid tumors (n=387). The middle bar = median, box = inter-quartile range, and bars extend to 1.5x the inter-quartile range. f. Distribution of activity area values for AEW541 relative to IGF1 mRNA expression. Orange dots: multiple myeloma cell lines (n=14); blue dots: cell lines from other tumor types (n=434). Box-and-whisker plots show the activity area or mRNA expression distributions relative to each cell line type (line = median and box = inter-quartile range), with bars extending to 1.5x the inter-quartile range.
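The response summaries named in this caption (Amax, EC50, and activity area) can be sketched as follows. This is a minimal illustration under stated assumptions: response is expressed as percent growth inhibition, the sigmoid is a Hill-type curve, and activity area is taken as the sum of fractional inhibition over the tested doses. The dose grid, parameter values, and function names are hypothetical, and the paper's exact definitions may differ.

```python
def sigmoid_response(conc, a_max, ec50, hill=1.0):
    """Percent inhibition at `conc` for an assumed Hill-type sigmoid.

    a_max is the high-concentration effect level; ec50 is the
    concentration at which the response reaches half of a_max.
    """
    return a_max / (1.0 + (ec50 / conc) ** hill)

def activity_area(inhibitions_pct):
    """Assumed definition: sum of fractional inhibition across tested doses.

    Each value is clipped to [0, 100] percent, so the result lies
    between 0 (inactive) and the number of doses (full inhibition everywhere).
    """
    return sum(min(max(v, 0.0), 100.0) / 100.0 for v in inhibitions_pct)

# Hypothetical eight-point dose series in micromolar.
doses_um = [0.0025, 0.008, 0.025, 0.08, 0.25, 0.8, 2.5, 8.0]

# Two hypothetical cell lines: one sensitive, one largely insensitive.
sensitive = [sigmoid_response(c, a_max=95.0, ec50=0.05) for c in doses_um]
resistant = [sigmoid_response(c, a_max=30.0, ec50=4.0) for c in doses_um]

area_sensitive = activity_area(sensitive)
area_resistant = activity_area(resistant)
print(f"activity area, sensitive line: {area_sensitive:.2f}")
print(f"activity area, resistant line: {area_resistant:.2f}")
```

A single area value captures both potency (EC50) and efficacy (Amax), which is one reason an area-based measure is convenient as a regression target.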
Figure 3. AHR expression may denote a tumor dependency targeted by MEK inhibitors in NRAS-mutant cell lines
a. Predictive features for PD-0325901 sensitivity (varying baseline activity area) in validated NRAS-mutant cell lines. b. Growth inhibition curves for NRAS-mutant cell lines expressing high (red) or low (blue) levels of AHR mRNA in the presence of the MEK inhibitor PD-0325901. c. Relative AHR mRNA expression across a panel of NRAS-mutant cell lines (arrows indicate cell lines where AHR dependency was analyzed). d–h. Proliferation of NRAS-mutant cell lines displaying high (d–f) and low (g–h) AHR mRNA expression, after introduction of shRNAs against AHR (red lines) or luciferase (blue lines). i. (left) Proliferation of IPC-298 cells (high AHR) after introduction of additional shRNAs against AHR (shAHR_1 and shAHR_4; green and purple lines, respectively) or luciferase (control shLuc; blue line); (right) corresponding immunoblot analysis of AHR protein. j. Equivalent studies as in (i) using SK-MEL-2 cells (high AHR). k. Endogenous CYP1A1 mRNA expression in the neuroblastoma line CHP-212 or the melanoma lines IPC-298 and SK-MEL-2 after exposure to vehicle (blue) or MEK inhibitors (PD-0325901, green; PD-98059, purple). Error bars: standard deviation between replicates, with n=12 (b), n=3 (c), n=6 (d–k).
Figure 4. Predicting sensitivity to topoisomerase I inhibitors
a. Elastic net regression analysis of genomic correlates of irinotecan sensitivity is shown for 250 cell lines. b. Dose-response curves for three Ewing’s sarcoma cell lines (MSS-ES-1, SK-ES-1, and TC-71) and two control cell lines with low SLFN11 expression (HCC-56 and SK-HEP-1). Grey vertical bars: standard deviation of the mean growth inhibition (n=2). c. SLFN11 expression across 4,103 primary tumors. Box-and-whisker plots show the distribution of mRNA expression for each subtype, ordered by the median SLFN11 expression level (line), the inter-quartile range (box) and up to 1.5x the inter-quartile range (bars). Sample numbers (n) are indicated in parentheses.
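The box-and-whisker convention described in panel c (median line, inter-quartile box, bars out to 1.5x the inter-quartile range) can be sketched as below. The sketch assumes the common Tukey convention in which each whisker ends at the most extreme data point inside the 1.5x fence, and it uses the "inclusive" quartile method from Python's `statistics` module; the paper's plotting software may compute quartiles differently. The data and the function name `box_summary` are hypothetical.

```python
from statistics import median, quantiles

def box_summary(values):
    """Median, quartiles, whisker ends and outliers for one group of values."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical expression values for one tumor subtype, with one extreme sample.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
for key, value in summary.items():
    print(f"{key}: {value}")
```

Ordering subtypes by the `median` field of each summary reproduces the left-to-right ordering the caption describes.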



