Cancer Cell. 2017 Feb 13;31(2):225-239.
doi: 10.1016/j.ccell.2017.01.005.

Characterization of Human Cancer Cell Lines by Reverse-phase Protein Arrays

Free PMC article

Jun Li et al. Cancer Cell. 2017.

Cancer cell lines are major model systems for mechanistic investigation and drug development. However, protein expression data linked to high-quality DNA, RNA, and drug-screening data have not been available across a large number of cancer cell lines. Using reverse-phase protein arrays, we measured expression levels of ∼230 key cancer-related proteins in >650 independent cell lines, many of which have publicly available genomic, transcriptomic, and drug-screening data. Our dataset recapitulates the effects of mutated pathways on protein expression observed in patient samples, and demonstrates that proteins, and particularly phosphoproteins, provide information for predicting drug sensitivity that is not available from the corresponding mRNAs. We also developed a user-friendly bioinformatic resource, MCLP, to help serve the biomedical research community.

Keywords: biomarker; cancer cell lines; data portal; drug sensitivity; proteomics; reverse-phase protein array; signaling pathways.


Figure 1. Overview of the MCLP cell line dataset and associated molecular and drug data
(A) Venn diagram of the MCLP cell line set with other large public cell line resources, including CCLE, COSMIC Cell Lines Project, and Genentech Cell Lines Project. (B) Distribution of MCLP cell lines in various lineages. (C) Heatmaps summarizing the publicly available mRNA expression, copy number alteration, single nucleotide variation, and drug sensitivity data. In the heatmaps, each vertical line in the top row represents a cell line in the MCLP set, and each line in the other rows indicates that data of the corresponding type are available for that cell line. The CTRPv2 drug sensitivity data were based on CCLE cell lines, and the GDSC data were based on COSMIC cell lines. (D) RPPA data reproducibility based on replicate samples of NCI60 cell lines. Random pairs were sampled from NCI60 cell lines only. (E) Correlations of derivative cell lines relative to random cell line pairs that were sampled from all cell lines surveyed. (F) Correlations of total and phosphorylated protein pairs relative to random protein pairs. Vertical dotted lines indicate the median values. See also Table S1 and Figure S1.
Figure 2. Comparison of protein and mRNA expression in MCLP cell lines
(A) Box plots of the expression correlations of matched mRNA and protein pairs in different lineages. Box boundaries mark the first and third quartiles, with the median in the center, and whiskers extend to 1.5 times the interquartile range from the boundaries. The striped box plots were based on the protein sets after excluding the 20% of proteins with the lowest coefficient of variation within each lineage. (B) Distribution of the number of lineages in which the mRNA and protein pair shows a significant correlation. Three protein groups are shown in different colors. (C) Co-expression network of protein–protein expression. See also Tables S2–S4 and Figure S2.
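The box plot convention described in (A) can be illustrated with a minimal sketch. This is not the authors' code: the function name `box_plot_stats`, the quartile interpolation method, and the choice to end each whisker at the most extreme observation inside the 1.5 × IQR fence are all assumptions made for illustration.

```python
import statistics


def box_plot_stats(values):
    """Summarize values the way a Tukey-style box plot draws them.

    Hypothetical helper: the quartile method and the whisker rule are
    assumptions, not taken from the paper.
    """
    data = sorted(values)
    # First quartile, median, third quartile (box boundaries and center line).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences sit 1.5 interquartile ranges beyond the box boundaries.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Made-up numbers for demonstration only; not data from the study.
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

Plotting libraries differ in how they interpolate quartiles, so the box boundaries from this sketch may not match a given figure exactly.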
Figure 3. Clustered heatmap of MCLP cell lines based on RPPA protein expression data
(A) Distribution of different lineages in clusters based on protein expression and a heatmap showing clustered patterns of 651 MCLP cell lines based on >200 protein markers. Mutation data in key cancer genes are shown in the bars (red, mutation; white, no mutation; and grey, NA) above the heatmap, with corrected p values (FDRs) indicating the significance of correlations with the clusters. (B) Box plots of key protein markers that distinguish a cluster of interest from other clusters. (C) The alignment of the RPPA clusters and the tumor subtypes of breast cancer cell lines. (D) Heatmap showing pathway scores across different protein clusters, with corrected p values (FDRs) indicating the significance of correlations with the clusters. A high-resolution, interactive clustered heatmap is available at the MCLP data portal. See also Table S5 and Figure S3.
Figure 4. Effects of mutated pathways on protein expression
(A) Pattern of frequently mutated pathways in TCGA patient cohorts; red bars indicate presence of mutations in a sample. (B) Profiles of frequently mutated pathways in MCLP cell line lineages; red bars indicate presence of mutations in a sample. (C) Circos plot for mutations in the p53 signaling pathway. The outer layer shows proteins differentially expressed between WT and mutated TCGA breast cancer patient samples (FDR < 0.05), and the middle layer shows those differentially expressed between WT and mutated MCLP breast cancer cell lines. Fold changes are color-coded: blue indicates downregulation relative to WT samples; red indicates upregulation. In the inner layer, markers consistently up- or downregulated in TCGA and MCLP samples are indicated by red and blue, respectively; inconsistently regulated markers are indicated by green. (D) Examples of individual proteins differentially expressed between WT and mutated samples in TCGA patients and MCLP cell lines. Box boundaries mark the first and third quartiles, with the median in the center, and whiskers extend to 1.5 times the interquartile range from the boundaries. See also Tables S6, S7 and Figure S4.
Figure 5. Predictive power of protein markers on drug sensitivity
(A) Numbers of markers significantly associated with different drug families (FDR < 0.1): protein only, mRNA only, and both mRNA and protein. (B) A heatmap showing the correlations of the sensitivity of EGFR pathway targeted drugs with protein, phosphoprotein, and mRNA markers of their targeted genes. The color is based on the correlation direction and statistical significance, and insignificant correlations are shown in white. (C) Comparison of the predictive power of proteins vs. mRNAs in multiple-marker classifiers, measured by AUC scores. (D) Volcano plot for EGFR_pY1068. (E) Volcano plot of EMT pathway score. Significant nodes (FDR < 0.1) are highlighted, with green representing negative correlations and red representing positive correlations. See also Figure S5.
Figure 6. Utility illustration of the MCLP web platform through the example of PDL1
(A) Overview. (B) PDL1 protein expression across lineages. (C) A positive correlation between PDL1 expression and CD49B. (D) The differential expression of PDL1 protein between the mutant and wild-type groups based on the mutation status of CCDC50. (E) Volcano plot of PDL1. (F) The co-expression pattern of PDL1 and its interacting partners in a protein–protein network view. (G) A snapshot of the dynamic heatmap of the RPPA dataset.