Mol Cancer. 2017 Jul 6;16(1):116.
doi: 10.1186/s12943-017-0691-y.

Multi-omics of 34 Colorectal Cancer Cell Lines - A Resource for Biomedical Studies

Free PMC article

Kaja C G Berg et al. Mol Cancer.

Abstract

Background: Colorectal cancer (CRC) cell lines are widely used pre-clinical model systems. Comprehensive insights into their molecular characteristics may improve model selection for biomedical studies.

Methods: We have performed DNA, RNA and protein profiling of 34 cell lines, including (i) targeted deep sequencing (n = 612 genes) to detect single nucleotide variants and insertions/deletions; (ii) high resolution DNA copy number profiling; (iii) gene expression profiling at exon resolution; (iv) small RNA expression profiling by deep sequencing; and (v) protein expression analysis (n = 297 proteins) by reverse phase protein microarrays.

Results: The cell lines were stratified according to the key molecular subtypes of CRC and data were integrated at two or more levels by computational analyses. We confirm that the frequencies and patterns of DNA aberrations are associated with genomic instability phenotypes and that the cell lines recapitulate the genomic profiles of primary carcinomas. Intrinsic expression subgroups are distinct from genomic subtypes, but consistent at the gene, microRNA and protein levels and dominated by two distinct clusters: colon-like cell lines characterized by expression of gastro-intestinal differentiation markers and undifferentiated cell lines showing upregulation of epithelial-mesenchymal transition and TGFβ signatures. This sample split was concordant with the gene expression-based consensus molecular subtypes of primary tumors. Approximately one quarter of the genes had consistent regulation at the DNA copy number and gene expression level, while expression of gene-protein pairs in general was strongly correlated. Consistent high-level DNA copy number amplification and outlier gene and protein expression were found for several oncogenes in individual cell lines, including MYC and ERBB2.

Conclusions: This study expands the view of CRC cell lines as accurate molecular models of primary carcinomas, and we present integrated multi-level molecular data of 34 widely used cell lines in easily accessible formats, providing a resource for preclinical studies in CRC.

Keywords: Colorectal cancer cell lines; Consensus molecular subtypes; Copy number aberrations; Gene expression; Genomics; Methylation; Microsatellite instability; Mutations; Protein expression; miRNA.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Overview of the 34 CRC cell lines analyzed and key findings. a The cell lines are grouped according to the gene expression-based CMSs (except Colo320, which has a neuroendocrine origin), and MSI, POLE and CIMP status are indicated. In general, the morphologic appearance of cell lines in CMS1 and CMS4 (for example LoVo and RKO) was mesenchymal, whereas cell lines in CMS2 and CMS3 (for example IS3 and WiDr) appeared more epithelial-like. b The cell lines were analyzed on the DNA, RNA and protein levels as indicated (blue background). Bioinformatic analyses (grey) were performed both on individual data levels and by integration of two or more data levels. Key findings (white) and references to figures and tables with detailed results are given (green). CIMP: CpG island methylator phenotype, CMS: consensus molecular subtypes, CNA: copy number aberrations, MSI/MSS: microsatellite instable/stable, OG: oncogene, TF: transcription factor, TS: tumor suppressor, SNV: single nucleotide variant
Fig. 2
DNA aberrations reflect the type of genomic instability. a We investigated the frequencies (vertical axes) of SNVs in each of six categories (indicated in the top panels) grouped according to sequence motif (flanking nucleotides are indicated on the horizontal axes). MSI cell lines (n = 8, excluding DLD-1 and HCT15) and the POLE mutated cell line HCC2998 displayed different mutation signatures associated with the respective types of hypermutation. The MSI cell lines DLD-1 and HCT15 had a distinct mutation signature with a combination of deficient mismatch repair and POLD1 mutation. b Overview of detected SNVs/indels in 37 genes included in the Cosmic Cancer Gene Census and that were mutated in at least four MSI cell lines or one MSS cell line among the 27 cell lines analyzed by targeted deep sequencing. Most genes showed clear mutation frequency differences between MSS and MSI/POLE mutated cell lines. c There was an inverse relationship between the CNA load (horizontal axis; percent of basepairs with aberrant copy number) and the SNV/indel load (vertical axis) in the cell lines, reflecting their molecular subtype, as indicated. The neuroendocrine cell line Colo320 (green circle) grouped along with the MSS cell lines, and had few SNVs/indels and a moderate number of CNAs, including gain of 8q and 13q. d MSI/POLE mutated cell lines had a lower frequency of CNAs (vertical axis) along the genome than e MSS cell lines. In each plot, chromosomes are indicated on the horizontal axes and separated by vertical lines (whole and dashed lines for chromosomes and chromosome arms, respectively). Frequent aberrations are highlighted, including gains on 7p, 7q, 8q, 12p, 13q, 20q and losses on 4p, 4q, 17p, 18q and 22q, which are chromosome arms known to be frequently affected by CNAs in primary CRCs. CNA: copy number aberration, MSI/MSS: microsatellite instable/stable, POLE: POLE mutated, SNV: single nucleotide variant
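
The CNA load in panel c is described as the percent of basepairs with aberrant copy number. A minimal sketch of that calculation follows. The segment representation, the state labels and the function name are assumptions made for illustration and are not taken from the paper.

```python
def cna_load_percent(segments, genome_length):
    """Percent of basepairs covered by segments whose copy number state is not neutral.

    Assumption: `segments` is a list of (start, end, state) tuples with
    half-open, non-overlapping coordinates, and `state` is one of the
    hypothetical labels "gain", "loss" or "neutral".
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    aberrant_bp = sum(end - start for start, end, state in segments if state != "neutral")
    return 100.0 * aberrant_bp / genome_length


# Hypothetical toy genome of 200 bp with one gained and one lost segment.
toy_segments = [(0, 100, "neutral"), (100, 150, "gain"), (150, 200, "loss")]
toy_load = cna_load_percent(toy_segments, genome_length=200)
```

In this toy input half of the basepairs are aberrant, so the load is 50 percent.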
Fig. 3
Gene expression based classification of CRC cell lines revealed a separation between colon-like and undifferentiated cell lines associated with the consensus molecular subtypes (CMS). a PCA of cell line mRNA expression data (plotted as sample-wise PC1 versus PC2) showed that the cell lines had a bimodal density distribution along PC1 (bottom plot), indicating two distinct subgroups largely separating CMS2/3 from CMS1/4. Each point represents one cell line, and is colored according to the CMS class and with point type indicating MSI-status. Dashed vertical line (red) indicates the least frequent value between the two density modes of PC1, and was used as a threshold to separate the cell lines into the two subgroups. b PC1 (horizontal axis) was strongly correlated with the sample-wise enrichment score for a set of gastro-intestinal tissue enhanced genes (vertical axis), and cell lines with high enrichment scores, left of the red dashed line, were termed “colon-like” and the remaining “undifferentiated”. c Gene set enrichment analyses comparing colon-like and undifferentiated cell lines showed that colon-like cell lines had higher expression of genes upregulated by HNF4A and lower expression of genes related to colorectal cancer stemness. Undifferentiated cell lines had higher expression of genes related to epithelial to mesenchymal transition and genes upregulated by TGFβ. The plot includes the top 15 gene sets tested (ranked by p-value) and the -log10 p-value is plotted on the horizontal axis. d Top 5 differentially expressed transcription factors and kinases (mRNA level), miRNAs and proteins between colon-like and undifferentiated cell lines. mRNAs and miRNAs are ranked by p-value while proteins are ranked by absolute log2 fold-change. The log2 fold-changes (log2FC) between the sample groups are indicated. e Classification of the individual cell lines according to the colon-like and undifferentiated subgroups. 
CRC: colorectal cancer, CMS: consensus molecular subtypes, log2FC: log2 fold-change, MSI/MSS: microsatellite instable/stable, PCA: principal component analysis
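
Panel a places the threshold at the least frequent value between the two density modes of PC1. One way this could be sketched is with a Gaussian kernel density estimate on a grid, as below. The kernel, the bandwidth, the grid size and the PC1 values are all assumptions for illustration. The source does not state how the density was estimated.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def antimode_threshold(values, bandwidth, n_grid=201):
    """Return the grid value of lowest density between the two highest density peaks."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = gaussian_kde(values, grid, bandwidth)
    # A peak is a grid point that rises from its left neighbour and does not rise to its right.
    peaks = [
        i for i in range(n_grid)
        if (i == 0 or dens[i] > dens[i - 1]) and (i == n_grid - 1 or dens[i] >= dens[i + 1])
    ]
    if len(peaks) < 2:
        raise ValueError("density is not bimodal at this bandwidth")
    left, right = sorted(sorted(peaks, key=lambda i: dens[i], reverse=True)[:2])
    cut = min(range(left, right + 1), key=lambda i: dens[i])
    return grid[cut]


# Hypothetical PC1 scores for ten samples forming two well-separated groups.
pc1_scores = [-3.1, -2.8, -2.5, -2.2, -2.0, 1.9, 2.1, 2.4, 2.7, 3.0]
threshold = antimode_threshold(pc1_scores, bandwidth=0.5)
left_group = [v for v in pc1_scores if v < threshold]
right_group = [v for v in pc1_scores if v >= threshold]
```

The result depends on the bandwidth: too large a value merges the two modes and the function raises an error, while too small a value produces spurious peaks.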
Fig. 4
CNAs and SNVs/indels in cancer-critical genes. Among genes in the Cancer Gene Census (n = 83 genes included in the targeted sequencing panel, ranked vertically in alphabetical order), simultaneous mutations and CNAs in individual cell lines (grouped horizontally according to genomic phenotypes as indicated) were detected in CRC relevant oncogenes, including KRAS and EGFR, and tumor suppressor genes, including TP53 and APC. The cell line Colo320, which has a neuroendocrine origin, is marked by an asterisk. CNA: copy number aberration, CRC: colorectal cancer, MSI/MSS: microsatellite instable/stable, POLE: POLE mutated, SNV: single nucleotide variant
Fig. 5
mRNA and protein expression levels are highly concordant among cell lines. a The density distribution (horizontal axis) of cross-cell line Pearson’s correlations (vertical axis) for expression of matched genes (microarray data) and proteins (Reverse Phase Protein Array data) (n = 194) shows an overall strong correlation. The horizontal line indicates the median correlation coefficient for all gene-protein pairs. b Differential expression analyses between colon-like and undifferentiated cell lines showed strong correspondence at the mRNA and protein level (plotted as the log2 fold-changes between the two groups of cell lines for matched protein on the vertical axis versus mRNA on the horizontal axis). The plot includes gene-protein pairs with adjusted p-value <0.1 from differential expression analysis in either mRNA or protein data. Gene-protein pairs with absolute log2 fold-change >0.5 (mRNA) between colon-like and undifferentiated cell lines are indicated by names and the rest by circles. Pearson correlation analysis (r²) indicated that 43% of the variance in the log2 fold-change at the protein level could be explained by mRNA-level log2 fold-change
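
For orientation, an r² of 0.43 corresponds to a Pearson correlation of about 0.66 in absolute value, since r² is the square of r. The sketch below computes r and r² for paired log2 fold-changes. The function is a plain textbook implementation and the numbers are hypothetical, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 fold-changes for six matched gene-protein pairs.
mrna_log2fc = [1.2, -0.8, 0.5, 2.0, -1.5, 0.1]
protein_log2fc = [0.9, -0.3, 0.6, 1.1, -1.2, -0.2]

r = pearson_r(mrna_log2fc, protein_log2fc)
r_squared = r ** 2  # share of protein-level variance explained by a linear fit on mRNA

# The reported r² of 0.43 implies |r| = sqrt(0.43).
implied_abs_r = math.sqrt(0.43)
```

The same function could be applied per gene-protein pair across cell lines to obtain the distribution of correlations summarized in panel a.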
Fig. 6
Characteristics of individual cell lines at multiple molecular levels. The cell lines are ranked alphabetically within the colon-like (n = 18; top) and undifferentiated (n = 15; bottom) subgroups. The neuroendocrine Colo320 is found below the undifferentiated cell lines (marked by a dark grey box). a The heatmap shows standardized single-sample gene set expression enrichment scores for the eight selected pathways indicated at the bottom; each value indicates how many standard deviations the score is above or below the mean. Red indicates relative upregulation and blue indicates relative downregulation among cell lines. b The table indicates selected molecular events characteristic of each cell line. Amp: DNA amplification, mut: “mutation” (single nucleotide variant or insertion/deletion), m: mRNA level, p: protein level, wt: wild type
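
The standardization in panel a can be illustrated as a z-score per pathway. The sketch below assumes that the mean and standard deviation are taken across cell lines and that the sample standard deviation is used. The caption does not specify either point, and the enrichment scores shown are hypothetical.

```python
import statistics


def standardize(scores_by_cell_line):
    """Convert raw scores to z-scores: standard deviations above or below the mean.

    Assumption: the mean and the sample standard deviation are computed
    across the cell lines supplied for a single pathway.
    """
    values = list(scores_by_cell_line.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return {name: (score - mean) / sd for name, score in scores_by_cell_line.items()}


# Hypothetical raw enrichment scores for one pathway in four of the cell lines.
raw_scores = {"LoVo": 0.42, "RKO": 0.55, "IS3": -0.31, "WiDr": -0.18}
z_scores = standardize(raw_scores)
```

Positive z-scores would map to red (relative upregulation) and negative z-scores to blue (relative downregulation) in a heatmap of this kind.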
