Cell. 2018 Apr 5;173(2):338-354.e15.
doi: 10.1016/j.cell.2018.03.034.

Machine Learning Identifies Stemness Features Associated With Oncogenic Dedifferentiation

Free PMC article

Tathiane M Malta et al. Cell.


Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.

Keywords: The Cancer Genome Atlas; cancer stem cells; dedifferentiation; epigenomic; genomic; machine learning; pan-cancer; stemness.


Figure 1. Development and validation of the Stemness Indices
(A) Overall methodology. Highlighted are the data sources (the Progenitor Cell Biology Consortium (PCBC), Roadmap, and ENCODE databases), the OCLR machine-learning algorithm, and the resulting stemness indices mRNAsi, mDNAsi, and EREG-mRNAsi. The indices for each TCGA tumor sample were correlated with known cancer biology, tumor pathology, clinical information, and drug sensitivity. (B) Stemness indices of the validation set derived using our stemness signature. (C) TCGA tumor types sorted by the stemness indices obtained from transcriptomic (mRNAsi) and epigenetic (mDNAsi) features; indices were scaled from 0 (low) to 1 (high). The TCGA tumor types were grouped based on their histology and cell of origin into stem cell-like (SC), lympho-hematopoietic (Ly-Hem), Adenocarcinomas, Squamous Cell Carcinomas (Squamous), Neuronal lineage (Neuronal), Sarcomas (Sar), Kidney tumors (Kidney), and not belonging to any of the above (Misc) (Table S2). See also Figures S1 and S2; and Tables S1 and S2.
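The 0-to-1 scaling mentioned in panel (C) can be illustrated with a minimal sketch. It assumes a plain min-max transform over a set of raw per-sample scores; the caption does not spell out the exact transform, and the function name and input values below are hypothetical.

```python
def scale_to_unit_interval(raw_scores):
    """Linearly map raw scores so the minimum becomes 0 and the maximum becomes 1.

    Assumption: a plain min-max transform; the caption only states that
    indices were scaled from 0 (low) to 1 (high).
    """
    lo = min(raw_scores)
    hi = max(raw_scores)
    if hi == lo:
        # All samples identical: there is no spread to scale, so return zeros.
        return [0.0 for _ in raw_scores]
    return [(s - lo) / (hi - lo) for s in raw_scores]


# Hypothetical raw stemness scores for four samples (illustrative values only).
raw = [-0.2, 0.1, 0.4, 0.0]
scaled = scale_to_unit_interval(raw)
print(scaled)
```

With this kind of scaling, an index is only comparable across samples that were scaled together, because the minimum and maximum depend on the cohort.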
Figure 2. Biological processes associated with cancer stemness
(A) Gene Set Enrichment Analysis showing the RNAseq-based stemness signature evaluated in the context of gene sets representative of Hallmarks of Stemness and Cancer. (B) Correlation between mRNAsi and mRNA expression for published hallmarks of stemness. (C) Correlation between mRNAsi and selected oncogenic processes. (D) Association between the epigenomic-based stemness signature (EREG-mDNAsi and EREG-mRNAsi) and enrichment in transcription factor binding sites. See also Figure S2 and Table S2.
Figure 3. Molecular and clinical features associated with stemness in breast cancer, acute myeloid leukemia, and gliomas
(A) An overview of the association between known molecular and biological processes and stemness in BRCA (Left). Columns represent samples sorted by mRNAsi from low to high (top row). Rows represent molecular and biological processes associated with mRNAsi. Rows named “EDec CEp 2 and 4” represent estimated cell type proportions. Top right, boxplots of mRNAsi in individual samples, stratified by molecular subtype and histology. Bottom right, correlation of mRNAsi and representative protein expression and microRNA. (B) Similar to A, association of mRNAsi in AML. Top right, mRNAsi by mRNA-based molecular subtype and by FAB classification. Bottom right, correlation scores of mRNAsi and representative microRNA. (C) As in A and B, GBM and LGG sorted by mDNAsi. Top right, mDNAsi by molecular subtype and grade. Bottom right, correlation scores of mDNAsi and representative protein expression and microRNA. All molecular and clinical features shown are statistically significant. See also Figures S1, S3, S4, and S5.
Figure 4. Selected molecular and clinical features associated with the Stemness Indices in TCGA tumors
(A) Association of molecular and clinical features with stemness in LUAD. Top, mDNAsi by integrative molecular subtypes, smoking history, and mutations of TP53 and SETD2. Bottom, correlation scores of mDNAsi and representative protein expression. (B) Stemness in HNSC. Top, mDNAsi stratified by molecular subtypes and mutation of NSD1. Bottom, correlation scores of mDNAsi and representative protein and microRNA expression. (C) Stemness in LIHC. Top, mRNAsi stratified by grade and mutations of TP53, CTNNB1, and AXIN1. Bottom, correlation scores of mRNAsi and representative protein and microRNA expression. (D) Stemness in ACC. Top, mRNAsi stratified by mRNA molecular subtypes, clinical stage, and mutations of PRKAR1A and TP53. Bottom, correlation scores of mRNAsi and adrenal differentiation score. (E) Cox proportional hazards model analysis. Left, progression-free survival; right, overall survival. A hazard ratio greater than one denotes a trend toward worse outcome with a higher stemness index. See also Figures S3, S4, and S5.
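For readers unfamiliar with the hazard ratio in panel (E), the standard Cox proportional hazards model can be written as below. This is the textbook form, shown under the assumption that the stemness index enters as a single continuous covariate x; the caption does not state how the covariate was coded.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta x),
\qquad
\mathrm{HR} = \frac{h(t \mid x + 1)}{h(t \mid x)} = e^{\beta}
```

Here h_0(t) is the baseline hazard and β is the fitted coefficient. HR > 1 is equivalent to β > 0, meaning the hazard rises as x increases.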
Figure 5. Analysis of cancer stemness in the context of metastatic state and intratumor heterogeneity
(A) mRNAsi is higher in cancer metastases than in the TCGA primary tumors. (B) mDNAsi is higher in recurrent glioma samples than in the primary glioma from the same patient. G-CIMP, glioma CpG island methylator phenotype. (C) and (D) Application of mRNAsi to single-cell transcriptomes of gliomas and breast cancer reveals intratumor heterogeneity and varying degrees of oncogenic dedifferentiation. (E) Correlation of mRNAsi and mRNA expression of CDH1 (epithelial marker) and CDH2 (mesenchymal marker) in the cancer metastases.
Figure 6. Association of stemness index with immune microenvironment
(A) mDNAsi and mRNAsi in the context of immune microenvironment. Each panel shows the Spearman correlation between the stemness index and PD-L1 protein expression plotted against Spearman correlation between the same stemness index and total leukocyte fraction, as estimated from DNA methylation data. (B) Highlight of tumor types that exhibit strong correlation between stemness and PD-L1 expression or total leukocyte fraction. See also Figure S6.
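The Spearman correlation plotted in panel (A) is a Pearson correlation computed on ranks. The sketch below shows that computation in plain Python; the function names and the sample values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values (illustrative only, not study data).
stemness_index = [0.1, 0.4, 0.35, 0.8]
pdl1_expression = [2.0, 1.5, 1.7, 0.3]
rho = spearman(stemness_index, pdl1_expression)
print(rho)
```

Because only ranks are used, the coefficient is unchanged by any monotonic rescaling of either variable, which suits indices that have been scaled to the 0-to-1 range.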
Figure 7. Correlation of cancer stemness with drug resistance – Connectivity map analysis
(A) Heatmap showing the enrichment score (positive in blue, negative in red) of each compound from the CMap for each cancer type. Compounds are sorted from right to left by descending number of cancer types significantly enriched. (B) Heatmap showing compounds (perturbagens) from the CMap that share mechanisms of action (rows), sorted by descending number of compounds with a shared mechanism of action. See also Figure S7 and Tables S3 and S4.
