Unsupervised machine learning reveals risk stratifying glioblastoma tumor cells

Elife. 2020 Jun 23;9:e56879. doi: 10.7554/eLife.56879.


A goal of cancer research is to reveal cell subsets linked to continuous clinical outcomes to generate new therapeutic and biomarker hypotheses. We introduce a machine learning algorithm, Risk Assessment Population IDentification (RAPID), that is unsupervised and automated, identifies phenotypically distinct cell populations, and determines whether these populations stratify patient survival. With a pilot mass cytometry dataset of 2 million cells from 28 glioblastomas, RAPID identified tumor cells whose abundance independently and continuously stratified patient survival. Statistical validation within the workflow included repeated runs of stochastic steps and cell subsampling. Biological validation used an orthogonal platform, immunohistochemistry, and a larger cohort of 73 glioblastoma patients to confirm the findings from the pilot cohort. RAPID was also shown to recover known risk-stratifying cells and features in published blood cancer data. Thus, RAPID provides an automated, unsupervised approach for finding statistically and biologically significant cells using cytometry data from patient samples.
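
The survival-stratification step described above can be illustrated with a small sketch. This is not the RAPID implementation: the abstract does not specify the statistical test, so the median split and the two-group log-rank test below are assumptions chosen for illustration, and the function names (`logrank_chi2`, `stratify_by_abundance`) are hypothetical. The input is taken to be, for one cell population, its per-patient abundance together with each patient's survival time and event indicator.

```python
import math
from statistics import median


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    # Pool observations as (time, event, group); event is 1 for death, 0 for censored.
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})

    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e)         # deaths at t, both groups
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def stratify_by_abundance(abundance, times, events):
    """Split patients at the median abundance (an assumed cutoff) and compare survival."""
    cutoff = median(abundance)
    high = [i for i, a in enumerate(abundance) if a > cutoff]
    low = [i for i, a in enumerate(abundance) if a <= cutoff]
    chi2 = logrank_chi2(
        [times[i] for i in high], [events[i] for i in high],
        [times[i] for i in low], [events[i] for i in low],
    )
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy, made-up values for eight hypothetical patients (not data from the study).
toy_abundance = [0.02, 0.05, 0.30, 0.40, 0.03, 0.35, 0.04, 0.45]
toy_days = [900, 800, 200, 150, 1000, 250, 700, 100]
toy_events = [1, 1, 1, 1, 1, 1, 1, 1]
chi2, p_value = stratify_by_abundance(toy_abundance, toy_days, toy_events)
```

A dichotomized comparison like this is only a simplification of the continuous stratification the abstract reports. In the same spirit, the repeated runs and cell subsampling mentioned above could be mimicked by recomputing the abundances on resampled cells and checking whether the result is stable.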

Keywords: brain tumors; computational biology; glioblastoma; human; human biology; machine learning; mass cytometry; medicine; phospho-proteins; single cell; systems biology.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms
  • Glioblastoma / physiopathology*
  • Humans
  • Pilot Projects
  • Tumor Cells, Cultured
  • Unsupervised Machine Learning*