Application of a novel machine learning framework for predicting non-metastatic prostate cancer-specific mortality in men using the Surveillance, Epidemiology, and End Results (SEER) database

Lancet Digit Health. 2021 Mar;3(3):e158-e165. doi: 10.1016/S2589-7500(20)30314-9. Epub 2021 Feb 3.


Background: Accurate prognostication is crucial in treatment decisions made for men diagnosed with non-metastatic prostate cancer. Current models rely on prespecified variables, which limits their performance. We aimed to investigate a novel machine learning approach to develop an improved prognostic model for predicting 10-year prostate cancer-specific mortality and compare its performance with existing validated models.

Methods: We derived and tested a machine learning-based model using Survival Quilts, an algorithm that automatically selects and tunes ensembles of survival models using clinicopathological variables. Our study involved a US population-based cohort of 171 942 men diagnosed with non-metastatic prostate cancer between Jan 1, 2000, and Dec 31, 2016, from the prospectively maintained Surveillance, Epidemiology, and End Results (SEER) Program. The primary outcome was prediction of 10-year prostate cancer-specific mortality. Model discrimination was assessed using the concordance index (c-index), and calibration was assessed using Brier scores. The Survival Quilts model was compared with nine other prognostic models in clinical use, and decision curve analysis was done.

Findings: 647 151 men with prostate cancer were enrolled into the SEER database, of whom 171 942 were included in this study. Discrimination improved with greater granularity, and multivariable models outperformed tier-based models. The Survival Quilts model showed good discrimination (c-index 0·829, 95% CI 0·820-0·838) for 10-year prostate cancer-specific mortality, which was similar to the top-ranked multivariable models: PREDICT Prostate (0·820, 0·811-0·829) and Memorial Sloan Kettering Cancer Center (MSKCC) nomogram (0·787, 0·776-0·798). All three multivariable models showed good calibration with low Brier scores (Survival Quilts 0·036, 95% CI 0·035-0·037; PREDICT Prostate 0·036, 0·035-0·037; MSKCC 0·037, 0·035-0·039). Of the tier-based systems, the Cancer of the Prostate Risk Assessment model (c-index 0·782, 95% CI 0·771-0·793) and Cambridge Prognostic Groups model (0·779, 0·767-0·791) showed higher discrimination for predicting 10-year prostate cancer-specific mortality. c-indices for models from the National Comprehensive Cancer Care Network, Genitourinary Radiation Oncologists of Canada, American Urological Association, European Association of Urology, and National Institute for Health and Care Excellence ranged from 0·711 (0·701-0·721) to 0·761 (0·750-0·772). Discrimination for the Survival Quilts model was maintained when stratified by age and ethnicity. Decision curve analysis showed an incremental net benefit from the Survival Quilts model compared with the MSKCC and PREDICT Prostate models currently used in practice.

Interpretation: A novel machine learning-based approach produced a prognostic model, Survival Quilts, with discrimination for 10-year prostate cancer-specific mortality similar to the top-ranked prognostic models, using only standard clinicopathological variables. Future integration of additional data will likely improve model performance and accuracy for personalised prognostics.

Funding: None.

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Databases, Factual
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Nomograms
  • Prognosis
  • Prostate / pathology*
  • Prostatic Neoplasms / diagnosis*
  • Prostatic Neoplasms / mortality
  • Prostatic Neoplasms / pathology
  • Retrospective Studies
  • Risk Assessment
  • Survival Analysis
  • United States / epidemiology