Machine-Learning and Stochastic Tumor Growth Models for Predicting Outcomes in Patients With Advanced Non-Small-Cell Lung Cancer

JCO Clin Cancer Inform. 2019 Sep;3:1-11. doi: 10.1200/CCI.19.00046.


Purpose: The prediction of clinical outcomes for patients with cancer is central to precision medicine and the design of clinical trials. We developed and validated machine-learning models for three important clinical end points in patients with advanced non-small-cell lung cancer (NSCLC), namely objective response (OR), progression-free survival (PFS), and overall survival (OS), using routinely collected patient and disease variables.

Methods: We aggregated patient-level data from 17 randomized clinical trials recently submitted to the US Food and Drug Administration evaluating molecularly targeted therapy and immunotherapy in patients with advanced NSCLC. To our knowledge, this is one of the largest studies of NSCLC to consider biomarker and inhibitor therapy as candidate predictive variables. We developed a stochastic tumor growth model to predict tumor response and explored the performance of a range of machine-learning algorithms and survival models. Models were evaluated on out-of-sample data using the standard area under the receiver operating characteristic curve and concordance index (C-index) performance metrics.
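For readers unfamiliar with the two evaluation metrics named in the Methods, the following is a minimal sketch of how each can be computed from out-of-sample predictions. It is not the authors' implementation: the function names `roc_auc` and `concordance_index` and the toy inputs are illustrative assumptions, and the C-index shown is the simple Harrell form that skips tied event times.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """AUC as the probability that a random responder is scored above a
    random non-responder (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def concordance_index(times, events, risks):
    """Harrell's C-index: among comparable pairs, the fraction in which the
    patient with the shorter observed time has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: tied times are skipped
        short, long_ = (i, j) if times[i] < times[j] else (j, i)
        if not events[short]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[short] > risks[long_]:
            concordant += 1.0
        elif risks[short] == risks[long_]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, for illustration only.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
cidx = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.3, 0.5, 0.1])
```

A value of 0.5 on either metric corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported results should be read.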

Results: Our models achieved promising out-of-sample predictive performances of 0.79 area under the receiver operating characteristic curve (95% CI, 0.77 to 0.81), 0.67 C-index (95% CI, 0.66 to 0.69), and 0.73 C-index (95% CI, 0.72 to 0.74) for OR, PFS, and OS, respectively. The calibration plots for PFS and OS suggested good agreement between actual and predicted survival probabilities. In addition, the Kaplan-Meier survival curves showed that the difference in survival between the low- and high-risk groups was significant (log-rank test P < .001) for both PFS and OS.
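The Kaplan-Meier curves mentioned in the Results rest on the product-limit estimator, which can be sketched in a few lines. This is an illustrative sketch only, assuming right-censored data given as parallel lists; the function name `kaplan_meier` and the toy data are hypothetical and do not come from the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  observed follow-up time per patient
    events: 1 if the event (progression or death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= n_leaving
    return curve


# Hypothetical toy cohort: five patients, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Computing one such curve for the predicted low-risk group and one for the high-risk group gives the pair of curves that a log-rank test then compares.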

Conclusion: Biomarker status was the strongest predictor of OR, PFS, and OS in patients with advanced NSCLC treated with immune checkpoint inhibitors and targeted therapies. However, single biomarkers have limited predictive value, especially for programmed death-ligand 1 immunotherapy. To advance beyond the results achieved in this study, more comprehensive data on composite multiomic signatures are required.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Carcinoma, Non-Small-Cell Lung / mortality*
  • Carcinoma, Non-Small-Cell Lung / pathology*
  • Carcinoma, Non-Small-Cell Lung / therapy
  • Combined Modality Therapy
  • Humans
  • Kaplan-Meier Estimate
  • Lung Neoplasms / mortality*
  • Lung Neoplasms / pathology*
  • Lung Neoplasms / therapy
  • Machine Learning*
  • Models, Biological*
  • Molecular Targeted Therapy
  • Neoplasm Metastasis
  • Neoplasm Staging
  • Prognosis
  • Randomized Controlled Trials as Topic
  • Stochastic Processes*
  • Treatment Outcome
  • Tumor Burden