Explainable machine learning and feature interpretation to predict survival outcomes in the treatment of lung cancer

Semin Oncol. 2025 Jun;52(3):152364. doi: 10.1016/j.seminoncol.2025.152364. Epub 2025 May 24.

Abstract

The treatment outcomes of lung cancer are highly variable, and machine learning (ML) models provide valuable insights into how clinical and biochemical factors influence survival across different treatments. This study investigates patient survival after four major treatments for lung cancer, using SHapley Additive exPlanations (SHAP) to interpret the impact of biomarkers on survival. We analyzed 23,658 lung cancer patient records derived from a Kaggle dataset. Using the most relevant clinical and biochemical variables, ML models were employed to study survival outcomes for different treatments. SHAP analysis revealed the major survival predictors for each treatment. Survival outcomes are visualized as f(x) (predicted survival) and E[f(x)] (baseline expectation) in SHAP waterfall plots. The best-performing model was Gradient Boosting, with an accuracy of 88.99%, precision of 89.06%, recall of 88.99%, F1-score of 88.91%, and an area under the receiver operating characteristic curve (AUC-ROC) of 0.9332. Chemotherapy was associated with favorable survival; the key contributors were phosphorus levels (+0.05), low alanine aminotransferase levels (+0.04), and low glucose levels (+0.04). Targeted therapy and radiation were associated with worse survival, whereas surgery was favorable, especially in cases with high white blood cell and lactate dehydrogenase (LDH) levels. SHAP-based ML analysis highlights how clinical and biochemical factors influence survival. These findings suggest that ML-driven interpretability could support personalized treatment approaches in lung cancer.
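
The relationship between f(x) and E[f(x)] in a SHAP waterfall plot can be illustrated with a minimal sketch: the per-feature contributions sum exactly to the gap between the prediction for one patient and the baseline expectation. The example below computes exact Shapley values by brute force for a toy scoring function. The function `toy_model`, its coefficients, the feature values, and the background rows are all hypothetical and chosen only for illustration; they are not the study's Gradient Boosting model or its Kaggle data, and the `shap` library itself is not used here.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for one instance `x` against a background set.

    Features outside a coalition are filled in from each background row,
    and the model output is averaged over those rows.
    """
    n = len(x)

    def value(subset):
        # Expected model output with the features in `subset` fixed to x.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)

    # value(set()) is the baseline expectation E[f(x)] over the background.
    return value(set()), phi


def toy_model(features):
    # Hypothetical survival score; coefficients are illustrative only.
    phosphorus, alt, glucose = features
    score = 0.5 + 0.1 * phosphorus - 0.002 * alt - 0.001 * glucose
    if alt < 30 and glucose < 100:
        score += 0.05  # simple interaction term
    return score


feature_names = ["phosphorus", "ALT", "glucose"]
background = [  # hypothetical reference patients
    [3.0, 45.0, 130.0],
    [3.5, 25.0, 95.0],
    [2.8, 60.0, 150.0],
    [4.1, 35.0, 110.0],
]
patient = [4.0, 20.0, 90.0]  # hypothetical patient to explain

base_value, phi = shapley_values(toy_model, patient, background)
prediction = toy_model(patient)

print(f"E[f(x)] = {base_value:.4f}")
for name, contribution in zip(feature_names, phi):
    print(f"  {name:<11}{contribution:+.4f}")
print(f"f(x)    = {prediction:.4f}")
```

Each printed row corresponds to one bar of a waterfall plot: starting from E[f(x)], the signed contributions are added one at a time until the total reaches f(x). Brute-force enumeration is only practical for a handful of features; for tree ensembles, dedicated algorithms are normally used instead.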

Keywords: Ensemble techniques; Lung cancer; Machine learning models; Precision oncology; SHAP analysis; Survival prediction.

MeSH terms

  • Aged
  • Biomarkers, Tumor
  • Female
  • Humans
  • Lung Neoplasms* / mortality
  • Lung Neoplasms* / therapy
  • Machine Learning*
  • Male
  • Middle Aged
  • Prognosis
  • Treatment Outcome

Substances

  • Biomarkers, Tumor