Machine learning based survival prediction in Glioma using large-scale registry data

Health Informatics J. 2022 Oct-Dec;28(4):14604582221135427. doi: 10.1177/14604582221135427.

Abstract

Gliomas are the most common central nervous system tumors and exhibit poor clinical outcomes. The ability to estimate prognosis is crucial for both patients and providers when selecting the most appropriate treatment. Machine learning (ML) enables sophisticated approaches to survival prediction that use real-world clinical parameters to improve predictive accuracy. We applied a Cox proportional hazards (CPH) model, a support vector machine (SVM) model, and a random forest (RF) model to a large glioma dataset (3462 patients, diagnosed 2000-2018) to identify the best approach to survival prediction. The features used were age, sex, surgical resection status, tumor histology, tumor site, administration of radiation therapy (RT), and chemotherapy status. The concordance index (c-index) was used to assess the accuracy of survival time prediction. All three models performed well (c-index 0.767, 0.771, and 0.57 for the CPH, SVM, and RF models, respectively), with the best performance achieved when incorporating RT and chemotherapy administration status, which emerged as key predictive features. Similar prediction accuracy was achieved within the subset of glioblastoma patients. These findings should prompt stricter clinician oversight of registry data accuracy through quality assurance as we move towards meaningful predictive ability using ML approaches in glioma.
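
To illustrate the evaluation metric named in the abstract, the following is a minimal sketch of a concordance index for right-censored survival data. It assumes Harrell's formulation; the abstract does not state which c-index variant or software the authors used. The function name `concordance_index` and the toy values are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index (illustrative sketch, not the authors' code).

    times       -- observed follow-up times
    events      -- 1 if death was observed, 0 if the subject was censored
    risk_scores -- model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped to keep the sketch simple
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk was assigned to the earlier death
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented purely for illustration.
times = [5, 10, 12, 20]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(times, events, risk_scores))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of comparable patient pairs, which is the scale on which the reported values should be read.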

Keywords: Artificial intelligence; cancer registry; data mining; electronic health records; machine learning.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Intramural

MeSH terms

  • Glioma* / diagnosis
  • Glioma* / therapy
  • Humans
  • Machine Learning
  • Prognosis
  • Registries
  • Support Vector Machine