Machine learning approach for hemorrhagic transformation prediction: Capturing predictors' interaction

Front Neurol. 2022 Nov 24:13:951401. doi: 10.3389/fneur.2022.951401. eCollection 2022.

Abstract

Background and purpose: Patients with ischemic stroke frequently develop hemorrhagic transformation (HT), which can worsen the prognosis. The objectives of the current study were to determine the incidence and predictors of HT, to evaluate predictor interaction, and to identify the optimal prediction models.

Methods: A prospective study included 360 patients with ischemic stroke, of whom 354 completed the study. Patients underwent thorough general and neurological examination and T2 diffusion-weighted MRI at admission and 1 week later to determine the incidence of HT. HT predictors were selected by a filter-based minimum redundancy maximum relevance (mRMR) algorithm, independent of model performance. Several machine learning algorithms, including multivariable logistic regression classifier (LRC), support vector classifier (SVC), random forest classifier (RFC), gradient boosting classifier (GBC), and multilayer perceptron classifier (MLPC), were optimized for HT prediction in a randomly selected half of the sample (training set) and tested in the other half (testing set). Model predictive performance was evaluated using receiver operating characteristic (ROC) analysis and visualized by observing case distribution relative to the models' predicted three-dimensional (3D) hypothesis spaces within the true feature space of the testing dataset. The interaction between predictors was investigated using generalized additive modeling (GAM).
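
The split-and-evaluate step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names `split_half` and `roc_auc`, the seed, and the toy labels and scores are assumptions introduced here. The AUC is computed as the probability that a randomly chosen HT case receives a higher score than a randomly chosen non-HT case, which is equivalent to the area under the ROC curve.

```python
import random


def split_half(items, seed=0):
    """Randomly assign half of the items to a training set and the rest to a testing set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility (seed is an assumption)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical patient identifiers for the 354 patients who completed the study.
train_ids, test_ids = split_half(range(354))

# Toy labels (1 = HT) and model scores, invented purely to exercise the function.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In practice a library routine would normally replace `roc_auc`; the pairwise form is shown only because it makes the meaning of the statistic explicit.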

Results: The incidence of HT in patients with ischemic stroke was 19.8%. Infarction size, cerebral microbleeds (CMB), and the National Institutes of Health Stroke Scale (NIHSS) score were identified as the best HT predictors. RFC (AUC: 0.91, 95% CI: 0.85-0.95) and GBC (AUC: 0.91, 95% CI: 0.86-0.95) performed significantly better than LRC (AUC: 0.85, 95% CI: 0.79-0.91) and MLPC (AUC: 0.85, 95% CI: 0.78-0.92). SVC (AUC: 0.90, 95% CI: 0.85-0.94) outperformed LRC and MLPC, but the difference did not reach statistical significance. LRC and MLPC did not differ significantly. The best models' 3D hypothesis spaces demonstrated non-linear decision boundaries, suggesting an interaction between predictor variables. GAM analysis demonstrated a significant linear interaction between NIHSS and CMB and a significant non-linear interaction between NIHSS and infarction size.

Conclusion: Cerebral microbleeds, NIHSS, and infarction size were identified as HT predictors. The best prediction models were RFC and GBC, both of which can capture non-linear interaction between predictors. Predictor interaction suggests a dynamic, rather than fixed, cutoff risk value for each of these predictors.
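
To illustrate why an interaction implies a dynamic rather than fixed cutoff, consider a simple logistic model with an NIHSS × CMB product term. This is a deliberately simplified stand-in for the GAM used in the study, and every number in it is hypothetical: the coefficients are chosen only for illustration and are not estimates from the data, and CMB is treated as a count, which is an assumption. With an interaction term, the NIHSS value at which predicted risk crosses a given threshold depends on the CMB value.

```python
import math

# Hypothetical coefficients, for illustration only (not estimates from the study).
B0, B_NIHSS, B_CMB, B_INTER = -4.0, 0.2, 0.5, 0.05


def ht_risk(nihss, cmb):
    """Predicted HT probability from a logistic model with an NIHSS x CMB interaction."""
    z = B0 + B_NIHSS * nihss + B_CMB * cmb + B_INTER * nihss * cmb
    return 1.0 / (1.0 + math.exp(-z))


def nihss_cutoff(cmb, risk=0.5):
    """NIHSS value at which predicted risk equals `risk` for a given CMB count."""
    target = math.log(risk / (1.0 - risk))  # logit of the risk threshold
    return (target - B0 - B_CMB * cmb) / (B_NIHSS + B_INTER * cmb)


# The cutoff shifts as CMB changes instead of staying at a single value.
cutoffs = {cmb: nihss_cutoff(cmb) for cmb in (0, 2, 4)}
```

Under these made-up coefficients the NIHSS cutoff for a 50% predicted risk falls as the CMB count rises, which is the sense in which a cutoff for one predictor is conditional on the others.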

Keywords: NIHSS; cerebral microbleeds; hemorrhagic transformation; infarction size; ischemic stroke; machine learning.