Predicting 30-day mortality after ST elevation myocardial infarction: Machine learning-based random forest and its external validation using two independent nationwide datasets

J Cardiol. 2021 Nov;78(5):439-446. doi: 10.1016/j.jjcc.2021.06.002. Epub 2021 Jun 19.

Abstract

Background: Various prognostic models for mortality prediction following ST-segment elevation myocardial infarction (STEMI) have been developed over the past two decades. Our group has previously demonstrated that machine learning (ML)-based models can outperform known risk scores for 30-day mortality post-STEMI. The study aimed to redevelop an ML-based random forest prediction model for 30-day mortality post-STEMI and externally validate it on a large cohort.

Methods: This was a retrospective, supervised learning, data mining study. Models were developed on the Acute Coronary Syndrome Israeli Survey (ACSIS) registry and externally validated on the Myocardial Ischemia National Audit Project (MINAP). Included patients received reperfusion therapy for STEMI between 2006 and 2016. Discrimination and calibration performance were assessed for the two developed models and compared with the Global Registry of Acute Cardiac Events (GRACE) score.

Results: The ACSIS cohort (2,782 included/15,212 total) and the MINAP cohort (22,693 included/735,000 total) differed significantly in most variables, yet had similar 30-day mortality rates (4.3-4.4%). Random forest models were developed on the ACSIS cohort: a full model including all 32 variables and a simple model including the 10 most important ones. Feature importance was calculated using the varImp function, which measures how much each feature contributes to the data's homogeneity. Applying the optimized models to the MINAP validation cohort showed high discrimination, with area under the curve (AUC) = 0.804 (0.786-0.822) for the full model and AUC = 0.787 (0.748-0.780) for the simple model, compared with AUC = 0.764 (0.748-0.780) for the GRACE risk score. None of the models was well calibrated for the MINAP data. Following Platt scaling on 20% of the MINAP data, the calibration of the random forest models improved, while the GRACE calibration did not change.
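The Platt scaling step mentioned in the Results can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the names (fit_platt, raw, died) are hypothetical, and the scaling is applied to the logit of the predicted probability, which is one common variant. The idea is to fit a two-parameter logistic map p = sigmoid(a * score + b) on a held-out 20% of the external cohort and then apply it to the remaining 80%.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p, eps=1e-6):
    # Clip to avoid infinite values at 0 or 1.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fit_platt(scores, labels, n_iter=25, ridge=1e-9):
    """Fit p = sigmoid(a * score + b) by Newton's method on the log-loss."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            w = p * (1.0 - p)
            g_a += (p - y) * s      # gradient w.r.t. a
            g_b += (p - y)          # gradient w.r.t. b
            h_aa += w * s * s       # Hessian entries
            h_ab += w * s
            h_bb += w
        h_aa += ridge               # tiny ridge keeps the Hessian invertible
        h_bb += ridge
        det = h_aa * h_bb - h_ab * h_ab
        # Newton update: subtract H^{-1} g (2x2 inverse written out).
        a -= (h_bb * g_a - h_ab * g_b) / det
        b -= (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Synthetic stand-in for an external cohort (assumption, not study data):
# the "raw" model systematically over-predicts risk.
rng = random.Random(0)
raw = [rng.uniform(0.02, 0.60) for _ in range(5000)]
true_p = [sigmoid(logit(r) - 1.5) for r in raw]
died = [1 if rng.random() < p else 0 for p in true_p]

# Use 20% of the cohort to fit the calibration map, the rest to evaluate it.
n_cal = len(raw) // 5
a, b = fit_platt([logit(r) for r in raw[:n_cal]], died[:n_cal])

eval_raw = raw[n_cal:]
eval_died = died[n_cal:]
eval_cal = [sigmoid(a * logit(r) + b) for r in eval_raw]

# Calibration-in-the-large: mean predicted risk vs. observed event rate.
observed = sum(eval_died) / len(eval_died)
gap_before = abs(sum(eval_raw) / len(eval_raw) - observed)
gap_after = abs(sum(eval_cal) / len(eval_cal) - observed)
print(f"gap before: {gap_before:.3f}, gap after: {gap_after:.3f}")
```

Because the fitted map is monotonic whenever a > 0, it leaves the ranking of patients, and therefore the AUC, unchanged; only the calibration of the predicted probabilities moves.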

Conclusions: The random forest predictive model for 30-day mortality post-STEMI, developed on the ACSIS national registry, has been validated in the large external MINAP cohort and can be applied early at admission for risk stratification. The model performed better than the commonly used GRACE score. Furthermore, to the best of our knowledge, this is the first externally validated ML-based model for STEMI.

Keywords: Data mining; Machine learning; Mortality; Outcome; ST-segment elevation myocardial infarction.

MeSH terms

  • Acute Coronary Syndrome*
  • Humans
  • Machine Learning
  • Registries
  • Retrospective Studies
  • Risk Assessment
  • Risk Factors
  • ST Elevation Myocardial Infarction*