Background: Accurate mortality risk prediction is crucial for effective cardiovascular risk management. Recent advancements in artificial intelligence (AI) have demonstrated potential in this specific medical field. Qwen-2 and Llama-3 are high-performance, open-source large language models (LLMs) available online. An artificial neural network (ANN) algorithm derived from the SWEDEHEART (Swedish Web System for Enhancement and Development of Evidence-Based Care in Heart Disease Evaluated According to Recommended Therapies) registry, termed SWEDEHEART-AI, can predict patient prognosis following acute myocardial infarction (AMI).
Objective: This study aims to evaluate the 3 models mentioned above in predicting 1-year all-cause mortality in critically ill patients with AMI.
Methods: The Medical Information Mart for Intensive Care IV (MIMIC-IV) database is a publicly available data set in critical care medicine. We included 2758 patients who were first admitted for AMI and discharged alive. SWEDEHEART-AI calculated the mortality rate based on each patient's 21 clinical variables. Qwen-2 and Llama-3 analyzed the content of patients' discharge records and directly provided a 1-decimal value between 0 and 1 to represent 1-year death risk probabilities. The patients' actual mortality was verified using follow-up data. The predictive performance of the 3 models was assessed and compared using the Harrell C-statistic (C-index), the area under the receiver operating characteristic curve (AUROC), calibration plots, Kaplan-Meier curves, and decision curve analysis.
Results: SWEDEHEART-AI demonstrated strong discrimination in predicting 1-year all-cause mortality in patients with AMI, with a higher C-index than Qwen-2 and Llama-3 (C-index 0.72, 95% CI 0.69-0.74 vs C-index 0.65, 0.62-0.67 vs C-index 0.56, 95% CI 0.53-0.58, respectively; all P<.001 for both comparisons). SWEDEHEART-AI also showed high and consistent AUROC in the time-dependent ROC curve. The death rates calculated by SWEDEHEART-AI were positively correlated with actual mortality, and the 3 risk classes derived from this model showed clear differentiation in the Kaplan-Meier curve (P<.001). Calibration plots indicated that SWEDEHEART-AI tended to overestimate mortality risk, with an observed-to-expected ratio of 0.478. Compared with the LLMs, SWEDEHEART-AI demonstrated positive and greater net benefits at risk thresholds below 19%.
Conclusions: SWEDEHEART-AI, a trained ANN model, demonstrated the best performance, with strong discrimination and clinical utility in predicting 1-year all-cause mortality in patients with AMI from an intensive care cohort. Among the LLMs, Qwen-2 outperformed Llama-3 and showed moderate predictive value. Qwen-2 and SWEDEHEART-AI exhibited comparable classification effectiveness. The future integration of LLMs into clinical decision support systems holds promise for accurate risk stratification in patients with AMI; however, further research is needed to optimize LLM performance and address calibration issues across diverse patient populations.
Keywords: artificial neural network; large language model; myocardial infarction; prediction model; risk assessment.
©Boqun Shi, Liangguo Chen, Shuo Pang, Yue Wang, Shen Wang, Fadong Li, Wenxin Zhao, Pengrong Guo, Leli Zhang, Chu Fan, Yi Zou, Xiaofan Wu. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 12.05.2025.