Comparison of correctly and incorrectly classified patients for in-hospital mortality prediction in the intensive care unit

Eline Stenwig; Giampiero Salvi; Pierluigi Salvo Rossi; Nils Kristian Skjærvold

doi:10.1186/s12874-023-01921-9

BMC Med Res Methodol. 2023 Apr 24;23(1):102. doi: 10.1186/s12874-023-01921-9.

Authors

Eline Stenwig¹, Giampiero Salvi²,³, Pierluigi Salvo Rossi², Nils Kristian Skjærvold⁴,⁵

Affiliations

¹ Department of Circulation and Medical Imaging, The Norwegian University of Science and Technology, Trondheim, Norway. eline.stenwig@ntnu.no.
² Department of Electronic Systems, The Norwegian University of Science and Technology, Trondheim, Norway.
³ KTH, Royal Institute of Technology, EECS, Stockholm, Sweden.
⁴ Department of Circulation and Medical Imaging, The Norwegian University of Science and Technology, Trondheim, Norway.
⁵ Clinic of Anaesthesia and Intensive Care Medicine, St. Olav's University Hospital, Trondheim, Norway.

Abstract

Background: Machine learning is becoming increasingly popular in many disciplines, but there is still a gap in implementing machine learning models in clinical settings. Lack of trust in models is one of the issues that must be addressed to close this gap. No model is perfect, and it is crucial to know in which cases a model can be trusted and in which it is less reliable.

Methods: Four different algorithms are trained on the eICU Collaborative Research Database, using features similar to those of the APACHE IV severity-of-disease scoring system, to predict in-hospital mortality in the ICU. The training and testing procedure is repeated 100 times on the same dataset to investigate whether predictions for individual patients change with small changes in the models. Features are then analysed separately to investigate potential differences between patients who are consistently classified correctly and those who are consistently classified incorrectly.

Results: A total of 34 056 patients (58.4%) are classified as true negative, 6 527 patients (11.3%) as false positive, 3 984 patients (6.8%) as true positive, and 546 patients (0.9%) as false negative. The remaining 13 108 patients (22.5%) are classified inconsistently across models and rounds. Histograms and distributions of feature values are compared visually to investigate differences between the groups.
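The grouping reported in the Results can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each patient's predictions from every model and round have already been binarized (1 = predicted in-hospital death) and collected in a list, and the function names `consistency_group` and `summarize` are hypothetical. A patient counts as consistently classified only if every prediction agrees.

```python
from collections import Counter


def consistency_group(predictions, outcome):
    """Assign one patient to a consistency group (hypothetical helper).

    predictions: 0/1 labels from every model and round (1 = predicted death).
    outcome: observed label (1 = died in hospital, 0 = survived).
    Returns "TN", "FP", "TP", "FN", or "inconsistent".
    """
    unique = set(predictions)
    if len(unique) != 1:
        # Models or rounds disagree (or there are no predictions at all).
        return "inconsistent"
    predicted = unique.pop()
    if predicted == 1:
        return "TP" if outcome == 1 else "FP"
    return "FN" if outcome == 1 else "TN"


def summarize(all_predictions, outcomes):
    """Count patients per group and express each count as a percentage of all patients."""
    groups = [consistency_group(p, y) for p, y in zip(all_predictions, outcomes)]
    counts = Counter(groups)
    total = len(groups)
    return {group: (n, round(100 * n / total, 1)) for group, n in counts.items()}


if __name__ == "__main__":
    # Four made-up patients, four predictions each (e.g. 4 models x 1 round).
    predictions = [
        [0, 0, 0, 0],  # always predicted to survive, survived -> TN
        [1, 1, 1, 1],  # always predicted to die, survived     -> FP
        [1, 1, 1, 1],  # always predicted to die, died         -> TP
        [0, 1, 0, 0],  # models disagree                       -> inconsistent
    ]
    outcomes = [0, 0, 1, 1]
    print(summarize(predictions, outcomes))
```

Each of the four toy patients lands in a different group, so every group gets one patient (25.0%). Requiring unanimous agreement is the strictest reading of "consistently classified"; a looser rule such as a majority vote would move some of the inconsistent patients into the other groups.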

Conclusions: It is impossible to distinguish the groups using single features alone. When a combination of features is considered, the difference between the groups is clearer. Incorrectly classified patients have features more similar to those of patients with the same prediction than to those of patients with the same outcome.
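One way to make the final conclusion concrete is to check whether a misclassified patient's feature vector lies closer to the patients who share its prediction or to those who share its outcome. The abstract reports a visual comparison of histograms and distributions, so the centroid-distance check below is a swapped-in illustration, not the authors' analysis. It assumes the features are already standardized to comparable scales, and `centroid` and `closer_to_prediction_group` are hypothetical names; the feature values are made up.

```python
import math
from statistics import mean


def centroid(group):
    """Feature-wise mean of a list of equal-length feature vectors."""
    return [mean(column) for column in zip(*group)]


def closer_to_prediction_group(x, same_prediction, same_outcome):
    """Return True if patient x is nearer (Euclidean distance) to the centroid
    of the group sharing its prediction than to the centroid of the group
    sharing its outcome. Assumes standardized features (hypothetical sketch)."""
    d_prediction = math.dist(x, centroid(same_prediction))
    d_outcome = math.dist(x, centroid(same_outcome))
    return d_prediction < d_outcome


if __name__ == "__main__":
    # A false-positive patient: predicted to die, but survived.
    false_positive = [1.2, 0.9]
    true_positives = [[1.0, 1.0], [1.4, 0.8]]      # same prediction (death)
    true_negatives = [[-1.0, -0.5], [-0.6, -0.7]]  # same outcome (survival)
    print(closer_to_prediction_group(false_positive, true_positives, true_negatives))
```

A single distance to a centroid ignores the spread of each group and correlations between features; comparing whole distributions, as the abstract describes, captures more of that structure.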

Keywords: Explainability; Machine learning; Mortality prediction; SHAP values; eICU.

MeSH terms

APACHE
Algorithms
Hospital Mortality
Humans
Intensive Care Units*
Machine Learning*