Machine Learning Versus Usual Care for Diagnostic and Prognostic Prediction in the Emergency Department: A Systematic Review

Hashim Kareemi; Christian Vaillancourt; Hans Rosenberg; Karine Fournier; Krishan Yadav

doi:10.1111/acem.14190

Acad Emerg Med. 2021 Feb;28(2):184-196. doi: 10.1111/acem.14190. Epub 2021 Jan 2.

Authors

Hashim Kareemi¹, Christian Vaillancourt¹ ², Hans Rosenberg¹, Karine Fournier³, Krishan Yadav¹ ²

Affiliations

¹ Department of Emergency Medicine, University of Ottawa, Ottawa, Ontario, Canada.
² Ottawa Hospital Research Institute, Ottawa, Ontario, Canada.
³ Health Sciences Library, University of Ottawa, Ottawa, Ontario, Canada.

PMID: 33277724
DOI: 10.1111/acem.14190

Abstract

Objective: Machine learning (ML) has shown promise in other medical fields. We sought to determine whether ML models perform better than usual care in diagnostic and prognostic prediction for emergency department (ED) patients.

Methods: In this systematic review, we searched MEDLINE, Embase, Central, and CINAHL from inception to October 17, 2019. We included studies comparing diagnostic and prognostic prediction of ED patients by ML models to usual care methods (triage-based scores, clinical prediction tools, clinician judgment) using predictor variables readily available to ED clinicians. We extracted commonly reported performance metrics of model discrimination and classification. We used the PROBAST tool for risk of bias assessment (PROSPERO registration: CRD42020158129).

Results: The search yielded 1,656 unique records, of which 23 studies involving 16,274,647 patients were included. In all seven diagnostic studies, ML models outperformed usual care in all performance metrics. In six studies assessing in-hospital mortality, the best-performing ML models had better discrimination (area under the receiver operating characteristic curve [AUROC] = 0.74-0.94) than any clinical decision tool (AUROC = 0.68-0.81). In four studies assessing hospitalization, ML models had better discrimination (AUROC = 0.80-0.83) than triage-based scores (AUROC = 0.68-0.82). Clinical heterogeneity precluded meta-analysis. Most studies had high risk of bias due to lack of external validation, low event rates, and insufficient reporting of calibration.
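
For readers less familiar with the discrimination metric reported above, the following is a minimal sketch of how an AUROC can be computed and compared between two predictors. It is illustrative only and is not taken from the review or from any included study: the function name `auroc`, the toy outcome vector, and both sets of scores are hypothetical. The AUROC is computed here as the proportion of positive-negative case pairs in which the positive case receives the higher score, with ties counted as half.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    is scored higher than a randomly chosen negative case (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (NOT from the review): 1 = outcome occurred, 0 = did not.
outcome = [0, 0, 1, 0, 1, 1]
ml_risk = [0.10, 0.70, 0.80, 0.20, 0.60, 0.90]   # assumed ML-predicted probabilities
triage_score = [1, 3, 2, 1, 3, 2]                # assumed ordinal triage-style score

print(f"ML model AUROC:     {auroc(outcome, ml_risk):.2f}")
print(f"Triage score AUROC: {auroc(outcome, triage_score):.2f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. Discrimination alone says nothing about calibration, which the review notes was insufficiently reported in most studies.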

Conclusions: Our review suggests that ML may have better prediction performance than usual care for ED patients with a variety of clinical presentations and outcomes. However, prediction model reporting guidelines should be followed to provide clinically applicable data. Interventional trials are needed to assess the impact of ML models on patient-centered outcomes.

Publication types

Meta-Analysis
Systematic Review

MeSH terms

Emergency Service, Hospital*
Hospital Mortality
Humans
Machine Learning*
Prognosis
Triage