Development and Evaluation of a Machine Learning Model for the Early Identification of Patients at Risk for Sepsis

Ryan J Delahanty; JoAnn Alvarez; Lisa M Flynn; Robert L Sherwin; Spencer S Jones

doi:10.1016/j.annemergmed.2018.11.036

Ann Emerg Med. 2019 Apr;73(4):334-344. doi: 10.1016/j.annemergmed.2018.11.036. Epub 2019 Jan 17.

Authors

Ryan J Delahanty¹, JoAnn Alvarez¹, Lisa M Flynn¹, Robert L Sherwin², Spencer S Jones³

Affiliations

¹ Tenet Healthcare Corporation, Nashville, TN.
² Department of Emergency Medicine, Wayne State University, Detroit, MI.
³ Tenet Healthcare Corporation, Nashville, TN. Electronic address: ssj1364@gmail.com.

PMID: 30661855
DOI: 10.1016/j.annemergmed.2018.11.036

Abstract

Study objective: The Third International Consensus Definitions (Sepsis-3) Task Force recommended the use of the quick Sequential [Sepsis-related] Organ Failure Assessment (qSOFA) score to screen patients for sepsis outside of the ICU. However, subsequent studies raise concerns about the sensitivity of qSOFA as a screening tool. We aim to use machine learning to develop a new sepsis screening tool, the Risk of Sepsis (RoS) score, and compare it with a slate of benchmark sepsis-screening tools, including the Systemic Inflammatory Response Syndrome, Sequential Organ Failure Assessment (SOFA), qSOFA, Modified Early Warning Score, and National Early Warning Score.

Methods: We used retrospective electronic health record data from adult patients who presented to 49 urban community hospital emergency departments during a 22-month period (N=2,759,529). We used the Rhee clinical surveillance criteria as our standard definition of sepsis and as the primary target for developing our model. The data were randomly split into training and test cohorts to derive and then evaluate the model. A feature selection process was carried out in 3 stages: first, we reviewed existing models for sepsis screening; second, we consulted with local subject matter experts; and third, we used a supervised machine learning method called gradient boosting. Key metrics of performance included alert rate, area under the receiver operating characteristic curve, sensitivity, specificity, and precision. Performance was assessed at 1, 3, 6, 12, and 24 hours after an index time.
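For readers unfamiliar with the performance metrics named above, the following is a minimal illustrative sketch of how they are conventionally computed from a screening score and a binary sepsis label. It is not the authors' code: the function names, the 0.5 alert threshold, and the toy data are all assumptions made for illustration, and the abstract does not state how the RoS alert threshold was chosen.

```python
from typing import Dict, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def screening_metrics(alerts: Sequence[int], labels: Sequence[int]) -> Dict[str, float]:
    """Compute alert rate, sensitivity, specificity, and precision.

    alerts[i] is 1 if the screening tool fired for encounter i, else 0.
    labels[i] is 1 if encounter i met the reference sepsis definition, else 0.
    """
    if len(alerts) != len(labels) or len(alerts) == 0:
        raise ValueError("alerts and labels must be non-empty and equal in length")
    pairs = list(zip(alerts, labels))
    tp = sum(1 for a, y in pairs if a == 1 and y == 1)  # true positives
    fp = sum(1 for a, y in pairs if a == 1 and y == 0)  # false positives
    fn = sum(1 for a, y in pairs if a == 0 and y == 1)  # false negatives
    tn = sum(1 for a, y in pairs if a == 0 and y == 0)  # true negatives
    return {
        "alert_rate": _ratio(tp + fp, len(pairs)),  # share of encounters flagged
        "sensitivity": _ratio(tp, tp + fn),         # flagged share of septic encounters
        "specificity": _ratio(tn, tn + fp),         # unflagged share of non-septic encounters
        "precision": _ratio(tp, tp + fp),           # septic share of flagged encounters
    }


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case, counting ties as one half.
    This O(n_pos * n_neg) form is only suitable for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 8 encounters, 3 septic.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
ALERT_THRESHOLD = 0.5  # assumed cutoff for illustration only
toy_alerts = [1 if s >= ALERT_THRESHOLD else 0 for s in toy_scores]

toy_metrics = screening_metrics(toy_alerts, toy_labels)
toy_auroc = auroc(toy_scores, toy_labels)
```

Alert rate, sensitivity, specificity, and precision depend on the chosen alert threshold, whereas the area under the receiver operating characteristic curve summarizes discrimination across all thresholds. In a study design like the one described, these quantities would be recomputed at each assessment time (1, 3, 6, 12, and 24 hours after the index time).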

Results: The RoS score was the most discriminant screening tool at all time thresholds (area under the receiver operating characteristic curve 0.93 to 0.97). Compared with the next most discriminant benchmark (Sequential Organ Failure Assessment), RoS was significantly more sensitive (67.7% versus 49.2% at 1 hour and 84.6% versus 80.4% at 24 hours) and precise (27.6% versus 12.2% at 1 hour and 28.8% versus 11.4% at 24 hours). The sensitivity of qSOFA was relatively low (3.7% at 1 hour and 23.5% at 24 hours).

Conclusion: In this retrospective study, RoS was more timely and discriminant than benchmark screening tools, including those recommended by the Sepsis-3 Task Force. Further study is needed to validate the RoS score at independent sites.

Publication types

Evaluation Study
Multicenter Study
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Aged
Early Diagnosis
Female
Hospitals, Urban
Humans
Lactic Acid / metabolism
Machine Learning*
Male
Middle Aged
Organ Dysfunction Scores
Retrospective Studies
Sensitivity and Specificity
Sepsis / diagnosis*
Severity of Illness Index

Substances

Lactic Acid