Machine learning for early detection of sepsis: an internal and temporal validation study

Armando D Bedoya; Joseph Futoma; Meredith E Clement; Kristin Corey; Nathan Brajer; Anthony Lin; Morgan G Simons; Michael Gao; Marshall Nichols; Suresh Balu; Katherine Heller; Mark Sendak; Cara O'Brien

doi:10.1093/jamiaopen/ooaa006

JAMIA Open. 2020 Apr 11;3(2):252-260. doi: 10.1093/jamiaopen/ooaa006. eCollection 2020 Jul.

Authors

Armando D Bedoya¹, Joseph Futoma²,³, Meredith E Clement⁴, Kristin Corey⁵,⁶, Nathan Brajer⁵,⁶, Anthony Lin⁵,⁶, Morgan G Simons⁵,⁶, Michael Gao⁵, Marshall Nichols⁵, Suresh Balu⁵,⁶, Katherine Heller², Mark Sendak⁵, Cara O'Brien⁷

Affiliations

¹ Department of Medicine, Division of Pulmonary, Allergy, and Critical Care Medicine, Duke University, Durham, North Carolina, USA.
² Department of Statistics, Duke University, Durham, North Carolina, USA.
³ John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, USA.
⁴ Department of Medicine, Division of Infectious Diseases, Duke University, Durham, North Carolina, USA.
⁵ Duke Institute for Health Innovation, Durham, North Carolina, USA.
⁶ Duke University School of Medicine, Durham, North Carolina, USA.
⁷ Department of Medicine, Durham, North Carolina, USA.

Abstract

Objective: To determine whether deep learning detects sepsis earlier and more accurately than other models, and to evaluate model performance using implementation-oriented metrics that simulate clinical practice.

Materials and methods: We trained, internally validated, and temporally validated a deep learning model (multi-output Gaussian process and recurrent neural network [MGP-RNN]) to detect sepsis using encounters from adult hospitalized patients at a large tertiary academic center. Sepsis was defined as the presence of 2 or more systemic inflammatory response syndrome (SIRS) criteria, a blood culture order, and at least one element of end-organ failure. The training dataset included demographics, comorbidities, vital signs, medication administrations, and labs from October 1, 2014 to December 1, 2015, while the temporal validation dataset covered March 1, 2018 to August 31, 2018. The model was compared with 3 machine learning methods (random forest [RF], Cox regression [CR], and penalized logistic regression [PLR]) and with 3 clinical scores used to detect sepsis (SIRS, quick Sequential Organ Failure Assessment [qSOFA], and National Early Warning Score [NEWS]). We assessed traditional discrimination statistics, such as the C-statistic, as well as metrics aligned with operational implementation.
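The sepsis definition above can be illustrated with a minimal sketch. It is not the study's labeling code. The SIRS thresholds are the standard published criteria; the field names, the `organ_failure_elements` count, and the simplification that all three components are evaluated on a single snapshot (with no time window between them) are assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Hypothetical per-patient snapshot; all field names are illustrative."""
    temp_c: Optional[float] = None        # temperature, degrees Celsius
    heart_rate: Optional[float] = None    # beats per minute
    resp_rate: Optional[float] = None     # breaths per minute
    paco2_mmhg: Optional[float] = None    # arterial CO2 partial pressure
    wbc_k_per_ul: Optional[float] = None  # white cell count, thousands per microliter
    bands_pct: Optional[float] = None     # immature neutrophils, percent
    blood_culture_ordered: bool = False
    organ_failure_elements: int = 0       # assumed: count supplied by upstream logic


def sirs_count(s: Snapshot) -> int:
    """Count how many of the four standard SIRS criteria are met."""
    temp = s.temp_c is not None and (s.temp_c > 38.0 or s.temp_c < 36.0)
    heart = s.heart_rate is not None and s.heart_rate > 90
    resp = (s.resp_rate is not None and s.resp_rate > 20) or (
        s.paco2_mmhg is not None and s.paco2_mmhg < 32
    )
    wbc = (
        s.wbc_k_per_ul is not None
        and (s.wbc_k_per_ul > 12.0 or s.wbc_k_per_ul < 4.0)
    ) or (s.bands_pct is not None and s.bands_pct > 10)
    # Missing measurements are treated as "criterion not met" (an assumption).
    return sum([temp, heart, resp, wbc])


def meets_sepsis_definition(s: Snapshot) -> bool:
    """>= 2 SIRS criteria AND a blood culture order AND >= 1 organ-failure element."""
    return (
        sirs_count(s) >= 2
        and s.blood_culture_ordered
        and s.organ_failure_elements >= 1
    )
```

For example, a snapshot with temperature 38.6 °C, heart rate 112, a blood culture order, and one organ-failure element would satisfy this sketch of the definition, while the same vitals without a culture order would not.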

Results: The training and internal validation sets included 42 979 encounters, while the temporal validation set included 39 786 encounters. The C-statistic for predicting sepsis within 4 h of onset was 0.88 for the MGP-RNN, compared to 0.836 for RF, 0.849 for CR, 0.822 for PLR, 0.756 for SIRS, 0.619 for NEWS, and 0.481 for qSOFA. The MGP-RNN detected sepsis a median of 5 h in advance. In temporal validation, the MGP-RNN continued to outperform all 7 clinical risk score and machine learning comparisons.
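For readers less familiar with the C-statistic reported above, the following is a minimal sketch of how it can be computed for a binary outcome, where it equals the area under the ROC curve. It is not the study's evaluation code: the function name and the toy inputs are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def c_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen positive case receives a higher
    score than a randomly chosen negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy example with made-up risk scores (not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_c = c_statistic(toy_labels, toy_scores)  # 3 of 4 pairs concordant: 0.75
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why a value of 0.481 indicates no useful discrimination for this prediction task.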

Conclusions: We developed and validated a novel deep learning model to detect sepsis. Using our data elements and feature set, our modeling approach outperformed other machine learning methods and clinical scores.

Keywords: ROC curve; adult; decision support systems, clinical; electronic health records/statistics and numerical data; emergency service, hospital/statistics and numerical data; hospitalization/statistics and numerical data; machine learning; retrospective studies; sepsis/mortality.