The Development and Validation of a Machine Learning Model to Predict Bacteremia and Fungemia in Hospitalized Patients Using Electronic Health Record Data

Crit Care Med. 2020 Nov;48(11):e1020-e1028. doi: 10.1097/CCM.0000000000004556.

Abstract

Objectives: Bacteremia and fungemia can cause life-threatening illness with high mortality rates, which increase with delays in antimicrobial therapy. The objective of this study is to develop machine learning models to predict blood culture results at the time of the blood culture order using routine data in the electronic health record.

Design: Retrospective analysis of a large, multicenter inpatient dataset.

Setting: Two academic tertiary medical centers between the years 2007 and 2018.

Subjects: All hospitalized patients who received a blood culture during hospitalization.

Interventions: The dataset was partitioned temporally into development and validation cohorts: the logistic regression and gradient boosting machine models were trained on the earliest 80% of hospital admissions and validated on the most recent 20%.
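The temporal partition described above can be sketched as follows. This is a minimal illustration, not the study's code: the record layout, the `admit_date` key, and the `temporal_split` helper are assumptions, and the toy admissions are made up.

```python
from datetime import date


def temporal_split(admissions, train_fraction=0.8):
    """Split records into (development, validation) by admission date.

    `admissions` is a list of dicts with a hypothetical 'admit_date' key.
    The earliest `train_fraction` of admissions form the development
    cohort; the most recent remainder forms the validation cohort.
    """
    ordered = sorted(admissions, key=lambda a: a["admit_date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy, made-up admissions (not study data): one per year from 2007.
toy = [{"id": i, "admit_date": date(2007 + i, 1, 1)} for i in range(10)]
development, validation = temporal_split(toy)
```

Splitting by time rather than at random means every validation admission occurs after every development admission, so the validation cohort approximates how a deployed model would meet future patients.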

Measurements and main results: There were 252,569 blood culture days, defined as nonoverlapping 24-hour periods in which one or more blood cultures were ordered. In the validation cohort, there were 50,514 blood culture days, with 3,762 cases of bacteremia (7.5%) and 370 cases of fungemia (0.7%). The gradient boosting machine model for bacteremia had significantly higher area under the receiver operating characteristic curve (0.78 [95% CI 0.77-0.78]) than the logistic regression model (0.73 [0.72-0.74]) (p < 0.001). The model identified a high-risk group with over 30 times the occurrence rate of bacteremia in the low-risk group (27.4% vs 0.9%; p < 0.001). Using the low-risk cut-off, the model identified bacteremia with 98.7% sensitivity. The gradient boosting machine model for fungemia had high discrimination (area under the receiver operating characteristic curve 0.88 [95% CI 0.86-0.90]). The high-risk fungemia group had 252 fungemic cultures compared with one fungemic culture in the low-risk group (5.0% vs 0.02%; p < 0.001). Further, the high-risk group had a mortality rate 60 times higher than the low-risk group (28.2% vs 0.4%; p < 0.001).
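The risk-group quantities reported above (sensitivity at a low-risk cut-off and the event rate within each group) can be computed along these lines. This is a sketch under stated assumptions: the abstract does not give the cut-off values, so `low_cut` and `high_cut`, the `risk_group_summary` helper, and the toy scores and labels are all hypothetical.

```python
def risk_group_summary(scores, labels, low_cut, high_cut):
    """Summarize a risk model at two hypothetical probability cut-offs.

    Scores below `low_cut` are treated as low risk; scores at or above
    `high_cut` are treated as high risk. Labels are 1 for a positive
    culture and 0 otherwise.
    """
    pairs = list(zip(scores, labels))
    positives = sum(labels)
    # Sensitivity at the low-risk cut-off: the share of positive
    # cultures that are NOT placed in the low-risk group.
    flagged = sum(1 for s, y in pairs if y == 1 and s >= low_cut)
    low = [y for s, y in pairs if s < low_cut]
    high = [y for s, y in pairs if s >= high_cut]
    return {
        "sensitivity_at_low_cut": flagged / positives,
        "low_risk_rate": sum(low) / len(low),
        "high_risk_rate": sum(high) / len(high),
    }


# Toy, made-up predictions and outcomes (not study data).
toy_scores = [0.02, 0.03, 0.04, 0.10, 0.20, 0.35, 0.60, 0.70, 0.80, 0.90]
toy_labels = [0, 1, 0, 0, 1, 0, 0, 1, 1, 1]
summary = risk_group_summary(toy_scores, toy_labels, low_cut=0.05, high_cut=0.50)
```

In this toy example one of the five positive cultures falls below the low-risk cut-off, so sensitivity is 4/5. The ratio of `high_risk_rate` to `low_risk_rate` corresponds to the "times the occurrence rate" comparison between groups.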

Conclusions: Our novel models identified patients at low and high risk for bacteremia and fungemia using routinely collected electronic health record data. Further research is needed to evaluate the cost-effectiveness and impact of model implementation in clinical practice.

Publication types

  • Multicenter Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Aged
  • Bacteremia / blood
  • Bacteremia / diagnosis*
  • Bacteremia / etiology
  • Bacteremia / microbiology
  • Blood Culture
  • Electronic Health Records / statistics & numerical data*
  • Female
  • Fungemia / blood
  • Fungemia / diagnosis*
  • Fungemia / etiology
  • Fungemia / microbiology
  • Hospitalization / statistics & numerical data
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Models, Statistical
  • Reproducibility of Results
  • Retrospective Studies
  • Risk Factors