Background: Sepsis is a life-threatening condition that can rapidly lead to organ damage and death. Existing risk scores predict outcomes for patients who have already become acutely ill.
Objective: We aimed to develop a model for identifying patients at risk of developing sepsis within 2 years, to support the reduction of sepsis morbidity and mortality.
Methods: Machine learning was applied to 2,683,049 electronic health records (EHRs) with over 64 million encounters across five states to develop models for predicting a patient's risk of developing sepsis within 2 years. Features were selected to be easily obtainable from a patient's chart in real time during ambulatory encounters.
Results: The models showed consistent prediction scores. The best performance, an area under the receiver operating characteristic curve of 0.82 and a positive likelihood ratio of 2.9, was achieved with gradient boosting on all features combined. Predictive features included age, sex, ethnicity, average ambulatory heart rate, standard deviation of BMI, and the number of prior medical conditions and procedures. The findings identified both known and potential new risk factors for long-term sepsis risk. Model variations also illustrated trade-offs between incrementally higher accuracy, implementability, and interpretability.
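For readers less familiar with the two metrics reported above, the following is a minimal Python sketch of how an area under the receiver operating characteristic curve (AUROC) and a positive likelihood ratio are commonly computed. It is not the authors' code. The function names, the confusion-matrix counts, and the toy labels and scores are hypothetical and chosen only for illustration; none of them come from the study.

```python
def positive_likelihood_ratio(tp, fp, tn, fn):
    """LR+ = sensitivity / (1 - specificity), from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity / false_positive_rate


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts (not study data): sensitivity 0.60, specificity 0.80.
lr_plus = positive_likelihood_ratio(tp=60, fp=200, tn=800, fn=40)

# Hypothetical toy labels and risk scores (not study data).
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Under this definition, a positive likelihood ratio of 2.9 means that a positive model prediction is about 2.9 times as likely in a patient who goes on to develop sepsis as in one who does not. The pairwise AUROC shown here is quadratic in the number of patients, so a rank-based implementation would be needed at the scale of millions of records.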
Conclusions: Accurate, implementable models were developed to predict the 2-year risk of sepsis, using EHR data that is easy to obtain from ambulatory encounters. These results help advance the understanding of sepsis and provide a foundation for future trials of risk-informed preventive care.
Keywords: clinical decision making; electronic health records; machine learning; prevention; risk factors; risk prediction; sepsis.
©Jewel Y Lee, Sevda Molani, Chen Fang, Kathleen Jade, D Shane O'Mahony, Sergey A Kornilov, Lindsay T Mico, Jennifer J Hadlock. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 08.07.2021.