Introduction: Delirium occurrence is common and preventive strategies are resource intensive. Screening tools can prioritize patients at risk. Using machine learning, we can capture time and treatment effects that pose a challenge to delirium prediction. We aim to develop a delirium prediction model that can be used as a screening tool.
Methods: From the eICU Collaborative Research Database (eICU-CRD) and the Medical Information Mart for Intensive Care version III (MIMIC-III) database, patients with one or more Confusion Assessment Method-Intensive Care Unit (CAM-ICU) values and an intensive care unit (ICU) length of stay greater than 24 h were included in our study. We validated our model using 21 quantitative clinical parameters, assessed performance across a range of observation windows, prediction windows, and classification thresholds, and applied interpretation techniques. We evaluated our models with stratified repeated cross-validation using 3 algorithms, namely Logistic Regression, Random Forest, and Bidirectional Long Short-Term Memory (BiLSTM). BiLSTM extends the recurrent neural network-based Long Short-Term Memory architecture by also processing the input sequence in reverse, so that it preserves information from both past and future time steps. Model performance was measured using Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, Recall, Precision (Positive Predictive Value), and Negative Predictive Value.
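To illustrate the idea of observation and prediction windows, the following is a minimal sketch of sliding-window sample construction. It is not the authors' pipeline: the function name `make_windows`, the hourly sampling, the single-variable input, the `stride` parameter, and the labeling rule (positive if onset falls inside the prediction window that immediately follows the observation window) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def make_windows(
    values: List[float],
    onset_hour: Optional[int],
    obs_h: int,
    pred_h: int,
    stride: int = 1,
) -> List[Tuple[List[float], int]]:
    """Slice an hourly series into (observation window, label) pairs.

    Hypothetical labeling rule: a window covering hours [start, end) is
    positive if delirium onset occurs in [end, end + pred_h).
    Windows that would overlap or follow the onset are discarded.
    """
    samples = []
    last_start = len(values) - obs_h
    for start in range(0, last_start + 1, stride):
        end = start + obs_h
        # Stop once the observation window would include the onset itself.
        if onset_hour is not None and onset_hour < end:
            break
        label = int(onset_hour is not None and end <= onset_hour < end + pred_h)
        samples.append((values[start:end], label))
    return samples


# Toy stay: 10 hourly measurements, onset at hour 7 (made-up values).
series = [float(h) for h in range(10)]
short_horizon = make_windows(series, onset_hour=7, obs_h=3, pred_h=2)
long_horizon = make_windows(series, onset_hour=7, obs_h=3, pred_h=4)
print([label for _, label in short_horizon])
print([label for _, label in long_horizon])
```

In this toy setup, widening the prediction window labels more of the early observation windows as positive, which is one reason performance can shift as the prediction horizon grows.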
Results: We evaluated our models on 16 546 patients (47% female) and 6294 patients (44% female) from the eICU-CRD and MIMIC-III databases, respectively. Performance was best in BiLSTM models, where precision decreased from 37.52% (95% confidence interval [CI], 36.00%-39.05%) to 17.45% (95% CI, 15.83%-19.08%) and recall decreased from 86.1% (95% CI, 82.49%-89.71%) to 75.58% (95% CI, 68.33%-82.83%) as the prediction window increased from 12 to 96 h. After optimizing for higher recall, precision changed from 26.96% (95% CI, 24.99%-28.94%) to 11.34% (95% CI, 10.71%-11.98%) and recall changed from 93.73% (95% CI, 93.1%-94.37%) to 92.57% (95% CI, 88.19%-96.95%). Comparable results were obtained in the MIMIC-III cohort.
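The trade-off between precision and recall when the decision threshold is lowered can be shown with a small worked example. This is a sketch under stated assumptions, not the study's code: the helper `metrics_at_threshold`, the toy labels and probabilities, and the two thresholds are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Dict, List


def metrics_at_threshold(
    y_true: List[int], y_prob: List[float], threshold: float
) -> Dict[str, float]:
    """Compute precision (PPV), recall, and NPV at a given threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1

    def safe_div(num: int, den: int) -> float:
        # Return NaN when a metric is undefined (empty denominator).
        return num / den if den else float("nan")

    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "npv": safe_div(tn, tn + fn),
    }


# Made-up labels (1 = delirium) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.55, 0.1, 0.45, 0.8]

default = metrics_at_threshold(y_true, y_prob, threshold=0.5)
high_recall = metrics_at_threshold(y_true, y_prob, threshold=0.35)
print(default)
print(high_recall)
```

On this toy data, lowering the threshold from 0.5 to 0.35 captures the one missed positive (recall rises from 0.75 to 1.0) at the cost of one extra false positive (precision falls from 0.75 to about 0.67), which mirrors the direction of the change reported for a screening-oriented threshold.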
Conclusions: Our model performed comparably to contemporary models while using fewer variables. Using techniques such as sliding windows, threshold modification to increase recall, and feature ranking for interpretability, we addressed shortcomings of current models.
Keywords: artificial intelligence; clinical decision support; delirium; machine learning; nursing assessment; predictive modeling.
© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association.