Background: Machine learning models that predict hypoxia in patients could enable more timely interventions. Because existing approaches are diverse and their generalizability is limited, external validation is required.
Objective: This study aimed to validate the generalizability of the SpO2 Waveform ICU Forecasting Technique (SWIFT), a long short-term memory (LSTM) algorithm that predicts SpO2 values 5 and 30 min in advance, on two external datasets.
Methods: We trained the SWIFT model on the eICU Collaborative Research Database (eICU-CRD) and validated it on data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) and the Amsterdam University Medical Centers Database (UMCdb). We evaluated SWIFT-5 and SWIFT-30, the 5-min and 30-min prediction horizons, in ventilated and non-ventilated populations.
Results: The sampling procedure substantially reduced the population size of the MIMIC-IV and UMCdb data because of differences in SpO2 measurement frequency. SWIFT performed well on eICU-CRD data but showed reduced performance on MIMIC-IV data, particularly for SWIFT-30. UMCdb validation was promising, with performance comparable to eICU-CRD for ventilated patients. On all datasets, SWIFT showed high specificity and negative predictive value (NPV), which are critical for gaining trust in alarms in clinical applications.
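For readers less familiar with the two metrics highlighted above, the following is a minimal sketch of how specificity and NPV are computed from binary outcomes. It assumes the SpO2 forecasts have already been converted to event / no-event labels; the abstract does not state the threshold used, and the function name and toy labels are hypothetical, not study data.

```python
def specificity_and_npv(y_true, y_pred):
    """Return (specificity, NPV) for binary labels, where 1 = hypoxemic event."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed events
    # Specificity: share of event-free windows correctly left un-alarmed.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # NPV: share of "no event" predictions that were actually event-free.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return specificity, npv


# Toy labels invented purely for illustration (not study data).
y_true = [0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 1, 0, 1, 0, 0, 0]
spec, npv = specificity_and_npv(y_true, y_pred)
print(f"specificity={spec:.3f}, NPV={npv:.3f}")
```

High specificity means few false alarms among event-free windows, while high NPV means a "no event" prediction can usually be relied upon; both depend on the chosen threshold, and NPV also depends on event prevalence in the dataset.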
Conclusions: The study highlights the challenges of generalizing prediction models across diverse ICU populations and emphasizes the need for external validation. Further research should focus on improving model adaptability and interpretability, with practical application in clinical settings in mind.
Keywords: Artificial intelligence; Deep learning; Hypoxemia; ICU; Machine learning.
Copyright © 2025 The Authors. Published by Elsevier Inc. All rights reserved.