Objectives: To derive and externally validate supervised machine learning (ML) models predictive of cardiac surgery-associated acute kidney injury (CS-AKI).
Design: Retrospective cohort analysis.
Setting: Four cardiac surgical centers, January 2019 to February 2022.
Patients: Children and adolescents aged 7 days to 18 years who underwent cardiac surgery.
Interventions: None.
Measurements and main results: CS-AKI during the first 7 postoperative days was defined using the Kidney Disease: Improving Global Outcomes (KDIGO) criteria, with stages 2 and 3 classified as severe. Data were analyzed using two approaches: 1) combining three centers for derivation and using the fourth for external validation and 2) randomly dividing the entire dataset into derivation and validation cohorts in a 4:1 ratio. Forty ML models were developed across five derivation-validation pairs. Four ML algorithms were used: light gradient-boosting machine, extreme gradient boosting, categorical boosting, and histogram gradient boosting. The models predicted two outcomes (any and severe CS-AKI) from preoperative, intraoperative, and immediate postoperative variables. SHapley Additive exPlanations (SHAP) was used to rank input variable importance. A cohort of 1100 patients was analyzed. Any CS-AKI and severe CS-AKI occurred in 49.1% and 23.1% of patients, respectively. Model performance on external validation varied widely across all 40 ML models. For any CS-AKI, metrics ranged as follows: area under the receiver operating characteristic curve (AUROC) 0.64-0.83, sensitivity 0.29-0.86, specificity 0.46-0.95, positive predictive value (PPV) 0.50-0.85, and negative predictive value (NPV) 0.60-0.86. For severe CS-AKI, metrics ranged as follows: AUROC 0.65-0.77, sensitivity 0.04-0.58, specificity 0.77-0.99, PPV 0.32-0.75, and NPV 0.78-0.90. The most important predictors of CS-AKI were:
- preoperative serum creatinine;
- cardiopulmonary bypass and aortic cross-clamp durations;
- weight;
- age at surgery.
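The outcome definition above can be sketched as a creatinine-only KDIGO staging rule. This is a simplified illustration under stated assumptions, not the authors' code. It makes the following simplifications:
- It takes serum creatinine in mg/dL and uses the peak value over the postoperative window.
- It treats any absolute rise of at least 0.3 mg/dL as stage 1, although KDIGO requires that rise to occur within 48 hours.
- It omits the urine-output criteria and the pediatric eGFR < 35 mL/min/1.73 m² criterion for stage 3.

The function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def kdigo_scr_stage(baseline: float, postop_scr: Sequence[float],
                    started_rrt: bool = False) -> int:
    """Return a KDIGO AKI stage (0-3) from serum creatinine alone (mg/dL).

    Simplified sketch (assumption): ignores urine output, the 48-hour window
    for the 0.3 mg/dL rise, and the pediatric eGFR < 35 criterion.
    """
    if started_rrt:              # renal replacement therapy -> stage 3
        return 3
    if not postop_scr:           # no postoperative values: no AKI detectable
        return 0
    peak = max(postop_scr)       # worst value in the postoperative window
    ratio = peak / baseline
    if ratio >= 3.0 or peak >= 4.0:
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5 or peak - baseline >= 0.3:
        return 1
    return 0


def aki_outcomes(stage: int) -> Tuple[bool, bool]:
    """(any CS-AKI, severe CS-AKI), where severe means stage 2 or 3."""
    return stage >= 1, stage >= 2


if __name__ == "__main__":
    stage = kdigo_scr_stage(0.3, [0.35, 0.65])  # peak/baseline is about 2.17
    print(stage, aki_outcomes(stage))           # 2 (True, True)
```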
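The two analysis approaches and the resulting model grid can be sketched as follows. We assume, from the arithmetic in the abstract, that approach 1 rotates the held-out center (four pairs) and approach 2 adds one random split. That gives 5 pairs × 4 algorithms × 2 outcomes = 40 models.

Several details are assumptions:
- The random split here is unstratified with a fixed seed.
- The center labels and function names are hypothetical.
- The algorithm names are plain labels, not calls into the LightGBM, XGBoost, CatBoost, or scikit-learn libraries.

```python
import random
from itertools import product
from typing import Dict, List, Tuple

Split = Tuple[List[int], List[int]]  # (derivation indices, validation indices)

# Labels only; no model libraries are imported or called here.
ALGORITHMS = ["LightGBM", "XGBoost", "CatBoost", "HistGradientBoosting"]
OUTCOMES = ["any_cs_aki", "severe_cs_aki"]


def make_splits(centers: List[str], seed: int = 0) -> Dict[str, Split]:
    """centers[i] is the (hypothetical) center label of patient i."""
    splits: Dict[str, Split] = {}
    # Approach 1: derive on the other centers, validate on the held-out one.
    for held_out in sorted(set(centers)):
        derivation = [i for i, c in enumerate(centers) if c != held_out]
        validation = [i for i, c in enumerate(centers) if c == held_out]
        splits[f"external_{held_out}"] = (derivation, validation)
    # Approach 2: random 4:1 split of the pooled cohort (unstratified here).
    idx = list(range(len(centers)))
    random.Random(seed).shuffle(idx)
    cut = round(0.8 * len(idx))
    splits["random_4to1"] = (sorted(idx[:cut]), sorted(idx[cut:]))
    return splits


def model_grid(splits: Dict[str, Split]) -> List[Tuple[str, str, str]]:
    """One (split, algorithm, outcome) entry per model to train."""
    return list(product(sorted(splits), ALGORITHMS, OUTCOMES))


if __name__ == "__main__":
    centers = ["A"] * 3 + ["B"] * 2 + ["C"] * 4 + ["D"]
    splits = make_splits(centers)
    print(len(splits), len(model_grid(splits)))  # 5 40
```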
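The validation metrics reported above follow standard confusion-matrix definitions. AUROC equals the probability that a randomly chosen positive patient receives a higher predicted risk than a randomly chosen negative one, with ties counted as one half. A minimal pure-Python sketch follows. The 0.5 decision threshold is an assumption, since the abstract does not state the threshold used for sensitivity, specificity, PPV, and NPV. The function name is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics at an (assumed) threshold, plus rank-based AUROC."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    pos = [s for y, s in zip(y_true, y_score) if y]
    neg = [s for y, s in zip(y_true, y_score) if not y]
    # AUROC = P(score_pos > score_neg) + 0.5 * P(tie), over all pos/neg pairs.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)

    def ratio(a: float, b: float) -> float:
        return a / b if b else float("nan")  # undefined when denominator is 0

    return {
        "auroc": ratio(wins, len(pos) * len(neg)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    m = binary_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.2, 0.1])
    print(m)  # auroc 8/9; sensitivity, specificity, ppv, npv all 2/3
```

The pairwise AUROC loop is O(positives × negatives). That is acceptable for a cohort of about 1100 patients, though library implementations typically sort scores instead.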
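SHAP attributes each prediction to the input variables using Shapley values. The sketch below illustrates that underlying idea in two steps:
- It computes exact Shapley values by enumerating feature orderings, with "absent" features set to a single reference patient's values.
- It ranks variables by mean absolute Shapley value across patients, a common global-importance summary.

This is not the API of the shap package. That package uses faster algorithms such as TreeSHAP for gradient-boosted trees and typically averages over a background dataset rather than one reference point. Exact enumeration is also feasible only for a handful of features. The function names and the toy model are hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]


def shapley_values(f: Model, x: Sequence[float],
                   reference: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features not yet added keep reference values."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        z = list(reference)
        prev = f(z)
        for j in order:
            z[j] = x[j]            # add feature j to the coalition
            cur = f(z)
            phi[j] += cur - prev   # marginal contribution of j in this ordering
            prev = cur
    n_orders = factorial(n)
    return [p / n_orders for p in phi]  # average over all orderings


def mean_abs_importance(f: Model, X: Sequence[Sequence[float]],
                        reference: Sequence[float]) -> List[float]:
    """Global importance: mean |Shapley value| of each feature across rows of X."""
    rows = [shapley_values(f, x, reference) for x in X]
    return [sum(abs(r[j]) for r in rows) / len(rows) for j in range(len(reference))]


if __name__ == "__main__":
    toy = lambda z: 2 * z[0] + 3 * z[1] - z[2]  # hypothetical linear risk score
    print(shapley_values(toy, [1, 2, 3], [0, 0, 0]))  # [2.0, 6.0, -3.0]
```

For a linear model, each Shapley value reduces to weight × (value - reference). The values always sum to f(x) - f(reference), which makes small examples easy to check by hand.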
Conclusions: In this retrospective multicenter dataset, the external performance of ML models predicting CS-AKI varied. This highlights challenges in generalizability that may reflect center-based differences in practice.
Keywords: acute kidney injury; cardiac surgery; congenital heart disease; pediatrics; supervised machine learning.
Copyright © 2025 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies.