Improving mixed-integer temporal modeling by generating synthetic data using conditional generative adversarial networks: A case study of fluid overload prediction in the intensive care unit
- PMID: 38011778
- DOI: 10.1016/j.compbiomed.2023.107749
Abstract
Objective: The challenge of mixed-integer temporal data, which is particularly prominent for medication use in the critically ill, limits the performance of predictive models. The purpose of this evaluation was to pilot test integrating synthetic data within an existing dataset of complex medication data to improve machine learning model prediction of fluid overload.
Materials and methods: This retrospective cohort study evaluated patients admitted to an ICU ≥ 72 h. Four machine learning algorithms to predict fluid overload after 48-72 h of ICU admission were developed using the original dataset. Then, two distinct synthetic data generation methodologies (synthetic minority over-sampling technique (SMOTE) and conditional tabular generative adversarial network (CTGAN)) were used to create synthetic data. Finally, a stacking ensemble technique designed to train a meta-learner was established. Models underwent training in three scenarios of varying qualities and quantities of datasets.
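As a rough illustration of the oversampling idea behind SMOTE named above, the following minimal sketch interpolates between a minority-class sample and one of its nearest minority-class neighbours. It is not the authors' pipeline: the function name `smote_sketch`, the toy feature vectors, and the choices `k=3` and `seed=0` are all assumptions made for this example.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only; not taken from the study.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of base, excluding base itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class with two numeric features per record (assumed values)
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6)
```

Because each synthetic point is a convex combination of two real points, integer-coded features generally come out fractional. That is one reason a generative model built for mixed discrete and continuous columns, such as CTGAN, may be a better fit for mixed-integer data.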
Results: Training machine learning algorithms on the combined synthetic and original dataset generally improved predictive performance compared with training on the original dataset alone. The highest-performing model was the meta-model trained on the combined dataset, with an AUROC of 0.83; it also significantly enhanced sensitivity across the different training scenarios.
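For readers less familiar with the two metrics reported above, here is a minimal sketch of how AUROC and sensitivity can be computed from predicted scores. The labels, scores, and the 0.5 decision threshold are made-up values for illustration and are not data from the study.

```python
def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity(y_true, scores, threshold=0.5):
    """Fraction of true positives that are flagged at the given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return tp / (tp + fn)

# Toy imbalanced example: 1 marks the minority (positive) class
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.3, 0.6, 0.2, 0.7, 0.4]

print(auroc(y_true, scores))        # 0.875
print(sensitivity(y_true, scores))  # 0.5
```

AUROC depends only on how the scores rank the cases, while sensitivity depends on the chosen threshold. A model can therefore rank well and still miss many minority-class cases at a fixed cut-off.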
Discussion: This is the first application of synthetic data generation methods to ICU medication data. Integrating synthetically generated data offers a promising way to enhance the performance of machine learning models for fluid overload, and the approach may translate to other ICU outcomes. A meta-learner was able to balance the trade-off between different performance metrics and improve identification of the minority class.
Keywords: Critical care; Fluid overload; GAN; Machine learning; Mixed-integer temporal modeling; Synthetic data.
Copyright © 2023 Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of competing interest: We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Update of
- Improving irregular temporal modeling by integrating synthetic data to the electronic medical record using conditional GANs: a case study of fluid overload prediction in the intensive care unit. medRxiv [Preprint]. 2023 Jun 27:2023.06.20.23291680. doi: 10.1101/2023.06.20.23291680. Update in: Comput Biol Med. 2024 Jan;168:107749. doi: 10.1016/j.compbiomed.2023.107749. PMID: 37425768. Free PMC article.
