Data Mapping Challenges in Reproducibility of Machine Learning for Acute Kidney Injury Prediction

Stud Health Technol Inform. 2025 Jun 26;328:220-224. doi: 10.3233/SHTI250706.

Abstract

Reproducibility is essential in machine learning for healthcare (ML4H) research, particularly for generalisability and external validation. This study investigates data mapping challenges in reproducing an acute kidney injury (AKI) prediction model within the local Electronic Health Record (EHR) system at St. James Hospital (SJH), Ireland. Key challenges include structural, syntactic, and semantic heterogeneity in EHR data, as well as regulatory constraints. We employed a combination of expert-driven mapping, natural language processing (NLP) techniques, and standardised terminologies to align predictor variables. Despite these efforts, missing data and unit discrepancies required adaptations in feature selection and conversion methods. Our findings highlight the complexities of reproducibility in ML4H and underscore the necessity of domain expertise and standardised frameworks for cross-institutional model validation. Addressing these challenges is essential for improving generalisability and clinical impact.
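
To illustrate the kind of unit discrepancy the abstract refers to, the following is a minimal Python sketch of harmonising serum creatinine values reported in different units. It is not the authors' method: the abstract does not say which variables or units were affected, so the choice of creatinine, the alias table, and the function names (`normalise_unit`, `creatinine_to_umol_per_l`) are all assumptions made for illustration. The only fixed fact used is the standard conversion of about 88.4 µmol/L per 1 mg/dL of creatinine.

```python
# Illustrative sketch only: the variable (serum creatinine), the alias table,
# and all function names are assumptions, not taken from the study.

# Standard conversion: 1 mg/dL of creatinine is approximately 88.4 umol/L.
CREATININE_MGDL_TO_UMOLL = 88.4

# Hypothetical spellings a local EHR might use for the same unit
# (syntactic heterogeneity), mapped to one canonical label.
UNIT_ALIASES = {
    "mg/dl": "mg/dL",
    "umol/l": "umol/L",
    "\u00b5mol/l": "umol/L",   # micro sign
    "\u03bcmol/l": "umol/L",   # Greek small letter mu
    "micromol/l": "umol/L",
}


def normalise_unit(raw_unit: str) -> str:
    """Map a free-text unit string to a canonical label, or raise ValueError."""
    key = raw_unit.strip().lower().replace(" ", "")
    if key not in UNIT_ALIASES:
        # Unknown units are surfaced for expert review rather than guessed.
        raise ValueError(f"Unrecognised creatinine unit: {raw_unit!r}")
    return UNIT_ALIASES[key]


def creatinine_to_umol_per_l(value: float, raw_unit: str) -> float:
    """Return a creatinine value expressed in umol/L."""
    canonical = normalise_unit(raw_unit)
    if canonical == "umol/L":
        return float(value)
    # The only other canonical unit in this sketch is mg/dL.
    return float(value) * CREATININE_MGDL_TO_UMOLL
```

Raising an error on unrecognised units, rather than silently passing values through, reflects the abstract's point that domain expertise is needed: ambiguous records are better routed to a clinician or data steward than converted by guesswork.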

Keywords: Acute Kidney Injury Prediction; Data heterogeneity in Electronic Health Records; Machine Learning for Healthcare; Reproducibility.

MeSH terms

  • Acute Kidney Injury* / diagnosis
  • Electronic Health Records*
  • Humans
  • Ireland
  • Machine Learning*
  • Natural Language Processing*
  • Reproducibility of Results