Reproducibility is essential in machine learning for healthcare (ML4H) research, particularly for establishing generalisability and enabling external validation. This study investigates the data mapping challenges encountered when reproducing an acute kidney injury (AKI) prediction model within the local Electronic Health Record (EHR) system at St. James Hospital (SJH), Ireland. Key challenges included structural, syntactic, and semantic heterogeneity in EHR data, as well as regulatory constraints. We combined expert-driven mapping, natural language processing (NLP) techniques, and standardised terminologies to align predictor variables. Despite these efforts, missing data and discrepancies in measurement units required adaptations to feature selection and unit conversion methods. Our findings highlight the complexity of reproducibility in ML4H and underscore the need for domain expertise and standardised frameworks in cross-institutional model validation. Addressing these challenges is key to improving the generalisability and clinical impact of ML4H models.
Keywords: Acute Kidney Injury Prediction; Data heterogeneity in Electronic Health Records; Machine Learning for Healthcare; Reproducibility.