Cross-site imputation can recover missing variables in federated multicenter studies

J Clin Epidemiol. 2025 Aug:184:111820. doi: 10.1016/j.jclinepi.2025.111820. Epub 2025 May 30.

Abstract

Objectives: In multisite studies, it is common for some sites not to have recorded key variables. Although data from sites with recorded observations can in principle be used to impute the missing values, this becomes challenging when pooling data is not feasible because of logistical or legal constraints. We therefore propose a multiple imputation approach, cross-site imputation, to recover such variables across sites without the need to pool individual-level data.

Methods: Cross-site imputation transports the estimated regression coefficients and their variances from sites with observed data to sites without data, where they are used to impute the missing variables. The approach is illustrated in an applied example that recovers systematically missing confounders across Swedish hospitals, and theoretical considerations are outlined.
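The transport step described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes one continuous missing variable, a normal linear imputation model, a single donor site, and that the predictors are recorded at both sites. The function names `fit_imputation_model` and `impute_at_site` and the simulated data are hypothetical. The per-imputation draws of the coefficients and residual variance follow the standard Bayesian normal-regression imputation step. Only the aggregate quantities in `model` leave the donor site.

```python
import numpy as np


def fit_imputation_model(X, y):
    """Fit an OLS imputation model at a site where y is observed.

    Returns only aggregate quantities, no individual-level rows.
    """
    X1 = np.column_stack([np.ones(len(X)), X])      # add intercept column
    xtx_inv = np.linalg.inv(X1.T @ X1)
    xtx_inv = (xtx_inv + xtx_inv.T) / 2             # enforce exact symmetry
    beta = xtx_inv @ X1.T @ y                       # OLS coefficients
    resid = y - X1 @ beta
    df = X1.shape[0] - X1.shape[1]                  # residual degrees of freedom
    sigma2 = float(resid @ resid) / df              # residual variance estimate
    return {"beta": beta, "xtx_inv": xtx_inv, "sigma2": sigma2, "df": df}


def impute_at_site(model, X_missing, m, rng):
    """Return an (m, n) array of imputed values at a site where y was never recorded."""
    X1 = np.column_stack([np.ones(len(X_missing)), X_missing])
    draws = []
    for _ in range(m):
        # Draw sigma^2 from its approximate posterior: sigma2_hat * df / chi2(df)
        sigma2_star = model["sigma2"] * model["df"] / rng.chisquare(model["df"])
        # Draw beta ~ N(beta_hat, sigma2_star * (X'X)^-1)
        beta_star = rng.multivariate_normal(model["beta"], sigma2_star * model["xtx_inv"])
        # Add residual noise so imputations carry individual-level variability
        noise = rng.normal(0.0, np.sqrt(sigma2_star), size=X1.shape[0])
        draws.append(X1 @ beta_star + noise)
    return np.array(draws)


# Simulated example (hypothetical): site A records confounder z; site B never recorded it.
rng = np.random.default_rng(2025)
n_a, n_b = 500, 300
x_a = rng.normal(size=(n_a, 2))
z_a = 1.0 + x_a @ np.array([0.5, -0.3]) + rng.normal(scale=0.8, size=n_a)
x_b = rng.normal(size=(n_b, 2))

model = fit_imputation_model(x_a, z_a)                     # runs at site A
z_b_imputed = impute_at_site(model, x_b, m=20, rng=rng)   # runs at site B
print(z_b_imputed.shape)  # (20, 300)
```

The coefficients and residual variance are drawn anew for each of the m imputations rather than reused as point estimates. This propagates uncertainty about the transported model into the imputed values, which is what makes the imputations "proper" in the multiple imputation sense.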

Results: Cross-site imputation successfully recovered systematically missing confounding variables independently at study sites where data were not recorded. The approach allowed us to include all hospitals in the fully adjusted analysis.
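The abstract does not state how the imputed datasets and the hospitals are combined in the fully adjusted analysis. One common option, sketched here purely as an assumption, is a two-step approach. First, the m imputed-data estimates within each site are pooled with Rubin's rules. Second, the sites are combined with a fixed-effect inverse-variance meta-analysis, so that only site-level summaries are shared. The function names and all numbers below are hypothetical and illustrative, not results from the study.

```python
import numpy as np


def rubins_rules(estimates, variances):
    """Pool m imputed-data estimates (and their squared SEs) with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                     # pooled point estimate
    w = u.mean()                         # within-imputation variance
    b = q.var(ddof=1)                    # between-imputation variance
    total = w + (1 + 1 / m) * b          # total variance
    return float(q_bar), float(total)


def fixed_effect_meta(estimates, variances):
    """Inverse-variance weighted (fixed-effect) pooling of site-level estimates."""
    est = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(weights * est) / np.sum(weights)
    return float(pooled), float(1.0 / np.sum(weights))


# Illustrative numbers only: five imputations at a hospital that lacked the confounder.
site_b_est, site_b_var = rubins_rules([0.20, 0.25, 0.22, 0.18, 0.24], [0.01] * 5)
# Combine with a hospital that recorded the confounder directly.
pooled, pooled_var = fixed_effect_meta([0.30, site_b_est], [0.005, site_b_var])
print(round(site_b_est, 3), round(site_b_var, 6))  # 0.218 0.010984
```

A random-effects model would be the natural alternative if heterogeneity between hospitals is expected. The fixed-effect version is used here only to keep the sketch short.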

Conclusion: Given the increasing importance of multisite studies in observational research, cross-site imputation could offer a practical approach for imputing variables that have not been recorded in some study sites.

Keywords: Cross-site imputation; Distributed data network; Distributed learning; Federated analysis; Meta-analysis; Missing data; Multiple imputation.

MeSH terms

  • Confounding Factors, Epidemiologic
  • Data Interpretation, Statistical
  • Humans
  • Multicenter Studies as Topic* / methods
  • Multicenter Studies as Topic* / statistics & numerical data
  • Observational Studies as Topic* / methods
  • Sweden