Purpose: In distributed data networks, some data sites may be systematically missing important confounders that are captured by other sites in the network (eg, body mass index [BMI]). Multiple imputation may help repair bias in these scenarios. However, multiple imputation has not been described for distributed data networks where data access restrictions prevent centralized analysis.
Methods: We conducted a simulation study and a real-world analysis using the UK's Clinical Practice Research Datalink to evaluate multiple imputation for confounders that are systematically missing from a subset of data sites in mock distributed data networks. The simulation study addressed univariate missing data, while the real-world analysis addressed multivariate missing data. Both studies were designed as retrospective cohort studies of the effect of current statin use on the risk of myocardial infarction among patients with newly treated type 2 diabetes.
Results: In our simulation study, multiple imputation repaired bias from missing BMI in all scenarios, with a median bias reduction of 118% in the default scenario. In our real-world study, the multiply imputed analysis (hazard ratio [HR]: 0.86; 95% confidence interval [CI], 0.69-1.08) was closer to the analysis that considered the true confounder values (HR: 0.85; 95% CI, 0.66-1.10) than was the analysis that ignored them (HR: 0.93; 95% CI, 0.73-1.20).
Conclusions: Multiple imputation adapted to distributed data settings is a feasible method to reduce bias from unmeasured but measurable confounders when at least one database contains the variables of interest. Further research is needed to evaluate its validity in real distributed data networks.
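Illustrative sketch: the general idea of imputing a confounder that one site never records, using only summary information from a site that does record it, can be sketched as follows. This is a minimal toy example under stated assumptions and not necessarily the procedure used in the study. The two-site setup, the variable names, the simulated effect sizes, the linear outcome model (used here in place of a Cox model for brevity), and the choice to share only the imputation model's coefficients, covariance matrix, and residual variance are all assumptions made for illustration. Estimates are pooled with Rubin's rules, the standard way to combine multiply imputed analyses.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients, their covariance matrix, and residual variance."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, sigma2 * xtx_inv, sigma2

def pool_rubin(estimates, variances):
    """Combine m imputed-data estimates with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    within = variances.mean()            # average within-imputation variance
    between = estimates.var(ddof=1)      # between-imputation variance
    return estimates.mean(), within + (1 + 1 / m) * between

def simulate_site(rng, n):
    """Hypothetical data: BMI confounds the statin-outcome association."""
    bmi = rng.normal(28.0, 4.0, n)
    p_statin = 1 / (1 + np.exp(-0.3 * (bmi - 28.0)))
    statin = rng.binomial(1, p_statin).astype(float)
    y = -0.5 * statin + 0.2 * (bmi - 28.0) + rng.normal(0.0, 1.0, n)
    return bmi, statin, y

rng = np.random.default_rng(0)
TRUE_EFFECT = -0.5   # assumed effect of statin in the simulated data
M = 20               # number of imputations (arbitrary choice)

bmi_a, statin_a, y_a = simulate_site(rng, 2000)   # site A records BMI
_, statin_b, y_b = simulate_site(rng, 2000)       # site B never records BMI

# Site A: fit the imputation model BMI ~ statin + outcome and share only
# its summary (coefficients, covariance, residual variance, degrees of freedom).
Xa = np.column_stack([np.ones_like(y_a), statin_a, y_a])
gamma_hat, gamma_cov, s2 = ols(Xa, bmi_a)
gamma_cov = (gamma_cov + gamma_cov.T) / 2         # guard against round-off asymmetry
dof = Xa.shape[0] - Xa.shape[1]

# Site B: draw imputation-model parameters, impute BMI, refit the outcome model.
Xb_imp = np.column_stack([np.ones_like(y_b), statin_b, y_b])
estimates, variances = [], []
for _ in range(M):
    sigma2 = s2 * dof / rng.chisquare(dof)                       # draw residual variance
    gamma = rng.multivariate_normal(gamma_hat, gamma_cov * sigma2 / s2)
    bmi_imp = Xb_imp @ gamma + rng.normal(0.0, np.sqrt(sigma2), len(y_b))
    Xb = np.column_stack([np.ones_like(y_b), statin_b, bmi_imp])
    beta, cov, _ = ols(Xb, y_b)
    estimates.append(beta[1])
    variances.append(cov[1, 1])

pooled, total_var = pool_rubin(estimates, variances)

# Comparison: site B analysis that ignores BMI altogether.
naive_beta, _, _ = ols(np.column_stack([np.ones_like(y_b), statin_b]), y_b)
naive = naive_beta[1]

print(f"naive estimate:  {naive:.3f}")
print(f"pooled estimate: {pooled:.3f} (SE {np.sqrt(total_var):.3f})")
```

In this toy setup no patient-level records leave site A; site B receives only the fitted imputation model. Including the outcome in the imputation model is deliberate, since omitting it would tend to pull the imputed-data estimate toward the confounded one.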
Keywords: bias; cohort study; confounding; distributed data network; missing data; multiple imputation; pharmacoepidemiology; simulation study.
© 2019 John Wiley & Sons, Ltd.