Learning from local to global: An efficient distributed algorithm for modeling time-to-event data

J Am Med Inform Assoc. 2020 Jul 1;27(7):1028-1036. doi: 10.1093/jamia/ocaa044.


Objective: We developed and evaluated a privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) without sharing patient-level information across sites.

Materials and methods: Using patient-level data from a single site combined with only aggregated information from other sites, we constructed a surrogate likelihood function, approximating the Cox partial likelihood function obtained using patient-level data from all sites. By maximizing the surrogate likelihood function, each site obtained a local estimate of the model parameter, and the ODAC estimator was constructed as a weighted average of all the local estimates. We evaluated the performance of ODAC with (1) a simulation study and (2) a real-world use case study using 4 datasets from the Observational Health Data Sciences and Informatics network.
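For orientation, the quantity that the surrogate likelihood approximates is the standard Cox log partial likelihood computed on pooled patient-level data. The sketch below evaluates that textbook quantity with NumPy, assuming Breslow-style handling of tied event times. It is not the ODAC surrogate itself, whose exact form is not given in this abstract, and the function and argument names are hypothetical.

```python
import numpy as np


def cox_log_partial_likelihood(beta, times, events, X):
    """Cox log partial likelihood (Breslow-style handling of tied times).

    beta   : (p,) regression coefficients (log hazard ratios)
    times  : (n,) observed follow-up times
    events : (n,) 1 if the event was observed, 0 if censored
    X      : (n, p) covariate matrix

    All names are illustrative; this is the pooled-data quantity,
    not the surrogate likelihood used by ODAC.
    """
    beta = np.asarray(beta, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    X = np.asarray(X, dtype=float)

    eta = X @ beta  # linear predictor for each patient
    total = 0.0
    for i in np.flatnonzero(events):
        # Risk set: everyone still under observation at the i-th event time.
        at_risk = times >= times[i]
        total += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return total
```

Evaluating this function requires every patient's row, which is why a pooled analysis is not possible when sites cannot share patient-level data.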

Results: In the simulation study, ODAC produced estimates nearly identical to the pooled estimator (ie, the estimator obtained by analyzing the combined patient-level data from all sites as a single dataset). The relative bias was <0.1% across all scenarios, and the accuracy of ODAC remained high across different sample sizes and event rates. In contrast, the meta-analysis estimator, obtained as the inverse variance-weighted average of the site-specific estimates, had substantial bias when the event rate was <5%, with the relative bias reaching 20% when the event rate was 1%. In the Observational Health Data Sciences and Informatics network application, the ODAC estimates had a relative bias <5% for 15 of 16 log hazard ratios, whereas the meta-analysis estimates had substantially higher bias.
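The meta-analysis comparator can be illustrated with a minimal sketch of a fixed-effect inverse variance-weighted average for a single coefficient. The helper names are hypothetical. The relative bias is assumed here to be the absolute difference from a reference estimate divided by the absolute reference value, because the abstract does not spell out its definition.

```python
import numpy as np


def inverse_variance_average(estimates, variances):
    """Fixed-effect inverse variance-weighted average of site-specific estimates.

    estimates : (k,) site-specific estimates of one log hazard ratio
    variances : (k,) their estimated variances (squared standard errors)
    Returns the combined estimate and its variance.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances  # more precise sites get more weight
    combined = np.sum(weights * estimates) / np.sum(weights)
    combined_variance = 1.0 / np.sum(weights)
    return combined, combined_variance


def relative_bias(estimate, reference):
    """Assumed definition: |estimate - reference| / |reference|."""
    return abs(estimate - reference) / abs(reference)
```

For example, calling `inverse_variance_average([0.4, 0.6], [0.01, 0.04])` on these made-up inputs weights the first site four times as heavily as the second. When events are rare, the site-specific estimates and variances that feed this average can themselves be unstable, which is consistent with the bias reported for the meta-analysis estimator at low event rates.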

Conclusions: ODAC is a privacy-preserving, noniterative method for time-to-event analyses across multiple sites. It provides estimates on par with the pooled estimator and substantially outperforms the meta-analysis estimator when the event is uncommon, making it well suited to studying rare events and diseases in a distributed manner.

Keywords: Cox proportional hazards model; data integration; distributed algorithm; electronic health record; meta-analysis.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Algorithms*
  • Bias
  • Computer Simulation
  • Datasets as Topic
  • Electronic Health Records*
  • Female
  • Humans
  • Likelihood Functions
  • Male
  • Middle Aged
  • Models, Statistical
  • Proportional Hazards Models*
  • Sample Size
  • Time Factors