Privacy-maintaining propensity score-based pooling of multiple databases applied to a study of biologics

Med Care. 2010 Jun;48(6 Suppl):S83-9. doi: 10.1097/MLR.0b013e3181d59541.

Abstract

Introduction: A large study on the safety of biologics required pooling data from multiple sources. Extensive confounder adjustment was necessary, but private, individual-level covariate information could not be shared.

Objectives: To describe the methods of pooling data that investigators considered, and to detail the strengths and limitations of the chosen method: a propensity score (PS)-based approach that allowed for full multivariate adjustment without compromising patient privacy.

Research design: The project had a central data coordinating center responsible for collecting and analyzing data. Private data could not be transmitted to the data coordinating center. We assessed 4 methods for pooled analyses: full covariate sharing, cell-aggregated sharing, meta-analysis, and the PS-based method. We evaluated each method for protection of private information, analytic integrity and flexibility, and ability to meet the study's operational and statistical needs.
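The data flow implied by the PS-based method can be sketched as follows. This is a minimal illustration, not the study's implementation: the covariate names, the simulated data, the plain gradient-descent logistic fit, and the helper names (fit_logistic, site_export, make_site) are all assumptions introduced here. The point it shows is that each center fits its PS model locally and transmits only a center label, exposure, outcome, and the PS, so individual covariates never leave the site.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a logistic model by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p]. A real analysis would use a
    statistics package; this keeps the sketch dependency-free.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def site_export(site_id, rows):
    """Runs inside a data center: estimate the PS locally, then strip covariates."""
    X = [r["covariates"] for r in rows]
    exposed = [r["exposed"] for r in rows]
    w = fit_logistic(X, exposed)
    shared = []
    for r in rows:
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], r["covariates"]))
        shared.append({
            "site": site_id,
            "exposed": r["exposed"],
            "outcome": r["outcome"],
            "ps": sigmoid(z),  # the only covariate summary that is shared
        })
    return shared


def make_site(seed, n):
    """Simulate private patient-level data (hypothetical covariates in [0, 1])."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age, comorbidity = rng.random(), rng.random()
        p_exposed = sigmoid(-0.5 + 1.5 * age - 1.0 * comorbidity)
        rows.append({
            "covariates": [age, comorbidity],
            "exposed": 1 if rng.random() < p_exposed else 0,
            "outcome": 1 if rng.random() < 0.1 + 0.1 * comorbidity else 0,
        })
    return rows


# The coordinating center receives only the stripped records from each site.
pooled = site_export("A", make_site(1, 150)) + site_export("B", make_site(2, 150))
```

In a sketch like this, the coordinating center could then adjust for the PS (for example by stratifying on it within center) without ever seeing the underlying covariates; which adjustment to use is a design choice this example does not settle.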

Results: Analysis of 4 example datasets yielded substantially similar estimates whether data were pooled with a PS or with individual covariates (0%-3% difference in point estimates). Several practical challenges arose. (1) PSs are best suited for dichotomous exposures, but 6 or more exposure categories were desired; we chose a series of exposure contrasts with a common referent group. (2) Subgroup analyses had to be specified a priori. (3) Time-varying exposures and confounders required appropriate analytic handling, including re-estimation of PSs. (4) Detection of heterogeneity among centers was necessary.
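Challenge (1) can be illustrated with a short sketch of how a multi-category exposure might be split into dichotomous contrasts that share one referent group. The exposure labels ("drug_A", "comparator", and so on), the record layout, and the function name build_contrasts are hypothetical and chosen only for illustration; the abstract does not name the categories or describe the implementation.

```python
def build_contrasts(records, referent):
    """Split multi-category exposure data into binary contrasts vs. one referent.

    Each contrast keeps only patients in the referent group or in a single
    non-referent category, and adds a 0/1 'exposed' flag. A separate PS model
    could then be fit within each contrast.
    """
    categories = sorted({r["exposure"] for r in records} - {referent})
    contrasts = {}
    for category in categories:
        contrasts[category] = [
            {**r, "exposed": 1 if r["exposure"] == category else 0}
            for r in records
            if r["exposure"] in (category, referent)
        ]
    return contrasts


# Hypothetical example data: three active categories and a common referent.
records = [
    {"id": 1, "exposure": "comparator"},
    {"id": 2, "exposure": "drug_A"},
    {"id": 3, "exposure": "drug_B"},
    {"id": 4, "exposure": "comparator"},
    {"id": 5, "exposure": "drug_A"},
    {"id": 6, "exposure": "drug_C"},
]

contrasts = build_contrasts(records, referent="comparator")
for name, rows in contrasts.items():
    print(name, [(r["id"], r["exposed"]) for r in rows])
```

Because the referent patients appear in every contrast, the resulting estimates all compare against the same group, but they are not independent of one another.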

Conclusions: The PS-based pooling method offered strong protection of patient privacy and a reasonable balance between analytic integrity and flexibility of study execution. We recommend its use in other studies that require pooling of databases, multivariate adjustment, and privacy protection.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Biological Products / adverse effects*
  • Confidentiality
  • Data Interpretation, Statistical*
  • Humans
  • Time Factors
  • Treatment Outcome

Substances

  • Biological Products