Statistical Methods to Preserve Patient Privacy When Sharing and Analyzing Data [Internet]

Review
Washington (DC): Patient-Centered Outcomes Research Institute (PCORI); 2020 Jun.

Excerpt

Background: Multicenter research networks support a wide range of patient-centered outcomes research (PCOR) activities. However, multicenter studies must address issues surrounding patient privacy and data security, in accordance with federal, state, and institutional requirements. Although these challenges can be addressed in part by governance, privacy-preserving analytic and data-sharing methods offer a new way to tackle them. Requiring only summary-level information to perform sophisticated analysis, these methods have potential to increase stakeholders' willingness and ability to collaborate efficiently in multisite studies. However, significant gaps remain because these methods are not understood by most stakeholders and have not been systematically assessed in the context of PCOR.

Objectives: This project aimed to (1) assess stakeholders' understanding of and preference for privacy-preserving analytic and data-sharing methods, and assess the benefits and limitations of implementing them in multisite PCOR studies; (2) develop or enhance a suite of privacy-preserving methods to perform rigorous analysis without sharing individual-level data; and (3) create freely available dissemination tools, including analytic code, educational materials, technical documentation, and user guides for these methods.

Methods:

Aim 1: We developed tailored educational materials to introduce these methods to 6 stakeholder groups: 2 patient groups; health care system leaders; multicenter research governance experts; regulatory, compliance, and confidentiality board members and leaders; and researchers. We assessed their willingness and ability to collaborate if privacy-preserving methods were used, identified potential barriers to implementing these methods, and identified new analytic features that best fit stakeholders' needs, preferences, and priorities.

Aim 2: We assessed the statistical performance of several privacy-preserving methods that use only aggregate-level information to conduct multivariable-adjusted analysis in PCOR, including case-centered analysis of risk-set data, meta-analysis of site-specific effect estimates, and stratified or matched analysis of confounder summary score-based information. Using simulated and real-world data, we evaluated the performance of these methods in various study settings by comparing their results against those from the pooled individual-level data analysis, and we assessed their applicability within a 3-site distributed network.

Aim 3: We partnered with stakeholders in the development, implementation, and dissemination of user-friendly and freely available analytic tools and documentation for implementation of these privacy-preserving methods in multisite PCOR studies.
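As an illustration of one aggregate-level approach named in Aim 2, meta-analysis of site-specific effect estimates, the following is a minimal sketch in which each site shares only an effect estimate and its standard error. It assumes inverse-variance fixed-effect pooling on the log hazard ratio scale; the report excerpt does not state which pooling method was used. The names `SiteEstimate` and `fixed_effect_meta` and the three site values are hypothetical and are not taken from the project's tools or data.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class SiteEstimate(NamedTuple):
    """Summary-level result shared by one site (hypothetical structure)."""
    log_hr: float  # site-specific adjusted log hazard ratio
    se: float      # standard error of that log hazard ratio


def fixed_effect_meta(estimates: Sequence[SiteEstimate]) -> Tuple[float, float]:
    """Pool site-specific estimates with inverse-variance weights.

    Assumption: a fixed-effect model, in which every site estimates
    the same underlying effect. Returns (pooled estimate, pooled SE).
    """
    if not estimates:
        raise ValueError("at least one site estimate is required")
    # Each site's weight is the inverse of its variance.
    weights = [1.0 / (e.se ** 2) for e in estimates]
    total_weight = sum(weights)
    pooled = sum(w * e.log_hr for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Made-up values for a 3-site network; no individual-level data are involved.
    sites = [
        SiteEstimate(log_hr=0.20, se=0.10),
        SiteEstimate(log_hr=0.35, se=0.15),
        SiteEstimate(log_hr=0.10, se=0.20),
    ]
    pooled, se = fixed_effect_meta(sites)
    lower, upper = pooled - 1.96 * se, pooled + 1.96 * se
    print(f"Pooled HR: {math.exp(pooled):.2f} "
          f"(95% CI, {math.exp(lower):.2f}-{math.exp(upper):.2f})")
```

In this sketch the coordinating center sees only two numbers per site, which is the sense in which such methods rely on summary-level information. A random-effects model would be a natural alternative if the sites were expected to differ in their true effects.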

Results:

Aim 1: We completed 11 one-on-one or group interview sessions, involving the following: patients (n = 15); health care system leaders (n = 4); multicenter research governance experts (n = 2); regulatory, compliance, and confidentiality experts (n = 3); and researchers (n = 10). Perceptions of the benefits and value of research were the strongest influences toward data sharing; cost and security risks were the primary influences against sharing. Stakeholders found privacy-preserving methods appealing but raised concerns about increased cost and potential loss of research validity.

Aim 2: Both simulation and empirical studies showed that these privacy-preserving methods produced results that were identical to or highly comparable with results obtained from pooled individual-level data analysis in most of the scenarios examined.

Aim 3: We developed freely available analytic tools and documentation to implement these methods and disseminate our findings.

Conclusions: Stakeholders were open to data sharing in multicenter studies that offer value and minimize security risks. Several privacy-preserving analytic and data-sharing methods produced results highly consistent with those from conventional pooled individual-level data analysis and can help reduce the barriers to conducting multicenter studies.

Study Limitations: The stakeholders interviewed were a relatively select group, so their perspectives may not generalize to all stakeholders. The scenarios examined in the simulation study and empirical analysis were not exhaustive. The study also did not examine all available privacy-preserving methods (eg, distributed regression).
