J Am Med Inform Assoc. 2013 Jan 1;20(1):109-16.
doi: 10.1136/amiajnl-2012-001032. Epub 2012 Oct 11.

SHARE: system design and case studies for statistical health information release

James Gardner et al. J Am Med Inform Assoc.

Abstract

Objectives: We present SHARE, a new system for statistical health information release with differential privacy. Two case studies evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework to biomedical data.

Materials and methods: SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the surveillance, epidemiology and end results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE.
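
As a rough illustration of the kind of release described above, the following minimal sketch adds Laplace noise to histogram counts, which is the textbook baseline for differentially private histograms. It is not SHARE's DPCube method. The function names `laplace_noise` and `dp_histogram`, the `epsilon` parameter, and the assumption that neighbouring datasets differ by adding or removing one record (so each cell count has sensitivity 1) are illustrative choices, not details taken from the paper.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(records, bins, epsilon, rng=None):
    """Return noisy counts for each bin (hypothetical helper, not SHARE's API).

    Assumption: adding or removing one record changes exactly one cell
    count by 1, so the L1 sensitivity is 1 and the noise scale is 1/epsilon.
    """
    rng = rng or random.Random()
    counts = Counter(records)
    scale = 1.0 / epsilon
    # Every bin gets noise, including empty ones, so that the presence or
    # absence of a bin does not itself leak information.
    return {b: counts.get(b, 0) + laplace_noise(scale, rng) for b in bins}


if __name__ == "__main__":
    # Toy data: a one-dimensional histogram over made-up age groups.
    toy_records = ["40-49", "50-59", "50-59", "60-69", "60-69", "60-69"]
    toy_bins = ["30-39", "40-49", "50-59", "60-69"]
    print(dp_histogram(toy_records, toy_bins, epsilon=1.0))
```

A smaller `epsilon` gives stronger privacy but noisier counts. Because independent noise is added to every cell, the total error grows with the number of cells, which is one reason plain per-cell noise becomes less useful as dimensionality increases.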

Results: Experimental results indicate that SHARE can handle the heterogeneous data found in medical datasets and that the released statistics are useful. The Kullback-Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 for seven-dimensional data cubes and below 0.01 for three-dimensional data cubes generated from the SEER dataset. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying differentially private statistical data release to higher dimensional data.
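
The two utility measures reported above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the helper names (`normalize`, `kl_divergence`, `relative_error`), the clipping of negative noisy counts to zero, the small smoothing constant, and the denominator floor in the relative error are all assumptions made here so that the formulas are well defined.

```python
import math


def normalize(counts, smoothing=1e-9):
    """Turn raw (possibly negative, noisy) counts into a probability vector.

    Assumption: negative counts are clipped to zero and a tiny constant is
    added to every cell so that no probability is exactly zero.
    """
    clipped = [max(c, 0.0) + smoothing for c in counts]
    total = sum(clipped)
    return [c / total for c in clipped]


def kl_divergence(original_counts, released_counts):
    """KL(P || Q) with P from the original counts and Q from the released ones."""
    p = normalize(original_counts)
    q = normalize(released_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def relative_error(true_answer, noisy_answer, floor=1.0):
    """Absolute error divided by the true answer.

    Assumption: the denominator is floored at `floor` to avoid division
    by zero for queries whose true count is 0.
    """
    return abs(noisy_answer - true_answer) / max(true_answer, floor)


if __name__ == "__main__":
    # Toy numbers, not taken from the SEER or EeMR datasets.
    print(kl_divergence([120, 80, 40], [118.2, 83.5, 37.9]))
    print(relative_error(200, 215))
```

A KL divergence of 0 means the released histogram has the same shape as the original; larger values mean the released distribution has drifted further from it.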

Conclusions: SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.

Figures

Figure 1
System overview of statistical health information release (SHARE): it is integrated with health information de-identification (HIDE) to provide both de-identification and differentially private statistical data release for unstructured and structured records. This figure is only reproduced in colour in the online version.
Figure 2
Overview and examples of differentially private histogram release (DPCube) and longitudinal pattern release (DPTrie) in statistical health information release (SHARE). This figure is only reproduced in colour in the online version.
Figure 3
Histograms of death cause after cancer diagnosis relative to the year of diagnosis and age of diagnosis generated from full data cubes for the surveillance, epidemiology and end results (SEER) dataset. All figures use green to indicate death as a result of cancer and blue to indicate other causes of death. This figure is only reproduced in colour in the online version.
Figure 4
Comparison of DPCube and baseline for number of cancer deaths relative to the year of diagnosis generated from the full data cubes (seven-dimensional) and reduced data cubes (three-dimensional). KL, Kullback–Leibler. This figure is only reproduced in colour in the online version.
Figure 5
Random selection of physicians where X value is month of residence and Y value is the average number of e-prescriptions (eRx) per visit in each month.
Figure 6
Average counts and query errors of longitudinal queries with respect to time length for the Emory electronic medical record (EeMR) dataset. This figure is only reproduced in colour in the online version.
