J Am Med Inform Assoc. 2013 May 1;20(3):462-9.
doi: 10.1136/amiajnl-2012-001027. Epub 2012 Dec 13.

Privacy-preserving heterogeneous health data sharing

Noman Mohammed et al. J Am Med Inform Assoc.

Abstract

Objective: Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. Existing solutions that ensure ε-differential privacy handle relational data and set-valued data separately. In this paper, we propose an algorithm that handles both relational and set-valued data in the differentially private disclosure of healthcare data.
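
For reference, the abstract does not restate the guarantee it relies on. The standard definition of ε-differential privacy can be written as follows, where the symbols \mathcal{A} (the randomized algorithm), D and D' (two data sets differing in at most one record), and S (any set of possible outputs) are notation chosen here rather than taken from the paper:

```latex
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{A}(D') \in S]
\qquad \text{for all } S \subseteq \mathrm{Range}(\mathcal{A})
```

A smaller ε forces the two output distributions closer together, so the presence or absence of any single record has less influence on what is disclosed.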

Methods: The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (ie, a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy.
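
The generalize-then-add-noise idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it performs a single generalization choice rather than a full top-down sequence, ignores set-valued attributes, and assumes the exponential mechanism for the probabilistic step and Laplace noise for the counts. All names (`generalize_then_noise`, `taxonomy`, the `"label"` key) are hypothetical.

```python
import math
import random
from collections import Counter


def exponential_mechanism(candidates, score, epsilon, sensitivity, rng):
    """Pick a candidate with probability proportional to exp(eps * score / (2 * sensitivity))."""
    scores = [score(c) for c in candidates]
    top = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def laplace_noise(scale, rng):
    """Laplace(0, scale) drawn as the difference of two exponential variates."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def generalize_then_noise(records, taxonomy, labels, epsilon, rng):
    """Choose one attribute to generalize on, then release noisy group counts.

    taxonomy maps attribute -> (function raw value -> group, list of all groups).
    Half of the budget goes to the choice and half to the counts (an assumption
    of this sketch).
    """
    eps_pick = eps_count = epsilon / 2

    def grouped(attr):
        to_group, _ = taxonomy[attr]
        return Counter((to_group(r[attr]), r["label"]) for r in records)

    def max_score(attr):
        # Sum of majority-class counts per group; one record changes it by at most 1.
        best = {}
        for (group, _label), n in grouped(attr).items():
            best[group] = max(best.get(group, 0), n)
        return sum(best.values())

    attr = exponential_mechanism(list(taxonomy), max_score, eps_pick, 1.0, rng)
    counts = grouped(attr)
    _, groups = taxonomy[attr]
    # Noise is added to every cell of the generalized domain, including empty ones.
    noisy = {
        (g, y): counts[(g, y)] + laplace_noise(1 / eps_count, rng)
        for g in groups
        for y in labels
    }
    return attr, noisy
```

Because the generalized domain has far fewer cells than the full contingency table, fewer noisy counts are released, which is the switch the Methods paragraph describes.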

Results: We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis.

Limitation: The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases.
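
A rough estimate shows why a large output domain hurts utility. It assumes Laplace noise with sensitivity 1 and a count budget \varepsilon_c, with m output cells and n records; these symbols and the numbers in the example are illustrative assumptions, not values from the paper:

```latex
\mathbb{E}\,\lvert \mathrm{Lap}(1/\varepsilon_c) \rvert = \frac{1}{\varepsilon_c}
\quad\Longrightarrow\quad
\mathbb{E}\Big[\sum_{i=1}^{m} \lvert \text{noise}_i \rvert\Big] = \frac{m}{\varepsilon_c},
\qquad
\text{e.g. } m = 10^{6},\ \varepsilon_c = 0.5 \ \Rightarrow\ \frac{m}{\varepsilon_c} = 2 \times 10^{6} \gg n = 10^{4}
```

The total noise grows with the number of cells while the number of real records stays fixed, so the signal is eventually swamped.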

Conclusions: Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.

Figures

Figure 1. Taxonomy tree of attributes.
Figure 2. Tree for partitioning records. A randomized mechanism was deployed for specializing predictors in a top-down manner (using half of the privacy budget). At leaf nodes, random noise is added to the count of elements using the second half of the privacy budget to ensure overall ε-differentially private outputs.
Figure 3. Classification accuracy for the MIMIC data set using DiffGen based on two scoring functions: information gain (INFOGAIN) and maximum frequency (MAX).
Figure 4. Classification accuracy for the Adult data set. BA, baseline accuracy; LA, lower-bound accuracy.
Figure 5. Comparison of DiffGen with DiffP-C4.5 and top-down specialization (TDS) algorithms. (A) Evaluation of averaged accuracy, where the bottom and topmost lines stand for the worst case (ie, all records generalized to one super record) and the optimal case (ie, no record is generalized at all), respectively; (B) evaluation in terms of reading, anonymization, and writing time of all three algorithms.
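
The two scoring functions named in the Figure 3 caption can be sketched in Python as they are commonly defined for a candidate split. The paper's exact formulations may differ, and the function names and the representation of a split as a list of per-group class counters are assumptions of this sketch.

```python
import math
from collections import Counter


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total) for n in class_counts.values() if n > 0
    )


def info_gain(partition):
    """Entropy of the pooled records minus the size-weighted entropy of each group."""
    parent = Counter()
    for child in partition:
        parent.update(child)
    total = sum(parent.values())
    if total == 0:
        return 0.0
    remainder = sum(sum(c.values()) / total * entropy(c) for c in partition)
    return entropy(parent) - remainder


def max_frequency(partition):
    """Sum over groups of the count of the most frequent class in that group."""
    return sum(max(c.values(), default=0) for c in partition)


# A split that separates the classes perfectly versus one that carries no information.
perfect = [Counter(a=2), Counter(b=2)]
useless = [Counter(a=1, b=1), Counter(a=1, b=1)]
print(info_gain(perfect), max_frequency(perfect))  # 1.0 4
print(info_gain(useless), max_frequency(useless))  # 0.0 2
```

Both scores prefer the first split; they differ in how much a single record can change them, which matters when a score is used inside a differentially private selection step.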
