Privacy-preserving heterogeneous health data sharing
- PMID: 23242630
- PMCID: PMC3628047
- DOI: 10.1136/amiajnl-2012-001027
Abstract
Objective: Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy treat the disclosure of relational data and the disclosure of set-valued data as separate problems. In this paper, we propose an algorithm that handles both relational and set-valued data in the differentially private disclosure of healthcare data.
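The abstract relies on ε-differential privacy without stating it. For reference, the standard definition is given below, together with the Laplace mechanism that is commonly used to satisfy it. The notation (𝒜, D, D', S, f, Δf, m) is ours, not the paper's.

The bound must hold for every pair of neighboring datasets. For that reason the guarantee does not depend on what auxiliary information an adversary already has.

```latex
% epsilon-differential privacy: for all datasets D, D' differing in at most one record
% and every set S of possible outputs of the randomized algorithm A
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{A}(D') \in S]

% Laplace mechanism: for a query f with m real-valued outputs and L1 sensitivity
\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1 ,
\qquad
\mathcal{A}(D) = f(D) + (Y_1, \dots, Y_m), \quad Y_i \overset{\text{iid}}{\sim} \operatorname{Lap}\!\left(\Delta f / \varepsilon\right)
```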
Methods: The proposed approach makes a simple yet fundamental switch in differentially private algorithm design. Instead of enumerating all possible records (i.e., a contingency table) and adding noise to each, it generalizes the records before adding noise. The algorithm first generalizes the raw data in a probabilistic way and then adds noise to guarantee ε-differential privacy.
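The "generalize, then add noise" idea can be illustrated with a minimal Python sketch. Each record combines relational attributes (age, sex) with a set-valued attribute (diagnosis codes). Records are mapped to generalized cells, and Laplace noise is then added to the count of every cell in the generalized output domain.

This is a simplification under stated assumptions, not the paper's algorithm:

- The diagnosis taxonomy, the 20-year age intervals and all function names are hypothetical.
- The paper generalizes probabilistically, whereas here the generalization level is fixed in advance.

Adding or removing one record changes exactly one cell count by 1. The sensitivity is therefore 1, and noise of scale 1/ε suffices.

```python
import random
from collections import Counter
from itertools import combinations, product

# Hypothetical one-level taxonomy for the set-valued attribute (diagnosis codes).
# Codes and group names are illustrative only; they are not taken from the paper.
DIAGNOSIS_PARENT = {
    "E10": "Diabetes",
    "E11": "Diabetes",
    "I10": "Cardiovascular",
    "I21": "Cardiovascular",
}
AGE_WIDTH = 20  # hypothetical generalization level for the relational attribute "age"


def generalize_age(age):
    """Map an exact age (clamped to the assumed range 0-99) to an interval, e.g. 37 -> '20-39'."""
    age = min(max(age, 0), 99)
    low = (age // AGE_WIDTH) * AGE_WIDTH
    return f"{low}-{low + AGE_WIDTH - 1}"


def generalize_record(record):
    """Generalize one heterogeneous record: (age, sex) are relational, codes is set-valued."""
    age, sex, codes = record
    groups = frozenset(DIAGNOSIS_PARENT[code] for code in codes)
    return (generalize_age(age), sex, groups)


def output_domain():
    """Enumerate every generalized cell, including cells that no real record falls into."""
    ages = [generalize_age(a) for a in range(0, 100, AGE_WIDTH)]
    sexes = ["F", "M"]
    groups = sorted(set(DIAGNOSIS_PARENT.values()))
    subsets = [frozenset(c) for r in range(len(groups) + 1) for c in combinations(groups, r)]
    return list(product(ages, sexes, subsets))


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(records, epsilon, rng):
    """Release generalized counts with Laplace(1/epsilon) noise on every cell of the domain.

    Adding or removing one record changes exactly one cell by 1, so the L1
    sensitivity is 1 and scale 1/epsilon gives epsilon-differential privacy.
    The generalization here is fixed in advance (data-independent), so it spends
    no privacy budget; the paper's probabilistic generalization is not modeled.
    """
    true_counts = Counter(generalize_record(r) for r in records)
    scale = 1.0 / epsilon
    return {cell: true_counts[cell] + laplace_noise(scale, rng) for cell in output_domain()}


if __name__ == "__main__":
    rng = random.Random(0)
    records = [
        (34, "F", {"E11"}),
        (37, "F", {"E10", "I10"}),
        (71, "M", {"I21"}),
        (68, "M", set()),
    ]
    release = noisy_histogram(records, epsilon=1.0, rng=rng)
    # 5 age intervals x 2 sexes x 4 diagnosis-group subsets
    print(len(release), "cells released")
    for cell, value in sorted(release.items(), key=lambda kv: -kv[1])[:5]:
        print(cell[0], cell[1], sorted(cell[2]), round(value, 2))
```

Noise is added to empty cells too. Releasing only the cells that real records fall into would reveal which combinations occur in the data.

Coarser generalization shrinks the number of cells that must receive noise. This is one way to read why the abstract contrasts generalization with listing the full contingency table.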
Results: We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis.
Limitation: The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases.
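A back-of-envelope calculation suggests why a large output domain hurts utility. It assumes Laplace noise of scale b = 1/ε is added independently to each of the m cells of the output domain (sensitivity 1). The numbers in the second part are hypothetical, chosen only for illustration; they are not results from the paper.

```latex
% Laplace(0, b) with b = 1/\varepsilon has mean absolute value b
\mathbb{E}\,\lvert Y_i \rvert = b = \frac{1}{\varepsilon},
\qquad
\mathbb{E}\sum_{i=1}^{m} \lvert Y_i \rvert = \frac{m}{\varepsilon}

% hypothetical illustration: n = 10^5 records, m = 10^6 cells, \varepsilon = 1
\frac{n}{m} = 0.1 \ \text{records per cell on average, versus} \ \mathbb{E}\,\lvert Y_i \rvert = 1
```

When the average true count per cell is well below 1/ε, the noise dominates the signal. This is the regime in which generating synthetic data from the noisy counts becomes unreliable.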
Conclusions: Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.
Similar articles
- Differential privacy in health research: A scoping review. J Am Med Inform Assoc. 2021 Sep 18;28(10):2269-2276. doi: 10.1093/jamia/ocab135. PMID: 34333623. Review.
- DPSynthesizer: Differentially Private Data Synthesizer for Privacy Preserving Data Sharing. Proceedings VLDB Endowment. 2014 Aug;7(13):1677-1680. doi: 10.14778/2733004.2733059. PMID: 26167358.
- Insuring against the perils in distributed learning: privacy-preserving empirical risk minimization. Math Biosci Eng. 2021 Mar 29;18(4):3006-3033. doi: 10.3934/mbe.2021151. PMID: 34198373.
- An Efficient Big Data Anonymization Algorithm Based on Chaos and Perturbation Techniques. Entropy (Basel). 2018 May 17;20(5):373. doi: 10.3390/e20050373. PMID: 33265463.
- New Methods to Protect Privacy When Using Patient Health Data to Compare Treatments [Internet]. Washington (DC): Patient-Centered Outcomes Research Institute (PCORI); 2021 Feb. PMID: 38232192. Review.
Cited by
- Privacy-Enhancing Technologies in Biomedical Data Science. Annu Rev Biomed Data Sci. 2024 Aug;7(1):317-343. doi: 10.1146/annurev-biodatasci-120423-120107. PMID: 39178425. Review.
- A Novel Privacy Paradigm for Improving Serial Data Privacy. Sensors (Basel). 2022 Apr 6;22(7):2811. doi: 10.3390/s22072811. PMID: 35408425.
- Differential privacy in health research: A scoping review. J Am Med Inform Assoc. 2021 Sep 18;28(10):2269-2276. doi: 10.1093/jamia/ocab135. PMID: 34333623. Review.
- Differentially private release of medical microdata: an efficient and practical approach for preserving informative attribute values. BMC Med Inform Decis Mak. 2020 Jul 8;20(1):155. doi: 10.1186/s12911-020-01171-5. PMID: 32641043.
- Selecting Optimal Subset to release under Differentially Private M-estimators from Hybrid Datasets. IEEE Trans Knowl Data Eng. 2018 Mar 1;30(3):573-584. doi: 10.1109/TKDE.2017.2773545. Epub 2017 Nov 14. PMID: 30034201.
