Reducing patient re-identification risk for laboratory results within research datasets
- PMID: 22822040
- PMCID: PMC3555327
- DOI: 10.1136/amiajnl-2012-001026
Abstract
Objective: To lower patient re-identification risk in biomedical research databases containing laboratory test results while minimizing changes in clinical data interpretation.
Materials and methods: In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. The re-identification risk and the retention of clinical meaning were assessed for each model's perturbed laboratory results.
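The threat model and the simple random-offset perturbation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the offset distribution or the ranking metric, so the uniform relative offset, the `distance` function, the function names, and the three-patient toy database below are all hypothetical.

```python
import random

def perturb(record, max_fraction, rng):
    """Shift each result by a uniform random offset of up to +/- max_fraction of its value.
    (Assumed form of a "simple random offset"; the abstract gives no details.)"""
    return {test: value * (1.0 + rng.uniform(-max_fraction, max_fraction))
            for test, value in record.items()}

def distance(search_key, record):
    """Sum of relative differences over the tests in the attacker's search key.
    (Hypothetical metric chosen for illustration.)"""
    total = 0.0
    for test, value in search_key.items():
        if test not in record:
            return float("inf")  # record lacks a test the attacker knows about
        total += abs(record[test] - value) / max(abs(value), 1e-9)
    return total

def rank_of_target(search_key, database, target_id):
    """Rank (1 = best match) of the true patient when records are sorted by distance."""
    ordered = sorted(database, key=lambda pid: distance(search_key, database[pid]))
    return ordered.index(target_id) + 1

# Toy data: made-up values, five results per patient.
database = {
    "p1": {"sodium": 138.0, "potassium": 4.1, "glucose": 92.0, "creatinine": 0.9, "hemoglobin": 13.5},
    "p2": {"sodium": 141.0, "potassium": 3.8, "glucose": 105.0, "creatinine": 1.1, "hemoglobin": 14.2},
    "p3": {"sodium": 136.0, "potassium": 4.6, "glucose": 88.0, "creatinine": 0.8, "hemoglobin": 12.9},
}

# The attacker holds the true (unperturbed) results of patient p2.
search_key = dict(database["p2"])

rng = random.Random(0)
released = {pid: perturb(rec, 0.05, rng) for pid, rec in database.items()}

print(rank_of_target(search_key, database, "p2"))  # exact match against unaltered data: 1
print(rank_of_target(search_key, released, "p2"))  # rank against the perturbed release
```

A rank of 1 means the attacker's best match is the true patient. Raising `max_fraction` makes that outcome less likely in general, at the cost of larger changes to the released values.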
Results: Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
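One way to quantify "altered clinical meaning" is to count results whose interpretation relative to a reference range changes after perturbation. The sketch below assumes that definition, which the abstract does not state; the reference ranges are illustrative values only, and the names `REFERENCE_RANGES`, `interpret`, and `fraction_changed` are hypothetical.

```python
# Illustrative reference ranges (low, high); real ranges vary by laboratory.
REFERENCE_RANGES = {
    "sodium": (135.0, 145.0),    # mmol/L
    "potassium": (3.5, 5.0),     # mmol/L
}

def interpret(test, value):
    """Classify a result as low, normal, or high against its reference range."""
    low, high = REFERENCE_RANGES[test]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

def fraction_changed(original, perturbed):
    """Fraction of results whose low/normal/high category differs after perturbation.
    Both arguments are lists of (test, value) pairs in the same order."""
    changed = sum(
        interpret(test, before) != interpret(test, after)
        for (test, before), (_, after) in zip(original, perturbed)
    )
    return changed / len(original)

original = [("sodium", 144.0), ("sodium", 138.0), ("potassium", 3.6), ("potassium", 4.2)]
perturbed = [("sodium", 146.5), ("sodium", 139.0), ("potassium", 3.4), ("potassium", 4.3)]

print(fraction_changed(original, perturbed))  # 0.5: two of four results cross a range boundary
```

Under this assumed measure, a meaning-preserving scheme would constrain offsets so that results near a range boundary stay on the same side of it.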
Discussion and conclusion: With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.
Similar articles
- Evaluating re-identification risks with respect to the HIPAA privacy rule. J Am Med Inform Assoc. 2010 Mar-Apr;17(2):169-77. doi: 10.1136/jamia.2009.000026. PMID: 20190059.
- Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. J Am Med Inform Assoc. 2015 Sep;22(5):1072-80. doi: 10.1093/jamia/ocv038. Epub 2015 Jun 23. PMID: 26104741.
- The disclosure of diagnosis codes can breach research participants' privacy. J Am Med Inform Assoc. 2010 May-Jun;17(3):322-7. doi: 10.1136/jamia.2009.002725. PMID: 20442151.
- Privacy preserving interactive record linkage (PPIRL). J Am Med Inform Assoc. 2014 Mar-Apr;21(2):212-20. doi: 10.1136/amiajnl-2013-002165. Epub 2013 Nov 7. PMID: 24201028. Review.
- Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies. Med Care. 2012 Jul;50 Suppl(Suppl):S82-101. doi: 10.1097/MLR.0b013e3182585355. PMID: 22692265. Review.
Cited by
- Reidentification of Participants in Shared Clinical Data Sets: Experimental Study. JMIR AI. 2024 Mar 15;3:e52054. doi: 10.2196/52054. PMID: 38875581.
- Regulations and Norms for Reuse of Residual Clinical Biospecimens and Health Data. West J Nurs Res. 2022 Nov;44(11):1068-1081. doi: 10.1177/01939459211029296. Epub 2021 Jul 8. PMID: 34238076. Review.
- Revolutionizing Medical Data Sharing Using Advanced Privacy-Enhancing Technologies: Technical, Legal, and Ethical Synthesis. J Med Internet Res. 2021 Feb 25;23(2):e25120. doi: 10.2196/25120. PMID: 33629963.
- Lost in Anonymization - A Data Anonymization Reference Classification Merging Legal and Technical Considerations. J Law Med Ethics. 2020 Mar;48(1):228-231. doi: 10.1177/1073110520917025. PMID: 32342783. No abstract available.
- Regulating the Secondary Use of Data for Research: Arguments Against Genetic Exceptionalism. Front Genet. 2019 Dec 20;10:1254. doi: 10.3389/fgene.2019.01254. eCollection 2019. PMID: 31956328.
Grants and funding
- U01 HG006385/HG/NHGRI NIH HHS/United States
- U01 HG006378/HG/NHGRI NIH HHS/United States
- R01 LM009018/LM/NLM NIH HHS/United States
- R01 LM009989/LM/NLM NIH HHS/United States
- R01 LM010828/LM/NLM NIH HHS/United States
- R01 LM007995/LM/NLM NIH HHS/United States
- T15 LM007450/LM/NLM NIH HHS/United States
- T32 GM007347/GM/NIGMS NIH HHS/United States