Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies
- PMID: 22692265
- PMCID: PMC6502465
- DOI: 10.1097/MLR.0b013e3182585355
Abstract
Background: De-identification and anonymization are strategies used to remove patient identifiers from electronic health record data. These strategies are of paramount importance in multicenter research studies, which need to share electronic health record data across multiple environments and institutions while safeguarding patient privacy.
Methods: A systematic literature search was performed using the keywords de-identify, deidentify, de-identification, deidentification, anonymize, anonymization, data scrubbing, and text scrubbing. The search was conducted up to June 30, 2011 and involved 6 common literature databases. A total of 1798 prospective citations were identified, of which 94 met the criteria for review, and the corresponding full-text articles were obtained. Search results were supplemented by review of 26 additional full-text articles; a total of 120 full-text articles were reviewed.
Results: A final sample of 45 articles met inclusion criteria for review and discussion. Articles were grouped into text, images, and biological sample categories. For text-based strategies, the approaches were segregated into heuristic, lexical, and pattern-based systems versus statistical learning-based systems. For images, approaches that de-identified photographic facial images and magnetic resonance image data were described. For biological samples, approaches that managed the identifiers linked with these samples were discussed, particularly with respect to meeting the anonymization requirements needed for Institutional Review Board exemption under the Common Rule.
Conclusions: Current de-identification strategies have their limitations, and statistical learning-based systems have distinct advantages over other approaches for the de-identification of free text. True anonymization is challenging, and further work is needed in the areas of de-identification of datasets and protection of genetic information.
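The pattern-based text strategies named in the Results can be illustrated with a minimal sketch. The code below is not taken from any of the reviewed systems: the pattern set, the labels, the function name scrub, and the sample note are all hypothetical, chosen only to show how a rule list replaces matched identifiers with category placeholders.

```python
import re

# Hypothetical patterns for illustration only; a real system would need a far
# larger, validated rule set covering many more identifier types.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE)),
]


def scrub(text: str) -> str:
    """Replace every match of each pattern with a bracketed category label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


# Invented sample note, not real patient data.
note = "Seen on 03/14/2011, MRN: 00123456. Call 555-123-4567."
print(scrub(note))  # prints: Seen on [DATE], [MRN]. Call [PHONE].
```

Rules of this kind only catch identifiers that follow a predictable surface form. Names and other free-form identifiers in narrative text would pass through this sketch unchanged.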
Comment in
- Commentary: Protecting human subjects and their data in multi-site research. Med Care. 2012 Jul;50 Suppl:S74-6. doi: 10.1097/MLR.0b013e318257ddd8. PMID: 22692263
