Identification of patient name references within medical documents using semantic selectional restrictions
- PMID: 12463926
- PMCID: PMC2244274
Abstract
De-identification of a patient's personal data in medical records is a protective legal requirement that must be met before medical documents can be used for research purposes or transferred to other healthcare providers (e.g., teachers, students, tele-consultations). The de-identification process is tedious when performed manually and is known to be error-prone when performed with direct search-and-replace strategies [9]. In this paper, we report on the identification step of this process. The proposed algorithm estimates how well candidate patient name references fit a set of semantic selectional restrictions. These restrictions place tight contextual requirements on candidate words in the report text and are determined automatically from a manually tagged corpus of training reports. Maximum entropy classifiers provide a probabilistic measure of the belief that a given candidate token satisfies a given semantic restriction. We report on the design and preliminary evaluation of the system within the domain of pediatric urology.
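The idea of scoring a candidate token against its context with a maximum entropy classifier can be illustrated with a small sketch. The abstract does not give the paper's actual features, restrictions, or training setup, so everything below is an assumption made for illustration: the toy sentences, the feature names (previous word, next word, capitalization), and the helper names `context_features`, `train`, and `score` are all hypothetical. The classifier is a two-class logistic regression, which is the binary case of a maximum entropy model, trained by plain stochastic gradient ascent.

```python
import math

def context_features(tokens, i):
    """Sparse binary features describing the context of tokens[i] (hypothetical feature set)."""
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    feats = {"bias": 1.0, "prev=" + prev_tok: 1.0, "next=" + next_tok: 1.0}
    if tokens[i][:1].isupper():
        feats["capitalized"] = 1.0
    return feats

def probability(weights, feats):
    """P(candidate is a patient name | features) under a binary maxent model."""
    z = sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=300, lr=0.2, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            error = label - probability(weights, feats)
            for f, v in feats.items():
                w = weights.get(f, 0.0)
                weights[f] = w + lr * (error * v - l2 * w)
    return weights

# Invented, manually "tagged" toy corpus: (sentence, candidate index, 1 if patient name).
TAGGED = [
    ("Mr. Smith was seen in clinic", 1, 1),
    ("Mrs. Jones presented with fever", 1, 1),
    ("Mr. Lee presented with pain", 1, 1),
    ("The Foley catheter was removed", 1, 0),
    ("A Wilms tumor was excluded", 1, 0),
    ("Mr. Smith was seen in clinic", 3, 0),
    ("The Foley catheter was removed", 2, 0),
]

examples = [(context_features(s.split(), i), y) for s, i, y in TAGGED]
weights = train(examples)

def score(sentence, i):
    """Probability that the token at position i is a patient name reference."""
    return probability(weights, context_features(sentence.split(), i))

print(score("Mrs. Taylor presented with pain", 1))
print(score("The Gibson catheter was removed", 1))
```

On this toy data, the unseen surname following "Mrs." should score well above the capitalized eponym preceding "catheter", because the decision rests on the surrounding context rather than on a name dictionary. A real system would need far richer features and a properly annotated corpus.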