De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports

AMIA Annu Symp Proc. 2014 Nov 14;2014:767-76. eCollection 2014.


Introduction: The Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) requires that clinical documents be stripped of personally identifying information before they can be released to researchers and others. We have been developing a software application, NLM Scrubber, to de-identify narrative clinical reports.

Methods: We compared NLM Scrubber with MIT's and MITRE's de-identification systems on 3,093 clinical reports about 1,636 patients. We analyzed each system's performance separately on recognizing addresses, dates, and alphanumeric identifiers. We also analyzed each system's overall performance on de-identification and on conservation of the remaining clinical text.

Results: NLM Scrubber's sensitivity in de-identifying these identifiers was 99%. Its specificity in conserving text with no personal identifiers was also 99%.
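
As an illustration of how the two reported measures relate to each other, the sketch below computes sensitivity (the fraction of identifier tokens that were redacted) and specificity (the fraction of non-identifier tokens that were left intact) from parallel gold-standard and system labels. The token-level unit of evaluation, the function name `token_level_metrics`, and the toy labels are assumptions made for this example; the abstract does not state how the study counted its units, and the numbers here are not study data.

```python
import math


def token_level_metrics(gold, predicted):
    """Return (sensitivity, specificity) for parallel boolean label lists.

    gold[i] is True when token i is a personal identifier in the reference
    annotation; predicted[i] is True when the system redacted token i.
    Token-level counting is an assumption made for this sketch.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = sum(1 for g, p in zip(gold, predicted) if g and p)          # identifier, redacted
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)      # identifier, missed
    tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)  # clinical text, kept
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)      # clinical text, redacted

    # Undefined ratios (no identifiers, or no clinical text) are reported as NaN.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy example: five tokens, two of which are identifiers.
gold_labels = [True, True, False, False, False]
system_labels = [True, False, False, False, True]
sens, spec = token_level_metrics(gold_labels, system_labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a de-identification setting the two measures pull in opposite directions: redacting more aggressively raises sensitivity but removes clinical text and lowers specificity, which is why both are reported together.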

Conclusion: The current version of the system recognizes and redacts patient names, alphanumeric identifiers, addresses and dates. We plan to make the system available prior to the AMIA Annual Symposium in 2014.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Intramural

MeSH terms

  • Computer Security
  • Confidentiality*
  • Electronic Health Records*
  • Health Insurance Portability and Accountability Act
  • Software*
  • United States