The Challenges of Creating a Gold Standard for De-identification Research

AMIA Annu Symp Proc. 2014 Nov 14;2014:353-8. eCollection 2014.


We created a Gold Standard corpus comprising over 20,000 records of annotated narrative clinical reports for use in the training and evaluation of NLM Scrubber, a de-identification software system for medical records. Our experience with designing the corpus demonstrated the conceptual complexity of the task.

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • Confidentiality*
  • Electronic Health Records*
  • Health Insurance Portability and Accountability Act
  • Humans
  • Software*
  • United States