Automating tissue bank annotation from pathology reports - comparison to a gold standard expert annotation set

AMIA Annu Symp Proc. 2005;2005:460-4.

Abstract

Surgical pathology specimens are an important resource for medical research, particularly for cancer research. Although research studies would benefit from information derived from surgical pathology reports, access to this information is limited because the reports are written as unstructured free text. We have previously described a pipeline-based system for automatically annotating surgical pathology reports with UMLS concepts, which has been used to code over 450,000 surgical pathology reports at our institution. In addition to coding UMLS terms, the system annotates the values of several key variables, such as TNM stage and cancer grade. The objective of this study was to evaluate automated extraction of these variables by measuring the system's performance against a true gold standard: manually encoded data entered by expert tissue annotators. We categorized and analyzed the errors to determine the potential and limitations of information extraction from pathology reports for automated biospecimen annotation.
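The abstract does not describe how the key variables are extracted or how agreement with the gold standard is scored. As a rough illustration only, the Python sketch below assumes a simple regular-expression extractor for pathologic TNM strings and scores its output against gold-standard values using precision and recall. The pattern, the functions `extract_tnm` and `score`, and the sample reports are all hypothetical. This is not the authors' pipeline, which the abstract describes only as a pipeline-based system for UMLS annotation.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for pathologic TNM strings such as "pT2 N0 M0", "pT3N1b" or "ypT1a N0".
# Real reports contain far more variants; this covers only a few common forms.
TNM_PATTERN = re.compile(
    r"\b(?:y?p)?T(?P<t>is|X|[0-4][a-d]?)\s*"
    r"N(?P<n>X|[0-3][a-c]?)\s*"
    r"(?:M(?P<m>X|[01][a-c]?))?"
)


def extract_tnm(report_text: str) -> Optional[str]:
    """Return the first TNM stage found, normalized like 'T2N0M0' (M omitted if absent), or None."""
    match = TNM_PATTERN.search(report_text)
    if match is None:
        return None
    stage = f"T{match.group('t')}N{match.group('n')}"
    if match.group("m"):
        stage += f"M{match.group('m')}"
    return stage


def score(system: Dict[str, Optional[str]], gold: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Compare system output with gold-standard values, keyed by report ID.

    Precision = correct / values the system emitted.
    Recall = correct / values present in the gold standard.
    """
    emitted = [rid for rid, value in system.items() if value is not None]
    expected = [rid for rid, value in gold.items() if value is not None]
    correct = [rid for rid in emitted if system[rid] == gold.get(rid)]
    precision = len(correct) / len(emitted) if emitted else 0.0
    recall = len(correct) / len(expected) if expected else 0.0
    return {"precision": precision, "recall": recall}


if __name__ == "__main__":
    # Hypothetical toy reports and gold-standard annotations, for illustration only.
    reports = {
        "r1": "Tumor size 2.1 cm. Pathologic stage: pT2 N0 M0.",
        "r2": "Invasive carcinoma, ypT1a N1b.",
        "r3": "Benign breast tissue; no tumor identified.",
    }
    gold = {"r1": "T2N0M0", "r2": "T1aN1", "r3": None}
    system = {rid: extract_tnm(text) for rid, text in reports.items()}
    print(system)
    print(score(system, gold))  # -> {'precision': 0.5, 'recall': 0.5}
```

In this toy run, report `r2` counts as an error only because the extracted value (`T1aN1b`) is more granular than the annotator's entry (`T1aN1`). A mismatch like this shows why errors need to be categorized rather than simply counted.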

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Abstracting and Indexing / methods*
  • Abstracting and Indexing / standards
  • Algorithms
  • Clinical Laboratory Information Systems
  • Computer Communication Networks
  • Electronic Data Processing*
  • Feasibility Studies
  • Humans
  • Information Storage and Retrieval
  • Medical Informatics Applications
  • Medical Records Systems, Computerized
  • Natural Language Processing*
  • Pathology, Surgical*
  • Tissue Banks*
  • Unified Medical Language System