Extracting structured information from free text pathology reports

AMIA Annu Symp Proc. 2003;2003:584-8.


We have developed a method that extracts structured information about specimens and their related findings from free-text surgical pathology reports. Our method uses regular expressions that drive a state automaton implemented on top of XSLT and Java. The text fragments it identifies are coded against the UMLS. This paper describes the technical approach and reports on a preliminary evaluation study designed to guide further development. Of 275 reviewed reports, 91% were coded at least well enough that all specimens and their critical pathologic findings were represented by codes.
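
The general idea of regular expressions driving a state automaton can be illustrated with a minimal sketch. This is not the authors' XSLT and Java implementation, which the abstract does not detail. It is written in Python for brevity, and it assumes a hypothetical report layout in which each specimen is introduced by a lettered header line and each finding is a dashed line beneath it. The patterns, the state names, the function extract, and the lookup table TOY_CODES (with invented placeholder codes, not real UMLS identifiers) are all assumptions made for illustration.

```python
import re

# Hypothetical patterns: the abstract does not give the actual expressions.
# A specimen header looks like "A. Skin, left arm, biopsy:".
SPECIMEN_RE = re.compile(r"^\s*([A-Z])[.)]\s+(.+?):?\s*$")
# A finding looks like "  - Basal cell carcinoma."
FINDING_RE = re.compile(r"^\s*-\s+(.+?)\.?\s*$")

# Placeholder lookup standing in for UMLS coding; these codes are invented.
TOY_CODES = {
    "basal cell carcinoma": "X-0001",
    "chronic inflammation": "X-0002",
}


def extract(report):
    """Return a list of specimens, each with its findings and toy codes."""
    specimens = []
    state = "OUTSIDE"  # no specimen is currently open
    for line in report.splitlines():
        header = SPECIMEN_RE.match(line)
        if header:
            # A header opens a new specimen and changes the state.
            specimens.append(
                {"label": header.group(1), "site": header.group(2), "findings": []}
            )
            state = "IN_SPECIMEN"
            continue
        finding = FINDING_RE.match(line)
        if finding and state == "IN_SPECIMEN":
            # Findings attach to the most recently opened specimen.
            text = finding.group(1)
            specimens[-1]["findings"].append((text, TOY_CODES.get(text.lower())))
            continue
        if not line.strip():
            # A blank line closes the current specimen.
            state = "OUTSIDE"
    return specimens
```

In this sketch a finding that has no entry in the lookup table is kept with a code of None rather than dropped, so uncoded fragments remain visible for review.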

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Databases as Topic*
  • Forms and Records Control*
  • Humans
  • Information Storage and Retrieval / methods*
  • Internet
  • Medical Records Systems, Computerized
  • Pathology, Surgical / classification*
  • Programming Languages
  • Unified Medical Language System*