Semi-Automatic Terminology Generation for Information Extraction from German Chest X-Ray Reports

Stud Health Technol Inform. 2017:243:80-84.

Abstract

Extracting structured data from textual reports is an important subtask in building medical data warehouses for research and care. Many medical reports, and most radiology reports, are written in a telegraphic style: a concatenation of noun phrases describing the presence or absence of findings. A lexico-syntactic approach is therefore promising, in which key terms and their relations are recognized and mapped to a predefined standard terminology (ontology). We propose a two-phase algorithm for terminology matching. In the first phase, a local terminology for recognition is derived that stays as close as possible to the terms used in the radiology reports. In the second phase, the local terminology is mapped to a standard terminology. In this paper, we report on an algorithm for the first phase, the semi-automatic generation of the local terminology, and evaluate it on radiology reports of chest X-ray examinations from Würzburg University Hospital. With about 20 hours of work by a radiologist as domain expert and 10 hours of meetings, a local terminology with about 250 attributes and various value patterns was built. In an evaluation on 100 randomly chosen reports, it achieved an F1 score of about 95% for information extraction.
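
To make the lexico-syntactic idea and the F1 evaluation concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a tiny hand-written local terminology (the attribute names, regular expressions, and negation cues are all hypothetical) and shows how telegraphic German report text could be matched against it and how an F1 score over extracted attribute/value pairs could be computed.

```python
import re

# Hypothetical local terminology: attribute name -> pattern close to report wording.
# A real terminology (about 250 attributes in the paper) would be curated by a domain expert.
LOCAL_TERMINOLOGY = {
    "pleural_effusion": r"pleuraerg[uü]ss\w*",
    "pneumothorax": r"pneumothorax",
    "infiltrate": r"infiltrat\w*",
}

# Hypothetical negation cues ("kein", "keine", "ohne", ...); an assumption for illustration.
NEGATION = re.compile(r"\b(kein\w*|ohne)\b", re.IGNORECASE)


def extract(report):
    """Return {attribute: 'present' | 'absent'} for one report text."""
    findings = {}
    # Telegraphic reports are treated as short segments separated by punctuation.
    for segment in re.split(r"[.;,]", report):
        negated = bool(NEGATION.search(segment))
        for attribute, pattern in LOCAL_TERMINOLOGY.items():
            if re.search(pattern, segment, re.IGNORECASE):
                findings[attribute] = "absent" if negated else "present"
    return findings


def f1_score(predicted, gold):
    """F1 over sets of (attribute, value) pairs."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


report = "Kein Pneumothorax. Pleuraerguss rechts, Infiltrat im linken Unterlappen."
predicted = set(extract(report).items())

# Hypothetical gold annotation containing one finding the terminology does not cover.
gold = {
    ("pneumothorax", "absent"),
    ("pleural_effusion", "present"),
    ("infiltrate", "present"),
    ("cardiomegaly", "present"),
}
score = f1_score(predicted, gold)
```

In this toy example all three extracted pairs are correct but one gold finding is missed, so precision is 1.0 and recall is 0.75. Scoping negation to a punctuation-delimited segment is a deliberate simplification that suits telegraphic noun-phrase text but would not handle longer sentences reliably.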

Keywords: Data Warehouse; Information Extraction; Radiology Reports; Terminology Generation.

MeSH terms

  • Algorithms
  • Humans
  • Information Storage and Retrieval*
  • Radiography, Thoracic*
  • Radiology
  • Radiology Information Systems*
  • Terminology as Topic