Automatic information extraction from unstructured mammography reports using distributed semantics

J Biomed Inform. 2018 Feb:78:78-86. doi: 10.1016/j.jbi.2017.12.016. Epub 2018 Jan 9.


To date, methods for automated extraction of information from radiology reports have been mainly rule-based or dictionary-based and therefore require substantial manual effort to build. Recent efforts have produced automated systems for entity detection, but little work has been done on automatically extracting relations and their associated named entities from narrative radiology reports with accuracy comparable to rule-based methods. Our goal is to extract relations in an unsupervised way from radiology reports without specifying prior domain knowledge. We propose a hybrid approach for information extraction (IE) that combines dependency-based parse trees with distributed semantics to generate structured information frames about particular findings/abnormalities from free-text mammography reports. The proposed IE system obtains an F1-score of 0.94 in terms of completeness of the content in the information frames, which outperforms a state-of-the-art rule-based system in this domain by a significant margin. The proposed system can be leveraged in a variety of applications, such as decision support and information retrieval, and may also scale easily to other radiology domains, since there is no need to tune the system with hand-crafted information extraction rules.
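
The abstract does not spell out how distributed semantics is used to populate an information frame, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each term has a word-embedding vector and that a term is placed in the frame slot whose seed word it is closest to by cosine similarity. The vectors, the slot names, the seed words, and the helpers `assign_slot` and `fill_frame` are all hypothetical, and the dependency-parse step that would select which terms belong to one finding is omitted.

```python
import math

# Hypothetical 3-dimensional "embeddings", invented for illustration only.
# A real system would load vectors trained on a report corpus.
TOY_VECTORS = {
    "mass": (0.9, 0.1, 0.0),
    "calcification": (0.8, 0.2, 0.1),
    "oval": (0.1, 0.9, 0.0),
    "irregular": (0.2, 0.8, 0.1),
    "breast": (0.0, 0.1, 0.9),
    "quadrant": (0.1, 0.0, 0.8),
}

# Hypothetical frame slots, each represented by one seed word (an assumption).
SLOT_SEEDS = {"finding": "mass", "shape": "oval", "location": "breast"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_slot(term):
    """Return the slot whose seed vector is most similar to the term's vector."""
    vec = TOY_VECTORS[term]
    return max(SLOT_SEEDS, key=lambda slot: cosine(vec, TOY_VECTORS[SLOT_SEEDS[slot]]))


def fill_frame(terms):
    """Group terms into a frame (slot -> list of terms) by nearest seed."""
    frame = {}
    for term in terms:
        frame.setdefault(assign_slot(term), []).append(term)
    return frame


if __name__ == "__main__":
    print(fill_frame(["calcification", "irregular", "quadrant"]))
```

In this toy setup no hand-written rule states that "irregular" is a shape; the assignment follows from vector similarity alone, which is the property the abstract credits for avoiding hand-crafted extraction rules.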

Keywords: Information extraction; Information frames; Report annotation; Word embedding.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Data Curation
  • Humans
  • Information Storage and Retrieval / methods*
  • Mammography / methods*
  • Natural Language Processing
  • Radiology Information Systems*
  • Semantics*