Concept Graphs: A Novel Approach for Textual Analysis of Medical Documents

Stud Health Technol Inform. 2023 Sep 12:307:172-179. doi: 10.3233/SHTI230710.

Abstract

Automatically analyzing the textual content of documents is challenging in general, and even more so in the medical domain. Here, we normally cannot rely on NLP models pre-trained specifically for the domain or, for data privacy reasons, on the large amounts of training material needed to build such models. We therefore propose a method that combines general-purpose basic text analysis components with state-of-the-art transformer models to represent a corpus of documents as multiple graphs, in which important, conceptually related phrases from the documents constitute the nodes and their semantic relations form the edges. This method could serve as a basis for several explorative procedures and can draw on a plethora of publicly available resources. We test it by comparing the effectiveness of these so-called Concept Graphs with another recently suggested approach on a common information retrieval use case, document clustering.
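The graph representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how phrases are extracted, which transformer model is used, or how edges are weighted. The sketch assumes that each phrase already has an embedding vector, that an edge is drawn when the cosine similarity of two phrases reaches a hypothetical `threshold`, and that each connected component is read as one group of related concepts. The phrases, the vectors in `TOY_EMBEDDINGS`, and all function names are invented for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_concept_graph(embeddings, threshold=0.8):
    """Return an adjacency dict: phrase -> {neighbour phrase: similarity}.

    Nodes are phrases; an edge is added when the similarity of the two
    phrase embeddings is at least `threshold` (an assumed cut-off).
    """
    graph = {phrase: {} for phrase in embeddings}
    for p, q in combinations(embeddings, 2):
        sim = cosine(embeddings[p], embeddings[q])
        if sim >= threshold:
            graph[p][q] = sim
            graph[q][p] = sim
    return graph


def connected_components(graph):
    """Split the graph into sets of mutually reachable phrases."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n not in component)
        seen |= component
        components.append(component)
    return components


# Hypothetical 3-dimensional embeddings; a real system would obtain
# high-dimensional vectors from a transformer model instead.
TOY_EMBEDDINGS = {
    "chest pain": [0.9, 0.1, 0.0],
    "thoracic pain": [0.85, 0.15, 0.05],
    "blood pressure": [0.1, 0.9, 0.1],
    "hypertension": [0.15, 0.85, 0.2],
    "discharge letter": [0.0, 0.1, 0.95],
}

if __name__ == "__main__":
    concept_graph = build_concept_graph(TOY_EMBEDDINGS)
    for group in connected_components(concept_graph):
        print(sorted(group))
```

With these toy vectors, the two pain-related phrases end up in one component, the two blood-pressure-related phrases in another, and "discharge letter" stays isolated. For document clustering, one plausible next step (again an assumption, not taken from the abstract) would be to describe each document by the components its phrases fall into.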

Keywords: Document Clustering; Graphs; Medical Documents; Natural Language Processing; Transformer Models; Word Embeddings.

MeSH terms

  • Cluster Analysis
  • Electric Power Supplies*
  • Information Storage and Retrieval*
  • Privacy
  • Semantics