Graph-based abstractive biomedical text summarization

J Biomed Inform. 2022 Aug:132:104099. doi: 10.1016/j.jbi.2022.104099. Epub 2022 Jun 11.

Abstract

Summarization is the process of compressing a text while retaining its most informative parts. In recent years, various methods have been proposed to extract the important parts of textual documents and present them in summarized form. The first challenge for these methods is to detect the concepts that best convey the main topic of the text and to extract the sentences that best describe these essential concepts. The second challenge is to interpret the essential concepts correctly in order to generate new, paraphrased sentences that are not identical to sentences in the source text. The first challenge has been addressed by many researchers; the second remains an open problem. In this study, we focus on the abstractive summarization of biomedical documents. For the first challenge, we present a new method based on graph generation and frequent itemset mining that generates extractive summaries by considering the concepts within biomedical documents. To address the second challenge, a transfer learning-based method is used to generate abstractive summaries from the extractive summaries. The efficiency of the proposed solution has been evaluated through several experiments on the BioMed Central and NLM's PubMed datasets. The results show that the proposed approach achieves a better interpretation of the main concepts and sentences of biomedical documents for abstractive summarization, obtaining an overall ROUGE of 59.60%, which is, on average, 17% better than state-of-the-art summarization techniques. The source code, datasets, and results are available on GitHub.
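
To make the extractive step more concrete, the following is a minimal Python sketch of how frequent itemset mining over sentence-level concept sets could be used to rank sentences. It is an illustration only, not the authors' implementation: the abstract does not specify the mining algorithm, the scoring rule, or how the graph is built, so the brute-force counting, the function names `frequent_itemsets` and `rank_sentences`, the `min_support` and `max_size` parameters, and the toy concept sets are all assumptions. Each sentence is treated as a "transaction" of concepts, and a sentence scores higher when it contains more concept combinations that recur across the document.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count concept combinations of up to max_size items and keep
    those that occur in at least min_support sentences.
    Brute-force counting, chosen for clarity (an assumption, not the paper's miner)."""
    counts = Counter()
    for concepts in transactions:
        for size in range(1, max_size + 1):
            # Sort so each itemset has one canonical tuple form.
            for itemset in combinations(sorted(concepts), size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def rank_sentences(transactions, itemsets):
    """Return sentence indices ordered by the summed support of the
    frequent itemsets each sentence contains (hypothetical scoring rule)."""
    scored = []
    for idx, concepts in enumerate(transactions):
        score = sum(n for itemset, n in itemsets.items() if set(itemset) <= concepts)
        scored.append((score, idx))
    # Highest score first; ties broken by original sentence order.
    return [idx for _, idx in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical concept sets, one per sentence of a toy document.
sentences = [
    {"diabetes", "insulin", "glucose"},
    {"insulin", "glucose"},
    {"diabetes", "diet"},
    {"exercise"},
]

itemsets = frequent_itemsets(sentences, min_support=2)
ranking = rank_sentences(sentences, itemsets)
print(itemsets)
print(ranking)
```

In a real pipeline the concept sets would presumably come from a biomedical concept extractor, and the top-ranked sentences would form the extractive summary that is then passed to the abstractive model; both of those stages are outside this sketch.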

Keywords: Abstractive summarization; Biomedical domain; Graph; Itemset mining; Text summarization; Transfer learning.

MeSH terms

  • Algorithms*
  • Concept Formation
  • Language
  • Semantics*
  • Software