Towards Structuring Clinical Texts: Joint Entity and Relation Extraction from Japanese Case Report Corpus

Stud Health Technol Inform. 2024 Jan 25:310:559-563. doi: 10.3233/SHTI231027.


Important information about patient symptoms and diagnoses is often written as free text in clinical documents. Using these texts requires information extraction based on natural language processing. This study evaluated the performance of named entity recognition (NER) and relation extraction (RE) using machine-learning methods. The Japanese case report corpus, manually annotated with 113 entity types and 36 relation types, was used; after preprocessing, it contained 183 cases comprising 2,194 sentences. A machine-learning model based on bidirectional encoder representations from transformers (BERT) was used. The maximum micro-averaged F1 scores for NER and RE were 0.912 and 0.759, respectively. These results are comparable to those of previous studies and could therefore serve as a substantial baseline accuracy.
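The scores above are micro-averaged F1 values. As a minimal sketch of what micro-averaging means, the Python snippet below pools true positives, false positives, and false negatives across all types before computing precision and recall. It assumes exact matching of (start, end, type) tuples, which the abstract does not specify; the function name `micro_f1` and the type labels are hypothetical and are not taken from the corpus.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over two sets of hashable items.

    Assumption: each item is a (start, end, type) tuple and counts as
    correct only on an exact match. Counts are pooled over all types,
    which is what makes the average "micro".
    """
    tp = len(gold & pred)   # predicted items that match a gold item
    fp = len(pred - gold)   # predicted items with no gold match
    fn = len(gold - pred)   # gold items that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; the labels are hypothetical.
gold = {(0, 2, "Disease"), (5, 7, "Drug"), (9, 10, "Symptom")}
pred = {(0, 2, "Disease"), (5, 7, "Symptom"), (9, 10, "Symptom")}
score = micro_f1(gold, pred)  # tp=2, fp=1, fn=1
```

Because counts are pooled, frequent types dominate the score, which matters for a label set as large as the 113 entity types described here. The same function could be applied to relations by representing each one as a (head, tail, type) tuple.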

Keywords: Natural language processing; machine learning; medical informatics computing.

Publication types

  • Case Reports

MeSH terms

  • Electric Power Supplies*
  • Humans
  • Information Storage and Retrieval
  • Japan
  • Machine Learning
  • Writing*