Automatic SNOMED CT coding of Chinese clinical terms via attention-based semantic matching

Int J Med Inform. 2022 Mar:159:104676. doi: 10.1016/j.ijmedinf.2021.104676. Epub 2021 Dec 28.

Abstract

Background: A considerable amount of meaningful information is routinely recorded in text format in Chinese clinical data; these entries are referred to as Chinese clinical terms. The lack of coding is a major obstacle to the application of clinical terms. SNOMED CT is a comprehensive clinical health care terminology collection that is widely used because of its coverage, granularity, clinical orientation, and logical underpinning. Automatically assigning SNOMED CT codes to Chinese clinical terms would be useful and efficient, but it still faces several problems. Current cross-language clinical term matching studies rely on external resources and techniques, such as machine translation and rule-based methods. Semantic matching methods have achieved strong performance on text matching, but few studies have applied them to cross-language clinical term matching. We present an effective attention-based semantic matching algorithm that automatically assigns SNOMED CT codes to Chinese clinical terms across languages.

Method: First, BERT was used to turn the input into word embeddings. Next, the word embeddings were encoded by a BiLSTM with self-attention, which captures distant relationships among words and weights each word according to its contribution to semantic matching. Decomposable attention was then used to make semantic matching trivially parallelizable and thus speed up calculation. Finally, fully connected layers and a sigmoid function were used to output the matching results.
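
The attend, compare, aggregate, and sigmoid steps of such a pipeline can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the BERT and BiLSTM encoders are omitted, the toy token vectors are made up, and the learned feed-forward layers are replaced by simple dot products. The function names `soft_align` and `match_score` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_align(a, b):
    """Attend: for each vector in a, the attention-weighted average of b.

    Each row is computed independently of the others, which is what
    makes this step easy to parallelize.
    """
    dim = len(b[0])
    aligned = []
    for ai in a:
        weights = softmax([dot(ai, bj) for bj in b])
        aligned.append(
            [sum(w * bj[k] for w, bj in zip(weights, b)) for k in range(dim)]
        )
    return aligned

def match_score(a, b):
    """Toy matching probability in (0, 1) for two token-vector sequences."""
    beta = soft_align(a, b)   # content of b aligned to each token of a
    alpha = soft_align(b, a)  # content of a aligned to each token of b
    # Compare: a dot product stands in for a learned feed-forward layer
    # (assumption made for this sketch only).
    v1 = sum(dot(ai, bi) for ai, bi in zip(a, beta)) / len(a)
    v2 = sum(dot(bj, aj) for bj, aj in zip(b, alpha)) / len(b)
    # Aggregate and squash with a sigmoid to get a match probability.
    logit = v1 + v2
    return 1.0 / (1.0 + math.exp(-logit))

# Made-up 2-dimensional "embeddings" for a Chinese term and two
# candidate English SNOMED CT descriptions.
zh_term = [[1.0, 0.0], [0.0, 1.0]]
en_match = [[1.0, 0.0], [0.0, 1.0]]
en_other = [[-1.0, 0.0], [0.0, -1.0]]

print(match_score(zh_term, en_match))
print(match_score(zh_term, en_other))
```

With these toy vectors the matching candidate receives the higher score. In a trained model the comparison and aggregation steps would use learned weights, and a threshold on the sigmoid output would decide whether a term and a code match.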

Results: A total of 29,960 manually coded Chinese clinical terms, 30,040 unmatched Chinese clinical terms, and SNOMED CT codes were collected to evaluate the proposed method. Compared with the existing semantic matching method, the proposed approach achieved state-of-the-art results, with an accuracy of 0.905, a precision of 0.856, a recall of 0.518, and an F-measure of 0.645, demonstrating its effectiveness. The proposed Chinese-English bilingual term mapping, Chinese character-level and word-level encoder, English word-level encoder, BERT model, and attention mechanism performed better than other methods.
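
As a quick consistency check on the reported figures, the F-measure can be recomputed from the precision and recall. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall). The abstract does not state the weighting, but the reported numbers are consistent with it.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as reported in the abstract.
print(round(f_measure(0.856, 0.518), 3))  # prints 0.645
```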

Conclusion: The proposed attention-based semantic matching approach can improve both the performance and the efficiency of automated SNOMED CT code assignment for Chinese clinical terms.

Keywords: Automatic coding; Decomposable attention; SNOMED CT; Semantic matching.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • China
  • Humans
  • Language
  • Semantics*
  • Systematized Nomenclature of Medicine*