Disease prediction based on multi-type data fusion from Chinese electronic health record

Math Biosci Eng. 2022 Sep 19;19(12):13732-13746. doi: 10.3934/mbe.2022640.

Abstract

Disease prediction that uses a variety of healthcare data to assist doctors in diagnosis is an increasingly important research topic. This paper proposes a disease prediction model that fuses multiple types of encoded representations of Chinese electronic health records (EHRs). The model framework uses a multi-head self-attention mechanism that combines textual and numerical features to enhance the text representations. BiLSTM-CRF and TextCNN models are used, respectively, to extract entities and to obtain their embedding representations. The representations of the text and of the entities it contains are then combined to form the representation of each EHR. Experimental results on EHR data collected from a Grade III, Class B general hospital in Gansu Province, China, show that our model achieves an F1 score of 91.92%, outperforming the baseline methods.
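
As a rough illustration of the fusion idea described above, the following NumPy sketch runs multi-head self-attention over a sequence that concatenates text token embeddings with numerical features turned into tokens, then keeps the text positions as the enhanced text representation. The abstract does not specify the architecture, so the dimensions, the random weights, the way numerical values are projected into tokens, the residual connection, and all function and parameter names here are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # x: (seq_len, d_model). Returns the output and the attention weights.
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)      # (num_heads, seq_len, seq_len)
    heads = weights @ v                     # (num_heads, seq_len, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o, weights


def fuse_text_and_numeric(text_emb, numeric, params, num_heads=4):
    # Assumption: each numerical value scales its own vector, giving one
    # "token" per numerical feature in the same space as the text tokens.
    num_tokens = numeric[:, None] * params["num_proj"] + params["num_bias"]
    x = np.concatenate([text_emb, num_tokens], axis=0)
    out, weights = multi_head_self_attention(
        x, params["w_q"], params["w_k"], params["w_v"], params["w_o"], num_heads
    )
    # Keep only the text positions; the residual connection is an assumption.
    enhanced_text = out[: text_emb.shape[0]] + text_emb
    return enhanced_text, weights


# Toy inputs: 6 text tokens and 3 numerical features (e.g. lab values).
rng = np.random.default_rng(0)
d_model, n_text, n_num = 16, 6, 3
params = {
    name: rng.normal(scale=0.1, size=(d_model, d_model))
    for name in ("w_q", "w_k", "w_v", "w_o")
}
params["num_proj"] = rng.normal(scale=0.1, size=(n_num, d_model))
params["num_bias"] = rng.normal(scale=0.1, size=(n_num, d_model))

text_emb = rng.normal(size=(n_text, d_model))   # stand-in for encoder output
numeric = np.array([0.5, -1.2, 2.0])            # stand-in for normalized values

enhanced, attn = fuse_text_and_numeric(text_emb, numeric, params)
print(enhanced.shape, attn.shape)
```

In this sketch every text token can attend to every numerical token, which is one simple way a numerical measurement could influence the representation of the surrounding clinical text; a trained model would learn the projection matrices rather than sample them.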

Keywords: BERT; Chinese electronic health record; TextCNN; disease prediction; multi-type data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China
  • Delivery of Health Care
  • Electronic Health Records*
  • Hospitals
  • Humans