Deep learning model for multi-classification of infectious diseases from unstructured electronic medical records

Mengying Wang; Zhenhao Wei; Mo Jia; Lianzhong Chen; Hong Ji

doi:10.1186/s12911-022-01776-y

BMC Med Inform Decis Mak. 2022 Feb 16;22(1):41. doi: 10.1186/s12911-022-01776-y.

Authors

Mengying Wang¹, Zhenhao Wei², Mo Jia¹, Lianzhong Chen², Hong Ji³

Affiliations

¹ Information Management and Big Data Center, Peking University Third Hospital, Beijing, China.
² Goodwill Hessian Health Technology Co. Ltd, Beijing, China.
³ Information Management and Big Data Center, Peking University Third Hospital, Beijing, China. puh3_imc@bjmu.edu.cn.

Abstract

Purpose: Predictively diagnosing infectious diseases helps in providing better treatment and enhances the prevention and control of such diseases. This study uses actual data from a hospital. A multiple infectious disease diagnostic model (MIDDM) is designed for conducting multi-classification of infectious diseases so as to assist in clinical infectious-disease decision-making.

Methods: A deep learning model for multi-classification of infectious diseases is constructed from actual hospital medical records of infectious diseases dated December 2012 to December 2020. The data comprise 20,620 outpatient and inpatient cases covering seven types of infectious diseases, of which 80% (16,496 cases) were used as training data and 20% (4,124 cases) as test data. An auto-encoder is used to normalize the data and to densify sparse data in order to improve model training. A residual network and an attention mechanism are introduced into the MIDDM model to improve its performance.
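The 80/20 division of cases described above can be illustrated with a minimal sketch. The abstract does not say how cases were assigned to the two sets, so the random shuffle, the seed, and the name `split_cases` are assumptions made only for illustration; the case counts are the ones reported.

```python
import random

def split_cases(case_ids, train_fraction=0.8, seed=0):
    """Shuffle case IDs and split them into train and test lists.

    Hypothetical helper: the abstract reports only the 80/20 proportions,
    not the assignment procedure, so a seeded random shuffle is assumed.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 20,620 cases in total, as reported in the abstract.
train_ids, test_ids = split_cases(range(20620))
print(len(train_ids), len(test_ids))  # 16496 4124
```

With 20,620 cases this reproduces the reported sizes of 16,496 training cases and 4,124 test cases.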

Results: MIDDM achieved improved prediction results in diagnosing seven kinds of infectious diseases. For diseases with similar diagnostic characteristics and similar interference factors, prediction accuracy was markedly higher for disease classes with more sample data than for classes with fewer sample data. For instance, the training data for viral hepatitis, influenza, and hand, foot, and mouth disease comprised 2954, 3924, and 3015 cases respectively, and the corresponding test accuracy rates were 99.86%, 98.47%, and 97.31%. Less training data were available for syphilis, infectious diarrhea, and measles, i.e., 1208, 575, and 190 cases respectively, and the corresponding test accuracy rates were noticeably lower, i.e., 83.03%, 87.30%, and 42.11%. We also compared the MIDDM model with the models used in other studies. Using the same input data and taking viral hepatitis as an example, the accuracy of MIDDM is 99.44%, which is higher than that of XGBoost (96.19%), decision tree (90.13%), Bayesian method (85.19%), and logistic regression (91.26%). Other diseases were also predicted significantly better by MIDDM than by these baseline models.
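The relationship between training-set size and test accuracy can be made concrete with a small sketch that uses only the six diseases and figures quoted in the abstract (the seventh disease is not listed there). The 2000-case cut-off and the variable names are assumptions chosen for illustration, and the means are unweighted averages over classes, not overall accuracies.

```python
# (disease, training cases, test accuracy in %) as quoted in the abstract
reported = [
    ("viral hepatitis", 2954, 99.86),
    ("influenza", 3924, 98.47),
    ("hand, foot, and mouth disease", 3015, 97.31),
    ("syphilis", 1208, 83.03),
    ("infectious diarrhea", 575, 87.30),
    ("measles", 190, 42.11),
]

def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

# Hypothetical cut-off separating larger from smaller classes.
threshold = 2000
large = [acc for _, n, acc in reported if n >= threshold]
small = [acc for _, n, acc in reported if n < threshold]

mean_large = mean(large)
mean_small = mean(small)
print(f"{mean_large:.2f} {mean_small:.2f}")  # 98.55 70.81
```

Under this grouping, the three larger classes average about 98.55% test accuracy and the three smaller classes about 70.81%, with measles (190 training cases) accounting for most of the gap.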

Conclusion: The application of the MIDDM model to multi-class diagnosis and prediction of infectious diseases can improve the accuracy of infectious-disease diagnosis. However, these results need to be further confirmed via clinical randomized controlled trials.

Keywords: Deep learning; Early diagnosis; Infectious diseases; Multi-classification.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bayes Theorem
Communicable Diseases* / diagnosis
Communicable Diseases* / epidemiology
Deep Learning*
Electronic Health Records
Humans
Neural Networks, Computer