Ensemble learning model for diagnosing COVID-19 from routine blood tests
- PMID: 33102686
- PMCID: PMC7572278
- DOI: 10.1016/j.imu.2020.100449
Abstract
Background and objectives: The novel coronavirus disease 2019 (COVID-19) pandemic has severely impacted human society, with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and more timely patient care and to combat the spread of the disease. In this context, recent studies have reported some key advantages of using routine blood tests for initial screening of COVID-19 patients. In this article, we first present a review of the emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. Then, we propose ERLX, an ensemble learning model for COVID-19 diagnosis from routine blood tests.
Method: At the first level, the proposed model uses three well-known, diverse classifiers (extra trees, random forest and logistic regression), which have different architectures and learning characteristics. It then combines their predictions using a second-level extreme gradient boosting (XGBoost) classifier to achieve better performance. For data preparation, the proposed methodology employs the KNNImputer algorithm to handle null values in the dataset, isolation forest (iForest) to remove outlier data, and the synthetic minority oversampling technique (SMOTE) to balance the data distribution. For model interpretability, feature importances are reported using the SHapley Additive exPlanations (SHAP) technique.
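The two-level design and the data preparation steps described in the Method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, all hyperparameters are arbitrary, and the preparation steps are applied to the training split only (the abstract does not say where they were applied). To keep the sketch dependent on scikit-learn alone, `GradientBoostingClassifier` stands in for XGBoost, and the hypothetical helper `smote_like` stands in for a full SMOTE implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              IsolationForest, RandomForestClassifier,
                              StackingClassifier)
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like(X, y, minority=1, k=5, seed=0):
    """Hypothetical helper: balance a binary dataset by interpolating between
    a minority sample and one of its k nearest minority neighbours, which is
    the core idea of SMOTE."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Column 0 is the sample itself, so keep only the k true neighbours.
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), n_new)
    pick = neighbours[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[pick] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic, imbalanced stand-in data (about 10% positives) with missing values.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 1. Fill null values from the nearest neighbours.
imputer = KNNImputer(n_neighbors=5)
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

# 2. Drop outliers: fit_predict returns +1 for inliers and -1 for outliers.
inlier = IsolationForest(random_state=0).fit_predict(X_train) == 1
X_train, y_train = X_train[inlier], y_train[inlier]

# 3. Balance the classes by oversampling the minority class.
X_bal, y_bal = smote_like(X_train, y_train)

# 4. First-level classifiers, combined by a second-level boosting model.
model = StackingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
    cv=5,
)
model.fit(X_bal, y_bal)
accuracy = model.score(X_test, y_test)
```

With the xgboost and imbalanced-learn packages installed, `xgboost.XGBClassifier` and `imblearn.over_sampling.SMOTE` would be the closer matches to the components named in the Method.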
Results: The proposed model was trained and evaluated using a publicly available dataset from Albert Einstein Hospital in Brazil, which consisted of 5644 data samples with 559 confirmed COVID-19 cases. The ensemble model achieved outstanding performance, with an overall accuracy of 99.88% [95% CI: 99.6-100], an AUC of 99.38% [95% CI: 97.5-100], a sensitivity of 98.72% [95% CI: 94.6-100] and a specificity of 99.99% [95% CI: 99.99-100].
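For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity and specificity follow from a confusion matrix. The function name and the counts are hypothetical, chosen only for illustration; they are not taken from the study, and the sketch does not reproduce the AUC or the confidence intervals.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),    # share of true cases that are detected
        "specificity": tn / (tn + fp),    # share of non-cases that are cleared
    }


# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=90, fp=5, tn=895, fn=10)
```

On an imbalanced dataset, accuracy alone can look high even when many cases are missed, which is why sensitivity and specificity are reported alongside it.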
Discussion: The proposed model outperformed existing state-of-the-art studies (Banerjee et al., 2020; de Freitas Barbosa et al., 2020; de Moraes Batista et al., 2020; Soares et al., 2020) [3,22,56,71] when evaluated on the same sets of features those studies employed. Against the best-performing Bayes Net model (de Freitas Barbosa et al., 2020) [22], which reported an average accuracy of 95.159%, ERLX achieved an average accuracy of 99.94%. Against the SVM model (de Moraes Batista et al., 2020) [56], which reported an AUC of 85%, ERLX obtained an AUC of 99.77%, along with improvements in sensitivity and specificity. Against the ER-COV model (Soares et al., 2020) [71], which reported an average sensitivity of 70.25% and a specificity of 85.98%, ERLX achieved a sensitivity of 99.47% and a specificity of 99.99%. ERLX also scored considerably higher than the ANN model (Banerjee et al., 2020) [3] on all performance metrics. Therefore, the presented model is robust and can be deployed for reliable, early and rapid screening of COVID-19 patients.
Keywords: COVID-19; Diagnostic model; Ensemble; Machine learning; Routine blood tests.
© 2020 The Author(s).
Conflict of interest statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Similar articles
- Artificial intelligence in clinical care amidst COVID-19 pandemic: A systematic review. Comput Struct Biotechnol J. 2021;19:2833-2850. doi: 10.1016/j.csbj.2021.05.010. Epub 2021 May 7. PMID: 34025952 Free PMC article. Review.
- Joint modeling strategy for using electronic medical records data to build machine learning models: an example of intracerebral hemorrhage. BMC Med Inform Decis Mak. 2022 Oct 25;22(1):278. doi: 10.1186/s12911-022-02018-x. PMID: 36284327 Free PMC article.
- Explainable artificial intelligence model for identifying COVID-19 gene biomarkers. Comput Biol Med. 2023 Mar;154:106619. doi: 10.1016/j.compbiomed.2023.106619. Epub 2023 Feb 1. PMID: 36738712 Free PMC article.
- Deep forest model for diagnosing COVID-19 from routine blood tests. Sci Rep. 2021 Aug 17;11(1):16682. doi: 10.1038/s41598-021-95957-w. PMID: 34404838 Free PMC article.
- Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence. Artif Intell Med. 2023 Mar;137:102490. doi: 10.1016/j.artmed.2023.102490. Epub 2023 Jan 18. PMID: 36868685 Free PMC article. Review.
Cited by
- MENet: A Mitscherlich function based ensemble of CNN models to classify lung cancer using CT scans. PLoS One. 2024 Mar 11;19(3):e0298527. doi: 10.1371/journal.pone.0298527. eCollection 2024. PMID: 38466701 Free PMC article.
- A brief review and scientometric analysis on ensemble learning methods for handling COVID-19. Heliyon. 2024 Feb 20;10(4):e26694. doi: 10.1016/j.heliyon.2024.e26694. eCollection 2024 Feb 29. PMID: 38420425 Free PMC article.
- Synergistic integration of Multi-View Brain Networks and advanced machine learning techniques for auditory disorders diagnostics. Brain Inform. 2024 Jan 14;11(1):3. doi: 10.1186/s40708-023-00214-7. PMID: 38219249 Free PMC article.
- Prediction of nonsentinel lymph node metastasis in breast cancer patients based on machine learning. World J Surg Oncol. 2023 Aug 11;21(1):244. doi: 10.1186/s12957-023-03109-3. PMID: 37563717 Free PMC article.
- Predicting adverse outcomes in pregnant patients positive for SARS-CoV-2: a machine learning approach - a retrospective cohort study. BMC Pregnancy Childbirth. 2023 Aug 2;23(1):553. doi: 10.1186/s12884-023-05679-2. PMID: 37532986 Free PMC article.
References
- Altman N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am Statistician. 1992;46:175–185.
- Bao F.S., He Y., Liu J., Chen Y., Li Q., Zhang C.R., Han L., Zhu B., Ge Y., Chen S. Triaging moderate covid-19 and other viral pneumonias from routine blood tests. arXiv preprint arXiv:2005.06546. 2020.