Objective: This study aimed to implement and evaluate machine learning-based models to predict COVID-19 diagnosis and disease severity.
Methods: COVID-19 test samples (positive or negative results) from patients who attended a single hospital were evaluated. Patients diagnosed with COVID-19 were categorised according to the severity of the disease. Data underwent exploratory analysis by principal component analysis (PCA) to detect outlier samples, recognise patterns, and identify important variables. Based on patients' laboratory test results, machine learning models were implemented to predict disease positivity and severity. Four model types were used: artificial neural networks (ANN), decision trees (DT), partial least squares discriminant analysis (PLS-DA), and k-nearest neighbours (KNN). The four models were validated on the basis of accuracy (area under the ROC curve).
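As an illustration of the validation step only, the following is a minimal sketch of scoring a k-nearest neighbours classifier by the area under the ROC curve. It is not the authors' code or data: the two-feature data set is synthetic, the helper names (`knn_scores`, `roc_auc`, `make_synthetic`), the value k = 5, and the class sizes are all assumptions chosen for the example, and only the Python standard library is used.

```python
import math
import random


def knn_scores(train_X, train_y, test_X, k=5):
    """Score each test sample as the fraction of positives among its k nearest training samples."""
    scores = []
    for x in test_X:
        # Indices of the k training samples closest to x (Euclidean distance).
        nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
        scores.append(sum(train_y[i] for i in nearest) / k)
    return scores


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n_neg, n_pos, rng):
    """Hypothetical two-feature data; positives are shifted upward on both features."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_pos)]
    y = [0] * n_neg + [1] * n_pos
    return X, y


# Assumed class imbalance of roughly 1 positive per 9 negatives, for illustration only.
rng = random.Random(0)
train_X, train_y = make_synthetic(180, 20, rng)
test_X, test_y = make_synthetic(90, 10, rng)

auc = roc_auc(test_y, knn_scores(train_X, train_y, test_X, k=5))
print(f"KNN AUC on synthetic hold-out: {auc:.3f}")
```

The pairwise formulation of the AUC used here is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be preferable on thousands of samples.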
Results: The first subset of data had 5,643 patient samples (5,086 negatives and 557 positives for COVID-19). The second subset included the 557 COVID-19-positive patients. The ANN, DT, PLS-DA, and KNN models classified negative and positive samples with >84% accuracy. Patients with severe and non-severe disease could also be classified with >86% accuracy. The following were associated with the prediction of COVID-19 diagnosis and severity: hyperferritinaemia, hypocalcaemia, pulmonary hypoxia, hypoxaemia, metabolic and respiratory acidosis, low urinary pH, and high levels of lactate dehydrogenase.
Conclusion: Our analysis shows that all four models could assist in the diagnosis of COVID-19 and the prediction of its severity.
Keywords: Blood test; COVID-19; Diagnosis; Machine learning model; Severity; Urine test.
Copyright © 2021 Elsevier Ltd. All rights reserved.