A machine learning model to identify early stage symptoms of SARS-Cov-2 infected patients

Md Martuza Ahamad; Sakifa Aktar; Md Rashed-Al-Mahfuz; Shahadat Uddin; Pietro Liò; Haoming Xu; Matthew A Summers; Julian M W Quinn; Mohammad Ali Moni

doi:10.1016/j.eswa.2020.113661

Expert Syst Appl. 2020 Dec 1:160:113661. doi: 10.1016/j.eswa.2020.113661. Epub 2020 Jun 20.

Authors

Md Martuza Ahamad¹, Sakifa Aktar¹, Md Rashed-Al-Mahfuz², Shahadat Uddin³, Pietro Liò⁴, Haoming Xu⁵ ⁶, Matthew A Summers⁷ ⁸, Julian M W Quinn⁷ ⁹, Mohammad Ali Moni⁷ ¹⁰

Affiliations

¹ Department of Computer Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj 8100, Bangladesh.
² Department of Computer Science and Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh.
³ Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia.
⁴ Computer Laboratory, The University of Cambridge, 15 JJ Thomson Avenue, Cambridge, UK.
⁵ Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA.
⁶ Chengdu Institute of Public Administration, Sichuan, 610110, China.
⁷ The Garvan Institute of Medical Research, Healthy Ageing Theme, Darlinghurst, NSW, Australia.
⁸ St Vincent's Clinical School, University of New South Wales, Faculty of Medicine, Sydney, Australia.
⁹ Royal North Shore Hospital SERT Institute, St. Leonards, NSW Australia.
¹⁰ WHO Collaborating Centre on eHealth, UNSW Digital Health, School of Public Health and Community Medicine, Faculty of Medicine, UNSW Sydney, Australia.

Abstract

The recent outbreak of the respiratory ailment COVID-19, caused by the novel coronavirus SARS-CoV-2, is a severe and urgent global concern. In the absence of effective treatments, the main containment strategy is to reduce contagion by isolating infected individuals; however, isolating unaffected individuals is highly undesirable. To help make rapid decisions on treatment and isolation needs, it would be useful to determine which features presented by suspected infection cases are the best predictors of a positive diagnosis. This can be done by analyzing patient characteristics, case trajectory, comorbidities, symptoms, diagnosis, and outcomes. We developed a model that employed supervised machine learning algorithms to identify the presentation features that predict COVID-19 diagnoses with high accuracy. Features examined included details of the individuals concerned, e.g., age, gender, observation of fever, and history of travel, as well as clinical details such as the severity of cough and incidence of lung infection. We implemented and applied several machine learning algorithms to our collected data and found that the XGBoost algorithm performed with the highest accuracy (>85%) in predicting and selecting features that correctly indicate COVID-19 status for all age groups. Statistical analyses revealed that the most frequent and significant predictive symptoms are fever (41.1%), cough (30.3%), lung infection (13.1%) and runny nose (8.43%). While 54.4% of people examined did not develop any symptoms that could be used for diagnosis, our work indicates that for the remainder, our predictive model could significantly improve the prediction of COVID-19 status, including at early stages of infection.

Keywords: COVID-19; Coronavirus; Early stage symptom; Machine learning; SARS-Cov-2.
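
The idea in the abstract of ranking presentation features by how well they predict a positive diagnosis can be illustrated with a minimal sketch. It is not the XGBoost pipeline reported in the abstract: information gain, an entropy-based criterion for scoring splits, is used here as a simple stand-in. The records, the feature names, and the helper functions `entropy`, `information_gain`, and `rank_features` are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter

# Hypothetical toy records, NOT data from the study. Each row is one suspected
# case with binary presentation features and a diagnosis label ("positive").
RECORDS = [
    {"fever": 1, "cough": 1, "travel_history": 1, "runny_nose": 0, "positive": 1},
    {"fever": 1, "cough": 0, "travel_history": 1, "runny_nose": 0, "positive": 1},
    {"fever": 1, "cough": 1, "travel_history": 0, "runny_nose": 1, "positive": 1},
    {"fever": 1, "cough": 1, "travel_history": 0, "runny_nose": 0, "positive": 1},
    {"fever": 0, "cough": 1, "travel_history": 0, "runny_nose": 1, "positive": 0},
    {"fever": 0, "cough": 0, "travel_history": 1, "runny_nose": 0, "positive": 0},
    {"fever": 0, "cough": 0, "travel_history": 0, "runny_nose": 1, "positive": 0},
    {"fever": 0, "cough": 0, "travel_history": 0, "runny_nose": 0, "positive": 0},
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(records, feature, label="positive"):
    """Reduction in label entropy obtained by splitting on one feature."""
    base = entropy([r[label] for r in records])
    remainder = 0.0
    for value in {r[feature] for r in records}:
        subset = [r[label] for r in records if r[feature] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


def rank_features(records, label="positive"):
    """Return (feature, information gain) pairs, most informative first."""
    features = [name for name in records[0] if name != label]
    scores = {f: information_gain(records, f, label) for f in features}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, gain in rank_features(RECORDS):
        print(f"{name}: {gain:.3f} bits")
```

In this toy data fever separates the two classes perfectly, so it ranks first. A gradient-boosted model such as XGBoost scores features from many trees fitted jointly rather than from one feature at a time, so its rankings can differ from this single-feature criterion.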