COVID-19 Phenotypes and Comorbidity: A Data-Driven, Pattern Recognition Approach Using National Representative Data from the United States

George D Vavougios; Vasileios T Stavrou; Christoforos Konstantatos; Pavlos-Christoforos Sinigalias; Sotirios G Zarogiannis; Konstantinos Kolomvatsos; George Stamoulis; Konstantinos I Gourgoulianis

doi:10.3390/ijerph19084630

Int J Environ Res Public Health. 2022 Apr 12;19(8):4630. doi: 10.3390/ijerph19084630.

Affiliations

¹ Department of Neurology, University of Cyprus, 75 Kallipoleos Street, Lefkosia 1678, Cyprus.
² Laboratory of Cardio-Pulmonary Testing and Pulmonary Rehabilitation, Department of Respiratory Medicine, Faculty of Medicine, University of Thessaly, Biopolis, 41500 Larissa, Greece.
³ Department of Respiratory Medicine, Faculty of Medicine, School of Health Sciences, University of Thessaly, Biopolis, 41500 Larissa, Greece.
⁴ Department of Business Administration, University of Patras, University Campus-Rio, 26504 Patras, Greece.
⁵ Department of Mechanical Engineering and Aeronautics, University of Patras, 26504 Patras, Greece.
⁶ Department of Physiology, Faculty of Medicine, School of Health Sciences, University of Thessaly, Biopolis, 41500 Larissa, Greece.
⁷ Department of Electrical and Computer Engineering, University of Thessaly, 37 Glavani-28th October Str., Deligiorgi Building, 4th Floor, 38221 Volos, Greece.

Abstract

The aim of our study was to determine COVID-19 syndromic phenotypes in a data-driven manner using survey results from Carnegie Mellon University’s Delphi Group. Monthly survey results (>1 million responders per month) were used sequentially to identify and validate COVID-19 syndromic phenotypes; 320,326 responders with a certain COVID-19 test status and disease duration <30 days were included in this study. Logistic Regression-weighted multiple correspondence analysis (LRW-MCA) was used as a preprocessing procedure to weight the symptoms recorded by the survey and transform them to eigenspace coordinates, capturing a total variance of >75%. These scores, along with symptom duration, were subsequently used by the Two Step Clustering algorithm to produce symptom clusters. Post-hoc logistic regression models adjusting for age, gender, and comorbidities, together with confirmatory linear principal components analyses, were used to further explore the data. The model, built on the 66,165 responders included from August, was subsequently validated on data from March to December 2020. Five validated COVID-19 syndromes were identified in August: 1. Afebrile (0%), Non-Coughing (0%), Oligosymptomatic (ANCOS); 2. Febrile (100%) Multisymptomatic (FMS); 3. Afebrile (0%) Coughing (100%) Oligosymptomatic (ACOS); 4. Oligosymptomatic with additional self-described symptoms (100%; OSDS); 5. Olfaction/Gustatory Impairment Predominant (100%; OGIP). Our findings indicate that the COVID-19 spectrum may be undetectable when applying current disease definitions focusing on respiratory symptoms alone.
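The preprocessing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the logistic regression weights enter the correspondence analysis, so the example assumes a hypothetical non-negative per-symptom weight vector (`weights`) that scales the indicator columns. The function name `mca_coordinates`, the `var_target` parameter, and the simulated data are likewise assumptions for illustration, and the Two Step Clustering stage is not reproduced here.

```python
import numpy as np

def mca_coordinates(symptoms, weights=None, var_target=0.75):
    """Row coordinates from a correspondence analysis of binary symptom data.

    symptoms   : (n_responders, n_symptoms) array of 0/1 values
    weights    : optional non-negative per-symptom weights (hypothetical
                 stand-in for logistic-regression-derived weights)
    var_target : keep the fewest dimensions whose cumulative share of
                 inertia reaches this value
    """
    X = np.asarray(symptoms, dtype=float)
    # Indicator matrix: one "present" and one "absent" column per symptom.
    Z = np.hstack([X, 1.0 - X])
    if weights is not None:
        w = np.asarray(weights, dtype=float)
        Z = Z * np.concatenate([w, w])
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    keep = c > 0                         # drop categories that never occur
    P, c = P[:, keep], c[keep]
    # Standardized residuals from the independence model r * c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    inertia = s ** 2
    explained = np.cumsum(inertia) / inertia.sum()
    k = int(np.searchsorted(explained, var_target)) + 1
    # Principal row coordinates on the first k dimensions.
    coords = (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]
    return coords, explained[:k]

# Simulated example: 200 responders, 6 yes/no symptoms (not survey data).
rng = np.random.default_rng(0)
demo = rng.integers(0, 2, size=(200, 6))
demo_weights = np.array([1.0, 2.0, 0.5, 1.5, 1.0, 3.0])  # hypothetical
coords, explained = mca_coordinates(demo, demo_weights)
print(coords.shape, explained[-1])
```

The resulting coordinates would then be passed, together with symptom duration, to a clustering step; any clustering method substituted for Two Step Clustering should be named as such.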

Keywords: COVID-19; big data; comorbidity; epidemiology; pattern recognition; phenotypes.

MeSH terms

COVID-19* / epidemiology
Comorbidity
Cough
Humans
Phenotype
SARS-CoV-2
United States / epidemiology