An Elastic Net Regression Model for Identifying Long COVID Patients Using Health Administrative Data: A Population-Based Study

Open Forum Infect Dis. 2022 Nov 24;9(12):ofac640. doi: 10.1093/ofid/ofac640. eCollection 2022 Dec.

Abstract

Background: Long coronavirus disease (COVID) patients experience persistent symptoms after acute severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Healthcare utilization data could provide critical information on the disease burden of long COVID for service planning; however, not all patients are diagnosed or assigned long COVID diagnostic codes. We developed an algorithm to identify individuals with long COVID using population-level health administrative data from British Columbia (BC), Canada.

Methods: We developed an elastic net penalized logistic regression model to identify long COVID patients based on demographic characteristics, pre-existing conditions, COVID-19-related data, and all symptoms/conditions recorded more than 28 and up to 183 days after the COVID-19 symptom onset or reported (index) date. The model was developed using data from known long COVID patients (n = 2430) and a control group (n = 24 300), selected from all adult COVID-19 cases in BC with an index date on or before October 31, 2021 (n = 168 111). Known long COVID cases were diagnosed in a clinic and/or had the International Classification of Diseases, Tenth Revision, Canada (ICD-10-CA) code for "post COVID-19 condition" in their records.
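
As a rough illustration of the technique named in the Methods, the sketch below fits an elastic net penalized logistic regression by proximal gradient descent using only the Python standard library. It is not the authors' implementation: the feature names, the toy data, and the penalty settings (`lam`, `alpha`, `step`, `n_iter`) are all assumptions chosen for illustration, and a real analysis would normally rely on an established package rather than hand-written optimization.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.02, alpha=0.5, step=0.5, n_iter=3000):
    """Minimize mean log-loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    The intercept is left unpenalized. All hyperparameter defaults are
    illustrative assumptions, not values from the study.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        # Plain gradient step for the intercept.
        b -= step * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss plus ridge term),
            # followed by soft-thresholding for the lasso term.
            z = w[j] - step * (grad_w[j] + lam * (1 - alpha) * w[j])
            w[j] = soft_threshold(z, step * lam * alpha)
    return w, b


# Hypothetical toy data: two binary "symptom" indicators that are related to
# the outcome and one unrelated code. Counts are invented for illustration.
FEATURES = ["symptom_a", "symptom_b", "unrelated_code"]
POSITIVES_PER_10 = {(0, 0): 1, (1, 0): 6, (0, 1): 4, (1, 1): 9}

X, y = [], []
for (a, b_flag), n_pos in POSITIVES_PER_10.items():
    for c in (0, 1):  # outcome rate is identical for both values of the unrelated code
        for i in range(10):
            X.append([a, b_flag, c])
            y.append(1 if i < n_pos else 0)

weights, intercept = fit_elastic_net_logistic(X, y)
for name, weight in zip(FEATURES, weights):
    print(f"{name}: {weight:+.3f}")
```

In this toy setting the two informative indicators keep positive coefficients while the unrelated code is shrunk toward zero, which is the behaviour that makes elastic net useful for selecting among a large number of recorded symptoms and conditions.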

Results: The algorithm retained known symptoms/conditions associated with long COVID, demonstrating high sensitivity (86%), specificity (86%), and area under the receiver operating characteristic curve (93%). It identified 25 220 (18%) long COVID patients among the remaining 141 381 adult COVID-19 cases, more than 10 times the number of known cases. Known and predicted long COVID patients had comparable demographic and health-related characteristics.
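
The reported metrics can be made concrete with a small sketch. The example below computes sensitivity, specificity, and the area under the ROC curve (as the probability that a randomly chosen case scores higher than a randomly chosen control) on invented labels and scores, and then reproduces the cohort arithmetic from the counts quoted in this abstract. The toy labels, scores, and the 0.5 threshold are assumptions; only the cohort counts come from the text.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the share of case/control pairs ranked correctly (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy data (1 = known long COVID, 0 = control).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4]

sens, spec = sensitivity_specificity(toy_labels, toy_scores)
auc = roc_auc(toy_labels, toy_scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} auc={auc:.2f}")

# Cohort arithmetic using the counts quoted in the abstract.
all_cases = 168_111
known_long_covid = 2_430
controls = 24_300
predicted_long_covid = 25_220

remaining_cases = all_cases - known_long_covid - controls
predicted_share = predicted_long_covid / remaining_cases
ratio_to_known = predicted_long_covid / known_long_covid
print(remaining_cases, f"{predicted_share:.1%}", f"{ratio_to_known:.1f}x")
```

The pairwise AUC definition is quadratic in the number of records, so it is suited to small illustrations; a rank-based computation would be the usual choice at population scale.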

Conclusions: Our algorithm identified long COVID patients with a high level of accuracy. This large cohort of long COVID patients will serve as a platform for robust assessments of the clinical course of long COVID and provide much-needed concrete information for decision-making.

Keywords: long COVID; post-COVID-19 condition; post-acute COVID-19 syndrome; post-acute sequelae of COVID-19.