Pseudo-likelihood based logistic regression for estimating COVID-19 infection and case fatality rates by gender, race, and age in California

Epidemics. 2020 Dec:33:100418. doi: 10.1016/j.epidem.2020.100418. Epub 2020 Nov 9.

Abstract

In emerging epidemics, early estimates of key epidemiological characteristics of the disease are critical for guiding public policy. In particular, identifying high-risk population subgroups aids policymakers and health officials in combating the epidemic. This has been challenging during the coronavirus disease 2019 (COVID-19) pandemic because governmental agencies typically release aggregate COVID-19 data as summary statistics of patient demographics. These data may identify disparities in COVID-19 outcomes between broad population subgroups, but do not provide comparisons between more granular population subgroups defined by combinations of multiple demographics. We introduce a method that helps to overcome the limitations of aggregated summary statistics and yields estimates of COVID-19 infection and case fatality rates, key quantities for guiding public policy related to the control and prevention of COVID-19, for population subgroups across combinations of demographic characteristics. Our approach uses pseudo-likelihood based logistic regression to combine aggregate COVID-19 case and fatality data with population-level demographic survey data. We illustrate our method on California COVID-19 data to estimate test-based infection and case fatality rates for population subgroups defined by gender, age, and race/ethnicity. Our analysis indicates that in California, males have higher test-based infection rates and test-based case fatality rates than females across age and race/ethnicity groups, with the gender gap widening with increasing age. Although elderly people infected with COVID-19 are at an elevated risk of mortality, the test-based infection rates do not increase monotonically with age. In particular, the working-age population has a higher test-based infection rate than children, adolescents, and elderly people aged 60-80. LatinX and African Americans have higher test-based infection rates than other race/ethnicity groups. The five subgroups with the highest test-based case fatality rates are all male (African American, Asian, Multi-race, LatinX, and White males), followed by African American females, indicating that African Americans are an especially vulnerable California subpopulation.
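The abstract does not give the estimating equations, so the following is only a minimal sketch of how one-way aggregate counts might be combined with a joint population table to recover subgroup rates; it is not the authors' implementation. It assumes an additive logistic model for the infection probability in each gender-by-age cell, treats each published margin as an independent binomial count (a composite, or pseudo-, likelihood), and maximizes that objective by Fisher scoring with step-halving. All names (`POP`, `MARGINS`, `fit`, and so on) and all numbers are hypothetical, and the "observed" margins are generated noise-free from a made-up coefficient vector so that recovery can be checked.

```python
import math
from itertools import product

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical joint population by (gender, age band); made-up numbers standing
# in for survey-based population estimates.
GENDERS, AGES = ("F", "M"), ("0-17", "18-64", "65+")
POP = {("F", "0-17"): 900e3, ("F", "18-64"): 2500e3, ("F", "65+"): 700e3,
       ("M", "0-17"): 950e3, ("M", "18-64"): 2450e3, ("M", "65+"): 600e3}
CELLS = list(product(GENDERS, AGES))
# Each margin is the set of cells sharing one published category.
MARGINS = [[c for c in CELLS if c[0] == g] for g in GENDERS] + \
          [[c for c in CELLS if c[1] == a] for a in AGES]

def features(cell):
    g, a = cell
    # intercept, male, age 18-64, age 65+ (additive effects on the logit scale)
    return [1.0, float(g == "M"), float(a == "18-64"), float(a == "65+")]

def cell_rates(beta):
    return {c: expit(sum(b * x for b, x in zip(beta, features(c)))) for c in CELLS}

def margin_cases(beta):
    p = cell_rates(beta)
    return [sum(POP[c] * p[c] for c in m) for m in MARGINS]

def log_pl(beta, cases):
    # Sum of binomial log-likelihoods, one per margin, treated as independent.
    total = 0.0
    for m, y, mu in zip(MARGINS, cases, margin_cases(beta)):
        n = sum(POP[c] for c in m)
        total += y * math.log(mu / n) + (n - y) * math.log(1.0 - mu / n)
    return total

def score_and_info(beta, cases):
    p, k = cell_rates(beta), len(beta)
    score, info = [0.0] * k, [[0.0] * k for _ in range(k)]
    for m, y in zip(MARGINS, cases):
        n = sum(POP[c] for c in m)
        pi = sum(POP[c] * p[c] for c in m) / n
        # gradient of the marginal probability pi with respect to beta
        g = [sum(POP[c] * p[c] * (1 - p[c]) * features(c)[j] for c in m) / n
             for j in range(k)]
        for i in range(k):
            score[i] += (y - n * pi) / (pi * (1 - pi)) * g[i]
            for j in range(k):
                info[i][j] += n / (pi * (1 - pi)) * g[i] * g[j]
    return score, info

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def fit(cases, iters=100):
    # Start from the overall rate (total cases taken from the gender margins).
    rate0 = sum(cases[:len(GENDERS)]) / sum(POP.values())
    beta = [math.log(rate0 / (1 - rate0)), 0.0, 0.0, 0.0]
    for _ in range(iters):
        score, info = score_and_info(beta, cases)
        step, t, current = solve(info, score), 1.0, log_pl(beta, cases)
        while t > 1e-8:  # step-halving: never accept a decrease in the objective
            cand = [b + t * s for b, s in zip(beta, step)]
            if log_pl(cand, cases) >= current:
                beta = cand
                break
            t /= 2
        if max(abs(t * s) for s in step) < 1e-10:
            break
    return beta

TRUE_BETA = [-4.0, 0.3, 0.6, 0.2]    # made-up coefficients for the toy check
observed = margin_cases(TRUE_BETA)   # noise-free one-way margins only
beta_hat = fit(observed)
rates = cell_rates(beta_hat)         # estimated rate for every gender-by-age cell
```

In this toy setup the model is just identified (four coefficients, four independent margin totals), so the fitted cell rates reproduce the margins. With real data the counts are noisy, race/ethnicity adds a third margin, and a case fatality model could in principle be fitted the same way with estimated cases as the denominators, though whether that matches the paper's procedure is an assumption here.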

Keywords: COVID-19; California Health Interview Survey; Case fatality rate; Infection rate; Logistic regression.

MeSH terms

  • Adolescent
  • Adult
  • Age Factors
  • Aged
  • Aged, 80 and over
  • COVID-19 / epidemiology*
  • COVID-19 / mortality
  • California / epidemiology
  • California / ethnology
  • Child
  • Ethnicity
  • Female
  • Health Surveys
  • Humans
  • Likelihood Functions
  • Logistic Models*
  • Male
  • Middle Aged
  • Monte Carlo Method
  • Pandemics
  • Racial Groups
  • Risk Factors
  • SARS-CoV-2 / physiology
  • Sex Factors