Identifying Communities at Risk for COVID-19-Related Burden Across 500 US Cities and Within New York City: Unsupervised Learning of the Coprevalence of Health Indicators

JMIR Public Health Surveill. 2021 Aug 26;7(8):e26604. doi: 10.2196/26604.

Abstract

Background: Although it is well known that older individuals with certain comorbidities are at the highest risk for complications related to COVID-19, including hospitalization and death, we lack tools to identify communities at the highest risk with fine-grained spatial resolution. Information collected at a county level obscures local risk and complex interactions between clinical comorbidities, the built environment, population factors, and other social determinants of health.

Objective: This study aims to develop a COVID-19 community risk score that summarizes complex disease prevalence together with age and sex, and to compare the score with different social determinants of health indicators and built environment measures derived from satellite images using deep learning.

Methods: We developed a robust COVID-19 community risk score (COVID-19 risk score) that uses unsupervised learning to summarize the co-occurrence of complex diseases (using data for 2019) in individual census tracts; the diseases were selected on the basis of their association with risk for COVID-19 complications such as death. We mapped the COVID-19 risk score to corresponding zip codes in New York City and associated the score with COVID-19-related death. We further modeled the variance of the COVID-19 risk score using satellite imagery and social determinants of health.
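As an illustration of how a single score can summarize the co-occurrence of many indicators, the following is a minimal sketch that assumes the score is the first principal component of standardized tract-level prevalences. The abstract states only that unsupervised learning was used, so the choice of principal component analysis, the function name `first_component_score`, and the synthetic data are all assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np

def first_component_score(prevalence):
    """Return (score, explained_fraction) for a tracts x indicators matrix.

    Assumption: the community score is the first principal component of
    the standardized indicators. This is a hypothetical stand-in for the
    unsupervised method, which the abstract does not name.
    """
    X = np.asarray(prevalence, dtype=float)
    # Standardize each indicator so no single prevalence dominates by scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes of the standardized data.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    # Orient the axis so a higher score means higher prevalence overall.
    if loadings.sum() < 0:
        loadings = -loadings
    raw = Z @ loadings
    # Rescale to mean 0 and SD 1 so effects can be read per 1 SD change.
    score = (raw - raw.mean()) / raw.std()
    # Share of total variance captured by the first component.
    explained = S[0] ** 2 / np.sum(S ** 2)
    return score, explained

# Synthetic data only: 500 hypothetical tracts, 15 indicators driven by
# one shared latent factor plus independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
X = 10 + 3 * latent[:, None] + rng.normal(scale=1.0, size=(500, 15))
score, explained = first_component_score(X)
```

Standardizing the final score to unit SD is a design choice that makes downstream estimates interpretable as effects per 1 SD change in the score.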

Results: Using 2019 chronic disease data, the COVID-19 risk score described 85% of the variation in the co-occurrence of 15 diseases and health behaviors that are risk factors for COVID-19 complications among ~28,000 census tract neighborhoods (median population size of tracts 4091). The COVID-19 risk score was associated with a 40% greater risk for COVID-19-related death across New York City (April and September 2020) for a 1 SD change in the score (risk ratio for 1 SD change in COVID-19 risk score 1.4; P<.001) at the zip code level. Satellite imagery coupled with social determinants of health explained nearly 90% of the variance in the COVID-19 risk score across census tracts in the United States (r²=0.87).
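To make the "per 1 SD" risk ratio concrete, the sketch below extrapolates the reported value of 1.4 to other differences in the score. It assumes a log-linear model, in which the log risk ratio scales linearly with the score; the abstract does not state the regression form, so this is an assumption, and `risk_ratio_for_change` is a hypothetical helper.

```python
import math

def risk_ratio_for_change(rr_per_sd, sd_change):
    """Risk ratio implied by a score difference of `sd_change` SDs.

    Assumption: log-linear model, so log(RR) is proportional to the
    score difference. The abstract does not confirm this model form.
    """
    return math.exp(math.log(rr_per_sd) * sd_change)

# Reported risk ratio of 1.4 per 1 SD; under the assumption above,
# a 2 SD difference corresponds to 1.4 ** 2 = 1.96.
rr_two_sd = risk_ratio_for_change(1.4, 2)
```

Under this assumption, two zip codes whose scores differ by 2 SD would differ in risk of COVID-19-related death by a factor of about 1.96, not 1.8, because the ratios multiply rather than add.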

Conclusions: The COVID-19 risk score localizes risk at the census tract level and predicted COVID-19-related mortality in New York City. The built environment explained a substantial share of the variation in the score, suggesting that risk models could be enhanced with satellite imagery.

Keywords: COVID-19; United States; artificial intelligence; built environment; community; comorbidity; determinant; environment; indicator; machine learning; mortality; population; prediction; risk; satellite imagery; social determinants of health.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19 / epidemiology*
  • COVID-19 / mortality
  • Cities / epidemiology
  • Cost of Illness*
  • Health Status Indicators
  • Humans
  • New York City / epidemiology
  • Residence Characteristics / statistics & numerical data*
  • Risk Assessment / methods
  • Risk Factors
  • Social Determinants of Health
  • United States / epidemiology
  • Unsupervised Machine Learning