Developing small-area predictions for smoking and obesity prevalence in the United States for use in Environmental Public Health Tracking

Environ Res. 2014 Oct;134:435-52. doi: 10.1016/j.envres.2014.07.029. Epub 2014 Sep 28.


Background: Globally and in the United States, smoking and obesity are leading causes of death and disability. Reliable estimates of prevalence for these risk factors are often missing variables in public health surveillance programs. This may limit the capacity of public health surveillance to target interventions or to assess associations between other environmental risk factors (e.g., air pollution) and health because smoking and obesity are often important confounders.

Objectives: To generate prevalence estimates of smoking and obesity rates over small areas for the United States (i.e., at the ZIP code and census tract levels).

Methods: We predicted smoking and obesity prevalence using a combined approach first using a lasso-based variable selection procedure followed by a two-level random effects regression with a Poisson link clustered on state and county. We used data from the Behavioral Risk Factor Surveillance System (BRFSS) from 1991 to 2010 to estimate the model. We used 10-fold cross-validated mean squared errors and the variance of the residuals to test our model. To downscale the estimates we combined the prediction equations with 1990 and 2000 U.S. Census data for each of the four five-year time periods in this time range at the ZIP code and census tract levels. Several sensitivity analyses were conducted using models that included only basic terms, that accounted for spatial autocorrelation, and used Generalized Linear Models that did not include random effects.

Results: The two-level random effects model produced improved estimates compared to the fixed effects-only models. Estimates were particularly improved for the two-thirds of the conterminous U.S. where BRFSS data were available to estimate the county level random effects. We downscaled the smoking and obesity rate predictions to derive ZIP code and census tract estimates.

Conclusions: To our knowledge these smoking and obesity predictions are the first to be developed for the entire conterminous U.S. for census tracts and ZIP codes. Our estimates could have significant utility for public health surveillance.

Keywords: GIS; Multilevel model; Obesity; Small-area; Smoking.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Obesity / epidemiology*
  • Prevalence
  • Public Health Practice*
  • Smoking / epidemiology*
  • United States / epidemiology