Background: Although ongoing, multi-topic surveys form the basis of public health surveillance in many countries, their utility for specific subject matter areas can be limited by high proportions of missing data. For example, the National Health and Nutrition Examination Survey is the main resource for surveillance of elevated blood lead levels (EBLLs) in US children, but key predictor variables are missing for as many as 35% of respondents.
Methods: Using a Bayesian framework, we formulate a t-distributed Heckman selection model applicable to the case of multiple missing-not-at-random variables in the context of a complex survey design. We demonstrate the utility of the results by applying the estimated coefficients to American Community Survey data from the US Census Bureau to calculate prevalence estimates for blood lead levels exceeding 2.5, 5.0, and 10.0 µg/dL among children 1 to 5 years of age across a variety of time points and geographies.
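For orientation, the textbook single-outcome form of a Heckman selection model with t-distributed errors can be written as below. This is a sketch of the standard formulation only, not the specification used in this work, which extends to multiple missing-not-at-random variables and a complex survey design. All symbols ($y_i^{*}$, $s_i^{*}$, $x_i$, $w_i$, $\beta$, $\gamma$, $\sigma$, $\rho$, $\nu$) are assumed notation introduced here for illustration.

```latex
\begin{aligned}
y_i^{*} &= x_i^{\top}\beta + \varepsilon_i
  && \text{(outcome equation)} \\
s_i^{*} &= w_i^{\top}\gamma + \eta_i
  && \text{(selection equation)} \\
y_i &= y_i^{*} \ \text{is observed only if } s_i^{*} > 0 \\
\begin{pmatrix} \varepsilon_i \\ \eta_i \end{pmatrix}
  &\sim t_{\nu}\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \sigma^{2} & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix}
  \right)
\end{aligned}
```

A nonzero $\rho$ is what makes the missingness non-ignorable: the chance of observing $y_i$ then depends on the unobserved part of $y_i$ itself. The heavier tails of the bivariate $t_{\nu}$ distribution make the fit less sensitive to outlying values than the Gaussian version.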
Results: We present a protocol for estimating posterior distributions of parameters using Gibbs and grid sampling steps. Stark disparities in the prevalence of EBLL by race/ethnicity, age of housing, and poverty are readily quantified, and three- to five-fold differences in predicted prevalence across geographies within the US are presented.
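A grid sampling step inside a Gibbs sampler can be sketched as follows. This is a minimal, generic illustration and not the protocol from this work: the function names, the flat prior over the grid, and the choice of the t degrees of freedom `nu` as the grid-sampled parameter are all assumptions made for the example. It relies on the standard scale-mixture representation of the t distribution, in which latent weights `lam[i]` follow a Gamma distribution with shape `nu/2` and rate `nu/2`.

```python
import math
import random


def grid_sample(log_density, grid, rng):
    """Draw one value from a discrete approximation of a density on `grid`.

    `log_density` returns the unnormalized log density at a grid point.
    """
    log_w = [log_density(g) for g in grid]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    u = rng.random() * sum(w)
    acc = 0.0
    for g, wi in zip(grid, w):
        acc += wi
        if u < acc:  # inverse-CDF draw over the grid points
            return g
    return grid[-1]  # guard against floating-point rounding


def log_cond_nu(nu, lam):
    """Unnormalized log conditional of nu given latent weights lam.

    Assumes lam[i] ~ Gamma(shape=nu/2, rate=nu/2) and a flat prior on the grid
    (both are assumptions of this sketch).
    """
    a = nu / 2.0
    return sum(
        a * math.log(a) - math.lgamma(a) + (a - 1.0) * math.log(l) - a * l
        for l in lam
    )


if __name__ == "__main__":
    rng = random.Random(1)
    lam = [rng.gammavariate(2.5, 1.0 / 2.5) for _ in range(200)]  # nu = 5
    nu_grid = [2.0 + 0.5 * k for k in range(57)]  # 2.0, 2.5, ..., 30.0
    print(grid_sample(lambda nu: log_cond_nu(nu, lam), nu_grid, rng))
```

In a full sampler, a step like this would alternate with closed-form Gibbs draws for the parameters whose conditional distributions have a standard form. The grid step is reserved for parameters whose conditionals cannot be sampled directly.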
Conclusions: We are able to conduct multivariate analyses of EBLLs that incorporate the crucial variable age of housing, analyses that have not previously been available using these data. This represents an expansion of the utility of the National Health and Nutrition Examination Survey that is likely to be relevant to many similar ongoing, multi-topic health surveillance efforts. Copyright © 2016 John Wiley & Sons, Ltd.
Keywords: lead poisoning; missing-not-at-random; selection models; survey data.