Background: HIV prevalence estimates from population-based surveys are vulnerable to selection bias if HIV status is missing for a proportion of the eligible population. Standard approaches to correcting prevalence estimates for selective nonparticipation, such as imputation, assume that data are "missing at random." These approaches yield biased estimates if unobserved factors are associated with both survey participation and HIV status.
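The bias described above can be illustrated with a small simulation. This is a minimal sketch, not the analysis used in the study: the function name `simulate_selection`, the two thresholds, and the default correlation are hypothetical values chosen only for illustration, and no covariates are modeled.

```python
import random


def simulate_selection(n=20000, rho=-0.75, seed=0):
    """Return (true prevalence, prevalence among participants, participation rate)."""
    rng = random.Random(seed)
    n_tested = n_positive = n_positive_tested = 0
    for _ in range(n):
        # u: unobserved factor behind consent to test.
        u = rng.gauss(0.0, 1.0)
        # v: unobserved factor behind HIV status, correlated with u at rho.
        v = rho * u + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        participates = u > -0.6   # hypothetical threshold, about 73% consent
        hiv_positive = v > 0.8    # hypothetical threshold, about 21% positive
        n_positive += hiv_positive
        if participates:
            n_tested += 1
            n_positive_tested += hiv_positive
    return n_positive / n, n_positive_tested / n_tested, n_tested / n


if __name__ == "__main__":
    true_prev, observed_prev, participation = simulate_selection()
    print(f"participation: {participation:.2f}")
    print(f"true prevalence: {true_prev:.3f}")
    print(f"prevalence among participants: {observed_prev:.3f}")
```

With a negative correlation, prevalence among participants falls below the true prevalence. Because nothing observed in this toy setup predicts the missing statuses, an imputation that assumes "missing at random" would reproduce the participant figure rather than the true one. Setting `rho=0.0` removes the gap.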
Methods: We use Heckman-type selection models to test for and correct selection on unobserved factors (separately for men and women) in the 2007 Zambia Demographic and Health Survey, in which 28% of the 7146 eligible men and 23% of the 7408 eligible women did not participate in HIV testing. The performance of these models depends crucially on selection variables that determine survey participation but do not independently affect HIV status.
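For a binary outcome such as HIV status, one common formulation of a Heckman-type selection model is a bivariate probit with sample selection. The sketch below uses that formulation; the notation ($s_i$, $h_i$, $x_i$, $z_i$, $\gamma$, $\delta$, $\beta$) is an assumption introduced here for illustration and is not taken from the study.

```latex
\begin{align*}
  s_i^{*} &= z_i^{\top}\gamma + x_i^{\top}\delta + u_i,
    & s_i &= \mathbf{1}\{s_i^{*} > 0\}
    && \text{(consent to HIV testing)} \\
  h_i^{*} &= x_i^{\top}\beta + \varepsilon_i,
    & h_i &= \mathbf{1}\{h_i^{*} > 0\}
    && \text{(HIV status, observed only if } s_i = 1\text{)} \\
  \begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix}
    &\sim \mathcal{N}\!\left(
      \begin{pmatrix} 0 \\ 0 \end{pmatrix},
      \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
    \right)
\end{align*}
```

Here $x_i$ holds covariates that enter both equations, and $z_i$ holds the selection variables, which enter the participation equation but are excluded from the HIV status equation. The correlation $\rho$ captures selection on unobserved factors: $\rho = 0$ means no such selection, and $\rho < 0$ means that people who are more likely to be HIV-positive are less likely to consent to testing.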
Results: We identify 2 highly plausible selection variables that are statistically significant determinants of survey participation: interviewer identity and visit on the first day of fieldwork in a survey cluster. HIV-positive status was negatively correlated with consent to test in men (ρ = -0.75 [95% confidence interval = -0.94 to -0.18]), but not in women. Adjusting for selection on unobserved variables substantially increased the HIV prevalence estimate for men, from 12% (based on measured HIV status alone) and 12% (based on imputation) to 21%. In addition, the adjustment for selection substantially changed the estimated effects of HIV risk factors.
Conclusions: Studies of HIV prevalence and risk factors based on surveys with substantial nonparticipation should routinely use Heckman-type selection models to correct for selection on unobserved variables.