Accounting for selection bias in association studies with complex survey data

Epidemiology. 2014 May;25(3):444-53. doi: 10.1097/EDE.0000000000000037.


Obtaining representative information from hidden and hard-to-reach populations is fundamental to describing the epidemiology of many sexually transmitted diseases, including HIV. Unfortunately, simple random sampling is impractical in these settings, as no registry of names exists from which to sample the population at random. However, complex sampling designs can be used, as members of these populations tend to congregate at known locations, which can be enumerated and sampled at random. For example, female sex workers may be found at brothels and street corners, whereas injection drug users often come together at shooting galleries. Despite their logistical appeal, complex sampling schemes lead to unequal probabilities of selection, and failure to account for this differential selection can result in biased estimates of population averages and relative risks. However, standard techniques to account for selection can lead to substantial losses in efficiency. Consequently, researchers implement a variety of strategies in an effort to balance validity and efficiency. Some researchers fully or partially account for the survey design, whereas others do nothing and treat the sample as a realization of the population of interest. We use directed acyclic graphs to show how certain survey sampling designs, combined with subject-matter considerations unique to individual exposure-outcome associations, can induce selection bias. Finally, we present a novel yet simple maximum likelihood approach for analyzing complex survey data; this approach optimizes statistical efficiency at no cost to validity. We use simulated data to illustrate this method and compare it with other analytic techniques.
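The effect of unequal selection probabilities on a population average can be illustrated with a small simulation. The sketch below is not the maximum likelihood approach presented in the article; it shows only the standard inverse-probability-weighted (Hajek) estimator next to the unweighted sample mean. The function name `simulate`, the two venue types, and all prevalences and selection probabilities are hypothetical values chosen for illustration.

```python
import random


def simulate(n_population=100_000, seed=0):
    """Compare naive and inverse-probability-weighted prevalence estimates.

    All parameter values are assumptions for illustration only.
    """
    rng = random.Random(seed)
    sample = []  # (outcome, selection probability) for each sampled person
    n_cases = 0
    for _ in range(n_population):
        # Assumed: 30% of the population attends venue type A, 70% type B.
        venue_a = rng.random() < 0.30
        # Assumed: outcome prevalence is 40% at type A and 10% at type B.
        outcome = 1 if rng.random() < (0.40 if venue_a else 0.10) else 0
        n_cases += outcome
        # Assumed: people at type A venues are ten times as likely to be sampled.
        p_select = 0.20 if venue_a else 0.02
        if rng.random() < p_select:
            sample.append((outcome, p_select))

    true_prevalence = n_cases / n_population
    # Naive estimate: treats the sample as a simple random sample.
    naive = sum(y for y, _ in sample) / len(sample)
    # Weighted estimate: each person counts for 1 / p_select people.
    weighted = sum(y / p for y, p in sample) / sum(1 / p for _, p in sample)
    return true_prevalence, naive, weighted


if __name__ == "__main__":
    truth, naive, weighted = simulate()
    print(f"true prevalence:   {truth:.3f}")
    print(f"naive estimate:    {naive:.3f}")
    print(f"weighted estimate: {weighted:.3f}")
```

Under these assumed values the expected population prevalence is 0.3 × 0.40 + 0.7 × 0.10 = 0.19, while the unweighted mean is pulled toward the oversampled venue type. The weighted estimate removes that bias but relies on a few heavily weighted observations, which is the loss of efficiency the abstract refers to.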

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Attitude to Health
  • Epidemiologic Methods*
  • Female
  • HIV Infections / epidemiology*
  • HIV Infections / prevention & control
  • Health Surveys
  • Humans
  • Likelihood Functions
  • Logistic Models
  • Male
  • Monte Carlo Method
  • Risk-Taking
  • Sampling Studies
  • Selection Bias*
  • Sensitivity and Specificity
  • Sexually Transmitted Diseases / epidemiology*
  • Sexually Transmitted Diseases / prevention & control