Controlling confounding when studying large pharmacoepidemiologic databases: a case study of the two-stage sampling design

Epidemiology. 1998 May;9(3):309-15. doi: 10.1097/00001648-199805000-00011.


Large drug databases have been the source of interesting developments for pharmacoepidemiologic research, because they provide relatively accurate drug exposure histories. An important limitation of these databases is the lack of information on potential confounders. One solution, developed more than a decade ago but not widely used, is "two-stage sampling," in which stage 1 is the collection of information on drug exposure and outcomes, and stage 2 is the collection of confounder data on a subset of the stage 1 sample. The balanced design, wherein an equal number of individuals is selected from each drug exposure/disease category, is usually the most efficient strategy by which to select the stage 2 sample. We illustrate the efficiency of the balanced design in two-stage sampling using data from a provincial health organization and a simulation. We also evaluate the relative importance of factors affecting the precision of the effect estimate of the exposure of interest.
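As a rough illustration of the balanced design described above, the following minimal sketch draws an equal number of individuals from each drug exposure/disease category of a stage 1 sample. It is not the authors' code: the function names (`balanced_stage2_sample`, `sampling_fractions`), the tuple layout, and the cell counts are hypothetical and chosen only for illustration. The sketch also reports the sampling fraction in each cell, on the assumption that a stage 2 analysis would need these fractions to account for the unequal selection probabilities.

```python
import random
from collections import defaultdict


def balanced_stage2_sample(stage1, n_per_cell, seed=0):
    """Draw up to n_per_cell subjects from each exposure/disease cell.

    stage1: list of (subject_id, exposed, diseased) tuples, with exposed
            and diseased coded 0/1 (hypothetical layout).
    Returns a dict mapping (exposed, diseased) -> list of sampled ids.
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for subject_id, exposed, diseased in stage1:
        cells[(exposed, diseased)].append(subject_id)

    sample = {}
    for cell, ids in sorted(cells.items()):
        # A cell smaller than n_per_cell is taken in full.
        k = min(n_per_cell, len(ids))
        sample[cell] = rng.sample(ids, k)
    return sample


def sampling_fractions(stage1, sample):
    """Fraction of each stage 1 cell that was selected for stage 2."""
    counts = defaultdict(int)
    for _, exposed, diseased in stage1:
        counts[(exposed, diseased)] += 1
    return {cell: len(ids) / counts[cell] for cell, ids in sample.items()}


# Hypothetical stage 1 data: cell sizes are invented for illustration.
cell_sizes = {(1, 1): 20, (1, 0): 180, (0, 1): 50, (0, 0): 750}
stage1 = []
next_id = 0
for (exposed, diseased), size in cell_sizes.items():
    for _ in range(size):
        stage1.append((next_id, exposed, diseased))
        next_id += 1

sample = balanced_stage2_sample(stage1, n_per_cell=20)
fractions = sampling_fractions(stage1, sample)
for cell in sorted(sample):
    print(cell, len(sample[cell]), round(fractions[cell], 4))
```

In this invented example the rare exposed/diseased cell is sampled completely while the large unexposed/non-diseased cell is sampled sparsely, which is the sense in which the balanced design concentrates stage 2 data collection on the less common categories.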

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Confounding Factors, Epidemiologic*
  • Databases, Factual*
  • Drug-Related Side Effects and Adverse Reactions*
  • Humans
  • Research Design*
  • Sampling Studies