Multi-stage sampling in genetic epidemiology

Stat Med. 1997 Jan 15-Feb 15;16(1-3):153-67. doi: 10.1002/(sici)1097-0258(19970130)16:2<153::aid-sim477>;2-7.


When data are expensive to collect, it can be cost-efficient to sample in two or more stages. In the first stage a simple random sample is drawn and then stratified according to some easily measured attribute. In each subsequent stage a random subset of previously selected units is sampled for more detailed observation, with a unit's sampling probability determined by its attributes as observed in the previous stages. These designs are useful in many medical studies; here we use them in genetic epidemiology. Two genetic studies illustrate the strengths and limitations of the approach. The first study evaluates nuclear and mitochondrial DNA in U.S. blacks. The goal is to estimate the relative contributions of white male genes and white female genes to the gene pool of African-Americans. This example shows that the Horvitz-Thompson estimators proposed for multi-stage designs can be inefficient, particularly when used with unnecessary stratification. The second example is a multi-stage study of familial prostate cancer. The goal is to gather pedigrees, blood samples and archived tissue for segregation and linkage analysis of familial prostate cancer data by first obtaining crude family data from prostate cancer cases and cancer-free controls. This second example shows the gains in efficiency from multi-stage sampling when the individual likelihood or quasilikelihood scores vary substantially across strata.
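The design described above can be illustrated with a minimal sketch, which is not the authors' code or data. It assumes a hypothetical stage-1 sample in which each unit carries a cheap stratum label (`case` or `control`) and an expensive measurement `y`. Stage 2 keeps each unit with a probability that depends only on its stratum, and a Horvitz-Thompson (inverse-probability-weighted) estimator recovers the stage-1 mean of `y`. The function names, the sampling probabilities and the simulated values are all assumptions made for illustration.

```python
import random


def two_stage_sample(units, stage2_prob, rng):
    """Stage 2: keep each stage-1 unit with a probability set by its stratum."""
    selected = []
    for unit in units:
        p = stage2_prob[unit["stratum"]]
        if rng.random() < p:
            # Store the inclusion probability alongside the unit for weighting.
            selected.append((unit, p))
    return selected


def horvitz_thompson_mean(selected, n_stage1):
    """Inverse-probability-weighted estimate of the stage-1 mean of y."""
    return sum(unit["y"] / p for unit, p in selected) / n_stage1


rng = random.Random(0)

# Hypothetical stage-1 simple random sample (simulated, not real data):
# the stratum is cheap to observe, y is the expensive measurement.
stage1 = []
for _ in range(2000):
    stratum = "case" if rng.random() < 0.2 else "control"
    y = rng.gauss(3.0 if stratum == "case" else 1.0, 1.0)
    stage1.append({"stratum": stratum, "y": y})

# Assumed stage-2 sampling probabilities: oversample the rarer stratum.
stage2_prob = {"case": 0.8, "control": 0.1}

selected = two_stage_sample(stage1, stage2_prob, rng)
estimate = horvitz_thompson_mean(selected, len(stage1))
full_mean = sum(u["y"] for u in stage1) / len(stage1)

print(f"stage-2 units measured: {len(selected)} of {len(stage1)}")
print(f"Horvitz-Thompson estimate: {estimate:.3f}")
print(f"mean if every unit were measured: {full_mean:.3f}")
```

Because each retained unit is weighted by the inverse of its sampling probability, the estimator is unbiased for the stage-1 mean, but a few heavily weighted units can inflate its variance. That is one way the inefficiency noted in the abstract can arise.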

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • African Americans
  • African Continental Ancestry Group / genetics
  • Cost-Benefit Analysis
  • DNA / analysis
  • DNA, Mitochondrial / analysis
  • Data Collection / economics
  • Data Interpretation, Statistical
  • Disease Susceptibility
  • Epidemiologic Methods*
  • Female
  • Genetic Linkage
  • Genotype
  • Humans
  • Likelihood Functions
  • Male
  • Models, Statistical*
  • Prostatic Neoplasms / epidemiology
  • Prostatic Neoplasms / genetics
  • Research Design
  • Sampling Studies*
  • United States / epidemiology


Substances

  • DNA, Mitochondrial
  • DNA