Stratified case sampling and the use of family controls

Genet Epidemiol. 2001 Apr;20(3):316-27. doi: 10.1002/gepi.3.


We compare the asymptotic relative efficiency (ARE) of different study designs for estimating gene and gene-environment interaction effects using matched case-control data. In the sampling schemes considered, cases are selected differentially based on their family history of disease. Controls are selected either from unrelated subjects or from among the case's unaffected siblings and cousins. Parameters are estimated using weighted conditional logistic regression, in which the likelihood contribution of each subject is weighted by the fraction of cases sampled sharing the same family history. Results showed that, compared with random sampling, over-sampling cases with a positive family history increased the efficiency for estimating the main effect of a gene for sib-control designs (103-254% ARE) and decreased efficiency for cousin-control and population-control designs (68-94% ARE and 67-84% ARE, respectively). Population controls and random sampling of cases were most efficient for a recessive gene or a dominant gene with a relative risk less than 9. For estimating gene-environment interactions, over-sampling positive-family-history cases again led to increased efficiency using sib controls (111-180% ARE) and decreased efficiency using population controls (68-87% ARE). Using case-cousin pairs, the results differed based on the genetic model and the size of the interaction effect; biased sampling was only slightly more efficient than random sampling for large interaction effects under a dominant gene model (relative risk ratio = 8, 106% ARE). Overall, the most efficient study design for studying gene-environment interaction was the case-sib-control design with over-sampling of positive-family-history cases.
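The weighted conditional logistic regression mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 1:1 matched case-control pairs, a single covariate (for example, carrier status coded 0/1), and one weight per pair supplied by the caller. The function names `weighted_cond_loglik` and `fit_beta`, the pair layout, and the bisection fitting routine are all hypothetical choices made for illustration; how the weights are derived from the sampling fractions is left outside the sketch.

```python
import math


def weighted_cond_loglik(beta, pairs, weights):
    """Weighted conditional log-likelihood for 1:1 matched pairs.

    pairs   -- list of (x_case, x_control) covariate values (assumed layout)
    weights -- one weight per pair, supplied by the caller (assumption)
    """
    total = 0.0
    for (x_case, x_ctrl), w in zip(pairs, weights):
        d = x_case - x_ctrl
        # Pair contribution: log[ e^{beta*x_case} / (e^{beta*x_case} + e^{beta*x_ctrl}) ]
        #                  = -log(1 + e^{-beta*d})
        total -= w * math.log1p(math.exp(-beta * d))
    return total


def weighted_score(beta, pairs, weights):
    """Derivative of the weighted conditional log-likelihood w.r.t. beta."""
    return sum(
        w * (xc - xk) / (1.0 + math.exp(beta * (xc - xk)))
        for (xc, xk), w in zip(pairs, weights)
    )


def fit_beta(pairs, weights, lo=-10.0, hi=10.0, tol=1e-8):
    """Find the maximizer by bisection on the score.

    The log-likelihood is concave, so the score is decreasing in beta.
    Assumes the maximizer lies within [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weighted_score(mid, pairs, weights) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative data only: three pairs with an exposed case and unexposed
# control, one pair with the reverse, and one concordant pair.
example_pairs = [(1, 0), (1, 0), (1, 0), (0, 1), (1, 1)]
example_weights = [1.0] * len(example_pairs)
beta_hat = fit_beta(example_pairs, example_weights)
```

For a binary covariate, the fitted value is the log of the ratio of the weighted totals of the two kinds of discordant pairs; concordant pairs contribute nothing to the estimate. With the unit weights used here, that ratio is 3 to 1.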

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bias
  • Case-Control Studies*
  • Colorectal Neoplasms / genetics*
  • Genotype
  • Humans
  • Likelihood Functions
  • Logistic Models
  • Models, Genetic*
  • Registries
  • Research Design
  • Risk
  • Sampling Studies