A simple and improved correction for population stratification in case-control studies

Am J Hum Genet. 2007 May;80(5):921-30. doi: 10.1086/516842. Epub 2007 Mar 29.


Population stratification remains an important issue in case-control studies of disease-marker association, even within populations considered to be genetically homogeneous. Campbell et al. (Nature Genetics 2005;37:868-872) illustrated this by showing that stratification induced a spurious association between the lactase gene (LCT) and tall/short status in a European American sample. Furthermore, existing approaches for controlling stratification by use of substructure-informative loci (e.g., genomic control, structured association, and principal components) could not resolve this confounding. To address this problem, we propose a simple two-step procedure. In the first step, we model the odds of disease, given data on substructure-informative loci (excluding the test locus). For each participant, we use this model to calculate a stratification score, which is that participant's estimated odds of disease calculated using his or her substructure-informative-loci data in the disease-odds model. In the second step, we assign subjects to strata defined by stratification score and then test for association between the disease and the test locus within these strata. The resulting association test is valid even in the presence of population stratification. Our approach is computationally simple and less model dependent than are existing approaches for controlling stratification. To illustrate these properties, we apply our approach to the data from Campbell et al. and find no association between the LCT locus and tall/short status. Using simulated data, we show that our approach yields a more appropriate correction for stratification than does principal components or genomic control.
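The two-step procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the disease-odds model is taken to be a logistic regression with a small ridge penalty, strata are taken to be quantiles of the score (five by default), and the within-stratum test is taken to be a Mantel-extension trend test on 0/1/2 genotype counts. The function names `fit_logistic`, `stratification_scores`, and `stratified_trend_test` are hypothetical.

```python
import math
import numpy as np

def fit_logistic(X, y, ridge=1e-3, n_iter=50):
    """Newton-Raphson logistic regression with a small ridge penalty (assumed model)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
        grad = X1.T @ (y - p) - ridge * beta
        hess = (X1 * (p * (1.0 - p))[:, None]).T @ X1 + ridge * np.eye(X1.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

def stratification_scores(X, y):
    """Step 1: estimated log-odds of disease from substructure-informative loci.

    X must NOT contain the test locus. Log-odds are a monotone transform of the
    odds, so quantile-based strata are the same on either scale.
    """
    beta = fit_logistic(X, y)
    return np.column_stack([np.ones(len(X)), X]) @ beta

def stratified_trend_test(g, y, scores, n_strata=5):
    """Step 2: test disease/test-locus association within score strata.

    Returns a 1-df chi-square statistic and its p-value. The number of strata
    and the choice of test are assumptions made for this sketch.
    """
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_strata + 1)[1:-1])
    strata = np.searchsorted(edges, scores, side="right")
    U = 0.0  # sum over strata of observed minus expected genotype total in cases
    V = 0.0  # sum over strata of the permutation variance of that total
    for k in range(n_strata):
        idx = strata == k
        n = int(idx.sum())
        n1 = int(y[idx].sum())
        if n < 2 or n1 == 0 or n1 == n:
            continue  # stratum carries no information about association
        gk = g[idx]
        U += gk[y[idx] == 1].sum() - n1 * gk.mean()
        V += n1 * (n - n1) / (n * (n - 1)) * ((gk - gk.mean()) ** 2).sum()
    stat = U * U / V if V > 0 else 0.0
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value
```

Here `X` is an individuals-by-loci array of genotype counts at the substructure-informative loci, `g` holds genotype counts at the test locus, and `y` is the 0/1 case-control indicator. Passing a constant `scores` vector places everyone in a single stratum and so gives the unadjusted test for comparison.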

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles
  • Body Height / genetics
  • Case-Control Studies
  • Gene Frequency
  • Genetic Markers
  • Genetic Techniques
  • Genetics, Population*
  • Humans
  • Lactase / genetics
  • Models, Genetic
  • Models, Statistical
  • Polymorphism, Single Nucleotide

Substances

  • Genetic Markers
  • Lactase

Associated data

  • OMIM/603202