Effect of population stratification on case-control association studies. I. Elevation in false positive rates and comparison to confounding risk ratios (a simulation study)

Hum Hered. 2004;58(1):30-9. doi: 10.1159/000081454.


Objectives: This is the first of two articles examining the effect of population stratification (PS) on the type I error rate (i.e., the false positive rate); this paper focuses on the confounding risk ratio (CRR). It is accepted that PS can produce false positive results in case-control genetic association studies. However, it is unknown which values of population parameters lead to an increase in the type I error rate. Some believe that PS does not represent a serious concern, whereas others believe that PS may contribute to contradictory findings in genetic association studies. We used computer simulations to estimate the effect of PS on the type I error rate over a wide range of disease frequencies and marker allele frequencies, and we compared the observed type I error rate with the magnitude of the CRR.
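The abstract compares the type I error rate with the CRR but does not state a formula for it. The sketch below assumes the usual epidemiologic definition (the crude risk ratio divided by the stratum-specific risk ratio). That stratum-specific ratio is 1 here because marker and disease are independent within each population, so the CRR reduces to the crude marker-disease risk ratio in the combined population. It also assumes two populations and allele-level counting. The function name and its parameters (`w1`, `p1`, `k1`, `p2`, `k2`) are hypothetical and not taken from the paper.

```python
def confounding_risk_ratio(w1, p1, k1, p2, k2):
    """Crude marker-disease risk ratio in a two-population mixture (hypothetical helper).

    w1     : fraction of the combined population from population 1 (assumed)
    p1, p2 : marker allele frequencies in populations 1 and 2
    k1, k2 : disease prevalences in populations 1 and 2
    Marker and disease are independent within each population, so the
    stratum-specific risk ratio is 1 and the CRR equals the crude risk ratio.
    """
    w2 = 1.0 - w1
    # P(disease | marker allele), pooling alleles from both populations
    risk_marker = (w1 * p1 * k1 + w2 * p2 * k2) / (w1 * p1 + w2 * p2)
    # P(disease | other allele), pooling alleles from both populations
    risk_other = (w1 * (1 - p1) * k1 + w2 * (1 - p2) * k2) / (
        w1 * (1 - p1) + w2 * (1 - p2)
    )
    return risk_marker / risk_other
```

With equal mixing (`w1=0.5`), marker allele frequencies 0.5 and 0.1, and prevalences 0.2 and 0.05, the function returns about 1.69. It returns 1 (up to rounding) whenever the allele frequencies or the prevalences are equal across the two populations, because the mixture then introduces no confounding.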

Methods: We simulated two populations and mixed them to produce a combined population, specifying 160 different combinations of input parameters (disease prevalences and marker allele frequencies in the two populations). From the combined populations, we selected 5000 case-control datasets, each with 50, 100, or 300 cases and the same number of controls, and determined the type I error rate. In all simulations, the marker allele and the disease were independent within each population (i.e., there was no true association).
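The sampling scheme described in Methods can be illustrated roughly as follows. This is a sketch under stated assumptions, not the authors' code. It draws individuals directly from the mixture probabilities instead of building finite simulated populations. It also assumes the two populations are mixed in equal proportions (`mix=0.5`) and tests each dataset with an allelic 2 × 2 Pearson chi-square test at α = 0.05. The abstract specifies none of these choices, and all function names are hypothetical.

```python
import math
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence against the null
    return n * (a * d - b * c) ** 2 / denom


def marker_allele_count(n, w_pop1, p1, p2, rng):
    """Count marker alleles among the 2n alleles of n people drawn from the mixture.

    Each person comes from population 1 with probability w_pop1; each of the
    person's two alleles is the marker allele with that population's frequency.
    """
    count = 0
    for _ in range(n):
        p = p1 if rng.random() < w_pop1 else p2
        count += (rng.random() < p) + (rng.random() < p)
    return count


def type_i_error(k1, k2, p1, p2, n, n_reps=5000, mix=0.5, alpha=0.05, seed=1):
    """Fraction of n_reps datasets (n cases, n controls) whose test gives p < alpha.

    Marker and disease are independent within each population, so every
    rejection is a false positive produced by population stratification.
    """
    rng = random.Random(seed)
    # Bayes' rule: share of cases and of controls that come from population 1
    w_case = mix * k1 / (mix * k1 + (1 - mix) * k2)
    w_ctrl = mix * (1 - k1) / (mix * (1 - k1) + (1 - mix) * (1 - k2))
    rejections = 0
    for _ in range(n_reps):
        a = marker_allele_count(n, w_case, p1, p2, rng)  # marker alleles in cases
        c = marker_allele_count(n, w_ctrl, p1, p2, rng)  # marker alleles in controls
        stat = chi_square_2x2(a, 2 * n - a, c, 2 * n - c)
        p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1 df)
        if p_value < alpha:
            rejections += 1
    return rejections / n_reps
```

For example, `type_i_error(0.2, 0.05, 0.5, 0.1, n=100)` estimates the false positive rate when the population with the higher disease prevalence also carries the marker allele more often. Setting `p1 == p2` removes the stratification, and the estimate should then fall near the nominal 0.05.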

Results: Changes in disease prevalence per se do not substantially affect the type I error rate. The CRR is a relatively poor indicator of the magnitude of the increase in the type I error rate. We also derived a simple mathematical quantity, Delta, that is highly correlated with the type I error rate. In the companion article (Part II, in this issue), we extend this work to multiple subpopulations and unequal sampling proportions.

Conclusion: These results show that realistic combinations of disease prevalences and marker allele frequencies can substantially increase the probability of finding false evidence of marker-disease associations. Furthermore, the CRR does not indicate when this will occur.

MeSH terms

  • Bias*
  • Case-Control Studies*
  • Computer Simulation*
  • Confounding Factors, Epidemiologic
  • Gene Frequency
  • Genetics, Population*
  • Humans
  • Models, Statistical*
  • Odds Ratio
  • Prevalence