Power for genetic association studies with random allele frequencies and genotype distributions

Am J Hum Genet. 2004 Apr;74(4):683-93. doi: 10.1086/383282. Epub 2004 Mar 12.


One of the first and most important steps in planning a genetic association study is accurately estimating the statistical power under a proposed study design and sample size. In association studies for candidate genes or in fine-mapping applications, allele and genotype frequencies are often assumed to be known when, in fact, they are unknown (i.e., random variables from some distribution). For example, for a diallelic marker with allele frequencies of 0.5 and 0.5 and Hardy-Weinberg proportions, the three genotype frequencies are often fixed at 0.25, 0.50, and 0.25 before the statistical power is calculated, even though the genotype frequencies observed in any one sample vary around those values. Unfortunately, ignoring this source of variation can inflate the estimated power of the study. In the present article, we propose averaging the estimates of power over the distribution of the genotype frequencies to calculate the true estimate of power for a fixed allele frequency. For the usual situation, in which allele frequencies in a population are not known, we propose placing a prior distribution on the allele frequency, taking advantage of any available genotype information. This Bayesian approach provides a more accurate estimate of power. We present examples for quantitative and qualitative traits in cohort studies of unrelated individuals, together with results from an extensive series of examples showing that ignoring the uncertainty in allele frequencies can inflate the estimated power of the study. We also present results for case-control studies and show that standard methods may overestimate power there as well. Fixing allele frequencies even when they are not known is the common approach to power calculations; we show that ignoring the sources of variation in allele frequencies tends to result in overestimates of power and, consequently, in studies that are underpowered. Software in C is available at http://www.ambrosius.net/Power/.
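
The averaging idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' C software and does not reproduce their formulas. It assumes a quantitative trait in a cohort of unrelated individuals, an additive per-allele effect `beta`, a residual standard deviation `sigma`, a normal approximation to the power of a two-sided test of the slope, and a Beta(`a`, `b`) prior on the allele frequency. All function and parameter names are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_given_variance(var_g, n, beta, sigma, alpha=0.05):
    """Approximate power of a two-sided test of the per-allele slope.

    Assumption: normal approximation with noncentrality
    beta * sqrt(n * var_g) / sigma. A sample with no genotype
    variation gets noncentrality 0, so the result equals alpha.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = beta * (n * var_g) ** 0.5 / sigma
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)

def fixed_power(p, n, beta, sigma, alpha=0.05):
    """Power with genotype frequencies fixed at Hardy-Weinberg expectations."""
    return power_given_variance(2 * p * (1 - p), n, beta, sigma, alpha)

def sampled_genotype_variance(p, n, rng):
    """Variance of allele counts (0, 1, 2) in one simulated sample of size n."""
    # Under Hardy-Weinberg proportions each person carries two independent alleles.
    genotypes = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
    mean = sum(genotypes) / n
    return sum((g - mean) ** 2 for g in genotypes) / n

def averaged_power(p, n, beta, sigma, reps=2000, seed=1, alpha=0.05):
    """Monte Carlo average of power over random genotype counts, fixed p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        var_g = sampled_genotype_variance(p, n, rng)
        total += power_given_variance(var_g, n, beta, sigma, alpha)
    return total / reps

def bayesian_power(a, b, n, beta, sigma, reps=2000, seed=1, alpha=0.05):
    """Monte Carlo average of power when p itself is drawn from Beta(a, b)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        p = rng.betavariate(a, b)  # prior draw for the allele frequency
        var_g = sampled_genotype_variance(p, n, rng)
        total += power_given_variance(var_g, n, beta, sigma, alpha)
    return total / reps

if __name__ == "__main__":
    # Illustrative values only; they are not taken from the article.
    n, beta, sigma = 100, 0.5, 1.0
    print("fixed frequencies:", fixed_power(0.1, n, beta, sigma))
    print("averaged over genotype counts:", averaged_power(0.1, n, beta, sigma))
    print("averaged over Beta(2, 18) prior:", bayesian_power(2, 18, n, beta, sigma))
```

Comparing the three printed values for a given design shows how much of the nominal power depends on treating the frequencies as known. The Beta prior parameters could be set from pilot genotype data, for example by adding observed allele counts to a weak prior, which is one way to use the available genotype information the abstract mentions.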

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alleles
  • Bayes Theorem
  • Case-Control Studies
  • Gene Frequency / genetics*
  • Genes, Dominant / genetics
  • Genetic Markers / genetics
  • Genetic Predisposition to Disease / genetics*
  • Genetic Variation / genetics
  • Genotype
  • Humans
  • Internet
  • Models, Genetic
  • Quantitative Trait, Heritable
  • Research Design*
  • Sample Size
  • Software
  • Statistical Distributions

