Power estimation and sample size determination for replication studies of genome-wide association studies

BMC Genomics. 2016 Jan 11;17 Suppl 1(Suppl 1):3. doi: 10.1186/s12864-015-2296-4.


Background: Replication studies are commonly used to filter out false positives in genome-wide association studies (GWAS). An association that is confirmed in a replication study is much more likely to be a true positive. Traditional approaches to designing a replication study calculate power by treating the replication study as another independent primary study. These approaches do not use the information provided by the primary study. They also require a minimum detectable effect size to be specified, which may be subjective. One might instead replace the minimum effect size with the observed effect sizes in the power calculation. However, this makes the designed replication study underpowered: only the positive associations from the primary study are of interest, and selecting on significance inflates their observed effect sizes (the "winner's curse").
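The winner's curse described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes a z-test whose statistic is normally distributed with unit variance, a replication study of the same size as the primary study, and hypothetical values for the true effect, the primary-study significance threshold and the replication significance level. All function and parameter names are illustrative.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def replication_power(z_mean, alpha=1e-3):
    """One-sided power of a z-test whose statistic is N(z_mean, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - z_mean)


def winners_curse_demo(true_z=3.0, z_threshold=5.0, n_sim=200_000, seed=0):
    """Simulate primary studies and keep only the 'significant' ones.

    Assumption: every simulated association has the same true
    standardized effect `true_z` (hypothetical value).
    Returns (mean observed z among selected, mean plug-in power,
    actual power at the true effect).
    """
    rng = random.Random(seed)
    observed = [rng.gauss(true_z, 1.0) for _ in range(n_sim)]
    # Selection step: only associations passing the threshold are followed up.
    selected = [z for z in observed if z > z_threshold]
    # Plug-in estimator: treat the observed z as if it were the true effect.
    plug_in = mean(replication_power(z) for z in selected)
    actual = replication_power(true_z)
    return mean(selected), plug_in, actual


if __name__ == "__main__":
    mean_selected, plug_in, actual = winners_curse_demo()
    print(f"true z = 3.0, mean selected z = {mean_selected:.2f}")
    print(f"plug-in power = {plug_in:.3f}, actual power = {actual:.3f}")
```

Under these assumptions, the selected associations have observed effects above the true effect, so the plug-in power is higher than the actual power and a sample size based on it would be too small.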

Results: An Empirical Bayes (EB) based method is proposed to estimate the power of a replication study for each association, together with the corresponding credible interval. In simulation experiments, the method overcomes the winner's curse better than other plug-in based estimators and gives more accurate estimates, and the coverage probability of the credible interval is well calibrated. A weighted average is used to estimate the average power over all underlying true associations, and this average power is used to determine the sample size of the replication study. Sample sizes are estimated for 6 diseases from the Wellcome Trust Case Control Consortium (WTCCC) using our method. They are higher than the sample sizes estimated by plugging observed effect sizes into the power calculation.
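The general idea of shrinking observed effects before computing replication power can be sketched as follows. This is a simplified illustration and not the method implemented in the RPower package: it assumes a single normal prior on the standardized effect (mu ~ N(0, tau^2)), a unit-variance z statistic, a method-of-moments estimate of tau^2, and an unweighted average of power in place of the weighted average described above. All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def estimate_prior_variance(z_scores):
    """Moment estimate of tau^2, using z ~ N(0, 1 + tau^2) marginally.

    Assumption: z_scores come from ALL tested variants, not only
    the significant ones.
    """
    return max(mean(z * z for z in z_scores) - 1.0, 0.0)


def eb_replication_power(z1, tau2, ratio=1.0, alpha=0.05):
    """Posterior-mean power of a one-sided replication z-test.

    z1    : observed primary-study z statistic
    tau2  : prior variance of the standardized effect
    ratio : replication sample size / primary sample size
    """
    shrink = tau2 / (1.0 + tau2)               # shrinkage factor
    post_mean, post_var = shrink * z1, shrink  # mu | z1 ~ N(post_mean, post_var)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Predictive distribution of the replication statistic given z1.
    pred_mean = sqrt(ratio) * post_mean
    pred_sd = sqrt(1.0 + ratio * post_var)
    return 1 - STD_NORMAL.cdf((z_crit - pred_mean) / pred_sd)


def smallest_ratio(selected_z, tau2, target=0.8, alpha=0.05,
                   step=0.1, max_ratio=100.0):
    """Smallest sample size ratio on a grid whose average power meets target.

    Returns None if no ratio up to max_ratio reaches the target.
    """
    for k in range(1, int(max_ratio / step) + 1):
        ratio = k * step
        avg = mean(eb_replication_power(z, tau2, ratio, alpha)
                   for z in selected_z)
        if avg >= target:
            return ratio
    return None
```

For example, `smallest_ratio([5.0], tau2=1.0)` searches for the replication-to-primary sample size ratio at which a single association with primary z = 5 reaches 80% estimated power. Because the observed z is shrunk toward zero first, the estimated power at a given ratio is lower than the plug-in value, which is consistent with the larger sample sizes reported above.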

Conclusions: Our new method can objectively determine the sample size of a replication study by using information extracted from the primary study, and it alleviates the winner's curse. It is therefore a better choice than plug-in power calculations when designing replication studies of GWAS. The R package is available at: http://bioinformatics.ust.hk/RPower.html

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Arthritis, Rheumatoid / genetics
  • Bayes Theorem
  • Crohn Disease / genetics
  • Diabetes Mellitus / genetics
  • Genetic Predisposition to Disease
  • Genome, Human*
  • Genome-Wide Association Study*
  • Humans
  • Polymorphism, Single Nucleotide
  • Vascular Diseases / genetics