BMC Bioinformatics. 2018 Mar 2;19(1):78.
doi: 10.1186/s12859-018-2081-x.

Control Procedures and Estimators of the False Discovery Rate and Their Application in Low-Dimensional Settings: An Empirical Investigation

Free PMC article


Regina Brinster et al. BMC Bioinformatics. 2018.

Abstract

Background: When many (up to millions of) statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches controlling the family-wise error rate (FWER) or the false discovery rate (FDR) are required to reduce the number of false positive decisions. Some of these methods were developed specifically for high-dimensional settings and rely in part on estimating the proportion of true null hypotheses. However, these approaches are also applied in low-dimensional settings, such as replication set analyses that may be restricted to a small number of specific hypotheses. The aim of this study was to compare different approaches in low-dimensional settings using (a) real data from the CKDGen Consortium and (b) a simulation study.
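To make the FWER/FDR distinction concrete, here is a minimal Python sketch of two of the procedures compared in the study: the Bonferroni correction (FWER control) and the Benjamini-Hochberg step-up procedure (FDR control), both expressed as adjusted p-values. The function names and the ten example p-values are hypothetical and chosen only for illustration; they are not data from the study.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values, min(1, m * p); controls the FWER."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values; controls the FDR
    (under independence or positive regression dependence of the tests)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from ten tests (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
alpha = 0.05
n_bo = sum(p <= alpha for p in bonferroni(pvals))
n_bh = sum(p <= alpha for p in benjamini_hochberg(pvals))
print(f"Bonferroni: {n_bo} rejection(s), BH: {n_bh} rejection(s)")
```

For these values, Bonferroni rejects only the smallest p-value at α = 0.05, whereas BH also rejects the second one (its adjusted value is 0.008 · 10 / 2 = 0.04), a toy-scale version of the power difference between FWER and FDR methods.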

Results: In both the application and the simulation, FWER approaches were less powerful than FDR control methods, regardless of whether a larger number of hypotheses was tested. The q-value method was the most powerful. However, its specificity, i.e., its ability to maintain true null hypotheses, was especially reduced when the number of tested hypotheses was small. In this low-dimensional situation, the estimate of the proportion of true null hypotheses was biased.
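The q-value method's dependence on the proportion of true null hypotheses (π0) can be illustrated with a simplified estimator. The sketch below assumes Storey's fixed-λ estimator, pi0_hat = #{p > λ} / (m(1 - λ)) with λ = 0.5 and capped at 1, and scales Benjamini-Hochberg step-up values by pi0_hat to obtain q-values. The qvalue R package by default smooths the estimate over a grid of λ values instead, and Strimmer's method uses a different estimator, so this is not the implementation used in the study; the function names and p-values are hypothetical.

```python
def storey_pi0(pvals, lam=0.5):
    """Fixed-lambda estimate of pi0: #{p > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvals)
    n_above = sum(p > lam for p in pvals)
    return min(1.0, n_above / (m * (1.0 - lam)))


def storey_qvalues(pvals, lam=0.5):
    """Return (pi0_hat, q-values): BH-style step-up values scaled by pi0_hat."""
    m = len(pvals)
    pi0 = storey_pi0(pvals, lam)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value so q-values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvals[i] * m / rank)
        q[i] = running_min
    return pi0, q


# Hypothetical p-values from ten tests; only two exceed lambda = 0.5.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.62, 0.91]
pi0_hat, q = storey_qvalues(pvals)
print(f"pi0_hat = {pi0_hat:.2f}; q-values <= 0.05: {sum(v <= 0.05 for v in q)}")
```

Here pi0_hat = 2 / (10 · 0.5) = 0.4, so seven q-values fall at or below 0.05, whereas plain BH adjustment of the same input yields only two. With m = 10 and λ = 0.5, moving a single p-value across λ shifts the estimate by 1 / (m(1 - λ)) = 0.2, which is one way to see why the estimate is unstable when few hypotheses are tested.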

Conclusions: The results highlight the importance of a sizeable data set for reliably estimating the proportion of true null hypotheses. Consequently, methods relying on this estimate should only be applied in high-dimensional settings. Furthermore, if the focus is on testing a small number of hypotheses, as in replication settings, FWER methods should be preferred over FDR methods to maintain high specificity.
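The point that π0 estimation needs a sizeable data set can be checked with a small Monte Carlo sketch. Assuming all null hypotheses are true (π0 = 1, so p-values are uniform on [0, 1]) and using the same hypothetical fixed-λ estimator (λ = 0.5, capped at 1), it compares the average estimate in a low-dimensional setting (m = 10) with a high-dimensional one (m = 5000). The design (uniform p-values, 200 repetitions, the seed) is an assumption for illustration and is not the simulation design of the study.

```python
import random
import statistics


def storey_pi0(pvals, lam=0.5):
    """Fixed-lambda estimate of pi0, capped at 1 (hypothetical helper)."""
    return min(1.0, sum(p > lam for p in pvals) / (len(pvals) * (1.0 - lam)))


def pi0_estimate_summary(m, reps=200, seed=1):
    """Mean and SD of the pi0 estimate over `reps` data sets of m all-null p-values."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        pvals = [rng.random() for _ in range(m)]  # uniform p-values: every null is true
        estimates.append(storey_pi0(pvals))
    return statistics.mean(estimates), statistics.stdev(estimates)


for m in (10, 5000):
    mean, sd = pi0_estimate_summary(m)
    print(f"m = {m:5d}: mean estimate {mean:.3f} (SD {sd:.3f}); true pi0 = 1")
```

Because of the cap at 1, the estimate can only err downward when π0 = 1. With m = 10 it moves in steps of 0.2, and its expected value, computed from the binomial distribution of the number of p-values above λ, is 898/1024 ≈ 0.88; for m = 5000 it stays close to 1. A downward-biased π0 estimate shrinks the q-values, which is a mechanism by which specificity can drop when few hypotheses are tested.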

Keywords: False discovery rate; Low-dimensional setting; Q-value method; Simulation study.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.


Figures

Fig. 1
CKDGen data example – Number of significant p-values (regions) in the replication set. Applied procedures controlling the type I error: Bonferroni correction (BO), Hommel’s procedure (HO), Benjamini-Yekutieli’s procedure (BY), Strimmer’s LFDR method (LFDR), Benjamini-Hochberg’s procedure (BH), two-stage procedure (TSBH), Strimmer’s q-value method (qv Str), Storey’s q-value method (qv Sto). Results are ordered by the number of significant p-values, which separates the FDR methods from the FWER methods (indicated by the dashed line). Additional significant p-values gained from one approach to the next are indicated by decreasing gray shades within the bars.
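Two of the FDR procedures listed in the caption modify the Benjamini-Hochberg (BH) step-up rule in opposite directions. The minimal Python sketch below assumes that BY runs BH at level α / c(m) with c(m) = 1 + 1/2 + ... + 1/m (Benjamini-Yekutieli, valid under arbitrary dependence), and that TSBH denotes the two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli, which uses the first-stage rejections to estimate the number of true nulls. Function names and p-values are hypothetical.

```python
def bh_rejections(pvals, level):
    """Rejections of the BH linear step-up procedure at `level`:
    the largest k with p_(k) <= k * level / m (0 if none)."""
    m = len(pvals)
    k_max = 0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= k * level / m:
            k_max = k  # keep the largest qualifying rank (step-up)
    return k_max


def by_rejections(pvals, level):
    """Benjamini-Yekutieli: BH run at level / c(m), c(m) = sum_{i=1}^{m} 1/i."""
    c_m = sum(1.0 / i for i in range(1, len(pvals) + 1))
    return bh_rejections(pvals, level / c_m)


def tsbh_rejections(pvals, level):
    """Two-stage linear step-up (assumed Benjamini-Krieger-Yekutieli variant)."""
    m = len(pvals)
    q1 = level / (1.0 + level)        # stage-1 level
    r1 = bh_rejections(pvals, q1)
    if r1 == 0 or r1 == m:            # nothing or everything rejected in stage 1
        return r1
    m0_hat = m - r1                   # estimated number of true nulls
    return bh_rejections(pvals, q1 * m / m0_hat)


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
for name, fn in [("BY", by_rejections), ("BH", bh_rejections), ("TSBH", tsbh_rejections)]:
    print(name, fn(pvals, 0.05))
```

For these toy values at level 0.05, BY rejects one hypothesis and BH and TSBH two each. TSBH can overtake BH when the first stage already rejects a sizeable fraction of hypotheses, because its second-stage level q1 · m / m0_hat then exceeds α.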
Fig. 2
Simulation – Number of repetitions with at least one false positive decision and average specificity for π0 = 100% (a). Average power and specificity for β1 = 2.5 and π0 = 75% (b), 50% (c), and 25% (d). Applied procedures controlling the type I error: Bonferroni correction, Hommel’s procedure, Benjamini-Hochberg’s procedure, two-stage procedure, Benjamini-Yekutieli’s procedure, Storey’s q-value method, Strimmer’s q-value method, Strimmer’s LFDR method. Power is defined as the proportion of correctly rejected hypotheses and specificity as the proportion of correctly maintained hypotheses; both range from 0 to 1. Simulations for each scenario were repeated 100 times.
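The performance measures in the caption can be computed per simulation repetition as in the sketch below. It assumes power is the share of false null hypotheses that are rejected and specificity is the share of true null hypotheses that are maintained; it also flags whether a repetition contains at least one false positive, the quantity counted in panel (a). The function name and example vectors are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def evaluate(
    rejected: Sequence[bool], is_null: Sequence[bool]
) -> Tuple[Optional[float], Optional[float], bool]:
    """Return (power, specificity, any_false_positive) for one repetition.

    power       = rejected false nulls / all false nulls (None if there are none)
    specificity = maintained true nulls / all true nulls (None if there are none)
    """
    if len(rejected) != len(is_null):
        raise ValueError("rejected and is_null must have the same length")
    n_alt = sum(not h0 for h0 in is_null)
    n_null = len(is_null) - n_alt
    true_pos = sum(r and not h0 for r, h0 in zip(rejected, is_null))
    true_neg = sum((not r) and h0 for r, h0 in zip(rejected, is_null))
    power = true_pos / n_alt if n_alt else None
    specificity = true_neg / n_null if n_null else None
    any_false_positive = any(r and h0 for r, h0 in zip(rejected, is_null))
    return power, specificity, any_false_positive


# Hypothetical repetition: five hypotheses, the last two are true nulls.
rejected = [True, True, False, False, True]
is_null = [False, False, False, True, True]
print(evaluate(rejected, is_null))
```

In this example power is 2/3 and specificity 1/2, and the repetition counts as having a false positive. Averaging power and specificity over the repetitions, and counting repetitions where the flag is set, gives summaries of the kind plotted in the figure; when π0 = 100% there are no false nulls, so power is undefined and returned as None.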
Fig. 3
Simulation – Observed estimates of π0 for Storey’s (qv) and Strimmer’s (fdr) q-value methods for π0 = 100% (a) and for β1 = 2.5 and π0 = 75% (b), 50% (c), and 25% (d).


