Estimating the Proportion of True Null Hypotheses for Multiple Comparisons

Hongmei Jiang et al. Cancer Inform. 6:25–32.

Abstract

Whole-genome microarray investigations (e.g. differential expression, differential methylation, ChIP-chip) make it possible to test millions of features in a genome. In this setting, traditional multiple comparison procedures, such as those controlling the familywise error rate (FWER), are too conservative. False discovery rate (FDR) procedures have been suggested as having greater power, but their control is not exact: it depends on the proportion of true null hypotheses, π0. Because this proportion is unknown, it must be estimated accurately (with small bias and small variance), preferably by a simple calculation that is accessible to the general scientific community. We propose an easy-to-implement method for estimating the proportion of true null hypotheses and make the R code available. Comparisons with four existing procedures on simulated and real data show that this estimate has relatively small bias and small variance. Although presented here in the context of microarrays, the estimate is applicable to many multiple comparison situations.

Keywords: epigenomics; false discovery rate; microarray; multiple comparisons; type I error rate.
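Why the FDR control depends on π0 can be made concrete. Under independence, the Benjamini-Hochberg (BH) procedure run at level α controls the FDR at π0·α rather than α, so plugging in an estimate π̂0 and running BH at α/π̂0 recovers some of the lost power. The Python snippet below is a minimal sketch of that adaptive idea. It does not implement the authors' proposed averaging estimator (their R code is not reproduced here); instead it swaps in Storey's well-known fixed-λ estimator, π̂0(λ) = #{p > λ} / ((1 − λ)m), with λ = 0.5 as an assumed default. The function names `bh_reject` and `storey_pi0` and the example p-values are hypothetical.

```python
def bh_reject(pvalues, alpha=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up procedure, optionally adaptive.

    With pi0 = 1.0 this is plain BH; passing an estimate (or the true
    value) of pi0 < 1 runs BH at level alpha / pi0.
    Returns a list of booleans aligned with pvalues (True = rejected).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up: find the largest rank k with p_(k) <= k * alpha / (m * pi0).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * pi0):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def storey_pi0(pvalues, lam=0.5):
    """Storey's fixed-lambda estimate of pi0 (a swapped-in estimator), capped at 1.

    p-values above lam are assumed to come mostly from true nulls, which
    are Uniform(0, 1), so their count is scaled up by 1 / (1 - lam).
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / ((1.0 - lam) * m))


# Hypothetical p-values from m = 10 tests.
p = [0.0001, 0.0004, 0.002, 0.01, 0.02, 0.028, 0.05, 0.55, 0.75, 0.95]
pi0_hat = storey_pi0(p)                           # 3 / (0.5 * 10) = 0.6
print(sum(bh_reject(p, alpha=0.05)))              # plain BH: 6 rejections
print(sum(bh_reject(p, alpha=0.05, pi0=pi0_hat))) # adaptive BH: 7 rejections
```

Here π̂0 = 0.6 raises every BH threshold by a factor of 1/0.6, so the hypothesis with p = 0.05 is also rejected. Capping the estimate at 1 keeps the adaptive procedure from ever being stricter than plain BH.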

Figures

Figure 1
Simulation results for the false discovery rate (FDR) at significance level α = 0.05 for seven procedures: Benjamini and Hochberg’s FDR controlling procedure incorporating the true π0 (BHπ0), Benjamini and Hochberg’s FDR controlling procedure (BH), Benjamini and Hochberg’s adaptive approach using the estimate of π0 from the proposed average estimate procedure, with B chosen by bootstrapping (Bboot), Benjamini and Hochberg’s lowest slope approach (LSL), Storey’s bootstrapping approach (Storeyboot), Storey and Tibshirani’s smoother method (STsmoother), and Langaas et al.’s nonparametric maximum likelihood estimate (convest). The black line marks FDR = 0.05. The total number of hypothesis tests is m = 1,000, and the simulation study uses 1,000 replications for each value of π0.
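One of the compared estimators, the lowest slope approach (LSL), is simple enough to sketch. The function below follows our reading of the description in Benjamini and Hochberg (2000): sort the p-values, compute the slopes S_i = (1 − p_(i))/(m + 1 − i), stop at the first i where the slope decreases, and set m̂0 = min(⌊1/S_i⌋ + 1, m), giving π̂0 = m̂0/m. The rounding convention, the fallback when the slopes never decrease, and the name `lowest_slope_pi0` are assumptions; check the original paper before relying on the details.

```python
import math


def lowest_slope_pi0(pvalues):
    """Sketch of the lowest-slope (LSL) estimate of pi0.

    Assumed rules (our reading, not a verified reference implementation):
    stop at the first slope decrease, use floor(1 / S) + 1 as m0_hat,
    and fall back to the last slope if the slopes never decrease.
    """
    m = len(pvalues)
    p_sorted = sorted(pvalues)
    # S_i = (1 - p_(i)) / (m + 1 - i), with 1-based rank i.
    slopes = [(1.0 - p) / (m + 1 - i) for i, p in enumerate(p_sorted, start=1)]
    s_stop = slopes[-1]
    for i in range(1, m):
        if slopes[i] < slopes[i - 1]:
            s_stop = slopes[i]
            break
    if s_stop <= 0.0:
        # A p-value equal to 1 gives a zero slope; 1/S is unbounded, so m0_hat = m.
        return 1.0
    m0_hat = min(math.floor(1.0 / s_stop) + 1, m)
    return m0_hat / m


# Toy example: 50 p-values near zero plus 50 spread evenly over (0, 1).
mixed = [1e-6] * 50 + [k / 51 for k in range(1, 51)]
print(lowest_slope_pi0(mixed))  # slightly above the true value of 0.5
```

On evenly spaced p-values, which mimic the all-null case, the slopes are essentially constant and the estimate is 1.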
Figure 2
Simulation results for statistical power at significance level α = 0.05 for seven procedures: Benjamini and Hochberg’s FDR controlling procedure incorporating the true π0 (BHπ0), Benjamini and Hochberg’s FDR controlling procedure (BH), Benjamini and Hochberg’s adaptive approach using the estimate of π0 from the proposed average estimate procedure, with B chosen by bootstrapping (Bboot), Benjamini and Hochberg’s lowest slope approach (LSL), Storey’s bootstrapping approach (Storeyboot), Storey and Tibshirani’s smoother method (STsmoother), and Langaas et al.’s nonparametric maximum likelihood estimate (convest). The total number of hypothesis tests is m = 1,000, and the simulation study uses 1,000 replications for each value of π0.
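Both figures report Monte Carlo averages: the FDR is the mean of V/R (false rejections over all rejections) and power is the mean fraction of false nulls rejected. The sketch below shows how such estimates can be computed for two of the seven procedures, BH and BHπ0, with m = 1,000. The data-generating model is an assumption made for illustration, not the authors' design: one-sided z-tests, true nulls drawn from N(0, 1), false nulls from N(effect, 1) with a hypothetical effect = 3, and 200 replications rather than 1,000 to keep it fast. V/R is taken as 0 when nothing is rejected, and `simulate` and `bh_reject` are hypothetical names.

```python
import random
from statistics import NormalDist


def bh_reject(pvalues, alpha=0.05, pi0=1.0):
    # Benjamini-Hochberg step-up at level alpha / pi0 (pi0 = 1 gives plain BH).
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * pi0):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def simulate(pi0, m=1000, effect=3.0, alpha=0.05, reps=200,
             use_true_pi0=False, seed=1):
    """Return Monte Carlo estimates (FDR, power) under an assumed z-test model."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    m0 = round(pi0 * m)  # number of true nulls, listed first
    m1 = m - m0          # number of false nulls
    fdp_sum = power_sum = 0.0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(m0)]
        z += [rng.gauss(effect, 1.0) for _ in range(m1)]
        p = [1.0 - std_normal.cdf(x) for x in z]  # one-sided p-values
        rej = bh_reject(p, alpha, pi0 if use_true_pi0 else 1.0)
        v = sum(rej[:m0])  # false discoveries
        s = sum(rej[m0:])  # true discoveries
        r = v + s
        fdp_sum += v / r if r else 0.0
        power_sum += s / m1 if m1 else 0.0
    return fdp_sum / reps, power_sum / reps


for pi0 in (0.5, 0.9):
    fdr_bh, pow_bh = simulate(pi0)
    fdr_or, pow_or = simulate(pi0, use_true_pi0=True)
    print(pi0, round(fdr_bh, 3), round(fdr_or, 3), round(pow_bh, 3), round(pow_or, 3))
```

Because both calls share a seed, BHπ0 sees the same data as BH and rejects a superset of its hypotheses, so its power is never lower. Under independence, BHπ0's FDR should sit near α = 0.05, while plain BH's should sit near π0·α, which is the gap an accurate estimate of π0 is meant to close.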


References

    1. Benjamini Y, Hochberg Y. ‘Controlling the false discovery rate: a practical and powerful approach to multiple testing’. Journal of the Royal Statistical Society, Series B. 1995;57:289–300.
    2. Benjamini Y, Hochberg Y. ‘On the adaptive control of the false discovery rate in multiple testing with independent statistics’. Journal of Educational and Behavioral Statistics. 2000;25(1):60–83.
    3. Benjamini Y, Yekutieli D. ‘The control of the false discovery rate in multiple testing under dependency’. The Annals of Statistics. 2001;29:1165–88.
    4. Black MA. ‘A note on the adaptive control of false discovery rates’. Journal of the Royal Statistical Society, Series B. 2004;66(2):297–304.
    5. Golub T, et al. ‘Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring’. Science. 1999;286:531–7.
