J Appl Stat. 2013 Jan 1;40(9):1949-1964.
doi: 10.1080/02664763.2013.800035.

Estimating the Proportion of True Null Hypotheses Using the Pattern of Observed p-values

Free PMC article

Tiejun Tong et al. J Appl Stat.

Abstract

Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 are motivated by the assumption that the test statistics are independent, which often does not hold in practice. Simulations indicate that most existing estimators can perform poorly when the test statistics are dependent, mainly because the variance of these estimators increases. In this paper, we propose several data-driven methods for estimating π0 that incorporate the distribution pattern of the observed p-values as a practical approach to addressing potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate of the proportion of true-null p-values that fall in (λ, 1], relative to the whole range [0, 1], instead of using the expected proportion 1 - λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve overall performance.

Keywords: gene expression data; multiple testing; p-value; proportion of true null hypotheses.
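
For orientation, the baseline that the abstract contrasts against can be sketched in a few lines. The code below is not the linear-fit estimator proposed in the paper, whose details are not given in this abstract. It is the familiar fixed-λ estimator, which assumes that true-null p-values are uniform on [0, 1], so that a fraction 1 - λ of them is expected to fall in (λ, 1]. The function names `baseline_pi0` and `simulate_pvalues`, and the choice of alternative distribution in the simulation, are illustrative assumptions.

```python
import random


def baseline_pi0(pvalues, lam=0.5):
    """Fixed-lambda estimate of pi0 (illustrative baseline, not the paper's method).

    Counts the p-values above lam and divides by m * (1 - lam), the number
    expected above lam if all m p-values were uniform true nulls.
    The result is capped at 1 because pi0 is a proportion.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def simulate_pvalues(m=10000, pi0=0.8, seed=0):
    """Simulate independent p-values (hypothetical setup for illustration).

    True nulls are Uniform(0, 1). Alternatives are drawn as u ** 5, an
    arbitrary choice that concentrates them near zero.
    """
    rng = random.Random(seed)
    m0 = int(round(m * pi0))
    nulls = [rng.random() for _ in range(m0)]
    alternatives = [rng.random() ** 5 for _ in range(m - m0)]
    return nulls + alternatives


if __name__ == "__main__":
    pvals = simulate_pvalues()
    for lam in (0.25, 0.5, 0.75):
        print(f"lambda = {lam:.2f}: estimated pi0 = {baseline_pi0(pvals, lam):.3f}")
```

Because some alternative p-values also land above λ, this baseline tends to overestimate π0 slightly even under independence. The simulation here draws independent p-values only, so it does not reproduce the dependent setting that motivates the paper.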

Figures

Figure 1. Histograms for 9 simulated p-value sets when ρ = 0.5.

Figure 2. Conditional densities of p2 given the value of p1 at 0.2 (left panel) or 0.5 (right panel). In both panels, the three curves (solid, dashed and dotted) correspond to ρ = 0.8, 0.5 and 0.2, respectively.

Figure 3. Sketch plot for estimating f1(λ), f2(λ) and f3(λ).

Figure 4. Histogram of the standard deviations for all the genes.

Figure 5. Histograms of p-values for the three data sets. Panels A, B, C and D correspond to the p-values from the data set of Kuo et al. (2003), the data set of Scholtens et al., the data set of Cui et al. with the CHQBC method, and the data set of Cui et al. with the TW method, respectively.