Bioinformatics. 2011 Dec 1;27(23):3235-41.
doi: 10.1093/bioinformatics/btr568. Epub 2011 Oct 13.

Sequential Monte Carlo Multiple Testing

Free PMC article

Geir Kjetil Sandve et al. Bioinformatics.

Abstract

Motivation: In molecular biology, as in many other scientific fields, the scale of analyses is ever increasing. Often, complex Monte Carlo simulation is required, sometimes within a large-scale multiple testing setting. The resulting computational costs may be prohibitively high.

Results: We present MCFDR, a simple, novel algorithm for false discovery rate (FDR)-modulated sequential Monte Carlo (MC) multiple hypothesis testing. The algorithm alternates between adding MC samples across tests and calculating intermediate FDR values for the collection of tests. MC sampling is stopped either by the sequential MC criterion or by a threshold on the FDR. An essential property of the algorithm is that it limits the total number of MC samples regardless of the number of true null hypotheses. We show on both real and simulated data that the proposed algorithm provides large gains in computational efficiency.
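The loop described above can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation. The function names, the parameters (`h`, `batch`, `max_samples`, `fdr_threshold`), the use of Benjamini-Hochberg adjustment for the intermediate FDR values, and the simulation of each test by a fixed probability of drawing an extreme sample are all assumptions made for illustration.

```python
import random


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def mcfdr_sketch(extreme_probs, h=20, fdr_threshold=0.1,
                 batch=100, max_samples=50000, seed=0):
    """Hypothetical FDR-modulated sequential MC loop.

    extreme_probs[i] is the assumed probability that one MC sample for
    test i is at least as extreme as the observation (a stand-in for a
    real, expensive simulation).
    """
    rng = random.Random(seed)
    m = len(extreme_probs)
    extremes = [0] * m  # extreme samples seen per test
    samples = [0] * m   # total samples drawn per test

    def is_open(i):
        # A test stays open until the sequential MC criterion (h extreme
        # samples) or the per-test sample cap is reached.
        return extremes[i] < h and samples[i] < max_samples

    while True:
        # Step 1: add a batch of MC samples to every open test.
        for i in range(m):
            if not is_open(i):
                continue
            for _ in range(batch):
                samples[i] += 1
                if rng.random() < extreme_probs[i]:
                    extremes[i] += 1
                    if extremes[i] >= h:
                        break
        # Step 2: intermediate p-values (sequential MC style estimate).
        pvals = [
            extremes[i] / samples[i] if extremes[i] >= h
            else (extremes[i] + 1) / (samples[i] + 1)
            for i in range(m)
        ]
        # Step 3: intermediate FDR values for the whole collection.
        qvals = bh_adjust(pvals)
        # Step 4: stop once every still-open test is below the threshold.
        if all(qvals[i] < fdr_threshold for i in range(m) if is_open(i)):
            return pvals, qvals, samples


if __name__ == "__main__":
    # Five tests with very small underlying p-values, five null-like tests.
    p, q, n = mcfdr_sketch([0.0001] * 5 + [0.5] * 5)
    print("samples per test:", n)
```

In this sketch, tests with large underlying P-values close quickly through the sequential criterion, while tests with small P-values stop as soon as their adjusted values fall below the threshold, so neither group draws the full `max_samples`.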

Availability: MCFDR is implemented in the Genomic HyperBrowser (http://hyperbrowser.uio.no/mcfdr), a web-based system for genome analysis. All input data and results are available and can be reproduced through a Galaxy Pages document at: http://hyperbrowser.uio.no/mcfdr/u/sandve/p/mcfdr.

Contact: geirksa@ifi.uio.no.

Figures

Fig. 1.
Total number of samples for sequential MC and MCFDR, respectively, as a function of the number of true H1. When the proportion of true H1 is low, most tests are stopped by the sequential MC criterion, resulting in a similar number of samples for both schemes. At larger proportions of true H1, the multiple testing correction becomes milder, and thus fewer samples are needed to reach the FDR threshold. Thus, for the MCFDR scheme the total number of needed samples decreases with higher true H1 proportions. In contrast, for the sequential MC scheme, the number of needed samples increases linearly with increasing proportion of true H1. For standard MC, a large, constant number of samples is needed.
Fig. 2.
Behavior of the sequential MC and MCFDR schemes as a function of the number of true H1. (a) Number of rejected tests as a function of the number of true H1. (b) Empirical FDR on test collections as a function of the number of true H1.
Fig. 3.
Test collection at π0=0.9. (a) Underlying and estimated P-values (sorted by underlying P-value). The small P-values, mostly from H1, are accurately estimated. Larger P-values, mainly from H0, are less accurately estimated, as for sequential MC. (b) Number of samples drawn per test, as well as the number of extreme samples among these, with tests sorted in the same order as in panel (a).
Fig. 4.
Test collection at π0=0.2. (a) Underlying and estimated P-values (sorted by underlying P-value). (b) Number of samples drawn per test, as well as the number of extreme samples among these, with tests sorted in the same order as in panel (a).
Fig. 5.
H3K4me2 modifications and Ensembl genes occurring in the gene region corresponding to Ensembl gene ID ENSG00000112038 (chr6:154,402,136-154,609,693), visualized by the UCSC Genome Browser. With reference to this gene (corresponding to Ensembl transcript ID ENST00000337049 in the figure), H3K4me2 modifications occur significantly more often downstream in the gene. However, with reference to the gene corresponding to ID ENST00000367220, which is shorter and on the opposite strand, the H3K4me2 modifications are preferentially located close to the TSS, occur gradually less frequently throughout the gene body and stop appearing after the gene body.


