Combining p-values in large-scale genomics experiments

Dmitri V Zaykin; Lev A Zhivotovsky; Wendy Czika; Susan Shao; Russell D Wolfinger

doi:10.1002/pst.304

Pharm Stat. 2007 Jul-Sep;6(3):217-26. doi: 10.1002/pst.304.

Authors

Dmitri V Zaykin¹, Lev A Zhivotovsky, Wendy Czika, Susan Shao, Russell D Wolfinger

Affiliation

¹ National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA. zaykind@niehs.nih.gov

PMID: 17879330
PMCID: PMC2569904
DOI: 10.1002/pst.304

Abstract

In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: Which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher's and Lancaster's combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of that distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We explore this feature to improve power over Fisher's method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods is at the expense of a weaker claim made upon rejection of the null hypothesis: that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP), which combines the K smallest p-values, and the truncated product method (TPM), which combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like GM, more appropriately about all L tests. GM gives somewhat higher power than TPM, RTP, Fisher, and Simes methods across a range of simulations.
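As an illustration of three of the combination statistics named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes independent p-values that are uniform under the global null. The function names (`fisher_combined`, `tpm_stat`, `rtp_stat`, `monte_carlo_pvalue`) are hypothetical. The Monte Carlo null distribution is a simple stand-in for whatever reference distribution the paper uses for TPM and RTP. The inverse gamma method (GM) is omitted because it needs a gamma quantile function that the Python standard library does not provide.

```python
import math
import random


def fisher_combined(pvals):
    """Fisher's method: T = -2 * sum(ln p) is chi-square with 2L df under the null."""
    half_stat = -sum(math.log(p) for p in pvals)  # this is T / 2
    if half_stat == 0.0:  # every p-value equals 1
        return 1.0
    # For even df 2L the chi-square survival function equals a Poisson CDF:
    # P(T > t) = sum_{k=0}^{L-1} exp(-t/2) * (t/2)^k / k!
    # Each term is evaluated on the log scale so that large L does not overflow.
    log_h = math.log(half_stat)
    total = sum(
        math.exp(k * log_h - math.lgamma(k + 1) - half_stat)
        for k in range(len(pvals))
    )
    return min(1.0, total)


def tpm_stat(pvals, tau):
    """Log of the product of all p-values <= tau (0.0 when none qualify)."""
    return sum(math.log(p) for p in pvals if p <= tau)


def rtp_stat(pvals, k):
    """Log of the product of the k smallest p-values."""
    return sum(math.log(p) for p in sorted(pvals)[:k])


def monte_carlo_pvalue(stat_fn, pvals, n_sim=2000, seed=0):
    """Estimate P(null statistic <= observed) by simulating uniform p-values.

    Assumption: tests are independent, so null p-values are i.i.d. Uniform(0, 1).
    """
    rng = random.Random(seed)
    observed = stat_fn(pvals)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which keeps log() finite.
        null = [1.0 - rng.random() for _ in range(len(pvals))]
        if stat_fn(null) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Made-up example: 5 small p-values among 20 tests.
    example = [0.001] * 5 + [0.5] * 15
    print("Fisher:", fisher_combined(example))
    print("TPM (tau=0.05):", monte_carlo_pvalue(lambda p: tpm_stat(p, 0.05), example))
    print("RTP (K=5):", monte_carlo_pvalue(lambda p: rtp_stat(p, 5), example))
```

In this sketch, smaller (more negative) log-products indicate stronger evidence against the global null, so the Monte Carlo p-value counts simulated statistics at or below the observed one. The threshold `tau` and the rank cutoff `k` are analyst-chosen inputs, and the example data are invented for demonstration only.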

Publication types

Research Support, N.I.H., Intramural

MeSH terms

Computer Simulation
Data Interpretation, Statistical*
Genomics / statistics & numerical data*
Humans
Models, Genetic*
Models, Statistical*
Oligonucleotide Array Sequence Analysis / methods
Probability

Grants and funding

Z01 ES101866-03/Intramural NIH HHS/United States