Resampling-based expression pathway analysis techniques have been shown to preserve type I error rates, in contrast to simple gene-list approaches that implicitly assume the independence of genes in ranked lists. However, resampling is intensive in computation time and memory requirements. We describe accurate analytic approximations to permutations of score statistics, including novel approaches for Pearson's correlation, and summed score statistics, that have good performance for even relatively small sample sizes. Our approach preserves the essence of permutation pathway analysis, but with greatly reduced computation. Extensions for inclusion of covariates and censored data are described, and we test the performance of our procedures using simulations based on real datasets. These approaches have been implemented in the new R package safeExpress.
Keywords: Gene sets; Multiple hypothesis testing; Permutation approximation.