Empirical pathway analysis, without permutation

Biostatistics. 2013 Jul;14(3):573-85. doi: 10.1093/biostatistics/kxt004. Epub 2013 Feb 20.


Resampling-based pathway analysis techniques for expression data have been shown to preserve type I error rates, in contrast to simple gene-list approaches that implicitly assume that the genes in ranked lists are independent. However, resampling is costly in both computation time and memory. We describe accurate analytic approximations to permutations of score statistics, including novel approaches for Pearson's correlation and for summed score statistics, that perform well even for relatively small sample sizes. Our approach preserves the essence of permutation pathway analysis, but with greatly reduced computation. Extensions for inclusion of covariates and censored data are described, and we test the performance of our procedures using simulations based on real datasets. These approaches have been implemented in the new R package safeExpress.
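
To make the idea of replacing permutation with an analytic approximation concrete, here is a minimal Python sketch for Pearson's correlation. It is not the approximation developed in the paper and not the safeExpress interface; all function names are hypothetical. It relies only on the standard result that, over all permutations of one variable, Pearson's r has mean 0 and variance 1/(n-1), and it assumes a plain normal reference distribution matched to those two moments, which is cruder than the approximations the abstract refers to.

```python
import itertools
import math

import numpy as np


def pearson_r(x, y):
    """Pearson correlation of two 1-D arrays."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / math.sqrt((xc @ xc) * (yc @ yc)))


def exact_permutation_moments(x, y):
    """Mean and variance of r over ALL n! permutations of y (tiny n only)."""
    n = len(y)
    rs = [pearson_r(x, y[list(p)]) for p in itertools.permutations(range(n))]
    return float(np.mean(rs)), float(np.var(rs))


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for r."""
    rng = np.random.default_rng(seed)
    r_obs = abs(pearson_r(x, y))
    hits = sum(
        abs(pearson_r(x, rng.permutation(y))) >= r_obs for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def moment_matched_pvalue(x, y):
    """Two-sided p-value from a normal reference distribution.

    Assumption of this sketch: the permutation distribution of r is
    approximated by a normal with mean 0 and variance 1/(n-1).
    """
    z = pearson_r(x, y) * math.sqrt(len(x) - 1)
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=40)
    y = 0.3 * x + rng.normal(size=40)  # simulated data, for illustration only
    print("permutation p-value:   ", permutation_pvalue(x, y))
    print("moment-matched p-value:", moment_matched_pvalue(x, y))
```

The analytic version needs a single pass over the data, whereas the Monte Carlo version recomputes the statistic once per permutation; that difference is the computational saving at stake, though a two-moment normal fit can be expected to lose accuracy in the far tails and at very small sample sizes.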

Keywords: Gene sets; Multiple hypothesis testing; Permutation approximation.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biostatistics
  • Breast Neoplasms / genetics
  • Computer Simulation
  • Databases, Genetic / statistics & numerical data
  • Disease-Free Survival
  • Female
  • Gene Expression Profiling / statistics & numerical data*
  • Gene Regulatory Networks
  • Genes, p53
  • Humans
  • Models, Genetic
  • Models, Statistical*
  • Salivary Glands / metabolism
  • Software
  • Stochastic Processes