A multivariate extension of the gene set enrichment analysis

J Bioinform Comput Biol. 2007 Oct;5(5):1139-53. doi: 10.1142/s0219720007003041.


The test statistic typically employed in gene set enrichment analysis (GSEA) prevents the method from being genuinely multivariate. In particular, this statistic is insensitive to changes in the correlation structure of the gene sets of interest. The present paper considers the utility of an alternative test statistic in designing the confirmatory component of GSEA. This statistic, known as the multivariate N-statistic, is based on a distance between the joint distributions of expression levels of the genes in the set of interest. Its null distribution is obtained by permuting group labels. Our simulation studies and analysis of biological data confirm the conjecture that the N-statistic is a much better choice for multivariate significance testing within the GSEA framework. We also discuss some other aspects of the GSEA paradigm and suggest new avenues for future research.
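
The permutation scheme described above can be illustrated with a minimal sketch. The abstract does not give the formula for the N-statistic, so the code assumes an energy-distance-type form with a Euclidean kernel, sqrt(2 E|X - Y| - E|X - X'| - E|Y - Y'|), computed on samples-by-genes matrices. That form, the function names `n_statistic` and `permutation_pvalue`, and the parameters `n_perm` and `seed` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def n_statistic(x, y):
    """Distance between two multivariate samples (rows = samples, columns = genes).

    Assumed form: sqrt(2*E|X-Y| - E|X-X'| - E|Y-Y'|) with Euclidean distance.
    """
    def mean_dist(a, b):
        # Mean Euclidean distance over all pairs (one row from a, one from b).
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=2)).mean()

    val = 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(val, 0.0)))


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Permutation p-value obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n_x = x.shape[0]
    observed = n_statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Reassign samples to the two groups at random, keeping group sizes.
        idx = rng.permutation(pooled.shape[0])
        permuted = n_statistic(pooled[idx[:n_x]], pooled[idx[n_x:]])
        if permuted >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the statistic compares whole rows rather than genes one at a time, it can in principle respond to a change in the correlation between genes even when each gene's marginal distribution is unchanged, which is the property the abstract emphasizes.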

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Gene Expression Profiling / statistics & numerical data
  • Models, Genetic
  • Multivariate Analysis
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Phenotype