Gene set enrichment analysis made simple

Stat Methods Med Res. 2009 Dec;18(6):565-75. doi: 10.1177/0962280209351908.


Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed in two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticised for ignoring biological knowledge regarding how genes work together. Recently a series of methods, that do incorporate biological knowledge, have been proposed. However, the most popular method, gene set enrichment analysis (GSEA), seems overly complicated. Furthermore, GSEA is based on a statistical test known for its lack of sensitivity. In this article we compare the performance of a simple alternative to GSEA. We find that this simple solution clearly outperforms GSEA. We demonstrate this with eight different microarray datasets.

MeSH terms

  • Algorithms
  • Computational Biology
  • Databases, Genetic
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / statistics & numerical data
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data
  • Phenotype