Gene set bagging for estimating the probability a statistically significant result will replicate

BMC Bioinformatics. 2013 Dec 12;14:360. doi: 10.1186/1471-2105-14-360.

Abstract

Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features associated with illness. We propose a new approach, called gene set bagging, for measuring the probability that a gene set replicates in future studies. Gene set bagging involves resampling the original high-throughput data, performing gene-set analysis on the resampled data, and confirming that biological categories replicate in the bagged samples.
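The three steps named above (resample, re-run the gene-set analysis, count how often each category is significant) can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several things in it are assumptions made for the example: the function names, resampling with replacement within each group, ranking genes by absolute Welch t-statistic, taking the top_k genes as the significant list, and a hypergeometric enrichment test with threshold alpha. The replication estimate for each set is taken here as the fraction of bootstrap iterations in which the set is called significant.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic comparing two lists of values (0.0 if no spread)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0 else (mean(a) - mean(b)) / se


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the set."""
    total = math.comb(N, n)
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def enriched_sets(expr, labels, gene_sets, top_k, alpha):
    """Names of gene sets enriched among the top_k genes ranked by |t|.

    expr maps gene -> list of values (one per sample); labels holds 0/1
    group membership per sample. Both layouts are assumptions of this sketch.
    """
    g0 = [j for j, g in enumerate(labels) if g == 0]
    g1 = [j for j, g in enumerate(labels) if g == 1]
    scores = {gene: abs(welch_t([v[j] for j in g1], [v[j] for j in g0]))
              for gene, v in expr.items()}
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    hits = set()
    for name, members in gene_sets.items():
        members = set(members) & set(expr)
        overlap = len(top & members)
        if hypergeom_sf(overlap, len(expr), len(members), len(top)) < alpha:
            hits.add(name)
    return hits


def replication_probability(expr, labels, gene_sets,
                            n_boot=100, top_k=20, alpha=0.05, seed=0):
    """Fraction of bootstrap resamples in which each gene set is significant."""
    rng = random.Random(seed)
    g0 = [j for j, g in enumerate(labels) if g == 0]
    g1 = [j for j, g in enumerate(labels) if g == 1]
    counts = {name: 0 for name in gene_sets}
    for _ in range(n_boot):
        # Resample sample indices with replacement, separately per group.
        idx = rng.choices(g0, k=len(g0)) + rng.choices(g1, k=len(g1))
        boot_expr = {gene: [v[j] for j in idx] for gene, v in expr.items()}
        boot_labels = [labels[j] for j in idx]
        for name in enriched_sets(boot_expr, boot_labels, gene_sets, top_k, alpha):
            counts[name] += 1
    return {name: c / n_boot for name, c in counts.items()}
```

Calling replication_probability(expr, labels, gene_sets) returns one value between 0 and 1 per gene set. Under the assumptions of this sketch, a set that is significant in the original data but significant in only a small share of resamples would be flagged as unstable.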

Results: Using both simulated and publicly available genomics data, we demonstrate that significant categories in a gene set enrichment analysis may be unstable when subjected to resampling. We show that our method estimates the replication probability (R), the probability that a gene set will replicate as a significant result in future studies, and we show in simulations that this estimate reflects replication better than each set's p-value.

Conclusions: Our results suggest that gene lists based on p-values are not necessarily stable, and therefore additional steps like gene set bagging may improve biological inference on gene sets.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites / genetics
  • Brain Chemistry / genetics
  • Computer Simulation
  • DNA Methylation / genetics*
  • DNA Replication / genetics*
  • Databases, Factual
  • Gene Expression Profiling / methods
  • Genome, Human
  • Genomics / methods*
  • Genomics / trends
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods
  • Predictive Value of Tests
  • Probability
  • Sample Size
  • Smoking / genetics
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • Transcription Factors

Associated data

  • GEO/GSE15745