Adding confidence to gene expression clustering

Genetics. 2005 Aug;170(4):2003-11. doi: 10.1534/genetics.104.031500. Epub 2005 Jun 8.


It has been well established that gene expression data contain large amounts of random variation that affects both the analysis and the results of microarray experiments. Typically, microarray data are either tested for differential expression between conditions or grouped on the basis of profiles that are assessed temporally or across genetic or environmental conditions. While testing differential expression relies on levels of certainty to evaluate the relative worth of various analyses, cluster analysis is exploratory in nature and has not had the benefit of any judgment of statistical inference. By using a novel dissimilarity function to ascertain gene expression clusters and conditional randomization of the data space to illuminate distinctions between statistically significant clusters of gene expression patterns, we aim to provide a level of confidence to inferred clusters of gene expression data. We apply both permutation and convex hull approaches for randomization of the data space and show that both methods can provide an effective assessment of gene expression profiles whose coregulation is statistically different from that expected by random chance alone.
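As a rough illustration of the permutation idea described above, the following Python sketch scores one candidate cluster against a null distribution built by shuffling the data. It is not the authors' method. The abstract does not specify their novel dissimilarity function, so the sketch assumes a stand-in (one minus the Pearson correlation). It also assumes that randomization means shuffling each gene's values across conditions independently. The function names `mean_pairwise_dissimilarity` and `permutation_p_value` are hypothetical, and the convex hull approach is not shown.

```python
import numpy as np


def mean_pairwise_dissimilarity(profiles):
    """Mean of (1 - Pearson r) over all gene pairs in a cluster.

    ASSUMPTION: 1 - Pearson correlation is a stand-in; the paper's own
    dissimilarity function is not given in the abstract.
    `profiles` is a 2-D array of shape (genes, conditions).
    """
    corr = np.corrcoef(profiles)
    upper = np.triu_indices(profiles.shape[0], k=1)
    return float(np.mean(1.0 - corr[upper]))


def permutation_p_value(profiles, n_perm=1000, seed=0):
    """Estimate how often shuffled data look at least as tight as the cluster.

    ASSUMPTION: the null is built by permuting each gene's expression
    values across conditions independently of the other genes.
    """
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_dissimilarity(profiles)
    hits = 0
    for _ in range(n_perm):
        # Shuffle every row (gene) independently along the condition axis.
        shuffled = rng.permuted(profiles, axis=1)
        if mean_pairwise_dissimilarity(shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic example: five genes following one shared profile plus noise.
    rng = np.random.default_rng(1)
    base = np.array([0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0])
    cluster = base + rng.normal(scale=0.05, size=(5, base.size))
    print(permutation_p_value(cluster))
```

A small p-value here suggests the cluster's profiles are more similar to one another than shuffled profiles tend to be, which is the kind of confidence statement the abstract aims to attach to inferred clusters.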

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cluster Analysis*
  • Computer Simulation
  • Gene Expression Profiling
  • Gene Expression*
  • Genetic Linkage
  • Genetic Variation
  • Oligonucleotide Array Sequence Analysis