A latent variable approach for meta-analysis of gene expression data from multiple microarray experiments

BMC Bioinformatics. 2007 Sep 27;8:364. doi: 10.1186/1471-2105-8-364.


Background: With the rapid growth of microarray data generated by different investigators studying similar biological questions, there is considerable interest in combining results across multiple studies.

Results: In this article, we describe a general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models. A key feature of the model is the use of latent variables that represent quantities that can be combined across diverse platforms. We consider two methods for estimating an index termed the probability of expression (POE). The first, reported in previous work by the authors, uses Markov chain Monte Carlo (MCMC) techniques. The second is a faster method based on the expectation-maximization (EM) algorithm. The methods are illustrated with an application to a meta-analysis of datasets for metastatic cancer.
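To illustrate the general EM idea, the Python sketch below fits a two-component mixture to the expression values of a single gene. One component is a normal "baseline" distribution. The other is a uniform component spanning the observed range, which stands in for samples with altered expression. The E-step's posterior probability of the uniform component is then signed by the direction relative to the baseline mean, giving a value in [-1, 1]. This is an assumed, simplified model in the spirit of POE. It is not the authors' exact formulation, it is not the metaArray implementation, and it does not show the cross-study combination step. The function `fit_poe_mixture_em` and its parameters are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_poe_mixture_em(x, pi=0.1, n_iter=500, tol=1e-10, min_sigma=1e-6):
    """Hypothetical helper: EM for a normal + uniform mixture on one gene.

    Assumed model: x_i ~ pi * Uniform(min(x), max(x)) + (1 - pi) * N(mu, sigma^2).
    Returns (signed posterior probabilities, pi, mu, sigma).
    """
    n = len(x)
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("values must not all be identical")
    u = 1.0 / (hi - lo)  # uniform density over the observed range
    mu = sum(x) / n
    sigma = max(math.sqrt(sum((xi - mu) ** 2 for xi in x) / n), min_sigma)
    prev_ll = -math.inf
    r = [pi] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each value comes from the uniform part
        r, ll = [], 0.0
        for xi in x:
            a = pi * u
            b = (1.0 - pi) * normal_pdf(xi, mu, sigma)
            r.append(a / (a + b))
            ll += math.log(a + b)
        # M-step: mixing weight, then weighted mean and SD of the normal part
        pi = min(max(sum(r) / n, 1e-6), 1.0 - 1e-6)
        w = [1.0 - ri for ri in r]
        sw = sum(w)
        mu = sum(wi * xi for wi, xi in zip(w, x)) / sw
        var = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / sw
        sigma = max(math.sqrt(var), min_sigma)
        # EM never decreases the log-likelihood, so stop once the gain is negligible
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    # Sign each posterior by direction: positive above baseline, negative below
    poe = [ri if xi >= mu else -ri for ri, xi in zip(r, x)]
    return poe, pi, mu, sigma


# Toy data for one gene: seven baseline samples and one strongly elevated sample
values = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 5.0]
poe, pi_hat, mu_hat, sigma_hat = fit_poe_mixture_em(values)
for v, p in zip(values, poe):
    print(f"{v:6.2f}  signed posterior {p:+.3f}")
```

On this toy input, the elevated sample should receive a signed posterior close to +1, and the baseline samples should stay near 0. The uniform component is fixed to the data range so that it needs no parameters of its own. The lower bound on sigma guards against the normal component collapsing onto a single point.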

Conclusion: The statistical methods described in this paper are implemented in the R package metaArray (version 1.8.1), available from Bioconductor (http://www.bioconductor.org/).

Publication types

  • Evaluation Study
  • Meta-Analysis
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Biomarkers, Tumor / analysis*
  • Carcinoma / diagnosis
  • Carcinoma / metabolism*
  • Carcinoma / secondary*
  • Data Interpretation, Statistical
  • Gene Expression Profiling / methods*
  • Humans
  • Markov Chains
  • Neoplasm Proteins / analysis*
  • Oligonucleotide Array Sequence Analysis / methods*

Substances

  • Biomarkers, Tumor
  • Neoplasm Proteins