Mixture models for assessing differential expression in complex tissues using microarray data

Bioinformatics. 2004 Jul 22;20(11):1663-9. doi: 10.1093/bioinformatics/bth139. Epub 2004 Feb 26.

Abstract

Motivation: DNA microarrays have become widely used in many scientific and medical disciplines, such as cancer research. A common goal of these studies is to determine which genes are differentially expressed between cancer and healthy tissue or, more generally, between two experimental conditions. A major complication in the molecular profiling of tumors using gene expression data is that the data represent a combination of tumor and normal cells. Much of the methodology developed for assessing differential expression with microarray data has assumed that tissue samples are homogeneous.

Results: In this paper, we outline a general framework for determining differential expression in the presence of mixed cell populations. We consider study designs in which paired tissues and unpaired tissues are available. A hierarchical mixture model is used for modeling the data; a combination of method-of-moments procedures and the expectation-maximization (EM) algorithm is used to estimate the model parameters. The finite-sample properties of the methods are assessed in simulation studies, and the methods are applied to two microarray datasets from cancer studies. Commands in the R language can be downloaded from the URL http://www.sph.umich.edu/~ghoshd/COMPBIO/COMPMIX/.
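The abstract does not give the model equations, so the sketch below is only a generic illustration of the estimation idea it names: moment-based starting values followed by EM for a mixture. It fits a two-component univariate Gaussian mixture and is not the authors' hierarchical model. The median-split starting values are an assumed heuristic standing in for a method-of-moments step, and the function names (`moment_init`, `em_two_normals`) are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def moment_init(xs):
    """Hypothetical moment-based starting values: split the data at the
    median and use the mean and SD of each half (an assumed heuristic)."""
    s = sorted(xs)
    half = len(s) // 2

    def mean_sd(v):
        m = sum(v) / len(v)
        var = sum((x - m) ** 2 for x in v) / len(v)
        return m, max(math.sqrt(var), 1e-3)

    (m1, s1), (m2, s2) = mean_sd(s[:half]), mean_sd(s[half:])
    return 0.5, m1, s1, m2, s2


def em_two_normals(xs, n_iter=200, tol=1e-8):
    """Fit a two-component Gaussian mixture by EM.

    Returns (pi, mean1, sd1, mean2, sd2); pi is the weight of component 1."""
    pi, m1, s1, m2, s2 = moment_init(xs)
    n = len(xs)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1,
        # plus the log-likelihood under the current parameters.
        resp, ll = [], 0.0
        for x in xs:
            a = pi * normal_pdf(x, m1, s1)
            b = (1.0 - pi) * normal_pdf(x, m2, s2)
            resp.append(a / (a + b))
            ll += math.log(a + b)
        # M-step: weighted updates of the mixing weight, means and SDs.
        w1 = sum(resp)
        w2 = n - w1
        pi = w1 / n
        m1 = sum(r * x for r, x in zip(resp, xs)) / w1
        m2 = sum((1 - r) * x for r, x in zip(resp, xs)) / w2
        s1 = max(math.sqrt(sum(r * (x - m1) ** 2 for r, x in zip(resp, xs)) / w1), 1e-3)
        s2 = max(math.sqrt(sum((1 - r) * (x - m2) ** 2 for r, x in zip(resp, xs)) / w2), 1e-3)
        # Stop once the log-likelihood no longer changes appreciably.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, m1, s1, m2, s2


if __name__ == "__main__":
    # Simulated data: 300 draws from N(0, 1) and 200 draws from N(4, 1).
    random.seed(1)
    data = [random.gauss(0.0, 1.0) for _ in range(300)]
    data += [random.gauss(4.0, 1.0) for _ in range(200)]
    print(em_two_normals(data))
```

On this simulated input the estimates should land near a weight of 0.6 and means near 0 and 4. Because the lower half of the sorted data seeds component 1, component 1 tracks the lower-mean population.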

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Algorithms*
  • Coculture Techniques / methods*
  • Colonic Neoplasms / classification*
  • Colonic Neoplasms / diagnosis
  • Colonic Neoplasms / genetics*
  • Culture Techniques / methods
  • Diagnosis, Computer-Assisted
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Neoplastic / genetics
  • Genetic Testing / methods
  • Humans
  • Models, Genetic*
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity