Motivation: DNA microarrays have become widely used in many scientific and medical disciplines, including cancer research. A common goal of these studies is to determine which genes are differentially expressed between cancerous and healthy tissue or, more generally, between two experimental conditions. A major complication in the molecular profiling of tumors with gene expression data is that the measured expression reflects a mixture of tumor and normal cells. However, much of the methodology developed for assessing differential expression with microarray data assumes that tissue samples are homogeneous.
Results: In this paper, we outline a general framework for determining differential expression in the presence of mixed cell populations. We consider study designs in which paired tissue samples are available as well as designs in which only unpaired samples are available. We model the data with a hierarchical mixture model and estimate its parameters using a combination of method-of-moments procedures and the expectation-maximization (EM) algorithm. We assess the finite-sample properties of the methods in simulation studies and apply them to two microarray datasets from cancer studies. R commands implementing the methods can be downloaded from http://www.sph.umich.edu/~ghoshd/COMPBIO/COMPMIX/.
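To illustrate only the general EM step mentioned above, the following is a minimal sketch that fits a generic two-component univariate Gaussian mixture by EM. It is not the hierarchical mixture model of this paper, whose exact form is not given in this abstract. The function name `em_two_component`, the Gaussian components, the median-split initialisation, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_component(data, n_iter=200, tol=1e-8):
    """Hypothetical helper: generic EM for a two-component Gaussian mixture.

    Illustrative only; this is NOT the paper's hierarchical mixture model.
    Returns (weight of component 1, (mean1, var1), (mean2, var2)).
    """
    n = len(data)
    xs = sorted(data)
    lower, upper = xs[: n // 2], xs[n // 2 :]
    # Assumed initialisation: split the sorted data at the median.
    mu1, mu2 = sum(lower) / len(lower), sum(upper) / len(upper)
    overall = sum(xs) / n
    var1 = var2 = sum((x - overall) ** 2 for x in xs) / n
    w = 0.5
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        ll = 0.0
        for x in data:
            a = w * normal_pdf(x, mu1, var1)
            b = (1 - w) * normal_pdf(x, mu2, var2)
            total = max(a + b, 1e-300)  # guard against underflow
            resp.append(a / total)
            ll += math.log(total)
        # M-step: re-estimate weight, means and variances from responsibilities.
        r1 = sum(resp)
        r2 = n - r1
        w = r1 / n
        mu1 = sum(r * x for r, x in zip(resp, data)) / r1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / r2
        var1 = max(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / r1, 1e-6)
        var2 = max(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / r2, 1e-6)
        # Stop once the log-likelihood (non-decreasing under EM) stabilises.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, (mu1, var1), (mu2, var2)


if __name__ == "__main__":
    random.seed(1)
    # Simulated data: 300 draws from N(0, 1) and 200 draws from N(5, 1).
    sample = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(5, 1) for _ in range(200)]
    weight, comp1, comp2 = em_two_component(sample)
    print(round(weight, 2), round(comp1[0], 2), round(comp2[0], 2))
```

On this well-separated simulated sample, the fitted weight should be close to 0.6 and the fitted means close to 0 and 5. A real analysis of mixed tumor and normal samples would need the model structure described in the paper rather than this generic mixture.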