Background: 'Fold-change' cutoffs have been widely used in microarray assays to identify genes that are differentially expressed between query and reference samples. More accurate measures of differential expression and effective data-normalization strategies are required to identify high-confidence sets of genes with biologically meaningful changes in transcription. Further, the analysis of a large number of expression profiles is facilitated by a common reference sample, the construction of which must be carefully addressed.
Results: We carried out a series of 'self-self' hybridizations in which aliquots of the same RNA sample were labeled separately with Cy3 and Cy5 fluorescent dyes and co-hybridized to the same microarray. From these data, we analyzed the intensity-dependent behavior of microarray measurements, defined a statistically significant measure of differential expression that exploits the structure of the fluorescent signals, and measured the inherent reproducibility of the technique. We also devised a simple procedure for identifying and eliminating low-quality data for replicates within and between slides. Finally, we examined the properties required of a universal reference RNA sample and showed that pooling a small number of samples with a diverse representation of expressed genes can outperform more complex mixtures as a reference sample.
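As an illustration only, an intensity-dependent measure of differential expression might be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure validated in this work: the function names, the `window` parameter, and the choice of a local Z-score computed over neighbouring spots ranked by intensity are all hypothetical. In a self-self hybridization every true log ratio is zero, so the local spread of the log ratios at a given intensity estimates the measurement noise at that intensity.

```python
import math
import statistics


def log_ratio_and_intensity(cy5, cy3):
    """Return (M, A) per spot: M = log2(Cy5/Cy3), A = 0.5 * log2(Cy5 * Cy3)."""
    m = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    a = [0.5 * math.log2(r * g) for r, g in zip(cy5, cy3)]
    return m, a


def intensity_dependent_z(m, a, window=5):
    """Z-score each log ratio against spots of similar intensity.

    Assumption: `window` is a hypothetical tuning parameter giving the number
    of neighbouring spots taken on each side after ranking spots by A.
    """
    order = sorted(range(len(a)), key=lambda i: a[i])
    z = [0.0] * len(m)
    for rank, i in enumerate(order):
        lo = max(0, rank - window)
        hi = min(len(order), rank + window + 1)
        local = [m[j] for j in order[lo:hi]]
        mean = statistics.mean(local)
        sd = statistics.pstdev(local)
        # A zero local spread carries no information, so report Z = 0.
        z[i] = (m[i] - mean) / sd if sd > 0 else 0.0
    return z


if __name__ == "__main__":
    # Toy intensities: identical channels except one spot with a 4-fold ratio.
    cy3 = [100.0 * (i + 1) for i in range(11)]
    cy5 = list(cy3)
    cy5[5] = 4.0 * cy3[5]
    m, a = log_ratio_and_intensity(cy5, cy3)
    print(intensity_dependent_z(m, a))
```

Unlike a fixed fold-change cutoff, a locally scaled score of this kind flags a spot only when its log ratio is large relative to the variability seen at comparable signal intensity.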
Conclusion: Analysis of cell-line samples can identify systematic structure in measured gene-expression levels. A general procedure for analyzing cDNA microarray data is proposed and validated. We show that pooled reference samples should be based not only on the expression of individual genes in each cell line but also on the expression levels of genes within cell lines.