Benchmark datasets are important for validating and optimizing analysis pipelines. Recently, a new benchmark dataset for Affymetrix GeneChip experiments, 'Platinum Spike', was introduced. We performed a quality check of the Platinum Spike dataset using probe-level linear mixed models. The results show that some 'empty' probe sets detect transcripts that were spiked in at different concentrations and, conversely, that some probe sets fail to detect transcripts spiked in at different concentrations even though they were designed to do so. We propose a formal inference procedure for testing the assumption that all technical replicates in the data are independent, and conclude that for almost 10% of probe sets the arrays cannot be treated as independent, which has strong implications for normalization procedures and for testing for differential expression. The proposed diagnostic procedure facilitates a thorough exploration of Affymetrix gene expression data beyond preprocessing and differential expression analysis.
Keywords: Benchmark data; Gene expression analysis; Linear mixed models; Probe level data.
Copyright © 2013 Elsevier Inc. All rights reserved.