Quality control of Platinum Spike dataset by probe-level mixed models

Math Biosci. 2014 Feb;248:1-10. doi: 10.1016/j.mbs.2013.11.004. Epub 2013 Dec 1.

Abstract

Benchmark datasets are important for validating and optimizing analysis pipelines. Recently, a new benchmark dataset, 'Platinum Spike', was introduced for Affymetrix GeneChip experiments. We performed a quality check of the Platinum Spike dataset using probe-level linear mixed models. The results show that some 'empty' probe sets detect transcripts spiked in at different concentrations and, conversely, some probe sets designed to detect transcripts spiked in at different concentrations fail to do so. We proposed a formal inference procedure for testing the assumption that all technical replicates in the data are independent, and concluded that for almost 10% of probe sets the arrays cannot be treated as independent, which has strong implications for normalization procedures and for testing differential expression. The proposed diagnostic procedure facilitates a thorough exploration of Affymetrix gene expression data beyond preprocessing and differential expression analysis.
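To make the idea of a probe-level linear mixed model more concrete, the sketch below fits a simple one for a single probe set. It assumes a balanced layout (every probe observed on every array, log2 intensities) and an additive model with fixed probe effects and a random array effect, estimated with closed-form ANOVA (method-of-moments) estimators. The model, the helper name `array_variance_components`, and the estimators are illustrative assumptions; they are not the model or the independence test used in the paper.

```python
from statistics import mean


def array_variance_components(y):
    """Hypothetical helper: ANOVA variance components for one probe set.

    y[i][j] is the log2 intensity of probe j on array i. Assumed model:
        y_ij = mu + p_j + a_i + e_ij
    with fixed probe effects p_j, random array effects a_i ~ N(0, s2_a)
    and independent residuals e_ij ~ N(0, s2_e). Requires a balanced
    layout with at least two arrays and two probes.
    """
    n_arr = len(y)
    n_probe = len(y[0]) if y else 0
    if n_arr < 2 or n_probe < 2 or any(len(row) != n_probe for row in y):
        raise ValueError("need a balanced layout with >= 2 arrays and >= 2 probes")

    grand = mean(v for row in y for v in row)
    arr_means = [mean(row) for row in y]
    probe_means = [mean(y[i][j] for i in range(n_arr)) for j in range(n_probe)]

    # Additive two-way ANOVA: array sum of squares and residual sum of squares
    ss_arr = n_probe * sum((m - grand) ** 2 for m in arr_means)
    ss_err = sum(
        (y[i][j] - arr_means[i] - probe_means[j] + grand) ** 2
        for i in range(n_arr)
        for j in range(n_probe)
    )
    df_arr = n_arr - 1
    df_err = (n_arr - 1) * (n_probe - 1)
    ms_arr = ss_arr / df_arr
    ms_err = ss_err / df_err

    # E[MS_arr] = s2_e + n_probe * s2_a and E[MS_err] = s2_e, so solve for the
    # components; a negative s2_a estimate is truncated at zero.
    s2_e = ms_err
    s2_a = max(0.0, (ms_arr - ms_err) / n_probe)

    # Under H0: s2_a = 0, MS_arr / MS_err follows F(df_arr, df_err).
    f_stat = ms_arr / ms_err if ms_err > 0 else float("inf")
    return {"s2_a": s2_a, "s2_e": s2_e, "F": f_stat, "df": (df_arr, df_err)}


# Usage: two arrays, two probes (toy numbers, not Platinum Spike data)
print(array_variance_components([[1, 2], [3, 5]]))
```

For the toy input above, the sketch gives s2_a = 3.0, s2_e = 0.25 and F = 25.0 on (1, 1) degrees of freedom. The ANOVA estimators are used only because they have closed forms in a balanced design; for unbalanced data a REML fit from a mixed-model package would be the more usual choice.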

Keywords: Benchmark data; Gene expression analysis; Linear mixed models; Probe level data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking / standards
  • Benchmarking / statistics & numerical data
  • Biostatistics
  • Databases, Genetic / standards
  • Databases, Genetic / statistics & numerical data*
  • Gene Expression Profiling / standards
  • Gene Expression Profiling / statistics & numerical data*
  • Linear Models
  • Mathematical Concepts
  • Oligonucleotide Array Sequence Analysis / standards
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Quality Control