Effect of normalization on significance testing for oligonucleotide microarrays

J Biopharm Stat. 2004 Aug;14(3):575-89. doi: 10.1081/BIP-200025650.

Abstract

Motivation: Normalization techniques are used to reduce variation among gene expression measurements in oligonucleotide microarrays, with the aim of improving the quality of the data and the power of significance tests for detecting differential expression. Of the several methods that have been proposed, two in common use are median-interquartile range (median-IQR) normalization and quantile normalization. The median-IQR method applied directly to fold-changes for paired data was also considered. Two methods for calculating gene expression values were examined: the MAS 5.0 algorithm [Affymetrix. (2002). Statistical Algorithms Description Document. Santa Clara, CA: Affymetrix, Inc. http://www.affymetrix.com/support/technical/whitepapers/sadd-whitepaper.pdf] and the RMA method [Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., Speed, T. P. (2003a). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31(4,e15); Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., Speed, T. P. (2003b). Exploration, normalization, and summaries of high density oligonucleotide array probe-level data. Biostatistics 4(2):249-264; Irizarry, R. A., Gautier, L., Cope, L. (2003c). An R package for analysis of Affymetrix oligonucleotide arrays. In: Parmigiani, R. I. G., Garrett, E. S., Ziegler, S., eds. The Analysis of Gene Expression Data: Methods and Software. Berlin: Springer, pp. 102-119].
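As a rough illustration of the two normalization approaches named above, the following is a minimal sketch in plain Python. It assumes each array is a list of log-scale expression values for the same genes in the same order, that there are no ties, and that median-IQR normalization means centering at the median and scaling by the interquartile range. The function names are hypothetical, and this is not the implementation used in the paper or in any particular package.

```python
from statistics import mean, median, quantiles


def median_iqr_normalize(array):
    """Center one array at its median and scale it to unit IQR.

    Assumption: this is one common formulation; the paper may rescale
    to a shared target median and IQR instead.
    """
    q1, _, q3 = quantiles(array, n=4, method="inclusive")
    med = median(array)
    iqr = q3 - q1
    return [(x - med) / iqr for x in array]


def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    The reference distribution is the mean, across arrays, of the
    k-th smallest value. Ties are not handled specially in this sketch.
    """
    n_genes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        mean([s[k] for s in sorted_arrays]) for k in range(n_genes)
    ]
    normalized = []
    for a in arrays:
        # Gene indices ordered from smallest to largest value.
        order = sorted(range(n_genes), key=lambda i: a[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized
```

Median-IQR normalization matches only the location and spread of each array, whereas quantile normalization forces the whole distribution to agree, which is a much stronger constraint on the data.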

Results: When these methods are applied to a prostate cancer data set derived from paired samples of normal and tumor tissue, it is shown that normalization may substantially inflate the number of genes identified by paired t-tests, even after adjustment for multiple testing. This inflation is shown to be due primarily to an unintended effect of normalization on the experimental error variance. The impact appears to be greater for the RMA method than for the MAS 5.0 algorithm, and greater for quantile normalization than for median-IQR normalization.
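The sensitivity of the paired t statistic to the error variance can be seen in a small sketch. The expression values below are invented purely for illustration and are not from the prostate cancer data set; `shrink_spread` is a hypothetical stand-in for any processing step that reduces the spread of the paired differences while leaving their mean unchanged, not a model of what RMA or quantile normalization actually does.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(diffs):
    """Paired t statistic: mean difference over its standard error."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))


def shrink_spread(diffs, factor):
    """Pull each difference toward the mean by `factor` (hypothetical)."""
    center = mean(diffs)
    return [center + factor * (d - center) for d in diffs]


# Made-up log-expression values for one gene in four patients.
normal = [7.0, 7.4, 6.8, 7.2]
tumor = [7.6, 7.5, 7.5, 7.4]
diffs = [t - n for n, t in zip(normal, tumor)]

t_raw = paired_t(diffs)
# Halving the spread of the differences doubles the t statistic,
# even though the mean tumor-normal difference is unchanged.
t_shrunk = paired_t(shrink_spread(diffs, 0.5))
```

Because the standard deviation of the differences sits in the denominator, any reduction in the apparent error variance raises the t statistic for every gene, which is one way more genes could pass a fixed significance threshold.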

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Analysis of Variance
  • Data Interpretation, Statistical
  • Gene Expression
  • Humans
  • Male
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Oligonucleotides / genetics*
  • Prostatic Neoplasms / genetics
  • Software

Substances

  • Oligonucleotides