Microarray test results should not be compensated for multiplicity of gene contents

BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S6. doi: 10.1186/1752-0509-5-S2-S6. Epub 2011 Dec 14.

Abstract

Background: Microarray technology has enabled the measurement of comprehensive transcriptomic information. However, each data entry may reflect trivial individual differences among samples and may also contain technical noise. Therefore, the certainty of each observed difference should be confirmed at an early step of the analysis, and statistical tests are frequently used for this purpose. Since microarrays analyze a huge number of genes simultaneously, concerns about multiplicity, i.e., the family-wise error rate (FWER) and the false discovery rate (FDR), have been raised in testing the data. To deal with these concerns, several compensation methodologies have been proposed. These make the tests so conservative that arbitrary tuning of the threshold has been introduced to relax the conditions. Unexpectedly, however, the appropriateness of the test methodologies, of the concerns about multiplicity, and of the compensation methodologies has not been sufficiently confirmed.
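For readers unfamiliar with the compensation methodologies mentioned above, the following is a minimal sketch of two widely used ones: the Bonferroni adjustment (which targets the FWER) and the Benjamini-Hochberg adjustment (which targets the FDR). The abstract does not name the specific methods it examined, so the choice of these two, the function names, and the example p-values are assumptions made for illustration only.

```python
import numpy as np


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p by the number of tests."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum of the scaled
    # values at its own rank and at all higher ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)  # restore the original order
    return adjusted


# Hypothetical p-values for four genes (illustrative only).
example = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(example))
print(benjamini_hochberg(example))
```

Both adjustments can only raise a p-value, never lower it, which is why a fixed threshold becomes harder to pass as the number of genes grows.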

Results: Appropriateness was assessed by checking whether the methodologies' premises agree with the statistical characteristics of data from two typical microarray platforms. As expected, normality was observed in within-group data differences, supporting the application of t-test and F-test statistics. However, genes displayed their own tendencies in the magnitude of variations, and the distributions of p-values were rather complex. These characteristics are inconsistent with the premises underlying the compensation methodologies, which assume that most of the null hypotheses are true. The evidence also calls the concerns about multiplicity themselves into question. In transcriptomic studies, FWER should not be critical, as analyses at higher levels would not be influenced by a few false positives. Additionally, concerns about FDR are not suitable for sharp null hypotheses on expression levels.

Conclusions: Therefore, although compensation methods have been recommended to deal with the problem of multiplicity, the compensations are actually inappropriate for transcriptome analyses. Compensations are not only unnecessary, but will increase the occurrence of false negative errors, and arbitrary adjustment of the threshold damages the objectivity of the tests. Rather, the results of parametric tests should be evaluated directly.
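The claim that compensation increases false negative errors can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: the number of genes, the fraction of truly different genes, the effect size, and the group size are all hypothetical, and a z-test with known unit variance stands in for the t-test to keep the example free of dependencies beyond NumPy.

```python
import math

import numpy as np


def simulate_false_negatives(m=2000, m_diff=500, n=5, delta=1.5,
                             alpha=0.05, seed=0):
    """Count missed truly different genes under three decision rules.

    All parameter values are hypothetical. Expression values are drawn
    from unit-variance normal distributions, and a two-sided z-test with
    known variance is used in place of a t-test (a simplification).
    """
    rng = np.random.default_rng(seed)
    shift = np.zeros(m)
    shift[:m_diff] = delta                    # first m_diff genes truly differ
    group_a = rng.normal(0.0, 1.0, size=(m, n))
    group_b = rng.normal(shift[:, None], 1.0, size=(m, n))

    # Two-sided z-test on the difference of group means (variance known = 1).
    z = (group_b.mean(axis=1) - group_a.mean(axis=1)) / math.sqrt(2.0 / n)
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    is_diff = shift > 0

    unadjusted = p <= alpha                   # direct evaluation of the test
    bonferroni = p <= alpha / m               # FWER control

    # Benjamini-Hochberg step-up rule: reject the k smallest p-values, where
    # k is the largest rank whose p-value is <= alpha * rank / m.
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    k = int(passed.nonzero()[0].max()) + 1 if passed.any() else 0
    bh = np.zeros(m, dtype=bool)
    bh[order[:k]] = True

    def false_negatives(rejected):
        return int(np.sum(is_diff & ~rejected))

    return {
        "unadjusted": false_negatives(unadjusted),
        "bh": false_negatives(bh),
        "bonferroni": false_negatives(bonferroni),
    }


print(simulate_false_negatives())
```

Because every gene rejected under Bonferroni is also rejected under Benjamini-Hochberg, and every gene rejected under Benjamini-Hochberg is also rejected at the unadjusted threshold, the false negative counts can only grow as the compensation becomes stricter, whatever the simulated data look like.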

MeSH terms

  • Computer Simulation
  • Databases, Genetic
  • Gene Expression Profiling / methods*
  • Genes*
  • Humans
  • Microarray Analysis / methods*
  • Models, Statistical
  • Reproducibility of Results