Statistical analysis of differential gene expression relative to a fold change threshold on NanoString data of mouse odorant receptor genes

BMC Bioinformatics. 2014 Feb 4;15:39. doi: 10.1186/1471-2105-15-39.

Abstract

Background: A challenge in gene expression studies is the reliable identification of differentially expressed genes. In many high-throughput studies, genes are accepted as differentially expressed only if they simultaneously satisfy a p value criterion and a fold change criterion. A statistical method, TREAT, has been developed for microarray data to formally assess whether fold changes are significantly higher than a predefined threshold. We have recently applied the NanoString digital platform to study the expression of mouse odorant receptor genes, which, with 1,200 members, form the largest gene family in the mouse genome. Using these data, our objectives are to decrease false discoveries when formally assessing genes relative to a fold change threshold, and to provide guidance in choosing this threshold.
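To make "significantly higher than a predefined threshold" concrete, the test can be written as a composite null hypothesis on the log2 fold change. The sketch below is our summary of a TREAT-style threshold test, not a quotation of the TREAT paper: $\hat\beta_g$ is the estimated log2 fold change of gene $g$, $s_g$ its standard error, $T_d$ a Student t variable with $d$ degrees of freedom, and $\tau = \log_2(\mathrm{FC}) \ge 0$ the threshold on the log2 scale. The p-value is evaluated at the boundary $|\beta_g| = \tau$ of the null.

```latex
H_0:\ |\beta_g| \le \tau \qquad \text{vs.} \qquad H_1:\ |\beta_g| > \tau

p_g = P\!\left(T_d > \frac{|\hat\beta_g| - \tau}{s_g}\right)
    + P\!\left(T_d > \frac{|\hat\beta_g| + \tau}{s_g}\right)

% With \tau = 0 this reduces to 2\,P\!\left(T_d > |\hat\beta_g| / s_g\right),
% the ordinary two-sided t-test p-value.
```

Unlike a fixed cut on the observed fold change, this p-value increases smoothly as $\tau$ grows, so a gene whose estimated fold change barely exceeds the threshold is not declared significant unless its standard error is small.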

Results: Statistical tests have been developed for microarray data to identify genes that are differentially expressed relative to a fold change threshold. Here we report that another approach, which we refer to as tTREAT, is more appropriate for our NanoString data, where false discoveries lead to costly and time-consuming follow-up experiments. Two further methods, which we refer to as tTREAT2 and the running fold change model, improve the performance of these statistical tests by protecting the fold change threshold or by selecting it more objectively. We show the benefits on simulated and real data.
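The abstract does not define tTREAT. A minimal sketch, assuming it means a TREAT-style threshold test computed with an ordinary (unmoderated) pooled-variance two-sample t statistic on log2 expression values, is shown below. The function names `t_sf` and `treat_t_pvalue` are hypothetical, the Student t tail is computed with the standard library only (via the regularized incomplete beta function), and the input values are toy numbers, not data from the study.

```python
import math
from statistics import mean

_TINY = 1e-300


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges fastest.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """Upper-tail probability P(T > t) for Student's t with df degrees of freedom."""
    tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))
    return tail if t >= 0 else 1.0 - tail


def treat_t_pvalue(group_a, group_b, fc_threshold):
    """Hypothetical tTREAT-style p-value for H0: |log2 FC| <= log2(fc_threshold).

    group_a, group_b: log2 expression values of one gene in two conditions.
    fc_threshold: fold change threshold on the linear scale (>= 1).
    Assumption: ordinary pooled-variance t statistic, df = n1 + n2 - 2.
    """
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = mean(group_a), mean(group_b)
    df = n1 + n2 - 2
    ss = sum((x - m1) ** 2 for x in group_a) + sum((y - m2) ** 2 for y in group_b)
    se = math.sqrt(ss / df * (1.0 / n1 + 1.0 / n2))
    beta = m2 - m1                  # estimated log2 fold change
    tau = math.log2(fc_threshold)   # threshold on the log2 scale
    t1 = (abs(beta) - tau) / se
    t2 = (abs(beta) + tau) / se
    return t_sf(t1, df) + t_sf(t2, df)


if __name__ == "__main__":
    a = [5.0, 5.2, 4.9, 5.1]  # toy log2 counts, condition A
    b = [7.0, 7.1, 6.8, 7.2]  # toy log2 counts, condition B
    for fc in (1.0, 2.0, 4.0):
        print(fc, treat_t_pvalue(a, b, fc))
```

With `fc_threshold=1.0` the function reduces to an ordinary two-sided t-test. In the toy example the observed fold change is roughly 3.9, so the p-value is small at a 2-fold threshold but large at a 4-fold threshold, which illustrates how the choice of threshold drives which genes are called.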

Conclusions: Gene-wise statistical analyses of gene expression data, for which the significance relative to a fold change threshold is important, give reproducible and reliable results on NanoString data of mouse odorant receptor genes. Because it can be difficult to set in advance a fold change threshold that is meaningful for the available data, we developed methods that enable a better choice (thus reducing false discoveries and/or missed genes) or avoid this choice altogether. This set of tools may be useful for the analysis of other types of gene expression data.

MeSH terms

  • Animals
  • Computational Biology / methods*
  • Gene Expression Profiling / methods*
  • Mice
  • Models, Statistical
  • Oligonucleotide Array Sequence Analysis / methods*
  • Receptors, Odorant / analysis
  • Receptors, Odorant / genetics*
  • Receptors, Odorant / metabolism

Substances

  • Receptors, Odorant