RCRdiff: A fully integrated Bayesian method for differential expression analysis using raw NanoString nCounter data

Stat Med. 2022 Feb 20;41(4):665-680. doi: 10.1002/sim.9250. Epub 2021 Nov 12.

Abstract

The medium-throughput mRNA abundance platform NanoString nCounter has gained great popularity in the past decade, due to its high sensitivity and technical reproducibility as well as remarkable applicability to ubiquitous formalin fixed paraffin embedded (FFPE) tissue samples. Based on RCRnorm developed for normalizing NanoString nCounter data and Bayesian LASSO for variable selection, we propose a fully integrated Bayesian method, called RCRdiff, to detect differentially expressed (DE) genes between different groups of tissue samples (eg, normal and cancer). Unlike existing methods that often require normalization performed beforehand, RCRdiff directly handles raw read counts and jointly models the behaviors of different types of internal controls along with DE and non-DE gene patterns. Doing so would avoid efficiency loss caused by ignoring estimation uncertainty from the normalization step in a sequential approach and thus can offer more reliable statistical inference. We also propose clustering-based strategies for DE gene selection, which do not require any external dataset and are free of any arbitrary cutoff. Empirical evidence of the attractiveness of RCRdiff is demonstrated via extensive simulation and data examples.

Keywords: Bayesian LASSO; FFPE; gene expression; gene selection; normalization; random-coefficient regression.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • Gene Expression Profiling* / methods
  • Humans
  • Reproducibility of Results