Benchmarking RNA-seq differential expression analysis methods using spike-in and simulation data

PLoS One. 2020 Apr 30;15(4):e0232271. doi: 10.1371/journal.pone.0232271. eCollection 2020.

Abstract

Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data, which were generated from the same bulk RNA sample, represent only technical variability, making the test results less reliable. We compared the performance of 12 differential expression analysis methods for RNA-seq data, including recent variants in widely used software packages, using both RNA spike-in data and data simulated under a negative binomial (NB) model. The performance of edgeR, DESeq2, and ROTS differed notably between the two benchmark tests. Each method was then tested under an extensive range of simulation conditions, which demonstrated the large impact of the proportion, dispersion, and balance of differentially expressed (DE) genes. DESeq2, a robust version of edgeR (edgeR.rb), voom with TMM normalization (voom.tmm), and voom with sample weights (voom.sw) showed good overall performance regardless of the presence of outliers and the proportion of DE genes. The performance of RNA-seq DE gene analysis methods depended substantially on the benchmark used. Based on the simulation results, suitable methods were suggested for various test conditions.
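As an illustration of the kind of NB simulation the abstract refers to, the following is a minimal sketch, not the authors' simulation code. It draws counts for two groups from a negative binomial distribution with variance mu + dispersion * mu^2 and exposes the three factors named above as parameters: the proportion of DE genes, the dispersion, and the balance between up- and down-regulated genes. The function name, the log-normal baseline means, the single fixed fold change, and all default values are assumptions made for illustration only.

```python
import numpy as np


def simulate_nb_counts(n_genes=1000, n_per_group=3, prop_de=0.1,
                       frac_up=0.5, dispersion=0.2, fold_change=2.0, seed=0):
    """Simulate a two-group count matrix under a negative binomial model.

    All names and defaults here are hypothetical choices for illustration.
    Returns (counts, is_de, fc): counts has shape (n_genes, 2 * n_per_group).
    """
    rng = np.random.default_rng(seed)

    # Assumed baseline expression: log-normally distributed gene means.
    base_mean = rng.lognormal(mean=4.0, sigma=1.0, size=n_genes)

    # Choose DE genes; frac_up controls the up/down balance among them.
    n_de = int(round(prop_de * n_genes))
    n_up = int(round(frac_up * n_de))
    de_idx = rng.choice(n_genes, size=n_de, replace=False)

    fc = np.ones(n_genes)
    fc[de_idx[:n_up]] = fold_change          # up-regulated in group 2
    fc[de_idx[n_up:]] = 1.0 / fold_change    # down-regulated in group 2

    is_de = np.zeros(n_genes, dtype=bool)
    is_de[de_idx] = True

    # Per-sample means: group 1 uses the baseline, group 2 applies fc.
    mu_g1 = np.repeat(base_mean[:, None], n_per_group, axis=1)
    mu_g2 = np.repeat((base_mean * fc)[:, None], n_per_group, axis=1)
    mu = np.hstack([mu_g1, mu_g2])

    # NumPy parameterizes the NB by (n, p); with n = 1 / dispersion and
    # p = n / (n + mu), the mean is mu and the variance is
    # mu + dispersion * mu**2.
    size = 1.0 / dispersion
    p = size / (size + mu)
    counts = rng.negative_binomial(size, p)

    return counts, is_de, fc


counts, is_de, fc = simulate_nb_counts()
```

Varying `prop_de`, `dispersion`, and `frac_up` over a grid and passing each resulting matrix to the methods under comparison would give a rough analogue of such a benchmark. A realistic study would also use gene-specific dispersions, variable effect sizes, and outliers, none of which this sketch models.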

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking / methods
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • Humans
  • RNA / genetics*
  • RNA-Seq / methods*
  • Sequence Analysis, RNA / methods
  • Software

Substances

  • RNA

Grants and funding

National Research Foundation (NRF) of Korea, Genomics Program [2016M3C9A3945893]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.