A benchmark study of simulation methods for single-cell RNA sequencing data

Nat Commun. 2021 Nov 25;12(1):6911. doi: 10.1038/s41467-021-27130-w.


Single-cell RNA-seq (scRNA-seq) data simulation is critical for evaluating computational methods for analysing scRNA-seq data, especially when ground truth is experimentally unattainable. The reliability of such evaluation depends on the ability of simulation methods to capture the properties of experimental data. However, while many scRNA-seq data simulation methods have been proposed, a systematic evaluation of these methods is lacking. We develop a comprehensive evaluation framework, SimBench, which includes a kernel density estimation measure, to benchmark 12 simulation methods on 35 experimental scRNA-seq datasets. We evaluate the simulation methods on a panel of data properties, their ability to maintain biological signals, their scalability and their applicability. Our benchmark uncovers performance differences among the methods and highlights the varying difficulty of simulating different data characteristics. Furthermore, we identify several limitations, including difficulty in maintaining heterogeneity of distribution. These results, together with the framework and datasets made publicly available as R packages, will guide the selection of simulation methods and their future development.
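
As a rough illustration of what a kernel density estimation based comparison between experimental and simulated data can look like, the sketch below fits a Gaussian kernel density estimate to one data property (for example, per-gene mean log expression) in each dataset and integrates the squared difference between the two densities. This is not the SimBench implementation: the function names, the normal-reference bandwidth, the integrated squared difference score and the toy Gaussian samples are all assumptions made for illustration.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a function that evaluates a 1-D Gaussian KDE fitted to `sample`."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption of this sketch);
        # it requires a non-constant sample so that the bandwidth is > 0.
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample
        )

    return density


def kde_discrepancy(real, simulated, grid_size=200):
    """Integrated squared difference between two KDEs (0 means identical)."""
    f = gaussian_kde(real)
    g = gaussian_kde(simulated)
    lo = min(min(real), min(simulated))
    hi = max(max(real), max(simulated))
    pad = 0.1 * (hi - lo)  # extend the grid slightly beyond the data range
    lo, hi = lo - pad, hi + pad
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    # Riemann-sum approximation of the integral of (f - g)^2 over the grid.
    return sum((f(x) - g(x)) ** 2 for x in grid) * step


# Hypothetical toy data standing in for one property of each dataset.
rng = random.Random(0)
real = [rng.gauss(2.0, 1.0) for _ in range(200)]
good_sim = [rng.gauss(2.0, 1.0) for _ in range(200)]  # matches the property
bad_sim = [rng.gauss(4.0, 1.0) for _ in range(200)]   # shifted distribution

print("good simulation:", kde_discrepancy(real, good_sim))
print("bad simulation: ", kde_discrepancy(real, bad_sim))
```

A lower score indicates that the simulated property distribution lies closer to the experimental one. In a benchmark setting, a score of this kind would be computed separately for each data property and each dataset before methods are compared.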

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking / methods*
  • Computer Simulation
  • Data Analysis
  • Models, Statistical
  • Reproducibility of Results
  • Research Design
  • Sequence Analysis, RNA / methods*
  • Single-Cell Analysis / methods*
  • Spatial Analysis