simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data

Bioinformatics. 2023 Aug 1;39(8):btad453. doi: 10.1093/bioinformatics/btad453.

Abstract

Motivation: Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective for characterizing gene regulatory mechanisms at single-cell resolution. As a growing number of computational methods are proposed for analyzing scCAS data, a powerful simulation framework is needed to evaluate and validate them. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, and therefore cannot provide credible ground-truth labels for method evaluation.

Results: We present simCAS, an embedding-based simulator that generates high-fidelity scCAS data from both cell-wise and peak-wise embeddings. We demonstrate that simCAS outperforms existing simulators in resembling real data, and show that it can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions between chromatin regions in the synthetic data, which provides ground-truth labels beyond cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate that simCAS will be a reliable and flexible simulator for evaluating emerging computational methods applied to scCAS data.
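
To make the idea of simulating from cell-wise and peak-wise embeddings concrete, the following is a minimal illustrative sketch and not the simCAS implementation. The abstract does not specify the generative model, so everything here is an assumption: the function name simulate_counts, the Gaussian embeddings, the log-linear link, the offset of -3.0, and the Poisson sampling step are all hypothetical choices made only to show how a cell-by-peak count matrix with known population labels could be derived from two sets of embeddings.

```python
import numpy as np


def simulate_counts(n_cells=200, n_peaks=500, n_dims=8, n_populations=3, seed=0):
    """Toy embedding-based simulator (hypothetical, not the simCAS model).

    Returns a (n_cells, n_peaks) integer count matrix and the ground-truth
    population label of every cell.
    """
    rng = np.random.default_rng(seed)

    # Ground-truth labels: each cell is assigned to one user-defined population.
    labels = rng.integers(0, n_populations, size=n_cells)

    # Cell-wise embeddings: a population centre plus per-cell noise (assumption).
    centers = rng.normal(0.0, 1.0, size=(n_populations, n_dims))
    cell_emb = centers[labels] + rng.normal(0.0, 0.3, size=(n_cells, n_dims))

    # Peak-wise embeddings: one latent vector per chromatin region (assumption).
    peak_emb = rng.normal(0.0, 1.0, size=(n_peaks, n_dims))

    # Combine the two embeddings into a per-entry rate with a log-linear link.
    # The scaling and the -3.0 offset only keep the matrix sparse (assumption).
    log_rate = cell_emb @ peak_emb.T / np.sqrt(n_dims) - 3.0
    rate = np.exp(log_rate)

    # Sample integer read counts for every cell-peak pair.
    counts = rng.poisson(rate)
    return counts, labels
```

Because the labels are drawn before the counts, a clustering method run on the returned matrix can be scored against them directly, which is the kind of ground truth the abstract argues read-resampling simulators cannot supply.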

Availability and implementation: simCAS is freely available at https://github.com/Chen-Li-17/simCAS.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatin*
  • Computer Simulation
  • Gene Expression Regulation*
  • High-Throughput Nucleotide Sequencing / methods
  • Sequence Analysis, DNA / methods
  • Single-Cell Analysis / methods

Substances

  • Chromatin