Synggen: fast and data-driven generation of synthetic heterogeneous NGS cancer data

Bioinformatics. 2023 Jan 1;39(1):btac792. doi: 10.1093/bioinformatics/btac792.


Summary: Whole-exome and targeted sequencing are widely used in both translational cancer genomics and precision medicine. Benchmarking the computational methods and tools under continuous development is fundamental to the correct interpretation of somatic genomic profiling results. To this end, we developed synggen, a tool for the fast generation of large-scale, realistic and heterogeneous synthetic cancer whole-exome and targeted sequencing datasets, which supports the incorporation of phased germline single nucleotide polymorphisms and complex allele-specific somatic genomic events. The performance and effectiveness of synggen in generating synthetic cancer data are shown across different scenarios and across platforms with distinct characteristics.

Availability and implementation: synggen is freely available at

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Exome
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Neoplasms* / genetics
  • Software