DeepSimulator1.5: a more powerful, quicker and lighter simulator for Nanopore sequencing

Bioinformatics. 2020 Apr 15;36(8):2578-2580. doi: 10.1093/bioinformatics/btz963.

Abstract

Motivation: Nanopore sequencing is one of the leading third-generation sequencing technologies. A number of computational tools have been developed to facilitate the processing and analysis of the Nanopore data. Previously, we have developed DeepSimulator1.0 (DS1.0), which is the first simulator for Nanopore sequencing to produce both the raw electrical signals and the reads. However, although DS1.0 can produce high-quality reads, for some sequences, the divergence between the simulated raw signals and the real signals can be large. Furthermore, the Nanopore sequencing technology has evolved greatly since DS1.0 was released. It is thus necessary to update DS1.0 to accommodate those changes.

Results: We propose DeepSimulator1.5 (DS1.5), all three modules of which have been updated substantially from DS1.0. As for the sequence generator, we updated the sample read length distribution to reflect the newest real reads' features. In terms of the signal generator, which is the core of DeepSimulator, we added one more pore model, the context-independent pore model, which is much faster than the previous context-dependent one. Furthermore, to make the generated signals more similar to the real ones, we added a low-pass filter to post-process the pore model signals. Regarding the basecaller, we added the support for the newest official basecaller, Guppy, which can support both GPU and CPU. In addition, multiple optimizations, related to multiprocessing control, memory and storage management, have been implemented to make DS1.5 a much more amenable and lighter simulator than DS1.0.

Availability and implementation: The main program and the data are available at https://github.com/lykaust15/DeepSimulator.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • High-Throughput Nucleotide Sequencing*
  • Nanopore Sequencing
  • Nanopores*
  • Sequence Analysis, DNA
  • Software