Fast and Accurate Classification of Meta-Genomics Long Reads With deSAMBA

Front Cell Dev Biol. 2021 Apr 28;9:643645. doi: 10.3389/fcell.2021.643645. eCollection 2021.

Abstract

Fast and accurate tools for classifying the taxonomy of noisy long reads are still lacking, which is a bottleneck to the use of promising long-read metagenomic sequencing technologies. Herein, we propose the de Bruijn graph-based Sparse Approximate Match Block Analyzer (deSAMBA), a tailored long-read classification approach that uses a novel pseudo alignment algorithm based on the sparse approximate match block (SAMB). Benchmarks on real sequencing datasets demonstrate that deSAMBA achieves high yields and fast speed simultaneously, outperforms state-of-the-art tools, and has considerable potential for cutting-edge metagenomics studies.

Keywords: de Bruijn graph-based index; long read; metagenomics 16S; pseudo alignment; read classification.