SPUMONI 2: improved classification using a pangenome index of minimizer digests

Genome Biol. 2023 May 18;24(1):122. doi: 10.1186/s13059-023-02958-1.

Abstract

Genomics analyses use large reference sequence collections, like pangenomes or taxonomic databases. SPUMONI 2 is an efficient tool for sequence classification of both short and long reads. It performs multi-class classification using a novel sampled document array. By incorporating minimizers, SPUMONI 2's index is 65 times smaller than minimap2's for a mock community pangenome. SPUMONI 2 runs 3-fold faster than SPUMONI and 15-fold faster than minimap2. We show SPUMONI 2 achieves an advantageous mix of accuracy and efficiency in practical scenarios such as adaptive sampling, contamination detection, and multi-class metagenomics classification.
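To make the idea of a minimizer digest concrete, here is a minimal sketch in Python. It is not SPUMONI 2's implementation. The abstract does not specify the minimizer scheme, so everything here is an assumption made for illustration: the function name `minimizer_digest`, the k-mer length `k`, the window size `w`, the use of lexicographic order to pick the smallest k-mer, leftmost tie-breaking, and the choice to build the digest by concatenating the selected k-mers.

```python
def minimizer_digest(seq: str, k: int = 4, w: int = 12) -> str:
    """Return a digest made of the minimizers of `seq`.

    Illustrative assumptions (not taken from the abstract):
    - a window covers w consecutive k-mers;
    - the minimizer is the lexicographically smallest k-mer in the window,
      with ties broken by the leftmost position;
    - a minimizer occurrence shared by adjacent windows is emitted once.
    """
    # A sequence too short to hold one full window has no minimizers.
    if len(seq) < k + w - 1:
        return ""

    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    digest = []
    last_pos = -1  # position of the most recently emitted minimizer

    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first smallest element, giving leftmost tie-breaking.
        offset = min(range(w), key=lambda j: window[j])
        pos = start + offset
        if pos != last_pos:
            digest.append(kmers[pos])
            last_pos = pos

    return "".join(digest)


if __name__ == "__main__":
    read = "ACGTTGCAACGTAGCTAGCTAGGATCCGATCGATTACGCATG"
    d = minimizer_digest(read)
    print(len(read), len(d), d)
```

Under these assumptions the digest only shrinks the input when windows are wide relative to `k`. With very small windows, the concatenated minimizers can be longer than the original sequence, so the parameter choice drives any size reduction.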

Keywords: Classification; Indexing; Minimizer; Pangenome.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Databases, Factual
  • Genomics*
  • Metagenomics
  • Sequence Analysis, DNA