Swarm: robust and fast clustering method for amplicon-based studies

PeerJ. 2014 Sep 25;2:e593. doi: 10.7717/peerj.593. eCollection 2014.

Abstract

Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters' internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units.
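The first phase described above, iterative clustering under a local threshold, can be illustrated with a minimal Python sketch. This is not the Swarm implementation: the function names (`edit_distance`, `cluster_local`), the default `d=1`, the plain Levenshtein distance, and the toy data in the usage note are all assumptions made for illustration. The abundance-based refinement phase is omitted.

```python
from collections import deque


def edit_distance(a, b):
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_local(abundances, d=1):
    """Grow each cluster by linking any amplicon within d differences
    of a current member (hypothetical helper, not Swarm's API).

    abundances: dict mapping amplicon sequence -> abundance (copy count).
    """
    # Seeds are visited in a fixed order (most abundant first, ties broken
    # by sequence), so the result does not depend on input order.
    order = sorted(abundances, key=lambda s: (-abundances[s], s))
    unassigned = set(order)
    clusters = []
    for seed in order:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        cluster = [seed]
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            # The threshold is local: it is measured from the current
            # member, not from the seed or a centroid.
            neighbours = sorted(s for s in unassigned
                                if edit_distance(current, s) <= d)
            for s in neighbours:
                unassigned.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(cluster)
    return clusters
```

With the toy input `{"ACGT": 10, "ACGA": 3, "ACCA": 1, "TTTT": 5, "TTTA": 2}`, the amplicon `ACCA` joins the cluster seeded by `ACGT` even though the two differ at two positions, because it is one difference away from `ACGA`. Shuffling the input leaves the clusters unchanged. The all-pairs neighbour search here is quadratic and is meant only to show the idea, not to be fast.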

Keywords: Barcoding; Environmental diversity; Molecular operational taxonomic units.

Grant support

FM and CdeV were supported by the EU EraNet BiodivErsA program BioMarKs (grant #2008-6530), the French government “Investissements d’Avenir” project OCEANOMICS (ANR-11-BTBR-0008), and the EU FP7 program MicroB3 (contract number 287589). FM and MD were supported by the Deutsche Forschungsgemeinschaft (grant #DU1319/1-1). TR was supported by a Centre of Excellence grant from the Research Council of Norway to CMBN. CQ was funded by an EPSRC Career Acceleration Fellowship (EP/H003851/1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.