CholeraSeq: a comprehensive genomic pipeline for cholera surveillance and near real-time outbreak investigation

Bioinformatics. 2026 Jan 2;42(1):btaf665. doi: 10.1093/bioinformatics/btaf665.

Abstract

Summary: Next-generation sequencing is widely deployed in cholera-endemic regions, yet an end-to-end, reproducible pipeline that unifies read QC, filtering, reference mapping, variant calling and annotation, recombination screening, extraction of parsimony-informative sites and variant codons, and phylogenetic inference for downstream phylodynamic and epidemiological analyses has been lacking, slowing outbreak investigation and public health response. CholeraSeq is a high-throughput genomics pipeline for cholera genomic surveillance. It ingests consensus genomes, short-read sequence data, and draft assemblies, and scales seamlessly from local to cloud environments. To accelerate placement of new outbreak strains in their epidemiological context, we provide a curated, ready-to-use core genome alignment compiled from public data, enabling fast, flexible integration of new samples for outbreak investigations.
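
As an illustration of one step named in the summary, the sketch below shows the standard definition of a parsimony-informative site: an alignment column in which at least two character states each occur in at least two sequences. It is a minimal Python example written for this purpose and is not taken from the CholeraSeq code base; the function name `parsimony_informative_sites`, the `ignore` parameter, and the toy alignment are all hypothetical.

```python
from collections import Counter


def parsimony_informative_sites(alignment, ignore="-Nn?"):
    """Return 0-based indices of parsimony-informative alignment columns.

    `alignment` is a list of equal-length aligned sequences (strings).
    Characters listed in `ignore` (gaps and ambiguous calls) are skipped.
    This is an illustrative helper, not part of CholeraSeq.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    informative = []
    for col in range(length):
        # Count each observed state in this column, ignoring gaps/ambiguity.
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col] not in ignore
        )
        # Informative: at least two states, each seen in at least two sequences.
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            informative.append(col)
    return informative


# Hypothetical four-sequence toy alignment, for illustration only.
toy_alignment = [
    "ACGTA",
    "ACGTC",
    "ATGAC",
    "ATG-A",
]
print(parsimony_informative_sites(toy_alignment))  # prints [1, 4]
```

In the toy alignment, column 3 varies but is not informative, because the state `A` appears in only one sequence and the gap is ignored.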

Availability and implementation: CholeraSeq is freely available on GitHub at https://github.com/CERI-KRISP/CholeraSeq. It is implemented in Nextflow with a modular design that builds on nf-core community standards.

MeSH terms

  • Cholera* / epidemiology
  • Cholera* / genetics
  • Disease Outbreaks*
  • Genome, Bacterial*
  • Genomics* / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Phylogeny
  • Software*
  • Vibrio cholerae* / genetics