Microb Genom. 2017 Jun 8;3(6):e000116.
doi: 10.1099/mgen.0.000116. eCollection 2017 Jun 30.

SNVPhyl: A Single Nucleotide Variant Phylogenomics Pipeline for Microbial Genomic Epidemiology

Free PMC article

Aaron Petkau et al. Microb Genom.

Abstract

The recent widespread application of whole-genome sequencing (WGS) for microbial disease investigations has spurred the development of new bioinformatics tools, including a notable proliferation of phylogenomics pipelines designed for infectious disease surveillance and outbreak investigation. Transitioning the use of WGS data out of the research laboratory and into the front lines of surveillance and outbreak response requires user-friendly, reproducible and scalable pipelines that have been well validated. Single Nucleotide Variant Phylogenomics (SNVPhyl) is a bioinformatics pipeline for identifying high-quality single-nucleotide variants (SNVs) and constructing a whole-genome phylogeny from a collection of WGS reads and a reference genome. Individual pipeline components are integrated into the Galaxy bioinformatics framework, enabling data analysis in a user-friendly, reproducible and scalable environment. We show that SNVPhyl can detect SNVs with high sensitivity and specificity, and can identify and remove regions of high SNV density (indicative of recombination). SNVPhyl correctly distinguishes outbreak from non-outbreak isolates across a range of variant-calling settings and sequencing-coverage thresholds, and in the presence of contamination. SNVPhyl is available as a Galaxy workflow, as Docker and virtual machine images, and as a Unix-based command-line application. SNVPhyl is released under the Apache 2.0 license and is available at http://snvphyl.readthedocs.io/ or at https://github.com/phac-nml/snvphyl-galaxy.
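The abstract notes that SNVPhyl identifies and removes regions of high SNV density, which can indicate recombination. One simple way to express this kind of filter is a sliding window over sorted SNV positions, sketched below in Python. The function names and the `window` and `threshold` parameters are hypothetical illustrations, not SNVPhyl's own code or default settings.

```python
from typing import List, Tuple


def high_density_regions(
    snv_positions: List[int], window: int, threshold: int
) -> List[Tuple[int, int]]:
    """Return (start, end) intervals where at least `threshold` SNVs fall
    within `window` consecutive bases. Parameters are hypothetical."""
    positions = sorted(set(snv_positions))
    regions: List[Tuple[int, int]] = []
    left = 0
    for right, pos in enumerate(positions):
        # Advance the left edge until SNVs left..right span fewer than `window` bases.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 >= threshold:
            start = positions[left]
            if regions and start <= regions[-1][1]:
                # Overlapping dense clusters are merged into a single region.
                regions[-1] = (regions[-1][0], pos)
            else:
                regions.append((start, pos))
    return regions


def mask_dense_snvs(
    snv_positions: List[int], window: int, threshold: int
) -> List[int]:
    """Drop SNVs that fall inside any high-density region."""
    regions = high_density_regions(snv_positions, window, threshold)
    return [
        p for p in sorted(set(snv_positions))
        if not any(start <= p <= end for start, end in regions)
    ]
```

For example, with `window=20` and `threshold=3`, the cluster at positions 500, 505 and 510 is flagged and masked, while isolated SNVs at positions 100 and 900 are kept.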

Keywords: bacterial genomics; bioinformatics; genomic epidemiology; infectious disease surveillance; phylogenomics; single nucleotide variation detection.

Figures

Fig. 1.
(a) Overview of the SNVPhyl pipeline. Input to the pipeline consists of a reference genome, a set of sequence reads for each isolate and an optional list of positions to mask from the final results. Repeat regions are identified on the reference genome, and the sequence reads are mapped to the reference and then used for variant calling. The resulting files are combined to construct an SNV alignment and a list of identified SNVs, which are further processed to construct an SNV distance matrix, a maximum-likelihood phylogeny and a summary of the identified SNVs. The software or scripts used are given in parentheses below each stage. (b) Overview of the Mapping/Variant Calling stage of SNVPhyl. Variants are called using two separate software packages and compiled together in the Variant Consolidation stage. This stage outputs a list of validated variant calls, the regions of high SNV density and quality information on the mean mapping coverage, all of which are passed on to later stages.
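Panels (a) and (b) describe consolidating calls from two variant callers and then building an SNV alignment and an SNV distance matrix. The Python sketch below illustrates that flow under simplifying assumptions: each caller's output is reduced to a hypothetical `{position: base}` dictionary per isolate, a call is kept only when both callers report the same base (an assumed consolidation rule), and positions without a retained call are filled with the reference base. Masking, repeat regions and coverage checks described above are omitted.

```python
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical per-isolate calls from one variant caller: {1-based position: base}.
Calls = Dict[int, str]


def consolidate(caller_a: Calls, caller_b: Calls) -> Calls:
    """Keep only positions where both callers report the same base
    (an assumed agreement rule, for illustration)."""
    return {pos: base for pos, base in caller_a.items() if caller_b.get(pos) == base}


def snv_alignment(reference: str, isolates: Dict[str, Calls]) -> Dict[str, str]:
    """Build an alignment over every position called in at least one isolate.
    Isolates without a call at a position carry the reference base (assumption)."""
    positions = sorted({pos for calls in isolates.values() for pos in calls})
    alignment = {"reference": "".join(reference[p - 1] for p in positions)}
    for name, calls in isolates.items():
        alignment[name] = "".join(calls.get(p, reference[p - 1]) for p in positions)
    return alignment


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise SNV distance: number of alignment columns where two rows differ."""
    dist: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(alignment, 2):
        d = sum(x != y for x, y in zip(alignment[a], alignment[b]))
        dist[(a, b)] = dist[(b, a)] = d
    return dist
```

In this sketch, a position where the two callers disagree for an isolate is dropped for that isolate before the alignment is built, and the distance between two isolates is simply the count of alignment columns at which their bases differ.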
