Comparative Study
Appl Environ Microbiol. 2020 Feb 18;86(5):e02265-19.
doi: 10.1128/AEM.02265-19. Print 2020 Feb 18.

Performance and Accuracy of Four Open-Source Tools for In Silico Serotyping of Salmonella spp. Based on Whole-Genome Short-Read Sequencing Data

Laura Uelze et al. Appl Environ Microbiol.

Abstract

We compared the performance of four open-source in silico Salmonella typing tools (SeqSero, SeqSero2, Salmonella In Silico Typing Resource [SISTR], and Metric Oriented Sequence Typer [MOST]) to assess their potential for replacing laboratory serological testing with serovar predictions from whole-genome sequencing data. We conducted a retrospective analysis of 1,624 Salmonella isolates of 72 serovars submitted to the German National Salmonella Reference Laboratory between 1999 and 2019. All isolates are derived from animal and foodstuff origins. We conducted Illumina short-read sequencing and compared the in silico serovar prediction results with the results of routine laboratory serotyping. We found the best-performing in silico serovar prediction tool to be SISTR, with 94% correctly typed isolates, followed by SeqSero2 (87%), SeqSero (81%), and MOST (79%). Furthermore, we found that mapping-based tools like SeqSero and SeqSero2 (allele mode) were more reliable for the prediction of monophasic variants, while sequence type and cluster-based methods like MOST and SISTR (core-genome multilocus sequence type [cgMLST]) showed greater resilience when confronted with GC-biased sequencing data. We showed that the choice of library preparation kit could substantially affect O antigen detection, due to the low GC content of the wzx and wzy genes. Although the accuracy of computational serovar predictions is still not quite on par with traditional serotyping by Salmonella reference laboratories, the command-line tools investigated in this study perform a rapid, efficient, inexpensive, and reproducible analysis, which can be integrated into in-house characterization pipelines. Based on our results, we find SISTR most suitable for automated, routine serotyping for public health surveillance of Salmonella.

IMPORTANCE Salmonella spp. are important foodborne pathogens. To reduce the number of infected patients, it is essential to understand which subtypes of the bacteria cause disease outbreaks. Traditionally, characterization of Salmonella requires serological testing, a laboratory method by which Salmonella isolates can be classified into over 2,600 distinct subtypes, called serovars. Due to recent advances in whole-genome sequencing, many tools have been developed to replace traditional testing methods with computational analysis of genome sequences. It is crucial to validate that these tools, many already in use for routine surveillance, deliver accurate and reliable serovar information. In this study, we set out to compare which of the currently available open-source command-line tools is most suitable to replace serological testing. A thorough evaluation of the differing computational approaches is highly important to ensure the backward compatibility of serotyping data and to maintain comparability between laboratories.

Keywords: O antigen; Salmonella; serotyping; serovar prediction; whole-genome sequencing.


Figures

FIG 1
Graphical representation of the in silico typing results by tool. In silico serotyping results were compared to laboratory serological testing and categorized as full, inconclusive, incongruent, or incorrect matches in keeping with the methodology of Yachison and colleagues (15). The stacked-bar chart shows the results summarized by tool and/or subresult. The percentages of correct results per tool are shown.
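The per-tool summary described in this caption can be illustrated with a small tally. The following is a minimal Python sketch, not the authors' code: the record layout, the `summarize` helper, and the toy input are hypothetical, and it assumes that a "correct" result corresponds to the "full" match category. Only the four category names are taken from the caption.

```python
from collections import Counter, defaultdict

# Category names come from the FIG 1 caption; everything else is illustrative.
CATEGORIES = ("full", "inconclusive", "incongruent", "incorrect")


def summarize(records):
    """Tally outcome categories per tool and compute the share of full matches.

    `records` is an iterable of (tool, category) pairs, one per typed isolate.
    Assumption: "correct" means the "full" category.
    """
    counts = defaultdict(Counter)
    for tool, category in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        counts[tool][category] += 1

    summary = {}
    for tool, tally in counts.items():
        total = sum(tally.values())
        summary[tool] = {
            "counts": {c: tally[c] for c in CATEGORIES},
            "percent_correct": 100.0 * tally["full"] / total,
        }
    return summary


# Toy input (hypothetical, not data from the study).
toy_records = [
    ("toolA", "full"),
    ("toolA", "full"),
    ("toolA", "full"),
    ("toolA", "incorrect"),
    ("toolB", "full"),
    ("toolB", "inconclusive"),
]

if __name__ == "__main__":
    for tool, result in summarize(toy_records).items():
        print(tool, result["counts"], f'{result["percent_correct"]:.1f}%')
```

Keeping the raw counts next to the percentage makes it straightforward to draw a stacked bar per tool from the same structure.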
FIG 2
Quality parameters of reads mapped against the O antigen sequence. Trimmed reads of all 1,624 isolates were mapped with SRST2 (27) against the O antigen sequence database of SISTR (containing both the wzx and the wzy gene sequences). Only the best-scoring match was considered for each isolate. Quality parameters (number of reads mapped and percent coverage) for all respective best-scoring matches were extracted, statistically evaluated, and visualized in box plots. Results are divided into four categories (A to D) depending on whether the in silico serotyping tools could successfully determine the O antigen from the sequencing data and which library kit was used; the fill colors indicate the library kit with which the respective isolates were sequenced (Flex kit, Nextera DNA Flex library preparation kit; XT kit, Nextera XT DNA library preparation kit [both Illumina]). The number of isolates per category is given in the key.
FIG 3
(a) Correlation between GC content and read depth across the wzy locus. The colored line graphs (left y axis) display the read depths of four serovar Infantis isolates mapped against a reference genome (strain NCTC6703; NCBI accession number NZ_LS483479.1). The gray dashed line (right y axis) displays the GC content of the reference genome (calculated with a Perl script available from https://github.com/DamienFr/GC-content-in-sliding-window- with a step size of 1 nucleotide). The position of the wzy gene is highlighted with a gray box and was determined through BLAST of the reference genome against the SeqSero2 O antigen database (best match, O-7_wzy_1080: 99.8% identity, 1,080 bp in length, 100% coverage, 2 mismatches). (b) The graph shows the normalized observed/expected read counts per 300 bp across the whole genome. The GC bias was calculated using Benjamini’s method (17) with the computeGCBias function of the deepTools package (18). The function counts the number of reads per GC fraction and compares them to the expected GC profile, calculated by counting the number of DNA fragments per GC fraction in a reference genome. In an ideal experiment, the observed GC profile would match the expected profile, producing a flat line at 0. The fluctuations at the ends of the x axis arise because only a small number of genome regions have extreme GC fractions, so the number of fragments picked up in the random sampling can vary. The library kits with which the respective isolates were sequenced are indicated by line colors in both panels (blue, Nextera XT DNA library preparation kit; red, Nextera DNA Flex library preparation kit [both Illumina]).
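The two calculations named in this caption can be illustrated in a few lines of Python. This is a simplified sketch under stated assumptions, not the Perl script or the deepTools implementation: the function names and toy inputs are hypothetical, and the observed/expected comparison is assumed to be a log2 ratio of normalized counts, which is consistent with the caption's "flat line at 0" for an unbiased library.

```python
import math


def gc_sliding_window(seq, window, step=1):
    """Return (start, gc_fraction) for each full window along `seq`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(base in "GC" for base in chunk)
        profile.append((start, gc / window))
    return profile


def log2_observed_expected(observed, expected):
    """Compare read counts per GC bin with the expected fragment counts.

    Both arguments map a GC-fraction bin to a count. Counts are normalized
    to proportions first, so an unbiased library gives 0.0 in every bin.
    Bins with a zero count on either side are skipped (ratio undefined).
    """
    obs_total = sum(observed.values())
    exp_total = sum(expected.values())
    ratios = {}
    for gc_bin, exp_count in expected.items():
        obs_count = observed.get(gc_bin, 0)
        if obs_count == 0 or exp_count == 0:
            continue
        ratio = (obs_count / obs_total) / (exp_count / exp_total)
        ratios[gc_bin] = math.log2(ratio)
    return ratios


if __name__ == "__main__":
    # Toy sequence and counts (hypothetical, not data from the study).
    print(gc_sliding_window("GGCCAATT", window=4, step=1))
    print(log2_observed_expected({0.3: 10, 0.5: 20}, {0.3: 20, 0.5: 20}))
```

In the second toy call, the low-GC bin is underrepresented relative to expectation and gets a negative value, which is the kind of pattern a GC-biased library would show at low-GC loci.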
FIG 4
Graphical representation of the composition of the sample set by serotypes. The pie chart shows the composition of the analyzed sample set of 1,624 isolates grouped by serotypes as determined through serological testing. Isolates that could not be typed in the laboratory are listed in the category “rough/nonmotile.”

References

    1. De Cesare A. 2018. Salmonella in foods: a re-emerging problem. Adv Food Nutr Res 86:137–179. doi:10.1016/bs.afnr.2018.02.007.
    2. Guibourdenche M, Roggentin P, Mikoleit M, Fields PI, Bockemühl J, Grimont PAD, Weill F-X. 2010. Supplement 2003–2007 (no. 47) to the White-Kauffmann-Le Minor scheme. Res Microbiol 161:26–29. doi:10.1016/j.resmic.2009.10.002.
    3. Zhang S, Yin Y, Jones MB, Zhang Z, Kaiser BLD, Dinsmore BA, Fitzgerald C, Fields PI, Deng X. 2015. Salmonella serotype determination utilizing high-throughput genome sequencing data. J Clin Microbiol 53:1685–1692. doi:10.1128/JCM.00323-15.
    4. Zhang S, den Bakker HC, Li S, Chen J, Dinsmore BA, Lane C, Lauer AC, Fields PI, Deng X. 2019. SeqSero2: rapid and improved Salmonella serotype determination using whole-genome sequencing data. Appl Environ Microbiol 85:e01746-19.
    5. Yoshida CE, Kruczkiewicz P, Laing CR, Lingohr EJ, Gannon VPJ, Nash JHE, Taboada EN. 2016. The Salmonella In Silico Typing Resource (SISTR): an open web-accessible tool for rapidly typing and subtyping draft Salmonella genome assemblies. PLoS One 11:e0147101. doi:10.1371/journal.pone.0147101.
