Rapid identification of enteric bacteria from whole genome sequences using average nucleotide identity metrics

Front Microbiol. 2023 Dec 14:14:1225207. doi: 10.3389/fmicb.2023.1225207. eCollection 2023.

Abstract

Identification of enteric bacteria species by whole genome sequence (WGS) analysis requires a rapid and an easily standardized approach. We leveraged the principles of average nucleotide identity using MUMmer (ANIm) software, which calculates the percent bases aligned between two bacterial genomes and their corresponding ANI values, to set threshold values for determining species consistent with the conventional identification methods of known species. The performance of species identification was evaluated using two datasets: the Reference Genome Dataset v2 (RGDv2), consisting of 43 enteric genome assemblies representing 32 species, and the Test Genome Dataset (TGDv1), comprising 454 genome assemblies which is designed to represent all species needed to query for identification, as well as rare and closely related species. The RGDv2 contains six Campylobacter spp., three Escherichia/Shigella spp., one Grimontia hollisae, six Listeria spp., one Photobacterium damselae, two Salmonella spp., and thirteen Vibrio spp., while the TGDv1 contains 454 enteric bacterial genomes representing 42 different species. The analysis showed that, when a standard minimum of 70% genome bases alignment existed, the ANI threshold values determined for these species were ≥95 for Escherichia/Shigella and Vibrio species, ≥93% for Salmonella species, and ≥92% for Campylobacter and Listeria species. Using these metrics, the RGDv2 accurately classified all validation strains in TGDv1 at the species level, which is consistent with the classification based on previous gold standard methods.

Keywords: ANI; WGS; average nucleotide identity; enteric bacteria; species identification.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was made possible through support from the Advanced Molecular Detection (AMD) Initiative grant number AMD-21 at the Centers for Disease Control and Prevention. Apart from the funding, the Office of AMD did not have any further involvement in the study. This project was supported, in part, by an appointment to the Research Participation Program at the Centers for Disease Control and Prevention administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the Centers for Disease Control and Prevention.