Nucleic Acids Res. 2017 Jan 25;45(2):e7.
doi: 10.1093/nar/gkw837. Epub 2016 Sep 19.

MetaMLST: multi-locus strain-level bacterial typing from metagenomic samples

Moreno Zolfo et al. Nucleic Acids Res. 2017.

Abstract

Metagenomic characterization of microbial communities has the potential to become a tool to identify pathogens in human samples. However, software tools able to extract strain-level typing information from metagenomic data are needed. Low-throughput molecular typing schema such as Multilocus Sequence Typing (MLST) are still widely used and provide a wealth of strain-level information that is currently not exploited by metagenomic methods. We introduce MetaMLST, a software tool that reconstructs the MLST loci of microorganisms present in microbial communities from metagenomic data. Tested on synthetic and spiked-in real metagenomes, the pipeline was able to reconstruct the MLST sequences with >98.5% accuracy at coverages as low as 1×. On real samples, the pipeline showed higher sensitivity than assembly-based approaches and it proved successful in identifying strains in epidemic outbreaks as well as in intestinal, skin and gastrointestinal microbiome samples.

Figures

Figure 1.
Application of MetaMLST on synthetic and semi-synthetic metagenomes highlights the high accuracy of the approach. (A) Frequency histogram for the reconstruction accuracy of MLST profiles (black) and of single MLST loci (blue and red) reconstructed by MetaMLST on synthetic metagenomes. We used a total of 12 synthetic metagenomes (2 Gbps depth each, 100 nt read length), sampled from reference genomes using an Illumina error model (10). The accuracy (i.e. percentage of identity to the reference genomes) was computed for known MLST alleles (i.e. present in existing MLST databases, in red) and for unknown MLST alleles (blue). We consider a sequence type (ST) correctly identified when >99% of the sequence is identical to the corresponding reference genomes. (B) Reconstruction accuracy of MLST loci from semi-synthetic datasets at increasing sequence coverages. Color intensity represents the average of the confidence scores attributed by MetaMLST to the reconstruction of each locus (see ‘Materials and Methods’ section). (C) Number of known STs detected by MetaMLST and by metagenomic assembly in the samples. STs detected by both approaches are in the intersections and are marked with an asterisk when the two predictions disagree. We used here a subset of HMP samples (i.e. samples from anterior nares, retroauricular crease and stool) for which a metagenomic assembly was run successfully (23). The two methods disagreed in one case each for Propionibacterium acnes, Staphylococcus epidermidis and Escherichia coli (marked with an asterisk) and agreed in all other cases. However, MetaMLST often (16 out of 31 cases) identifies more targets than metagenomic assembly.
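The accuracy measure in panel (A) can be illustrated with a minimal sketch. It assumes that the reconstructed and reference sequences are already aligned to the same length and that the >99% criterion for a correct ST is applied to the concatenated loci; the caption does not state either detail, and the function names `percent_identity` and `st_correct` are hypothetical, not part of MetaMLST.

```python
def percent_identity(reconstructed: str, reference: str) -> float:
    """Percentage of matching positions between two aligned, equal-length sequences."""
    if len(reconstructed) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(reconstructed.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def st_correct(loci_pairs, threshold=99.0):
    """Return True when identity over the concatenated loci exceeds the threshold.

    loci_pairs is a list of (reconstructed, reference) sequence pairs, one per locus.
    Concatenating the loci before scoring is an assumption of this sketch.
    """
    reconstructed = "".join(rec for rec, _ in loci_pairs)
    reference = "".join(ref for _, ref in loci_pairs)
    return percent_identity(reconstructed, reference) > threshold
```

With this definition, a single mismatch over 200 aligned positions gives 99.5% identity and passes the threshold, while a single mismatch over 100 positions gives exactly 99% and does not, because the criterion is a strict inequality.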
Figure 2.
MetaMLST typing Escherichia coli in the human gut microbiome. (A) PCA on the loci sequences reconstructed by MetaMLST (see ‘Materials and Methods’ section) without (left) and with (right) the publicly available sequences for E. coli. Samples are colored by dataset and scaled in size according to the abundance of the ST. Circles represent known STs, while triangles represent new types. The pathogenic type ST-678 associated with the German 2011 outbreak and the benign type ST-10 are highlighted and tracked by MetaMLST. (B) Minimum Spanning Tree on the STs, computed with the PHYLOViZ full MST algorithm (see ‘Materials and Methods’ section) and colored by dataset. The size of each node scales with the abundance of the ST and the color is proportional to the contribution of each dataset to that ST. New types detected by MetaMLST are circled in black. Members of the non-pathogenic Clonal Complex 10 are circled in blue to highlight its high prevalence in the Chinese population.
Figure 3.
MetaMLST applied to Staphylococcus epidermidis on 473 metagenomic samples. (A) Phylogenetic tree on the concatenated loci reconstructed by MetaMLST. Oh et al. (light gray) and HMP (dark gray) samples are associated with their metadata: body-site type (outer ring) and subject ID (intermediate ring). New STs are marked in green in the inner ring. The ‘moist’ and ‘sebaceous’ subtrees are highlighted in red and yellow, respectively. (B) Minimum Spanning Tree on the STs, computed with the PHYLOViZ full MST algorithm and colored by body-site type. Each node represents an ST and its size scales with the abundance of that ST in the dataset. For STs detected in more than one body-site type, the relative proportion at each site is shown as a pie chart inside the node. Two groups of mainly moist-associated and mainly sebaceous-associated STs are visible at the top and bottom edges of the MST.
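The idea behind a minimum spanning tree on STs can be sketched as follows. This is a plain Prim's algorithm over the Hamming distance between allelic profiles (the number of loci at which two STs carry different alleles); it is a stand-in for illustration and does not attempt to reproduce the PHYLOViZ full MST algorithm or its handling of ties. The ST names and allele numbers in the usage note are hypothetical.

```python
def hamming(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def minimum_spanning_tree(profiles):
    """Build an MST with Prim's algorithm.

    profiles maps an ST name to a tuple of allele numbers (one per locus).
    Returns a list of edges (st_in_tree, st_added, distance).
    Ties are broken by name order, an arbitrary choice made for determinism.
    """
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge that connects the tree to a new ST.
        distance, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in sorted(in_tree)
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, distance))
        in_tree.add(v)
    return edges
```

For example, with four hypothetical seven-locus profiles `{"ST-A": (1,1,1,1,1,1,1), "ST-B": (1,1,1,1,1,1,2), "ST-C": (1,1,1,1,1,2,2), "ST-D": (5,5,5,5,5,5,5)}`, the tree links ST-A to ST-B and ST-B to ST-C through single-locus differences, and attaches the unrelated ST-D through an edge of weight 7.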
Figure 4.
MetaMLST applied to Staphylococcus aureus (A) and Propionibacterium acnes (B) on 473 metagenomic samples. Phylogenetic trees on the concatenated loci reconstructed by MetaMLST. Oh et al. (light gray) and HMP (dark gray) samples are associated with their metadata: subject ID (intermediate ring) and body-site type (outer ring). New STs are marked in green in the inner ring. The S. aureus tree (A) was computed together with available reference genomes. In particular, subjects sh01 (nine samples, light blue) and hv10 (eight samples, purple) were colonized with either the same or very closely related S. aureus. For P. acnes we show instead both STs that are highly conserved in one subject (ST-70, blue arc) and STs that are highly prevalent across different subjects (STs 2, 3, 4 and 1). Prevalence in the cohort and occurrence within each subject are reported in Supplementary Table S8.
Figure 5.
PCA plot of the publicly available MLST types for Helicobacter pylori, colored by aggregated structure population. The concatenated MLST-loci sequences of those H. pylori isolates that could be associated with a structure population in the PubMLST database (5) were analyzed in PCA space. The colors represent continental groups of structure populations; the black triangle indicates the MLST type inferred metagenomically from the Iceman sample. The reconstructed loci of the ancient H. pylori lie at the boundary between European (orange) and Asian (yellow green) types. Values in brackets for PC1 and PC2 give the proportion of variance explained by each component.

References

    1. Maiden M.C., Bygraves J.A., Feil E., Morelli G., Russell J.E., Urwin R., Zhang Q., Zhou J., Zurth K., Caugant D.A., et al. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. U.S.A. 1998;95:3140–3145.
    2. Zhang R., Gu D.X., Huang Y.L., Chan E.W., Chen G.X., Chen S. Comparative genetic characterization of Enteroaggregative Escherichia coli strains recovered from clinical and non-clinical settings. Sci. Rep. 2016;6:e24321.
    3. Fazeli H., Sadighian H., Esfahani B.N., Pourmand M.R. Genetic characterization of Pseudomonas aeruginosa-resistant isolates at the university teaching hospital in Iran. Adv. Biomed. Res. 2015;4:e156.
    4. Bougnoux M.E., Tavanti A., Bouchier C., Gow N.A., Magnier A., Davidson A.D., Maiden M.C., D'Enfert C., Odds F.C. Collaborative consensus for optimized multilocus sequence typing of Candida albicans. J. Clin. Microbiol. 2003;41:5265–5266.
    5. Jolley K.A., Maiden M.C. BIGSdb: scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics. 2010;11:e595.
