Nucleic Acids Res. 2018 Jul 2;46(W1):W282-W288.
doi: 10.1093/nar/gky467.

The Microbial Genomes Atlas (MiGA) Webserver: Taxonomic and Gene Diversity Analysis of Archaea and Bacteria at the Whole Genome Level

Free PMC article

Luis M Rodriguez-R et al. Nucleic Acids Res. 2018.

Abstract

The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called 'Clade Project') to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at http://microbial-genomes.org/.

Figures

Figure 1.
Estimate the confidence of taxonomic assignments based on AAI values. (A) Distributions of AAI values for genome pairs in the RefSeq database per rank of lowest common taxon. The central 90%, 95% and 99% ranges are shown on each rank. Outlier taxa, listed in supplementary methods, are presented here as gray dots. Note that values between Escherichia coli and Shigella spp. significantly deviate from the central tendency, but are not excluded because the lowest common taxon (family Enterobacteriaceae) is not significantly affected. (B) Average AAI values per taxon by lowest common taxon. Within each row, dot sizes linearly reflect the size of the taxon. Outlier taxa (excluded from the empirical distributions) are highlighted. (C) Empirical P-values (per AAI to best match) for the alternative hypotheses that (i) the query and reference genomes share the same classification at the given rank (taxonomic classification test; solid lines) or (ii) the query genome represents a novel taxon with respect to the database (taxonomic novelty test; dashed lines). Colors correspond to the evaluated rank (as in A and B).
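The two tests in panel C can be illustrated with a minimal sketch. It assumes, as a simplification that the caption does not confirm, that the classification test asks how often reference pairs that do not share a taxon at the given rank reach an AAI at least as high as the query's best match, and that the novelty test asks how often pairs that do share a taxon fall at or below that AAI. The function name `empirical_p` and all AAI values are hypothetical and are not taken from MiGA or RefSeq.

```python
from bisect import bisect_left, bisect_right


def empirical_p(reference_aai, observed, tail):
    """Fraction of reference AAI values at least as extreme as `observed`.

    tail="upper" counts values >= observed; tail="lower" counts values <= observed.
    """
    values = sorted(reference_aai)
    if not values:
        raise ValueError("reference distribution is empty")
    if tail == "upper":
        hits = len(values) - bisect_left(values, observed)
    elif tail == "lower":
        hits = bisect_right(values, observed)
    else:
        raise ValueError("tail must be 'upper' or 'lower'")
    return hits / len(values)


# Toy AAI values (%) for reference genome pairs at genus rank (made up).
same_genus = [62.0, 65.5, 68.0, 71.2, 74.9, 80.3, 88.1, 93.4]
different_genus = [41.0, 44.5, 47.2, 50.8, 53.1, 57.6, 60.2, 66.0]

best_match_aai = 72.0  # hypothetical AAI between the query and its best match

# Assumed classification test: how often do pairs from different genera look this similar?
p_classification = empirical_p(different_genus, best_match_aai, "upper")

# Assumed novelty test: how often do pairs from the same genus look this dissimilar?
p_novelty = empirical_p(same_genus, best_match_aai, "lower")

print(p_classification, p_novelty)
```

Under these assumptions, a small `p_classification` supports placing the query in the same genus as its best match, while a small `p_novelty` would instead suggest a genus absent from the database.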
Figure 2.
Clade Project on Bacillus cereus sensu lato (s.l.). Collection of publicly available genomes from B. cereus, B. thuringiensis, B. mycoides, B. pseudomycoides and B. anthracis species, a clade of highly related species collectively known as B. cereus s.l. (A) Schematic indicating the various definitions used in Clade Projects. The matrix represents genes organized by groups of orthology (OG; rows) per genome (columns). Absent genes (genomes missing a given OG) are indicated with empty boxes, while solid boxes represent present genes. Multiple boxes in the same cell indicate multiple copies (i.e., internal paralogs). The collection of all non-redundant OGs is termed pangenome, the subset of OGs present in all genomes is termed core genome, and the subset of OGs present in single copy in all genomes is termed Unus genome. (B) Rarefied OG counts in the Pangenome and the Core genome per number of sampled genomes. This graphical output is directly generated by MiGA. (C) Comparison between the ANI clustering (left cladogram) and canonical SNP scheme (right cladogram) for the B. anthracis genomes in the collection. Note that both techniques produce the same large groupings (colors), but ANI offers higher resolution, with subclades defined within each clade up to four degrees.
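The definitions in panel A and the rarefaction in panel B can be made concrete with a short sketch. It is an illustration only, not MiGA's implementation: the function names, the dictionary layout (genome to OG to copy number), and the toy data are all assumptions.

```python
import random


def clade_gene_sets(copies):
    """Return (pangenome, core, unus) as sets of OG identifiers.

    `copies` maps genome -> {OG: copy number}; absent OGs are omitted or 0.
    """
    if not copies:
        raise ValueError("at least one genome is required")
    genomes = list(copies)
    present = {g: {og for og, n in copies[g].items() if n > 0} for g in genomes}
    pangenome = set().union(*present.values())        # OGs found in any genome
    core = set.intersection(*present.values())        # OGs found in every genome
    # Unus genome: core OGs with exactly one copy in every genome.
    unus = {og for og in core if all(copies[g][og] == 1 for g in genomes)}
    return pangenome, core, unus


def rarefy(copies, k, replicates=100, seed=0):
    """Mean pangenome and core sizes over random subsets of k genomes."""
    rng = random.Random(seed)
    genomes = sorted(copies)
    pan_total = core_total = 0
    for _ in range(replicates):
        subset = rng.sample(genomes, k)
        pan, core, _unus = clade_gene_sets({g: copies[g] for g in subset})
        pan_total += len(pan)
        core_total += len(core)
    return pan_total / replicates, core_total / replicates


# Toy matrix (hypothetical): three genomes, five orthology groups.
toy = {
    "genomeA": {"OG1": 1, "OG2": 2, "OG3": 1},
    "genomeB": {"OG1": 1, "OG2": 1, "OG4": 1},
    "genomeC": {"OG1": 1, "OG2": 1, "OG3": 1, "OG5": 3},
}

pangenome, core, unus = clade_gene_sets(toy)
print(sorted(pangenome), sorted(core), sorted(unus))
print(rarefy(toy, 3))
```

In the toy data, OG2 is in the core genome but not the Unus genome because genomeA carries two copies. As more genomes are sampled, the pangenome count can only grow and the core count can only shrink, which is the shape a rarefaction plot like panel B would show.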
Figure 3.
Classification examples in MiGA Online. Two examples of the typical output of MiGA classification analysis are shown. The top panel shows a genome classified as Bacillus bombysepticus, displaying high ANI values against its best match and consequently high classification confidence. The bottom panel shows a query genome with only distant relatives available in the database (maximum AAI of 48%), classified to order level (Chromatiales) with low confidence and to class level (Gammaproteobacteria) with high confidence.
Figure 4.
MiGA Workflow. MiGA users can initialize their analyses from raw reads or assemblies, either for genomic datasets (isolate genomes, single-cell amplified genomes, or metagenome-assembled genomes) or for metagenomic datasets (microbial metagenomes or viral enrichment metagenomes). After basic pre-processing, MiGA will query the resulting sequences against its reference genome sequences to identify the closest relatives using a hierarchical hAAI/AAI/ANI scheme, and determine the best-supported taxonomic assignment.
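One way to picture a hierarchical hAAI/AAI/ANI search is as a tiered comparison that applies the cheapest metric to every reference and the more expensive ones only to promising candidates. The sketch below is a generic illustration of that idea and should not be read as MiGA's actual procedure: the caption gives neither cutoffs nor the escalation rule, so the shortlist size, the 90% AAI cutoff, the function name, and the three metric callables are all placeholders.

```python
def find_closest(query, references, haai, aai, ani, shortlist=3, ani_min_aai=90.0):
    """Tiered search for the closest reference genome (illustrative only).

    haai, aai, ani: callables (query, reference) -> identity in percent.
    shortlist and ani_min_aai are placeholder values, not MiGA settings.
    """
    if not references:
        raise ValueError("no reference genomes given")

    # Tier 1: rank every reference with the cheap estimate and keep a shortlist.
    ranked = sorted(references, key=lambda r: haai(query, r), reverse=True)
    candidates = ranked[:shortlist]

    # Tier 2: compute AAI only for the shortlisted references.
    aai_by_ref = {r: aai(query, r) for r in candidates}
    best_ref = max(aai_by_ref, key=aai_by_ref.get)
    result = {"reference": best_ref, "aai": aai_by_ref[best_ref], "ani": None}

    # Tier 3: compute ANI only for references close enough for it to be informative.
    close = [r for r in candidates if aai_by_ref[r] >= ani_min_aai]
    if close:
        ani_by_ref = {r: ani(query, r) for r in close}
        best_ref = max(ani_by_ref, key=ani_by_ref.get)
        result = {"reference": best_ref, "aai": aai_by_ref[best_ref],
                  "ani": ani_by_ref[best_ref]}
    return result


# Toy lookup tables standing in for real genome comparisons (made-up values).
haai_table = {"r1": 95.0, "r2": 93.0, "r3": 60.0, "r4": 40.0}
aai_table = {"r1": 94.0, "r2": 96.0, "r3": 58.0, "r4": 42.0}
ani_table = {"r1": 97.5, "r2": 96.8}

hit = find_closest(
    "query", list(haai_table),
    haai=lambda q, r: haai_table[r],
    aai=lambda q, r: aai_table[r],
    ani=lambda q, r: ani_table[r],
)
print(hit)
```

The point of the tiers is cost: the expensive comparisons run on a handful of candidates rather than on the whole reference collection, and ANI is skipped entirely when no reference is a close relative.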
