Int J Syst Evol Microbiol, 67 (5), 1613-1617

Introducing EzBioCloud: A Taxonomically United Database of 16S rRNA Gene Sequences and Whole-Genome Assemblies

Seok-Hwan Yoon et al. Int J Syst Evol Microbiol.

Abstract

Recent advances in DNA sequencing technologies have made genome sequence data widely available, providing a means for more informative and precise classification and identification of members of the Bacteria and Archaea. Because the current species definition is based on the comparison of genome sequences between type and other strains in a given species, building a genome database with correct taxonomic information is of paramount importance for exploring prokaryotic diversity, discovering novel species and performing routine identifications. Here we introduce an integrated database, called EzBioCloud, that holds the taxonomic hierarchy of the Bacteria and Archaea, which is represented by quality-controlled 16S rRNA gene and genome sequences. Whole-genome assemblies in the NCBI Assembly Database were screened to exclude low-quality entries and then subjected to a composite identification bioinformatics pipeline that employs gene-based searches followed by the calculation of average nucleotide identity. As a result, the database comprises 61 700 species/phylotypes, including 13 132 with validly published names, and 62 362 whole-genome assemblies that were identified taxonomically at the genus, species and subspecies levels. Genomic properties, such as genome size and DNA G+C content, and the occurrence in human microbiome data were calculated for each genus or higher taxon. This united database of taxonomy, 16S rRNA gene and genome sequences, with accompanying bioinformatics tools, should accelerate genome-based classification and identification of members of the Bacteria and Archaea. The database and related search tools are available at www.ezbiocloud.net/.
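
The two-stage idea in the abstract (a gene-based search to shortlist candidates, then average nucleotide identity to decide) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `Reference` class, the `identify` function and both cutoff values are assumptions chosen for illustration, since the abstract does not state the thresholds or the interfaces the authors used. The similarity and ANI numbers in the usage lines are toy values, not measurements.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed cutoffs for illustration only; the abstract does not give them.
SSU_CANDIDATE_CUTOFF = 98.7  # % 16S rRNA gene similarity needed to shortlist a reference
ANI_SPECIES_CUTOFF = 95.0    # % ANI needed to accept a species-level match


@dataclass
class Reference:
    """A reference species and the query's precomputed 16S similarity to it."""
    species: str
    ssu_similarity: float


def identify(references: List[Reference],
             ani_to: Callable[[str], float]) -> Optional[str]:
    """Return the best-matching species name, or None if nothing passes both stages.

    Stage 1 keeps references whose 16S similarity reaches the shortlist cutoff.
    Stage 2 computes ANI only for that shortlist and keeps the highest value
    that reaches the species cutoff.
    """
    shortlist = [r for r in references if r.ssu_similarity >= SSU_CANDIDATE_CUTOFF]
    best_species = None
    best_ani = ANI_SPECIES_CUTOFF
    for ref in shortlist:
        ani = ani_to(ref.species)  # hypothetical callback wrapping an ANI tool
        if ani >= best_ani:
            best_species, best_ani = ref.species, ani
    return best_species


# Toy usage with made-up numbers for three hypothetical reference species.
toy_refs = [Reference("species A", 99.5),
            Reference("species B", 99.0),
            Reference("species C", 90.0)]
toy_ani = {"species A": 93.0, "species B": 98.6, "species C": 99.9}
result = identify(toy_refs, toy_ani.__getitem__)
```

In this toy run `result` is `"species B"`: species C is never compared by ANI because it fails the 16S shortlist, and species A is shortlisted but falls below the assumed ANI cutoff. A query for which `identify` returns `None` would correspond to a candidate novel species in this simplified picture.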

Conflict of interest statement

Authors are employees of ChunLab, Inc., a company that provides bioinformatics services in microbial genomics and metagenomics. ChunLab paid for the research.

Figures

Fig. 1.
Schematic diagram of the algorithm for taxonomic identification of whole-genome assemblies (WGAs). The search engine used was the composite one described in detail in the text. Candidate novel species were added to the EzBioCloud database when a valid 16S rRNA gene sequence became available.
Fig. 2.
Examples of UPGMA dendrograms generated from a query WGA and reference genomes. (a) Strain 6_1_63FAA (NCBI Assembly accession GCF_000209425.1) is labelled as Lachnospiraceae bacterium 6_1_63FAA, but was identified as a strain of Blautia hansenii with an ANI value of 98.7 %. (b) Strain TUMSAT_H03_S5 (GCF_000591535.1) was originally deposited as a strain of Vibrio parahaemolyticus, but was identified as a strain of Vibrio alginolyticus with an OrthoANIu value of 98.6 %.
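
To show how a UPGMA dendrogram of the kind described for Fig. 2 can be derived from pairwise identity values, here is a small pure-Python sketch. It assumes that distance is taken as 100 minus the ANI percentage, which the caption does not state, and the function names `ani_to_distance` and `upgma` are hypothetical. The genome labels and ANI values are toy data, not the values behind the published figure.

```python
from itertools import combinations


def ani_to_distance(ani_percent):
    """Assumed conversion: distance = 100 - ANI (%)."""
    return 100.0 - ani_percent


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering.

    labels: list of leaf names.
    dist:   dict mapping frozenset({a, b}) to the distance between leaves a and b.
    Returns (tree, merges), where tree is a nested tuple and merges lists
    (left, right, height) for each join, with height equal to half the
    distance between the joined clusters.
    """
    clusters = {label: 1 for label in labels}  # cluster -> number of leaves
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Size-weighted average distance from the merged cluster to every other cluster.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
        merges.append((a, b, height))
    (tree,) = clusters
    return tree, merges


# Toy data: one query genome and two hypothetical reference genomes.
toy_ani = {("query", "ref1"): 98.0, ("query", "ref2"): 90.0, ("ref1", "ref2"): 88.0}
toy_dist = {frozenset(pair): ani_to_distance(ani) for pair, ani in toy_ani.items()}
tree, merges = upgma(["query", "ref1", "ref2"], toy_dist)
```

With these toy values the query joins `ref1` first, and `ref2` attaches to that pair afterwards, which mirrors how a query WGA clusters with its closest reference genome in a dendrogram.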
