A complete domain-to-species taxonomy for Bacteria and Archaea

Nat Biotechnol. 2020 Sep;38(9):1079-1086. doi: 10.1038/s41587-020-0501-8. Epub 2020 Apr 27.

Abstract

The Genome Taxonomy Database is a phylogenetically consistent, genome-based taxonomy that provides rank-normalized classifications for ~150,000 bacterial and archaeal genomes from domain to genus. However, almost 40% of the genomes in the Genome Taxonomy Database lack a species name. We address this limitation by using commonly accepted average nucleotide identity criteria to set bounds on species and propose species clusters that encompass all publicly available bacterial and archaeal genomes. Unlike previous average nucleotide identity studies, we chose a single representative genome to serve as the effective nomenclatural 'type' defining each species. Of the 24,706 proposed species clusters, 8,792 are based on published names. We assigned placeholder names to the remaining 15,914 species clusters to provide names to the growing number of genomes from uncultivated species. This resource provides a complete domain-to-species taxonomic framework for bacterial and archaeal genomes, which will facilitate research on uncultivated species and improve communication of scientific results.
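The clustering idea described above, where one representative genome defines each species and other genomes join it when their average nucleotide identity (ANI) to that representative falls within the species bound, can be illustrated with a minimal sketch. This is not the authors' procedure. The abstract does not state the threshold or how representatives are selected, so the 95% cutoff, the input ordering (assumed to list preferred representatives first), the function name `cluster_species`, and the toy ANI values are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List

# Assumed species bound; the abstract only says "commonly accepted" ANI criteria.
ANI_THRESHOLD = 95.0


def cluster_species(
    genomes: List[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = ANI_THRESHOLD,
) -> Dict[str, List[str]]:
    """Greedy representative-based clustering (illustrative only).

    `genomes` is assumed to be ordered so that preferred representatives
    (for example, type-strain genomes) come first. `ani` maps an unordered
    genome pair to its ANI in percent; missing pairs are treated as 0.
    """
    clusters: Dict[str, List[str]] = {}  # representative -> member genomes
    for genome in genomes:
        best_rep = None
        best_ani = threshold
        # Find the closest existing representative at or above the threshold.
        for rep in clusters:
            value = ani.get(frozenset((genome, rep)), 0.0)
            if value >= best_ani:
                best_rep, best_ani = rep, value
        if best_rep is None:
            # No representative is close enough: this genome founds a new cluster.
            clusters[genome] = [genome]
        else:
            clusters[best_rep].append(genome)
    return clusters


# Hypothetical ANI values for four made-up genomes.
toy_ani = {
    frozenset(("g1", "g2")): 98.2,
    frozenset(("g1", "g3")): 82.0,
    frozenset(("g2", "g3")): 81.5,
    frozenset(("g1", "g4")): 80.0,
    frozenset(("g3", "g4")): 96.1,
}

species = cluster_species(["g1", "g2", "g3", "g4"], toy_ani)
print(species)
```

Because every genome is compared only against representatives, not against all other members, each cluster is anchored to a single genome, which mirrors the role of the nomenclatural 'type' described in the abstract.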

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Archaea / classification*
  • Archaea / genetics
  • Bacteria / classification*
  • Bacteria / genetics
  • Databases, Genetic
  • Genome, Archaeal / genetics
  • Genome, Bacterial / genetics
  • Nucleic Acid Hybridization
  • Phylogeny*
  • Reproducibility of Results