Review
36 (21), 6688-719

Genomics of Bacteria and Archaea: The Emerging Dynamic View of the Prokaryotic World

Eugene V Koonin et al. Nucleic Acids Res.

Abstract

The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often distant, organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the 'Tree of Life' model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution.

Figures

Figure 1.
The temporal dynamics of genome sequencing for bacteria and archaea. Bacteria: doubling time ∼20 months. Archaea: doubling time ∼34 months.
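The doubling times quoted in this caption translate directly into fold increases under a pure exponential model. The sketch below is illustrative only: the function name `fold_increase` and the constants are assumptions introduced here, not part of the original analysis, and the doubling times are the approximate values from the caption.

```python
def fold_increase(months: float, doubling_time_months: float) -> float:
    """Fold increase in the number of sequenced genomes after `months`,
    assuming pure exponential growth with a fixed doubling time."""
    return 2.0 ** (months / doubling_time_months)


# Approximate doubling times taken from the caption.
BACTERIA_DOUBLING_MONTHS = 20.0
ARCHAEA_DOUBLING_MONTHS = 34.0

# Fold increase over ten years (120 months) for each domain.
bacteria_10y = fold_increase(120, BACTERIA_DOUBLING_MONTHS)
archaea_10y = fold_increase(120, ARCHAEA_DOUBLING_MONTHS)
```

Under these assumptions, ten years corresponds to a 64-fold increase for bacteria (six doublings) and roughly an 11.5-fold increase for archaea.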
Figure 2.
Distribution of genome sizes among bacteria and archaea. The distribution curves were obtained by Gaussian-kernel smoothing of the individual data points (276).
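Gaussian-kernel smoothing of individual data points can be sketched as follows. This is a minimal illustration, not the procedure of reference (276): the bandwidth, the helper name `gaussian_kde`, and the genome sizes are all hypothetical values chosen for the example.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return a density function that averages Gaussian kernels
    centred on each data point (kernel width = bandwidth)."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density


# Hypothetical genome sizes in megabases (illustrative, not data from the paper).
sizes_mb = [1.8, 2.0, 2.2, 4.6, 5.0, 5.2]
density = gaussian_kde(sizes_mb, bandwidth=0.4)  # bandwidth is an assumption

# Evaluate the smoothed curve on a regular grid from 0 to 8 Mb.
step = 0.1
grid = [i * step for i in range(81)]
curve = [density(x) for x in grid]
```

The choice of bandwidth controls how strongly neighbouring points are merged: a small value keeps separate peaks distinct, while a large value blurs them into one.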
Figure 3.
Density of protein-coding genes in bacterial and archaeal genomes. The distribution curves were obtained by Gaussian-kernel smoothing of the individual data points (276).
Figure 4.
Length distributions of protein-coding genes (a) and intergenic regions (b) in bacterial and archaeal genomes. The distribution curves were obtained by Gaussian-kernel smoothing of the individual data points (276).
Figure 5.
Coverage of bacterial and archaeal genomes with clusters of orthologous genes (COGs). The COGs were from the EggNOG database (41), and the proteins from each genome were assigned to these clusters using a modified COGNITOR method (42).
Figure 6.
Representation of bacteria and archaea in clusters of orthologs: core, shell and cloud. (a) Distribution of clusters of orthologs [from EggNOG (41)] by the number of included genomes (linear plot); (b) distribution of clusters of orthologs by the number of included genomes (semi-logarithmic plot) and approximation with three exponential functions.
Figure 7.
Common and rare genes in selected archaeal and bacterial genomes. Red, core; green, shell; light gray, cloud; dark gray, ORFans. The assignment of the genes from each genome to one of the four classes was based on their inclusion in the core, shell or cloud EggNOGs (Figure 6); the remaining genes were classified as ORFans.
Figure 8.
Genome–COG vectors. A fragment of the complete genome–COG matrix is shown. The number 1 indicates the presence and 0 indicates the absence of a gene(s) from the given genome in the given COG.
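A presence/absence matrix of this kind can be sketched as below. The genome names and the identifier `COG_X` are hypothetical placeholders, and the helper `genome_cog_matrix` is an assumption made for illustration; only the 1/0 encoding follows the caption.

```python
def genome_cog_matrix(genome_to_cogs):
    """Build a binary genome-by-COG matrix: 1 if the genome has at least
    one gene in the COG, 0 otherwise. Rows and columns are sorted by name."""
    genomes = sorted(genome_to_cogs)
    cogs = sorted(set().union(*genome_to_cogs.values()))
    matrix = [
        [1 if cog in genome_to_cogs[g] else 0 for cog in cogs]
        for g in genomes
    ]
    return genomes, cogs, matrix


# Hypothetical assignments (illustrative, not data from the paper).
example = {
    "genome_A": {"COG0030", "COG0206"},
    "genome_B": {"COG0030"},
    "genome_C": {"COG0206", "COG_X"},
}

genomes, cogs, matrix = genome_cog_matrix(example)

# Column sums give the number of genomes represented in each COG,
# the quantity whose distribution is plotted in Figure 6.
genomes_per_cog = [sum(column) for column in zip(*matrix)]
```

Each row of `matrix` is one genome–COG vector, so genomes can be compared by any distance defined on binary vectors.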
Figure 9.
The prokaryotic genome space: a SOM. The SOM was produced using a custom script that implements the Kohonen algorithm (48).
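For readers unfamiliar with self-organizing maps, the core of the Kohonen algorithm can be sketched in a few lines. This is not the authors' custom script: the one-dimensional map, the linear decay schedules, the parameter defaults, and the input vectors are all assumptions chosen to keep the example small.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the map node whose weight vector is closest to x
    (squared Euclidean distance)."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(data, n_nodes, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map (a line of n_nodes nodes) on `data`."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = n_nodes / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)  # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood: nodes near the winner move most.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


# Hypothetical binary genome-COG vectors (illustrative only).
vectors = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
som = train_som(vectors, n_nodes=4)
placements = [best_matching_unit(som, v) for v in vectors]
```

After training, `placements` assigns each input vector to a node on the map; similar vectors tend to land on the same or neighbouring nodes.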
Figure 10.
Distribution of predicted gene functional classes for selected archaeal and bacterial genomes. Red, information processing genes; blue, genes involved in cellular functions; green, genes involved in metabolism and transport; light gray, general prediction only; dark gray, no prediction. The function class assignment is based on the inclusion of the respective genes in COGs (34).
Figure 11.
Distributions of the number of organisms in clusters of orthologs for informational and operational genes. Translation, transcription, and replication and repair are informational function classes, and the rest are operational function classes. The distribution curves were obtained by Gaussian-kernel smoothing of the individual data points (276).
Figure 12.
The function space of prokaryotes: a SOM. The SOM was produced using a custom script that implements the Kohonen algorithm (48).
Figure 13.
Evolution of gene order in bacteria and archaea: genomic dot-plots. (a) Colinearity with a few breakpoints between closely related bacteria: Geobacillus thermodenitrificans versus Geobacillus kaustophilus; (b) X-shaped pattern between moderately diverged bacteria: Shewanella sp. MR-4 versus Shewanella oneidensis; (c) X-shaped pattern between moderately diverged archaea: Pyrococcus horikoshii OT3 versus Pyrococcus abyssi GE5; and (d) No clear pattern between more distantly related bacteria: Streptococcus gordonii str. Challis versus Streptococcus pneumoniae R6. In each panel, the genome indicated first is plotted along the vertical axis.
Figure 14.
Scaling of genes in different functional categories with the total number of genes in archaeal and bacterial genomes. (a) Data for individual protein-coding genes. (b) Data for COGs. The function class assignment is based on the inclusion of the respective genes in COGs (34).
Figure 15.
The taxonomic breakdown of the best database hits for proteins encoded in diverse bacterial and archaeal genomes. (a) A mesophilic bacterium, Bifidobacterium longum (Biflo), compared to a hyperthermophilic bacterium, T. maritima (Thema). (b) A mesophilic archaeon, M. mazei (Metma), compared to a hyperthermophilic archaeon, Sulfolobus solfataricus (Sulso). The best hits were obtained by processing the results of searches of the NCBI nonredundant protein sequence database using the BLASTP program (277).
Figure 16.
Two cases of readily demonstrable horizontal gene transfer between archaea and bacteria. (a) COG0030, dimethyladenosine transferase, an enzyme involved in rRNA methylation. (b) COG0206, FtsZ, a GTPase involved in cell division. Blue, bacteria; magenta, archaea. The trees were constructed using the maximum likelihood method implemented in the PhyML software (278) (WAG evolutionary model; γ-distributed site-specific rates with the shape parameter 1.0). The complete information on the analyzed sequences and the alignments are available from the authors upon request.
Figure 17.
The dynamic view of the prokaryotic world. The figure is a conceptual schematic representation that is not based on specific data. The larger blue circles denote extant (solid lines) or ancestral (dashed lines) archaeal and bacterial genomes. The small red circles denote mobilome components such as plasmids or phages. Gray lines denote vertical inheritance of genes; green lines denote recent (solid) or ancient (dashed) HGT; red lines denote the permanent ongoing process of the exchange of genetic material between mobilome elements. The thickness of connecting lines reflects the intensity of gene transfer between the respective genetic elements.
Figure 18.
The principal forces of evolution in prokaryotes and their effects on archaeal and bacterial genomes. The horizontal line shows archaeal and bacterial genome size on a logarithmic scale (in megabase pairs) and the approximate corresponding number of genes (in parentheses). On this axis, some values that are important in the context of comparative genomics are roughly mapped: the two peaks of genome size distribution (Figure 2); ‘Van Nimwegen Limit’ (VNL) determined by the ‘cellular bureaucracy’ burden; the minimal genome size of free-living archaea and bacteria (MFL); the minimal genome size inferred by genome comparison [MG, (133,135,136)]; the smallest (C.r., C. ruddii); and the largest (S.c., S. cellulosum) known bacterial genome size. The effects of the main forces of prokaryotic genome evolution are denoted by triangles that are positioned, roughly, over the ranges of genome size for which the corresponding effects are thought to be most pronounced.
Figure 19.
The dependence between genome size and selection pressure in prokaryotes. The data are from the analysis of 41 alignable tight genome clusters (ATGCs) of bacteria and archaea [(240); P.S. Novichkov, Y.I.W., I. Dubchak and E.V.K., unpublished data]. DN is the median of dN, and DS is the median of dS for the respective ATGC. The greater the DN/DS ratio, the lower the pressure of purifying selection affecting the evolution of the genomes within an ATGC is considered to be. Rs is the Spearman rank correlation coefficient.
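The two statistics named in this caption can be sketched as follows. The helper names are assumptions, and nothing here reproduces the unpublished ATGC data. The Spearman coefficient is computed as the Pearson correlation of average ranks, which is the standard definition when ties are present.

```python
from statistics import median


def dn_ds_ratio(dn_values, ds_values):
    """DN/DS for one cluster: median dN divided by median dS."""
    return median(dn_values) / median(ds_values)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given one genome size and one DN/DS value per ATGC, `spearman(genome_sizes, ratios)` would yield a coefficient of the kind reported as Rs; a negative value indicates that larger genomes tend to have lower DN/DS.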

References

    1. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 1995;269:496–512.
    2. Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM, et al. The minimal gene complement of Mycoplasma genitalium. Science. 1995;270:397–403.
    3. Koonin EV, Mushegian AR. Complete genome sequences of cellular life forms: glimpses of theoretical evolutionary genomics. Curr. Opin. Genet. Dev. 1996;6:757–762.
    4. Koonin EV, Mushegian AR, Rudd KE. Sequencing and analysis of bacterial genomes. Curr. Biol. 1996;6:404–416.
    5. Entrez Genome Project. 2008. National Center for Biotechnology Information, NIH, Bethesda. Available at http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi (accessed 10 June 2008).
