Proc Natl Acad Sci U S A. 2004 Jun 29;101(26):9722-7.
doi: 10.1073/pnas.0400975101. Epub 2004 Jun 21.

Computational Inference of Scenarios for Alpha-Proteobacterial Genome Evolution

Free PMC article

Bastien Boussau et al. Proc Natl Acad Sci U S A. 2004.

Abstract

The alpha-proteobacteria, from which mitochondria are thought to have originated, display a 10-fold genome size variation and provide an excellent model system for studies of genome size evolution in bacteria. Here, we use computational approaches to infer ancestral gene sets and to quantify the flux of genes along the branches of the alpha-proteobacterial species tree. Our study reveals massive gene expansions at branches diversifying plant-associated bacteria and extreme losses at branches separating intracellular bacteria of animals and humans. Alterations in gene numbers have mostly affected functional categories associated with regulation, transport, and small-molecule metabolism, many of which are encoded by paralogous gene families located on auxiliary chromosomes. The results suggest that the alpha-proteobacterial ancestor contained 3,000-5,000 genes and was a free-living, aerobic, and motile bacterium with pili and surface proteins for host cell and environmental interactions. Approximately one third of the ancestral gene set has no homologs among the eukaryotes. More than 40% of the genes without eukaryotic counterparts encode proteins that are conserved among the alpha-proteobacteria but for which no function has yet been identified. These genes that never made it into the eukaryotes but are widely distributed in bacteria may represent bacterial drug targets and should be prime candidates for future functional characterization.

Figures

Fig. 3.
Inference of deletions/duplications and gene-genesis events based on the α-proteobacterial tree, made with different clustering levels and penalty values. (a) Inference based on proteins already classified in COGs (23), to which we added COGs containing proteins in three or more species internally related by best hits (58,171 proteins in total). (b) Inference based on the complete set of proteins (73,658 proteins in total). Gene contents were inferred with the ACCTRAN option for parsimony analysis in PAUP*, with penalties for duplication, deletion, and gene genesis set to 1, 1, and 5, respectively. Numbers along branches give the numbers of duplications, losses, and gene-genesis events, respectively. Numbers at nodes give the putative number of genes in the inferred genome at that node. Outgroup sequences are as described for Fig. 2, but they were pruned from the tree shown here. Abbreviations for species names are as described in the legends to Figs. 1 and 2.
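To make the role of the three penalties concrete, the following is a minimal sketch of weighted (Sankoff-style) parsimony for the copy number of a single gene family on a small tree. It is not the PAUP* ACCTRAN procedure used in the paper. The mapping of penalties to copy-number changes in `step_cost`, the cap `MAX_COPIES`, and the toy tree and leaf names are all assumptions made for illustration.

```python
from math import inf

# Penalties quoted in the Fig. 3 legend: duplication 1, deletion 1, gene genesis 5.
DUP, DEL, GEN = 1, 1, 5
MAX_COPIES = 3  # hypothetical cap on copy number; leaf counts must not exceed it


def step_cost(parent, child):
    """Assumed cost of changing a family's copy number along one branch."""
    if child == parent:
        return 0
    if child < parent:
        return DEL * (parent - child)
    if parent == 0:
        # Family absent in the parent: one genesis event, then duplications.
        return GEN + DUP * (child - 1)
    return DUP * (child - parent)


def sankoff(tree, leaf_counts):
    """Return (minimum total cost, root copy number achieving it).

    tree is a nested tuple of leaf names; leaf_counts maps leaf name -> copies.
    The root state itself is not charged a genesis penalty in this sketch.
    """
    states = range(MAX_COPIES + 1)

    def costs(node):
        if isinstance(node, str):
            # A leaf is only compatible with its observed copy number.
            return [0 if s == leaf_counts[node] else inf for s in states]
        total = [0] * len(states)
        for child in node:
            child_costs = costs(child)
            for p in states:
                total[p] += min(step_cost(p, c) + child_costs[c] for c in states)
        return total

    root = costs(tree)
    best = min(root)
    return best, root.index(best)


toy_tree = (("A", "B"), "C")  # hypothetical three-species tree
print(sankoff(toy_tree, {"A": 1, "B": 0, "C": 0}))  # (2, 1)
```

In the toy case the family is present only in species A, yet the cheapest scenario places one copy at the root and infers two losses (cost 2) rather than a single gene genesis on the branch to A (cost 5). This shows how a high genesis penalty pushes genes back toward ancestral nodes.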
Fig. 1.
Plot of genome size against gene content for each of the functional categories. RP, R. prowazekii; RC, R. conorii; BQ, B. quintana; BH, B. henselae; BM, B. melitensis; BS, B. suis; CC, C. crescentus; AT, A. tumefaciens; SM, S. meliloti; ML, M. loti; and BJ, B. japonicum. See Table 1 for genome sizes. The data were separated into two sections (a and b) to prevent overcrowding.
Fig. 2.
Phylogenetic relationship of 13 α-proteobacterial species (highlighted by the purple background) with 7 species from other proteobacterial subdivisions as outgroups. The topology, branch lengths, and bootstrap support are according to maximum-likelihood reconstructions with the Jones-Taylor-Thornton + 4ΓI model. Similar results were obtained with the neighbor-joining method and after removal of positions with gaps. A list of genes used for the phylogenetic reconstructions is given in Table 5. Abbreviations for species names are as described in the legend to Fig. 1 with the addition of the following taxa: WP, W. pipientis; RhP, R. palustris; CJ, C. jejuni; EC, E. coli; HP, H. pylori; PA, P. aeruginosa; RS, R. solanacearum; ST, S. typhi; and XF, X. fastidiosa.
Fig. 4.
Net gene loss or gain throughout the evolution of the α-proteobacterial species. Arrows pointing upward indicate net gains of genes (G), and arrows pointing downward indicate net losses of genes (L). Colors and sizes of arrows refer to the net number of genes gained or lost at each branch. Colors of circles refer to the relative fraction of genes assigned to the different functional groups in the modern and inferred genome at the node. Yellow, information storage and processing; green, metabolism; red, cellular processes; blue, poorly characterized. Clustering groups and estimated frequencies are as described for Fig. 3a. Abbreviations for species names are as described in the legends to Figs. 1 and 2.
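As a small illustration of how a net value per branch could be derived from the three per-branch event counts reported in Fig. 3, the sketch below assumes that net change equals duplications plus gene-genesis events minus losses. That formula is an assumption; the legend does not state it. The function names and the branch values are hypothetical and are not taken from the paper.

```python
def net_change(duplications, losses, genesis):
    """Net gene gain (positive) or loss (negative) on one branch (assumed formula)."""
    return duplications + genesis - losses


def arrow_label(net):
    """Label in the style of the figure: G for net gain, L for net loss."""
    if net > 0:
        return f"G {net}"
    if net < 0:
        return f"L {-net}"
    return "0"


# Hypothetical branch values (duplications, losses, genesis), not from the paper.
branches = {"branch_x": (120, 30, 15), "branch_y": (10, 400, 2)}
labels = {name: arrow_label(net_change(*v)) for name, v in branches.items()}
print(labels)  # {'branch_x': 'G 105', 'branch_y': 'L 388'}
```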
Fig. 5.
Number of COGs in the α-proteobacterial ancestor (Fig. 3a) with sequence similarity to eukaryotic genes for different BLAST score values. Estimated number of COGs that show similarity to eukaryotic genes in the inferred proteomes of the α-proteobacterial ancestor (upper curve) and the minimal protomitochondrial ancestor (lower curve) (15).
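A curve of this kind can be thought of as a count of COGs whose best hit against eukaryotic sequences reaches a given score cutoff. The sketch below shows that tabulation under stated assumptions: the dictionary of best scores, the COG identifiers, and the cutoffs are hypothetical, and the way the authors actually computed the scores is not described in this legend.

```python
def count_similar_cogs(best_scores, thresholds):
    """For each score cutoff, count COGs whose best eukaryotic hit reaches it."""
    return {t: sum(1 for s in best_scores.values() if s >= t) for t in thresholds}


# Hypothetical best BLAST score per ancestral COG against eukaryotic proteins.
best_scores = {"COG_A": 250.0, "COG_B": 80.0, "COG_C": 45.0, "COG_D": 0.0}
curve = count_similar_cogs(best_scores, [50, 100, 200])
print(curve)  # {50: 2, 100: 1, 200: 1}
```

Raising the cutoff can only lower or maintain the count, so a curve built this way never increases with the score.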
