Genome Res. 17 (11), 1572-85

Reductive Evolution of Architectural Repertoires in Proteomes and the Birth of the Tripartite World


Minglei Wang et al. Genome Res.

Abstract

The repertoire of protein architectures in proteomes is evolutionarily conserved and capable of preserving an accurate record of genomic history. Here we use a census of protein architecture in 185 genomes that have been fully sequenced to generate genome-based phylogenies that describe the evolution of the protein world at fold (F) and fold superfamily (FSF) levels. The patterns of representation of F and FSF architectures over evolutionary history suggest three epochs in the evolution of the protein world: (1) architectural diversification, where members of an architecturally rich ancestral community diversified their protein repertoire; (2) superkingdom specification, where superkingdoms Archaea, Bacteria, and Eukarya were specified; and (3) organismal diversification, where F and FSF specific to relatively small sets of organisms appeared as the result of diversification of organismal lineages. Functional annotation of FSF along these architectural chronologies revealed patterns of discovery of biological function. Most importantly, the analysis identified an early and extensive differential loss of architectures occurring primarily in Archaea that segregates the archaeal lineage from the ancient community of organisms and establishes the first organismal divide. Reconstruction of phylogenomic trees of proteomes reflects the timeline of architectural diversification in the emerging lineages. Thus, Archaea undertook a minimalist strategy using only a small subset of the full architectural repertoire and then crystallized into a diversified superkingdom late in evolution. Our analysis also suggests a communal ancestor to all life that was molecularly complex and adopted genomic strategies currently present in Eukarya.

Figures

Figure 1.
Architectural chronologies of (F) folds (left) and (FSF) fold superfamilies (right) suggest three evolutionary epochs in the timeline of the protein world. (A) Optimal (P < 0.01) most-parsimonious F (85,644 steps; CI = 0.043, RI = 0.770; g1 = −0.134) and FSF (118,119 steps; CI = 0.031, RI = 0.759; g1 = −0.099) trees were reconstructed from a protein domain census in 185 completely sequenced genomes. Venn diagrams show occurrence of architectures in the three superkingdoms of life, Archaea (A), Bacteria (B), and Eukarya (E). Terminal leaves were not labeled, as they would not be legible. (Red) Branches defining F and FSF that occur after the appearance of the first architecture unique to a superkingdom (B). (B) Distribution index of individual architectures (f, the number of species using an architecture/total number of species) against the age of architectures (nd, number of nodes from the root/total number of nodes in the tree) uncovers evolutionary patterns of architectural innovation and usage when studying all genomes or only those that are free-living. Based on these patterns, we propose three evolutionary epochs of the protein world: (light green) structural diversification; (salmon) superkingdom specification; (yellow) organismal diversification epochs.
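The two ratios defined in this caption, the distribution index f and the age nd, can be illustrated with a minimal Python sketch. The function names, the census layout (genome mapped to a set of architecture identifiers), and the toy data are assumptions made for illustration only; they are not the authors' pipeline or data.

```python
from collections import Counter


def distribution_index(census):
    """Return f for every architecture in a census.

    census: dict mapping a genome name to the set of architecture
    identifiers found in that genome (hypothetical layout).
    f = number of genomes using the architecture / total number of genomes.
    """
    total = len(census)
    if total == 0:
        raise ValueError("census must contain at least one genome")
    counts = Counter(arch for archs in census.values() for arch in set(archs))
    return {arch: n / total for arch, n in counts.items()}


def relative_age(nodes_from_root, total_nodes):
    """Return nd = number of nodes from the root / total number of nodes."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return nodes_from_root / total_nodes


# Toy census, invented for illustration (not data from the paper).
toy_census = {
    "genome_1": {"arch_x", "arch_y"},
    "genome_2": {"arch_x"},
    "genome_3": {"arch_x", "arch_y"},
    "genome_4": {"arch_x", "arch_z"},
}

f = distribution_index(toy_census)
nd_example = relative_age(5, 20)
```

In this toy census, an architecture present in every genome has f = 1, while one found in a single genome out of four has f = 0.25.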
Figure 2.
Six phases in the evolutionary timeline of the protein world based on distribution of F (left) and FSF (right) within the three superkingdoms of life. (A) Bar diagrams display ranges of age (nd) for architectures unique to superkingdoms (A, B, or E) or shared by two (AB, BE, or AE) or all (ABE) superkingdoms. Trees describe global most-parsimonious scenarios for organismal diversification of proteomes based on architectural distribution patterns. Numbers indicate the size of architectural repertoires in A, B, and E lineages at the corresponding nd values. The horizontal scale is as in B. (B) Distribution index (f) of F and FSF within the three superkingdoms for (gray) all organisms or (black) free living only against the age of the individual architectures. (Light green) Structural diversification; (salmon) superkingdom specification; (yellow) organismal diversification epochs. Roman numerals indicate the evolutionary phases of the protein world described in the text. (Red lines) Cumulative loss of BE architectures (number of architectures absent in each organism, summated over organisms, and integrated over nd); the ordinate is in logarithmic scale with units not displayed; the abscissa matches nd values.
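The grouping of architectures into those unique to one superkingdom (A, B, or E) or shared by two or all three (AB, BE, AE, ABE) can be sketched as follows. This is a hedged illustration: the function name, the input dictionaries, and the toy genomes are hypothetical, and the sketch only derives the category label from presence or absence.

```python
def superkingdom_category(census, superkingdom_of):
    """Label each architecture by the superkingdoms in which it occurs.

    census: dict mapping genome name -> set of architecture identifiers.
    superkingdom_of: dict mapping genome name -> "A", "B", or "E".
    Returns a dict mapping architecture -> label such as "A", "BE", "ABE".
    """
    present = {}
    for genome, archs in census.items():
        kingdom = superkingdom_of[genome]
        for arch in archs:
            present.setdefault(arch, set()).add(kingdom)
    # Fixed letter order so that shared labels read AB, AE, BE, ABE.
    return {
        arch: "".join(k for k in "ABE" if k in kingdoms)
        for arch, kingdoms in present.items()
    }


# Toy input, invented for illustration (not data from the paper).
toy_census = {
    "archaeon_1": {"arch_w", "arch_x"},
    "bacterium_1": {"arch_w", "arch_y"},
    "eukaryote_1": {"arch_w", "arch_y", "arch_z"},
}
toy_kingdoms = {"archaeon_1": "A", "bacterium_1": "B", "eukaryote_1": "E"}

categories = superkingdom_category(toy_census, toy_kingdoms)
```

Counting the labels in `categories` gives the sector sizes of a Venn diagram of the kind described in the captions.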
Figure 3.
Evolution of biological function along the six phases of the architectural chronology. (A) Bar diagrams describe the fraction of FSF corresponding to each of seven coarse-grained functional categories in each superkingdom relative to their use in all life within a particular evolutionary phase (fo), and circles describe how widely distributed these FSF are among organisms within each superkingdom, as average distribution indices (f). When bars and circles are both high or low, the relative importance of that function is either high or low, respectively—the function present in most FSF is important to most organisms in a superkingdom, or the function present in few FSF is only important to a small organismal subset. When bars are high and circles are low or when bars are low and circles are high, function in most FSF is important to small organismal subsets or function in few FSF is important to most organisms, respectively. (B) Pie charts describe FSF distribution in functional categories for every phase. The size of each pie chart is proportional to the number of FSF in each phase. Four uninformative “not annotated” FSF (d.58.45 and e.30.1 of phase V, and a.125.1 and d.46.1 of phase VI) were not included in the analysis.
Figure 4.
Optimal most-parsimonious phylogenomic trees of proteomes from 82 free-living organisms, generated using subsets of FSF corresponding to different phases of evolutionary history. (A) Ancient FSF, ndFSF < 0.174 (6727 steps; CI = 0.232, RI = 0.687; g1 = −0.316). (B) Intermediate FSF, 0.174 < ndFSF < 0.489 (38,405 steps; CI = 0.184, RI = 0.681; g1 = −0.299). (C) Young FSF, ndFSF > 0.489 (67,555 steps; CI = 0.234, RI = 0.709; g1 = −0.576). Terminal leaves are not labeled, as they would not be legible. Individual trees with taxon labels are shown in Supplemental Figure S3. Bootstrap support (BS) levels for branches are indicated with different shades and with numbers in nodes delimiting superkingdoms.
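The partition of FSF into ancient, intermediate, and young subsets by ndFSF can be written as a small binning function. The thresholds 0.174 and 0.489 come from the caption; the function name is hypothetical, and the handling of values exactly equal to a threshold is an assumption, since the caption uses strict inequalities on both sides.

```python
ANCIENT_UPPER = 0.174  # caption: ancient FSF have ndFSF < 0.174
YOUNG_LOWER = 0.489    # caption: young FSF have ndFSF > 0.489


def age_class(nd):
    """Assign an FSF to an age class from its nd value (0 <= nd <= 1)."""
    if not 0.0 <= nd <= 1.0:
        raise ValueError("nd must lie between 0 and 1")
    if nd < ANCIENT_UPPER:
        return "ancient"
    if nd > YOUNG_LOWER:
        return "young"
    # Assumption: values exactly on a threshold fall in the middle class.
    return "intermediate"


classes = [age_class(nd) for nd in (0.10, 0.30, 0.90)]
```

Each subset would then be used on its own as the character set for a separate tree reconstruction, as the caption describes for panels A to C.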
Figure 5.
Cumulative frequency distribution of F (left) and FSF (right) along the trees of architectures that are unique or shared by organisms with (FL) free-living, (P) parasitic, or (OP) obligate parasitic lifestyles. (A) Venn and (B) bar diagrams show the distribution and range of age (nd, number of nodes from the root/total number of nodes in the tree) for architectures within one (FL, P, or OP) or more (FL-P, FL-OP, P-OP, and FL-P-OP) lifestyle categories. (C) Cumulative number of F or FSF architectures against nd.
Figure 6.
Effect of lifestyle on use of protein F in proteomes. (A) F usage in proteomes, sorted in descending order. (FL) Free-living; (P) parasitic; (OP) obligate parasitic lifestyle. (B) Pie charts of the protein repertoire within the superkingdoms of life. The size of each pie chart is proportional to the genomic abundance of F within the respective superkingdom, and percentages represent the fraction of total abundance designated by each sector. F are identified as superkingdom-specific (A, B, or E), or shared by some (AB, BE, or AE) or by all (ABE) superkingdoms. ABE F are further divided into those that are omnipresent F shared by all organisms (ABEo) and those that appeared before (ABE < 439) or after (ABE > 439) d.229, the first F unique to Bacteria that delimits the upper bound of the organismal specification epoch at ndF = 0.439.
