J Mol Biol. 2005 Dec 2;354(3):722-37.
doi: 10.1016/j.jmb.2005.08.071. Epub 2005 Sep 20.

Divergent Evolution Within Protein Superfolds Inferred From Profile-Based Phylogenetics

Free PMC article

Douglas L Theobald et al. J Mol Biol.

Abstract

Many dissimilar protein sequences fold into similar structures. A central and persistent challenge facing protein structural analysis is the discrimination between homology and convergence for structurally similar domains that lack significant sequence similarity. Classic examples are the OB-fold and SH3 domains, both small, modular beta-barrel protein superfolds. The similarities among these domains have variously been attributed to common descent or to convergent evolution. Using a sequence profile-based phylogenetic technique, we analyzed all structurally characterized OB-fold, SH3, and PDZ domains with less than 40% mutual sequence identity. An all-against-all, profile-versus-profile analysis of these domains revealed many previously undetectable significant interrelationships. The matrices of scores were used to infer phylogenies based on our derivation of the relationships between sequence similarity E-values and evolutionary distances. The resulting clades of domains correlate remarkably well with biological function, as opposed to structural similarity, indicating that the functionally distinct sub-families within these superfolds are homologous. This method extends phylogenetics into the challenging "twilight zone" of sequence similarity, providing the first objective resolution of deep evolutionary relationships among distant protein families.

Figures

Figure 1
Topologies of the OB-fold, SH3 domain, and PDZ domain superfolds. (a) The OB-fold is shown with the five β-strands of the central barrel colour-coded: strand 1 is red, 2 is orange, 3 is yellow, 4 is blue, 5 is lavender. The SH3 and PDZ domains have corresponding β-strands coloured as in the OB-fold. The approximate canonical ligand-binding sites of each fold are indicated by black ovals. (b) A schematic illustrating the relationships of the β-strand secondary structure among the three superfolds, coloured as in (a). SH3 domain β-strand 4 is permuted to the N terminus relative to the OB-fold, while the PDZ domain lacks β-strand 5.
Figure 2
Profile-based phylogenies for the OB-fold domains. Branch lengths are proportional to evolutionary distances. Black arrows show tree rootings, based on the connection of these clades to the original center node. (a) The OB-fold nucleic-acid binding clade. (b) The superantigen enterotoxin clade. (c) The molybdenum-binding clade. Inset trees: relationships between the ASTRAL SCOP domain sequences that are detectable by BLASTP and PSI-BLAST searches of the NCBI non-redundant protein database (final E-values < 0.01) are highlighted in colour. While PSI-BLAST is able to detect many of these relationships, it has a much higher false positive rate than COMPASS.
Figure 3
Profile-based phylogenies for the SH3 domains. (a) The RNA-binding Sm-like clade and signaling clade. (b) The plasmid-toxin clade. (c) The myosin-associated clade. Insets as in Figure 2.
Figure 4
Comparison of the functionally critical telomeric ssDNA-binding OB-fold domains of three telomeric end-binding proteins: (a) Saccharomyces cerevisiae Cdc13-DBD, (b) Oxytricha nova TEBP α OB1, and (c) Schizosaccharomyces pombe Pot1 OB1. Proteins are displayed in the same orientation based upon structural superposition. The β-strands of the canonical OB-fold β-barrel are shown in cyan. The functionally important yet dissimilar β1-β2, β2-β3, and β4-β5 loops of the OB-folds are indicated in magenta, yellow, and blue, respectively. The telomeric ssDNA ligands, represented in ball-and-stick, adopt diverse conformations and interact with different regions of the canonical OB-fold binding cleft. OB-fold β-strand 4, the region of greatest statistically significant sequence-profile similarity common to the domains, yet distant from the telomeric ssDNA-binding site, is highlighted in red.
Figure 5
Exponential decay approximates the change in expected log-odds score with time. The red line is a plot of the exact change in the expected log-odds score using a BLOSUM62 matrix and its implicit instantaneous rate matrix as given by equation (2). The black line is a χ² best fit of an exponential decay to equation (2) evaluated with a BLOSUM62 matrix ($\langle S \rangle = 3.31\,e^{-0.8906\,t} - 0.460$, R = 0.99972).
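The fitted curve quoted in the Figure 5 caption can be evaluated directly. The sketch below is illustrative only: it assumes the exponent variable lost in extraction is time t, so that the fit reads ⟨S⟩ = 3.31·exp(−0.8906·t) − 0.460, and it leaves the units of t as whatever the figure's time axis uses. The function names are hypothetical and do not come from the paper.

```python
import math

# Constants read from the Figure 5 caption (exponent variable t is an assumption).
AMPLITUDE = 3.31
RATE = 0.8906
OFFSET = -0.460


def expected_score(t):
    """Approximate expected log-odds score at time t under the quoted fit."""
    return AMPLITUDE * math.exp(-RATE * t) + OFFSET


def time_at_score(score):
    """Invert the fit: time at which the expected score equals `score`.

    Only defined for OFFSET < score <= AMPLITUDE + OFFSET.
    """
    ratio = (score - OFFSET) / AMPLITUDE
    if not 0.0 < ratio <= 1.0:
        raise ValueError("score lies outside the range reached by the fit")
    return -math.log(ratio) / RATE


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0):
        print(f"t = {t:3.1f}  <S> = {expected_score(t):.3f}")
    print(f"<S> reaches zero near t = {time_at_score(0.0):.2f}")
```

Under this reading, the expected score starts at 3.31 − 0.460 = 2.85 at t = 0 and decays toward the asymptote −0.460, so it crosses zero at a finite time.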
Figure 6
Theoretical fits of the E-value/distance relationship from simulated protein evolution data. Representative data from simulated protein evolution of pairwise sequences are shown at left in red, (a) and (c), and of sequence alignments at right in blue, (b) and (d). Simulation settings: Dawg, JC nucleotide evolution model, gamma rate variation α = 1, negative binomial model of indel evolution, relative insertion probability = deletion probability = 0.04. The upper two graphs, (a) and (b), plot ln(E) versus evolutionary distance and show data fit to equation (C3), where 〈Scen〉 is given by equation (2) and 〈S 〉=γ. The lower two graphs, (c) and (d), show the same data plotted as ln(−ln E−γ) versus evolutionary distance and fit with equation (4). The constant ln C was not fit but was estimated as described in Materials and Methods. Because the variance increases with evolutionary distance in the latter two plots, these fits were weighted by the inverse of the evolutionary distance (analogous to weighting by the inverse of evolutionary distance in the phylogenetic least-squares analyses). In all graphs, the largest plotted distance corresponds to an E-value of 0.1.
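The transformation used for the lower panels of Figure 6, together with inverse-distance weighting, can be sketched as follows. This is a minimal illustration, not the authors' procedure: equation (4) is not reproduced on this page, so the straight-line model fitted here is an assumption, as are the function names, the value of γ, and the data, which are synthetic and generated inside the example purely to check the arithmetic.

```python
import math


def transform_e_value(e_value, gamma):
    """Return ln(-ln E - gamma), the quantity plotted in panels (c) and (d)."""
    inner = -math.log(e_value) - gamma
    if inner <= 0:
        raise ValueError("-ln(E) - gamma must be positive")
    return math.log(inner)


def weighted_line_fit(distances, ys):
    """Fit y = slope * d + intercept by least squares with weights 1/d.

    The linear form is an assumption standing in for equation (4).
    """
    weights = [1.0 / d for d in distances]
    total = sum(weights)
    mean_d = sum(w * d for w, d in zip(weights, distances)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (d - mean_d) * (y - mean_y)
              for w, d, y in zip(weights, distances, ys))
    sxx = sum(w * (d - mean_d) ** 2 for w, d in zip(weights, distances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_d


def synthetic_e_values(distances, slope, intercept, gamma):
    """Generate noise-free E-values lying exactly on an assumed line."""
    return [math.exp(-(math.exp(slope * d + intercept) + gamma))
            for d in distances]


if __name__ == "__main__":
    gamma = 0.5  # hypothetical value, for illustration only
    dists = [0.2, 0.5, 1.0, 1.5, 2.0]
    e_vals = synthetic_e_values(dists, slope=-0.9, intercept=3.0, gamma=gamma)
    ys = [transform_e_value(e, gamma) for e in e_vals]
    print(weighted_line_fit(dists, ys))
```

Weighting each point by 1/d reduces the influence of the large-distance points, where the caption notes that the variance is greatest.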
