Biology (Basel). 2020 Mar 28;9(4):64.
doi: 10.3390/biology9040064.

Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root

Akanksha Pandey et al. Biology (Basel).

Abstract

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals, and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after amino acid recoding, trees estimated using the exposed and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
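
The partitioning step described in the abstract (splitting an alignment into exposed and buried sites by relative solvent accessibility) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function name `partition_by_rsa`, the toy alignment, the per-site RSA values, and the 0.25 cutoff are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def partition_by_rsa(
    alignment: Dict[str, str],
    rsa: List[float],
    cutoff: float = 0.25,  # assumed threshold; not taken from the paper
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split alignment columns into exposed (RSA >= cutoff) and buried subsets."""
    length = len(rsa)
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("every sequence must have one residue per RSA value")

    exposed_cols = [i for i, value in enumerate(rsa) if value >= cutoff]
    buried_cols = [i for i, value in enumerate(rsa) if value < cutoff]

    def take(cols: List[int]) -> Dict[str, str]:
        # Keep only the selected columns for every taxon.
        return {
            name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()
        }

    return take(exposed_cols), take(buried_cols)


# Toy data (hypothetical): three taxa, six aligned sites.
toy_alignment = {
    "sponge": "MKVLAD",
    "ctenophore": "MRVLSD",
    "cnidarian": "MKILAE",
}
toy_rsa = [0.60, 0.45, 0.05, 0.10, 0.30, 0.02]

exposed, buried = partition_by_rsa(toy_alignment, toy_rsa)
```

Each returned subset could then be written out and analyzed separately, which is the kind of comparison the abstract reports for exposed versus buried sites.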

Keywords: Ctenophora; Porifera; RY coding; heteropecilly; metazoan phylogeny; non-stationary models; protein structure; relative solvent accessibility.


Conflict of interest statement

Authors declare no conflict of interest.

Figures

Figure 1
Topologies for the deepest branches in the metazoan tree recovered in phylogenomic analyses. (a) Porifera (sponges) sister to all other metazoa. This hypothesis includes a clade designated Eumetazoa (E). (b) Ctenophora (comb jellies) sister to all other metazoa. (c) A sponge+ctenophore clade sister to all other animals. All trees shown include a clade named Parahoxozoa (P) [43]. The Parahoxozoa topology was fixed based on King and Rokas [10], but an alternative topology with bilateria sister to a placozoa+cnidaria clade is also plausible.
Figure 2
Analyses of sites from different structural environments reveal conflicting phylogenetic signals. We show simplified RAxML trees; both trees are limited to the metazoan ingroups and the choanozoan outgroup (i.e., only Apoikozoa sensu Budd and Jensen [78] are shown). The position of the root drawn in these trees was established by the outgroup taxa (the holozoans Capsaspora and Sphaeroforma and the fungi Saccharomyces and Spizellomyces). Bootstrap support for the positions of sponges and ctenophores given the general time reversible (GTR) model and the Le and Gascuel [52] (LG) model is indicated next to the arrow.
Figure 3
Heat map showing support for tree topologies obtained using various structural classes and taxon samples. In the online version, colors indicate bootstrap support values (dark green: > 95; lighter green: > 75; yellow: > 50; pink: < 50; no color: topology was not recovered).
Figure 4
Multidimensional scaling plot showing the Euclidean distances between various amino acid exchange rate matrices. In the online version, different colors indicate different categories of matrices (green: exposed residues; orange: buried residues; purple: secondary structure; pink: standard empirical models).
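
The distances that feed a plot like Figure 4 can be illustrated with a short sketch. It assumes each exchange rate matrix is symmetric and compares matrices by the Euclidean distance between their off-diagonal upper-triangle entries; whether the authors normalized or flattened the matrices this way is not stated here, so treat that as an assumption. The 3x3 matrices and the labels `A`, `B`, `C` are toy placeholders (real amino acid matrices are 20x20), and the scaling step itself is left out.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def upper_triangle(matrix: Matrix) -> List[float]:
    """Flatten the off-diagonal upper triangle of a symmetric rate matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def euclidean(a: Matrix, b: Matrix) -> float:
    """Euclidean distance between two matrices of the same size."""
    va, vb = upper_triangle(a), upper_triangle(b)
    if len(va) != len(vb):
        raise ValueError("matrices must have the same dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


def pairwise_distances(matrices: Dict[str, Matrix]) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of named matrices."""
    return {
        (p, q): euclidean(matrices[p], matrices[q])
        for p, q in combinations(sorted(matrices), 2)
    }


# Toy symmetric matrices (hypothetical values, 3 states instead of 20).
toy_matrices = {
    "A": [[0, 1, 2], [1, 0, 3], [2, 3, 0]],
    "B": [[0, 1, 2], [1, 0, 7], [2, 7, 0]],
    "C": [[0, 4, 6], [4, 0, 3], [6, 3, 0]],
}

distances = pairwise_distances(toy_matrices)
```

The resulting pairwise distances are the input a multidimensional scaling routine would project into two dimensions.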
Figure 5
Heat map showing support for deuterostome monophyly for exposed and buried residues using GTR, LGL-F81, and LGL-BUR/EXP models. Colors indicate whether or not deuterostomes were monophyletic (no color: monophyletic; purple: not monophyletic). Note that the bootstrap consensus tree for LGL-BUR-C20 conflicted with the optimal tree; it had 53% support for monophyly.
Figure 6
Compositional variation and the impact of recoding on tree estimation. (a) Variation across taxa in the ratio of amino acids encoded by GC-rich codons (G, A, R, and P) to those encoded by AT-rich codons (F, Y, M, I, N, and K). To limit the impact of invariant sites, we only considered parsimony informative sites. (b) Variation in base composition for parsimony informative first and second codon positions after back translation. (c) The results of tree searches after recoding as binary (purine-pyrimidine (RY)) characters. The tree topologies were identical, although the tree lengths did differ (note scale bars). Support for the node that defines T2 (i.e., the node that places Ctenophora sister to other Metazoa) is emphasized using a red box; support given the binary data is presented above and the value for six-state Dayhoff recoding is presented below. For other nodes, bootstrap values <100% are presented, with values for six-state recoding presented to the right. Since the trees obtained after binary and six-state coding of buried sites exhibited some topological conflicts, the bootstrap support for six-state coding is not presented on that tree (except for the focal node).
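
Two of the quantities in this caption are simple enough to sketch: the GARP/FYMINK ratio from panel (a) and six-state Dayhoff recoding. The example below is an illustration only. The function names are hypothetical, the grouping used is the commonly cited Dayhoff six-category scheme (AGPST, DENQ, HKR, ILMV, FWY, C), and the sketch does not apply the parsimony-informative-site filter mentioned in the caption.

```python
GC_RICH = set("GARP")      # amino acids encoded by GC-rich codons
AT_RICH = set("FYMINK")    # amino acids encoded by AT-rich codons

# Commonly used Dayhoff six-state groups (assumed to match the recoding used).
DAYHOFF6_GROUPS = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}
DAYHOFF6 = {
    residue: state
    for group, state in DAYHOFF6_GROUPS.items()
    for residue in group
}


def garp_fymink_ratio(sequence: str) -> float:
    """Ratio of GARP residues to FYMINK residues in one sequence."""
    seq = sequence.upper()
    gc = sum(1 for residue in seq if residue in GC_RICH)
    at = sum(1 for residue in seq if residue in AT_RICH)
    if at == 0:
        raise ValueError("sequence contains no FYMINK residues")
    return gc / at


def dayhoff6_recode(sequence: str) -> str:
    """Recode amino acids to six states; gaps stay '-', anything else becomes '?'."""
    return "".join(
        "-" if residue == "-" else DAYHOFF6.get(residue, "?")
        for residue in sequence.upper()
    )


# Toy sequences (hypothetical).
ratio = garp_fymink_ratio("GARPFYMINKCW")
recoded = dayhoff6_recode("ADHIFC-")
```

Recoding collapses amino acids that exchange frequently into one state, which is why it is used to reduce the effect of compositional differences among taxa.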



References

    1. Gee H. Ending incongruence. Nature. 2003;425:782. doi: 10.1038/425782a.
    2. Rokas A., Williams B.I., King N., Carroll S.B. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003;425:798–804. doi: 10.1038/nature02053.
    3. Nishihara H., Okada N., Hasegawa M., Rokas A., Williams B., King N., Carroll S., Soltis D., Albert V., Savolainen V., et al. Rooting the eutherian tree: The power and pitfalls of phylogenomics. Genome Biol. 2007;8:R199. doi: 10.1186/gb-2007-8-9-r199.
    4. Misof B., Liu S., Meusemann K., Peters R.S., Donath A., Mayer C., Frandsen P.B., Ware J., Flouri T., Beutel R.G., et al. Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346:763–767. doi: 10.1126/science.1257570.
    5. Wickett N.J., Mirarab S., Nguyen N., Warnow T., Carpenter E., Matasci N., Ayyampalayam S., Barker M.S., Burleigh J.G., Gitzendanner M.A., et al. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc. Natl. Acad. Sci. USA. 2014;111:E4859–E4868. doi: 10.1073/pnas.1323926111.
