Mixture models of nucleotide sequence evolution that account for heterogeneity in the substitution process across sites and across lineages
- PMID: 24927722
- DOI: 10.1093/sysbio/syu036
Abstract
Molecular phylogenetic studies of homologous nucleotide sequences often assume that the underlying evolutionary process was globally stationary, reversible, and homogeneous (SRH), and that a model of evolution with one or more site-specific, time-reversible rate matrices (e.g., the GTR rate matrix) suffices to model the evolution of the data accurately over the whole tree. However, a growing body of data suggests that evolution under these conditions is the exception rather than the norm. To address this issue, several non-SRH models of molecular evolution have been proposed, but they either ignore heterogeneity in the substitution process across sites (HAS) or assume that it can be modeled accurately using the Γ distribution. As an alternative to these models, we introduce a family of mixture models that approximate HAS without assuming an underlying predefined statistical distribution. This family of mixture models is combined with non-SRH models of evolution that account for heterogeneity in the substitution process across lineages (HAL). We also present two algorithms for searching model space and identifying an optimal model of evolution that is less likely to over- or underparameterize the data. The performance of the two new algorithms was evaluated using alignments of 10 000 nucleotide sites simulated under complex non-SRH conditions on a 25-tipped tree. The algorithms performed very well, identifying the correct HAL model with a 75% success rate (the average success rate for assigning rate matrices to the tree's 48 edges was 99.25%) and, for the correct HAL model, identifying the correct HAS model with a 98% success rate. Finally, parameter estimates obtained under the correct HAL-HAS model were accurate and precise.
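The abstract does not say which criterion the two search algorithms use to avoid over- or underparameterization. As a hedged illustration of the general idea only, the sketch below assumes the Bayesian information criterion (BIC = k ln n − 2 ln L) and scores a few already-fitted candidate models. The model labels, log-likelihoods, and parameter counts are all hypothetical; only the 10 000-site alignment length comes from the abstract. This is the scoring step a model-space search might use, not the authors' algorithms.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 ln L (smaller is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical fitted models: (label, maximized lnL, number of free parameters).
# More rate matrices raise lnL slightly but cost many extra parameters.
CANDIDATES = [
    ("1 rate matrix", -50000.0, 60),
    ("2 rate matrices", -49900.0, 70),
    ("8 rate matrices", -49890.0, 130),
]


def select_model(candidates, n_sites):
    """Return the label of the candidate with the lowest BIC."""
    return min(candidates, key=lambda c: bic(c[1], c[2], n_sites))[0]


best = select_model(CANDIDATES, n_sites=10_000)
print(best)  # -> 2 rate matrices
```

Here the 8-matrix model has the highest likelihood, yet its penalty of 130 ln(10 000) outweighs its 10-unit gain in ln L, so the penalized score prefers the 2-matrix model.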
The merits of our new algorithms were illustrated with an analysis of 42 337 second codon sites extracted from a concatenation of 106 alignments of orthologous genes encoded by the nuclear genomes of Saccharomyces cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. castellii, S. kluyveri, S. bayanus, and Candida albicans. Our results show that second codon sites in the ancestral genome of these species contained 49.1% invariable sites, 39.6% variable sites belonging to one rate category (V1), and 11.3% variable sites belonging to a second rate category (V2). The ancestral nucleotide content was found to differ markedly across these three sets of sites, and the evolutionary processes operating at the variable sites were found to be non-SRH and best modeled by a combination of eight edge-specific rate matrices (four for V1 and four for V2). The number of substitutions per site at the variable sites also differed markedly, with sites belonging to V1 evolving slower than those belonging to V2 along the lineages separating the seven species of Saccharomyces. Finally, sites belonging to V1 appeared to have ceased evolving along the lineages separating S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, implying that they might have become so selectively constrained that they could be considered invariable sites in these species.
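The fitted model described above is a mixture of three site classes (invariable, V1, V2), each with its own ancestral nucleotide content, with the variable classes evolving under edge-specific processes. A minimal sketch of how such a mixture site likelihood can be computed is shown below, under loudly stated assumptions. The paper's edge-specific rate matrices are swapped here for the much simpler F81 model with edge-specific base frequencies. The 3-taxon tree, taxon names (`t1` to `t3`), edge labels, branch lengths, and frequency vectors are all hypothetical. Only the class weights echo the abstract's estimates, and they are used purely for illustration.

```python
import itertools
import math

NUC = "ACGT"  # state order used for every frequency vector and likelihood array


def f81_prob(i, j, t, pi):
    """F81 probability of going from state i to state j along an edge of length t,
    with edge-specific frequencies pi (a simplification, not the paper's matrices)."""
    beta = 1.0 / (1.0 - sum(p * p for p in pi))  # normalizes the rate to 1 at equilibrium
    decay = math.exp(-beta * t)
    return decay * (1.0 if i == j else 0.0) + (1.0 - decay) * pi[j]


def partials(node, site, edges):
    """Felsenstein pruning: L[x] = P(tip data below node | state x at node).
    A leaf is a taxon name; an internal node is a tuple of (child, edge) pairs."""
    if isinstance(node, str):
        return [1.0 if x == site[node] else 0.0 for x in NUC]
    result = [1.0] * 4
    for child, edge in node:
        t, pi = edges[edge]  # edge-specific branch length and frequencies
        below = partials(child, site, edges)
        for i in range(4):
            result[i] *= sum(f81_prob(i, j, t, pi) * below[j] for j in range(4))
    return result


def variable_likelihood(tree, site, root_pi, edges):
    """Site likelihood for one variable-site class with its own root frequencies."""
    return sum(p * l for p, l in zip(root_pi, partials(tree, site, edges)))


def invariable_likelihood(site, root_pi):
    """Invariable sites never change, so only constant columns are possible."""
    states = set(site.values())
    if len(states) != 1:
        return 0.0
    return root_pi[NUC.index(states.pop())]


def mixture_likelihood(tree, site, w_inv, pi_inv, variable_classes):
    """Weighted sum over site classes; variable_classes holds (weight, root_pi, edges)."""
    total = w_inv * invariable_likelihood(site, pi_inv)
    for weight, root_pi, edges in variable_classes:
        total += weight * variable_likelihood(tree, site, root_pi, edges)
    return total


# Hypothetical rooted 3-taxon tree with edges e1..e4.
TREE = (((("t1", "e1"), ("t2", "e2")), "e3"), ("t3", "e4"))
TAXA = ["t1", "t2", "t3"]

# Class weights echo the abstract (49.1% / 39.6% / 11.3%); everything else is made up.
AT_RICH = [0.3, 0.2, 0.2, 0.3]
GC_RICH = [0.2, 0.3, 0.3, 0.2]
PI_INV = [0.35, 0.15, 0.15, 0.35]
W_INV = 0.491
V1 = (0.396, [0.25] * 4,  # slower class: short edges
      {"e1": (0.02, AT_RICH), "e2": (0.02, AT_RICH), "e3": (0.05, GC_RICH), "e4": (0.1, GC_RICH)})
V2 = (0.113, [0.25] * 4,  # faster class: longer edges
      {"e1": (0.2, AT_RICH), "e2": (0.2, AT_RICH), "e3": (0.4, GC_RICH), "e4": (0.8, GC_RICH)})


def log_likelihood(columns):
    """Sum of log mixture likelihoods over alignment columns (dicts taxon -> base)."""
    return sum(math.log(mixture_likelihood(TREE, c, W_INV, PI_INV, [V1, V2])) for c in columns)


# Sanity check: the mixture is a proper distribution over all 4^3 site patterns.
total = sum(
    mixture_likelihood(TREE, dict(zip(TAXA, pattern)), W_INV, PI_INV, [V1, V2])
    for pattern in itertools.product(NUC, repeat=len(TAXA))
)
print(round(total, 12))
print(log_likelihood([{"t1": "A", "t2": "A", "t3": "A"}, {"t1": "A", "t2": "G", "t3": "A"}]))
```

The invariable class contributes only to constant columns, which is why a mixture with a large invariable fraction can still fit variable columns through the V1 and V2 classes. Replacing F81 with general edge-specific rate matrices would change `f81_prob` (for example, to a matrix exponential) but not the mixture structure.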
© CSIRO 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Similar articles
- Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution. Genetics. 2015 Jul;200(3):873-90. doi: 10.1534/genetics.115.177386. Epub 2015 May 6. PMID: 25948563. Free PMC article.
- A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny. BMC Evol Biol. 2008 Dec 16;8:331. doi: 10.1186/1471-2148-8-331. PMID: 19087270. Free PMC article.
- ProtASR: An Evolutionary Framework for Ancestral Protein Reconstruction with Selection on Folding Stability. Syst Biol. 2017 Nov 1;66(6):1054-1064. doi: 10.1093/sysbio/syw121. PMID: 28057858.
- Models of coding sequence evolution. Brief Bioinform. 2009 Jan;10(1):97-109. doi: 10.1093/bib/bbn049. Epub 2008 Oct 29. PMID: 18971241. Free PMC article. Review.
- Next-generation development and application of codon model in evolution. Front Genet. 2023 Jan 27;14:1091575. doi: 10.3389/fgene.2023.1091575. eCollection 2023. PMID: 36777719. Free PMC article. Review.
Cited by
- A minimum reporting standard for multiple sequence alignments. NAR Genom Bioinform. 2020 Apr 14;2(2):lqaa024. doi: 10.1093/nargab/lqaa024. eCollection 2020 Jun. PMID: 33575581. Free PMC article.
- Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution. Genetics. 2015 Jul;200(3):873-90. doi: 10.1534/genetics.115.177386. Epub 2015 May 6. PMID: 25948563. Free PMC article.
- ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods. 2017 Jun;14(6):587-589. doi: 10.1038/nmeth.4285. Epub 2017 May 8. PMID: 28481363. Free PMC article.
- Roadmap to the study of gene and protein phylogeny and evolution-A practical guide. PLoS One. 2023 Feb 24;18(2):e0279597. doi: 10.1371/journal.pone.0279597. eCollection 2023. PMID: 36827278. Free PMC article.
- Genomic-Scale Interaction Involving Complementary Sequences in the Hepatitis C Virus 5'UTR Domain IIa and the RNA-Dependent RNA Polymerase Coding Region Promotes Efficient Virus Replication. Viruses. 2018 Dec 28;11(1):17. doi: 10.3390/v11010017. PMID: 30597844. Free PMC article.
