Inference of gain and loss events from phyletic patterns using stochastic mapping and maximum parsimony--a simulation study
- PMID: 21971516
- PMCID: PMC3215202
- DOI: 10.1093/gbe/evr101
Abstract
Bacterial evolution is characterized by frequent gain and loss events of gene families. These events can be inferred from phyletic pattern data, a compact representation of gene family repertoire across multiple genomes. The maximum parsimony paradigm is a classical and prevalent approach for the detection of gene family gains and losses mapped on specific branches. We and others have previously developed probabilistic models that aim to account for the stochastic dynamics of gain and loss. These models are a critical component of a methodology termed stochastic mapping, in which probabilities and expectations of gain and loss events are estimated for each branch of an underlying phylogenetic tree. In this work, we present a phyletic pattern simulator in which the gain and loss dynamics are assumed to follow a continuous-time Markov chain along the tree. Various models and options are implemented to make the simulation software useful for a large number of studies in which binary (presence/absence) data are analyzed. Using this simulation software, we compared the ability of the maximum parsimony and the stochastic mapping approaches to accurately detect gain and loss events along the tree. Our simulations cover a large array of evolutionary scenarios in terms of the propensities for gene family gains and losses and the variability of these propensities among gene families. Although both methods yield relatively low false positive rates in all simulation schemes, stochastic mapping outperforms maximum parsimony in terms of true positive rates. We further studied the factors that influence the performance of both methods. We find, for example, that the accuracy of maximum parsimony inference is substantially reduced when the goal is to map gain and loss events along internal branches of the phylogenetic tree. Furthermore, the accuracy of stochastic mapping is reduced with smaller data sets (limited number of gene families) due to unreliable estimation of branch lengths. Our simulator and simulation results are additionally relevant for the analysis of other types of binary-coded data, such as the existence of homologous restriction sites, gaps, and introns, to name a few. Both the simulation software and the inference methodology are freely available at a user-friendly server: http://gloome.tau.ac.il/.
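To make the simulation idea concrete, the following is a minimal sketch of how a single phyletic pattern might be generated under a two-state (absent/present) continuous-time Markov chain along a tree, while recording the true gain and loss events on each branch. It is not the GLOOME simulator. The toy tree, the rate values, the function names, and the choice to draw the root state from the stationary distribution are all assumptions made for illustration, and the sketch uses a single pair of rates rather than rates that vary among gene families.

```python
import random

# Hypothetical toy tree (an assumption): child -> (parent, branch length).
# The node named "root" has no entry because it has no parent.
TREE = {
    "A": ("n1", 0.3),
    "B": ("n1", 0.3),
    "n1": ("root", 0.2),
    "C": ("root", 0.5),
}


def simulate_branch(state, length, gain_rate, loss_rate, rng):
    """Run a two-state CTMC (0 = absent, 1 = present) along one branch.

    Returns the end state plus the number of gains and losses that occurred.
    """
    elapsed, gains, losses = 0.0, 0, 0
    while True:
        # The waiting time to the next event depends on the current state.
        rate = gain_rate if state == 0 else loss_rate
        if rate <= 0:
            break  # absorbing state: nothing more can happen
        elapsed += rng.expovariate(rate)
        if elapsed >= length:
            break  # the next event would fall beyond the end of the branch
        if state == 0:
            state, gains = 1, gains + 1
        else:
            state, losses = 0, losses + 1
    return state, gains, losses


def simulate_family(tree, gain_rate, loss_rate, rng):
    """Simulate one gene family from the root down to the leaves.

    Returns (pattern, events): the presence/absence state of every leaf,
    and the true (gains, losses) counts on every branch, keyed by child node.
    """
    children = {}
    for child, (parent, length) in tree.items():
        children.setdefault(parent, []).append((child, length))

    # Assumption: the root state is drawn from the stationary distribution,
    # where P(present) = gain_rate / (gain_rate + loss_rate).
    p_present = gain_rate / (gain_rate + loss_rate)
    states = {"root": int(rng.random() < p_present)}
    events = {}

    stack = ["root"]
    while stack:
        parent = stack.pop()
        for child, length in children.get(parent, []):
            end, gains, losses = simulate_branch(
                states[parent], length, gain_rate, loss_rate, rng
            )
            states[child] = end
            events[child] = (gains, losses)
            stack.append(child)

    # Leaves are the nodes that never appear as a parent.
    pattern = {node: s for node, s in states.items() if node not in children}
    return pattern, events


if __name__ == "__main__":
    rng = random.Random(1)
    pattern, events = simulate_family(TREE, gain_rate=1.0, loss_rate=2.0, rng=rng)
    print("phyletic pattern:", pattern)
    print("true events per branch (gains, losses):", events)
```

Because the true events on every branch are recorded, counts inferred by any method (maximum parsimony or stochastic mapping) could in principle be scored against them to obtain true positive and false positive rates, which is the kind of comparison the abstract describes.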
