ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference
- PMID: 33264284
- PMCID: PMC7735675
- DOI: 10.1371/journal.pbio.3001007
Abstract
Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
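The core idea in the abstract is to keep parsimony-informative sites rather than hunt for uninformative ones to remove. The minimal sketch below illustrates that idea. It assumes the standard definition of a parsimony-informative site: a column with at least two character states, each present in at least two sequences. It also treats `-` and `?` as missing data rather than as character states. The function names are hypothetical, and this is not ClipKIT's own code or command-line interface.

```python
from collections import Counter
from typing import List

# Assumption: these characters are missing data and never count as a state.
GAP_CHARS = {"-", "?"}


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c not in GAP_CHARS)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def keep_informative_sites(alignment: List[str]) -> List[str]:
    """Return the alignment restricted to parsimony-informative columns."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    # Indices of the columns to retain, in their original order.
    keep = [
        i for i in range(length)
        if is_parsimony_informative("".join(seq[i] for seq in alignment))
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


if __name__ == "__main__":
    msa = [
        "ATGCA-",
        "ATGTAC",
        "ACGTTC",
        "ACGCT-",
    ]
    for row in keep_informative_sites(msa):
        print(row)  # prints TCA, TTA, CTT, CCT on separate lines
```

Constant columns (all `A` or all `G`) are dropped. The last column is also dropped: with gaps ignored, it contains only one state (`C`, in two sequences). How the published tool treats gaps and ambiguity codes may differ from this simplification.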
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol. 2010 Jul 13;10:210. doi: 10.1186/1471-2148-10-210. PMID: 20626897. Free PMC article.
- Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83. PMID: 15804354. Free PMC article.
- Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference. Syst Biol. 2015 Sep;64(5):778-91. doi: 10.1093/sysbio/syv033. Epub 2015 Jun 1. PMID: 26031838. Free PMC article.
- Tree disagreement: measuring and testing incongruence in phylogenies. J Biomed Inform. 2006 Feb;39(1):86-102. doi: 10.1016/j.jbi.2005.08.008. Epub 2005 Sep 28. PMID: 16243006. Review.
- Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Res. 2007 Feb;17(2):127-35. doi: 10.1101/gr.5232407. PMID: 17272647. Review.
Cited by
- Challenges in Assembling the Dated Tree of Life. Genome Biol Evol. 2024 Oct 9;16(10):evae229. doi: 10.1093/gbe/evae229. PMID: 39475308. Free PMC article.
- Genomic transfers help to decipher the ancient evolution of filoviruses and interactions with vertebrate hosts. PLoS Pathog. 2024 Sep 3;20(9):e1011864. doi: 10.1371/journal.ppat.1011864. eCollection 2024 Sep. PMID: 39226335. Free PMC article.
- Revisiting the four Hexapoda classes: Protura as the sister group to all other hexapods. Proc Natl Acad Sci U S A. 2024 Sep 24;121(39):e2408775121. doi: 10.1073/pnas.2408775121. Epub 2024 Sep 19. PMID: 39298489.
- New chromosome-scale genomes provide insights into marine adaptations of sea snakes (Hydrophis: Elapidae). BMC Biol. 2023 Dec 8;21(1):284. doi: 10.1186/s12915-023-01772-2. PMID: 38066641. Free PMC article.
- Identification and characterization of andalusicin: N-terminally dimethylated class III lantibiotic from Bacillus thuringiensis sv. andalousiensis. iScience. 2021 Apr 29;24(5):102480. doi: 10.1016/j.isci.2021.102480. eCollection 2021 May 21. PMID: 34113822. Free PMC article.
