PLoS Biol. 2020 Dec 2;18(12):e3001007.
doi: 10.1371/journal.pbio.3001007. eCollection 2020 Dec.

ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference

Jacob L Steenwyk et al. PLoS Biol.

Abstract

Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
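The idea of retaining parsimony-informative sites can be illustrated with a minimal sketch. It assumes the standard definition (a site is parsimony-informative when at least two different characters each occur in at least two sequences) and treats only `-` and `?` as missing data. The function names and the toy alignment are hypothetical and are not taken from ClipKIT, whose actual implementation and options may differ.

```python
from collections import Counter

# Assumption: only these characters are treated as gaps or missing data.
GAP_CHARS = {"-", "?"}


def is_parsimony_informative(column):
    """True if at least two distinct characters each occur at least twice."""
    counts = Counter(ch.upper() for ch in column if ch not in GAP_CHARS)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def keep_parsimony_informative(alignment):
    """Return a copy of the alignment holding only parsimony-informative columns.

    `alignment` maps a sequence name to an aligned sequence; all sequences
    are assumed to have the same length.
    """
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    keep = [i for i, col in enumerate(columns) if is_parsimony_informative(col)]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical toy alignment: column 0 is constant, columns 1 and 2 are
# parsimony-informative, column 3 has a singleton, column 4 is mostly gaps.
toy = {"t1": "AAGC-", "t2": "AAGC-", "t3": "ATCCA", "t4": "ATCTA"}
trimmed = keep_parsimony_informative(toy)
```

In this toy case only the second and third columns survive, so each sequence is reduced to two characters.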

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The 14 alignment trimming strategies tested differ in resulting MSAs and metrics of phylogenetic tree accuracy and support.
Principal component analysis of alignment length, nRF, and ABS values across the 14 MSA trimming strategies for 4 empirical datasets (A) and 4 simulated datasets (B). Insets of scree plots depict the percentage of variation explained (y-axis) for the first 5 dimensions (x-axis). Data were scaled prior to conducting principal component analysis. Note that the BMGE 0.3 and Gblocks strategies are not represented in Fig 1B because they frequently removed entire alignments and were therefore removed from the analysis of simulated sequences. Data used to generate this figure can be found on figshare (doi: 10.6084/m9.figshare.12401618). ABS, average bipartition support; BMGE, Block Mapping and Gathering with Entropy; MSA, multiple sequence alignment; nRF, normalized Robinson–Foulds.
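For readers unfamiliar with the nRF metric named above, the following is a minimal sketch of a normalized Robinson–Foulds distance between two unrooted trees on the same taxa. It assumes trees are written as nested tuples of taxon names and that the raw distance is divided by 2(n - 3), the maximum for two fully resolved unrooted trees with n taxa. This excerpt does not state the normalization the authors used, so the function names, the input format, and the denominator are all assumptions.

```python
def splits(tree, taxa):
    """Collect the non-trivial bipartitions of a tree given as nested tuples."""
    ref = min(taxa)  # reference taxon used to pick one canonical side per split
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(leaves(child) for child in node))
        # Skip trivial splits (single leaf on either side, or the whole tree).
        if 1 < len(clade) < len(taxa) - 1:
            found.add(clade if ref not in clade else taxa - clade)
        return clade

    leaves(tree)
    return found


def normalized_rf(tree_a, tree_b, taxa):
    """Symmetric difference of split sets divided by 2 * (n - 3)."""
    taxa = frozenset(taxa)
    diff = splits(tree_a, taxa) ^ splits(tree_b, taxa)
    return len(diff) / (2 * (len(taxa) - 3))


# Hypothetical five-taxon trees.
taxa = ["A", "B", "C", "D", "E"]
reference = (("A", "B"), ("C", "D"), "E")
one_split_differs = (("A", "B"), ("C", "E"), "D")
no_shared_splits = (("A", "C"), ("B", "D"), "E")

same = normalized_rf(reference, reference, taxa)
partial = normalized_rf(reference, one_split_differs, taxa)
far = normalized_rf(reference, no_shared_splits, taxa)
```

Under these assumptions, a value of 0 means the two trees share every bipartition and a value of 1 means they share none.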
Fig 2. ClipKIT is a top-performing software for trimming MSAs.
Desirability-based integration of accuracy and support metrics per MSA facilitated the comparison of relative performance of the 14 different MSA trimming strategies for empirical (A–D) and simulated (E–H) datasets. Examination of performance for individual datasets and average performance across empirical (I) and simulated (J) datasets revealed that ClipKIT is a top-performing software. MSA trimming strategies are ordered along the x-axis from the highest-performing strategy to the lowest-performing one according to average desirability-based rank. Boxplots embedded in violin plots have lower, middle, and upper hinges that represent the first, second, and third quartiles. Whiskers extend to 1.5 times the interquartile range. Data used to generate this figure can be found on figshare (doi: 10.6084/m9.figshare.12401618). AA, amino acid; BMGE, Block Mapping and Gathering with Entropy; MSA, multiple sequence alignment; NT, nucleotide.

