Mol Biol Evol. 2015 May;32(5):1365-71.
doi: 10.1093/molbev/msv035. Epub 2015 Feb 19.

Gene-wide identification of episodic selection

Ben Murrell et al. Mol Biol Evol. 2015 May.

Abstract

We present BUSTED, a new approach to identifying gene-wide evidence of episodic positive selection, where the non-synonymous substitution rate is transiently greater than the synonymous rate. BUSTED can be used either on an entire phylogeny (without requiring an a priori hypothesis regarding which branches are under positive selection) or on a pre-specified subset of foreground lineages (if a suitable a priori hypothesis is available). Selection is modeled as varying stochastically over branches and sites, and we propose a computationally inexpensive evidence metric for identifying sites subject to episodic positive selection on any foreground branches. We compare BUSTED with existing models on simulated and empirical data. An implementation is available at www.datamonkey.org/busted, with a widget allowing the interactive specification of foreground branches.
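As a rough illustration of the two quantities the abstract describes (a gene-wide test and a cheap per-site evidence metric), the sketch below assumes that an unconstrained model and a null model with the episodic class fixed at ω3 = 1 have already been fitted, and that their total and per-site log-likelihoods are available. The χ² reference distribution with 2 degrees of freedom, the per-site likelihood-ratio form of the evidence metric, and the function names are all assumptions made for illustration, not a description of the published implementation.

```python
import math
from typing import List, Sequence, Tuple


def busted_style_lrt(loglik_alt: float, loglik_null: float) -> Tuple[float, float]:
    """Gene-wide likelihood ratio test statistic and p-value (hypothetical helper).

    Assumption: the statistic is referred to a chi-square distribution with
    2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    # Clamp small negative values that can arise from imperfect optimization.
    lrt = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return lrt, math.exp(-lrt / 2.0)


def site_evidence_ratios(site_ll_alt: Sequence[float],
                         site_ll_null: Sequence[float]) -> List[float]:
    """Per-site likelihood ratio of the unconstrained model to the omega3 = 1 model.

    Both inputs are per-site log-likelihoods evaluated at already-fitted
    parameters, so no re-optimization is needed in this sketch.
    """
    return [math.exp(a - n) for a, n in zip(site_ll_alt, site_ll_null)]


# Hypothetical log-likelihoods, for illustration only.
lrt, p = busted_style_lrt(loglik_alt=-1000.0, loglik_null=-1004.0)
print(lrt, round(p, 4))  # 8.0 0.0183
print(site_evidence_ratios([-2.0, -3.5], [-2.0, -5.5]))  # ratios of 1.0 and about 7.389
```

Because the per-site ratios reuse likelihoods that are already computed during the gene-wide fit, a metric of this shape costs essentially nothing beyond the two model fits.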

Keywords: branch-site model; episodic selection; evolutionary model; random effects model.

Figures

Fig. 1.
Depiction of our online widget used to interactively specify foreground branches when such a priori information is available. This example is from a subset of the HIV-1 RT data set (Murrell, de Oliveira, et al. 2012), where terminal branches leading to samples taken after antiretroviral therapy are selected as foreground. Results for this analysis are shown in table 2. The widget facilitates the annotation of large trees like this one (only a small subtree is shown for legibility), for example, by labeling all branches whose names match a text pattern (e.g., here all branches of interest start with a “T”) and by automatically labeling internal branches (e.g., using parsimony labeling).
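The caption mentions two labeling aids: selecting branches whose names match a text pattern, and inferring labels for internal branches by parsimony. The widget's actual algorithm is not described here, so the following is only a minimal sketch under stated assumptions: a rooted binary tree, a regular expression on leaf names (here `^T`, echoing the caption's example), and Fitch's small-parsimony algorithm applied to a two-state foreground/background trait, with an ambiguous root resolved toward background. The `Node` class, the helper names, and the example tree are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Node:
    """A node of a rooted binary tree; the branch above a node shares its name."""
    name: str
    children: List["Node"] = field(default_factory=list)
    state_set: Set[int] = field(default_factory=set)  # Fitch candidate states
    state: int = 0  # 1 = foreground, 0 = background


def fitch_up(node: Node, is_foreground: Callable[[str], bool]) -> None:
    """Bottom-up pass: intersect child state sets if possible, otherwise take their union."""
    if not node.children:
        node.state_set = {1 if is_foreground(node.name) else 0}
        return
    for child in node.children:
        fitch_up(child, is_foreground)
    sets = [child.state_set for child in node.children]
    common = set.intersection(*sets)
    node.state_set = common if common else set.union(*sets)


def fitch_down(node: Node, parent_state: Optional[int] = None) -> None:
    """Top-down pass: keep the parent's state when allowed, else pick from the node's set."""
    if parent_state is not None and parent_state in node.state_set:
        node.state = parent_state
    else:
        # With two states this set has two members only at an ambiguous root;
        # min() resolves that tie toward background (0).
        node.state = min(node.state_set)
    for child in node.children:
        fitch_down(child, node.state)


def foreground_branches(node: Node, is_root: bool = True) -> List[str]:
    """Names of branches labeled foreground, in preorder (the root has no branch)."""
    names = [node.name] if (not is_root and node.state == 1) else []
    for child in node.children:
        names.extend(foreground_branches(child, is_root=False))
    return names


def label_tree(root: Node, pattern: str) -> List[str]:
    """Label leaves by regex, propagate labels to internal nodes, list foreground branches."""
    matcher = re.compile(pattern)
    fitch_up(root, lambda name: matcher.search(name) is not None)
    fitch_down(root)
    return foreground_branches(root)


# Hypothetical tree: leaves starting with "T" are the branches of interest.
tree = Node("N0", [
    Node("N1", [Node("T1"), Node("T2")]),
    Node("N2", [Node("R1"), Node("N3", [Node("T3"), Node("R2")])]),
])
print(label_tree(tree, r"^T"))  # ['N1', 'T1', 'T2', 'T3']
```

Here the internal branch N1 is labeled foreground because both of its descendants match the pattern, while N3 (one foreground and one background descendant) is resolved toward its background parent.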
Fig. 2.
Statistical performance of BUSTED and fitmodel. (A) Power of BUSTED as a function of ω3, for various nominal significance levels. The weight assigned to ω3 by the model was 0.1. See text for other simulation parameters. (B) Type I error rate as a function of the nominal significance level (null data), showing that BUSTED is conservative and fitmodel is anticonservative. (C) Power of BUSTED and fitmodel as a function of simulated selective strength (ω3), using test significance levels set to achieve a 0.05 Type I error rate on null simulations (fitmodel was anticonservative).
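Panels (B) and (C) rest on two simple computations: the empirical rejection rate at a nominal level, and a significance cutoff calibrated on null simulations so that the empirical Type I error does not exceed 0.05, which lets a conservative and an anticonservative method be compared fairly. The sketch below shows one straightforward way to do this from lists of p-values; the function names and the example p-values are hypothetical, and the paper's exact calibration procedure may differ.

```python
import bisect
from typing import Sequence


def empirical_rate(pvalues: Sequence[float], alpha: float) -> float:
    """Fraction of tests rejected at level alpha.

    On null simulations this is the empirical Type I error rate; on
    simulations with selection it is the empirical power.
    """
    return sum(p <= alpha for p in pvalues) / len(pvalues)


def calibrated_threshold(null_pvalues: Sequence[float], target: float = 0.05) -> float:
    """Largest observed null p-value usable as a cutoff with Type I error <= target.

    Returns -inf (reject nothing) if no observed cutoff meets the target.
    """
    ps = sorted(null_pvalues)
    allowed = int(target * len(ps))  # maximum number of null rejections
    best = float("-inf")
    for p in ps:
        # bisect_right counts null p-values <= p, including ties.
        if bisect.bisect_right(ps, p) <= allowed:
            best = p
        else:
            break  # counts only grow along the sorted list
    return best


# Hypothetical p-values from an anticonservative method.
null = [0.001 * i for i in range(1, 11)] + [0.5] * 90
alt = [0.003, 0.004, 0.02, 0.3]
print(empirical_rate(null, 0.05))  # 0.1: twice the nominal rate
cutoff = calibrated_threshold(null, 0.05)
print(empirical_rate(alt, cutoff))  # 0.5
```

For a conservative method the calibrated cutoff comes out above the nominal level, so calibration raises its power; for an anticonservative one it comes out below, as in this example.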
Fig. 3.
Correlates of signal for episodic selection in the Selectome data sets. Each panel depicts the fraction of all alignments reported by BUSTED as positively selected (at P ≤ 0.05) as a function of (A) the length of the alignment (codons), censored at 2,000 because of sparse sampling beyond that length; (B) the number of sequences; (C) the total tree length (expected number of substitutions per codon site); and (D) the maximum-likelihood estimate of the ω3 parameter, used as a proxy for the “strength” of selection. Plot points were chosen through an adaptive binning scheme, with each point representing at least 100 data sets. Lowess smoothing polynomials (smoothing span 0.25) are shown in solid light gray.
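The caption says each plotted point summarizes at least 100 data sets chosen by an adaptive binning scheme. The scheme itself is not specified here; one simple reading, assumed for this sketch, is equal-count binning: sort data sets by the covariate, cut them into consecutive bins of at least `min_per_bin`, and report the mean covariate and the fraction called selected in each bin. The function name and example data are hypothetical, and the lowess smoothing is not reproduced.

```python
from typing import List, Sequence, Tuple


def adaptive_bins(covariate: Sequence[float], selected: Sequence[bool],
                  min_per_bin: int = 100) -> List[Tuple[float, float]]:
    """Return (mean covariate, fraction selected) for consecutive equal-count bins.

    Every bin holds at least min_per_bin data sets; a short remainder at the
    end is merged into the last bin. If there are fewer than min_per_bin data
    sets in total, they form a single bin.
    """
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    n = len(order)
    points = []
    start = 0
    while start < n:
        end = start + min_per_bin
        if n - end < min_per_bin:
            end = n  # fold the leftover data sets into this bin
        idx = order[start:end]
        mean_x = sum(covariate[i] for i in idx) / len(idx)
        frac = sum(selected[i] for i in idx) / len(idx)
        points.append((mean_x, frac))
        start = end
    return points


# Hypothetical data: 250 alignments, called selected (e.g. P <= 0.05) above length 125.
lengths = list(range(250))
called = [x >= 125 for x in lengths]
print(adaptive_bins(lengths, called))  # [(49.5, 0.0), (174.5, 0.8333333333333334)]
```

Equal-count bins keep the binomial noise in each plotted fraction roughly comparable across the covariate range, which matters when data sets are sparse at one end (as with the long alignments censored in panel A).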