Mol Biol Evol. 2015 Feb;32(2):542-54.
doi: 10.1093/molbev/msu318. Epub 2014 Nov 17.

A model of substitution trajectories in sequence space and long-term protein evolution

Dinara R Usmanova et al. Mol Biol Evol. 2015 Feb.

Abstract

The nature of the factors governing the tempo and mode of protein evolution is a fundamental issue in evolutionary biology. Specifically, whether interactions between different sites, or epistasis, are important in directing the course of evolution has become one of the central questions. Several recent reports have scrutinized patterns of long-term protein evolution, claiming them to be compatible only with an epistatic fitness landscape. However, these claims have not yet been substantiated with a formal model of protein evolution. Here, we formulate a simple covarion-like model of protein evolution focusing on the rate at which the fitness impact of amino acids at a site changes with time. We then apply the model to data on convergent and divergent protein evolution to test whether the incorporation of epistatic interactions is necessary to explain the data. We find that convergent evolution cannot be explained without the incorporation of epistasis, and that the rate at which an amino acid state switches from being acceptable at a site to being deleterious is faster than the rate of amino acid substitution. Specifically, for proteins that have persisted in modern prokaryotic organisms since the last universal common ancestor, approximately ten amino acid states switch from being accessible to being deleterious, or vice versa, for every amino acid substitution. Thus, molecular evolution can only be perceived in the context of rapid turnover of which amino acids are available for evolution.

Keywords: epistasis; fitness landscape; molecular evolution.

Figures

Fig. 1.
Three categories of Markov chain models of protein evolution. The general time reversible models estimate the probability, Z, that a site is occupied by a specific nucleotide. The probability of finding specific nucleotides at a site changes with time, and the rate of change is described by a 4 × 4 matrix, R, because each of the four nucleotides can change into the other three nucleotides with a certain rate r_ij. The r_ij rates typically reflect the rate of mutation and, therefore, Z[t+τ] = Z[t]·e^(R·τ) models the neutral rate of change of nucleotides across sites. Because selection influences the rate of substitution at a site, it is introduced as a parameter ω, giving Z[t+τ] = Z[t]·e^(ω·R·τ). In that case, ω > 1 reflects the action of positive selection, which accelerates the rate of evolution, and ω < 1 reflects negative selection, which slows the rate of change of Z. As the action of selection may differ between sites, some models attempt to capture the resulting rate variation across sites by assigning a different ω to different sites. The covarion models reflect the possibility that the rate of evolution of a site is itself subject to change with time. They introduce extra parameters that allow sites to switch among the different ω categories.
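The update rule in the Fig. 1 caption, Z[t+τ] = Z[t]·e^(ω·R·τ), can be illustrated with a minimal sketch. It is not the authors' implementation: the rate matrix R is assumed here to be a Jukes-Cantor matrix (all off-diagonal rates equal to a hypothetical mutation rate mu), and the helper names mat_mul, mat_exp, jukes_cantor_rate_matrix and evolve are illustrative only.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=20):
    """Approximate exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2 ** squarings
    small = [[x / scale for x in row] for row in m]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term becomes small^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def jukes_cantor_rate_matrix(mu):
    """Assumed 4 x 4 rate matrix R: every off-diagonal rate r_ij equals mu."""
    return [[-3.0 * mu if i == j else mu for j in range(4)] for i in range(4)]


def evolve(z, rate_matrix, tau, omega=1.0):
    """Return Z[t+tau] = Z[t] * exp(omega * R * tau) for a row vector z."""
    scaled = [[omega * tau * r for r in row] for row in rate_matrix]
    p = mat_exp(scaled)
    n = len(z)
    return [sum(z[i] * p[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    z0 = [1.0, 0.0, 0.0, 0.0]          # the site starts in the first nucleotide
    r = jukes_cantor_rate_matrix(0.1)   # mu = 0.1 is an arbitrary choice
    print(evolve(z0, r, tau=5.0))             # neutral case, omega = 1
    print(evolve(z0, r, tau=5.0, omega=0.5))  # negative selection slows change
```

Under this assumed matrix the result can be checked against the closed-form Jukes-Cantor probability of staying in the starting state, 1/4 + 3/4·e^(−4·ω·mu·τ). Lowering ω below 1 keeps more probability on the starting nucleotide, which matches the caption's description of negative selection.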
Fig. 2.
The fitness matrices of words encountered in the trajectory of substitutions WORD→GENE described by Maynard Smith (1970). The fitness matrix of a specific sequence reflects both the current (C) sequence, with the state C in the corresponding cell of the matrix, and the fitness impact of all possible single-letter substitutions. For example, for the first word in the trajectory, "WORD", there are 16 available (A) substitutions, out of 100 possible ones, that would lead to another English word (having high fitness). The other 84 states are blocked (B), meaning that such a substitution, were it to occur, would not lead to a meaningful sequence of letters. A substitution that actually occurred in the trajectory is reflected by a bidirectional C↔A switch in two cells of the matrix. With every substitution, the potential impact of other substitutions also changes (changes between the current and the previous fitness matrix are shown in orange).
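The word-game fitness matrix described in the Fig. 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: TOY_DICTIONARY is a small hypothetical word list standing in for a full English dictionary (so it will not reproduce the 16 available substitutions reported for "WORD"), and the function names are invented for this example. The trajectory WORD→WORE→GORE→GONE→GENE is the one given by Maynard Smith (1970).

```python
import string

# Hypothetical toy word list; a real analysis would use a full English dictionary.
TOY_DICTIONARY = {"WORD", "WORE", "GORE", "GONE", "GENE", "CORD", "WARD", "WORK"}

# Maynard Smith's single-letter substitution trajectory.
TRAJECTORY = ["WORD", "WORE", "GORE", "GONE", "GENE"]


def fitness_matrix(word, dictionary):
    """Label every (position, letter) cell as current (C), available (A) or blocked (B)."""
    matrix = {}
    for pos, current in enumerate(word):
        for letter in string.ascii_uppercase:
            if letter == current:
                state = "C"
            else:
                candidate = word[:pos] + letter + word[pos + 1:]
                state = "A" if candidate in dictionary else "B"
            matrix[(pos, letter)] = state
    return matrix


def count_states(matrix):
    """Count how many cells are in each of the states C, A and B."""
    counts = {"C": 0, "A": 0, "B": 0}
    for state in matrix.values():
        counts[state] += 1
    return counts


def changed_cells(previous, current):
    """Return the cells whose state differs between two fitness matrices."""
    return {cell for cell in current if current[cell] != previous[cell]}


if __name__ == "__main__":
    previous = None
    for word in TRAJECTORY:
        matrix = fitness_matrix(word, TOY_DICTIONARY)
        print(word, count_states(matrix))
        if previous is not None:
            print("  changed cells:", sorted(changed_cells(previous, matrix)))
        previous = matrix
```

For a four-letter word there are 4 × 25 = 100 single-letter substitutions, which matches the total in the caption. The changed_cells helper plays the role of the orange highlighting: besides the two cells of the C↔A switch itself, it reports the cells that flipped between available and blocked because the context changed.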
Fig. 3.
Switches between five states in the fitness matrix. The current amino acid state can switch into an available amino acid that is one nucleotide substitution away (C↔A_n), which reflects one amino acid substitution. With every C↔A_n switch, γ amino acid states that were previously available to evolution become blocked (A_n/f→B_n/f) and, vice versa, γ other amino acid states that were blocked become available (B_n/f→A_n/f). Furthermore, with every C↔A_n switch, φ amino acid states that were previously in the mutational neighborhood become unreachable with one nucleotide mutation (A_n→A_f or B_n→B_f switches) and vice versa (A_f→A_n or B_f→B_n switches). F never changes because it reflects those amino acid states that can never be found in a protein sequence.
Fig. 4.
Numerical evaluation of components of Z[t]. The initial condition is Z[0] = (1,0,0,0,0) with constants α = 0.06, γ = 5, m = 7.3.
Fig. 5.
Relative rate of protein evolution. Kc/K4 is shown by • and Kd/K4 by □ (from Povolotskaya and Kondrashov 2010). We fit the observed Kc/K4 to that calculated by the solution of equation (4) varying γ as a parameter. The optimal fit was found for γ ∼ 5 (thick solid line). Two near fits for γ = 4 and γ = 6 are depicted with thin solid lines. Thick dashed line shows Kc/K4 for γ significantly higher and thick dotted line for significantly lower values of γ.
Fig. 6.
Observed and predicted relative rates of sequence divergence. The observed values of Nt/Na shown by • and the predicted fit with our model using optimal parameters shown with □.
Fig. 7.
Distribution of estimated parameters for 119 COGs. The distribution of the number of nonforbidden amino acids per site (m), proportion of available amino acids over all available and blocked states (α), and the rate of A↔B switches (γ) are shown.
Fig. 8.
Estimating amino acid usage. Usage is calculated as the most probable number of amino acids to be observed at a site as a function of the number of accumulated substitutions per site. The solid line represents the number of nonforbidden amino acids at a site (m).
