Science. 2022 May 20;376(6595):823-830.
doi: 10.1126/science.abn6895. Epub 2022 May 19.

Epistatic drift causes gradual decay of predictability in protein evolution

Yeonwoo Park et al. Science.

Abstract

Epistatic interactions can make the outcomes of evolution unpredictable, but no comprehensive data are available on the extent and temporal dynamics of changes in the effects of mutations as protein sequences evolve. Here, we use phylogenetic deep mutational scanning to measure the functional effect of every possible amino acid mutation in a series of ancestral and extant steroid receptor DNA binding domains. Across 700 million years of evolution, epistatic interactions caused the effects of most mutations to become decorrelated from their initial effects and their windows of evolutionary accessibility to open and close transiently. Most effects changed gradually and without bias at rates that were largely constant across time, indicating a neutral process caused by many weak epistatic interactions. Our findings show that protein sequences drift inexorably into contingency and unpredictability, but that the process is statistically predictable, given sufficient phylogenetic and experimental data.

Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

Fig. 1. Phylogenetic deep mutational scanning.
(A) Phylogeny of the DNA-binding domain (DBD) of steroid and related receptors. Circles, DBDs characterized here by deep mutational scanning. SRs, steroid receptors; ERs, estrogen receptors; kSRs, ketosteroid receptors—including glucocorticoid receptor (GR). Complete phylogeny in fig. S1. (B) Phylogenetic relations among the 9 characterized DBDs. Colors distinguish trajectories to C. teleta SR and human GR. Sequence divergence (percent) and number of sequence differences (parentheses) in each interval are shown. (C) Sort-seq assay for DBD activity. For each DBD, a library containing all possible single-amino acid mutations was generated using microarray-based synthesis and cassette assembly (fig. S3) and cloned into yeast carrying a GFP reporter; ERE, estrogen response element. Activity of each mutant was measured by sorting the library of cells into fluorescence bins, inferring the distribution of each mutant among bins by sequencing, and calculating the mean log10-GFP fluorescence (F). Hypothetical distributions for 3 variants with high, medium, and low F are shown. (D) Tracing epistatic change in mutational effect across the phylogeny using example mutation S9P. The effect on each DBD’s activity (points) was quantified as the change in mean log10-GFP fluorescence (ΔF). Horizontal axis, each DBD in order on the phylogeny, positioned by sequence divergence and colored by trajectory. ΔΔF, change in the mutation’s effect between a pair of DBDs, caused by epistatic interactions with intervening substitutions. Error bars, SEM (n = 3). Dashed lines, upper and lower measurement bounds.
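The quantities defined in panels (C) and (D) can be illustrated with a minimal sketch. It assumes that each mutant's F is approximated as the read-weighted mean of per-bin log10-GFP values, and that ΔΔF is taken as descendant minus ancestor; the function names, bin values, and read counts are hypothetical, and the published pipeline may weight bins differently.

```python
def mean_log10_fluorescence(bin_counts, bin_log10_gfp):
    """Read-weighted mean log10-GFP fluorescence (F) of one variant.

    bin_counts    -- reads of this variant recovered from each sort bin
    bin_log10_gfp -- representative log10-GFP value of each bin (assumed known)
    """
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("variant has no reads in any bin")
    return sum(c * f for c, f in zip(bin_counts, bin_log10_gfp)) / total


def delta_f(f_mutant, f_wildtype):
    """Effect of a mutation on one DBD background: change in F."""
    return f_mutant - f_wildtype


def delta_delta_f(delta_f_descendant, delta_f_ancestor):
    """Epistatic change in a mutation's effect between two DBDs (assumed sign convention)."""
    return delta_f_descendant - delta_f_ancestor


# Hypothetical example: four sort bins with made-up log10-GFP values.
bins = [2.0, 3.0, 4.0, 5.0]
f_wt = mean_log10_fluorescence([0, 0, 10, 0], bins)
f_mut = mean_log10_fluorescence([10, 0, 0, 0], bins)
effect = delta_f(f_mut, f_wt)
```

In this toy case the wild type sits entirely in the third bin and the mutant entirely in the first, so the mutation's ΔF is the difference between those two bin values.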
Fig. 2. Pervasive random changes in the effects of mutations.
(A) Maximum and minimum effect of each mutation (points) across the 9 DBDs, colored according to the stacked column at right, which shows the proportion of mutations in four categories: pink, significant effect of DBD background on ΔF and the sign of ΔF different between the maximum and minimum; red, significant effect of background but no sign difference; black, no significant effect of background and ΔF within measurement limits; blue, ΔF at the lower bound of measurement in all 9 DBDs. Significance was evaluated by Welch’s ANOVA, Benjamini-Hochberg FDR ≤ 0.1. (B) Number of mutations in each phylogenetic interval that changed significantly in ΔF (t-test between parent and child node, FDR ≤ 0.1), plotted versus the number of amino acids that diverged in the interval. (C) Distribution of epistatic change in the effect of every mutation during every phylogenetic interval (ΔΔF). Dark grey, ΔΔF significantly different from 0. Mutations always at the lower bound of measurement were excluded. (D) Fraction of mutations in each DBD with ΔF < 0 (circles) or ΔF at the lower bound of measurement (triangles). (E) Distribution of ΔΔF of all mutations for the protostome-annelid interval or the AncSR1-human GR interval. The variance of the distribution (Var) quantifies the total epistatic change in the effects of all mutations during an interval. d, sequence divergence. (F) Total epistatic change as a function of sequence divergence across the phylogeny. Red dots, each of the 8 independent phylogenetic intervals between characterized DBDs; black, all composite intervals. Dashed lines, best-fit power function for all (black) or the 8 independent intervals (red).
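The significance calls in panels (A) and (B) rely on Benjamini-Hochberg FDR control. The sketch below shows the standard step-up procedure on a list of p-values; the function name and the example p-values are hypothetical, and it is not meant to reproduce the authors' analysis code.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard step-up procedure: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * fdr, and reject every hypothesis ranked <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four mutations.
calls = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], fdr=0.1)
```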
Fig. 3. Effects of most mutations changed gradually at characteristic rates.
(A) Models of the tempo of epistatic change. Null model, the amount of change in a mutation’s effect per substitution in an interval (unit ΔΔF) is randomly drawn from a normal distribution centered at 0; the variance is the same among intervals, so the mutation’s effect changes gradually at a constant expected rate as substitutions accrue. Alternative model, the variance may differ among phylogenetic intervals (blue vs. red), leading to episodic changes in a mutation’s effect. (B) Distribution of the p-value of the likelihood-ratio test (LRT) comparing gradual and episodic models for each mutation. Darker gray, mutations for which the gradual model is rejected (FDR ≤ 0.2). Mutations always at the lower bound of measurement were excluded from this analysis. (C) Distribution of the normalized amount of epistatic change in each interval, for all mutations better fit by the gradual model (left) or the episodic model (right). Normalized ΔΔF, ΔΔF of a mutation in an interval divided by σ·d^(1/2), where σ is that mutation’s average rate of epistatic change and d is the length of the interval. Gray columns, observed data; red line, distribution expected under the null model. (D) Trajectory of changes in the effect of two example mutations that are better fit by the gradual model (left) or episodic model (right); in each category, one evolves rapidly and the other slowly. Each mutation’s p-value in the LRT is shown; gray box, normalized changes in the mutation’s effect across each of the 8 intervals. (E) Phylogenetic cross-validation. In the example shown, ΔΔF in interval 1 is predicted from the average rate of epistatic change measured across intervals 2–8 (gray box). (F) Distribution of observed ΔΔF during interval 1 (gray columns) and predicted by cross-validation (red line). Mutations were grouped into deciles by their rate of epistatic change across intervals 2–8; predictions are shown for deciles with the slowest, median, or fastest rates. (G) Mutations’ relative rates of epistatic change are consistent across phylogenetic intervals. Points, deciles of mutations grouped by the predicted rate of epistatic change; observed epistatic change in an interval plotted against that predicted by cross-validation. r, Pearson’s correlation coefficient; ρ, Spearman’s rank correlation; dashed line, linear regression. (H) Among-interval differences in average rate of epistatic change. Each column shows the mean rate of epistatic change of all mutations in one phylogenetic interval, normalized so that the mean across all intervals equals 1. Error bars, estimated standard deviation obtained by bootstrap-resampling of mutations. Asterisks, intervals immediately following gene duplication. (I) Inferring the architecture of epistatic interactions between substitutions (black boxes) and a focal mutation (star) from phylogenetic DMS. Left, gradual changes in the mutation’s effect during evolution arise if many substitutions act as epistatic modifiers (arrows, with thickness showing the strength of interaction), yielding a normal distribution of ΔΔF per substitution. Right, episodic changes arise from interactions with only a few substitutions, yielding a distribution heavy at zero and the tails. In either case, strong vs. weak interactions cause rapid (top) vs. slow (bottom) epistatic change. The fraction of all mutations in each category in our experiments is shown.
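The gradual (null) model in panel (A) and the normalization in panel (C) can be made concrete with a small sketch. It assumes that, under the null model, ΔΔF over an interval of length d is normally distributed with mean 0 and variance σ²d, so the maximum-likelihood estimate of σ² is the mean of ΔΔF²/d across intervals. Measurement noise is ignored, all names and numbers are hypothetical, and the authors' estimator may differ.

```python
import math


def rate_of_epistatic_change(ddf, d):
    """Maximum-likelihood sigma assuming ddf[i] ~ Normal(0, sigma**2 * d[i])."""
    if len(ddf) != len(d) or not ddf:
        raise ValueError("need one interval length per ddf value")
    return math.sqrt(sum(x * x / di for x, di in zip(ddf, d)) / len(ddf))


def normalized_ddf(ddf, d, sigma):
    """Divide each ddf by sigma * sqrt(d); standard normal under the null model."""
    return [x / (sigma * math.sqrt(di)) for x, di in zip(ddf, d)]


def predicted_sd(sigma, d_heldout):
    """Expected standard deviation of ddf in a held-out interval (cross-validation)."""
    return sigma * math.sqrt(d_heldout)


# Hypothetical mutation observed over two training intervals.
ddf_train = [0.2, -0.4]
d_train = [4.0, 16.0]
sigma = rate_of_epistatic_change(ddf_train, d_train)
z_scores = normalized_ddf(ddf_train, d_train, sigma)
sd_interval_1 = predicted_sd(sigma, 9.0)
```

Holding out one interval, estimating σ from the rest, and comparing the held-out ΔΔF with the predicted spread mirrors the cross-validation idea in panel (E), under the stated assumptions.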
Fig. 4. Memory length of mutations and the timescale of historical contingency.
(A-D) Measuring the memory length of mutations. (A) Mutations were grouped into deciles by their rate of epistatic change (σ, expected standard deviation of ΔΔF per 1% sequence divergence). (B) The effects of mutations in each decile were compared between every pair of DBDs; shown are comparisons between AncSR and human GR (42% divergence). (C) The squared Pearson correlation coefficient (r²) for each DBD pair was plotted against the sequence divergence of that pair. Dotted line, best-fit exponential decay curve; memory half-life, sequence divergence at which r² = 0.5. (D) Relationship between the rate of epistatic change and memory half-life inferred by fitting a power function (red) to the mean rate of epistatic change and memory half-life of the deciles. This relationship was used to calculate the memory half-life of each mutation from its rate of epistatic change. (E) Distribution of memory half-life among mutations. Mutations were classified into short-, medium-, and long-memory categories using cutoffs of 50% and 200% divergence. (F) Comparing the effects of mutations between AncSR and human GR (42% divergence) for each memory category. Red dots, mutations with significant difference in ΔF (t-test, FDR ≤ 0.1); black, no significant difference.
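The memory half-life in panel (C) can be sketched as follows. The sketch assumes a decay of the form r² = exp(−k·d) with r² = 1 at zero divergence, fitted by least squares on ln(r²) through the origin; the fitting method, the function names, and the handling of values exactly at the 50% and 200% cutoffs are assumptions, not the authors' stated procedure.

```python
import math


def fit_decay_rate(divergence, r_squared):
    """Least-squares slope of ln(r^2) versus divergence, forced through the origin."""
    num = sum(d * math.log(r2) for d, r2 in zip(divergence, r_squared))
    den = sum(d * d for d in divergence)
    return -num / den


def memory_half_life(k):
    """Sequence divergence (percent) at which r^2 falls to 0.5."""
    return math.log(2) / k


def memory_category(half_life, short_cutoff=50.0, long_cutoff=200.0):
    """Classify by the 50% and 200% cutoffs (boundary handling is assumed)."""
    if half_life < short_cutoff:
        return "short"
    if half_life < long_cutoff:
        return "medium"
    return "long"


# Hypothetical decile whose r^2 decays exactly as exp(-0.05 * d).
divergences = [10.0, 20.0, 40.0]
r2_values = [math.exp(-0.05 * d) for d in divergences]
k = fit_decay_rate(divergences, r2_values)
half_life = memory_half_life(k)
```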
Fig. 5. Impact on sequence evolution of memory length and initial functional effect.
(A) The effect of a substitution at the time it fixed during history was calculated as the mean of ΔFs measured by DMS in the nearest ancestral and descendant nodes. (B) Comparing the effects of the 79 substitutions that occurred along the phylogenetic trajectories we characterized to the effects of all possible mutations. Substitutions are 29-fold enriched for ΔF ≥ −0.2 compared to mutations, providing an estimate of the threshold of accessibility during DBD evolution. (C) Distribution of the initial effect (ΔF on AncSR) of 275 substitutions that fixed between AncSR and any extant DBD in our phylogeny. Distributions are shown by memory half-life category. Enrichment of substitutions with ΔF ≥ −0.2 relative to mutations is shown. (D) Left, proportion of initially accessible mutations (ΔF on AncSR ≥ −0.2) that become inaccessible in at least one descendant DBD. Right, proportion of initially inaccessible mutations that become accessible in at least one descendant DBD. (E) Distribution of the number of characterized DBDs in which each mutation is accessible (ΔF ≥ −0.2), classified by memory-length category. The percentage of mutations that were accessible in some but not all DBDs is shown.
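The accessibility bookkeeping in panels (D) and (E) reduces to applying the ΔF ≥ −0.2 threshold on each DBD background. A minimal sketch follows, with hypothetical function names and made-up ΔF values.

```python
ACCESSIBILITY_THRESHOLD = -0.2  # threshold on delta-F taken from the caption


def is_accessible(delta_f):
    """A mutation counts as accessible on a background if delta-F >= -0.2."""
    return delta_f >= ACCESSIBILITY_THRESHOLD


def accessible_count(delta_f_by_dbd):
    """Number of characterized DBDs in which the mutation is accessible."""
    return sum(1 for df in delta_f_by_dbd if is_accessible(df))


def transiently_accessible(delta_f_by_dbd):
    """True if the mutation is accessible in some but not all DBDs."""
    n = accessible_count(delta_f_by_dbd)
    return 0 < n < len(delta_f_by_dbd)


# Hypothetical delta-F values of one mutation on three DBD backgrounds.
example = [-0.1, -0.5, 0.0]
n_accessible = accessible_count(example)
is_transient = transiently_accessible(example)
```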
Fig. 6. Variation of memory half-life of mutations among and within sites.
(A) Distribution of memory half-life among sites. Each line shows the range of memory half-life of all mutations at one site in the DBD sequence. (B-C) Predicting the memory half-life of a mutation (points) by the median memory half-life of all possible mutations at the same site (B) or by the median of mutations of the same type (between the same wild type and mutant amino acid) at all sites (C). Dashed line, linear regression. (D-E) Effect of number of DBDs characterized by DMS on estimates of rate of epistatic change. The rate of epistatic change of every mutation was estimated using a subset of the 9 DMS experiments; the relationship between the estimated rate from each subset to that estimated from all 9 experiments was analyzed by linear regression. The graphs show the distribution of correlation coefficient (D) and best-fit regression slope (E) across every possible subset of a given size.
