PLoS Genet. 2019 Apr 10;15(4):e1008079.
doi: 10.1371/journal.pgen.1008079. eCollection 2019 Apr.

An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape

Victoria O Pokusaeva et al. PLoS Genet. 2019.

Abstract

Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral, while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and strongly negative effects in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interaction of multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Combinatorial approach to the study of fitness landscapes.
A fitness landscape is the representation of fitness for all possible genotypes composed of a specific set of loci. a, Following Fig 1 from Sewall Wright (ref. [6]), consider the genotype space consisting of 5 loci, each with two allele states (lower and uppercase letters). The entire genotype space is 5-dimensional, consisting of 2^5 = 32 genotypes. Given two genotypes found in extant species (abCde and ABCdE in this example), surveying combinations of extant alleles substantially reduces the dimensionality of the genotype space, concomitantly reducing the number of genotypes to assay. The surveyed area (blue cube) considers all combinations of allele substitutions that have occurred in the course of evolution between the two sequences (red line), avoiding the sampling of combinations with less relevance to the evolutionary trajectory (black lines). b, Given the entire multidimensional genotype space (black circle), our approach considers a multidimensional subspace consisting of the combinatorial set of amino acid states from extant species. The blue line represents the yeast phylogeny and the surrounding blue space represents a multidimensional set of combinations of extant amino acids of the sequence under consideration, one His3 gene segment in our study. By contrast, random mutagenesis studies consider only a local segment of the genotype space surrounding a specific genotype (green circle). c, A multiple alignment of orthologous sequences of His3 for segment 2, for which we incorporated almost all extant amino acid states from 21 yeast species (blue bars) and 10-100% of extant states from a set of 396 orthologues (grey bars). d, The predicted structure of His3p with amino acid residues that were substituted in our library.
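The reduction described in panel a can be illustrated with a short sketch. It enumerates the full space for 5 biallelic loci and then the subspace spanned by the two extant genotypes from the caption. The function names `full_space` and `combinatorial_subspace` are hypothetical and are not taken from the paper.

```python
from itertools import product

def full_space(n_loci):
    """All genotypes for n biallelic loci (lowercase/uppercase letters)."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:n_loci]
    return ["".join(g) for g in product(*[(l, l.upper()) for l in letters])]

def combinatorial_subspace(g1, g2):
    """All combinations of the allele states seen in two extant genotypes."""
    # A locus where both genotypes agree contributes one state, otherwise two.
    states = [sorted({x, y}) for x, y in zip(g1, g2)]
    return ["".join(g) for g in product(*states)]

space = full_space(5)
sub = combinatorial_subspace("abCde", "ABCdE")
print(len(space), len(sub))  # prints "32 8"
```

The two example genotypes differ at three loci, so the surveyed subspace has 2^3 = 8 genotypes, which matches the cube in the panel.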
Fig 2. Visual representations of the fitness landscape.
a, The fitness landscape for all assayed genotypes in segment 7. Nodes represent unique amino acid sequences, with edges connecting those separated by a single amino acid replacement. Colour saturation represents the minimum fitness of the two connected nodes. b, For segment 7, fitness of ancestral and extant nodes, and of genotypes one amino acid replacement away from the nodes, in the background of the S. cerevisiae gene on the yeast phylogeny (black lines) is shown in colour ranging from grey (lowest fitness) to blue (highest fitness). The abbreviations represent the following species: Scer: Saccharomyces cerevisiae, Soct: Schizosaccharomyces octosporus, Sbay: Saccharomyces bayanus, Cgla: Candida glabrata, Scas: Saccharomyces (Naumovozyma) castellii, Kwal: Kluyveromyces waltii, Klac: Kluyveromyces lactis, Sklu: Saccharomyces (Lachancea) kluyveri, Agos: Ashbya gossypii, Clus: Clavispora (Candida) lusitaniae, Dhan: Debaryomyces hansenii (Candida famata), Cgui: Candida (Pichia) guilliermondii, Ctro: Candida tropicalis, Calb: Candida albicans, Cpar: Candida parapsilosis, Lelo: Lodderomyces elongisporus, Ylip: Yarrowia lipolytica, Anid: Aspergillus nidulans, Ncra: Neurospora crassa, Sjap: Schizosaccharomyces japonicus, Spom: Schizosaccharomyces pombe.
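The graph in panel a can be sketched as follows: sequences are nodes, an edge joins two sequences that differ by exactly one amino acid, and each edge carries the minimum fitness of its two endpoints. The sequences and fitness values below are made up for illustration, and `build_graph` is a hypothetical helper, not the authors' code.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def build_graph(fitness):
    """Map each edge (pair one replacement apart) to the minimum fitness of its ends."""
    edges = {}
    for a, b in combinations(sorted(fitness), 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            edges[(a, b)] = min(fitness[a], fitness[b])
    return edges

# Hypothetical toy data: sequence -> measured fitness.
toy_fitness = {"ACS": 1.0, "ACT": 0.8, "GCT": 0.0, "GIT": 0.9}
edges = build_graph(toy_fitness)

# Degree of each node: the number of edges that touch it.
degree = {g: sum(g in e for e in edges) for g in toy_fitness}
```

Taking the minimum over the two endpoints means an edge into an unfit genotype is drawn as unfit, so fit regions of the graph stand out visually.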
Fig 3. Fitness distributions.
a, The distribution of fitness for genotypes composed of combinations of extant amino acid states (green) and non-extant amino acid states (purple) at the same positions. b, The fraction of unfit genotypes per segment among genotypes consisting entirely of extant amino acid states (green) and those incorporating non-extant amino acid states (purple). c, Amino acid states can be found in a different number of fit backgrounds. In this figure, we show the number of amino acid states that were found in the lowest number of fit genetic backgrounds. Some non-extant amino acid states were found in just a few genotypes with high fitness. The most infrequent extant amino acid state was found in at least 300 different genotypes that had high fitness. d, The percent of backgrounds in which a specific amino acid replacement is neutral (white), beneficial (dark blue) or deleterious (dark grey). The region marked in green shows amino acid replacements that never have large effects (> 0.4) on fitness. Beneficial and deleterious effects are shown only if the frequency for a given amino acid replacement was higher than the false discovery rate (S2 Supporting Information). Data from segment 9 were excluded from this figure.
Fig 4. Schematic representation of a deep learning approach able to fit any arbitrary fitness function.
a, Each genotype was encoded as a binary vector (x). During training, each of the amino acid replacements was assigned a coefficient (c_i), together forming a vector of coefficients (c). The product of these two vectors is the fitness potential of the genotype. After going through three layers, each with a sigmoid activation function, the predicted fitness is obtained. b, The fit of a mock fitness function (yellow) and the fit achieved by our neural network (black). The mock fitness function was created by generating a set of amino acid states with defined coefficients (effects on fitness potential), which were then combined to generate genotypes across a range of fitness potential values.
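The forward pass described in panel a can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the layer widths (1 to 4 to 4 to 1), the random untrained weights, and all function names are hypothetical, and no training step is shown.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fitness_potential(x, c):
    """Scalar potential: sum of coefficients of the replacements present in x."""
    return sum(xi * ci for xi, ci in zip(x, c))

def dense_sigmoid(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Untrained layer with random weights (assumption, for illustration only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def predict_fitness(x, c, layers):
    """Binary genotype -> fitness potential -> three sigmoid layers -> fitness."""
    h = [fitness_potential(x, c)]
    for weights, biases in layers:
        h = dense_sigmoid(h, weights, biases)
    return h[0]

rng = random.Random(0)
layers = [random_layer(1, 4, rng), random_layer(4, 4, rng), random_layer(4, 1, rng)]
x = [1, 0, 1, 1, 0]                # hypothetical genotype: replacements 0, 2, 3 present
c = [0.5, -1.2, 0.3, -0.1, 0.8]    # hypothetical coefficients
predicted = predict_fitness(x, c, layers)
```

Because every genotype is first collapsed to a single number, the layers after it can only learn a one-dimensional, possibly nonlinear, map from potential to fitness. The caption of Fig 5 describes widening the first layer as equivalent to allowing more than one underlying potential.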
Fig 5. Epistasis and the His3 fitness landscape for segments 2, 5 and 7.
a, Fitness as a function of a single fitness potential (black curve, the fitness of individual genotypes is orange). b, A network depiction of sign epistasis between amino acid replacements. Colour coded sites with reciprocal sign epistasis (black lines) and unidirectional interactions (grey arrows) are shown. c, Genotypes containing replacements with a higher number of sign epistatic interactions are less likely to be fit by the threshold function of the fitness potential. d, Increasing the number of neurons in the first layers of the neural network, which is equivalent to increasing the number of underlying fitness potentials, leads to more accurate models for segments with detected sign epistasis. Each dot corresponds to an independent optimization of model parameters. e, Fitness as a function of two fitness potentials (black dots, measured fitness is depicted in orange).
Fig 6. Sign epistasis.
a, Amino acid replacement C->S at site 141 in segment 2 more frequently has a positive effect on fitness in the background of T at site 143, a negative effect in the background of 143I, and is equally likely to be strongly deleterious or strongly beneficial in the background of 143V. b, Predicted change in folding free energy using Rosetta [64] following a C141S replacement in all genetic backgrounds that have an I or T at 143 and are fewer than four mutations away from S. cerevisiae. c, The fraction of genotypes in which the amino acid replacement under sign epistasis has the less frequent effect on fitness.
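The distinction between sign epistasis and reciprocal sign epistasis can be made concrete for a pair of sites with two states each. The sketch below assumes four measured fitness values and reuses the 0.4 cutoff that the Fig 3 caption gives for a large effect. The function names and the way the cutoff is applied here are assumptions for illustration, not the paper's statistical procedure.

```python
def effect_sign(delta, threshold=0.4):
    """+1 for a strongly beneficial effect, -1 for strongly deleterious, else 0."""
    if delta > threshold:
        return 1
    if delta < -threshold:
        return -1
    return 0

def classify_pair(f_ab, f_Ab, f_aB, f_AB, threshold=0.4):
    """Classify two sites from the fitness of all four allele combinations."""
    # Effect of a->A in each background of the second site, and vice versa.
    a_signs = {effect_sign(f_Ab - f_ab, threshold), effect_sign(f_AB - f_aB, threshold)}
    b_signs = {effect_sign(f_aB - f_ab, threshold), effect_sign(f_AB - f_Ab, threshold)}
    a_flips = a_signs == {1, -1}
    b_flips = b_signs == {1, -1}
    if a_flips and b_flips:
        return "reciprocal sign epistasis"
    if a_flips or b_flips:
        return "sign epistasis"
    return "no sign epistasis"

print(classify_pair(1.0, 0.0, 0.0, 1.0))  # prints "reciprocal sign epistasis"
print(classify_pair(0.0, 0.5, 1.0, 0.5))  # prints "sign epistasis"
print(classify_pair(0.0, 0.5, 0.5, 1.0))  # prints "no sign epistasis"
```

In the first call each single replacement is deleterious alone but beneficial once the other is present, which is the pattern that blocks both two-step paths between the fit genotypes.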
Fig 7. Analysis of evolutionary pathway accessibility.
a, A threshold fitness potential function can lead to some paths being inaccessible between two genotypes of high fitness (abe, ABE) if the joint contribution of several alleles to the fitness potential (abE, aBE) brings the fitness potential below the threshold. b, The fraction of unfit intermediate genotypes between two fit genotypes as a function of their average fitness potential. c, The grey area represents all genotypes in segment 7. When two fit genotypes (red dots) have high fitness potential, many paths between them will be accessible because many intermediate genotypes will also have high fitness potential and fitness (blue dots). d, Two fit genotypes in our dataset with intermediate genotypes between them. Intermediate genotypes are those that can be found between the two genotypes when making amino acid replacements from one genotype to the other. By this definition, the intermediate genotypes are located on the shortest paths connecting the two genotypes, where shortest paths are those that include only forward replacements. Accessible paths are those that incorporate only fit genotypes. The fraction of accessible shortest paths between two fit genotypes from data (orange) shows a modest decline as a function of Hamming distance between the two genotypes. The fraction of accessible shortest paths between two fit genotypes declines more rapidly when the same number of unfit genotypes as observed in real data are randomly drawn from intermediate genotypes (grey), demonstrating that in real data unfit genotypes are clustered in sequence space. Error bars are standard deviation. e, Genotypes can be represented by a graph with edges connecting genotypes if they can be connected by one amino acid replacement. We calculate the degree of connectivity (number of edges for each node) when all genotypes one replacement away are connected (blue), when only unfit (fitness = 0) genotypes are connected (orange), and when unfit genotypes are connected with unfit genotypes selected at random, keeping the same number of unfit genotypes as in our data (grey). With the exception of segment 9, in all segments the connectivity of unfit genotypes is higher than random, confirming that unfit genotypes are clustered in sequence space.
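The notion of an accessible shortest path in panel d can be sketched by brute force: enumerate every order in which the differing sites can be replaced (forward replacements only) and count the orders whose intermediates are all fit. The example reuses the genotypes named in panel a, with abE and aBE treated as unfit. The function `accessible_fraction` is hypothetical, and enumerating all orderings is only practical for small Hamming distances.

```python
from itertools import permutations

def accessible_fraction(start, end, is_fit):
    """Fraction of shortest paths from start to end that pass only through fit genotypes."""
    diff = [i for i, (s, e) in enumerate(zip(start, end)) if s != e]
    total = accessible = 0
    for order in permutations(diff):
        g = list(start)
        ok = True
        # The last replacement arrives at `end`, so only the earlier steps are intermediates.
        for i in order[:-1]:
            g[i] = end[i]
            if not is_fit("".join(g)):
                ok = False
                break
        total += 1
        accessible += ok
    return accessible / total

unfit = {"abE", "aBE"}  # unfit intermediates, as in panel a
frac = accessible_fraction("abe", "ABE", lambda g: g not in unfit)
print(frac)  # prints 0.5
```

Of the six orderings of three replacements, the three that pass through abE or aBE are blocked, leaving half of the shortest paths accessible.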


References

    1. Dean A. M. & Thornton J. W. Mechanistic approaches to the study of evolution: the functional synthesis. Nature Rev. Genet. 8, 675–688 (2007). doi: 10.1038/nrg2160
    2. Kogenaru M., de Vos M. G. J., & Tans S. J. Revealing evolutionary pathways by fitness landscape reconstruction. Crit. Rev. Biochem. Mol. Biol. 44, 169–174 (2009). doi: 10.1080/10409230903039658
    3. Mackay T. F. C. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nature Rev. Genet. 15, 22–33 (2014). doi: 10.1038/nrg3627
    4. de Visser J. A. G. M. & Krug J. Empirical fitness landscapes and the predictability of evolution. Nature Rev. Genet. 15, 480–490 (2014). doi: 10.1038/nrg3744
    5. de Visser J. A. G. M., Cooper T. F., & Elena S. F. The causes of epistasis. Proc. Biol. Sci. 278, 3617–3624 (2011). doi: 10.1098/rspb.2011.1537
