Review
, 12 (9), 1721-1742
eCollection

Recent Insights Into the Genotype-Phenotype Relationship From Massively Parallel Genetic Assays

Harry Kemble et al. Evol Appl.

Abstract

With the molecular revolution in biology, a mechanistic understanding of the genotype-phenotype relationship became possible. Recently, advances in DNA synthesis and sequencing have enabled the development of deep mutational scanning assays, capable of scoring comprehensive libraries of genotypes for fitness and a variety of phenotypes in massively parallel fashion. The resulting empirical genotype-fitness maps pave the way to predictive models, potentially accelerating our ability to anticipate the behaviour of pathogen and cancerous cell populations from sequencing data. Besides cellular fitness, phenotypes of direct application in industry (e.g. enzyme activity) and medicine (e.g. antibody binding) can be quantified and even selected directly by these assays. This review discusses the technological basis of, and recent developments in, massively parallel genetics, along with the trends it is uncovering in the genotype-phenotype relationship (distribution of mutation effects, epistasis), their possible mechanistic bases, and future directions for advancing towards the goal of predictive genetics.

Keywords: distribution of fitness effects; epistasis; fitness landscapes; genotype–phenotype maps; high‐throughput genetics; phenotypic models.

Figures

Figure 1
Mutant library types and massively parallel sequencing‐resolved assays for high‐throughput genotype–phenotype mapping. Bulk mutagenesis is used to construct an in vitro or in vivo genotype library, with the phenotype(s) of interest associated with the genotype either by “display” or encapsulation. Phenotypic measurements are then linked to genotypes by deep‐sequencing of the library before and after some selection procedure. Selection procedures include binding to a target for direct binding phenotypes, particle sorting for any optically assayable phenotype (e.g. fluorescence or cell dimensions) and simple propagation under selective conditions for competitive fitness. Filled rectangles (straight or curved): nucleic acids; filled circles: proteins. Illustrated library examples are named in bold
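The before/after sequencing step described in this caption is often summarized as a per-variant enrichment score. The sketch below is one common convention, not necessarily the one used in any study cited here: a log2 ratio of variant frequencies after versus before selection, expressed relative to wild type. The function name, the variant labels and the pseudocount of 0.5 are all assumptions made for illustration.

```python
import math


def enrichment_scores(pre_counts, post_counts, wild_type, pseudocount=0.5):
    """Return log2 enrichment of each variant relative to wild type.

    pre_counts / post_counts map a variant label to its read count before
    and after selection. The pseudocount (an assumed value) avoids taking
    the log of zero for variants that drop out after selection.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log_ratio(variant):
        f_pre = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        f_post = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(f_post / f_pre)

    wt_ratio = log_ratio(wild_type)
    return {variant: log_ratio(variant) - wt_ratio for variant in pre_counts}


# Hypothetical read counts for three variants of a small library.
pre = {"WT": 1000, "A2G": 1000, "L5P": 1000}
post = {"WT": 2000, "A2G": 2000, "L5P": 250}
scores = enrichment_scores(pre, post, wild_type="WT")
```

A score near zero means the variant behaved like wild type under selection, while a strongly negative score means it was depleted. Real pipelines typically add replicate handling and error estimates, which are omitted here.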
Figure 2
A sample of experimentally characterized distributions of mutational effects (DMEs) in various proteins, all showing at least 2 modes. (a) Distribution of fluorescence intensities resulting from single mutations across the length of a green fluorescent protein (blue). [Reprinted by permission from Springer Nature: Nature, Local fitness landscape of the green fluorescent protein (Sarkisyan et al., 2016), Copyright 2016]. (b) Distributions of yeast growth rate effects of all single point mutations in a 9‐amino‐acid region of 8 variants of a native chaperone protein (the wild‐type (black) and 7 single‐mutant variants). [Reprinted by permission of the Society for Molecular Biology and Evolution: Molecular Biology and Evolution, 32(1), 232, A systematic survey of an intragenic epistatic landscape (Bank et al., 2015)]. (c) Distribution of yeast fitness effects of all single point mutations across the length of native ubiquitin. [Reprinted from Journal of Molecular Biology, 425(8), 1366, Analyses of the effects of all ubiquitin point mutants on yeast growth rate (Roscoe et al., 2013). Copyright (2013), with permission from Elsevier]. (d) Distributions of bacterial fitness effects of all single point mutations across the length of a non‐native metabolic enzyme, on which the host strain has been made dependent for nitrogen supply, in the presence of 3 different amide substrates. [Reprinted from (Wrenbeck et al., 2017), licensed under CC BY 4.0]. (e) Distributions of yeast fitness effects of single mutations in the β‐barrel core region of 3 phylogenetically divergent orthologues of a metabolic enzyme, on which the host strain has been made dependent for tryptophan biosynthesis. [Reprinted from (Chan et al., 2017), licensed under CC BY 4.0]. (f) Distribution of “gene fitness” effects of all single codon substitutions across the length of a native bacterial antibiotic resistance gene, whose product's cellular activity is linked to fitness via a synthetic genetic circuit (grey: missense, blue: nonsense, red: synonymous). [Reprinted by permission of the Society for Molecular Biology and Evolution: Molecular Biology and Evolution, 31(6), 1583, A comprehensive, high‐resolution map of a gene's fitness landscape (Firnberg et al., 2014)]. (g) Distribution of minimum inhibitory concentrations of antibiotic observed for random single point mutations across the length of a native bacterial antibiotic resistance gene (same as f) (coloured bars; white bars: wild type). [Reprinted from Proceedings of the National Academy of Sciences, 110(32), 13068, Capturing the mutational landscape of the beta‐lactamase TEM‐1 (Jacquier et al., 2013)]. (h) Distributions of complementation assay protein–protein interaction scores resulting from single point mutations in the leucine zipper domains of 2 human transcription factor subunits (red and blue). [Reprinted from (Diss & Lehner, 2018), licensed under CC BY 4.0]
Figure 3
Illustration of the thermodynamic hypothesis for DMEs in proteins. Left panel: Black sigmoid curve shows the fraction of natively folded protein molecules as a function of the free energy of folding, ΔG, following $P_{\mathrm{nat}} = \frac{1}{1 + e^{\Delta G / k_b T}}$, where $k_b$ is the Boltzmann constant and $T$ is temperature ($k_b T$ is set here to 0.62, as in Wylie & Shakhnovich, 2011). Dashed line marks a hypothetical wild‐type protein stability (−3 kcal/mole), located on the plateau of the sigmoid, for illustration. Red curve shows a hypothetical distribution of mutant ΔG values, resulting from a DME on ΔG that is Gaussian with a mean of +1, following (Wylie & Shakhnovich, 2011), but here with a larger standard deviation of 3. The stability sigmoid could be steepened by effects such as irreversible aggregation or degradation of misfolded species (Tokuriki & Tawfik, 2009). Right panel: The resulting DME on the relative fraction of natively folded protein molecules, which is bimodal under these parameter values
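The construction in this caption can be reproduced numerically. The following is a minimal sketch using the parameter values quoted in the caption (kbT of 0.62, wild‐type ΔG of −3, Gaussian ΔΔG with mean +1 and standard deviation 3). The sample size, random seed, bin count and function names are assumptions chosen for illustration, and the units of kbT are presumed to be kcal/mol so that ΔG/kbT is dimensionless.

```python
import math
import random

KBT = 0.62       # value quoted in the caption (presumably kcal/mol)
WT_DG = -3.0     # hypothetical wild-type stability from the caption
DDG_MEAN = 1.0   # mean of the Gaussian DME on dG (caption)
DDG_SD = 3.0     # standard deviation of the Gaussian DME on dG (caption)


def fraction_native(dg, kbt=KBT):
    """Two-state fraction of natively folded molecules: 1 / (1 + exp(dG / kbT))."""
    return 1.0 / (1.0 + math.exp(dg / kbt))


def simulate_relative_native(n=20000, seed=0):
    """Sample single-mutant dG values and return folded fraction relative to wild type."""
    rng = random.Random(seed)
    wt = fraction_native(WT_DG)
    return [
        fraction_native(WT_DG + rng.gauss(DDG_MEAN, DDG_SD)) / wt
        for _ in range(n)
    ]


def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Count values in equal-width bins; values at or above hi fall in the last bin."""
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    return counts


counts = histogram(simulate_relative_native())
```

Under these assumptions the first and last bins hold far more mutants than any intermediate bin, which is the bimodality the right panel describes: a Gaussian on ΔG is pushed through a steep sigmoid, so most mutants end up either nearly fully folded or nearly fully unfolded.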
Figure 4
A sample of experimentally characterized elasticity functions, all of a saturating concave form. (a) Function: expression level—growth rate; protein: chaperone; organism: Saccharomyces cerevisiae. [Reprinted from (Jiang et al., 2013), licensed under CC BY 4.0]. (b) Function: enzymatic performance (kcat/Km)—fitness, under 2 different coenzymes; protein: oxidoreductase (amino‐acid biosynthesis); organism: Escherichia coli. [From Science, 310(5747), 501, The biochemical architecture of an ancient adaptive landscape (Lunzer et al., 2005). Reprinted with permission from AAAS]. (c) Left‐right, top‐bottom. Functions: enzymatic activity—metabolic flux (first 4), gene dose—growth rate, enzymatic efficiency (Vmax/Km)—metabolic flux, gene dose—DNA repair rate, enzymatic efficiency—metabolic flux; proteins: lyase, transferase, ligase, aminotransferase (all from same amino‐acid biosynthesis pathway), carboxylase (nucleotide biosynthesis), oxidoreductase (melanin biosynthesis), unknown gene defective in a xeroderma pigmentosum patient (nucleotide repair), oxidoreductase (ethanol oxidation); organism: Neurospora crassa (first 4), Saccharomyces cerevisiae, Mus musculus, Homo sapiens, Drosophila melanogaster. [Republished with permission of Genetics Society of America, from Genetics, 97(3–4), 642, The molecular basis of dominance (Kacser & Burns, 1981); permission conveyed through Copyright Clearance Center, Inc.]. (d) Function: expression level—growth rate; protein: oxidoreductase (cofactor biosynthesis); organism: Escherichia coli. [Reprinted from Molecular Cell, 49(1), 137, Protein quality control acts on folding intermediates to shape the effects of mutations on organismal fitness (Bershtein et al., 2013). Copyright (2013), with permission from Elsevier]. (e) Function: enzymatic activity—fitness; proteins: sugar:proton symporter and hydrolase (sugar catabolism); organism: Escherichia coli. [Republished with permission of Genetics Society of America, from Genetics, 115(1), 29, Metabolic flux and fitness (Dykhuizen et al., 1987); permission conveyed through Copyright Clearance Center, Inc.]
Figure 5
Expression–fitness functions for a diverse set of protein‐coding yeast genes. Red lines mark wild‐type expression levels. [Reprinted from Cell, 166(5), 1286, Massively parallel interrogation of the effects of gene expression levels on fitness (Keren et al., 2016). Copyright (2016), with permission from Elsevier]
Figure 6
Categories of epistasis possible for different types of mutation pairs. “A” and “B” are mutations, and superscript “+” and “−” denote that these individual mutations increase or decrease the value of the measured phenotype, P. In all cases, the white point is wild type and the orange point is the AB double mutant. Grey dashed line marks the sum of P_A and P_B, that is, the expected value for the double mutant. Epistasis measures the deviation from this expectation, which may be either negative or positive, and can be categorized as either magnitude type (the directions of mutational effects do not depend on the other mutation) or sign type. Sign epistasis can be further categorized as simple (effect of one mutation changes sign in the presence of the other) or reciprocal (effects of both mutations change sign in the presence of the other). The three examples shown are (left‐right): no epistasis between a pair of positive‐effect mutations, positive simple sign epistasis between a pair of negative‐effect mutations, and negative magnitude epistasis between a positive‐effect and negative‐effect mutation
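The categories in this caption can be expressed as a small classifier. The sketch below assumes that the expected double‐mutant value is obtained by adding the two single‐mutant effects to the wild‐type value on whatever scale P is measured; the function name, the tolerance and the example phenotype values are hypothetical.

```python
def classify_epistasis(p_wt, p_a, p_b, p_ab, tol=1e-9):
    """Return (epistasis, label) for a pair of mutations A and B.

    Assumes additive expectation: P_AB(expected) = P_wt + (P_A - P_wt) + (P_B - P_wt).
    """
    effect_a = p_a - p_wt        # effect of A in the wild-type background
    effect_b = p_b - p_wt        # effect of B in the wild-type background
    effect_a_with_b = p_ab - p_b  # effect of A in the presence of B
    effect_b_with_a = p_ab - p_a  # effect of B in the presence of A

    epistasis = p_ab - (p_wt + effect_a + effect_b)
    if abs(epistasis) <= tol:
        return epistasis, "none"

    direction = "positive" if epistasis > 0 else "negative"
    # A mutation "changes sign" when its effect has opposite signs in the two backgrounds.
    sign_changes = sum(
        1
        for alone, combined in ((effect_a, effect_a_with_b), (effect_b, effect_b_with_a))
        if alone * combined < 0
    )
    kind = ("magnitude", "simple sign", "reciprocal sign")[sign_changes]
    return epistasis, f"{direction} {kind}"


# Hypothetical values mirroring the three examples in the caption (left to right).
example_none = classify_epistasis(1.0, 1.5, 2.0, 2.5)
example_sign = classify_epistasis(1.0, 0.5, 0.75, 0.625)
example_magnitude = classify_epistasis(1.0, 1.5, 0.75, 1.0)
```

With these illustrative numbers the three calls yield the labels "none", "positive simple sign" and "negative magnitude", matching the order of the examples named in the caption. A multiplicative expectation can be handled the same way by passing log‐transformed phenotypes.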
Figure 7
Trends of epistasis predicted by a thermodynamic model of mutation effects. Black sigmoid curve shows the logarithm of a phenotype that increases proportionally with the fraction of natively folded protein molecules as a function of the free energy of folding, ΔG. A small background value (phenotype of 0.1 in the absence of any correctly folded molecules) has been applied to capture the situation for nonessential genes and/or the effect of measurement background/limits. If the phenotype truly approaches 0 in the limit of very high ΔG, the stability curve is no longer sigmoidal on this log scale, but has a concave shape, causing epistasis (see below) to become increasingly negative, in a linear fashion, as ΔG increases. In reality, however, biology or experimental limitations often result either in a background phenotype value in the absence of correctly folded protein, resulting in a log‐sigmoid as shown here, or in a threshold being applied below which all mutants are considered null and therefore not considered for epistasis analysis. The formula and parameter values are as for Figure 3, with the addition of the 0.1 phenotype background. Dashed vertical line marks a hypothetical wild‐type protein stability (−3 kcal/mole), located on the stability plateau. Blue curve shows the epistasis that would occur between pairs of mutations of identical ΔG effects, each of which individually displaces ΔG from the wild‐type value to the value indicated by the x‐axis. Dashed horizontal line marks the boundary between positive and negative epistasis (i.e. zero epistasis). A transition from negative to positive epistasis occurs as mutations become more strongly destabilizing, due to the sigmoidality of the stability curve. The shape of the epistasis curve could explain why both negative and positive epistasis are observed between mutations within proteins, as well as the existence of certain correlations between mutation effect size and epistasis (see below)
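The blue epistasis curve can be sketched from the quantities given in this caption. Several details below are assumptions rather than statements from the source: the background is added directly to the folded fraction (phenotype = 0.1 + P_nat), the two mutations are taken to be additive in ΔG, and natural logarithms are used (a different log base rescales the curve without changing its sign).

```python
import math

KBT = 0.62        # as for Figure 3
WT_DG = -3.0      # hypothetical wild-type stability (caption)
BACKGROUND = 0.1  # phenotype in the absence of folded protein (caption)


def log_phenotype(dg):
    """Log of the phenotype, assumed here to be BACKGROUND + fraction folded."""
    p_nat = 1.0 / (1.0 + math.exp(dg / KBT))
    return math.log(BACKGROUND + p_nat)


def pair_epistasis(single_mutant_dg):
    """Epistasis between two mutations with identical dG effects.

    single_mutant_dg is the dG reached by either mutation alone (the x-axis
    of the figure). The double mutant is assumed to be additive in dG.
    """
    ddg = single_mutant_dg - WT_DG
    double_mutant_dg = WT_DG + 2.0 * ddg
    return (
        log_phenotype(double_mutant_dg)
        - 2.0 * log_phenotype(single_mutant_dg)
        + log_phenotype(WT_DG)
    )


if __name__ == "__main__":
    for dg in range(-3, 10, 2):
        print(f"single-mutant dG = {dg:+d}: epistasis = {pair_epistasis(dg):+.3f}")
```

Under these assumptions, epistasis is zero at the wild‐type ΔG, negative for moderately destabilizing pairs (the double mutant falls off the plateau faster than the singles suggest) and positive for strongly destabilizing pairs (the double mutant cannot fall below the background), which is the negative‐to‐positive transition the caption describes.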
Figure 8
Two‐enzyme activity–fitness functions predicted from metabolic control analysis. E1 and E2 are the activities of two enzymes acting at adjacent steps of a linear metabolic pathway. In both plots, fitness is assumed to depend solely on the steady‐state concentration of a pathway intermediate, in a Gaussian manner (i.e. stabilizing selection is assumed to operate on the intermediate). The only difference is that, in one case, the intermediate lies downstream of the two enzymes (left), and in the other, it lies between them (right). The two landscapes have strikingly different forms, resulting in different expectations of interenzyme epistasis. Further, in both cases, trends of interenzyme epistasis will depend on the position of the wild type and the distribution of mutation effects on enzyme activities. [Republished with permission of Genetics Society of America, from Genetics, 133(1), 129–130, Do deleterious mutations act synergistically? Metabolic Control Theory provides a partial answer (Szathmary, 1993); permission conveyed through Copyright Clearance Center, Inc.]
Figure 9
Barcoded‐Tracking of Combinatorial Engineered Libraries (bTRACE), a general high‐throughput method for analysing the effect of known genome‐wide mutation combinations. A multiplex genome‐engineering method is used to construct a cell library in which each clone can contain multiple mutations throughout the genome, and this library is itself transformed with a library of plasmids carrying highly diverse DNA barcodes (triangles), such that each cell now contains a unique barcode. Single cells are then encapsulated in emulsion droplets, where they are lysed, and a targeted binary PCR assembly reaction is performed to ligate barcodes adjacent to chosen genomic regions. The emulsion is broken, and deep‐sequencing of the assembled product pool allows reconstruction of the complete phased genotype associated with each barcode. In parallel, the library can be phenotyped by one of the deep‐sequencing techniques discussed previously, with only the small DNA barcodes now requiring sequencing, allowing genome‐wide mutation combinations to be linked to an amenable trait at high throughput. Figure based on Figure 1 from Zeitoun et al. (2017)
Figure 10
A genome‐wide network of gene–gene interaction profile similarities. Nodes are yeast genes, and edges connect genes with similar genome‐wide fitness interaction profiles, revealing functional modules. [From Science, 353(6306), aaf1420‐2, A global genetic interaction network maps a wiring diagram of cellular function (Costanzo et al., 2016). Reprinted with permission from AAAS]


References

    1. Adamson B., Norman T. M., Jost M., Cho M. Y., Nuñez J. K., Chen Y., … Weissman J. S. (2016). A multiplexed single‐cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell, 167(7), 1867–1882.e21. 10.1016/j.cell.2016.11.048
    2. Agashe D., Sane M., Phalnikar K., Diwan G. D., Habibullah A., Martinez‐Gomez N. C., … Marx C. J. (2016). Large‐effect beneficial synonymous mutations mediate rapid and parallel adaptation in a bacterium. Molecular Biology and Evolution, 33(6), 1542–1553. 10.1093/molbev/msw035
    3. Araya C. L., Fowler D. M., Chen W., Muniez I., Kelly J. W., & Fields S. (2012). A fundamental protein property, thermodynamic stability, revealed solely from large‐scale measurements of protein function. Proceedings of the National Academy of Sciences, 109(42), 16858–16863. 10.1073/pnas.1209751109
    4. Avery L., & Wasserman S. (1992). Ordering gene function: The interpretation of epistasis in regulatory hierarchies. Trends in Genetics, 8(9), 312–316. 10.1016/0168-9525(92)90263-4
    5. Baba T., Ara T., Hasegawa M., Takai Y., Okumura Y., Baba M., … Mori H. (2006). Construction of Escherichia coli K‐12 in‐frame, single‐gene knockout mutants: The Keio collection. Molecular Systems Biology, 2, 2006.0008. 10.1038/msb4100050
