Nature. 2014 Sep 4;513(7516):120-3.
doi: 10.1038/nature13695.

Saturation Editing of Genomic Regions by Multiplex Homology-Directed Repair

Gregory M Findlay et al. Nature. 2014.

Abstract

Saturation mutagenesis, coupled to an appropriate biological assay, represents a fundamental means of achieving a high-resolution understanding of regulatory and protein-coding nucleic acid sequences of interest. However, mutagenized sequences introduced in trans on episomes or via random or "safe-harbour" integration fail to capture the native context of the endogenous chromosomal locus. This shortcoming markedly limits the interpretability of the resulting measurements of mutational impact. Here, we couple CRISPR/Cas9 RNA-guided cleavage with multiplex homology-directed repair using a complex library of donor templates to demonstrate saturation editing of genomic regions. In exon 18 of BRCA1, we replace a six-base-pair (bp) genomic region with all possible hexamers, or the full exon with all possible single nucleotide variants (SNVs), and measure strong effects on transcript abundance attributable to nonsense-mediated decay and exonic splicing elements. We similarly perform saturation genome editing of a well-conserved coding region of an essential gene, DBR1, and measure relative effects on growth that correlate with functional impact. Measurement of the functional consequences of large numbers of mutations with saturation genome editing will potentially facilitate high-resolution functional dissection of both cis-regulatory elements and trans-acting factors, as well as the interpretation of variants of uncertain significance observed in clinical sequencing.
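The two library designs in the abstract differ greatly in size: a six-base region has 4^6 = 4,096 possible hexamers, while an exon of length L has 3 × L possible SNVs (so the 234 substitutions analysed in Extended Data Figure 7a correspond to 78 positions). The sketch below enumerates both design spaces; the toy sequence is hypothetical and is not the BRCA1 exon 18 sequence.

```python
from itertools import product

BASES = "ACGT"


def all_hexamers():
    """Return every possible 6-mer (4**6 = 4,096 sequences)."""
    return ["".join(p) for p in product(BASES, repeat=6)]


def all_snvs(seq):
    """Return every SNV of seq as (0-based position, ref base, alt base).

    Each position has three non-reference bases, so a sequence of
    length L yields 3 * L SNVs.
    """
    return [(i, ref, alt) for i, ref in enumerate(seq) for alt in BASES if alt != ref]


def apply_snv(seq, snv):
    """Return seq carrying the given SNV, checking that the ref base matches."""
    i, ref, alt = snv
    if seq[i] != ref:
        raise ValueError(f"reference mismatch at position {i}")
    return seq[:i] + alt + seq[i + 1:]


# Hypothetical 9 bp toy sequence, not the actual BRCA1 exon 18 sequence.
toy_exon = "ATGGCCAAG"
toy_variants = [apply_snv(toy_exon, s) for s in all_snvs(toy_exon)]
```

For the toy sequence this yields 27 distinct single-substitution variants, each differing from the original at exactly one position.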

Conflict of interest statement

Sequence data used for this analysis are available in SRA under accession number SRP044126. The authors declare competing financial interests: we are in the process of filing a provisional patent application on the method.

Figures

Extended Data Figure 1. Distributions and pair-wise correlations of hexamer abundances
(a) The relative abundances of hexamers in the HDR library (red), gDNA (blue), and cDNA (green) are shown for a single experiment. The vertical black line marks our threshold of 10 gDNA reads. (b–d) Scatterplots from a single replicate show pair-wise correlations between sequencing counts for the HDR library, gDNA, and cDNA for hexamers with at least 10 observations in the gDNA library, excluding wild-type and control hexamers (n = 3,633). The HDR library and gDNA counts are most highly correlated (R 95% CI: 0.596–0.636), followed by gDNA and cDNA (R 95% CI: 0.419–0.471) and the HDR library and cDNA (R 95% CI: 0.341–0.394).
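The correlations above are reported as 95% confidence intervals on R after filtering to hexamers with at least 10 gDNA reads and excluding wild-type and control hexamers. The legend does not say how the intervals were computed; the sketch below assumes the common Fisher z-transform, and the function names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z-transform (assumed method)."""
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite at |r| = 1
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def filtered_correlation(counts_a, counts_b, gdna_counts, min_gdna=10, exclude=()):
    """Correlate two count tables over hexamers with >= min_gdna gDNA reads.

    exclude: identifiers to drop, e.g. wild-type and control hexamers.
    Returns (r, (ci_low, ci_high), n).
    """
    keys = [
        k for k, c in gdna_counts.items()
        if c >= min_gdna and k not in exclude and k in counts_a and k in counts_b
    ]
    r = pearson_r([counts_a[k] for k in keys], [counts_b[k] for k in keys])
    return r, fisher_ci(r, len(keys)), len(keys)
```

Whether to correlate raw or log-transformed counts is a separate choice that the legend does not specify; the sketch uses the counts as given.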
Extended Data Figure 2. Correlations for hexamer genome editing efficiency and enrichment scores between biological replicates
(a) gDNA counts for all hexamers with at least ten reads in each of two gDNA preparations from separate transfections with the same HDR library (n = 2,980) were moderately correlated (R 95% CI: 0.355–0.416). (b) However, hexamer editing rates, defined as gDNA counts normalized to HDR library counts, were much less correlated (R 95% CI: 0.084–0.155). This is consistent with a hexamer's abundance in the HDR library contributing more to its gDNA abundance than any systematic difference in HDR efficiency caused by the hexamer sequence itself. (c) Hexamer enrichment scores for two pools of cells from a single transfection, split on day 3 (D3), were well correlated (R 95% CI: 0.643–0.681). (d) Pooling the data from the two D3 splits of each transfection improved the correlation between biological replicates (i.e., independent transfections; R 95% CI: 0.690–0.722).
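An editing rate, as defined above, normalizes a variant's gDNA abundance to its HDR library abundance. The sketch below assumes both are first converted to within-library frequencies so that sequencing depth cancels; the read threshold mirrors the ten-read filter, and the function names are hypothetical.

```python
def frequencies(counts):
    """Convert raw read counts to within-library frequencies."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def editing_rates(gdna_counts, hdr_counts, min_gdna=10):
    """Editing rate = (frequency in gDNA) / (frequency in the HDR library).

    Only variants with at least min_gdna gDNA reads and a nonzero HDR
    library count are scored.
    """
    g = frequencies(gdna_counts)
    h = frequencies(hdr_counts)
    return {
        k: g[k] / h[k]
        for k, c in gdna_counts.items()
        if c >= min_gdna and h.get(k, 0) > 0
    }
```

Because the rate divides out library abundance, two hexamers with equal gDNA counts but a three-fold difference in library representation receive editing rates that differ three-fold.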
Extended Data Figure 3. Comparison of genome-based hexamer enrichment scores to plasmid-based hexamer scores
(a) The enrichment scores calculated here (y-axis) were modestly correlated with the scores assigned to ESE and ESS hexamers by a previous study (x-axis; Spearman ρ = 0.524). That study also interrogated hexamers positioned +5 to +10 nucleotides relative to a splice junction, but it was plasmid-based rather than genome-based and used different exons. (b) To reveal effects of GC content on hexamer abundance, histograms display the distribution of enrichment scores for each possible GC count (0–6). Hexamers containing two or fewer G or C bases exhibited broadly lower enrichment scores than hexamers containing three or more.
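The GC-content comparison in panel (b) amounts to binning hexamers by their number of G or C bases and comparing the resulting score distributions. The sketch below does this with medians; using the median as the summary statistic and the cutoff parameter are illustrative assumptions, not the published analysis.

```python
from collections import defaultdict
from statistics import median


def gc_count(seq):
    """Number of G or C bases in seq (0-6 for a hexamer)."""
    return sum(1 for b in seq.upper() if b in "GC")


def scores_by_gc(scores):
    """Group enrichment scores (dict hexamer -> score) by GC count."""
    groups = defaultdict(list)
    for hexamer, score in scores.items():
        groups[gc_count(hexamer)].append(score)
    return dict(groups)


def median_low_vs_high(scores, cutoff=2):
    """Median score for hexamers with <= cutoff G/C bases vs. > cutoff."""
    groups = scores_by_gc(scores)
    low = [s for gc, vals in groups.items() if gc <= cutoff for s in vals]
    high = [s for gc, vals in groups.items() if gc > cutoff for s in vals]
    return median(low), median(high)
```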
Extended Data Figure 4. Experimental schematic for genome editing and functional analysis of BRCA1 exon 18
Cultured cells were co-transfected with a single Cas9-sgRNA construct (CRISPR) and an HDR library. Each HDR library was generated by cloning an oligonucleotide synthesized with 3% nucleotide degeneracy (97% wild type, 1% each alternative base) across approximately half of the exon, with a selective PCR site introduced into the other (fixed) half of the exon (red). CRISPR-induced HDR integrates mutant exons into the genome. Cells were cultured for five days post-transfection and then harvested for gDNA and total RNA. After reverse transcription, selective PCR was performed prior to sequencing the edited pools of gDNA and cDNA. Each exon haplotype's enrichment score was calculated by dividing its cDNA reads by its gDNA reads, and effect sizes for each SNV were calculated by weighted linear regression.
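A 97:1:1:1 doping scheme means each position independently stays wild type with probability 0.97 and otherwise becomes one of the three alternative bases. The sketch below simulates that synthesis and computes the binomial probability of k substitutions per haplotype. Treating positions as independent is an assumption about the chemistry, and the 39-base region length used in the note is taken from the description of library R in Figure 2a.

```python
import math
import random

BASES = "ACGT"


def doped_copy(wt, rng, p_mut=0.03):
    """Simulate one synthesized copy of wt with 97:1:1:1 doping.

    With probability p_mut a position becomes one of the three
    non-wild-type bases, each equally likely (p_mut / 3 = 1% each).
    """
    out = []
    for b in wt:
        if rng.random() < p_mut:
            out.append(rng.choice([x for x in BASES if x != b]))
        else:
            out.append(b)
    return "".join(out)


def prob_k_mutations(n, k, p_mut=0.03):
    """Binomial probability that n doped bases carry exactly k substitutions."""
    return math.comb(n, k) * p_mut ** k * (1 - p_mut) ** (n - k)
```

Under these assumptions, a 39-base doped region is wild type in about 30% of molecules (0.97^39 ≈ 0.305) and carries exactly one SNV in about 37%, so a substantial fraction of haplotypes carry multiple SNVs. That is one reason a regression model is useful for separating individual SNV effects.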
Extended Data Figure 5. Positional SNV editing rates and replication of effect sizes
(a) Editing rates for each SNV in BRCA1 exon 18 were calculated by dividing each SNV's abundance in gDNA sequencing by its abundance in the HDR library. Editing rates are plotted across the exon for each library (red = L, blue = R, green = R2), with the locations of each library's selective PCR site and the CRISPR-targeted PAM illustrated below. For HDR libraries R and R2, editing rate decreased subtly with increasing distance from the Cas9 cleavage site (library R: ρ = −0.264, P = 4.1 × 10⁻³; library R2: ρ = −0.361, P = 4.8 × 10⁻⁵). For library L, whose edits did not destroy the PAM and therefore permitted re-cutting, editing peaked sharply at the Cas9 cleavage site and efficiency declined rapidly in the 5′ direction (away from the 3′ selective PCR handle). (b–c) SNV effect sizes were concordant across biological replicates for libraries R2 (b) and L (c) (library R is shown in Fig. 2a). Notably, variants with large effect sizes scored similarly across independent transfections.
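The positional trend in panel (a) is a Spearman rank correlation between each SNV's distance from the Cas9 cleavage site and its editing rate. The sketch below computes ρ with average ranks for ties; the cut-site coordinate is a hypothetical input. P values are omitted here; `scipy.stats.spearmanr` returns both the coefficient and a P value if SciPy is available.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho = Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def distance_trend(positions, rates, cut_site):
    """Correlate editing rate with absolute distance from a (hypothetical) cut site."""
    distances = [abs(p - cut_site) for p in positions]
    return spearman_rho(distances, rates)
```

A negative value indicates that editing rates fall off with distance from the cut, the pattern reported for libraries R and R2.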
Extended Data Figure 6. Biological replicate effect size reproducibility for all libraries
Three separate HDR libraries (R, R2, and L) containing 3% nucleotide degeneracy in either half of BRCA1 exon 18 were introduced to the genome via co-transfection with pCas9-sgBRCA1x18. Enrichment scores were calculated for each haplotype observed at least ten times in the gDNA, and effect sizes of SNVs were determined by weighted linear regression. Effect sizes of individual variants for libraries R2 (left), R (middle), and L (right) were well correlated between biological replicates. Dashed lines represent SNVs that introduce nonsense codons.
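The legend states only that SNV effect sizes come from weighted linear regression over haplotype enrichment scores. One plausible reading, sketched below as an assumption rather than the authors' model, is an additive model on the log2 scale in which each haplotype's score is an intercept (wild-type baseline) plus the effects of the SNVs it carries, weighted by a per-haplotype weight such as its gDNA read count. Weighted least squares is solved by scaling rows by the square root of the weights.

```python
import numpy as np


def snv_effects(haplotypes, log_scores, weights, snv_list):
    """Fit log_score ~ intercept + sum of SNV indicator effects by WLS.

    haplotypes: list of sets of SNV identifiers carried by each haplotype
    log_scores: log2 enrichment score per haplotype
    weights:    per-haplotype weights (gDNA read counts are an assumption)
    Returns (intercept, {snv: effect}).
    """
    index = {snv: j for j, snv in enumerate(snv_list)}
    X = np.zeros((len(haplotypes), len(snv_list) + 1))
    X[:, 0] = 1.0  # intercept column: wild-type baseline
    for i, hap in enumerate(haplotypes):
        for snv in hap:
            X[i, index[snv] + 1] = 1.0  # indicator: haplotype carries this SNV
    y = np.asarray(log_scores, dtype=float)
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], dict(zip(snv_list, coef[1:]))
```

Weighting lets well-sampled haplotypes dominate the fit, and the additive design allows effects to be estimated from haplotypes carrying several SNVs at once.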
Extended Data Figure 7. Correlation between effect sizes and predicted disruption of splicing motifs and indel effects
(a) MutPred Splice was used to predict the impact on splicing of all 234 single nucleotide substitutions in BRCA1 exon 18 (x-axis), and these predictions were compared to the absolute values of our empirically measured effect sizes (y-axis; ρ = 0.322). Although nonsense variants contributed to this trend, the sense variants with the largest effect sizes generally also had high MutPred Splice scores. (b) The frequency of each indel size is plotted for indels observed in gDNA from library 2 (virtually all of which occur at the Cas9 cleavage site). Indel size = 0 includes all haplotypes of wild-type length. (c) For each indel size, enrichment scores were calculated and normalized to the average enrichment score of full-length exons. As expected from nonsense-mediated decay, indels that shift the reading frame were associated with low transcript abundance.
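Panel (c) groups haplotypes by indel size and normalizes each group's mean enrichment to that of full-length exons; an indel shifts the reading frame whenever its length change is not a multiple of three. The sketch below encodes both steps. The record format and function names are hypothetical.

```python
from collections import defaultdict


def indel_class(length_change):
    """Classify a haplotype by its length change relative to wild type (bp)."""
    if length_change == 0:
        return "full length"
    return "in-frame" if length_change % 3 == 0 else "frameshift"


def normalized_scores_by_size(records):
    """records: iterable of (length_change, enrichment_score) per haplotype.

    Returns {length_change: mean score / mean score of full-length haplotypes}.
    """
    by_size = defaultdict(list)
    for size, score in records:
        by_size[size].append(score)
    baseline = sum(by_size[0]) / len(by_size[0])
    return {size: (sum(v) / len(v)) / baseline for size, v in by_size.items()}
```

Python's `%` returns a non-negative remainder for negative operands, so deletions (negative length changes) are classified correctly: a 3 bp deletion is in-frame and a 1 bp deletion is a frameshift.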
Extended Data Figure 8. Experimental schematic for saturation genome editing and multiplex functional analysis of DBR1 exon 2
Hap1 cells were co-transfected with a single Cas9-2A-EGFP-sgRNA construct (CRISPR) and an HDR library cloned from array-synthesized oligonucleotides containing programmed SNVs (orange, blue) and active site codon substitutions (green). The HDR library exon haplotypes also included two synonymous mutations (red) that disrupt the PAM and protospacer sequences to prevent Cas9 re-cutting, and a 6 bp selective PCR site (light blue) substituted into the downstream intron. Successfully transfected (EGFP+) cells were isolated by FACS on D2 and returned to culture. On D5, D8, and D11, samples of cells were taken and selective PCR was performed prior to targeted sequencing of gDNA. Each haplotype's enrichment score, a measure of its fitness in cell culture, was calculated by dividing its D8 or D11 abundance by its D5 abundance.
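The fitness-based enrichment score divides a haplotype's abundance at a later time point by its D5 abundance. The sketch below assumes abundances are within-sample frequencies, applies the D5 frequency filter of 5 × 10⁻⁵ described for Extended Data Figure 9b, and optionally rescales scores relative to a wild-type haplotype. The `"wt"` key is a hypothetical label.

```python
def frequencies(counts):
    """Convert raw read counts to within-sample frequencies."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def fitness_scores(d5_counts, later_counts, min_d5_freq=5e-5):
    """Enrichment = frequency at a later day (D8 or D11) / frequency at D5.

    Only haplotypes above min_d5_freq at D5 are scored, to limit noise
    from bottlenecking; haplotypes absent later score 0.
    """
    f5 = frequencies(d5_counts)
    fl = frequencies(later_counts)
    return {h: fl.get(h, 0.0) / f for h, f in f5.items() if f > min_d5_freq}


def relative_to_wt(scores, wt_key="wt"):
    """Divide every score by the wild-type haplotype's score (hypothetical key)."""
    return {h: s / scores[wt_key] for h, s in scores.items()}
```

On this scale, a relative score of 0.25 corresponds to the four-fold depletion used in Extended Data Figure 10 to call a mutation deleterious.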
Extended Data Figure 9. DBR1 editing rates by position and comparison of haplotype abundances between D5 and the HDR library, D8, and D11
(a) Editing rates for programmed SNVs represented above threshold in the DBR1 gDNA library (n = 216) were calculated by normalizing each SNV's gDNA abundance to its HDR library abundance. Rates are plotted by position, with the locations of the targeted PAM (orange) and selective PCR site (purple) indicated below. The editing rate did not change significantly with position (P > 0.05), consistent with positional effects being eliminated by preventing re-cutting and performing selective PCR from a distal site. (b) Scatterplots display the frequency of each haplotype in the D5 sample vs. the HDR library, D8, and D11 samples. To account for bottlenecking caused by editing a limited number of cells in this representative experiment, analysis of individual haplotypes was restricted to those present at frequencies above 5 × 10⁻⁵ in the D5 sample (n = 377; vertical line). Selection is evident from the depletion of many haplotypes in the D8 and D11 samples.
Extended Data Figure 10. Performance of computational predictions of deleterious DBR1 mutations and reproducibility between biological replicates
(a) D11 enrichment scores from a single experiment were used to empirically define deleterious mutations as those scoring four-fold below wild type (vertical line). (b) Three in silico metrics of functional impairment were tested for their ability to anticipate the deleteriousness of these mutations, as measured by the area under the receiver operating characteristic curve (AUC): BLOSUM62 (AUC = 0.672, 214 SNVs), PolyPhen-2 (AUC = 0.671, 155 non-synonymous SNVs), and CADD (AUC = 0.701, 214 SNVs). Despite their different approaches, all three algorithms showed comparably moderate predictive power. (c) In a biological replicate of the DBR1 experiment, D11 enrichment scores for amino acid substitutions were well correlated with those of the first experiment (gray lines on the scatterplot indicate the four-fold depletion threshold for deleteriousness). The distribution of amino-acid-level enrichment scores for each experiment, displayed along each axis, is bimodal. Notably, unexpected effects (i.e., nonsense mutations scoring as tolerated) were among the relatively small percentage of effects that were not consistent between replicates.
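The AUC values in panel (b) measure how well a predictor's scores separate empirically deleterious variants from tolerated ones. The sketch below computes AUC as the probability that a randomly chosen deleterious variant receives a higher predictor score than a randomly chosen tolerated one (ties count one half). It labels variants deleterious when their wild-type-relative score is at most 0.25; treating the boundary as inclusive is an assumption. Predictors must be oriented so that higher means more damaging: CADD and PolyPhen-2 already are, whereas BLOSUM62 scores conservative substitutions higher and would be passed negated.

```python
def deleterious_labels(relative_scores, fold=4.0):
    """True if a variant is depleted at least `fold`-fold relative to wild type."""
    return [s <= 1.0 / fold for s in relative_scores]


def roc_auc(predictor_scores, labels):
    """AUC = P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for s, lab in zip(predictor_scores, labels) if lab]
    neg = [s for s, lab in zip(predictor_scores, labels) if not lab]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for a few hundred variants; an AUC of 0.5 indicates no discrimination and 1.0 perfect separation.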
Figure 1. Saturation genome editing and multiplex functional analysis of a hexamer region influencing BRCA1 splicing
(a) Experimental schematic. Cultured cells were co-transfected with a single Cas9-sgRNA construct (CRISPR) and a complex homology-directed repair (HDR) library containing an edited exon that harbors a random hexamer (blue, green, orange) and a fixed selective PCR site (red). CRISPR-induced cutting stimulated homologous recombination with the HDR library, inserting mutant exons into the genomes of many cells. Five days post-transfection, cells were harvested for gDNA and RNA. After reverse transcription, selective PCR was performed, followed by sequencing of gDNA- and cDNA-derived amplicons. Hexamer enrichment scores were calculated by normalizing cDNA counts to gDNA counts. (b) Correlation of enrichment scores between biological replicates for hexamers observed in each experiment, with previously identified exonic splicing enhancers (ESEs), exonic splicing silencers (ESSs), and stop codons indicated. (c) Rank-ordered plot of enrichment scores, with ESEs, ESSs, and stop codons indicated.
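The hexamer enrichment score normalizes each hexamer's cDNA abundance (transcript level) to its gDNA abundance (edited-genome level), and panel (c) rank-orders the results. The sketch below assumes both abundances are within-sample frequencies and reuses the 10-read gDNA threshold from Extended Data Figure 1; function names are illustrative.

```python
def enrichment_scores(cdna_counts, gdna_counts, min_gdna=10):
    """Enrichment = (cDNA frequency) / (gDNA frequency) per hexamer.

    Hexamers with fewer than min_gdna gDNA reads are not scored.
    """
    c_total = sum(cdna_counts.values())
    g_total = sum(gdna_counts.values())
    scores = {}
    for hexamer, g in gdna_counts.items():
        if g < min_gdna:
            continue
        scores[hexamer] = (cdna_counts.get(hexamer, 0) / c_total) / (g / g_total)
    return scores


def rank_ordered(scores):
    """Return (hexamer, score) pairs sorted from lowest to highest score."""
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Low-ranking hexamers are candidates for reducing transcript abundance (for example, by creating a stop codon or a splicing silencer), which is how the ESE, ESS, and stop-codon annotations in panels (b) and (c) can be read.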
Figure 2. Multiplex homology-directed repair reveals effects of single nucleotide variants on transcript abundance
Three separate HDR libraries (R, R2, and L) containing a 3% mutation rate (97% wt, 1% each non-wt base) in either half of BRCA1 exon 18 were introduced to the genome via co-transfection with pCas9-sgBRCA1x18. Enrichment scores were calculated for each haplotype observed at least 10 times in the gDNA, and effect sizes of SNVs were determined by weighted linear regression modeling. ‘Sense’ includes both missense and synonymous SNVs. (a) Effect sizes calculated from replicate transfections of HDR library R, consisting of a 3% per-nucleotide mutation rate in the 3′-most 39 bases and the same selective PCR site used in Fig. 1, were highly correlated (R = 0.846). (b) Library R2 harbored a selective PCR site composed of 5 synonymous changes, none of which are present in Library R. When effect sizes derived from experiments with library R2 were plotted against those from library R, there was a strong correlation (R = 0.847), indicating reproducibility and demonstrating that differences between selective PCR sites did not strongly influence scores. (c) Effect sizes for SNVs across the exon are displayed. Datasets from libraries R and L were combined to span the entire exon. Dashed lines represent SNVs that introduce nonsense codons.
Figure 3. Saturation genome editing and multiplex functional analysis at an essential gene, DBR1, in Hap1 cells
An HDR library targeting a highly conserved region of DBR1 exon 2 was used with pCas9-EGFP-sgDbr1x2 to introduce point mutations across 75 bp and all possible codon substitutions at three residues believed to participate in the enzyme's active site. (a) Sequencing of gDNA from the HDR library and from populations of edited cells at D5, D8, and D11 reveals selection for synonymous mutations and depletion of frameshift, nonsense, and missense variants. (b) Mean D11 enrichment scores are plotted as line segments for SNVs in the 3′-most 73 bases of exon 2 and two bases of intron 2. Above the enrichment scores, in ascending order, are the wild-type nucleotide at each position, each 1 bp genome edit, the wild-type amino acid (AA), and the AA resulting from each genome edit (asterisk indicates a stop codon). Segment color indicates mutation type, faded segments indicate effects discordant between replicates, and AAs are colored according to the Lesk color scheme (small nonpolar – orange, hydrophobic – green, polar – magenta, negatively charged – red, and positively charged – blue). The first nine bases shown correspond to the active site residues. (c) D8 and D11 amino-acid-level enrichment scores were calculated for active site residues N84, H85, and E86 after excluding observations discordant between replicates (Extended Data Figure 10c). On both D8 and D11, we observe strong selective effects, with only synonymous (green boxes) and a few missense variants tolerated.
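Panel (b) colors each edit by mutation type, which for a coding SNV follows from translating the reference and mutant codons. The sketch below uses the standard genetic code (NCBI translation table 1) and assumes the input sequence starts at a codon boundary. It does not handle intronic positions or the multi-base active-site codon substitutions, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
CODON_BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(CODON_BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(cds, pos, alt):
    """Classify a coding SNV as synonymous, missense, or nonsense.

    cds: uppercase coding DNA starting at a codon boundary
    pos: 0-based position of the substitution within cds
    alt: the substituted base
    """
    codon_start = (pos // 3) * 3
    offset = pos - codon_start
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"
```

For example, changing the third base of GCC (alanine) to T gives GCT, which is also alanine, so the edit is synonymous. The same logic underlies the two synonymous PAM-disrupting mutations described in Extended Data Figure 8.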
