Nat Commun, 10 (1), 3583

Saturation Mutagenesis of Twenty Disease-Associated Regulatory Elements at Single Base-Pair Resolution

Martin Kircher et al. Nat Commun.

Abstract

The majority of common variants associated with common diseases, as well as an unknown proportion of causal mutations for rare diseases, fall in noncoding regions of the genome. Although catalogs of noncoding regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we perform saturation mutagenesis in conjunction with massively parallel reporter assays on 20 disease-associated gene promoters and enhancers, generating functional measurements for over 30,000 single nucleotide substitutions and deletions. We find that the density of putative transcription factor binding sites varies widely between regulatory elements, as does the extent to which evolutionary conservation or integrative scores predict functional effects. These data provide a powerful resource for interpreting the pathogenicity of clinically observed mutations in these disease-associated regulatory elements, and comprise a rich dataset for the further development of algorithms that aim to predict the regulatory effects of noncoding mutations.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Saturation mutagenesis MPRA of disease-associated regulatory elements. a Saturation mutagenesis MPRA. Error-prone PCR is used to generate sequence variants in a regulatory region of interest. The resulting PCR products, with ~1/100 changes compared with the template region, are integrated into a plasmid library containing random tag sequences in the 3′ UTR of a reporter gene. Associations between tags and sequence variants are learned through high-throughput sequencing. High-complexity MPRA libraries (50k–2M) are transfected as plasmids into cell lines of interest. RNA and DNA are collected, and sequence tags are used as a readout. Variant expression correlation (min. ten tags required) between full replicates of b LDLR (LDLR; LDLR.2) and c SORT1 (SORT1; SORT1.2). d Log2 variant effect of all SNVs (min. ten tags required) ordered by their RefSeq transcript position in NM_000527.4 of the hypercholesterolemia-associated LDLR promoter. The upper part shows the LDLR experiment, the lower the full replicate LDLR.2. The significance level (red/green lines) is 10⁻⁵ in both expression profiles
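To make the tag-based readout concrete, the following is a minimal sketch of how a log2 variant effect could be derived from per-tag DNA and RNA counts. This section does not describe the statistical model behind the reported effects, so the aggregate-ratio estimate used here is an assumption and not the authors' method. The function name, the record layout, and the `reference_id` label are hypothetical. Only the ten-tag minimum is taken from the caption.

```python
import math
from collections import defaultdict


def log2_variant_effects(tag_records, reference_id="ref", min_tags=10):
    """Estimate log2 expression effects relative to the reference sequence.

    tag_records: iterable of (variant_id, dna_count, rna_count), one per tag.
    Assumption: a simple aggregate RNA/DNA ratio per variant, not a fitted model.
    """
    # Per variant: [number of tags, summed DNA counts, summed RNA counts]
    totals = defaultdict(lambda: [0, 0, 0])
    for variant_id, dna_count, rna_count in tag_records:
        if dna_count <= 0:
            continue  # a tag never seen in DNA cannot be normalised
        entry = totals[variant_id]
        entry[0] += 1
        entry[1] += dna_count
        entry[2] += rna_count

    if reference_id not in totals or totals[reference_id][2] == 0:
        raise ValueError("reference needs at least one tag with RNA counts")
    _, ref_dna, ref_rna = totals[reference_id]
    ref_log_ratio = math.log2(ref_rna / ref_dna)

    effects = {}
    for variant_id, (n_tags, dna, rna) in totals.items():
        # Apply the minimum-tag filter mentioned in the caption (ten tags).
        if variant_id == reference_id or n_tags < min_tags or rna == 0:
            continue
        effects[variant_id] = math.log2(rna / dna) - ref_log_ratio
    return effects
```

With this estimate, a variant whose tags carry twice the reference RNA/DNA ratio gets an effect of 1.0, and a variant supported by only nine tags is dropped.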
Fig. 2
Saturation mutagenesis MPRA of the cancer-associated TERT promoter. a Log2 variant effect of all SNVs (min. ten tags required) ordered by their RefSeq transcript position in NM_198253.2 of TERT. The upper panel shows the TERT experiment in HEK293T cells and the lower panel the experiment in GBM (SF7996) cells, where the E2F repressor site is marked. A significance threshold of 10⁻⁵ was used (red vs. green vertical lines). b Expression profile of TERT-GBM-siScramble (gray). Ninety-five percent confidence intervals of variants from TERT-GBM-siScramble (green) and TERT-GBM-siGABPA (red) that were significantly different between the two experiments are overlaid. In addition, predicted ETS-related motifs in the reference sequence (green) and variant-induced ETS-related motifs (blue) are marked. c Position weight matrix (PWM) score change of variants that show a significant difference between the siGABPA and the scramble siRNA experiments. Motif scores are plotted as boxplots with the median as center line, upper and lower quartiles as box limits, and 1.5× interquartile range as whiskers. Variants were only used if they overlapped an ETS-related factor motif (GABPA, ETS1, ELK4, ETV1, and ETV4-6) with a score (reference or alternative sequence) larger than the 80th percentile of the best possible motif match to the PWM. TERT-GBM-siGABPA variant effects were divided by the effect measured in the siRNA scramble experiment. Three asterisks mark a significance level of 10⁻⁹ by the two-sided Wilcoxon Rank Sum test (activating n = 34, repressing n = 162)
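The PWM score change in panel c can be illustrated with a small sketch. It scores a reference and an alternative sequence against a log-odds matrix and reports the difference only when at least one of the two exceeds a relative-score cutoff of 0.8. Reading "80th percentile of the best possible motif match" as a min-max normalised relative score is an assumption, as are the toy matrix values and all function names. The matrix favours the GGAA core typical of ETS-family motifs and is not one of the JASPAR profiles used in the study.

```python
# Toy log-odds PWM, one dict per motif position (hypothetical values).
TOY_ETS_PWM = [
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": 2.0, "C": -1.0, "G": -1.0, "T": -1.0},
]


def pwm_score(pwm, seq):
    """Sum the per-position log-odds for a sequence as long as the PWM."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


def relative_score(pwm, seq):
    """Rescale the raw score to [0, 1] between worst and best possible match."""
    best = sum(max(column.values()) for column in pwm)
    worst = sum(min(column.values()) for column in pwm)
    return (pwm_score(pwm, seq) - worst) / (best - worst)


def pwm_score_change(pwm, ref_seq, alt_seq, threshold=0.8):
    """Return alt minus ref raw score, or None if neither allele passes the cutoff."""
    passes = max(relative_score(pwm, ref_seq), relative_score(pwm, alt_seq)) > threshold
    if not passes:
        return None
    return pwm_score(pwm, alt_seq) - pwm_score(pwm, ref_seq)
```

For the toy matrix, `pwm_score_change(TOY_ETS_PWM, "GGAA", "GGTA")` gives a negative change because the variant disrupts the core, while a pair such as `"CCTA"` and `"CGTA"` returns `None` because neither allele resembles the motif closely enough.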
Fig. 3
Saturation mutagenesis MPRA of a myocardial infarction-associated SORT1 enhancer. Expression effects of SNVs from experiments SORT1, SORT1.2, and SORT1.flip. Direction of SORT1 and SORT1.2 was from left to right in the experiments. In the SORT1.flip experiment, the direction was reversed (right to left in the figure). Highlighted area in red, close to the experimental promoter site in SORT1 and SORT1.2, is different between the SORT1/SORT1.2 and SORT1.flip experiments. In this region, JASPAR annotates an EBF1 motif (MA0154.3). A significance threshold of 10⁻⁵ was used (red vs. green vertical lines)
Fig. 4
Current computational tools are poor predictors of expression effects. Expression effects of a LDLR and b TERT (significance threshold 10⁻⁵; red vs. green vertical lines) compared with PhastCons conservation scores, combined scores of functional genomics data (CADD v1.4, DeepSEA, Eigen, FATHMM-MKL, and number of overlapping 10th percentile scoring JASPAR motifs), and annotated motifs by ENCODE and Ensembl Regulatory Build (ERB) v90
Fig. 5
Spearman correlation of computational scores with measured expression effects. The figure reports Spearman correlation coefficients (in percent) of the absolute expression effect for all SNVs with at least ten tags in each region with various measures agnostic to the cell type, such as conservation (mammalian PhyloP, mammalian PhastCons, and GERP++), overlapping TFBS as predicted in JASPAR 2018 (counting those in the top 10th percentile of motif scores across all elements; all motifs and additional percentiles are available in Supplementary Table 15), and computational tools that integrate large sets of functional genomics data in combined scores (CADD v1.4, DeepSEA, Eigen, FATHMM-MKL, FunSeq2, GWAVA region model, LINSIGHT, and ReMM). In addition, we compared a subset of experiments (10/21) to absolute deltaSVM scores available for specific cell types (HEK293T, HeLa S3, HepG2, K562, and LNCaP). In cases where an annotation is based on positions rather than alleles, we assumed the same value for all substitutions at each position. The column Type assigns each region as either enhancer (enh.), promoter (prom.), or ultraconserved element (UC). MYC (rs11986220) and MYC (rs6983267) are abbreviated to MYCs1 and MYCs2, respectively. Blue bars denote positive and red bars negative correlation
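The comparison described in this caption can be sketched in a few lines: keep SNVs with at least ten tags, take the absolute expression effect, and compute the Spearman rank correlation against a computational score. The function names and input layout below are hypothetical, and the implementation is a plain standard-library version (Pearson correlation of average ranks) written for illustration; it is not the authors' analysis code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_abs_effect(effects, scores, tag_counts, min_tags=10):
    """Spearman correlation between |effect| and a score for well-covered SNVs."""
    kept = [(abs(e), s) for e, s, t in zip(effects, scores, tag_counts) if t >= min_tags]
    if len(kept) < 2:
        raise ValueError("need at least two SNVs passing the tag filter")
    abs_effects, kept_scores = zip(*kept)
    return pearson(average_ranks(abs_effects), average_ranks(kept_scores))
```

Multiplying the returned coefficient by 100 gives the percent scale used in the figure. For position-based annotations, the same score would be passed for every substitution at a position, as the caption describes.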
