Genome Biol. 2017 Oct 23;18(1):194.
doi: 10.1186/s13059-017-1322-z.

Systematic identification of regulatory variants associated with cancer risk

Song Liu et al. Genome Biol.

Abstract

Background: Most cancer risk-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) are noncoding and it is challenging to assess their functional impacts. To systematically identify the SNPs that affect gene expression by modulating activities of distal regulatory elements, we adapt the self-transcribing active regulatory region sequencing (STARR-seq) strategy, a high-throughput technique to functionally quantify enhancer activities.

Results: From 10,673 SNPs linked with 996 cancer risk-associated SNPs identified in previous GWAS, we identify 575 SNPs in fragments that positively regulate gene expression and 758 SNPs in fragments with negative regulatory activities. Among them, 70 are regulatory variants for which the two alleles confer different regulatory activities. We analyze two regulatory variants in depth (breast cancer risk SNP rs11055880 and leukemia risk-associated SNP rs12142375) and demonstrate their endogenous regulatory activities on expression of the ATF7IP and PDE4B genes, respectively, using a CRISPR-Cas9 approach.

Conclusions: By identifying regulatory variants associated with cancer susceptibility and studying their molecular functions, we hope to help the interpretation of GWAS results and provide improved information for cancer risk assessment.

Keywords: CRISPR interference; Cancer susceptibility; GWAS; Regulatory variants; STARR-seq.

Conflict of interest statement

Ethics approval and consent to participate

The study protocol was approved by the Institutional review board of Institute of Basic Medical Science, Chinese Academy of Medical Sciences. Written informed consent was obtained from each of the participants and all the experimental methods were in compliance with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
The workflow to screen for regulatory SNPs associated with cancer risk. The genomic DNA from ten individuals was pooled and sonicated into fragments of ~ 500 bp. Regions containing 10,673 SNPs in LD with 996 GWAS-identified cancer risk SNPs were captured using a custom designed array. The captured fragments were inserted into a modified STARR-seq vector using Gibson assembly to generate a plasmid library, which was sequenced as the input library and then transfected into HEK293T cells. The RNAs were extracted from cells and sequenced as the output library. The regulatory activities were calculated based on the ratio of normalized fragment counts in the output library against the input library. The regulatory SNPs were detected by the changes in allelic ratios in the output library compared to those in the input library
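The two quantities named in the Fig. 1 legend, regulatory activity (output versus input fragment counts) and the change in allelic ratio, can be illustrated with a minimal Python sketch. The counts-per-million normalization, the log2 scale, the pseudocount, and every function name and number here are assumptions made for illustration; the legend does not specify the normalization the authors used.

```python
import math

def cpm(count, library_size):
    """Counts per million: one simple (assumed) depth normalization."""
    return count * 1_000_000 / library_size

def regulatory_activity(input_count, output_count, input_total, output_total,
                        pseudocount=1.0):
    """log2 ratio of normalized output (RNA) to input (plasmid) counts."""
    out_norm = cpm(output_count + pseudocount, output_total)
    in_norm = cpm(input_count + pseudocount, input_total)
    return math.log2(out_norm / in_norm)

def allelic_ratio(ref_count, alt_count):
    """Fraction of fragments carrying the reference allele."""
    return ref_count / (ref_count + alt_count)

def allelic_shift(in_ref, in_alt, out_ref, out_alt):
    """Change in reference-allele fraction from input to output library."""
    return allelic_ratio(out_ref, out_alt) - allelic_ratio(in_ref, in_alt)

# Made-up counts for one SNP-containing fragment (not data from the study).
activity = regulatory_activity(input_count=99, output_count=399,
                               input_total=1_000_000, output_total=1_000_000)
shift = allelic_shift(in_ref=50, in_alt=50, out_ref=300, out_alt=100)

print(f"log2 activity = {activity:.2f}")
print(f"allelic shift = {shift:+.2f}")
```

In this toy case the fragment is about fourfold enriched in the output library, and the reference allele rises from half of the input fragments to three quarters of the output fragments. A shift of this kind is what would flag a candidate regulatory SNP.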
Fig. 2
Regulatory regions identified in the screen and validation. a Correlation of the activities for the SNP-bound regions between two screens. The p value was calculated by Wald test, p value = 2.2 × 10^-16. b Validation of identified enhancers using dual luciferase reporter assay; r represents Pearson’s correlation coefficient. The p value was calculated by Wald test, p value = 2.56 × 10^-14. Identified positive regulatory regions (PRE) are in red, negative regulatory regions (NRE) are in blue, and inactive fragments are in grey. c, d Enrichments of epigenetic markers in the identified PREs and NREs, respectively. The p values were calculated by Fisher’s exact test; *p value < 0.05; error bars represent the confidence interval for the odds ratio
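As a rough illustration of the enrichment test named in panels c and d, the sketch below computes a sample odds ratio and a two-sided Fisher's exact p value for a 2×2 table using only the Python standard library. The table layout (regions overlapping a mark versus not, PREs versus other fragments) and the counts are hypothetical, and the confidence interval shown as error bars in the figure is not computed here.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)

# Hypothetical counts: rows = PREs / other fragments,
# columns = overlaps the mark / does not overlap.
odds_ratio, p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.4f}")
```

An odds ratio above 1 with a small p value would indicate that the mark is over-represented among the identified regions relative to the remaining fragments.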
Fig. 3
Identification and validation of regulatory SNPs. a Distribution of effect sizes and DESeq2 p values for all the SNPs that have two alleles covered. b Distribution of effect sizes of all the tested SNPs against the activities of the SNP-containing regions. The regulatory SNPs in PREs are shown in red and those in NREs in blue. c Luciferase reporter assay validation of the estimated effect sizes for 14 regulatory SNPs. r represents the Pearson correlation coefficient. d Differences in predicted TF binding scores between two alleles for different classes of SNPs
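The agreement reported in panel c is summarized by a Pearson correlation coefficient between effect sizes from the screen and from the luciferase assay. A minimal sketch of that calculation follows; the function name and the effect-size values are invented for illustration and are not the 14 SNPs from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented per-SNP effect sizes (screen vs. luciferase reporter assay).
screen_effects = [-2.0, -1.0, 0.0, 1.0, 2.0]
luciferase_effects = [-1.0, -2.0, 1.0, 0.0, 2.0]

r = pearson_r(screen_effects, luciferase_effects)
print(f"r = {r:.2f}")
```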
Fig. 4
Regulatory SNP rs11055880 is in an intergenic enhancer regulating the expression of the ATF7IP gene. a Genomic context of rs11055880 shown in the integrative genome viewer. ChIA-PET signals in MCF7 cells (the interaction between rs11055880 and ATF7IP shown by the purple boxes), ENCODE annotations of DNase hypersensitive sites, H3K4me3, and H3K27ac in MCF7 cells, and DHSs and H3K4me3 marks in HEK293 cells are shown in tracks 1–6. The regulatory activities are shown in track 7. Red dots represent SNPs in PREs and the enlarged one is rs11055880. The blue dots represent SNPs in NREs and the black dots represent other tested SNPs in this region. b Activities of two alleles of rs11055880 in our screen. Two-tailed paired t-test was used, *p value = 0.047. c Activities of two alleles of rs11055880 in the luciferase reporter assay. Two-tailed t-test, ***p value = 2.0 × 10^-4. d Expression levels of ATF7IP by qPCR in HEK293T cells expressing sgRNAs targeting the rs11055880 locus (rs11055880-sg2 and rs11055880-sg5) after KRAB-dCas9 activation. P values were calculated by t-test compared to a non-targeting (NT) group from three replicates; *p value = 0.016, ***p value = 4.0 × 10^-4. For b–d, the error bars represent standard errors
Fig. 5
rs12142375 confers acute lymphoblastic leukemia risk mechanistically through modulating PDE4B gene expression. a Genomic map of the rs12142375 locus, with tracks of DNase I hypersensitive sites, H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K9ac marks, and Pol2 ChIP-seq signals in GM12878 cells. The red dots represent the SNPs in PREs and the black dots represent other tested SNPs in this region. rs12142375 is represented as the big red dot. b Two alleles of rs12142375 conferred different activities in our screen. Two-tailed t-test was used to calculate the p value, n = 4, **p value = 0.008. c Activities of two alleles of rs12142375 in the dual-luciferase reporter assay. The p value was calculated by two-tailed t-test, n = 3, ***p value = 0.001. d PDE4B expression levels in peripheral blood mononuclear cells (normal, n = 74) and B cells of childhood acute lymphoblastic leukemia (tumor, n = 359) (data from the Haferlach Leukemia study). The p value was assessed by the Mann–Whitney U test. e Expression levels of PDE4B by qPCR in HEK293T cells expressing sgRNAs targeting the rs12142375 locus (rs12142375-sgRNA2, 24 bp upstream of the SNP, and rs12142375-sgRNA5, 11 bp downstream of the SNP) after KRAB-dCas9 activation. P values were calculated by Student’s t-test compared to the non-targeting (NT) group, n = 3, ***p value < 0.001. f eQTL results in TCGA diffuse large B-cell lymphoma dataset for the association of rs12142375 with PDE4B expression. The p value was calculated by one-tailed Student’s t-test, *p value = 0.023; ns not significant. For (b, c, e), the error bars represent standard errors
