Cancer Res. 2018 Apr 1;78(7):1579-1591.
doi: 10.1158/0008-5472.CAN-17-3486. Epub 2018 Jan 19.

Integrative Genomic Analysis Predicts Causative Cis-Regulatory Mechanisms of the Breast Cancer-Associated Genetic Variant rs4415084

Yi Zhang et al. Cancer Res. 2018.

Previous genome-wide association studies (GWAS) have identified several common genetic variants that may significantly modulate cancer susceptibility. However, the precise molecular mechanisms behind these associations remain largely unknown; it is often not clear whether discovered variants are themselves functional or merely genetically linked to other functional variants. Here, we provide an integrated method for identifying functional regulatory variants associated with cancer and their target genes by combining analyses of expression quantitative trait loci, a modified version of allele-specific expression that systematically utilizes haplotype information, transcription factor (TF)-binding preference, and epigenetic information. Application of our method to a breast cancer susceptibility region in 5p12 demonstrates that the risk allele rs4415084-T correlates with higher expression levels of the protein-coding gene mitochondrial ribosomal protein S30 (MRPS30) and lncRNA RP11-53O19.1. We propose an intergenic SNP rs4321755, in linkage disequilibrium (LD) with the GWAS SNP rs4415084 (r² = 0.988), to be the predicted functional SNP. The risk allele rs4321755-T, in phase with the GWAS rs4415084-T, created a GATA3-binding motif within an enhancer, resulting in differential GATA3 binding and chromatin accessibility, thereby promoting transcription of MRPS30 and RP11-53O19.1. MRPS30 encodes a member of the mitochondrial ribosomal proteins, implicating the risk SNP in modulating mitochondrial activities in breast cancer.
Our computational framework provides an effective means to integrate GWAS results with high-throughput genomic and epigenomic data and can be extended to facilitate rapid functional characterization of other genetic variants modulating cancer susceptibility. Significance: Unification of GWAS results with information from high-throughput genomic and epigenomic profiles provides a direct link between common genetic variants and measurable molecular perturbations. Cancer Res; 78(7); 1579-91. ©2018 AACR.

Conflict of interest statement

Conflict of interest: The authors declare no potential conflicts of interest.


Figure 1
(a) Schematic representation of the integrated analysis workflow for identifying (causative SNP, TF, target gene) triplets. For inferring target genes (left part), eQTL analysis and a modified version of allele-specific expression analysis using the TCGA data are combined. For identifying causative SNPs and corresponding TFs (right part), epigenetics information, motif analysis and TF-target expression correlation analysis are used to filter the list of candidate causative variants. ChIP-seq data, allele-specific binding events and 3D chromatin interaction data are analysed when available. SNP: single-nucleotide polymorphism; eQTL: expression quantitative trait loci; LCASE: local chromosome allele-specific expression; LD: linkage disequilibrium; DHS: DNase I hypersensitive sites; TF: transcription factor; ASB: allele-specific binding; ChIA-PET: Chromatin Interaction Analysis by Paired-End Tag Sequencing; Hi-C: High-throughput chromosome conformation capture. (b) Visual illustration of the genomic analysis pipeline. Candidate SNPs are selected among the SNPs in strong LD with a GWAS SNP (yellow block) by overlapping with DHS (top track). The entire analysis is restricted to the topologically associated domain (TAD) containing the GWAS SNP.
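The candidate-selection step described in panel (b) can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Snp` type, the `select_candidates` helper, the SNP names, the coordinates, and the r² cutoff of 0.8 are all hypothetical and are not taken from the paper, which does not state its threshold here.

```python
from typing import NamedTuple

class Snp(NamedTuple):
    rsid: str            # hypothetical identifier
    pos: int             # genomic position on the chromosome of interest
    r2_with_gwas: float  # LD (r^2) with the GWAS SNP

def overlaps(pos, intervals):
    """True if pos falls inside any half-open interval [start, end)."""
    return any(start <= pos < end for start, end in intervals)

def select_candidates(snps, dhs, tad, min_r2=0.8):
    """Keep SNPs that are in strong LD with the GWAS SNP, lie inside the
    TAD containing the GWAS SNP, and overlap a DNase I hypersensitive site.
    min_r2=0.8 is an assumed cutoff, not the paper's."""
    tad_start, tad_end = tad
    return [
        s.rsid
        for s in snps
        if s.r2_with_gwas >= min_r2
        and tad_start <= s.pos < tad_end
        and overlaps(s.pos, dhs)
    ]

# Toy data (made-up coordinates).
snps = [
    Snp("snp_a", 1500, 0.99),  # strong LD, but not in a DHS
    Snp("snp_b", 2100, 0.95),  # strong LD, in a DHS, inside the TAD
    Snp("snp_c", 2200, 0.40),  # in a DHS, but weak LD
    Snp("snp_d", 9100, 0.90),  # in a DHS, but outside the TAD
]
dhs = [(2000, 2300), (9000, 9200)]
tad = (1000, 5000)
candidates = select_candidates(snps, dhs, tad)
print(candidates)
```

In the workflow of panel (a), the surviving candidates would then be passed to motif analysis and TF-target expression correlation.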
Figure 2
Linkage structure and epigenetic annotation in the 5p12 region. Top triangle shows the linkage (color-coded by r² value) among 5p12 SNPs ordered according to their genomic locations. Middle track shows genes annotated by GENCODE v19. In the lower tracks, three GWAS SNPs in the 5p12 region are shown, followed by ChromHMM enhancer annotations in the breast cancer cell line MCF-7 and human mammary epithelial cells (HMEC). DNase I hypersensitive sites in T-47D and MCF-7 are also shown to represent open chromatin regions.
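For readers unfamiliar with the r² statistic used to color the linkage triangle, the standard definition is r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. A minimal sketch of that computation from phased haplotypes follows; the `ld_r2` function and the toy haplotype counts are illustrative assumptions, not data from the paper.

```python
def ld_r2(haplotypes):
    """Compute LD r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    with alleles coded 0 (reference) or 1 (alternative).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    return d * d / denom

# Toy example: 100 chromosomes, one recombinant haplotype.
toy = [(1, 1)] * 40 + [(0, 0)] * 59 + [(1, 0)] * 1
r2 = ld_r2(toy)
print(round(r2, 3))
```

A value near 1, such as the r² = 0.988 reported between rs4321755 and rs4415084, means the two SNPs are almost always inherited together, which is why association alone cannot tell them apart.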
Figure 3
The risk allele of the GWAS SNP rs4415084 correlates with elevated MRPS30/RP11-53O19.1 expression. (a) Violin plots of MRPS30 and RP11-53O19.1 expression levels grouped by imputed genotype at rs4415084, using the TCGA ER+ breast cancer patient data. The p-values are for the multivariate linear regression coefficients of genotype. See Supplementary Table 4 for a full list of eQTL genes and GWAS SNPs in 5p12. (b) A schematic representation of local chromosome allele-specific expression (LCASE) analysis. For a given exonic SNP of interest, we obtain all patients who have heterozygous genotypes both at the GWAS SNP and at the exonic SNP. Haplotype phasing is performed for the chromosome segment covering the GWAS SNP, the exonic SNP and all intermediate SNPs (Methods). The reference and alternative alleles of a biallelic SNP are denoted as 0 and 1, respectively. In this figure, patient 1 and patient 2 have the 1 allele of the exonic SNP phased with the GWAS risk allele, whereas patient K has the 0 allele. RNA-seq read coverage is then counted in each patient to measure differential transcription activity between the risk chromosome (red) and the protective chromosome (blue). (c) LCASE analysis of exonic SNPs in the protein-coding gene MRPS30. The proportions of reads containing the protective alleles are plotted with confidence intervals. Four of the six patient samples show significantly fewer reads emanating from the chromosome harboring the protective allele of rs4415084 (one-sided binomial test; p = 1.3 × 10⁻⁴, p = 9.7 × 10⁻¹⁷ for patient 1 and patient 2 at rs61754779, respectively; p = 6.7 × 10⁻⁴⁷ for patient 4 at rs34522103; p = 1.2 × 10⁻³ for patient 5 at rs79210252), while patient 3 and patient 6 have non-significant p-values. (d) The genomic locations of LCASE SNPs in the protein-coding MRPS30 and MRPS30 3’ non-coding transcript. The p-values are from the Wilcoxon signed-rank test, with red indicating transcription preference towards the risk chromosome.
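The per-patient one-sided binomial test in panel (c) can be illustrated with a short sketch. It assumes a null hypothesis of equal expression from both chromosomes (success probability 0.5), which the caption does not state explicitly; the `protective_allele_pvalue` function and the read counts are made up for illustration.

```python
from math import comb

def protective_allele_pvalue(protective_reads, total_reads):
    """One-sided exact binomial p-value, P(X <= k) for X ~ Binomial(n, 0.5).

    Tests whether reads carrying the protective-chromosome allele are
    fewer than expected under balanced allelic expression (assumed null).
    Integer arithmetic keeps the tail sum exact before the final division.
    """
    k, n = protective_reads, total_reads
    if not 0 <= k <= n:
        raise ValueError("need 0 <= protective_reads <= total_reads")
    return sum(comb(n, i) for i in range(k + 1)) / 2**n

# Toy example: 2 of 10 reads come from the protective chromosome.
p_small = protective_allele_pvalue(2, 10)
print(p_small)

# A deeper, hypothetical sample: 30 of 100 reads.
p_deep = protective_allele_pvalue(30, 100)
print(p_deep)
```

With the same allelic imbalance, deeper coverage yields a much smaller p-value, which is one reason the reported p-values differ so widely between patients.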
Figure 4
The predicted causal SNP rs4321755 in LD with the GWAS SNP rs4415084 may regulate GATA3 binding. (a) Subsequence containing the risk allele T of rs4321755 matches the GATA3 motif, while the protective allele C disrupts the motif. The risk and protective alleles are determined by phasing with the alleles of GWAS SNP rs4415084 (r² = 0.988). (b) GATA3 expression positively correlates with predicted target gene expression. The correlation structure depends on the rs4321755 genotype status; i.e., as the number of risk alleles increases, the correlation also increases. (c) ChIP-seq and DNase-seq data in T-47D show that rs4321755 is at the center of GATA3, FOXA1, and DNase I peaks (two replicate experiments of DNase-seq are shown: ENCODE accessions ENCFF001EGW and ENCFF001EHA). Shown for each experiment are the read coverage and raw aligned reads (positive strand: yellow; negative strand: cyan). In the read coverage figure, the range of y-axis values is indicated on top right, and the coverage of the putative causative SNP is color-coded based on the risk (red) and protective (blue) allele counts. (d) Zoomed-in view of ENCODE TF binding and PhyloP conservation track near rs4321755. (e) GATA3 ChIP-seq, PGR ChIP-seq and DNase-seq data show a significant skew towards the rs4321755-T risk allele. Replicates are pooled together and reads are deduplicated; the p-values are calculated by one-sided binomial test.
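The idea in panel (a), that one allele completes a GATA motif while the other breaks it, can be sketched with a consensus-string match. This is a simplification: it uses the generic GATA-family consensus WGATAR as a regular expression rather than a position weight matrix, and the flanking sequence below is hypothetical, not the real genomic context of rs4321755.

```python
import re

# Generic GATA-family consensus WGATAR (W = A/T, R = A/G).
GATA_CONSENSUS = re.compile(r"[AT]GATA[AG]")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def has_gata_site(seq):
    """True if the consensus occurs on either strand of seq."""
    seq = seq.upper()
    return bool(
        GATA_CONSENSUS.search(seq)
        or GATA_CONSENSUS.search(reverse_complement(seq))
    )

def allele_sequence(left_flank, allele, right_flank):
    return left_flank + allele + right_flank

# Hypothetical flanks chosen so that the T allele completes "AGATAA".
LEFT, RIGHT = "CCTGCAGA", "AAGGTCA"
risk_seq = allele_sequence(LEFT, "T", RIGHT)        # ...AGATAAGG...
protective_seq = allele_sequence(LEFT, "C", RIGHT)  # ...AGACAAGG...

print(risk_seq, has_gata_site(risk_seq))
print(protective_seq, has_gata_site(protective_seq))
```

A real analysis would score both alleles against a GATA3 position weight matrix and compare the scores, since a strict consensus match ignores partial affinity.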
Figure 5
An illustration of the regulation model for MRPS30/RP11-53O19.1. The top chromosome carrying the protective allele C of the causal SNP rs4321755 has a disrupted GATA3 binding motif, thereby weakening the association between MRPS30/RP11-53O19.1 divergent promoter and the enhancer harboring the SNP. By contrast, the bottom chromosome carrying the risk allele rs4321755-T acquires a strong GATA3 motif, resulting in stronger binding of GATA3 and recruitment of other cofactors like FOXA1 and PGR, which together make this enhancer more active in regulating its target genes MRPS30 and RP11-53O19.1 via chromatin looping.
