Identification of breast cancer associated variants that modulate transcription factor binding

PLoS Genet. 2017 Sep 28;13(9):e1006761. doi: 10.1371/journal.pgen.1006761. eCollection 2017 Sep.


Genome-wide association studies (GWAS) have discovered thousands of loci associated with disease risk and quantitative traits, yet most of the variants responsible for risk remain uncharacterized. The majority of GWAS-identified loci are enriched for non-coding single-nucleotide polymorphisms (SNPs), and defining the molecular mechanism of risk is challenging. Many non-coding causal SNPs are hypothesized to alter transcription factor (TF) binding sites as the mechanism by which they affect organismal phenotypes. We employed an integrative genomics approach to identify candidate TF binding motifs that confer breast cancer-specific phenotypes identified by GWAS. We performed de novo motif analysis of regulatory elements, analyzed evolutionary conservation of identified motifs, and assayed TF footprinting data to identify sequence elements that recruit TFs and maintain the chromatin landscape in breast cancer-relevant tissue and cell lines. We identified candidate causal SNPs that are predicted to alter TF binding within breast cancer-relevant regulatory regions and that are in strong linkage disequilibrium with significantly associated GWAS SNPs. We confirmed that the TFs bind with predicted allele-specific preferences using CTCF ChIP-seq data. We used The Cancer Genome Atlas breast cancer patient data to identify ANKLE1 and ZNF404 as the target genes of candidate TF binding site SNPs in the 19p13.11 and 19q13.31 GWAS-identified loci. These SNPs are associated with the expression of ZNF404 and ANKLE1 in breast tissue. This integrative analysis pipeline is a general framework to identify candidate causal variants within regulatory regions and TF binding sites that confer phenotypic variation and disease risk.
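
To make concrete the idea of a SNP that is "predicted to alter TF binding", the following minimal sketch scores the reference and alternate alleles of a variant against a position weight matrix (PWM) and reports the difference. It is an illustration under stated assumptions, not the pipeline used in the study: the count matrix, the flanking sequence, the pseudocount, and the uniform background are all invented for the example, and the function names are hypothetical.

```python
import math

# Hypothetical count matrix for a 4-bp motif with consensus ACGT.
# The counts are invented for illustration; they are not a real TF motif.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency, which is a simplification.
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def window_score(pwm, window):
    """Sum the log-odds of each base in a window the same width as the motif."""
    if len(window) != len(pwm):
        raise ValueError("window length must equal motif width")
    return sum(pwm[i][base] for i, base in enumerate(window))


def best_score(pwm, sequence):
    """Return the highest-scoring motif-width window in the sequence."""
    width = len(pwm)
    return max(
        window_score(pwm, sequence[i:i + width])
        for i in range(len(sequence) - width + 1)
    )


def allele_delta(pwm, flank5, ref, alt, flank3):
    """Best motif score with the reference allele minus that with the alternate.

    Flanks should be at most (motif width - 1) bases long so that every
    scored window overlaps the variant position.
    """
    return best_score(pwm, flank5 + ref + flank3) - best_score(pwm, flank5 + alt + flank3)


pwm = build_pwm(COUNTS)
# Hypothetical C>T variant at the second motif position.
delta = allele_delta(pwm, flank5="TTA", ref="C", alt="T", flank3="GTA")
print(f"log-odds change (ref - alt): {delta:.2f}")
```

A positive difference means the reference allele matches the motif better than the alternate allele. In practice such a score would be only one line of evidence, to be combined with conservation, footprinting, and linkage disequilibrium with the GWAS lead SNP, as the abstract describes.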

MeSH terms

  • Alleles
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / pathology
  • Chromatin / genetics
  • DNA-Binding Proteins / genetics*
  • Endonucleases / genetics*
  • Female
  • Gene Expression Regulation, Neoplastic
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study*
  • Humans
  • Nucleotide Motifs / genetics
  • Polymorphism, Single Nucleotide
  • Protein Binding
  • Quantitative Trait Loci / genetics
  • Regulatory Sequences, Nucleic Acid
  • Transcription Factors / genetics


Substances

  • Chromatin
  • DNA-Binding Proteins
  • Transcription Factors
  • ZNF404 protein, human
  • ANKLE1 protein, human
  • Endonucleases