Nucleic Acids Res. 2016 Dec 1;44(21):10106-10116.
doi: 10.1093/nar/gkw691. Epub 2016 Aug 4.

Evaluating the Impact of Single Nucleotide Variants on Transcription Factor Binding

Wenqiang Shi et al. Nucleic Acids Res.

Abstract

Diseases and phenotypes caused by disrupted transcription factor (TF) binding are being identified, but progress is hampered by our limited capacity to predict such functional alterations. Improving predictions may depend on expanding the set of bona fide TF binding alterations. Allele-specific binding (ASB) events, in which a TF preferentially binds one of the two alleles at a heterozygous site, reveal how sequence variation alters TF binding. Here, we present what is, to our knowledge, the largest ASB compilation: 10 765 ASB events retrieved from 45 ENCODE ChIP-Seq data sets. Our analysis showed that ASB events were frequently associated with motif alterations of the ChIP'ed TF and of potential partner TFs, with allelic differences in DNase I hypersensitivity and with allelic differences in histone modifications. For TF dimers bound symmetrically to DNA, ASB data revealed that the central positions of the TF binding motifs were disproportionately important for binding. Lastly, the impact of variation on TF binding was predicted by a classification model incorporating all the investigated features of ASB events. Classification models using only DNase I hypersensitivity and sequence data achieved predictive accuracy approaching that of models with substantially more features. Taken together, the combination of ASB data and the classification model represents an important step toward elucidating regulatory variants across the human genome.
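The abstract defines an ASB event as preferential binding of a TF to one allele at a heterozygous site, but it does not say how such events are called. One common way to flag allelic imbalance, assumed here and not necessarily the authors' procedure, is a two-sided exact binomial test of reference versus alternative ChIP-Seq read counts against an expected reference fraction of 0.5. The helper name `binomial_asb_pvalue` is hypothetical, and real pipelines typically also account for reference mapping bias and multiple testing.

```python
from math import comb


def binomial_asb_pvalue(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Two-sided exact binomial P-value for allelic imbalance at one heterozygous site.

    Hypothetical helper (not from the paper): sums the probabilities of every
    reference-read count that is no more likely than the observed one under
    Binomial(n, p), where n is the total read coverage at the site.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no coverage, no evidence of imbalance
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance so floating-point ties count as "at least as extreme".
    pval = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, pval)


# Hypothetical site: 30 reads on the reference allele, 5 on the alternative allele.
example_p = binomial_asb_pvalue(30, 5)
```

A balanced site such as `binomial_asb_pvalue(10, 10)` returns a P-value of 1, whereas strongly skewed coverage yields a small P-value that would then be compared against a multiple-testing-corrected threshold.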

Figures

Figure 1.
Transcription factor binding site (TFBS) motif score analysis at heterozygous site binding events. In each panel, we plotted the motif score at predicted TFBSs for the favored allele at each heterozygous site (the allele harboring more mapped chromatin immunoprecipitation followed by sequencing (ChIP-Seq) reads; x-axis) against that of the unfavored allele (y-axis). Allele-specific binding (ASB) events (left panel) and non-ASB events (right panel) were plotted separately. The black diagonal lines indicated identical motif scores on the two alleles. Note that the figure was generated using all heterozygous site binding events for all compiled TFs in GM12878 and HeLa-S3.
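The caption compares motif scores on the favored and unfavored alleles but does not specify the scoring scheme. As an illustration only, the sketch below assumes a log2-odds position weight matrix (PWM) with a uniform background and a pseudocount, scans both strands, and scores the local sequence carrying each allele. The toy "TGA" motif, the flanks and all function names are hypothetical, and sequences are assumed to contain only uppercase A, C, G and T.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=0.25, background=0.25):
    """Turn a position count matrix (one {base: count} dict per position) into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_motif_score(seq, pwm):
    """Highest log-odds score over every window of seq on both strands."""
    width = len(pwm)
    best = -math.inf
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):  # forward, reverse complement
        for start in range(len(strand) - width + 1):
            score = sum(pwm[i][strand[start + i]] for i in range(width))
            best = max(best, score)
    return best


def allele_motif_scores(left_flank, favored, unfavored, right_flank, pwm):
    """Motif scores of the local sequence carrying the favored vs. the unfavored allele."""
    return (best_motif_score(left_flank + favored + right_flank, pwm),
            best_motif_score(left_flank + unfavored + right_flank, pwm))


# Toy 3-bp motif "TGA" (hypothetical counts) and a G/A SNV at its middle position.
motif_counts = [{"T": 10}, {"G": 10}, {"A": 10}]
pwm = log_odds_matrix(motif_counts)
fav_score, unfav_score = allele_motif_scores("CT", "G", "A", "AC", pwm)
```

Here the favored allele completes the motif ("CTGAC") while the unfavored allele breaks it ("CTAAC"), so the point for this site would fall below the diagonal in a plot like the left panel.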
Figure 2.
Information content and positional impact of each position within TFBSs. (A) Correlation between positional impact and information content. Each point corresponded to a position (given in parentheses) within the TFBSs of a ChIP'ed TF and was plotted by its information content in the TF motif (x-axis) and its positional impact (y-axis). The trend line was drawn by locally weighted scatterplot smoothing (LOWESS). (B) An exceptional example: the CEBPB motif, shown with its positional impact distribution (upper), TF binding motif logo (middle) and TF–DNA interface (lower; Protein Data Bank ID: 2e42).
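Panel A places each motif position by its information content. The caption does not give the formula, so the sketch assumes the standard definition for DNA motifs against a uniform background, IC = 2 + Σ_b p_b log2 p_b (2 bits being log2 of the four bases). Positional impact is derived from the ASB data and is not defined in this excerpt, so it is not reproduced; the example motif counts are hypothetical.

```python
import math


def information_content(column_counts, pseudocount=0.0):
    """Information content (bits) of one motif position against a uniform background.

    IC = 2 + sum_b p_b * log2(p_b) over b in {A, C, G, T}, taking 0 * log2(0) as 0.
    """
    total = sum(column_counts.get(b, 0) for b in "ACGT") + 4 * pseudocount
    ic = 2.0
    for base in "ACGT":
        p = (column_counts.get(base, 0) + pseudocount) / total
        if p > 0:
            ic += p * math.log2(p)
    return ic


def motif_information_profile(counts):
    """Per-position information content for a motif given as a list of count dicts."""
    return [information_content(column) for column in counts]


# Hypothetical 4-position motif: fully conserved, two-base, uniform, mostly-T.
profile = motif_information_profile([
    {"A": 10},
    {"A": 5, "G": 5},
    {"A": 1, "C": 1, "G": 1, "T": 1},
    {"T": 7, "C": 1, "G": 1, "A": 1},
])
```

A fully conserved position carries 2 bits, a position split evenly between two bases carries 1 bit, and a uniform position carries 0 bits; these are the x-axis values that would be paired with each position's positional impact.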
Figure 3.
Comotif alteration correlated with TF allelic imbalance. Each panel was named by the ChIP'ed TF, followed by the comotif name and the cell line in parentheses. Each dot represented one heterozygous site binding event (red for ASB and blue for non-ASB events) found within predicted TFBSs of the comotif. The comotif alteration (x-axis) was the log ratio of motif P-values between the reference and alternative alleles. The allelic binding imbalance (y-axis) was the fraction of all reads covering that position that mapped to the reference allele. We tested the correlation between the two properties for each ChIP'ed TF and each of its enriched HOMER motifs, and only significantly correlated pairs were plotted (FDR < 0.05).
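The two axes of each panel can be computed per heterozygous site as sketched below. The caption says only "log ratio of motif P-values between the reference and alternative alleles", so the log base (10) and the direction (reference over alternative) are assumptions, as are the function names and the example values.

```python
import math


def comotif_alteration(p_ref, p_alt):
    """x-axis: log10 of the comotif P-value ratio, reference allele over alternative allele.

    Assumed convention: positive values mean the reference allele matches the comotif
    worse (larger P-value) than the alternative allele.
    """
    if p_ref <= 0 or p_alt <= 0:
        raise ValueError("motif P-values must be positive")
    return math.log10(p_ref / p_alt)


def reference_fraction(ref_reads, alt_reads):
    """y-axis: fraction of the reads covering the site that map to the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


# Hypothetical site: comotif much stronger on the alternative allele, most reads on alt.
x = comotif_alteration(1e-3, 1e-5)
y = reference_fraction(4, 16)
```

Under this sign convention, a comotif that promotes binding of the ChIP'ed TF would be expected to give a negative correlation between x and y across sites, since a weaker comotif match on the reference allele should coincide with fewer reference reads.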
Figure 4.
Allelic coordination between heterozygous site binding events for multiple TFs and chromatin properties in HeLa-S3. The heatmap represented the -log(P-value) of the Pearson correlation between the allelic imbalance of TF ChIP-Seq reads at heterozygous site binding events and chromatin properties (DNase I hypersensitivity (DHS) and histone modifications).
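Each heatmap cell is a -log(P-value) for a Pearson correlation across heterozygous sites. The caption gives neither the log base nor how the P-value was computed, so this sketch assumes log10 and, to stay within the Python standard library, approximates the two-sided P-value with the Fisher z-transform and a normal distribution; an exact t-based test (for example `scipy.stats.pearsonr`) gives somewhat different values for small samples. The paired reference-read fractions below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def neg_log10_p_fisher(r, n):
    """Approximate two-sided -log10 P for H0: rho = 0 via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("the Fisher approximation needs n > 3")
    r = max(min(r, 1 - 1e-15), -1 + 1e-15)  # keep atanh finite at |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return -math.log10(p) if p > 0 else math.inf


# Hypothetical reference-allele fractions for TF reads and DHS reads at the same sites.
tf_ref_fraction = [0.2, 0.35, 0.5, 0.6, 0.8, 0.9]
dhs_ref_fraction = [0.25, 0.3, 0.55, 0.5, 0.75, 0.95]
r = pearson_r(tf_ref_fraction, dhs_ref_fraction)
score = neg_log10_p_fisher(r, len(tf_ref_fraction))
```

Repeating this for every TF and chromatin-property pair, then arranging the scores in a matrix, would yield a heatmap of the kind described.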
Figure 5.
Performance of ASB classification models and key features. (A) AUPRC (area under the precision–recall curve) of the deltaSVM, Seq, Seq+DHS and Full models across all the investigated TF ChIP-Seq experiments. The Seq model was based only on sequence-related features; the Seq+DHS model added DHS data on top of the Seq model; and the Full model further added histone modifications and cobound TFs. Details on each model and its features can be found in Materials and Methods. (B) Most frequent key features in the Full models for all 27 TFs with known motifs. The suffixes ‘favor’ and ‘unfavor’ referred to the favored and unfavored alleles at heterozygous sites. The ‘motif_pvalue_ratio’ was the log ratio of the motif score P-values between the two alleles. The ‘peak_dis’ indicated the distance from the SNV to the ChIP-Seq peak maximum, that is, the position within the peak where the highest number of reads was mapped.
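Panel A compares the models by AUPRC. The caption does not state how the area was computed, so the sketch below uses average precision, one common step-wise estimator of AUPRC; labels are assumed to be 1 for ASB and 0 for non-ASB events, and the scores stand in for any classifier's output. It does not reproduce deltaSVM or the paper's models, and the function name is hypothetical.

```python
def average_precision(labels, scores):
    """Average precision (a step-wise AUPRC estimate) for binary labels (1 = ASB).

    Events are ranked by decreasing score; precision is recorded at the rank of each
    positive and averaged over all positives. Tied scores keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive (ASB) example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            total += true_positives / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical classifier output for five heterozygous sites (1 = ASB, 0 = non-ASB).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
ap = average_precision(labels, scores)  # precision 1/1 at rank 1, 2/3 at rank 3, so AP = 5/6
```

Unlike ROC AUC, the AUPRC of a random classifier is roughly the fraction of positives, which makes it a stricter yardstick when ASB events are rare among heterozygous site binding events.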
