Genome Biol. 2017 Feb 24;18(1):39.
doi: 10.1186/s13059-017-1165-7.

BaalChIP: Bayesian Analysis of Allele-Specific Transcription Factor Binding in Cancer Genomes

Free PMC article

Ines de Santiago et al. Genome Biol.

Abstract

Allele-specific measurements of transcription factor binding from ChIP-seq data are key to dissecting the allelic effects of non-coding variants and their contribution to phenotypic diversity. However, most methods of detecting an allelic imbalance assume diploid genomes. This assumption severely limits their applicability to cancer samples with frequent DNA copy-number changes. Here we present a Bayesian statistical approach called BaalChIP to correct for the effect of background allele frequency on the observed ChIP-seq read counts. BaalChIP allows the joint analysis of multiple ChIP-seq samples across a single variant and outperforms competing approaches in simulations. Using 548 ENCODE ChIP-seq and six targeted FAIRE-seq samples, we show that BaalChIP effectively corrects allele-specific analysis for copy-number variation and increases the power to detect putative cis-acting regulatory variants in cancer genomes.
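The motivation above can be illustrated with a small sketch. It is not the BaalChIP model; it only shows why a test that assumes a diploid genome (expected reference fraction 0.5) can flag a site as imbalanced when the read counts simply follow the background allele frequency. The read counts, the background frequency of 0.7, and the helper name `binom_two_sided_p` are all hypothetical.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Hypothetical site: 70 of 100 ChIP-seq reads carry the reference allele.
ref_reads, total_reads = 70, 100

# Null hypothesis of a diploid genome: expected reference fraction 0.5.
p_diploid = binom_two_sided_p(ref_reads, total_reads, 0.5)

# Null hypothesis using an assumed background reference-allele frequency
# of 0.7 (for example, a copy-number gain of the reference allele).
p_background = binom_two_sided_p(ref_reads, total_reads, 0.7)

print(f"p-value against 0.5: {p_diploid:.2e}")
print(f"p-value against 0.7: {p_background:.2f}")
```

Against 0.5 the site looks strongly imbalanced, while against the assumed background frequency there is no evidence of allele-specific binding. This is the kind of copy-number artifact the abstract describes.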

Keywords: Allele frequency; Allele-specific binding; Bayesian statistics; Cancer; ChIP-sequencing; Copy-number change; FAIRE-sequencing.

Figures

Fig. 1
Description of the BaalChIP model. a The basic inputs for BaalChIP are the ChIP-seq raw read counts in a standard BAM alignment format, a BED file with the genomic regions of interest (such as ChIP-seq peaks), and a set of heterozygous SNPs in a tab-delimited text file. Optionally, genomic DNA BAM files can be specified for RAF computation. Alternatively, the user can specify the pre-computed RAF scores for each variant. b The first module of BaalChIP consists of (1) computing allelic read counts for each heterozygous SNP in peak regions and (2) a round of filters to exclude heterozygous SNPs that are susceptible to generating artifactual ASB effects. (3) The reference mapping (RM) bias and the reference-allele frequency (RAF) are computed internally and the output consists of a data matrix where RM and RAF scores are included alongside information about allele counts for each heterozygous SNP. The column Peak contains binary data to indicate the called peaks. c The second module of BaalChIP consists of calling ASB events. (4) BaalChIP uses a beta-binomial Bayesian model to account for RM and RAF bias when detecting ASB events. d The output from BaalChIP is a posterior distribution for each SNP. A threshold to identify SNPs with allelic bias is specified by the user (default value is a 95% interval). (5) The output of BaalChIP is a credible interval (lower and upper) calculated based on the posterior distribution. This interval corresponds to the true AR in read counts (i.e., after correcting for RM and RAF biases). An ASB event is called if the lower and upper limits of the interval both lie outside the 0.4–0.6 range. Alt alternative, AR allelic ratio, ASB allelic-specific binding, gDNA genomic DNA, Het. heterozygous, MAPQ mapping quality, NA not applicable, RAF reference-allele frequency, Ref reference, Rep repeat, RM reference mapping, SNP single-nucleotide polymorphism, TF transcription factor
Fig. 2
ROC curve comparison between BaalChIP and other allele-specific SNP-finding methods (the binomial test and iASeq) using a simulated data set. The BaalChIP result is shown by a solid red line. The binomial test and iASeq are shown in dashed blue and black lines. The number of TFs able to bind at a given SNP and the number of reads per TF increase from TF = 3, Reads per TF = 1 (a, b, c) to TF = 5, Reads per TF = 8 (d, e, f) to TF = 15, Reads per TF = 15 (g, h, i). RAF decreases from 0.5 (a, d, g) to 0.3 (b, e, h) to 0.1 (c, f, i)
Fig. 3
Examples of cancer and non-cancer cell lines from SNP and ChIP-seq ENCODE data. a B allele frequencies (BAFs) for chromosome 1 for three cancer cell lines (MCF-7, K562, and SK-N-SH) and one non-cancer cell line (GM12878). Individual SNPs are colored according to genotype values: homozygous AA or BB (blue) and heterozygous AB (orange). b Correlations between the BAF values and the ChIP-seq AR of heterozygous SNPs. RAF corresponds to the BAF value with respect to the reference allele (RAF is equal to BAF if the reference allele corresponds to the B allele; RAF is equal to 1 − BAF if the reference allele corresponds to the A allele). The fitted linear model (blue line) and the Spearman correlation coefficient (cor) show the relationship between BAF and ChIP-seq ARs at heterozygous sites. AR allelic ratio, BAF B allele frequency, chr1 chromosome 1, cor correlation, RAF reference-allele frequency, SNP single-nucleotide polymorphism
Fig. 4
ASB detection from FAIRE targeted sequencing data. a Correlations between the allelic ratios obtained from gDNA and FAIRE-seq data. b Density plots showing the distribution of allelic ratios before (green) and after (orange) BaalChIP correction. The adjusted AR values were estimated by the BaalChIP model after taking into account the RAF scores computed directly from the control gDNA samples. AR allelic ratio, ASB allelic-specific binding, cor correlation, gDNA genomic DNA, RAF reference-allele frequency
Fig. 5
Comparison of BaalChIP with other available methods. Left y-axis corresponds to the frequency of ASB events called by BaalChIP or the binomial tests (red, blue and purple lines). Right y-axis corresponds to the mean of the posterior probability given by the iASeq method (black line). The numbers at the top of the plot show the total number of tested heterozygous sites in each bin. a SNPs were grouped in bins of different RAF intervals. The RAF intervals increase in terms of distance to the diploid value (RAF = 0.5). The binomial test (without RAF correction; purple line) and the iASeq method (black line) are biased towards the detection of ASB events in regions of altered copy number. b SNPs were grouped in bins of different depth of coverage. SNPs in regions with RAF < 0.4 or RAF > 0.6 were excluded from this analysis. When applying the binomial test (purple and blue lines), the frequency of ASB detection increases for sites with higher coverage, while the same is not true when applying BaalChIP or the iASeq method. The effect is particularly visible for the FAIRE-seq data set. ASB allelic-specific binding, binom binomial, RAF reference-allele frequency, SNP single-nucleotide polymorphism
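Two rules stated in the captions can be written down directly: the BAF-to-RAF conversion from Fig. 3 and the credible-interval call from Fig. 1. The sketch below is a minimal illustration, not BaalChIP code. The function names are hypothetical, and it assumes that "outside the 0.4–0.6 interval" means the whole credible interval lies below 0.4 or above 0.6. Computing the interval itself, via the beta-binomial posterior, is not shown.

```python
def raf_from_baf(baf: float, ref_is_b_allele: bool) -> float:
    """Convert a B allele frequency into a reference-allele frequency.

    RAF equals BAF when the reference allele is the B allele,
    and 1 - BAF when the reference allele is the A allele.
    """
    return baf if ref_is_b_allele else 1.0 - baf


def is_asb(lower: float, upper: float,
           null_low: float = 0.4, null_high: float = 0.6) -> bool:
    """Call an ASB event from a credible interval on the corrected AR.

    Assumption: the event is called only when the entire interval
    [lower, upper] falls below null_low or above null_high.
    """
    return upper < null_low or lower > null_high


# Hypothetical values for illustration only.
print(raf_from_baf(0.3, ref_is_b_allele=False))
print(is_asb(0.65, 0.80))  # interval entirely above 0.6
print(is_asb(0.45, 0.70))  # interval overlaps the 0.4-0.6 range
```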
