Sci Rep. 2016 Nov 18;6:37324.
doi: 10.1038/srep37324.

YY1 Binding Association With Sex-Biased Transcription Revealed Through X-linked Transcript Levels and Allelic Binding Analyses

Free PMC article

Chih-Yu Chen et al. Sci Rep.


Sex differences in susceptibility and progression have been reported in numerous diseases. Female cells have two copies of the X chromosome with X-chromosome inactivation imparting mono-allelic gene silencing for dosage compensation. However, a subset of genes, named escapees, escape silencing and are transcribed bi-allelically resulting in sexual dimorphism. Here we conducted in silico analyses of the sexes using human datasets to gain perspectives into such regulation. We identified transcription start sites of escapees (escTSSs) based on higher transcription levels in female cells using FANTOM5 CAGE data. Significant over-representations of YY1 transcription factor binding motif and ChIP-seq peaks around escTSSs highlighted its positive association with escapees. Furthermore, YY1 occupancy is significantly biased towards the inactive X (Xi) at long non-coding RNA loci that are frequent contacts of Xi-specific superloops. Our study suggests a role for YY1 in transcriptional activity on Xi in general through sequence-specific binding, and its involvement at superloop anchors.


Figure 1. Classification and differential expression analysis of sexes using FANTOM5 CAGE datasets.
(A) Plots showing distances between FANTOM5 CAGE samples in the first 2 dimensions generated using multi-dimensional scaling, Minimum Curvilinear embedding, and non-centred Minimum Curvilinear embedding methods from the proximity matrix of the Random Forest sex classifier. Each circle represents a FANTOM5 sample with its labeled sex: male (blue) or female (red). (B) Scatter plot with log2 ratio of the mean expression levels comparing male to female (with a constant of the 5th percentile expression added to avoid denominators of 0) on the x-axis, and -log10 transformation of raw p-values from the differential transcription analysis between sexes on the y-axis. Each point represents a TSS in the non-PAR region of chrX, and TSSs with significantly higher expression in female (escTSSs) and male cells are denoted with circles and crosses, respectively (Bonferroni-corrected p-value ≤ 0.05), with small dots for non-differentially transcribed TSSs. The escTSSs nearest to the XIST gene are denoted using open triangles. The grey-scale gradient represents the average expression across all samples in quartiles. The vertical dashed line represents a log2 ratio of 0, where there is no difference between sexes. (C) Venn diagram depicting the overlapping sets of escapees from three published studies with those identified in this report. The numbers within the Venn diagram represent the overlaps between sets, and the numbers in brackets under each list name are precision and recall values, where genes reported in more than one list are taken to constitute true escapees.
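The x-axis and significance threshold of panel (B) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the direction of the ratio (male over female), the choice of a linear-interpolation percentile, and the set of values from which the 5th percentile is taken are all assumptions made for illustration.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def log2_ratio(mean_male, mean_female, pseudocount):
    """log2 of male over female mean expression (direction is an assumption).

    The pseudocount is added to both terms so that a zero mean does not
    produce a zero denominator; it must itself be positive for that to hold.
    """
    return math.log2((mean_male + pseudocount) / (mean_female + pseudocount))

def bonferroni(p_values):
    """Bonferroni correction: multiply each raw p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

# Hypothetical mean expression values for a handful of TSSs.
male_means = [3.0, 0.0, 10.0, 2.0]
female_means = [1.0, 4.0, 10.0, 2.0]

# Assumption: the constant is the 5th percentile of all mean expression values.
constant = percentile(male_means + female_means, 5)
ratios = [log2_ratio(m, f, constant) for m, f in zip(male_means, female_means)]
```

With this sign convention, a TSS more highly expressed in female cells has a negative ratio and falls to the left of the vertical dashed line at 0.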
Figure 2. DNAm comparison between sexes on chrX in urothelial bladder cancer (BLCA) samples from TCGA.
(A) DNA methylation status for positions (i.e. probes from the Illumina 450k array) near TSSs in both sexes from BLCA samples, where the β values (y-axis) range from 0 (unmethylated) to 1 (fully methylated). The three TSSs are most proximal to the following genes (from top to bottom): XIST, an escapee (ZFX) and a subject gene (HMGB3). Each square represents a sample from the BLCA dataset. Red or blue color represents a female or male sample, respectively. Each violin plot in gray lines shows the distribution of β values for each sex at each probe. Panels (B,C) show MA plots between sexes for chrX probes and for autosomal probes on chr7, respectively. Each dot represents a probe from the array. M (difference) on the y-axis is the logged differential methylation value between sexes, and A (magnitude) on the x-axis is the logged average methylation value (as indicated in Methods). The fitted robust regression line is represented in gray, with the corresponding function and correlation reported. Green and red colors in plot (B) represent probes nearest to escapees and subject genes, respectively, as previously reported in Cotton et al. 2015. Gold and gray colors represent probes nearest to XIST and to genes not in any of the three categories, respectively. (D) Violin plots showing the distributions of DNA methylation similarity scores between sexes for probes within 50 bps of escTSSs and non-differentially transcribed (nonDT) TSSs on chrX. The similarity score of DNA methylation on the y-axis is the residual of M as a function of A on chrX. Only TSSs with at least one probe within 50 bps were plotted, and for those TSSs within 50 bps of multiple probes, the average similarity scores of probes were obtained. The p-value from the Wilcoxon test is reported above the violin plots.
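The similarity score in panel (D), the residual of M regressed on A, can be sketched as follows. This is only an illustration under stated assumptions: the exact definitions of M and A are given in the paper's Methods and are not reproduced here, so the conventional MA-plot definitions on log2 scale are assumed, a small offset is assumed to keep the logarithm defined at β = 0, and ordinary least squares is used as a simple stand-in for the robust regression named in the caption.

```python
import math

def ma_values(beta_female, beta_male, offset=0.01):
    """Return (M, A) lists from per-probe mean beta values.

    Assumed definitions: M = log2(female) - log2(male), A = mean of the two logs.
    The offset is a hypothetical guard against log2(0).
    """
    m, a = [], []
    for f, g in zip(beta_female, beta_male):
        lf, lg = math.log2(f + offset), math.log2(g + offset)
        m.append(lf - lg)
        a.append((lf + lg) / 2)
    return m, a

def ols_fit(x, y):
    """Ordinary least-squares slope and intercept (stand-in for robust regression)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(x, y):
    """Residual of y after removing the fitted linear trend in x."""
    slope, intercept = ols_fit(x, y)
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]

# Hypothetical per-probe mean beta values for female and male samples.
female = [0.45, 0.50, 0.10, 0.80, 0.30]
male = [0.05, 0.10, 0.09, 0.85, 0.02]
M, A = ma_values(female, male)
similarity = residuals(A, M)  # one score per probe
```

Under these assumptions, a probe whose residual lies far from the fitted trend differs from the typical chrX relationship between sexes at that methylation level.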
Figure 3. Over-representation analyses of TF binding motifs and ENCODE TF ChIP-seq peaks at escape TSSs.
(A) Barplots showing the proportions of merged TSS regions (x-axis) containing the JASPAR TF motifs labeled on the y-axis for escTSSs (green) and the non-differentially transcribed set on X with Bonferroni-corrected p-values equal to 1 (nonDT; purple). The proportions for the %GC composition- and length-matched background set sampled from the genome for escTSSs (escTSSs_bg) are shown in shaded green. The top motifs with Fisher scores greater than the 95th percentile in escTSSs compared to either background set (escTSSs_bg and nonDT) are plotted. The top motifs derived with each background set are marked with asterisks under the background bars with the corresponding color. Motifs are presented in decreasing order of Fisher score sums from both comparisons. Figures (B,D) compare bi-escTSSs to matched background TSSs on chrX (X_bg) for over-representation of TF ChIP-seq peaks, whereas figures (C,E) compare to matched background TSSs on autosomes (Auto_bg). (B,C) Scatter plots showing the percentages of TSSs that overlap a peak, comparing bi-escTSSs (y-axis) with matched background TSSs (x-axis). Each dot represents a uniformly processed TF ChIP-seq dataset from ENCODE. Red and blue colors represent female and male cells in all plots, respectively. The dashed gray lines are the baselines reflecting no differences between proportions of escTSSs and background TSSs overlapping peaks. Datasets with significant over-representation of peaks in escTSSs compared to background TSSs are displayed as triangles (Bonferroni-corrected p-values ≤ 0.05). Significantly over-represented YY1 datasets are labeled on figure (B). (D,E) Violin plots showing the distributions of log2 ratio of escTSSs to background TSSs in female and male cells. The p-value from comparing the log2 ratios between male and female cells (one-sided Wilcoxon test) and the p-values of one-sample Wilcoxon tests for the distributions are shown.
(F) Figure listing TF motifs over-represented compared to both background sets, and the top 10 over-represented TFs in ranking of significance. The semi-circle lines link TFs within the same structural classes. ‘N/A’ indicates that data is unavailable. ‘Y’ and ‘NS’ indicate significant or not significant over-representation, respectively.
Figure 4. Input and YY1 ChIP-seq read depths around bi-escTSSs for both sexes.
The read depth plots for ENCODE ChIP-seq input samples in two female (A) and male cell lines (B) within 5 kb of three TSS sets: a subset of bi-escTSSs in green with a filter of unique escTSS per gene symbol, background TSSs on chrX with matched averaged expression (X_bg) in violet dashed line, and autosomal TSSs with matched averaged expression (Auto_bg) in gray dashed line. The read depth plots for ENCODE YY1 ChIP-seq data in the same female (C) and male cells (D). The scales on y-axis of figures A and B are the same, while the scale of C is two times the scale of D to reflect the expected X-copies in female and male cells. For the ease of visualization, the narrow panel on the right of each plot displays the read depth within 50 bps of the TSSs.
Figure 5. Allelic imbalance at heterozygous sites within ChIP-seq peaks on chrX in the GM12878 cell line.
(A) Scatterplot showing the allelic imbalance of replicated YY1 ChIP-seq data sets from ENCODE (see Methods). For visualization purposes, allelic imbalance is represented by the log2 ratio of (Xa + 1) to (Xi + 1). A positive log2 ratio value indicates more reads on Xa, while a negative value represents more reads on Xi. Each of the 67 dots represents a heterozygous site within a YY1 binding peak. The Pearson correlation between allelic imbalance of the replicated datasets is 0.87. Dotted lines indicate the baselines for balanced allelic binding. Heterozygous sites within 50 bps of escTSSs are indicated by green circles. The intensity of shading of each dot reflects the total number of YY1 reads from both replicates at the heterozygous site (where read counts were assigned to five 20-percentile bins). (B) Heatmap showing heterozygous sites (rows) significantly Xi-biased in more than one dataset (ChIP-seq and DNase I data; columns). Only datasets that are significantly Xi-biased at more than four heterozygous sites are listed. Datasets are denoted by the feature name followed by the ENCODE lab where data was generated, and heterozygous sites are denoted by the gene name of the nearest TSS followed by the chrX coordinate of the site. Colors in the heatmap represent log2 odds ratio values reflecting Xa- or Xi-biased binding of the TF with positive (gold) or negative (brown) values, respectively. White boxes indicate zero read counts at the corresponding site-data pair. The degrees of significance estimated by FDR-corrected p-values are indicated on a scale of 1 to 5 asterisks with the corresponding p-value thresholds shown in the legend. The datasets from left to right are ordered in increasing counts of higher significance scales denoted by the gray triangle, and the heterozygous sites are ordered using genomic coordinates on chrX. The four lncRNAs previously reported to be associated with Xi-specific superloops are marked with purple bars and colored in purple.
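The allelic imbalance measure plotted in panel (A), the log2 ratio of (Xa + 1) to (Xi + 1), and the agreement between replicates can be sketched as below. The formula is taken from the caption; the function names and the read counts are hypothetical, and how reads are assigned to Xa or Xi is described in the paper's Methods and is outside this sketch.

```python
import math

def allelic_imbalance(xa_reads, xi_reads):
    """log2((Xa + 1) / (Xi + 1)): positive means more reads on Xa, negative more on Xi."""
    return math.log2((xa_reads + 1) / (xi_reads + 1))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (Xa, Xi) read counts at four heterozygous sites in two replicates.
replicate_1 = [(7, 0), (0, 3), (5, 5), (2, 11)]
replicate_2 = [(6, 1), (1, 4), (4, 6), (1, 9)]

imbalance_1 = [allelic_imbalance(xa, xi) for xa, xi in replicate_1]
imbalance_2 = [allelic_imbalance(xa, xi) for xa, xi in replicate_2]
replicate_agreement = pearson(imbalance_1, imbalance_2)
```

Adding 1 to both counts keeps the ratio defined at sites where one allele has no reads, which is why the caption describes it as a representation for visualization rather than the test statistic itself.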

