CpG domains downstream of TSSs promote high levels of gene expression

Nucleic Acids Res. 2014 Apr;42(6):3551-64. doi: 10.1093/nar/gkt1358. Epub 2014 Jan 9.


CpG dinucleotides are known to play a crucial role in regulatory domains, affecting gene expression in their natural context. Here, we demonstrate that intragenic CpG frequency and distribution impact transgene and genomic gene expression levels in mammalian cells. As shown for the Macrophage Inflammatory Protein 1α, de novo RNA synthesis correlates with the number of CpG dinucleotides, whereas RNA splicing, stability, nuclear export and translation are not affected by the sequence modification. Differences in chromatin accessibility in vivo and altered nucleosome positioning in vitro suggest that increased CpG levels destabilize the chromatin structure. Moreover, enriched CpG levels correlate with increased RNA polymerase II elongation rates in vivo. Interestingly, elevated CpG levels particularly at the 5' end of the gene promote efficient transcription. We show that this is a genome-wide feature of highly expressed genes, by identifying a domain of ∼700 bp with high CpG content downstream of the transcription start site, correlating with high levels of transcription. We suggest that these 5' CpG domains are required to distort the chromatin structure and to increase gene activity.
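The abstract does not state how CpG content in the ∼700 bp downstream domain was scored. Purely as an illustration, the sketch below assumes two simple measures: raw CpG counts in fixed bins across the 700 bp immediately downstream of a TSS, and the widely used observed/expected CpG ratio (CpG count × length / (C count × G count)). The function names and the 100 bp bin size are hypothetical choices, not the authors' pipeline, and the sketch handles only plus-strand genes given as a sequence string with a 0-based TSS index.

```python
from typing import List


def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) on one strand."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G")


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio undefined without both C and G; report 0 by convention here
    return count_cpg(s) * len(s) / (c * g)


def cpg_profile(seq: str, tss: int, length: int = 700, bin_size: int = 100) -> List[int]:
    """CpG counts per bin over the `length` bp downstream of a 0-based TSS.

    Assumes a plus-strand gene; the window is truncated if the sequence is shorter.
    CpGs spanning a bin boundary are not counted in either bin.
    """
    window = seq[tss:tss + length]
    return [count_cpg(window[i:i + bin_size]) for i in range(0, len(window), bin_size)]


# Toy example: a CpG-rich first 100 bp followed by 600 bp without CpGs.
gene = "CG" * 50 + "AT" * 300
print(cpg_profile(gene, tss=0))  # [50, 0, 0, 0, 0, 0, 0]
```

A profile skewed toward the first bins, as in the toy example, is the kind of 5' CpG enrichment the abstract describes; real analyses would also need strand handling and a genome-wide comparison against expression levels.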

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • CHO Cells
  • Cell Line
  • Chemokine CCL3 / genetics
  • Chromatin / chemistry
  • CpG Islands*
  • Cricetinae
  • Cricetulus
  • Granulocyte-Macrophage Colony-Stimulating Factor / genetics
  • HEK293 Cells
  • Humans
  • Mice
  • Molecular Sequence Data
  • Protein Biosynthesis
  • Protein Processing, Post-Translational
  • Regulatory Elements, Transcriptional*
  • Transcription Elongation, Genetic
  • Transcription Initiation Site*
  • Transcription, Genetic*
  • Transgenes

Substances

  • Chemokine CCL3
  • Chromatin
  • Granulocyte-Macrophage Colony-Stimulating Factor

Associated data

  • GENBANK/M11220
  • GENBANK/X03020
  • RefSeq/NM_002983
  • RefSeq/NM_011337