Nat Genet 49 (1), 65-74

Pan-cancer Analysis of Somatic Copy-Number Alterations Implicates IRS4 and IGF2 in Enhancer Hijacking


Joachim Weischenfeldt et al. Nat Genet.


Extensive prior research focused on somatic copy-number alterations (SCNAs) affecting cancer genes, yet the extent to which recurrent SCNAs exert their influence through rearrangement of cis-regulatory elements (CREs) remains unclear. Here we present a framework for inferring cancer-related gene overexpression resulting from CRE reorganization (e.g., enhancer hijacking) by integrating SCNAs, gene expression data and information on topologically associating domains (TADs). Analysis of 7,416 cancer genomes uncovered several pan-cancer candidate genes, including IRS4, SMARCA1 and TERT. We demonstrate that IRS4 overexpression in lung cancer is associated with recurrent deletions in cis, and we present evidence supporting a tumor-promoting role. We additionally pursued cancer-type-specific analyses and uncovered IGF2 as a target for enhancer hijacking in colorectal cancer. Recurrent tandem duplications intersecting with a TAD boundary mediate de novo formation of a 3D contact domain comprising IGF2 and a lineage-specific super-enhancer, resulting in high-level gene activation. Our framework enables systematic inference of CRE rearrangements mediating dysregulation in cancer.

Conflict of interest statement

Competing Financial Interests Statement:

The authors declare no competing financial interests.


Figure 1. CESAM: framework for uncovering SCNAs driving gene dysregulation in cis.
(a) Principle behind CESAM. TADs are depicted as Hi-C-based contact maps with grey shading indicating locus interactions (darker shading indicates stronger interactions as measured by Hi-C). SCNA breakpoints are binned within each TAD (referred to as SCNA breakpoint search space). (b) Detailed analysis workflow of CESAM. (c) Volcano plot of CESAM hits in a pan-cancer setting, with nominal P-values plotted versus expression fold-change (Fc). Candidate genes identified by CESAM are shown as black dots (genes discussed in the text are highlighted in green). Grey dots denote loci removed based on CESAM’s filtering criteria (which include removal of expression alterations driven by gene dosage change). (d) Relative distance to the nearest annotated enhancers at distal breakpoints of SCNAs identified by CESAM (‘CESAM hits’) versus SCNAs not implicated by CESAM, which are used here as control (CTRL) (P=0.001; based on 1000 permutations using the standard deviation of the observed proximity). Negative values refer to closer proximity to the genomic feature relative to background.
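
The binning idea in panel (a) can be illustrated with a minimal Python sketch. This is not the authors' implementation: CESAM's statistical model and its filtering criteria are not reproduced here, and the TAD coordinates, the function names (`tad_index`, `carriers_by_tad`, `fold_change`) and the pseudocount are all hypothetical choices made for illustration. The sketch assigns SCNA breakpoints to TADs, collects the samples carrying a breakpoint in each TAD, and compares median expression of a gene between carriers and non-carriers.

```python
from bisect import bisect_right
from statistics import median

# Hypothetical TADs on one chromosome: sorted, non-overlapping (start, end) intervals.
TADS = [(0, 1_000_000), (1_000_000, 2_500_000), (2_500_000, 4_000_000)]


def tad_index(position, tads=TADS):
    """Return the index of the TAD containing position, or None if it lies outside all TADs."""
    starts = [start for start, _ in tads]
    i = bisect_right(starts, position) - 1
    if i >= 0 and tads[i][0] <= position < tads[i][1]:
        return i
    return None


def carriers_by_tad(scnas, tads=TADS):
    """Map each TAD index to the set of samples with an SCNA breakpoint inside it.

    scnas is an iterable of (sample_id, start, end) tuples.
    """
    carriers = {i: set() for i in range(len(tads))}
    for sample, start, end in scnas:
        for breakpoint in (start, end):
            i = tad_index(breakpoint, tads)
            if i is not None:
                carriers[i].add(sample)
    return carriers


def fold_change(expression, carriers, pseudocount=1.0):
    """Ratio of median expression in carriers to median expression in non-carriers.

    Returns None when either group is empty. The pseudocount is an assumption
    of this sketch, added to avoid division by zero.
    """
    carrier_vals = [v for s, v in expression.items() if s in carriers]
    other_vals = [v for s, v in expression.items() if s not in carriers]
    if not carrier_vals or not other_vals:
        return None
    return (median(carrier_vals) + pseudocount) / (median(other_vals) + pseudocount)
```

As a usage example, `carriers_by_tad([("s1", 900_000, 1_200_000)])` places sample `s1` in the first two TADs because its breakpoints fall on either side of the boundary at 1,000,000. A real analysis would additionally need a significance test and a step that excludes expression changes explained by gene dosage, as the legend notes.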
Figure 2. Analysis of the TERT locus: a CESAM pan-cancer hit.
(a) Depiction of the TERT locus, the abnormal expression of which CESAM inferred to be mediated by cis SCNAs, for adrenocortical carcinoma (ACC) and summarized across cancer types (pan-cancer copy-number gains and losses). Gene expression values reflecting fold changes versus non-carrier ACC samples are indicated adjacent to each SCNA. (b) Fraction of donors per tumor type for which CESAM inferred TERT dysregulation along with SCNAs in cis in at least 3 donors. (c) TERT expression values (unadjusted RSEM gene expression values) for different cancer types broken down by SCNA class (see Supplementary Table 1 for tumor-type abbreviations).
Figure 3. Recurrent SCNAs in cis associate with marked IRS4 expression increase.
(a) Recurrent deletions at a TAD boundary near IRS4, and IRS4 amplifications, associate with IRS4 dysregulation in LUSC. A region near IRS4 exhibiting clustered transcription factor (TF) binding sites (candidate CRE) is highlighted with an arrow. The recurrent deletions were evident in both male and female samples (indicating that both hemizygous and complete losses result in IRS4 overexpression). Summarized SCNAs across cancer types (pan-cancer copy-number gains and losses) are shown as heatmaps. The full list of pan-cancer SCNAs at the locus is in Supplementary Table 2. Deletion-carrier samples (del, highlighted in blue) exhibited marked H3K27ac at the IRS4 promoter and adjacent candidate CRE. SCNA carrier samples in which chromatin analyses were performed were confirmed to exhibit outlier expression using semi-quantitative RT-PCR and qPCR (Supplementary Fig. 3, Supplementary Table 3, and data not shown). Asterisks depict differentially occupied peaks identified by genome-wide H3K27ac analysis (values adjacent to asterisks show Fc in H3K27ac signal for deletion-carriers vs. non-carriers). Lastly, 4C-Seq experiments using the candidate CRE as a viewpoint in carrier versus non-carrier samples are depicted. dup, duplication; WT, wild-type locus. (b) LUSC expression measurements (unadjusted RSEM gene expression values) for carriers versus non-carriers, revealing IRS4 as the most plausible target. IRS4 expression analyses revealed ~400-fold upregulation in deletion-carriers and >1000-fold upregulation in gene amplification carriers (number of controls=470; del=24; dup=1; amp=2).
Figure 4. SCNAs associating with marked IGF2 locus overexpression in cis in CRC.
(a) Recurrent somatic duplications at the IGF2 locus (green) associating with IGF2 overexpression encompass a TAD boundary and a super-enhancer (yellow) in the adjacent TAD, but do not encompass the known IGF2 cognate enhancer (light blue). Somatic deletions in cis extend over additional TAD boundaries. (b) Boxplots depict expression-SCNA relationships for all protein-coding genes within the respective TAD, with IGF2 showing by far the most marked relationship, making it the most likely target of these recurrent SCNAs in cis (boxplots are separated into del carriers, dup carriers, amplification (amp; >4 copies) carriers, and control samples lacking SCNAs in cis). (c) Volcano plot of CESAM hits in CRC, with nominal P-values plotted versus log2-expression change based on all samples with SCNAs in the TAD (CESAM hits are depicted in black; IGF2 is highlighted). (d) Structural variant detection by long-insert size paired-end sequencing, followed by DELLY2 analysis, identified the presence of a TAD-spanning IGF2 locus tandem duplication in spheroid samples CRCP5S and CRCP7S (IGF2 outlier expression was verified in both samples by qPCR; see Supplementary Fig. 13, Supplementary Table 4).
Figure 5. Verification of IGF2 enhancer hijacking and model for mechanism involving de novo contact domain formation.
(a) ChIP-seq for H3K27ac yielding signals consistent with the activity of a previously annotated lineage-specific super-enhancer in the TAD adjacent to the IGF2 locus, but within the region accompanied by the recurrent somatic tandem duplication (TanDup). (b) 4C-Seq experiments using the IGF2 promoter region as viewpoint demonstrate physical interaction between IGF2 and the super-enhancer in TanDup-carrier samples, but not in non-carrier samples (WT). (c) 4C-Seq experiments using the super-enhancer as viewpoint verify the highly specific physical interaction with IGF2 in TanDup-carriers (but not in WT samples). Further control data, for an additional WT sample, are in Supplementary Figure 11. (d) New model for high-level gene overexpression at the IGF2 locus in CRC, which involves TanDup-mediated de novo contact domain formation resulting in the hijacking of a lineage-specific super-enhancer.

Comment in

  • Copy Number Alterations Unmasked as Enhancer Hijackers
    R Beroukhim et al. Nat Genet 49 (1), 5-6. PMID 28029156.
    Our understanding of how DNA copy number changes contribute to disease, including cancer, has to a large degree been focused on the changes in gene dosage that they gener …

Cited by 62 PubMed Central articles
