Cell Rep., 24 (2), 515-527

A Pan-Cancer Compendium of Genes Deregulated by Somatic Genomic Rearrangement Across More Than 1,400 Cases

Yiqun Zhang et al. Cell Rep.

Abstract

A systematic cataloging of genes affected by genomic rearrangement, using multiple patient cohorts and cancer types, can provide insight into cancer-relevant alterations outside of exomes. By integrative analysis of whole-genome sequencing (predominantly low pass) and gene expression data from 1,448 cancers involving 18 histopathological types in The Cancer Genome Atlas, we identified hundreds of genes for which the nearby presence (within 100 kb) of a somatic structural variant (SV) breakpoint is associated with altered expression. While genomic rearrangements are associated with widespread copy-number alteration (CNA) patterns, approximately 1,100 genes, including overexpressed cancer driver genes (e.g., TERT, ERBB2, CDK12, CDK4) and underexpressed tumor suppressors (e.g., TP53, RB1, PTEN, STK11), show SV-associated deregulation independent of CNA. SVs associated with the disruption of topologically associated domains, enhancer hijacking, or fusion transcripts are implicated in gene upregulation. For cancer-relevant pathways, SVs considerably expand our understanding of how genes are affected beyond point mutation or CNA.

Keywords: TCGA; cancer; genomic rearrangement; pan-cancer; structural variation; whole genome sequencing.

Figures

Figure 1. CNAs Associated with Genomic Rearrangements in Human Cancers
(A) By cancer type (denoted by TCGA project name), boxplot of cytoband-level CNA (log2 tumor/normal copy numbers) corresponding to structural variant (SV) breakpoints. SVs with both breakpoints occurring within the same cytoband are represented only once. Cytobands in X and Y chromosomes are not represented. For each cancer type, the median log2 CNA across all cases and cytobands is approximately zero. Boxplots represent 5%, 25%, 50%, 75%, and 95%. Analysis involves 1,465 cases with both WGS and copy data. The maximum log2 tumor/normal CNA value is set to 3.6 by the SNP array analysis, approximating >24 copies. (B) For SVs associated with cytoband-level gain (average log2 tumor/normal copy >1) or loss (average log2 tumor/normal copy <−0.5), breakdown by SV class. p values by chi-square test. (C) Fraction of cancer cases with high-level amplifications for a given gene, according to SV breakpoint occurring within the gene body, upstream of the gene (0–20 kb, 20–50 kb, 50–100 kb), downstream of the gene (0–20 kb, 20–50 kb, 50–100 kb), or >100 kb from the gene. Results are shown for MYC, ERBB2, and EGFR, as well as the averages for 1,102 genes with amplification in >4% of the cases. Where multiple breakpoints occur in proximity to a gene, the breakpoint closest to the gene is assigned to the given case. Error bars, SDs. See also Figure S1 and Table S3.
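The correspondence between the log2 cap of 3.6 and ">24 copies" in (A) can be checked with a short calculation. The sketch below assumes a diploid normal baseline of two copies and ignores tumor purity; the function name and both assumptions are illustrative and not taken from the paper.

```python
def log2_ratio_to_copies(log2_ratio: float, normal_copies: float = 2.0) -> float:
    """Convert a log2 tumor/normal ratio to an approximate copy number.

    Assumes the normal sample carries `normal_copies` copies (2 for autosomes)
    and that the tumor sample is pure; both are simplifying assumptions.
    """
    return normal_copies * 2.0 ** log2_ratio


if __name__ == "__main__":
    # Values used in the caption: the array cap and the gain/loss cutoffs.
    for ratio in (3.6, 1.0, -0.5):
        print(f"log2 ratio {ratio:+.1f} -> about {log2_ratio_to_copies(ratio):.2f} copies")
```

Under these assumptions, the gain cutoff in (B) (log2 ratio >1) corresponds to more than four copies, and the loss cutoff (log2 ratio <−0.5) to fewer than about 1.4 copies.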
Figure 2. Genes with Altered Expression Associated with Nearby SV Breakpoints
(A) Numbers of SV breakpoints identified as occurring within a gene body, upstream of a gene (0–20 kb, 20–50 kb, 50–100 kb) or downstream of a gene (0–20 kb, 20–50 kb, 50–100 kb). For each SV set, the breakdown by alteration class is indicated. SVs located within a given gene are not included in the other upstream or downstream SV sets for that same gene. (B) For each of the SV sets from (A), numbers of significant genes (FDR <0.1) showing correlation between expression and associated SV event. Numbers above and below the zero point of the y axis denote positively and negatively correlated genes, respectively. Linear regression models also evaluated significant associations when correcting for cancer type (red) and for both cancer type and gene-level CNA (green). (C) Heatmap of significance patterns for genes from (B) (from the model correcting for both cancer type and CNA). Significant positive correlation (red), significant negative correlation (blue), not significant (p > 0.05) or not assessed (<3 SV events for given gene in the given genomic region) (black). (D) Significantly enriched Gene Ontology (GO) terms for genes positively correlated (FDR <0.1, with corrections for cancer type and CNA) with occurrence of SV breakpoint in proximity to the gene (for any region considered). p values by one-sided Fisher’s exact test. (E) Patterns of SV versus expression for selected gene sets from (D) (positive regulation of cell size [top], β-catenin-TCF complex assembly [middle], phosphatidylinositol 3-kinase activity [bottom]). Differential gene expression patterns relative to the median across sample profiles. Cases with genes associated with high-level gene amplification or with gene fusion event are respectively indicated. See also Figure S2 and Table S4.
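The region windows in (A) can be illustrated with a minimal sketch that assigns a single breakpoint to exactly one window relative to a gene. The function name, the strand-aware definition of "upstream", and the treatment of window edges as inclusive are assumptions made for illustration; the paper's own pipeline may differ.

```python
from typing import Optional

# Distance windows (in bp) from the caption: 0-20 kb, 20-50 kb, 50-100 kb.
BINS = [(20_000, "0-20kb"), (50_000, "20-50kb"), (100_000, "50-100kb")]


def classify_breakpoint(pos: int, gene_start: int, gene_end: int, strand: str) -> Optional[str]:
    """Label a breakpoint relative to a gene on the same chromosome.

    Returns "gene_body", "upstream_<window>", "downstream_<window>",
    or None when the breakpoint lies more than 100 kb from the gene.
    The gene body is checked first, so a within-gene breakpoint never
    falls into an upstream or downstream window for that gene.
    """
    if gene_start <= pos <= gene_end:
        return "gene_body"
    if pos < gene_start:
        distance = gene_start - pos
        # Lower coordinates are upstream only for plus-strand genes.
        side = "upstream" if strand == "+" else "downstream"
    else:
        distance = pos - gene_end
        side = "downstream" if strand == "+" else "upstream"
    for limit, label in BINS:
        if distance <= limit:
            return f"{side}_{label}"
    return None
```

For example, with a hypothetical plus-strand gene spanning positions 1,000,000 to 1,050,000, a breakpoint at 990,000 is labeled `upstream_0-20kb`, and the same breakpoint is labeled `downstream_0-20kb` if the gene is on the minus strand.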
Figure 3. Identification of Gene Fusions by Both RNA-Seq and WGS
(A) Of 2,398 candidate fusion events identified by RNA-seq analysis (Yoshihara et al., 2015), numbers of events with support from WGS analysis are indicated (SV found within both genes, SV found within one gene, or fusion found to have both RNA-seq and WGS support in another sample). (B) Of the 1,318 gene body SV events associated with overexpressed genes (from Figure 2C and Table S4, 174 genes with FDR <0.1 correcting for cancer type and CNA), the fractions of events associated with either gene fusion by RNA-seq analysis or high-level gene amplification are indicated. (C) Across 433 cancer cases with at least one gene fusion identified (with both RNA-seq and WGS support), incidences for 20 recurrent fusions (fusions between two specific genes identified in more than one cancer case) are shown. Of the 433 cases, 98 harbored a recurrent fusion and the rest harbored at least one “singleton” fusion (i.e., a fusion between two specific genes being identified in a single case). Named singleton fusions involve at least one gene also involved in a recurrent fusion. For cases with recurrent fusion, the “gene overexpressed” track indicates whether at least one of the two involved genes also showed relatively higher mRNA levels (defined as >0.4 SDs from the median across all sample profiles). Cancer type (denoted by TCGA project) is indicated along the bottom and in the coloring of the recurrent fusion event, as well as in the coloring of the text in cases of highlighted singleton fusions. See also Table S5.
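The "gene overexpressed" criterion in (C) (>0.4 SDs from the median across all sample profiles) can be sketched as follows. This is a minimal illustration that assumes expression values are already on a suitable (for example, log-transformed) scale and that the sample standard deviation is used; the function name and input format are hypothetical.

```python
import statistics


def overexpressed_cases(expression: dict[str, float], threshold_sd: float = 0.4) -> set[str]:
    """Return case IDs whose expression exceeds median + threshold_sd * SD.

    `expression` maps case ID to the expression value of one gene.
    The SD is the sample standard deviation across all cases (an assumption).
    """
    values = list(expression.values())
    median = statistics.median(values)
    sd = statistics.stdev(values)
    cutoff = median + threshold_sd * sd
    return {case for case, value in expression.items() if value > cutoff}


if __name__ == "__main__":
    # Made-up values for five hypothetical cases.
    demo = {"case_a": 1.0, "case_b": 2.0, "case_c": 3.0, "case_d": 4.0, "case_e": 10.0}
    print(sorted(overexpressed_cases(demo)))
```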
Figure 4. SVs Associated with Key Oncogenic or Tumor-Suppressive Pathways, Including p53 and Rb1
(A) For the set of genes with differential expression patterns associated with SV breakpoints occurring within the gene (from Figure 2C and Table S4, FDR <0.1 with cancer type and CNA corrections, overexpressed and underexpressed gene sets considered separately), respective overlaps with predefined (Chen et al., 2018) sets of 31 oncogene-associated genes and 72 tumor suppressor-associated genes. p values by one-sided Fisher’s exact test. (B) For selected predefined pathways, associated non-silent gene mutations (single nucleotide variant [SNV]), CNA events, and DNA methylation silencing were cataloged across 1,379 TCGA cancers as previously described (using cases with available exome sequencing, RNA-seq, and WGS data) (Chen et al., 2017). For each pathway, the number of cases also affected by SV is shown (for tumor suppressor genes, expression <−0.4 SDs from the median and within-gene SV breakpoints associated with underexpression per Figure 3C and Table S4; for oncogenes, SV breakpoint events and genes taken from Figure 2E, for which expression is >0.4 SDs from the median). Cases affected by SV but not by mutation or CNA are highlighted. (C) Genomic rearrangements (represented in circos plot) involving TP53 or RB1, based on analysis of 1,493 cases with WGS data. (D) Alterations involving TP53 or RB1 (somatic mutation, copy alteration, SV) found in the set of 1,448 cancer cases having both WGS and RNA-seq data available. (E) Boxplots of expression for RB1 (left) and for TP53 (right) by their respective alteration classes. Boxplots represent 5%, 25%, 50%, 75%, and 95%. p values by t test on log-transformed values. See also Table S6.
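The one-sided Fisher's exact test used for the gene set overlaps in (A) can be written directly from the hypergeometric distribution. The sketch below is a generic standard-library implementation, not the authors' code, and the 2x2 table layout in the docstring is an assumed way of arranging the counts.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p value for enrichment in a 2x2 table.

    Assumed layout:
                            in predefined set   not in set
      SV-associated genes           a               b
      all other genes               c               d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # number of SV-associated genes
    col1 = a + c          # size of the predefined gene set
    total = a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        comb(col1, k) * comb(total - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favorable / comb(total, row1)
```

As a check on a small table with a = 3, b = 1, c = 1, d = 3, the function returns 17/70, or about 0.243.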
Figure 5. SVs Associated with CNA and Increased Expression of TERT, ERBB2, CDK12, and CDK4
(A) Circos plot showing all intra- and inter-chromosomal rearrangements within TERT or 0–100 kb upstream (left). Gene expression levels of TERT corresponding to SVs located in the genomic region 20 kb downstream to 100 kb upstream of the gene (47 SV breakpoints involving 29 cases) (middle); dotted lines denote breakpoints within the same sample and solid lines denote common SV event. Gene expression levels of TERT corresponding to CNA (log2 tumor/normal ratio) (right). The maximum log2 tumor/normal CNA value is set to 3.6 by the SNP array analysis, approximating >24 copies. (B) Similar to (A), but for the ERBB2 gene (circos plot, rearrangements within ERBB2 or 0–100 kb upstream; scatterplot, genomic region 20 kb downstream to 100 kb upstream, 243 breakpoints involving 41 cases). (C) Similar to (A), but for the CDK12 gene (circos plot, rearrangements within CDK12 or 0–100 kb upstream or 0–20 kb downstream; scatterplot, genomic region 20 kb downstream to 100 kb upstream, 185 breakpoints involving 40 cases). (D) Similar to (A), but for the CDK4 gene (circos plot, rearrangements within CDK4 or 0–20 kb upstream or 0–50 kb downstream; scatterplot, genomic region 0–50 kb downstream to 50 kb upstream, 22 breakpoints involving 13 cases). See also Figure S3 and Table S7.
Figure 6. SVs Associated with Disruption of TADs and Translocated Enhancers
(A) As compared to all SVs (based on cases with both WGS and RNA-seq data), a fraction of the SVs involving TAD disruption (i.e., SVs with breakpoints spanning TAD boundaries), for SVs with breakpoints located in proximity to a gene and associated with its overexpression (FDR <0.1 for the gene within the given region window, with corrections for cancer type and CNA, and expression >0.4 SDs from the median for the case harboring the breakpoint). SVs are broken down according to their breakpoint occurrence within the gene body, upstream of the gene (0–20 kb, 20–50 kb, 50–100 kb), and downstream of the gene (0–20 kb, 20–50 kb, 50–100 kb). p values by one-sided Fisher’s exact test. (B) Depiction of the TERT locus and associated TADs and SVs. Top: TADs as Hi-C-based contact maps (Dixon et al., 2012), with gray shading indicating locus interactions (darker shading indicates stronger interactions as measured by Hi-C) (adapted from Weischenfeldt et al., 2017). Bottom: gene expression levels of TERT corresponding to SV breakpoints (involving 65 cases and 15 cancer types) located in the genomic region. SV breakpoints are annotated as TAD preserving (i.e., both breakpoints fall within the same TAD) or TAD disrupting; for SV breakpoints involving cases with high TERT expression (defined as expression >0.4 SDs from the median), dotted lines denote breakpoints within the same sample and solid lines denote common SV event. Of all of the genes listed, only TERT was associated with increased expression in proximity to SV breakpoints (Table S4). 
(C) For the entire set of SV breakpoint associations occurring 0–100 kb upstream of a gene and with breakpoint mate on the distal side from the gene (for cases with WGS), as well as for the subset of SV breakpoint associations involving gene overexpression (defined as expression >0.4 SDs from the median for the case harboring the breakpoint and FDR <0.1 for gene overexpression, with corrections for cancer type and CNA), the fraction of SV breakpoint associations involving the translocation of an active in vivo-transcribed enhancer (Andersson et al., 2014) within 0.5 Mb of the gene (where the unaltered gene had no enhancer within 1 Mb). p value by chi-square test. (D) By gene and by cancer type, the number of SV breakpoint associations involving the translocation of an active in vivo-transcribed enhancer, which involved 41 genes and 83 SV events. See also Figure S4 and Table S8.
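The TAD-preserving versus TAD-disrupting annotation described for Figure 6 (both breakpoints within the same TAD, or breakpoints spanning a TAD boundary) can be sketched as below. The data layout (sorted boundary positions per chromosome), the function names, and the choice to treat inter-chromosomal SVs as disrupting are assumptions for illustration only.

```python
from bisect import bisect_right


def tad_index(boundaries: list[int], pos: int) -> int:
    """Index of the TAD containing `pos`, given sorted boundary positions."""
    return bisect_right(boundaries, pos)


def is_tad_disrupting(
    chrom1: str,
    pos1: int,
    chrom2: str,
    pos2: int,
    boundaries_by_chrom: dict[str, list[int]],
) -> bool:
    """True if the two breakpoints of an SV do not fall within the same TAD.

    Breakpoints on different chromosomes are treated as disrupting,
    which is an assumption of this sketch.
    """
    if chrom1 != chrom2:
        return True
    boundaries = boundaries_by_chrom.get(chrom1, [])
    return tad_index(boundaries, pos1) != tad_index(boundaries, pos2)


if __name__ == "__main__":
    # Hypothetical boundary coordinates, not real Hi-C calls.
    tads = {"chr5": [1_000_000, 2_000_000]}
    print(is_tad_disrupting("chr5", 1_200_000, "chr5", 1_800_000, tads))
    print(is_tad_disrupting("chr5", 900_000, "chr5", 1_200_000, tads))
```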
