, 50 (10), 1388-1398

Integrative Detection and Analysis of Structural Variation in Cancer Genomes

Jesse R Dixon et al. Nat Genet.

Abstract

Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms. Despite their importance, the identification of SVs in cancer genomes remains challenging. Here, we present a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines. We identify the unique strengths of each method and demonstrate that only integrative approaches can comprehensively identify SVs in the genome. By combining Hi-C and optical mapping, we resolve complex SVs and phase multiple SV events to a single haplotype. Furthermore, we observe widespread structural variation events affecting the functions of noncoding sequences, including the deletion of distal regulatory sequences, alteration of DNA replication timing, and the creation of novel three-dimensional chromatin structural domains. Our results indicate that noncoding SVs may be underappreciated mutational drivers in cancer genomes.

Figures

Figure 1. Overall strategy of SV detection in cancer genomes.
a. The pipeline of SV detection, validation, and functional analysis. b. An example of the same translocations detected by different technologies in Caki2 cells (hg38 coordinates: chr2:204,260,308 and chr3:179,694,900). c. WGS, Hi-C, and optical mapping detect SVs at different scales. Hi-C can detect SVs genome-wide at scales up to whole chromosomes, while optical mapping can detect SVs and build genome maps at ~10 kb resolution. Combining Hi-C and optical mapping can resolve complex rearrangements and reconstruct local genome structure. WGS detects SVs at base-pair resolution. d. Cancer genomes possess more CNVs and translocations than karyotypically normal GM12878 cells. Tracks from the outer to the inner circle are chromosome coordinates, copy number, duplications (red) and deletions (blue), and rearrangements including inversions, inter-chromosomal translocations (TLs), and unclassified rearrangements. In the CNV track, outward red bars indicate copy gain (>2, 2–8 copies) and inward blue bars indicate copy loss (<2, 0–2 copies). CNVs are profiled by WGS with a 50,000 bp bin size. Duplications, deletions, and TLs are detected by at least two of the three methods (WGS, Irys, and Hi-C).
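The "detected by at least two methods" criterion in panel d can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `SVCall` record, the greedy clustering, and the 50 kb matching tolerance are assumptions made for illustration, and only the Caki2 breakpoint coordinates come from the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """One breakpoint pair reported by one method (hypothetical record type)."""
    method: str   # e.g. "WGS", "Irys", "Hi-C"
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def same_event(a, b, tol):
    """Treat two calls as one event if both breakpoints agree within tol bp."""
    return (a.chrom1 == b.chrom1 and a.chrom2 == b.chrom2
            and abs(a.pos1 - b.pos1) <= tol
            and abs(a.pos2 - b.pos2) <= tol)


def consensus_calls(calls, tol=50_000, min_methods=2):
    """Greedily cluster calls, then keep clusters seen by >= min_methods methods.

    tol is an assumed tolerance, not a value taken from the paper.
    """
    clusters = []
    for call in calls:
        for cluster in clusters:
            if same_event(cluster[0], call, tol):
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [c for c in clusters if len({x.method for x in c}) >= min_methods]


calls = [
    # Caki2 translocation breakpoints quoted in the caption (hg38).
    SVCall("WGS", "chr2", 204_260_308, "chr3", 179_694_900),
    # Hypothetical bin-level Hi-C call near the same breakpoints.
    SVCall("Hi-C", "chr2", 204_250_000, "chr3", 179_700_000),
    # Hypothetical call seen by a single method only.
    SVCall("Irys", "chr5", 1_000_000, "chr10", 2_000_000),
]
supported = consensus_calls(calls)
```

Here `supported` holds only the chr2-chr3 event, because the chr5-chr10 call has no second method backing it. A real comparison would also need to handle breakpoint orientation and the very different resolutions of the three technologies.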
Figure 2. Detection of SVs using Hi-C in cancer genomes.
a,b. Inter-chromosomal (a) and intra-chromosomal (b) rearrangements detected using Hi-C data (marked by arrows). In panel a, GM12878 heat maps are shown at 100 kb resolution and Caki2 heat maps at 1 Mb resolution. c. A complex translocation (TL) (chr6-chr16-chr6) in K562 cells validated by fluorescence in situ hybridization (FISH). Similar results were obtained in FISH validation experiments using 20 independent metaphase nuclei. Scale bars (white) represent 5 μm. d. Number of inter-chromosomal and intra-chromosomal rearrangements detected by Hi-C in 29 cancer genomes and 9 normal genomes. e. An example of the impact of TLs on replication timing (RT). RT profiles of chr5 and chr10 of SK-N-MC, when plotted against the reference genome, show abrupt shifts at the TL breakpoints (←, left panels), but they are smoothly connected once the two segments are juxtaposed as in the cancer genome (right panel; normal chr10 is absent in SK-N-MC). Solid black (chr10) and red (chr5) lines indicate loess-smoothed RT data. Because the RT experiments were designed for validation purposes, only one replicate was performed.
Figure 3. Comparison of SVs detected by different methods.
a. Overlap of deletions in T47D cells detected by optical mapping and WGS. b. Size distribution of deletions detected by optical mapping (n=1108) and WGS (n=2964; P = 1.33 × 10⁻³⁶, two-sided Wilcoxon rank-sum test). For boxplots, the box represents the interquartile range (IQR), and the whiskers extend to 1.5 times the IQR or to the maximum/minimum if less than 1.5× IQR. c. Optical mapping detects a 6 kb deletion within chrX:96,041,289–96,072,340 that is missed by WGS. d. Reconstruction of the complex local structure of a derivative chromosome in K562 cells through integration of optical mapping, Hi-C, and WGS. The rearranged allele consists of 5 regions: A (chr13:80.5–80.8Mb), B (chr13:89.7–93.3Mb), C (chr13:107.8–108Mb), D (chr9:130.7–131.3Mb), and an unalignable region. Segment B further consists of three smaller regions (B1, B2, and B3 in the figure). We reconstructed a global view of the genome structure in this region by stitching several optical mapping contigs together (middle panel). Each junction of the optical mapping genome map can be validated by Hi-C data. WGS data can provide bp-resolution breakpoints for specific junctions. Each line in the WGS panel represents a read pair. WGS reads that support the breakpoint site are marked in purple (forward strand) and red (reverse strand). e. Strategy of using Hi-C to reconstruct SVs. Hi-C shows increased interaction frequency if two translocated regions are directly joined (→) or if they are not immediately adjacent (*) but are linked on the same rearranged allele.
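The boxplot convention stated in panel b can be made concrete with a small Python sketch. It follows the caption's wording literally (whiskers end at the 1.5× IQR fence, or at the data minimum/maximum when that lies inside the fence); the quartile method (`inclusive`) and the toy data are assumptions, and many plotting libraries instead end the whisker at the last data point inside the fence.

```python
import statistics


def box_and_whiskers(values):
    """Return (lower_whisker, q1, median, q3, upper_whisker) for a boxplot.

    Quartiles use statistics.quantiles with method="inclusive", which is an
    assumed choice; other quartile definitions give slightly different boxes.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whisker stops at the fence, or at the data extreme if that is closer.
    lower_whisker = max(min(values), lower_fence)
    upper_whisker = min(max(values), upper_fence)
    return lower_whisker, q1, median, q3, upper_whisker


# Toy data with one extreme value (not taken from the paper).
toy = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_and_whiskers(toy)
```

With the toy data the lower whisker sits at the data minimum, while the upper whisker is capped at the fence because the value 100 lies beyond it.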
Figure 4. The impact of SVs on enhancers.
a. Copy number changes of RefSeq genes in T47D cells, sorted by copy number. Genes that are frequently mutated in breast cancer are labeled if they show amplification (red dots) or deletion (yellow dots). The right panel displays the density plot of gene copy numbers. b. A ~3.4 kb deletion (chr3:179,546,826–179,550,207) in T47D overlaps an HMEC-specific enhancer. Hi-C data from HMEC indicate an interaction between the deleted enhancer and the promoter of the gene GNB4. This enhancer-promoter linkage is also reported in GM12878 cells by Capture Hi-C data. According to WGS data, the local region is amplified to 6 copies in T47D cells, but the enhancer is deleted in 5 of the 6 copies. c. Compared with HMEC, all the genes in this region are up-regulated in T47D, potentially due to the local amplification, except for GNB4, whose expression is reduced by ~50%. d. Functional pathway analysis of deleted enhancers (n=1859) using the GREAT tool (P-value from two-sided binomial test). e. Genes with deleted enhancers show reduced expression levels (two-sided Wilcoxon rank-sum test). Genes with exon deletions or copy number loss are excluded. 534 genes are linked by Capture Hi-C data to at least one deleted enhancer (green), and 10,677 genes are linked to enhancers that show no deletions (gray). For boxplots, the box represents the interquartile range (IQR), and the whiskers extend to 1.5 times the IQR or to the maximum/minimum if less than 1.5× IQR.
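The numbers in panel b can be checked with a short Python sketch that computes the deletion length and tests for overlap with an enhancer. The deletion coordinates and copy counts come from the caption; the enhancer interval is hypothetical, and treating coordinates as closed (inclusive) intervals is an assumption.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Bases shared by two closed intervals on one chromosome (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


# T47D deletion on chr3, as quoted in the caption.
deletion = (179_546_826, 179_550_207)
# HYPOTHETICAL enhancer interval; the caption does not give its coordinates.
enhancer = (179_548_000, 179_549_000)

deletion_len = deletion[1] - deletion[0] + 1      # consistent with "~3.4 kb"
shared = overlap_bp(*deletion, *enhancer)         # > 0 means the enhancer is hit

# Caption: the region has 6 copies and the enhancer is deleted in 5 of them.
total_copies, deleted_copies = 6, 5
intact_enhancer_fraction = (total_copies - deleted_copies) / total_copies
```

Under these assumptions, only one copy in six retains the enhancer. That is the direction one would expect given the reduced GNB4 expression in panel c, although the sketch says nothing about causality.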
Figure 5. Rearrangements and TAD fusions.
a. Fusion TAD formation as a result of a translocation in Panc-1 cells. The left box shows the rearranged region on chromosome 9, while the right box shows the rearranged region on chromosome 18. The breakpoint fusion lies in the middle. Triangle Hi-C heat maps show intra-chromosomal interactions. The diamond heat map shows the breakpoint-crossing Hi-C signal, indicating the presence of a TAD fusion. b. Aggregate analysis of TAD fusions. Breakpoint-crossing Hi-C signals were averaged and centered on bins between the nearest TAD boundaries (left) or shuffled TAD boundaries (right; randomization performed 1,000 times). Dashed lines show expected neo-TAD borders based on the intersections of the nearest breakpoint-proximal TAD boundaries. c. Model for neo-TAD formation. TADs are rearranged due to breaks and fusions, juxtaposing regulatory sequences with non-target genes. d. Violin plots showing the distribution of allelic expression bias for genes within rearranged (n=1004) or non-rearranged (n=74184) TADs. Vertical bars represent the median (P-value from two-sided Wilcoxon rank-sum test). e. RNA-seq for MYCN/N-Myc (green) and MYC/c-Myc in neuroblastoma cell lines. Cell lines with TAD fusions at the MYC locus show high levels of MYC expression (marked in red), and the cell line that lacks a TAD fusion at the MYC locus lacks MYC expression (yellow). f. Hi-C data from SK-N-SH cells showing a TAD fusion at the MYC locus. g. Hi-C data from SK-N-AS cells showing a TAD fusion at the MYC locus.
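The "expected neo-TAD borders" in panel b can be sketched in Python as a lookup of the nearest TAD boundary on each retained side of a breakpoint. This is only an illustration of the idea: the function names, the orientation (sequence left of breakpoint A joined to sequence right of breakpoint B), and all coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right


def boundary_before(boundaries, pos):
    """Nearest boundary at or left of pos in a sorted list, or None."""
    i = bisect_right(boundaries, pos)
    return boundaries[i - 1] if i > 0 else None


def boundary_after(boundaries, pos):
    """Nearest boundary at or right of pos in a sorted list, or None."""
    i = bisect_left(boundaries, pos)
    return boundaries[i] if i < len(boundaries) else None


def expected_neo_tad(bounds_a, bp_a, bounds_b, bp_b):
    """Expected neo-TAD borders for a fusion of A (left of bp_a) to B (right of bp_b).

    Returns (left_border_on_A, right_border_on_B, fused_length_bp).
    The fused length is None if either side has no flanking boundary.
    """
    left = boundary_before(bounds_a, bp_a)
    right = boundary_after(bounds_b, bp_b)
    if left is None or right is None:
        return left, right, None
    return left, right, (bp_a - left) + (right - bp_b)


# Hypothetical sorted TAD boundaries and breakpoints on two chromosomes.
bounds_a = [1_000_000, 1_800_000, 2_600_000]
bounds_b = [500_000, 1_200_000, 2_000_000]
neo_tad = expected_neo_tad(bounds_a, 2_100_000, bounds_b, 900_000)
```

The shuffled-boundary control described in the caption could reuse the same lookup with randomly repositioned boundary lists, repeated many times, but how the authors performed the shuffling is not specified here.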
