Cell. 2013 May 9;153(4):919-29.
doi: 10.1016/j.cell.2013.04.010.

Diverse Mechanisms of Somatic Structural Variations in Human Cancer Genomes

Free PMC article

Lixing Yang et al. Cell.

Erratum in

  • Cell. 2014 Jun 19;157(7):1736


Identification of somatic rearrangements in cancer genomes has accelerated through analysis of high-throughput sequencing data. However, characterization of complex structural alterations and their underlying mechanisms remains inadequate. Here, applying an algorithm to predict structural variations from short reads, we report a comprehensive catalog of somatic structural variations and the mechanisms generating them, using high-coverage whole-genome sequencing data from 140 patients across ten tumor types. We characterize the relative contributions of different types of rearrangements and their mutational mechanisms, find that ~20% of the somatic deletions are complex deletions formed by replication errors, and describe the differences between the mutational mechanisms in somatic and germline alterations. Importantly, we provide detailed reconstructions of the events responsible for loss of CDKN2A/B and gain of EGFR in glioblastoma, revealing that these alterations can result from multiple mechanisms even in a single genome and that both DNA double-strand breaks and replication errors drive somatic rearrangements.


Figure 1. Example of a complex deletion generated by FoSTeS/MMBIR and a pipeline for predicting SV mechanisms
(A) A complex deletion is predicted by three discordant clusters. The sequence in light blue on the reference is deleted; the sequence in red on the reference is duplicated and inserted into the deletion breakpoints. Three read pairs from the donor are shown above the donor sequence. Three discordant read pairs mapped to the reference are shown above the reference sequence. (B) Reads covering the breakpoints of insertion. The breakpoints are covered by 27 and 11 reads, respectively (only four are shown for each). Reads matching different parts of the reference genome are shown in the corresponding colors. (C) Nucleotide sequences of the reads covering the breakpoints of insertion. Black and red indicate read and reference sequences that match each other; grey indicates unmatched reference sequence. There is a 2 bp microhomology (shown in purple) at the breakpoint on the left and a 9 bp insertion of unknown source (shown in dark green) at the breakpoint on the right. (D) Sequencing depth. Blue and red lines denote the predicted deletion and the predicted insertion donor sites, respectively, showing that the copy number is consistent with the SV call. (E) This flowchart, adapted mainly from Kidd et al. (2010), shows the breakpoint features used to determine the mechanism likely to have generated the observed SV. Six types of mechanisms are assigned: transposable element insertion (TEI), variable number of tandem repeats (VNTR), non-homologous end joining (NHEJ), alternative end joining (alt-EJ), non-allelic homologous recombination (NAHR) and fork stalling and template switching/microhomology-mediated break-induced replication (FoSTeS/MMBIR). See also Figure S1 and Tables S1 and S2.
Figure 2. Spectrum of somatic SV types and mechanisms
(A) Frequencies of types of somatic SVs identified in each patient. Each horizontal bar displays the number of SVs for one sample. The colored bar charts on the left show the number of events scaled by the maximum number of events (as noted) in each tumor type. The black bar charts on the right show the number of events for all patients on the same scale. A HapMap genome (NA18507) is shown at the top as an example of germline events; see Figure S2 for germline events for all patients. Most (59%) of the translocations in NA18507 are TE insertions, as described previously (Lee et al., 2012); 18% are repeat-related events, including TE insertions not identified by Lee et al. (2012); and the remaining ones might be events too complex to be identified by Meerkat. (B) Frequencies of somatic deletion mechanisms. The order of the samples is the same as in (A). (C) Frequencies of somatic translocation mechanisms. The order of the samples is the same as in (A). See also Figure S2 and Tables S3, S4 and S5.
Figure 3. Proportion of homologies at the breakpoints of somatic tandem duplications and complex deletions compared with NA18507
Homologies in base pairs are shown for each breakpoint as a positive number. A blunt end has a homology of 0 bp. Small insertions with unknown source are shown as negative numbers. Somatic tandem duplications and complex tandem duplications that are responsible for EGFR and CDK4 amplifications in GBM patients are shown in a separate category. See also Figure S3.
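The signed encoding described in this caption can be sketched in a few lines of Python. This is only an illustration: the function names `signed_homology` and `homology_proportions` are hypothetical, and treating homology and insertion as mutually exclusive at a single breakpoint is an assumption, not something the caption states.

```python
from collections import Counter


def signed_homology(homology_bp: int = 0, insertion_bp: int = 0) -> int:
    """Encode one breakpoint as a signed integer (hypothetical helper).

    positive -> bases of homology at the breakpoint
    zero     -> blunt end
    negative -> bases inserted from an unknown source
    """
    if homology_bp < 0 or insertion_bp < 0:
        raise ValueError("lengths must be non-negative")
    # Assumption: a breakpoint carries homology or an insertion, not both.
    if homology_bp and insertion_bp:
        raise ValueError("expected homology or insertion, not both")
    return homology_bp if homology_bp else -insertion_bp


def homology_proportions(values):
    """Return the fraction of breakpoints observed at each signed value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in sorted(counts.items())}


# The two breakpoints of Figure 1C: a 2 bp microhomology and a 9 bp insertion.
left = signed_homology(homology_bp=2)
right = signed_homology(insertion_bp=9)
print(left, right)
print(homology_proportions([left, right, 0, 0]))
```

Under this encoding, the two breakpoints described in Figure 1C would be plotted at 2 and at -9, and a blunt end at 0.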
Figure 4. CDKN2A/B losses in GBM patients
Profiles in the lower part of the plots show copy ratios (tumor vs. matched normal). Above the copy ratio profiles, predicted somatic SVs are represented by lines with the breakpoints indicated by dots. SVs corresponding to a notable copy number change are colored, with the color indicating the orientation of the breakpoints. A red cluster typically suggests a tandem duplication; a blue cluster typically suggests a deletion. The number of supporting discordant read pairs for each SV is shown on the left using the same color-coding. The copy-loss regions are highlighted with blue shades. (A) GBM0208, an arm level loss and a focal deletion. (B) GBM1086, two focal deletions. (C) GBM0648, complex rearrangements. See also Figure S5.
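As a rough illustration of a tumor-versus-matched-normal copy ratio, the sketch below divides per-bin read depth in the tumor by that in the normal after scaling each sample by its total depth. The caption does not say how the copy ratios were computed, so the binning, the total-depth normalization, and the name `copy_ratios` are all assumptions made for this example.

```python
def copy_ratios(tumor_depth, normal_depth):
    """Per-bin tumor/normal copy ratio (hypothetical sketch).

    Each sample is scaled by its total depth so that differences in overall
    sequencing coverage do not shift the ratios. Bins with zero normal depth
    yield None because the ratio is undefined there.
    """
    if len(tumor_depth) != len(normal_depth):
        raise ValueError("tumor and normal must have the same number of bins")
    tumor_total = sum(tumor_depth)
    normal_total = sum(normal_depth)
    if tumor_total == 0 or normal_total == 0:
        raise ValueError("total depth must be positive in both samples")
    ratios = []
    for tumor, normal in zip(tumor_depth, normal_depth):
        if normal == 0:
            ratios.append(None)
        else:
            ratios.append((tumor / tumor_total) / (normal / normal_total))
    return ratios


# Toy depths for three bins: one with relative gain, one neutral, one with loss.
print(copy_ratios([20, 10, 10], [10, 10, 20]))
```

In a plot like the ones described here, ratios well below 1 would correspond to the copy-loss regions shaded in blue.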
Figure 5. EGFR amplifications in GBM patients
SVs and copy ratios are displayed as described in Figure 4. The copy-loss and gain regions are highlighted with blue and red shades, respectively. (A) GBM0155, three tandem duplications. (B) GBM0145, one tandem duplication and a deletion with insertion at the breakpoints. Two vertical black lines connecting two single events denote a complex deletion, which was predicted by combining two discordant read pair clusters. The solid blue and red lines represent segments that have been deleted and duplicated. The dashed lines denote a region of no copy number change. (C) GBM0214, one tandem duplication and complex rearrangements. See also Figure S6.
Figure 6. Amplifications of EGFR and chromosome 12 in GBM0152
(A) Copy ratio and rearrangements involving EGFR. Colored boxes with arrows denote the amplified regions and their orientations. (B) Diagram of the resulting rearrangements. Three segments of DNA from chromosome 7 and chromosome 12 are merged into one and tandem-duplicated. (C) Copy ratio and somatic rearrangements on chromosome 12. The three grey dashed lines in the copy ratio panel (bottom) denote copy ratios of 40, 75 and 110. The rearrangements marked by “a”, “b”, “c” and “d” have approximately twice as many supporting discordant read pairs as other rearrangements. These rearrangements are also marked in (D), (E) and (F). (D) The 14 Mb region of chromosome 12 shown in (C) was segmented according to copy ratios. Each segment was re-scaled and assigned an identifier from 0 to 40. The rearrangement marked with a black arrow is not involved in the amplifications of other segments on chromosome 12, but is involved in the amplification of EGFR on chromosome 7 as displayed in (A). (E) Each segment in (D) is shown as a numbered node connected by arrows and lines. Black arrows connected by lines denote concordant connections. Ratios of segments are denoted by the number of dots above the segment IDs inside each node. Non-amplified segments are not shown. The connection marked with “e” (also marked in (F)) is a germline deletion. (F) This diagram shows one possible solution for how the segments are connected. Segments with a white background are in an inverted orientation. Colored dashed lines denote discordant connections, while black lines denote concordant connections.
