Nature. 2016 Jun 2;534(7605):47-54.
doi: 10.1038/nature17676. Epub 2016 May 2.

Landscape of Somatic Mutations in 560 Breast Cancer Whole-Genome Sequences

Serena Nik-Zainal (1, 2), Helen Davies (1), Johan Staaf (3), Manasa Ramakrishna (1), Dominik Glodzik (1), Xueqing Zou (1), Inigo Martincorena (1), Ludmil B Alexandrov (1, 4, 5), Sancha Martin (1), David C Wedge (1), Peter Van Loo (1, 6), Young Seok Ju (1), Marcel Smid (7), Arie B Brinkman (8), Sandro Morganella (9), Miriam R Aure (10, 11), Ole Christian Lingjærde (11, 12), Anita Langerød (10, 11), Markus Ringnér (3), Sung-Min Ahn (13), Sandrine Boyault (14), Jane E Brock (15), Annegien Broeks (16), Adam Butler (1), Christine Desmedt (17), Luc Dirix (18), Serge Dronov (1), Aquila Fatima (19), John A Foekens (7), Moritz Gerstung (1), Gerrit K J Hooijer (20), Se Jin Jang (21), David R Jones (1), Hyung-Yong Kim (22), Tari A King (23), Savitri Krishnamurthy (24), Hee Jin Lee (21), Jeong-Yeon Lee (25), Yilong Li (1), Stuart McLaren (1), Andrew Menzies (1), Ville Mustonen (1), Sarah O'Meara (1), Iris Pauporté (26), Xavier Pivot (27), Colin A Purdie (28), Keiran Raine (1), Kamna Ramakrishnan (1), F Germán Rodríguez-González (7), Gilles Romieu (29), Anieta M Sieuwerts (7), Peter T Simpson (30), Rebecca Shepherd (1), Lucy Stebbings (1), Olafur A Stefansson (31), Jon Teague (1), Stefania Tommasi (32), Isabelle Treilleux (33), Gert G Van den Eynden (18, 34), Peter Vermeulen (18, 34), Anne Vincent-Salomon (35), Lucy Yates (1), Carlos Caldas (36), Laura van't Veer (16), Andrew Tutt (37, 38), Stian Knappskog (39, 40), Benita Kiat Tee Tan (41, 42), Jos Jonkers (16), Åke Borg (3), Naoto T Ueno (24), Christos Sotiriou (17), Alain Viari (43, 44), P Andrew Futreal (1, 45), Peter J Campbell (1), Paul N Span (46), Steven Van Laere (18), Sunil R Lakhani (30, 47), Jorunn E Eyfjord (31), Alastair M Thompson (28, 48), Ewan Birney (9), Hendrik G Stunnenberg (8), Marc J van de Vijver (20), John W M Martens (7), Anne-Lise Børresen-Dale (10, 11), Andrea L Richardson (15, 19), Gu Kong (22), Gilles Thomas (44), Michael R Stratton (1)

Serena Nik-Zainal et al. Nature. 2016.

Erratum in

  • Author Correction: Landscape of somatic mutations in 560 breast cancer whole-genome sequences.
    Nik-Zainal S, Davies H, Staaf J, Ramakrishna M, Glodzik D, Zou X, Martincorena I, Alexandrov LB, Martin S, Wedge DC, Van Loo P, Ju YS, Smid M, Brinkman AB, Morganella S, Aure MR, Lingjærde OC, Langerød A, Ringnér M, Ahn SM, Boyault S, Brock JE, Broeks A, Butler A, Desmedt C, Dirix L, Dronov S, Fatima A, Foekens JA, Gerstung M, Hooijer GKJ, Jang SJ, Jones DR, Kim HY, King TA, Krishnamurthy S, Lee HJ, Lee JY, Li Y, McLaren S, Menzies A, Mustonen V, O'Meara S, Pauporté I, Pivot X, Purdie CA, Raine K, Ramakrishnan K, Rodríguez-González FG, Romieu G, Sieuwerts AM, Simpson PT, Shepherd R, Stebbings L, Stefansson OA, Teague J, Tommasi S, Treilleux I, Van den Eynden GG, Vermeulen P, Vincent-Salomon A, Yates L, Caldas C, Van't Veer L, Tutt A, Knappskog S, Tan BKT, Jonkers J, Borg Å, Ueno NT, Sotiriou C, Viari A, Futreal PA, Campbell PJ, Span PN, Van Laere S, Lakhani SR, Eyfjord JE, Thompson AM, Birney E, Stunnenberg HG, van de Vijver MJ, Martens JWM, Børresen-Dale AL, Richardson AL, Kong G, Thomas G, Stratton MR. Nature. 2019 Feb;566(7742):E1. doi: 10.1038/s41586-019-0883-2. PMID: 30659290


We analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations that confer clonal advantage and of the mutational processes that generate somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have distinctive structural features that probably cause elevated mutation rates, and they do not contain driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear to be associated with defective homologous-recombination-based DNA repair: one with deficient BRCA1 function, another with deficient BRCA1 or BRCA2 function, while the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating in breast cancer, and progresses towards a comprehensive account of its somatic genetic basis.
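Signature extraction of the kind summarized above (named as non-negative matrix factorization in the Figure 3 and Figure 4 captions) factors a catalogue matrix V, with one row per mutation category and one column per cancer, into signatures W and per-sample exposures H. The sketch below is a generic, minimal NMF using the Lee-Seung multiplicative updates for squared Euclidean error, written in pure Python. The function names, iteration count, random initialisation and toy catalogue are illustrative assumptions; the published extraction framework is more elaborate (for example, repeated runs and a principled choice of the number of signatures), and none of that is reproduced here.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def nmf(V, k, n_iter=1000, seed=0, eps=1e-12):
    """Approximate non-negative V (m x n) as W (m x k) times H (k x n).

    Lee-Seung multiplicative updates keep every entry non-negative and do
    not increase the squared Frobenius reconstruction error.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    # Rescale so each signature (column of W) sums to 1; the product W H is unchanged.
    totals = [sum(W[i][c] for i in range(m)) for c in range(k)]
    W = [[W[i][c] / totals[c] for c in range(k)] for i in range(m)]
    H = [[H[c][j] * totals[c] for j in range(n)] for c in range(k)]
    return W, H


def relative_error(V, W, H):
    """Frobenius norm of V - WH divided by the Frobenius norm of V."""
    WH = matmul(W, H)
    diff = sum((v - w) ** 2 for rv, rw in zip(V, WH) for v, w in zip(rv, rw))
    return (diff / sum(v * v for row in V for v in row)) ** 0.5


# Hypothetical toy catalogue: 4 categories x 6 samples built from two signatures.
V = [[50, 0, 25, 10, 40, 5],
     [50, 0, 25, 10, 40, 5],
     [0, 50, 25, 40, 10, 45],
     [0, 50, 25, 40, 10, 45]]
W, H = nmf(V, k=2)
print(relative_error(V, W, H))
```

Normalising each column of W to sum to 1 makes a signature read as a probability distribution over mutation categories, while H then carries the number of mutations each sample attributes to each signature.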


Extended Data Figure 1. Landscape of driver mutations
(A) Summary of subtypes in the cohort of 560 breast cancers. (B) Driver mutations by mutation type. (C) Distribution of rearrangements throughout the genome. The black line represents background rearrangement density (calculated from rearrangement breakpoints in intergenic regions only). Red lines represent the frequency of rearrangements within breast cancer genes.
Extended Data Figure 2. Rearrangements in oncogenes
(A) Variation in rearrangement and copy number events affecting ESR1: clear amplification in the topmost panel, transection of ESR1 in the middle panel and focal tandem duplication events in the lower panel. (B) Predicted outcomes of some rearrangements affecting ETV6. Red crosses indicate exons deleted as a result of rearrangements within the ETV6 gene; black dotted lines indicate rearrangement breakpoints resulting in fusions between ETV6 and ERC, WNK1, ATP2B1 or LRP6.
Extended Data Figure 3. Recurrent non-coding events in breast cancers
(A) Manhattan plot showing the sites with the most significant p-values identified by the binning analysis. Sites highlighted in purple were also detected by the method that seeks recurrence after partitioning the genome by genomic features. (B) Locus at chr11:65Mb, identified by independent analyses as being more mutated than expected by chance. In the lowermost panel, a rearrangement hotspot analysis identified this region as a tandem duplication hotspot, with nested tandem duplications at this site. In the topmost panels, an analysis of substitutions and indels after partitioning the genome into different regulatory elements identified the lncRNAs MALAT1 and NEAT1 with significant p-values.
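The caption does not state which statistical model underlies the binning analysis behind the Manhattan plot. Purely as an illustration, the sketch below assumes the simplest possible model: mutations are counted in fixed-width bins, each bin is compared with a Poisson expectation from a uniform background rate, and the upper-tail p-value is reported as -log10(p), the quantity a Manhattan plot displays. The bin width, the uniform background and the function names are hypothetical; a real analysis would need to account for local variation in mutation rate.

```python
import math
from collections import Counter


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # Upper tail is large: return 1 - P(X <= k - 1), summing the lower
        # tail downward from its largest term, P(X = k - 1), to avoid underflow.
        i = k - 1
        term = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        cdf = 0.0
        while i >= 0 and term > 0.0:
            cdf += term
            term *= i / lam  # P(X = i - 1) = P(X = i) * i / lam
            i -= 1
        return min(1.0, max(0.0, 1.0 - cdf))
    # Upper tail is small: sum P(X = k) + P(X = k + 1) + ... directly.
    term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i  # P(X = i) = P(X = i - 1) * lam / i
    return total


def binned_scores(positions, region_length, bin_size):
    """One -log10 p-value per bin under a uniform Poisson background.

    Assumes region_length is a multiple of bin_size so all bins are equal.
    """
    n_bins = region_length // bin_size
    counts = Counter(pos // bin_size for pos in positions)
    lam = len(positions) / n_bins  # expected mutations per bin
    # Clamp p at 1e-300 so that -log10 stays finite.
    return [-math.log10(max(poisson_upper_tail(counts[b], lam), 1e-300))
            for b in range(n_bins)]


# Hypothetical data: an even background over 1 Mb plus 30 extra mutations in one bin.
positions = list(range(5_000, 1_000_000, 10_000)) + [350_000 + i for i in range(30)]
scores = binned_scores(positions, 1_000_000, 100_000)
```

In this toy example the planted cluster makes bin 3 stand out sharply, while bins carrying only background mutations score below 1 on the -log10 scale.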
Extended Data Figure 4. Copy number analyses
(A) Frequency of copy number aberrations across the cohort. Chromosome position on the x-axis; frequency of copy number gains (red) and losses (green) on the y-axis. (B) Identification of focal recurrent copy number gains by the GISTIC method (Supplementary Methods). (C) Identification of focal recurrent copy number losses by the GISTIC method. (D) Heatmap of GISTIC regions following unsupervised hierarchical clustering. Five cluster groups are shown, together with their relationships to expression subtype (basal=red, luminal B=light blue, luminal A=dark blue), immunohistopathology status (ER, PR, HER2 status; black=positive), abrogation of BRCA1 (red) and BRCA2 (blue) (whether germline, somatic or through promoter hypermethylation), driver mutations (black=positive) and HRD index (top 25% or lowest 25%; black=positive).
Extended Data Figure 5. miRNA analyses
Hierarchical clustering of the most variable miRNAs using complete linkage and Euclidean distance. miRNA clusters were assigned using the Partitioning Algorithm using Recursive Thresholding (PART) method. Five main patient clusters were revealed. The horizontal annotation bars show (from top to bottom): PART cluster group, PAM50 mRNA expression subtype, GISTIC cluster, rearrangement cluster, lymphocyte infiltration score and histological grade. The heatmap shows clustered and centered miRNA expression data (log2 transformed). Details of the colour coding of the annotation bars are presented below the heatmap.
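Complete linkage with Euclidean distance, as named in this caption, defines the distance between two clusters as the largest distance between any pair of their members, and repeatedly merges the closest pair of clusters. The naive sketch below only makes that merge rule explicit; in practice an existing implementation such as `scipy.cluster.hierarchy.linkage(..., method="complete", metric="euclidean")` would be the usual choice. It does not implement PART, which the caption uses to assign the miRNA clusters, and the function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Naive agglomerative clustering; returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        # Complete linkage: the farthest pair of members sets the distance.
        return max(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a < b because of combinations()).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical 2-feature expression profiles for five patients.
profiles = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 20)]
groups = complete_linkage(profiles, n_clusters=2)
```

Because the farthest members set the distance, complete linkage tends to produce compact clusters of similar diameter, whereas single linkage (the nearest members) can chain loosely related samples together.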
Extended Data Figure 6. Rearrangement cluster groups and associated features
(A) Overall survival by rearrangement cluster group. (B) Age at diagnosis. (C) Tumor grade. (D) Menopausal status. (E) ER status. (F) Immune response metagene panel. (G) Lymphocytic infiltration score.
Extended Data Figure 7. Contrasting tandem duplication phenotypes
Contrasting tandem duplication phenotypes of two breast cancers, shown for chromosome X. Copy number (y-axis) is depicted as black dots. Lines represent rearrangement breakpoints (green=tandem duplications, pink=deletions, blue=inversions, black=translocations with partner breakpoint provided). The top panel, PD4841a, is dominated by large tandem duplications (>100 kb, RS1), whereas PD4833a has many short tandem duplications (<10 kb, RS3) that appear as “single” lines in its plot.
Extended Data Figure 8. Hotspots of tandem duplications
A tandem duplication hotspot occurring in six different patients.
Extended Data Figure 9. Rearrangement breakpoint junctions
(A) Breakpoint features of rearrangements in 560 breast cancers by rearrangement signature. (B) Breakpoint features in BRCA and non-BRCA cancers.
Extended Data Figure 10. Signatures of focal hypermutation
(A) Kataegis and alternative kataegis occurring at the same locus (ERBB2 amplicon in PD13164a). Copy number (y-axis) is depicted as black dots. Lines represent rearrangement breakpoints (green=tandem duplications, pink=deletions, blue=inversions). The topmost panel shows a ~10Mb region including the ERBB2 locus. The second panel from the top zooms in 10-fold to a ~1Mb window, highlighting the co-occurrence of rearrangement breakpoints, copy number changes and three different kataegis loci. The third panel from the top shows the kataegis loci in more detail, with log10 intermutation distance on the y-axis; the black arrow highlights kataegis and the blue arrows highlight alternative kataegis. (B) Sequence context of kataegis and alternative kataegis identified in this dataset.
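The log10 intermutation distance plotted in panel A is the distance from each mutation to the previous one along the chromosome, so kataegis foci appear as runs of points near the bottom of the plot. The sketch below computes those distances and flags candidate foci. It assumes the commonly used definition of kataegis from earlier breast cancer genome work (six or more consecutive mutations with a mean intermutation distance of at most 1 kb); the thresholds are parameters, the greedy extension rule and the function names are illustrative choices, and separating alternative kataegis by sequence context (panel B) is not attempted.

```python
import math


def rainfall_points(positions):
    """(position, log10 intermutation distance) pairs for one chromosome."""
    pos = sorted(positions)
    # Clamp zero distances (duplicate positions) to 1 bp so log10 is defined.
    return [(b, math.log10(max(b - a, 1))) for a, b in zip(pos, pos[1:])]


def find_kataegis(positions, min_mutations=6, max_mean_distance=1000.0):
    """Return (start, end, n_mutations) for candidate foci on one chromosome.

    Each run starting at pos[i] is extended to the furthest mutation pos[j]
    that keeps the mean intermutation distance within the threshold.
    """
    pos = sorted(positions)
    foci = []
    i = 0
    while i < len(pos):
        best = i
        for j in range(i + 1, len(pos)):
            # Mean intermutation distance of pos[i..j] = span / number of gaps.
            if (pos[j] - pos[i]) / (j - i) <= max_mean_distance:
                best = j
        if best - i + 1 >= min_mutations:
            foci.append((pos[i], pos[best], best - i + 1))
            i = best + 1  # resume scanning after this focus
        else:
            i += 1
    return foci


# Hypothetical data: sparse background mutations plus one dense cluster of 8.
background = list(range(0, 1_000_000, 50_000))
focus = [325_000 + 50 * i for i in range(8)]
foci = find_kataegis(background + focus)
```

Here the 50 kb background spacing never satisfies the 1 kb mean-distance threshold, so only the planted cluster of eight mutations is reported.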
Figure 1. Cohort and catalogue of somatic mutations in 560 breast cancers.
(A) Catalogue of base substitutions, insertions/deletions, rearrangements and driver mutations in 560 breast cancers (sorted by total substitution burden). The indel axis is truncated at 5,000 (*). (B) Complete list of curated driver genes, sorted by frequency (descending). The fraction of ER-positive (left, total 366) and ER-negative (right, total 194) samples carrying a mutation in each driver gene is shown in grey. The log10 p-value for enrichment of each driver gene in the ER-positive or ER-negative cohort is shown in black. Genes highlighted in green are those for which there is new or further evidence supporting them as novel breast cancer genes.
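The caption does not name the test behind the ER-positive versus ER-negative enrichment p-values. As an illustration only, the sketch assumes a two-sided Fisher's exact test on a 2x2 table (mutated or not, by ER status), with the cohort sizes from the caption (366 ER-positive, 194 ER-negative) as defaults. The function names are hypothetical and the per-gene mutation counts in the usage line are made up.

```python
from math import comb, log10


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    r1, r2 = a + b, c + d   # row totals, e.g. ER-positive and ER-negative samples
    k = a + c               # column total, e.g. samples carrying the mutation
    denom = comb(r1 + r2, k)

    def prob(x):
        return comb(r1, x) * comb(r2, k - x) / denom

    p_obs = prob(a)
    support = range(max(0, k - r2), min(k, r1) + 1)
    # The small relative tolerance treats floating-point near-ties as ties.
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def er_enrichment(mut_pos, mut_neg, n_pos=366, n_neg=194):
    """Return (direction, -log10 p) for a gene mutated in mut_pos ER+ and mut_neg ER- samples."""
    p = fisher_two_sided(mut_pos, n_pos - mut_pos, mut_neg, n_neg - mut_neg)
    direction = "ER+" if mut_pos / n_pos >= mut_neg / n_neg else "ER-"
    return direction, -log10(max(p, 1e-300))


# Hypothetical gene mutated in 10 ER-positive and 40 ER-negative cancers.
direction, score = er_enrichment(10, 40)
```

An exact test is a reasonable default for rarely mutated genes, where expected cell counts are too small for a chi-squared approximation.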
Figure 2. Non-coding analyses of breast cancer genomes
(A) Left: distributions of substitutions (purple dots) and indels (blue dots) within the footprints of five regulatory regions identified as more significantly mutated than expected. Right: the proportions of base substitution signatures in the samples carrying mutations in each of these non-coding regions. (B) Mutability of TGAACA/TGTTCA motifs within inverted repeats of varying flanking palindromic sequence length, compared with motifs not within an inverted repeat. (C) Variation in mutability between loci of TGAACA/TGTTCA inverted repeats with 9bp palindromes.
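Panels B and C group TGAACA/TGTTCA motifs by the length of the palindromic sequence flanking them. As a hypothetical illustration of such a grouping, the sketch assumes the motif sits between the two arms of an inverted repeat and measures the stem length: the number of bases n for which the n bases 5' of the motif are the reverse complement of the n bases 3' of it. That geometry, the maximum stem length and all function names are assumptions rather than the authors' definition. TGTTCA is the reverse complement of TGAACA, so searching for both covers both strands.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


MOTIFS = ("TGAACA", revcomp("TGAACA"))  # ("TGAACA", "TGTTCA")


def stem_length(seq, start, end, max_len=30):
    """Largest n (up to max_len) with seq[start-n:start] == revcomp(seq[end:end+n])."""
    n = 0
    # Pair the base n+1 positions 5' of the motif with the base n positions 3' of it.
    while (n < max_len and start - n - 1 >= 0 and end + n < len(seq)
           and seq[start - n - 1] == revcomp(seq[end + n])):
        n += 1
    return n


def motif_sites(seq):
    """Sorted (start, motif, stem_length) for every motif occurrence in seq."""
    seq = seq.upper()
    sites = []
    for motif in MOTIFS:
        i = seq.find(motif)
        while i != -1:
            sites.append((i, motif, stem_length(seq, i, i + len(motif))))
            i = seq.find(motif, i + 1)
    return sorted(sites)


def mutability_by_stem(sites, mutated_starts):
    """Fraction of motif sites mutated, grouped by stem length.

    mutated_starts is a hypothetical set of motif start positions seen mutated.
    """
    totals, hits = Counter(), Counter()
    for start, _motif, stem in sites:
        totals[stem] += 1
        hits[stem] += int(start in mutated_starts)
    return {stem: hits[stem] / totals[stem] for stem in totals}


# Hypothetical sequence: a 5 bp stem arm, the motif, then the reverse-complement arm.
seq = "AAAA" + "GCGTA" + "TGAACA" + revcomp("GCGTA") + "CCCC"
sites = motif_sites(seq)
```

Comparing these per-stem-length fractions with the fraction for motifs whose stem length is zero mirrors, in spirit, the comparison drawn in panel B.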
Figure 3. Extraction and contributions of base substitution signatures in 560 breast cancers
(A) Twelve mutation signatures extracted using non-negative matrix factorization. Each signature is ordered by mutation class (C>A/G>T, C>G/G>C, C>T/G>A, T>A/A>T, T>C/A>G, T>G/A>C), taking the immediate flanking sequence into account. Within each class, mutations are ordered first by the 5’ base (A,C,G,T) and then by the 3’ base (A,C,G,T). (B) The spectrum of base substitution signatures within 560 breast cancers. Mutation signatures are ordered (and coloured) according to broad biological groups: Signatures 1 and 5 are correlated with age at diagnosis, Signatures 2 and 13 are putatively APOBEC-related, Signatures 6, 20 and 26 are associated with MMR deficiency, Signatures 3 and 8 are associated with HR deficiency, and Signatures 18, 17 and 30 have unknown etiology. For ease of reading, this arrangement is adopted for the rest of the manuscript. Samples are ordered according to hierarchical clustering performed on mutation signatures. The top panel shows the absolute number of mutations of each signature in each sample; the lower panel shows the proportion of each signature in each sample. (C) Distribution of mutation counts for each signature in the relevant breast cancer samples. The percentage of samples carrying each signature is given above each signature.
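The 96-category ordering described in panel A can be written down directly. The sketch below builds the category labels in the caption's order (six classes, then the 5' base, then the 3' base) and maps a substitution to its category by reporting purine reference bases on the opposite strand, which is why each class is written as a pair such as C>T/G>A. The function names and the `A[C>T]G` label format are illustrative choices, not taken from the paper.

```python
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
BASES = "ACGT"

# Caption order: mutation class, then 5' base (A,C,G,T), then 3' base (A,C,G,T).
CHANNELS = [f"{five}[{cls}]{three}" for cls in CLASSES for five in BASES for three in BASES]
INDEX = {label: i for i, label in enumerate(CHANNELS)}


def channel(context, alt):
    """Label for a substitution; context is the reference trinucleotide (5', ref, 3')."""
    five, ref, three = context.upper()
    alt = alt.upper()
    if ref in ("A", "G"):
        # Reverse-complement purine references so the reference base is C or T.
        five, ref, three = COMP[three], COMP[ref], COMP[five]
        alt = COMP[alt]
    label = f"{five}[{ref}>{alt}]{three}"
    if label not in INDEX:
        raise ValueError(f"not a base substitution: {label}")
    return label


def spectrum(mutations):
    """96-element count vector for (context, alt) pairs, in CHANNELS order."""
    counts = [0] * len(CHANNELS)
    for context, alt in mutations:
        counts[INDEX[channel(context, alt)]] += 1
    return counts


# ACG>T and its reverse complement CGT>A fall into the same category.
example = spectrum([("ACG", "T"), ("CGT", "A"), ("TTT", "G")])
```

Collapsing strands this way is what reduces the 192 possible strand-specific substitution contexts to 96 categories.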
Figure 4. Additional characteristics of base substitution signatures and novel rearrangement signatures in 560 breast cancers
(A) Contrasting transcriptional strand asymmetry and replication strand asymmetry across the twelve base substitution signatures. (B) Six rearrangement signatures extracted using non-negative matrix factorization. Probability of each rearrangement element on the y-axis; rearrangement size on the x-axis. del = deletion, tds = tandem duplication, inv = inversion, trans = translocation.
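The caption names the rearrangement classes (del, tds, inv, trans) and plots size, but does not list the exact feature set. The sketch assumes the scheme commonly described for these rearrangement signatures: each rearrangement is labelled clustered or non-clustered; deletions, tandem duplications and inversions are split into five size bins (1-10 kb, 10-100 kb, 100 kb-1 Mb, 1-10 Mb, >10 Mb); translocations carry no size; giving 32 categories. Treat the bin edges, labels and function names as assumptions to check against the Supplementary Methods. Deciding whether a rearrangement is clustered is a separate step, taken here as an input.

```python
from collections import Counter

SIZE_BINS = [(10_000, "1-10Kb"), (100_000, "10-100Kb"),
             (1_000_000, "100Kb-1Mb"), (10_000_000, "1Mb-10Mb")]
SIZE_LABELS = [label for _, label in SIZE_BINS] + [">10Mb"]
SIZED_TYPES = ("del", "tds", "inv")

# 2 clustering states x (3 sized types x 5 size bins + translocations) = 32 categories.
CHANNELS = []
for prefix in ("clustered", "non-clustered"):
    CHANNELS += [f"{prefix}_{t}_{s}" for t in SIZED_TYPES for s in SIZE_LABELS]
    CHANNELS.append(f"{prefix}_trans")


def size_label(size_bp):
    """Size bin label; in this sketch anything below 10 kb falls into the first bin."""
    for upper, label in SIZE_BINS:
        if size_bp < upper:
            return label
    return ">10Mb"


def rearrangement_channel(rtype, size_bp, clustered):
    prefix = "clustered" if clustered else "non-clustered"
    if rtype == "trans":
        return f"{prefix}_trans"  # translocations have no size
    if rtype not in SIZED_TYPES:
        raise ValueError(f"unknown rearrangement type: {rtype!r}")
    return f"{prefix}_{rtype}_{size_label(size_bp)}"


def catalogue(rearrangements):
    """Count vector over CHANNELS for (rtype, size_bp, clustered) tuples."""
    counts = Counter(rearrangement_channel(*r) for r in rearrangements)
    return [counts[c] for c in CHANNELS]


# Hypothetical sample: two short tandem duplications and one large clustered deletion.
sample = catalogue([("tds", 5_000, False), ("tds", 8_000, False), ("del", 20_000_000, True)])
```

Under this scheme, short tandem duplications below 10 kb and large ones above 100 kb land in different categories, which is the kind of size distinction the tandem duplication phenotypes in Extended Data Figure 7 turn on; one such vector per cancer forms the matrix a signature extraction would factor.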
Figure 5. Integrative analysis of rearrangement signatures
Heatmap of rearrangement signatures (RS) following unsupervised hierarchical clustering based on the proportion of each RS in each cancer. Seven cluster groups (A-G) are shown, together with their relationships to expression (AIMS) subtype (basal=red, luminal B=light blue, luminal A=dark blue), immunohistopathology status (ER, PR, HER2 status; black=positive), abrogation of BRCA1 (purple) and BRCA2 (orange) (whether germline, somatic or through promoter hypermethylation), presence of 3 or more foci of kataegis (black=positive), HRD index (top 25% or lowest 25%; black=positive), GISTIC cluster group (black=positive) and driver mutations in cancer genes. miRNA cluster groups: 0=red, 1=purple, 2=blue, 3=light blue, 4=green, 5=orange. The contribution of base substitution signatures in these 7 cluster groups is shown in the lowermost panel.


