Cell Rep. 2018 Apr 3;23(1):239-254.e6.
doi: 10.1016/j.celrep.2018.03.076.

Genomic and Molecular Landscape of DNA Damage Repair Deficiency Across The Cancer Genome Atlas

Free PMC article

Theo A Knijnenburg et al. Cell Rep. 2018.


DNA damage repair (DDR) pathways modulate cancer risk, progression, and therapeutic response. We systematically analyzed somatic alterations to provide a comprehensive view of DDR deficiency across 33 cancer types. Mutations with accompanying loss of heterozygosity were observed in over 1/3 of DDR genes, including TP53 and BRCA1/2. Other prevalent alterations included epigenetic silencing of the direct repair genes EXO5, MGMT, and ALKBH3 in ∼20% of samples. Homologous recombination deficiency (HRD) was present at varying frequency in many cancer types, most notably ovarian cancer. However, in contrast to ovarian cancer, HRD was associated with worse outcomes in several other cancers. Protein structure-based analyses allowed us to predict functional consequences of rare, recurrent DDR mutations. A new machine-learning-based classifier developed from gene expression data allowed us to identify alterations that phenocopy deleterious TP53 mutations. These frequent DDR gene alterations in many human cancers have functional consequences that may determine cancer progression and guide therapy.

Keywords: DNA damage footprints; DNA damage repair; The Cancer Genome Atlas PanCanAtlas project; epigenetic silencing; integrative statistical analysis; mutational signatures; protein structure analysis; somatic copy-number alterations; somatic mutations.


Figure 1. Cancer Types Display Variable DNA Damage Repair Gene Somatic Alterations
(A) DDR gene alterations are frequent and non-uniformly distributed, by type and frequency, across cancer types. The clustered heatmap shows the percentage (%) of samples in each cancer type (rows; cancer types listed at right, with sample numbers in parentheses) altered in at least one core gene of a given DDR pathway (columns; the number of core genes per pathway is given in parentheses at the bottom). Color intensity indicates the percentage altered, which is also printed in each cell. RGB color indicates mutations (red), deep deletions (blue), or epigenetic silencing through methylation (green); gray scale indicates equal contributions from all three alteration types. A “u” symbol in a cell indicates statistically significant enrichment of alterations (FDR [false discovery rate] < 10% and a difference in alteration percentages > 2%). A “co” or “me” symbol indicates statistically significant (FDR < 10%) co-occurrence or mutual exclusivity, respectively, among samples altered by mutation, deep deletion, or silencing; only “co” relations were observed. The two rightmost columns, Mut.load and SCNA load, give the average mutation frequency (non-silent mutations/Mb) and copy-number burden (number of copy-number segments) for each cancer type. (B) Mutations and deep deletions account for a disproportionate share of HR gene alterations across nearly all TCGA cancer types. Color and color intensity summarize the relative contribution of each alteration type to HR pathway variation, and the vertical position of each cancer type symbol indicates the percentage of altered samples. M+D, mutation and deletion; D+S, deletion and silencing. (C) Multiple genes contribute to the enrichment of DDR pathway alterations. For each core DDR pathway (columns), the heatmap shows statistically enriched alteration frequencies for genes with >2% alterations; color intensity indicates the percentage altered, which is also given in each cell. Specific cancer examples of gene and pathway associations are listed under each column. (D) The 50 most frequently mutated of the 276 DDR genes. Genes are ordered by frequency of non-synonymous mutations (left y axis, blue rectangles) and shown with the fraction of concurrent mutation and LOH (loss of heterozygosity) events (right y axis, red bars). See also Figures S1 and S2.
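The pathway-level summary in (A) reduces to two simple computations: the fraction of samples with at least one altered core gene, and a color that mixes the three alteration types. The Python sketch below illustrates both under stated assumptions: the input layout (`Calls`), the function names, and the mixing rule (each RGB channel set to that alteration type's share of core-gene events, so equal shares give gray) are hypothetical and are not the rendering code used for the figure.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: sample ID -> {gene symbol: alteration type}, where the
# alteration type is "mut" (mutation), "del" (deep deletion) or "sil"
# (epigenetic silencing). Unaltered genes are simply absent.
Calls = Dict[str, Dict[str, str]]


def pathway_altered_fraction(calls: Calls, core_genes: Set[str]) -> float:
    """Fraction of samples with at least one altered core gene in the pathway."""
    if not calls:
        return 0.0
    altered = sum(1 for genes in calls.values() if not core_genes.isdisjoint(genes))
    return altered / len(calls)


def alteration_rgb(calls: Calls, core_genes: Set[str]) -> Tuple[float, float, float]:
    """Mix alteration types into one (R, G, B) color: red = mutation,
    green = silencing, blue = deep deletion. Each channel is that type's share
    of all core-gene events, so equal shares give a neutral gray."""
    counts = {"mut": 0, "sil": 0, "del": 0}
    for genes in calls.values():
        for gene, kind in genes.items():
            if gene in core_genes and kind in counts:
                counts[kind] += 1
    total = sum(counts.values())
    if total == 0:
        return (1.0, 1.0, 1.0)  # no events: draw the cell white
    return (counts["mut"] / total, counts["sil"] / total, counts["del"] / total)


# Toy example for an HR-like gene set (the calls are made up).
hr_core = {"BRCA1", "BRCA2", "RAD51C"}
toy = {
    "s1": {"BRCA1": "mut"},
    "s2": {"RAD51C": "sil"},
    "s3": {"TP53": "mut"},  # altered, but not in the core set
    "s4": {},
}
print(pathway_altered_fraction(toy, hr_core))  # 0.5
print(alteration_rgb(toy, hr_core))            # (0.5, 0.5, 0.0)
```

In the figure, color intensity additionally encodes the percentage altered; how the two encodings are combined is not specified in the caption, so the sketch returns only the hue.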
Figure 2. Epigenetic Silencing of DDR Genes and Pathways in Cancer
(A) Gene/probe pairs showing evidence of silencing. For each gene/probe pair, samples were grouped by probe methylation level and gene expression was Z score-transformed; the mean Z score among samples in the methylated group is plotted on the x axis, and the negative log10-transformed, false discovery rate (FDR)-corrected p value on the y axis. Green dashed lines indicate the cutoffs for mean Z score and FDR. Genes meeting both cutoffs, and thus showing evidence of silencing, have red labels, with the specific probes listed in parentheses (see STAR Methods for additional details). (B) Gene expression and methylation are inversely correlated for silenced genes. Scatterplots show silenced gene/probe pairs for MGMT (two probes), EXO5, RAD51C, MLH1, and FANCF, with methylation on the x axis, gene expression on the y axis, and silenced samples shown as red dots. (C) Silenced genes are variably distributed across cancer types. Left: oncoprint showing the overall frequency of deleterious mutations, deletions, and epigenetic silencing events for each significantly silenced DDR gene (rows; gene names listed at right) across 8,739 PanCanAtlas samples, with cancer type shown in the color key at right. Frequencies were calculated over the entire cohort, but only altered samples are plotted. The scale at top right indicates the number of events by molecular type and their distribution across cancer types. (D) Frequency of epigenetic silencing varies across 33 cancer types and DDR pathways. The heatmap shows cancer types (rows; same color code as in C) and 12 significantly silenced DDR genes (columns). Bar plots (right) summarize the frequency of silencing events by pathway: DR (ALKBH3 and MGMT), HR (BRCA1, RAD51C, and NSMCE3), and MMR (MLH1, MLH3, and PMS2). Numbers below each bar graph (x axis) give the proportion of samples in each cancer type with at least one epigenetically silenced gene annotated to that pathway. (E) EXO5 silencing varies by cancer subtype. Scatterplots as in (B) display the same silenced samples, now colored by cancer subtype (color key, bottom left); gray dots represent samples in which the gene was expressed (not silenced). See also Figures S2 and S3.
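Panel (A) ranks gene/probe pairs by the mean expression Z score in the methylated group and by an FDR-corrected p value. A minimal sketch of those two quantities follows, assuming that Z scores are computed against the unmethylated group and that FDR control uses the Benjamini-Hochberg procedure; the STAR Methods define the actual methylation grouping, statistical test, and cutoffs, none of which are reproduced here.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def silencing_z(expr_methylated: Sequence[float], expr_unmethylated: Sequence[float]) -> float:
    """Mean Z score of expression in the methylated group, with each value
    standardized against the unmethylated group's mean and SD (assumed scheme)."""
    mu = mean(expr_unmethylated)
    sd = stdev(expr_unmethylated)  # needs at least two unmethylated samples
    return mean([(x - mu) / sd for x in expr_methylated])


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy gene/probe pair: methylated samples express far less than unmethylated ones.
z = silencing_z([1.0, 2.0], [4.0, 6.0])
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
print(round(z, 3))                             # -2.475
print([round(-math.log10(v), 2) for v in q])   # y-axis values for a volcano-style plot
```

A strongly negative mean Z together with a small FDR is the pattern that would place a pair in the labeled region of (A).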
Figure 3. Somatic Copy-Number Alteration Scores in Relation to Clinical Outcomes and DDR Gene Alterations
(A) Matrix heatmap of the mean Spearman correlation between SCNA scores and mutation load across 33 cancer types. (B) Forest plot of the association between homologous recombination deficiency (HRD) score and progression-free interval (PFI) for the 28 cancer types with valid outcome data. Cancer type symbols (left) are followed by the number of samples (N) included in the model and the number of PFI events. Hazard ratios (HR) and their 95% confidence intervals are shown to the right. The “P Value” column gives the Cox proportional hazards model p value for the difference in PFI between samples with high versus low HRD scores. *, statistically significant after false discovery rate correction (threshold 10%). (C) Representative Kaplan-Meier (KM) curves for PFI in four cancer types as a function of high versus low HRD score. Samples in GBM, ESCA, PRAD, and ACC were classified as HRD-high if their HRD score was above the median for their cancer type. Log-rank test p values are shown in the top right-hand corner of each plot. (D) Volcano plot of the significance and magnitude of DDR gene ridge regression coefficients. The ridge regression model regressed HRD score on the alteration status of 276 DDR genes across 8,464 cancer samples. Homologous recombination repair (HR) genes passing a significance threshold of FDR < 0.2 are plotted and labeled in red. (E) HRD scores stratified by BRCA1 and RAD51B alteration status in the two cancer types with the largest number of BRCA1 or RAD51B alterations. Mann-Whitney U test p values are displayed above the bracket for each cancer-type-specific comparison. See also Figures S4 and S7.
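Panel (C) dichotomizes samples at the within-cancer-type median HRD score and compares PFI between the two groups with Kaplan-Meier curves and a log-rank test. The dependency-free Python sketch below implements those textbook steps on made-up data; it does not reproduce the Cox proportional hazards models in (B) or the ridge regression in (D), and function names such as `hrd_high` are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def hrd_high(scores: Sequence[float]) -> List[bool]:
    """Label samples HRD-high if their score is above their cancer type's median."""
    cut = median(scores)
    return [s > cut for s in scores]


def _event_times(times: Sequence[float], events: Sequence[bool]) -> List[float]:
    return sorted({t for t, e in zip(times, events) if e})


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """(time, survival probability) after each distinct event time."""
    surv, steps = 1.0, []
    for t in _event_times(times, events):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    return steps


def logrank_p(times: Sequence[float], events: Sequence[bool], group: Sequence[bool]) -> float:
    """Two-group log-rank test; p value from a chi-square with 1 df."""
    o_minus_e, var = 0.0, 0.0
    for t in _event_times(times, events):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, group) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, group) if u == t and e and g)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))  # P(chi2_1 > x) = erfc(sqrt(x / 2))


# Toy cohort for one cancer type (all values made up).
scores = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
pfi = [14, 13, 12, 11, 10, 5, 4, 3, 2, 1]  # time to progression
progressed = [True] * 10
high = hrd_high(scores)
p = logrank_p(pfi, progressed, high)
print(kaplan_meier([t for t, h in zip(pfi, high) if h], [True] * 5))
print(p)  # small: HRD-high samples progress earlier in this toy data
```

In practice a maintained survival package (for example, lifelines in Python or survival in R) is a safer choice than hand-rolled estimators; this version only makes the computation explicit.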
Figure 4. Rare, Recurrent DDR Gene Mutations Differentially Alter Protein Structural Stability
Key DDR genes (column labels in C) were selected for protein modeling based on the frequency of rare, recurrent mutations and the availability of experimentally determined structures covering a majority of amino acid residues. (A) POLE mutations cluster in protein functional domains. Spheres represent mutations, colored by mutation frequency (boxed key, top right), overlaid on a POLE structural model. 3D mutation hotspots are present in both the exonuclease (blue) and polymerase (green) domains. (B) Most MGMT somatic mutations likely alter protein structure. Mutation locations are shown as spheres on a structural model, colored by predicted effect. Mutations that change the protein folding free energy by ΔΔGfold ≥ 3 kBT are predicted to be strongly destabilizing, whereas those with |ΔΔGfold| < 1 kBT were considered not significant (NS). (C) Many DDR gene somatic mutations are predicted to destabilize protein structure. Structure-based calculations of ΔΔGfold for 1,380 mutations are plotted, with the number of unique mutations/proteins given in parentheses below each protein name. (D) Altered protein stability is associated with a greater burden of genomic alterations. The plot uses standardized Z scores across cancer types (see STAR Methods) to compare samples harboring strongly destabilizing versus non-destabilizing mutations. Association strength depended on the DDR gene: for example, destabilizing mutations in POLB were associated with lower SCNA and higher mutation burdens, whereas mutations in PARP1 were associated with higher SCNA and lower mutation burdens. (E) Altered stability in four proteins was associated with a large shift (Z > 0.5) in SCNA burden. Split violin plots show the distributions of SCNA burden among samples with a destabilizing versus non-destabilizing mutation in each gene. (F) Altered stability in five proteins was associated with a large shift (Z > 2.0) in mutation burden. See also Figure S5.
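Panels (B), (D), (E), and (F) combine a threshold rule on ΔΔGfold with burden Z scores. A minimal sketch follows, assuming that burden is standardized within each cancer type before comparing carriers of strongly destabilizing and non-destabilizing mutations; the exact standardization is described in the STAR Methods, and the `intermediate` label for values between the two cutoffs is hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def stability_class(ddg_kbt: float) -> str:
    """Classify a folding free-energy change (kBT units) with the caption's cutoffs."""
    if ddg_kbt >= 3.0:
        return "strongly destabilizing"
    if abs(ddg_kbt) < 1.0:
        return "NS"
    return "intermediate"  # hypothetical label; the caption leaves this range unnamed


def zscore_within_type(burden: Sequence[float], cancer_type: Sequence[str]) -> List[float]:
    """Standardize burden within each cancer type (assumed standardization scheme)."""
    groups: Dict[str, List[float]] = {}
    for b, c in zip(burden, cancer_type):
        groups.setdefault(c, []).append(b)
    stats = {c: (mean(v), pstdev(v)) for c, v in groups.items()}
    return [(b - stats[c][0]) / stats[c][1] if stats[c][1] > 0 else 0.0
            for b, c in zip(burden, cancer_type)]


def burden_shift(z: Sequence[float], destabilizing: Sequence[bool]) -> float:
    """Mean Z of destabilizing-mutation carriers minus that of non-destabilizing carriers."""
    carriers = [x for x, d in zip(z, destabilizing) if d]
    others = [x for x, d in zip(z, destabilizing) if not d]
    return mean(carriers) - mean(others)


# Toy data: two cancer types with different baseline SCNA burden (made-up numbers).
ddg = [4.2, 0.3, 3.1, -0.5]
scna = [1.0, 3.0, 10.0, 20.0]
ctype = ["A", "A", "B", "B"]
z = zscore_within_type(scna, ctype)
flags = [stability_class(v) == "strongly destabilizing" for v in ddg]
print(z)                       # [-1.0, 1.0, -1.0, 1.0]
print(burden_shift(z, flags))  # -2.0
```

Standardizing within cancer type keeps a high-burden tumor type such as "B" above from dominating the comparison purely through its baseline.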
Figure 5. Machine Learning to Predict TP53 Inactivating Mutations in Cancer
(A) Robust classifier performance, assessed by receiver operating characteristic (ROC) curves and the area under the ROC curve (AUROC) for the training data, cross-validation, and a held-out test set (10% of samples) across 19 cancer types. (B) Model-derived gene weights. Classifier weights indicate each gene's influence on the classification; negative weights indicate higher gene expression in TP53 wild-type samples. (C) SCNA burden is correlated with known or predicted TP53 status. Plots show SCNA/CNV burden, as fraction altered, by known or predicted TP53 status. The SCNA profile for the TP53 mutation c.375G>T in exon 4 appears similar to that of other TP53 loss events. (D) SCNAs in the TP53-interacting genes MDM2 and CDKN2A phenocopy TP53 loss; results are shown for PanCanAtlas TP53 wild-type samples. (E) TP53 network gene alterations phenocopy TP53 deficiency. Mutations were manually curated and selected a priori. Tests restricted to TP53 wild-type, non-hypermutated cancers are indicated by orange edges. Node color indicates event class (red, mutation; blue, copy-number loss; purple, copy-number amplification); edge values indicate Cohen’s d effect size. Thin blue edges indicate predicted interactions from the STRING database. NS, not significant (p > 0.005). See also Figure S6.
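Panel (A) summarizes classifier performance as AUROC, and the network in (E) reports effects as Cohen's d. Both metrics can be computed directly, as in the sketch below; it does not reproduce the TP53 classifier itself, whose training procedure the caption does not describe, and the scores and labels are made up.

```python
import math
from statistics import mean, variance
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive (label 1) scores above a random
    negative (label 0), counting ties as 1/2; this equals the area under the
    ROC curve and the normalized Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled)


# Toy classifier scores (predicted probability of TP53 loss) for held-out samples;
# label 1 = TP53 mutant, 0 = wild type. Values are made up.
scores = [0.92, 0.81, 0.40, 0.35, 0.12, 0.55]
labels = [1, 1, 1, 0, 0, 0]
print(round(auroc(scores, labels), 3))             # 0.889
print(round(cohens_d([2.0, 4.0], [0.0, 2.0]), 3))  # 1.414
```

The pairwise AUROC loop is quadratic in sample count, which is fine for illustration; a rank-based formula scales better for cohort-sized data.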
