Oncogene. 2021 Feb;40(7):1347-1361.
doi: 10.1038/s41388-020-01614-3. Epub 2021 Jan 8.

Comprehensive characterisation of intronic mis-splicing mutations in human cancers

Hyunchul Jung et al. Oncogene. 2021 Feb.
Free PMC article

Abstract

Previous studies of mis-splicing mutations were based on exome data, and thus our current knowledge is largely limited to exons and the canonical splice sites. To comprehensively characterise intronic mis-splicing mutations, we analysed 1134 pan-cancer whole genomes and transcriptomes together with 3022 normal control samples. The ratio-based splicing analysis resulted in 678 somatic intronic mutations, with 46% residing in deep introns. Among the 309 deep intronic single nucleotide variants, 245 altered core splicing codes, with 38% activating cryptic splice sites, 12% activating cryptic polypyrimidine tracts, and 36% and 12% disrupting authentic polypyrimidine tracts and branchpoints, respectively. All the intronic cryptic splice sites were created at pre-existing GT/AG dinucleotides or by GC-to-GT conversion. Notably, 85 deep intronic mutations indicated gain of splicing enhancers or loss of splicing silencers. We found that 64 tumour suppressors were affected by intronic mutations and that blood cancers showed a higher proportion of deep intronic mutations. In particular, a telomere maintenance gene, POT1, was recurrently mis-spliced by deep intronic mutations in blood cancers. We validated a pseudoexon activation involving a splicing silencer in POT1 by CRISPR/Cas9. Our results shed light on previously unappreciated mechanisms by which noncoding mutations acting on splicing codes in deep introns contribute to tumourigenesis.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1. Read ratio-based identification of abnormal splicing and allele-based validation.
a Schematic representation of different types of aberrant splicing events. b Summary and examples of allele-based validation. Among mutations within 30 bp from activated cryptic SSs, those spanned by ≥3 RNA-seq reads were defined as verifiable variants (Supplementary Table 7). Allele specificity was assessed by the 2 × 2 contingency table. For example, all RNA-seq reads covering the cryptic SS (n = 9, red) of TP53 carried the mutant allele (T). In addition, exonic mutations within 10 bp from the exon-intron junction of skipped exons were collected as verifiable variants and assessed by the 2 × 2 contingency table. For example, there was a significant underrepresentation of the mutant allele, G, in the RNA-seq reads covering this region of HLA-B. c–g Read ratios and allelic patterns of mis-splicing examples. In the browser views, the RNA-seq reads supporting abnormal and normal splicing are shown in pink/orange and blue, respectively, with mutations highlighted in dark red. Below, cryptic and authentic SSs are marked in red and blue, respectively, with wild-type sequences shown in green. The schematic illustrations include the number of abnormally (pink/orange) and normally (blue) spliced RNA-seq reads. c Proximal intronic mutation causing partial intron retention in FUBP1. The proportion of abnormally spliced reads over all reads (ratio-I = 7.9%) was calculated from a lower grade glioma sample (DU-7009). Null distributions of read ratios were derived from 3020 normal and 1133 cancer samples without mutations in FUBP1. The observed ratio was then transformed into the Z-score, and its right-tail P value was estimated. The strength of the cryptic donor SS was estimated using MAXENT for sequences 3 bp upstream to 6 bp downstream of the cryptic splice site with the mutant and wild-type allele. d Recurrent mutations causing full intron retention in TMSB4X. The identical proximal intronic mutations were identified in three different malignant lymphoma samples. The results of the ratio-based analysis and SS strength analysis (right) are from MALY-4189035. e Deep intronic BP mutation leading to partial intron retention in STK11 by two different cryptic acceptor SSs, indicated in pink and orange. The cryptic SSs associated with the partial intron retention (ratio-a and -b) are observed at 3 bp and 56 bp upstream of the variant, supported by the red and orange reads, respectively. The U2 binding free energy was calculated for sequences 5 bp upstream to 3 bp downstream of the BP for the mutant and wild-type allele. f Deep intronic mutation activating a pseudoexon through the gain of a cryptic donor SS in DROSHA. The strength of the cryptic donor SS with the mutant or wild-type allele was compared to the authentic donor SS of exon 17. g Deep intronic mutation activating a pseudoexon through the gain of a splicing enhancer in DNM2. Sequences ±5 bp from the mutation with the mutant or wild-type allele were matched with known splicing enhancer/silencer motifs.
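The read-ratio step described for panel c can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the normal approximation used to turn the Z-score into a right-tail P value is an assumption, since the caption does not state how the tail probability was estimated.

```python
import math
import statistics


def read_ratio(abnormal_reads: int, normal_reads: int) -> float:
    """Proportion of abnormally spliced reads over all reads at a junction."""
    total = abnormal_reads + normal_reads
    if total == 0:
        raise ValueError("no reads cover this junction")
    return abnormal_reads / total


def z_score_and_right_tail_p(observed: float, null_ratios: list[float]) -> tuple[float, float]:
    """Compare an observed read ratio with ratios from samples lacking the mutation.

    Assumption: the right-tail P value is taken from a standard normal
    distribution; the source does not specify the estimation method.
    """
    mu = statistics.fmean(null_ratios)
    sigma = statistics.stdev(null_ratios)  # sample standard deviation
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    z = (observed - mu) / sigma
    p_right = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return z, p_right
```

In practice the null list would hold one read ratio per unmutated normal or cancer sample for the same junction, so a ratio such as the 7.9% reported for FUBP1 is judged against how often unmutated samples reach a comparable value.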
Fig. 2. Distribution of mis-splicing mutations near exon-intron junctions.
Intronic mutations 1–2 bp, 3–20 bp, and >20 bp away from the nearest exon-intron junction were classified as SS, proximal, and deep intronic mutations, respectively. For deep intronic mutations, we limited our analysis to those that were surrounding BPs (right inset in the top panel) or activated cryptic SSs (bottom panel). Fraction of mis-splicing mutations over all variants found at each position (top panel). The count of mutations near BPs and U2 binding residues (5 bp upstream to 3 bp downstream of the BPs) is shown in the right inset. Distribution of deep intronic mis-splicing mutations (bottom panel) according to the distance from activated cryptic acceptor SSs (caSSs) or cryptic donor SSs (cdSSs). Wild-type dinucleotides at cryptic SSs are shown in green. Mutation counts according to the distance from the nearest authentic exon-intron junction are shown in the right inset.
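The distance thresholds in this caption translate directly into a small classifier. The sketch below assumes 1-based, inclusive intron coordinates, so that the first and last intronic bases are at distance 1 from their junction; the function names and the coordinate convention are illustrative assumptions, not taken from the paper.

```python
def distance_to_nearest_junction(pos: int, intron_start: int, intron_end: int) -> int:
    """Distance in bp from an intronic position to the nearest exon-intron junction.

    Assumes 1-based inclusive coordinates: intron_start and intron_end are the
    first and last intronic bases, each at distance 1 from its junction.
    """
    if not intron_start <= pos <= intron_end:
        raise ValueError("position is not inside the intron")
    return min(pos - intron_start + 1, intron_end - pos + 1)


def classify_intronic_mutation(distance_bp: int) -> str:
    """Apply the caption's bins: 1-2 bp SS, 3-20 bp proximal, >20 bp deep."""
    if distance_bp < 1:
        raise ValueError("distance must be at least 1 bp")
    if distance_bp <= 2:
        return "splice_site"
    if distance_bp <= 20:
        return "proximal"
    return "deep"
```

For example, a mutation at position 105 of an intron spanning 101 to 500 lies 5 bp from the nearest junction and falls into the proximal class.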
Fig. 3. Alterations of splicing codes by intronic mutations.
a Loss of core splicing codes by proximal intronic mutations. Differences in the strength of SSs or PPTs between the mutant and wild-type allele, as estimated by MAXENT [27]. The Wilcoxon signed-rank test was used to assess the statistical significance. b Changes in sequence consensus by proximal intronic or exonic mutations near donor SSs. The consensus motifs were derived from the 9-mers of the authentic donor SSs with the GT dinucleotide at the border. c Gain of core splicing codes by deep intronic mutations. Shown are the differences in the strength of SSs or PPTs between the mutant and wild-type allele. Also compared is the strength of the corresponding authentic SSs with the reference sequences. d Changes in sequence consensus by deep intronic mutations at cryptic donor SSs. The motifs were derived from the 9-mers of cryptic donor SSs, with the GT dinucleotide at the border. e Differences in the U2 binding free energy between the mutant and wild-type allele. The binding free energy of U2 snRNA was calculated for sequences 5 bp upstream to 3 bp downstream of BPs while excluding the BP nucleotide by using the Vienna RNA package [56]. f Alterations of auxiliary splicing codes by deep intronic mutations. The nucleotide sequences ±5 bp from the mutation with the mutant or wild-type allele were matched with the known sets of motifs. The number of sequences present in each motif set is shown above the bar, after removing duplicate instances. The P values were calculated by random permutation tests.
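The motif matching in panel f can be illustrated with a short sketch. It takes the wild-type sequence ±5 bp around the mutation (11 nt), substitutes the mutant allele at the centre, and reports which motifs appear only in the mutant window (gain) or only in the wild-type window (loss). The function names are hypothetical, and simple exact substring matching is an assumption; the caption does not describe how the authors performed the match.

```python
def motifs_in(seq: str, motifs: set[str]) -> set[str]:
    """Return the motifs that occur anywhere in seq (exact substring match)."""
    return {m for m in motifs if m in seq}


def motif_changes(wt_window: str, alt_base: str, motifs: set[str],
                  flank: int = 5) -> tuple[set[str], set[str]]:
    """Return (gained, lost) motifs for a single nucleotide variant.

    wt_window is the wild-type sequence from `flank` bp upstream to `flank` bp
    downstream of the mutated position, which sits at the centre.
    """
    if len(wt_window) != 2 * flank + 1:
        raise ValueError("window must span the mutation plus both flanks")
    if len(alt_base) != 1:
        raise ValueError("only single nucleotide variants are handled")
    mut_window = wt_window[:flank] + alt_base + wt_window[flank + 1:]
    wt_hits = motifs_in(wt_window, motifs)
    mut_hits = motifs_in(mut_window, motifs)
    return mut_hits - wt_hits, wt_hits - mut_hits


# Toy motif set for illustration only; these are not real enhancer or silencer motifs.
toy_motifs = {"AAGAA", "AACAA", "TTTTT"}
gained, lost = motif_changes("AAAAACAAAAA", "G", toy_motifs)
```

With a set of enhancer motifs, a non-empty `gained` set would correspond to a gain of a splicing enhancer; with a set of silencer motifs, a non-empty `lost` set would correspond to a loss of a splicing silencer.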
Fig. 4. Mis-splicing mutations in tumour suppressor genes and prediction models.
a Enrichment of the identified mis-splicing mutations in cancer driver genes commonly altered by splice site mutations. We used genes with type ‘S’ (splice site) mutations in the COSMIC census [49]. The P values were calculated by random permutation tests. b Fraction of PTC-generating mutations in TSGs, oncogenes (OGs), and essential genes [64]. According to the PTC location, the PTC-generating variants were divided into NMD-sensitive and NMD-insensitive groups. The P values were assessed using the one-sided Fisher’s exact test on the 2 × 3 contingency table. c Effect of NMD-sensitive PTCs on mRNA expression by gene class. For each gene per cancer type, the control expression level was calculated by averaging the gene expression levels of wild-type samples (≥5). The P values were calculated using the two-sided Mann–Whitney U test. d Performance of the prediction models. Shown are the accuracies for the intronic mutations near the donor SS (blue; +3 to +6) or acceptor SS (green; +3), and exonic mutations flanking the donor SS (purple; −2 to −1) in comparison to the accuracies of the same models with randomly selected mutations at the corresponding SSs (grey). e Example of feature value distribution. The density plot shows the mean of feature values (maxent score of 5′spliceMUT – 5′spliceWT) used in the predictive model for intronic mutations near the donor SS. The distribution of expected values was obtained from random mutation selection (100,000 trials).
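The random-selection comparison in panel e can be sketched as below: the mean feature value of the observed mutations is compared with the means of equally sized random draws from a background pool. The function name, the upper-tail direction, the add-one correction, and sampling without replacement are all assumptions made for illustration; the caption only states that expected values came from random mutation selection (100,000 trials).

```python
import random
import statistics


def random_selection_p(observed: list[float], background: list[float],
                       trials: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Empirical upper-tail P value for the mean of observed feature values.

    Each trial draws len(observed) values from the background pool without
    replacement and records their mean. The P value uses an add-one
    correction so that it is never exactly zero.
    """
    n = len(observed)
    if n == 0:
        raise ValueError("no observed values")
    if n > len(background):
        raise ValueError("background pool is smaller than the observed set")
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed_mean = statistics.fmean(observed)
    as_extreme = 0
    for _ in range(trials):
        null_mean = statistics.fmean(rng.sample(background, n))
        if null_mean >= observed_mean:
            as_extreme += 1
    return observed_mean, (as_extreme + 1) / (trials + 1)
```

Here `observed` would hold a feature such as the difference in splice site score between mutant and wild-type alleles for the identified mutations, and `background` the same feature for randomly selectable mutations at the corresponding positions.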
Fig. 5. Relevance of mis-splicing mutations in tumourigenesis.
a Proportion of samples with mis-splicing mutations compared with those having traditional truncating mutations. Known tumour suppressor genes for which intronic mutations (proximal and deep) account for >5% of the samples are shown. The samples with predicted proximal and exonic mutations were included. b Proportion of mutant samples in RB1 per tissue type. c Proportion of mutant samples in lymphoid tissue. Potential tumour suppressor genes were included. d Deep intronic mutations leading to pseudoexon activation in POT1. The results of the read ratio analysis and motif analysis are for the event identified in a chronic lymphocytic leukaemia sample (CLLE-677). Sequences ±5 bp from the mutation with the mutant or wild-type allele were matched with known splicing enhancer/silencer motifs. e Validation for CRISPR/Cas9-mediated deletion of a splicing silencer motif. The splicing silencer motif in POT1 disrupted by a mutation in MALY-4120193 (Fig. 5d bottom) was deleted by using CRISPR/Cas9 genome editing. This deletion was verified by Sanger sequencing. RT-PCR using primers spanning exons 6 and 7 showed cryptic exon activation, which was confirmed by Sanger sequencing.

References

    1. Alexander RP, Fang G, Rozowsky J, Snyder M, Gerstein MB. Annotating non-coding regions of the genome. Nat Rev Genet. 2010;11:559–71.
    2. Kornblihtt AR, Schor IE, Alló M, Dujardin G, Petrillo E, Muñoz MJ. Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nat Rev Mol Cell Biol. 2013;14:153–65.
    3. Fu XD, Ares M, Jr. Context-dependent control of alternative splicing by RNA-binding proteins. Nat Rev Genet. 2014;15:689–701.
    4. Mercer TR, Clark MB, Andersen SB, Brunck ME, Haerty W, Crawford J, et al. Genome-wide discovery of human splicing branchpoints. Genome Res. 2015;25:290–303.
    5. Sickmier EA, Frato KE, Shen H, Paranawithana SR, Green MR, Kielkopf CL. Structural Basis for Polypyrimidine Tract Recognition by the Essential Pre-mRNA Splicing Factor U2AF65. Mol Cell. 2006;23:49–59.