BMC Med Genomics. 2016 Apr 11;9:19.
doi: 10.1186/s12920-016-0178-5.

A Unified Analytic Framework for Prioritization of Non-Coding Variants of Uncertain Significance in Heritable Breast and Ovarian Cancer

Free PMC article

Eliseos J Mucaki et al. BMC Med Genomics.

Abstract

Background: Sequencing of both healthy and disease singletons yields many novel and low-frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next-generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein coding variants, non-coding sequence variants have also been proven to contribute significantly to high-penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT) and for prioritizing patients with large intragenic deletions.

Methods: We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk, anonymized patients without identified mutations in BRCA1/2. Aside from protein coding and copy number changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure.
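
The IT-based analysis described above scores a sequence by its individual information, Ri, in bits, using a weight matrix derived from known binding sites. The Python sketch below illustrates the idea. It assumes the standard formulation Riw(b, l) = 2 + log2 f(b, l) and replaces the published small-sample correction with a pseudocount. The function names and the toy TATA-like sites are hypothetical, and this is not the Shannon Pipeline's actual implementation.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_riw_matrix(aligned_sites, pseudocount=0.5):
    """Return one {base: Riw} dict per position of the aligned sites.

    Riw(b, l) = 2 + log2 f(b, l), where f is the pseudocount-smoothed
    frequency of base b at position l. Simplification (assumption): the
    small-sample correction e(n) is omitted in favour of a pseudocount.
    """
    n = len(aligned_sites)
    length = len(aligned_sites[0])
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        column = {}
        for base in BASES:
            freq = (counts[base] + pseudocount) / (n + 4 * pseudocount)
            column[base] = 2.0 + math.log2(freq)
        matrix.append(column)
    return matrix


def ri(matrix, sequence):
    """Individual information (bits): sum of Riw over all positions."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match matrix length")
    return sum(column[base] for column, base in zip(matrix, sequence))


def delta_ri(matrix, ref_seq, alt_seq):
    """Change in Ri caused by a variant; negative values weaken the site."""
    return ri(matrix, alt_seq) - ri(matrix, ref_seq)


# Hypothetical toy sites; the A->G change at the second position costs
# log2(1/9), i.e. about -3.17 bits.
toy_matrix = build_riw_matrix(["TATAAA", "TATAAA", "TATAAT", "TATATA"])
toy_delta = delta_ri(toy_matrix, "TATAAA", "TGTAAA")
```

A variant that lowers Ri below a program-specific bit threshold would be flagged as potentially weakening the site. The thresholds themselves are set in the study's pipeline and are not reproduced here.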

Results: A total of 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT-framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. In one patient, we detected a 32.1 kb intragenic interval in BRCA2 that was likely hemizygous. We also identified 4 stop-gain variants and 3 reading-frame-altering exonic insertions/deletions (indels).

Conclusions: We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills the large numbers of variants detected by NGS to a limited set of variants prioritized as potentially deleterious changes.

Keywords: Hereditary breast and ovarian cancer; Information theory; Next-generation sequencing; Non-coding; Prioritization; RNA-binding protein; Splicing; Transcription factor binding; Variants of uncertain significance.

Figures

Fig. 1
Capture Probe Coverage over Sequenced Genes. The genomic structure of the 7 selected genes is displayed in the UCSC Genome Browser. The top row for each gene is a custom track in “dense” display mode, with black regions indicating the intervals covered by the oligonucleotide capture reagent. Regions without probe coverage contain conserved repetitive sequences or correspond to paralogous sequences that are unsuitable for probe design.
Fig. 2
Framework for the Identification of Potentially Pathogenic Variants. Integrated laboratory processing and bioinformatic analysis procedures for comprehensive complete gene variant determination and analysis. Intermediate datasets resulting from filtering are shown in yellow and final datasets in green. Non-bioinformatic steps, such as sample preparation, are shown in blue and prediction programs in purple. Sequencing analysis yields base calls for all samples. CASAVA [85] and CRAC [86] were used to align these sequencing results to hg19. GATK [88] was used to call variants from these data against the GRCh37 release of the human reference genome. Variants with a quality score < 50 and/or a call confidence score < 30 were eliminated, along with variants falling outside the target regions. SNPnexus [–114] was used to identify the genomic location of the variants. Nonsense variants and indels were noted, and prediction tools were used to assess the potential pathogenicity of missense variants. The Shannon Pipeline [91] evaluated the effect of each variant on natural and cryptic splice sites (SSs), as well as on splicing regulatory factor binding sites (SRFBSs). ASSEDA [38] was used to predict the potential isoforms resulting from these variants. Position weight matrices (PWMs) for 83 transcription factors (TFs) were built using an information weight matrix generator based on Bipad [106]. Mutation Analyzer evaluated the effect on protein binding of variants located from 10 kb upstream up to the first intron. Bit thresholds (Ri values) used to filter the output of each program are indicated. Variants falling within UTR sequences were assessed using SNPfold [20], and the variants most likely to alter mRNA structure (p < 0.1) were then processed using mFold to predict the effect on stability [83]. All UTR variants were scanned with a modified version of the Shannon Pipeline, which uses PWMs computed from nucleotide frequencies for 28 RNA-binding proteins (RBPs) in RBPDB [109] and 76 RBPs in CISBP-RNA [110]. All variants meeting these filtering criteria were verified with IGV [89, 90].
*Sanger sequencing was performed only for protein-truncating, splicing, and selected missense variants.
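
The quality and target-region filters described in the Fig. 2 caption can be expressed as a simple predicate. The Python sketch below is minimal and makes several assumptions. Each variant is taken to be already parsed into a record with hypothetical fields `qual` and `call_conf`, and target regions are given as 1-based inclusive (chrom, start, end) tuples. It is not the pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int          # 1-based genomic position
    qual: float       # variant quality score
    call_conf: float  # call confidence score (hypothetical field name)


def in_targets(v, targets):
    """True if v falls inside any (chrom, start, end) target interval."""
    return any(v.chrom == chrom and start <= v.pos <= end
               for chrom, start, end in targets)


def passes_qc(v, targets, min_qual=50.0, min_conf=30.0):
    """Keep a variant only if quality >= 50, confidence >= 30 and on-target.

    Failing either score threshold is enough to drop the variant, matching
    'quality score < 50 and/or call confidence score < 30'.
    """
    return v.qual >= min_qual and v.call_conf >= min_conf and in_targets(v, targets)


def filter_variants(variants, targets):
    return [v for v in variants if passes_qc(v, targets)]


# Hypothetical example: one target interval and four candidate calls.
targets = [("chr17", 1_000, 2_000)]
calls = [
    Variant("chr17", 1_500, qual=80, call_conf=45),  # kept
    Variant("chr17", 1_600, qual=40, call_conf=45),  # low quality
    Variant("chr17", 1_700, qual=80, call_conf=20),  # low confidence
    Variant("chr13", 1_500, qual=90, call_conf=60),  # off-target
]
kept = filter_variants(calls, targets)  # only the first call survives
```
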
Fig. 3
Predicted Isoforms and Relative Abundances as a Consequence of ATM Splice Variant c.3747-1G>A. The intronic ATM variant c.3747-1G>A abolishes the natural acceptor of exon 26 (of 63 exons), reducing its information content from 11.0 to 0.1 bits. (a) ASSEDA predicts skipping of the natural exon (Ri,total reduced from 14.5 to 3.6 bits, a 1910-fold decrease in exon strength; isoform 7) and/or activation of a pre-existing cryptic acceptor site 13 nt downstream of the natural site (Ri,total of the cryptic exon = 9.0 bits; isoform 1), leading to exon truncation. The reading frame is altered in both mutant isoforms. The other isoforms use weak, alternate acceptor/donor sites, producing cryptic exons with much lower total information. (b) Before the mutation, isoform 7 is expected to be the most abundant splice form. (c) After the mutation, isoform 1 is predicted to become the most abundant splice form, and the wild-type isoform is not expected to be expressed.
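
The fold change quoted in panel (a) follows from the bit scale of Ri. Because information is measured in bits, a decrease of ΔRi bits corresponds to a 2^ΔRi-fold reduction in predicted strength. Here that decrease is 14.5 - 3.6 = 10.9 bits. The minimal Python sketch below shows this arithmetic. The helper name is hypothetical.

```python
def fold_change_from_bits(ri_before, ri_after):
    """Fold decrease in predicted strength implied by a drop in information.

    Each bit corresponds to a factor of 2, so a drop of
    (ri_before - ri_after) bits is a 2 ** (ri_before - ri_after)-fold decrease.
    """
    return 2.0 ** (ri_before - ri_after)


atm_fold = fold_change_from_bits(14.5, 3.6)     # 2 ** 10.9, roughly 1.91e3
chek2_fold = fold_change_from_bits(13.2, 10.5)  # 2 ** 2.7, about 6.5
```

Applied to the CHEK2 exon values in Fig. 4 (13.2 to 10.5 bits), the same arithmetic gives roughly a 6.5-fold reduction. This is consistent with a weakened, leaky exon rather than an abolished one.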
Fig. 4
Predicted Isoforms and Relative Abundances as a Consequence of CHEK2 Splice Variant c.320-5T>A. The intronic CHEK2 variant c.320-5T>A weakens the natural acceptor of exon 3 (of 15 exons) from 6.8 to 4.1 bits. (a) ASSEDA reports a reduction in the strength of the natural exon (Ri,total reduced from 13.2 to 10.5 bits), which would reduce inclusion of the exon, a phenomenon known as leaky splicing. A pre-existing cryptic acceptor 92 nt upstream of the natural site would produce a cryptic exon of similar strength to the mutated exon (Ri,total = 10.0 bits). This cryptic exon would include 92 nt of the intron. (b) Before the mutation, isoform 1 is expected to be the only isoform expressed. (c) After the mutation, isoform 1 (wild-type) is predicted to become relatively less abundant, and isoform 2 is expected to be expressed, although at a lower level than isoform 1.
Fig. 5
Predicted Alteration in UTR Structure Using mFOLD for Variants Flagged by SNPfold. Wild-type and variant structures are displayed, with the variant indicated by a red arrow. (a) Predicted wild-type CDH1 5’UTR structure surrounding c.-71. (b) Predicted CDH1 5’UTR structure due to the c.-71C>G variant. (c) Predicted wild-type TP53 3’UTR structure surrounding c.*485. (d) Predicted TP53 3’UTR structure due to the c.*485G>A variant. (e) Predicted wild-type TP53 3’UTR structure surrounding c.*826. (f) Predicted TP53 3’UTR structure due to the c.*826G>A variant. §SHAPE analysis revealed differences in reactivity between wild-type and variant mRNAs, confirming alterations to secondary structure.
Fig. 6
Ladder Plot Representing Variant Identification and Prioritization. Each line represents a different sample in each sequencing run (a-e) and shows the number of unique variants at key steps of the variant prioritization process. The left-most point indicates the total number of unique variants. The second point represents the number of unique variants remaining after common variants (>5 patients within the cohort and/or ≥1.0% allele frequency) and false-positive variants were removed. The right-most point represents the final number of unique prioritized variants. No variants were prioritized in the following patients: 2-1A, 2-5A, 2-6A, 3-2A, 3-3A, 3-4A, 3-5A, 3-8A, 4-1B, 4-2C, 4-2F, 4-3B, 4-3D, 4-4B, 4-4E, 5-1G, 5-1H, 5-3D, 5-4C, 5-4D, 5-4F, 5-4G, 5-4H, 7-1B, 7-1C, 7-1D, 7-1H, 7-2B, 7-2C, 7-2H, 7-3H, 7-4A, 7-4D, 7-4H. The average number of variants per patient at each step is indicated in a table below each plot, along with the percent reduction in variants from one step to the next.
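
Two calculations behind the ladder plots can be sketched in a few lines: the common-variant filter applied before the second point of each ladder, and the percent reduction reported in the tables. This Python sketch assumes per-variant carrier sets and a population allele-frequency lookup, both with hypothetical names. It treats "and/or" as meaning either criterion is sufficient for removal. It does not model the false-positive filter.

```python
def remove_common(carriers_by_variant, allele_freq, max_carriers=5, max_af=0.01):
    """Drop variants carried by more than `max_carriers` patients in the cohort
    or with population allele frequency >= `max_af` (1.0%).

    carriers_by_variant: {variant_id: set of patient IDs}
    allele_freq: {variant_id: population allele frequency}; variants missing
    from this mapping are treated as rare (an assumption of this sketch).
    """
    kept = {}
    for var_id, carriers in carriers_by_variant.items():
        too_many_carriers = len(carriers) > max_carriers
        too_frequent = allele_freq.get(var_id, 0.0) >= max_af
        if not (too_many_carriers or too_frequent):
            kept[var_id] = carriers
    return kept


def percent_reduction(before, after):
    """Percent reduction in variant count from one step to the next."""
    return 100.0 * (before - after) / before


# Hypothetical cohort: v2 is seen in 6 patients, v3 has a 2% allele frequency.
carriers = {
    "v1": {"P01"},
    "v2": {"P01", "P02", "P03", "P04", "P05", "P06"},
    "v3": {"P07"},
}
freqs = {"v1": 0.001, "v3": 0.02}
rare = remove_common(carriers, freqs)                    # keeps only "v1"
step_drop = percent_reduction(len(carriers), len(rare))  # about 66.7 (3 -> 1)
```
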
