EBioMedicine, 36, 508-516

Exhaustive Non-Synonymous Variants Functionality Prediction Enables High Resolution Characterization of the Neurofibromin Architecture

Ofer Isakov et al. EBioMedicine.

Abstract

Background: Neurofibromatosis type I (NF1) is caused by heterozygous loss-of-function variants in the NF1 gene, which encodes neurofibromin, a tumor suppressor that inhibits RAS signaling and regulates cell proliferation and differentiation. While the only well-established functional domain in the NF1 protein is the GAP-related domain (GRD), most of the identified non-truncating disease-causing variants are located outside of this domain, supporting the existence of other important disease-associated domains. Identifying these domains may reveal novel functions of NF1.

Methods: By implementing inferential statistics combined with machine-learning methods, we developed a novel NF1-specific functional prediction model that focuses on nonsynonymous single nucleotide variants (SNVs). The model enables annotating all possible NF1 nonsynonymous variants, thus mapping the range of pathogenic non-truncating variants at the codon level across the NF1 gene.

Findings: The generated model demonstrates high absolute prediction value for missense and splice-site variations (area under the ROC curve of 0.96), outperforming 14 other established models. By reviewing the entire dataset of nonsynonymous variants, two novel domains (Armadillo type fold 1 and 2) were identified as being associated with pathogenicity (OR 1.86; CI 1.04 to 3.34 and OR 2.08; CI 1.08 to 4.04, respectively; P < .05). Specific exons and codons associated with increased pathogenicity were also detected along the gene, both inside and outside the GRD.

Interpretation: The developed model enabled better prediction of pathogenicity for variants in the NF1 gene, as well as elucidation of novel NF1-associated domains in addition to the GRD.

Funding: This work was partially supported by the Kahn foundation. DGE is supported by the all Manchester NIHR Biomedical Research Centre (IS-BRC-1215-20007).

Keywords: Functional annotation; Genetic variant; Machine learning; Neurofibromatosis 1; Variant prioritization.

Figures

Fig. 1
Model performance. To compare the performance of the NF1-specific model with that of the other available functional prediction tools, we used a test dataset of variants that was not used during the training of the model. These variants were scored by each of the tools and by the model, and the area under the receiver operating characteristic curve was calculated. The NF1-specific model demonstrated a significant improvement in performance on the test set that includes only nonsynonymous variants, and performed even better when variants affecting splice sites were included as well.
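The area under the ROC curve used in this comparison can be illustrated with a minimal sketch. It assumes each tool emits one numeric score per variant, with higher scores meaning more likely pathogenic, and labels of 1 (pathogenic) or 0 (benign). The function name `roc_auc` and the example scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    AUC equals the probability that a randomly chosen pathogenic variant
    receives a higher score than a randomly chosen benign one; ties count 0.5.
    """
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = 0.0
    for p in pathogenic:
        for b in benign:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic) * len(benign))


# Hypothetical scores for four test variants (two pathogenic, two benign).
example_auc = roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

In this toy example three of the four pathogenic/benign pairs are ranked correctly, so the AUC is 0.75. The quadratic loop is adequate for small test sets; a rank-based implementation would be preferable for large ones.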
Fig. 2
Model-based analysis. After generating the NF1-specific prediction model, a score was calculated for the entire dataset of known nonsynonymous NF1 variants. Reviewing the rate of variants predicted to be pathogenic by our model in each exon (A), we identified two main exonic regions with a significantly higher rate of pathogenic variants (red): exons 25–28 and exons 37–41. These exons correspond to the 5′ regions of the RAS-GAP domain and the Armadillo type fold 2 domain, respectively. The model-based analysis also identified a significant decline in pathogenicity rate (blue) from exon 45 through the last exon (57). With pathogenicity scores predicted for every possible nonsynonymous variant, specific codons with a higher pathogenicity association could be identified (B). After correction for multiple hypothesis testing, 85 codons were found to have significantly more variants with a score higher than 0.538 (brown; corresponding to an FPR of 1%) than would be expected (P < 0.01). In 17 of these codons, all of the variants were above the threshold (red). While some of these codons already harbor known pathogenic variants (triangle), others represent novel deleterious loci (point). The background represents the overall odds ratio of the exon (with red representing positive association).
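The per-codon enrichment test with multiple-testing correction can be sketched as follows. The caption does not state which test or correction procedure was used, so this sketch assumes a one-sided binomial test against a background rate and a Benjamini-Hochberg adjustment; both choices, the function names, and the example numbers are illustrative assumptions.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing at least k
    above-threshold variants among n if each exceeds it with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical codons: (variants above threshold, variants scored).
codons = [(6, 6), (2, 7), (5, 6), (1, 6)]
background_rate = 0.3  # assumed gene-wide fraction above the threshold
raw = [binom_upper_tail(k, n, background_rate) for k, n in codons]
adj = benjamini_hochberg(raw)
flagged = [i for i, q in enumerate(adj) if q < 0.01]
```

A codon is flagged only if its adjusted p-value stays below the chosen cutoff, which is why fewer codons survive correction than pass an uncorrected test.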
Fig. 3
NF1-SPRED1 binding region. SPRED1 interacts with NF1 by binding to the N- and C-terminal regions of the RAS-GAP domain. The number of codons with at least one variant predicted to be pathogenic by the model (score above 0.538; corresponding to an FPR of 1%) was compared between these putative binding regions and the adjacent regions (100 base pairs on each side). The comparison showed an increase in variants predicted to be pathogenic in the N-terminal binding region (OR 2.19; 1.22 to 3.99; P = 0.005) but only a trend in the C-terminal region (OR 2.18; 0.9 to 5.1; P = 0.072). Generating a score across all the codons identified a drop in pathogenicity rate beyond codon 1528. The dashed line marks the borders of the known binding region (codons 1176–1248 and 1477–1573 for the N- and C-terminal regions, respectively) and the solid line marks the borders of regions previously described as essential (codons 1202–1217 and 1511 and 1530).
Supplementary Fig. 1
Genetic effect odds ratio. Variants were annotated with their predicted genetic effect. The rate of pathogenic variants was then calculated for each genetic effect, and an odds ratio was calculated to rank the effects by their association with pathogenicity. Several variant types were significantly associated with pathogenicity: stop gain variants (OR 1912.9; 903.92 to 8192; P < 1e-100), splice donor variants (a splice variant that changes the two-base region at the 5′ end of an intron) (OR 651.8; 293 to 1983.1; P < 1e-100), splice acceptor variants (a splice variant that changes the two-base region at the 3′ end of an intron) (OR 286.9; 152.2 to 601.1; P < 1e-100), missense variants in a splice region (within 1–3 bases of the exon) (OR 27.5; 15.2 to 49.4; P < 1e-22), intron variants in a splice region (within 3–8 bases of the intron) (OR 7.8; 5.3 to 11.2; P < 1e-19), and missense variants (OR 7.1; 5.9 to 8.6; P < 1e-68). The following variant types were associated with a lower rate of pathogenicity: synonymous variants (OR 0.155; 0.032 to 0.457; P < 1e-04), variants in the 3′ UTR (OR 0.138; 0.003 to 0.779; P < 0.01), and variants inside introns (OR 0.003; 0.002 to 0.004; P < 1e-100).
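As a rough illustration of how an odds ratio and its confidence interval follow from a 2×2 table of variant counts, the sketch below uses the Wald (log-normal) approximation. The caption does not say how the reported intervals were computed, and an exact method may well have been used, so the method, the function name `odds_ratio_ci`, and the counts are assumptions made for illustration only.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval (95% when z = 1.96).

    a: pathogenic variants with the genetic effect
    b: benign variants with the genetic effect
    c: pathogenic variants without the genetic effect
    d: benign variants without the genetic effect
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("the Wald interval needs all four counts to be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
```

An interval that excludes 1 indicates an association at the chosen level. With very small cell counts the Wald interval is unreliable, and an exact approach such as Fisher's test is the usual alternative.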
Supplementary Fig. 2
Per-exon pathogenicity rate. To identify exons across the NF1 gene that have different effects on pathogenicity, we compared each exon's rate of pathogenic variants against that of all the other exons. Although 8 exons were initially deemed significant (*), only one remained significant after correcting for multiple hypothesis testing (**).
Supplementary Fig. 3
Prediction tool performance. Each functional prediction tool produces a score for each variant. We compared the scores each tool gave to the known pathogenic variants with the scores it gave to the known benign variants. The lower and upper hinges correspond to the first and third quartiles (the 25th and 75th percentiles), the whiskers correspond to 1.5 × the interquartile range (IQR), and the points are outliers. If the difference between the two score distributions is large, the tool is considered to perform well at differentiating benign from pathogenic variants. The tools at the top demonstrated the best performance.
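The box-plot quantities named in this caption can be computed as in the sketch below. It assumes the common convention in which each whisker extends to the most extreme data point lying within 1.5 × IQR of its hinge, and it uses linear-interpolation quartiles; the plotting software and its exact quartile rule are not stated in the caption, and the function name `box_stats` and the example scores are hypothetical.

```python
import statistics


def box_stats(values):
    """Hinges, whisker ends, and outliers for one group of scores."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme points inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the fences is drawn as an individual point.
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical scores for one tool on one class of variants.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Computing these summaries separately for the pathogenic and benign groups of each tool gives the two boxes whose separation the caption describes.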
