Nucleic Acids Res., 35 (11), 3823-35

SNAP: Predict Effect of Non-Synonymous Polymorphisms on Function


Yana Bromberg et al. Nucleic Acids Res.


Many genetic variations are single nucleotide polymorphisms (SNPs). Non-synonymous SNPs are 'neutral' if the resulting point-mutated protein is not functionally discernible from the wild type and 'non-neutral' otherwise. The ability to identify non-neutral substitutions could significantly aid in targeting disease-causing detrimental mutations, as well as SNPs that increase the fitness of particular phenotypes. Here, we introduced comprehensive data sets to assess the performance of methods that predict SNP effects. Along with these data sets, we introduced SNAP (screening for non-acceptable polymorphisms), a neural network-based method for predicting the functional effects of non-synonymous SNPs. SNAP needs only sequence information as input, but benefits from functional and structural annotations, if available. In a cross-validation test on over 80,000 mutants, SNAP identified 80% of the non-neutral substitutions at 77% accuracy and 76% of the neutral substitutions at 80% accuracy. This constituted an important improvement over other methods; the improvement rose to over ten percentage points for mutants on which existing methods disagreed. Possibly even more importantly, SNAP introduced a well-calibrated measure of the reliability of each prediction. This measure allows users to focus on the most accurate predictions and/or the most severe effects. Available at
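The accuracy and coverage figures above are per-class quantities. The captions below refer to Equation (2) for them without reproducing it, so this minimal sketch assumes the standard definitions: accuracy is the fraction of a class's predictions that are correct, and coverage is the fraction of the observed members of that class that are found. The function name and toy data are hypothetical, not SNAP data.

```python
from dataclasses import dataclass


@dataclass
class ClassPerformance:
    accuracy: float  # correct among mutants *predicted* as the class
    coverage: float  # correct among mutants *observed* as the class


def per_class_performance(observed, predicted, label):
    """Accuracy and coverage for one class label (assumed standard definitions)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == label and p == label)
    n_predicted = sum(1 for _, p in pairs if p == label)
    n_observed = sum(1 for o, _ in pairs if o == label)
    accuracy = tp / n_predicted if n_predicted else 0.0
    coverage = tp / n_observed if n_observed else 0.0
    return ClassPerformance(accuracy, coverage)


# Toy example (not SNAP data): 5 non-neutral and 5 neutral mutants.
observed = ["non-neutral"] * 5 + ["neutral"] * 5
predicted = ["non-neutral"] * 4 + ["neutral"] + ["non-neutral"] * 2 + ["neutral"] * 3
for cls in ("non-neutral", "neutral"):
    perf = per_class_performance(observed, predicted, cls)
    print(f"{cls}: accuracy={perf.accuracy:.2f} coverage={perf.coverage:.2f}")
```

Under these assumed definitions, "identified 80% of the non-neutral substitutions at 77% accuracy" reads as coverage 0.80 and accuracy 0.77 for the non-neutral class.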


Figure 1.
Performance on PMD/EC data. ROC-like curves giving accuracy versus coverage [Equation (2)] for different prediction methods. SIFT predictions range from 0 to 1; thus the performance of SIFT can be analyzed over the entire accuracy/coverage spectrum; however, SIFT has been evaluated using its default threshold of 0.05. PolyPhen predictions are not scaled; instead, they are sorted into categories by the severity of the impact (benign, possibly damaging, probably damaging and unknown). Therefore, we could not ‘dial’ through a PolyPhen cutoff to generate a ROC-like curve for PolyPhen. Two points on the graph show the difference in performance caused by assigning the ‘possibly damaging’ class to the non-neutral or to the neutral category (by default, ‘possibly damaging’ = damaging). The default thresholds for SNAP and SNAPannotated are 0. The defaults for each method are indicated by arrows in the corresponding color. The left panel (A) gives the performance for non-neutral SNP mutants; the right panel (B) gives the performance for neutral SNP mutants.
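Dialing a cutoff through a continuous score, as done for SIFT and SNAP in Figure 1, can be sketched as a threshold sweep. This is a minimal illustration assuming that higher scores mean "more likely non-neutral" and that a mutant is called non-neutral when its score is at or above the threshold (the tie rule is our assumption). The function name and toy scores are hypothetical.

```python
import math


def accuracy_coverage_curve(scores, is_non_neutral, thresholds):
    """Sweep a cutoff over continuous scores.

    A mutant is called non-neutral when score >= threshold (assumed tie rule).
    Returns (threshold, accuracy, coverage) tuples for the non-neutral class;
    accuracy is NaN when nothing is called at that threshold.
    """
    n_observed = sum(is_non_neutral)
    curve = []
    for t in thresholds:
        called = [s >= t for s in scores]
        n_called = sum(called)
        tp = sum(1 for c, y in zip(called, is_non_neutral) if c and y)
        accuracy = tp / n_called if n_called else math.nan
        coverage = tp / n_observed if n_observed else math.nan
        curve.append((t, accuracy, coverage))
    return curve


# Toy scores in the style of an output difference; SNAP's default cutoff is 0.
scores = [0.9, 0.4, 0.1, -0.2, -0.5, -0.8]
labels = [True, True, False, True, False, False]
for t, acc, cov in accuracy_coverage_curve(scores, labels, [0.5, 0.0, -1.0]):
    print(f"threshold={t:+.1f} accuracy={acc:.2f} coverage={cov:.2f}")
```

For SIFT, where low scores indicate damage, one could pass the negated scores with a threshold of -0.05 so that scores of 0.05 or less are called damaging (again assuming that tie rule). PolyPhen's categorical output cannot be swept this way, which is why it appears as points rather than a curve.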
Figure 2.
Stronger predictions more accurate. Stronger SNAP predictions were more accurate. This allowed the introduction of a reliability index for SNAP predictions [Equation (6)]. This index effectively predicted the accuracy of a prediction and thereby enables users to focus on more reliable predictions. The x-axis gives the percentage of residues predicted at or above a given reliability index. The actual values of the reliability index (RI) are shown by numbers in italics above the curve for neutral mutations (green) and below the curve for non-neutral mutations (red). The values for 7 and 8 are not explicitly given to avoid confusion. The y-axis shows the cumulative percentage of residues correctly predicted among all those predicted with RI ≥ n [accuracy, Equation (2)]. Curves are shown for SNPs with neutral (green diamonds) and non-neutral (red squares) effects. For instance, ∼38% of both types of residues are predicted at indices ≥5; of all the non-neutral mutations predicted at this threshold, about 90% are predicted correctly, and of all the neutral SNPs, ∼92% are predicted correctly.
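The cumulative curves in Figure 2 can be computed from per-prediction reliability indices and correctness flags. Equation (6), which derives RI from SNAP's outputs, is not reproduced here, so this sketch takes integer RI values as given input, assumed to run from 0 to 9. The function name and toy data are hypothetical; running it separately on each class's predictions would give one curve per class.

```python
def reliability_curve(ri, correct, max_ri=9):
    """Cumulative coverage/accuracy for predictions with RI >= n.

    ri: integer reliability index per prediction (assumed 0..max_ri).
    correct: bool per prediction, True if the prediction matched experiment.
    Returns (n, % of predictions with RI >= n, % correct among them).
    """
    total = len(ri)
    rows = []
    for n in range(max_ri + 1):
        kept = [c for r, c in zip(ri, correct) if r >= n]
        if not kept:
            break  # nothing this reliable; higher n would be empty too
        pct_predicted = 100.0 * len(kept) / total
        pct_correct = 100.0 * sum(kept) / len(kept)
        rows.append((n, pct_predicted, pct_correct))
    return rows


# Toy data for one class (e.g. mutants predicted non-neutral).
ri = [0, 2, 5, 5, 7, 9]
correct = [False, True, False, True, True, True]
for n, pct_pred, pct_ok in reliability_curve(ri, correct):
    print(f"RI>={n}: {pct_pred:5.1f}% of predictions, {pct_ok:5.1f}% correct")
```

With real SNAP output, the row for n = 5 is the kind of point the caption's example describes (share of residues predicted at RI ≥ 5 and the accuracy among them).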
Figure 3.
SNAP versus PolyPhen on subsets of PMD/EC data. PolyPhen uses different types of input information; here, we analyzed the relevance of each of these sources separately. Annotation: residues for which sequence annotations were available (e.g. binding site or transmembrane region); Structure: residues for which experimental structural constraints were available; Alignment: residues for which only alignments (PSIC scores) were available; and Unknown: residues that were not classified by PolyPhen. The bars for All give the performance on the entire data set for orientation. (A) Total number of correct predictions in each class; (B) accuracy in each group [Equation (2)]. For most mutants in the PMD/EC data set, only alignment information was available. SNAP performed slightly better than PolyPhen in the absence of experimental 3D structure and/or annotation, and slightly worse otherwise. Including SWISS-PROT annotations and SIFT predictions in SNAP improved performance for all groups.
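Per-group tallies like those in Figure 3 amount to a group-by over the predictions. The sketch below is a simplification under stated assumptions: it reports the number of correct predictions and the fraction correct within each group, plus an overall "All" row, rather than the per-class accuracy of Equation (2). Group labels follow the caption; the (group, correct) record format and the function name are hypothetical.

```python
from collections import defaultdict


def performance_by_group(records):
    """Tally correct predictions per information group plus an overall 'All' row.

    records: iterable of (group, correct) pairs, e.g. ("Alignment", True).
    Returns {group: (n_correct, fraction_correct)}.
    Note: a group literally named "All" would be merged into the overall row.
    """
    tallies = defaultdict(lambda: [0, 0])  # group -> [n_correct, n_total]
    for group, correct in records:
        for key in (group, "All"):
            tallies[key][0] += int(correct)
            tallies[key][1] += 1
    return {g: (c, c / n) for g, (c, n) in tallies.items()}


records = [
    ("Annotation", True), ("Annotation", False),
    ("Structure", True),
    ("Alignment", True), ("Alignment", True), ("Alignment", False),
    ("Unknown", False),
]
for group, (n_correct, frac) in performance_by_group(records).items():
    print(f"{group:<10} correct={n_correct} fraction={frac:.2f}")
```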
Figure 4.
Stronger signals for more severe changes. The reliability index [Equation (6)] of SNAP reflected prediction accuracy (Figure 2). However, we also observed that more severe changes were predicted more reliably, i.e. they resulted in a larger difference between the two output units of SNAP. To distinguish mutants according to the severity of the change they cause, we used the functional effects observed for the LacI repressor (data set from (12), as used in testing SIFT). Samples in the ‘very slightly damaging’ and ‘slightly damaging’ categories were combined into a single ‘intermediate’ category. Shown are normalized percentages of samples in each category (y-axis) for a given range of difference values (x-axis). We normalized the predictions within each ‘severity group’ because samples with ‘intermediate effects’ were significantly under-represented in the experimental data.
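The per-group normalization described for Figure 4 can be sketched as a histogram whose bins are expressed as percentages of each severity group's own size, so that an under-represented group such as ‘intermediate’ stays comparable with the others. The bin edges, the data layout, the function name and the labels other than the two merged categories named in the caption (e.g. "severe") are hypothetical; values outside the bin range are ignored.

```python
import bisect


def normalized_histograms(diffs_by_severity, bin_edges):
    """Bin output differences per severity group, as % of that group's binned samples.

    diffs_by_severity: {severity label: [output difference, ...]}
    bin_edges: sorted edges; bins are [e_i, e_{i+1}), the last bin also includes
    its right edge. Values outside [bin_edges[0], bin_edges[-1]] are ignored.
    """
    n_bins = len(bin_edges) - 1
    result = {}
    for severity, diffs in diffs_by_severity.items():
        counts = [0] * n_bins
        for d in diffs:
            i = bisect.bisect_right(bin_edges, d) - 1
            if i == n_bins and d == bin_edges[-1]:
                i -= 1  # put the right-most edge into the last bin
            if 0 <= i < n_bins:
                counts[i] += 1
        total = sum(counts)
        # Normalizing within each group keeps small groups visible.
        result[severity] = [100.0 * c / total if total else 0.0 for c in counts]
    return result


# Hypothetical data: the two mild categories are merged into 'intermediate'.
MERGE = {"very slightly damaging": "intermediate", "slightly damaging": "intermediate"}
raw = [("severe", 0.9), ("slightly damaging", 0.1), ("very slightly damaging", -0.3),
       ("neutral", -0.7), ("neutral", -0.4), ("severe", 1.0), ("neutral", 0.2)]
grouped = {}
for label, diff in raw:
    grouped.setdefault(MERGE.get(label, label), []).append(diff)
print(normalized_histograms(grouped, [-1.0, 0.0, 1.0]))
```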

    1. Kawabata T, Ota M, Nishikawa K. The protein mutant database. Nucleic Acids Res. 1999;27:355–357.
    1. Nishikawa K, Ishino S, Takenaka H, Norioka N, Hirai T, Yao T, Seto Y. Constructing a protein mutant database. Protein Eng. 1994;7:773.
    1. Sunyaev SR, Eisenhaber F, Rodchenkov IV, Eisenhaber B, Tumanyan VG, Kuznetsov EN. PSIC: profile extraction from sequence alignments with position-specific counts of independent observations. Protein Eng. 1999;12:387–394.
    1. Chakravarti A. To a future of genetic medicine. Nature. 2001;409:822–823.
    1. The FANTOM Consortium. Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, et al. The transcriptional landscape of the mammalian genome. Science. 2005;309:1559–1563.
