Sci Rep. 2017 Jan 30;7:41329.
doi: 10.1038/srep41329.

Computational Predictors Fail to Identify Amino Acid Substitution Effects at Rheostat Positions

Free PMC article

M Miller et al. Sci Rep. 2017.

Abstract

Many computational approaches exist for predicting the effects of amino acid substitutions. Here, we considered whether the protein sequence position class - rheostat or toggle - affects these predictions. The classes are defined as follows: experimentally evaluated effects of amino acid substitutions at toggle positions are binary, while rheostat positions show progressive changes. For substitutions in the LacI protein, all evaluated methods failed two key expectations: toggle neutrals were incorrectly predicted as more non-neutral than rheostat non-neutrals, while toggle and rheostat neutrals were incorrectly predicted to be different. However, toggle non-neutrals were distinct from rheostat neutrals. Since many toggle positions are conserved, and most rheostats are not, predictors appear to annotate position conservation better than mutational effect. This finding can explain the well-known observation that predictors assign disproportionate weight to conservation, as well as the field's inability to improve predictor performance. Thus, building reliable predictors requires distinguishing between rheostat and toggle positions.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Experimental differentiation of toggle and rheostat positions.
The left panel shows an example of a toggle position (tyrosine in position 47): Relative to wild-type (value normalized to 1), most substitutions at LacI position 47 abolish transcription repression of the reporter gene. The right panel shows an example of a rheostat position (valine in position 52): Variants at this position in LacI exhibit a wide range of repression levels relative to wild-type (value normalized to 1). Data for position 52 (right panel) are adapted from ref. 11; the dark gray bar shows the ratio of no-repression (full expression of the reporter gene) to repression by wild-type LacI. Data for position 47 (left panel) were adapted from an earlier study. Briefly, that study categorized these semi-quantitative data relative to the activity of the unrepressed reporter gene (i.e., in the absence of repressor protein). For this figure, we translated the semi-quantitative ranges to the quantitative scale using the “none” value on the right panel.
Figure 2. Locations of toggle and rheostat position sets on the structure of the LacI homodimer bound to DNA (PDB 1EFA; visualized with PyMOL).
On one monomer, positions are colored by the sets described in the text. Note that smaller sets are included in the larger sets. For example, toggle_12 positions are also part of toggle_50. Chain B (identical to Chain A) is shown in the background at 50% transparency. DNA is shown as a double helix at the top of the figure.
Figure 3. Variant-effect predictors vary in features and development data used.
The 16 publicly available variant-effect prediction algorithms can be broadly grouped by use of (i) basic biological principles and evolutionary information, (ii) pattern recognition techniques and machine learning, and (iii) meta/ensemble predictors.
Figure 4. Distributions of variant scores from continuous prediction methods differ between rheostat and toggle positions (stringent set).
Panel (a) shows the distributions expected from an ideal variant-effect predictor, while panel (b) shows the distributions determined for neutral and non-neutral variants at both rheostat and toggle positions in the stringent set. These four predictors were selected on the basis of top performance in differentiating rheostat non-neutrals from rheostat neutrals. Results for all other predictors are in Supplementary Fig. 1. The violin plot is an augmented box plot where the width at any given Y-axis value indicates the probability density of the data (median, white circles; interquartile range, box outline). The p-values in the legend are from a Kolmogorov-Smirnov (KS) test, indicating whether a method can significantly distinguish between the two distributions pointed to by the respective arrows. Results from the complete and extended sets are in Supplementary Figs 2 and 3.
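As an illustration of the comparison described in this caption, the two-sample KS statistic is the largest vertical gap between the empirical cumulative distributions of two score sets. The following is a minimal sketch and not the authors' analysis code: the function name `ks_statistic` and the score values are hypothetical, and it returns only the statistic, not a p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical predictor scores (illustrative values only, not data from the paper).
toggle_neutral = [0.62, 0.70, 0.55, 0.81, 0.66]
rheostat_non_neutral = [0.30, 0.42, 0.25, 0.51, 0.38]

print(ks_statistic(toggle_neutral, rheostat_non_neutral))
```

A statistic near 1 means the two score distributions barely overlap; a statistic near 0 means the predictor does not separate the two groups. In practice a library routine such as `scipy.stats.ks_2samp` would also supply the p-value.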
Figure 5. Correlation between experimentally measured fold-changes and predicted variant-effect scores.
Panels (a) SNAP2; (b) PROVEAN; (c) MutPred2; (d) PolyPhen-2 show the relationship of the computationally and experimentally derived scores. For each variant at all rheostat positions, fold-change in repression relative to wild-type LacI is shown on log scale (Y axis), whereas predicted scores are normalized to the linear range [0, 1] (X axis). The blue area depicts the scores expected for neutral variants (fold-change between 0.5 and 2.0); the green area depicts scores expected for non-neutral variants. The Pearson product-moment correlation coefficient (Pearson’s r) is given for the rheostat_9 set. Results from other predictors are in Supplementary Fig. 4.
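The neutrality window and the correlation named in this caption can be sketched as follows. This is a hedged illustration only: the helper names `is_neutral` and `pearson_r` are hypothetical, the window bounds are treated as inclusive by assumption, correlating against log10 fold-change (rather than raw fold-change) is also an assumption, and the values are invented for the example.

```python
import math


def is_neutral(fold_change, low=0.5, high=2.0):
    """Neutral if the fold-change lies within the 0.5 to 2.0 window (inclusive bounds assumed)."""
    return low <= fold_change <= high


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical variants: (fold-change vs. wild-type, predictor score in [0, 1]).
variants = [(1.0, 0.10), (1.8, 0.35), (4.0, 0.40), (12.0, 0.55), (0.3, 0.20)]

log_fold = [math.log10(fc) for fc, _ in variants]  # assumption: correlate on log scale
scores = [score for _, score in variants]

print([is_neutral(fc) for fc, _ in variants])
print(pearson_r(log_fold, scores))
```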


References

    1. Bruse S. et al. Whole exome sequencing identifies novel candidate genes that modify chronic obstructive pulmonary disease susceptibility. Hum Genomics 10, 1, doi: 10.1186/s40246-015-0058-7 (2016).
    2. Ellinghaus D. et al. Association between variants of PRDM1 and NDP52 and Crohn’s disease, based on exome sequencing and functional studies. Gastroenterology 145, 339–347, doi: 10.1053/j.gastro.2013.04.040 (2013).
    3. Turner T. N. et al. Genome Sequencing of Autism-Affected Families Reveals Disruption of Putative Noncoding Regulatory DNA. Am J Hum Genet 98, 58–74, doi: 10.1016/j.ajhg.2015.11.023 (2016).
    4. Bromberg Y. Building a genome analysis pipeline to predict disease risk and prevent disease. J Mol Biol 425, 3993–4005, doi: 10.1016/j.jmb.2013.07.038 (2013).
    5. Dong C. et al. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum Mol Genet 24, 2125–2137, doi: 10.1093/hmg/ddu733 (2015).
