Optimization of in silico tools for predicting genetic variants: individualizing for genes with molecular sub-regional stratification

Brief Bioinform. 2020 Sep 25;21(5):1776-1786. doi: 10.1093/bib/bbz115.


Genes are unique in their functional roles and differ in their sensitivity to genetic defects, yet predicting the pathogenicity of their variants remains difficult. This study attempted to improve the performance of existing in silico algorithms and to find a common solution based on an individualization strategy. We began the individualization with epilepsy-related SCN1A variants, using sub-regional stratification. SCN1A missense variants related to epilepsy were retrieved from mutation databases, and benign missense variants were collected from the ExAC database. Predictions were made with 10 traditional tools and then optimized stepwise. Predictive ability was evaluated by five-fold cross-validation on SCN1A, SCN2A and KCNQ2 variants. Additional validation was performed on damage-confirmed or familial-epilepsy SCN1A variants. Commonly used predictors performed unsatisfactorily for SCN1A, with accuracy below 80%, and their performance varied dramatically across the functional domains of Nav1.1. Multistep individualized optimizations, including cutoff resetting, domain-based stratification and combination of prediction algorithms, significantly increased predictive performance. Similar improvements were obtained for variants in SCN2A and KCNQ2. The predictive performance of recently developed ensemble tools, such as Mendelian Clinically Applicable Pathogenicity (M-CAP), Combined Annotation Dependent Depletion (CADD) and Eigen, also improved dramatically when the strategy of molecular sub-regional stratification was applied. The prediction scores of SCN1A variants correlated linearly with the degree of functional defects and with the severity of clinical phenotypes. This study highlights the need for individualized optimization with molecular sub-regional stratification for each gene in practice.
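The cutoff-resetting and domain-based stratification steps, together with their evaluation by five-fold cross-validation, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The accuracy-maximizing cutoff search, the interleaved fold assignment, the function names (`best_cutoff`, `fit_cutoffs`, `accuracy`, `cross_validate`, `toy_variants`) and the two domains "A" and "B" with their scores are all hypothetical choices made for illustration, not data or parameters from the study.

```python
from collections import defaultdict

# Each variant is a HYPOTHETICAL (score, domain, label) tuple; label 1 = pathogenic, 0 = benign.


def best_cutoff(pairs):
    """Return the cutoff maximizing accuracy when score >= cutoff is called pathogenic."""
    scores = sorted({s for s, _ in pairs})
    # Candidates: below the minimum, midpoints between neighbouring scores, above the maximum.
    candidates = [scores[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(scores, scores[1:])]
    candidates.append(scores[-1] + 1.0)
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((s >= t) == bool(y) for s, y in pairs) / len(pairs)
        if acc > best_acc:  # strict ">" keeps the lowest cutoff among ties
            best_t, best_acc = t, acc
    return best_t


def fit_cutoffs(train):
    """Fit one global cutoff plus one reset cutoff per domain (assumed scheme)."""
    by_domain = defaultdict(list)
    for score, domain, label in train:
        by_domain[domain].append((score, label))
    global_t = best_cutoff([(s, y) for s, _, y in train])
    return {d: best_cutoff(p) for d, p in by_domain.items()}, global_t


def accuracy(variants, cutoffs, default):
    """Fraction classified correctly; domains without their own cutoff use `default`."""
    hits = sum((s >= cutoffs.get(d, default)) == bool(y) for s, d, y in variants)
    return hits / len(variants)


def cross_validate(variants, k=5, stratify=True):
    """Mean held-out accuracy over k interleaved folds."""
    folds = [variants[i::k] for i in range(k)]
    accs = []
    for i, test in enumerate(folds):
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        cutoffs, global_t = fit_cutoffs(train)
        # Without stratification every variant is judged against the single global cutoff.
        accs.append(accuracy(test, cutoffs if stratify else {}, global_t))
    return sum(accs) / k


def toy_variants():
    """Hypothetical scores in which the two domains need different cutoffs."""
    groups = [
        ("A", 0, [0.55, 0.58, 0.60, 0.62, 0.65]),
        ("A", 1, [0.85, 0.88, 0.90, 0.92, 0.95]),
        ("B", 0, [0.05, 0.08, 0.10, 0.12, 0.15]),
        ("B", 1, [0.35, 0.38, 0.40, 0.42, 0.45]),
    ]
    return [(s, d, y) for d, y, scores in groups for s in scores]


if __name__ == "__main__":
    data = toy_variants()
    print("global cutoff:    ", cross_validate(data, stratify=False))  # 0.75
    print("per-domain cutoff:", cross_validate(data, stratify=True))   # 1.0
```

In this toy set, benign scores in domain "A" overlap the pathogenic range of domain "B", so any single cutoff must misclassify one group, while a reset cutoff per domain separates both. The sketch does not reproduce the study's other optimization step, combining several predictors.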

Keywords: SCN1A; epilepsy; in silico prediction; molecular sub-regional stratification; pathogenicity; sequence variants.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Databases, Genetic
  • Genetic Variation*
  • Humans
  • KCNQ2 Potassium Channel / genetics
  • NAV1.1 Voltage-Gated Sodium Channel / genetics
  • NAV1.2 Voltage-Gated Sodium Channel / genetics

Substances

  • KCNQ2 Potassium Channel
  • NAV1.1 Voltage-Gated Sodium Channel
  • NAV1.2 Voltage-Gated Sodium Channel
  • SCN1A protein, human
  • SCN2A protein, human