Regulatory Single-Nucleotide Variant Predictor Increases Predictive Performance of Functional Regulatory Variants

Hum Mutat. 2016 Nov;37(11):1137-1143. doi: 10.1002/humu.23049. Epub 2016 Aug 31.


In silico methods for detecting functionally relevant genetic variants are important for identifying genetic markers of human inherited disease. Much research has focused on protein-coding variants because coding regions have well-defined physicochemical and functional properties. However, many bioinformatics tools are not applicable to variants outside coding regions. Here, we increase the classification performance of our regulatory single-nucleotide variant predictor (RSVP) for variants that cause regulatory abnormalities from an AUC of 0.90 to 0.97 by incorporating genomic regions identified by the ENCODE project into RSVP. RSVP performs comparably to a recently published tool, Genome-Wide Annotation of Variants (GWAVA), and both perform better on regulatory variants than a traditional variant predictor, Combined Annotation-Dependent Depletion (CADD). However, on variants located at distances from the transcription start site similar to those of the positive set, RSVP (AUC: 0.96) outperforms GWAVA (AUC: 0.71). Much of this disparity is due to RSVP's incorporation of features pertaining to the nearest gene (expression, GO terms, etc.), which are not included in GWAVA. Our findings hold out the promise of a framework for assessing all functional regulatory variants, providing a means to predict which rare or de novo variants are of pathogenic significance.
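The AUC values quoted above summarize how well a predictor ranks functional variants above non-functional ones. The abstract does not say how the AUC was computed, so the following is only a minimal sketch of the standard rank-based definition: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC: P(score of a positive > score of a negative).

    labels: iterable of 0/1 class labels (1 = functional variant).
    scores: iterable of predictor scores, higher = more likely functional.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (not results from RSVP, GWAVA, or CADD).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a change in the composition of the negative set (for example, matching negatives to positives by distance to the transcription start site) can shift a predictor's AUC substantially without any change to the predictor itself.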

Keywords: ENCODE; RSVP; intergenic; regulatory variant; variant prediction; variant predictor.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Computational Biology / methods*
  • Computer Simulation
  • Genetic Predisposition to Disease
  • Genome, Human
  • Genomics / methods*
  • Humans
  • Machine Learning
  • Polymorphism, Single Nucleotide*