In silico methods for detecting functionally relevant genetic variants are important for identifying genetic markers of human inherited disease. Much research has focused on protein-coding variants since coding regions have well-defined physicochemical and functional properties. However, many bioinformatics tools are not applicable to variants outside coding regions. Here, we increase the classification performance of our regulatory single-nucleotide variant predictor (RSVP) for variants that cause regulatory abnormalities from an AUC of 0.90-0.97 by incorporating genomic regions identified by the ENCODE project into RSVP. RSVP is comparable to a recently published tool, Genome-Wide Annotation of Variants (GWAVA); both RSVP and GWAVA perform better on regulatory variants than a traditional variant predictor, combined annotation-dependent depletion (CADD). However, our method outperforms GWAVA on variants located at similar distances to the transcription start site as the positive set (AUC: 0.96) as compared with GWAVA (AUC: 0.71). Much of this disparity is due to RSVP's incorporation of features pertaining to the nearest gene (expression, GO terms, etc.), which are not included in GWAVA. Our findings hold out the promise of a framework for the assessment of all functional regulatory variants, providing a means to predict which rare or de novo variants are of pathogenic significance.
Keywords: ENCODE; RSVP; intergenic; regulatory variant; variant prediction; variant predictor.
© 2016 WILEY PERIODICALS, INC.