Hum Mutat. 2016 Jan;37(1):28-35.
doi: 10.1002/humu.22911. Epub 2015 Oct 26.

Assessing the Pathogenicity of Insertion and Deletion Variants With the Variant Effect Scoring Tool (VEST-Indel)

Free PMC article


Christopher Douville et al. Hum Mutat.


Insertion/deletion variants (indels) alter protein sequence and length, yet are highly prevalent in healthy populations, presenting a challenge to bioinformatics classifiers. Commonly used features--DNA and protein sequence conservation, indel length, and occurrence in repeat regions--are useful for inference of protein damage. However, these features can cause false positives when predicting the impact of indels on disease. Existing methods for indel classification suffer from low specificities, severely limiting clinical utility. Here, we further develop our variant effect scoring tool (VEST) to include the classification of in-frame and frameshift indels (VEST-indel) as pathogenic or benign. We apply 24 features, including a new "PubMed" feature, to estimate a gene's importance in human disease. When compared with four existing indel classifiers, our method achieves a drastically reduced false-positive rate, improving specificity by as much as 90%. This approach of estimating gene importance might be generally applicable to missense and other bioinformatics pathogenicity predictors, which often fail to achieve high specificity. Finally, we tested all possible meta-predictors that can be obtained from combining the four different indel classifiers using Boolean conjunctions and disjunctions, and derived a meta-predictor with improved performance over any individual method.

Keywords: bioinformatics pathogenicity predictor; in-frame frameshift; indel; insertion deletion variant; meta-predictor.
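
The meta-predictor search described in the abstract can be illustrated with a short sketch. It assumes that "all possible meta-predictors" built from Boolean conjunctions and disjunctions corresponds to the monotone Boolean functions of the four classifiers' binary calls; that reading, the function names, the toy data, and the selection criterion (sensitivity plus specificity) are illustrative assumptions, not the authors' published procedure.

```python
from itertools import product


def monotone_functions(n):
    """Yield truth tables of every monotone Boolean function of n inputs.

    A table is a tuple of 0/1 values indexed by an input bitmask, where bit i
    holds classifier i's call (1 = pathogenic). Monotone functions are those
    that can be written with AND/OR alone, plus the two constants.
    """
    size = 1 << n
    for table in product((0, 1), repeat=size):
        # Switching any single input from 0 to 1 must never lower the output.
        if all(
            table[mask] <= table[mask | (1 << i)]
            for mask in range(size)
            for i in range(n)
            if not mask & (1 << i)
        ):
            yield table


def evaluate(table, calls, labels):
    """Return (sensitivity, specificity) of one meta-predictor.

    Assumes labels contain at least one pathogenic (1) and one benign (0) case.
    """
    tp = fp = tn = fn = 0
    for call, label in zip(calls, labels):
        mask = sum(bit << i for i, bit in enumerate(call))
        pred = table[mask]
        if label and pred:
            tp += 1
        elif label:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical calls from four classifiers (columns) on six variants (rows).
    calls = [(1, 1, 1, 1), (1, 1, 0, 1), (1, 0, 1, 1),
             (1, 1, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0)]
    labels = [1, 1, 1, 0, 0, 0]  # 1 = pathogenic, 0 = benign (made up)
    candidates = list(monotone_functions(4))  # 168 tables, constants included
    # Illustrative criterion: maximize sensitivity + specificity.
    best = max(candidates, key=lambda t: sum(evaluate(t, calls, labels)))
    print(evaluate(best, calls, labels))
```

Brute-force enumeration over all 65,536 truth tables is affordable for four classifiers. In practice a selected meta-predictor should be scored on held-out variants, since choosing the best of many combinations on the same data overstates its performance.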



