How good are pathogenicity predictors in detecting benign variants?

PLoS Comput Biol. 2019 Feb 11;15(2):e1006481. doi: 10.1371/journal.pcbi.1006481. eCollection 2019 Feb.


Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of tool is critical for reliable variant impact interpretation in precision medicine and should be based on systematic performance assessment. Reported performance varies widely between assessments, for example because test datasets differ in content and size. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, that is, the capability to detect benign variants, of 10 variant interpretation tools. In addition to the overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the highest specificity (95.5%), followed by FATHMM (86.4%) and VEST (83.5%). While these tools performed well, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow reliable methods to be chosen for benign variant interpretation, for both research and clinical purposes, and provide a benchmark for method developers.
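The specificity measure described above can be illustrated with a minimal sketch. Because every variant in the test set is assumed benign, specificity reduces to the fraction of variants that a tool calls benign (true negatives divided by true negatives plus false positives). The `Variant` record, its field names, and the example data are hypothetical and are not taken from the study; only the allele frequency window (≥1% and <25%) comes from the abstract. The sketch also assumes every variant receives a binary call.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one amino acid substitution."""
    variant_id: str           # illustrative identifier, not a real ExAC ID
    allele_frequency: float   # as a fraction, so 0.01 means 1%
    predicted_benign: bool    # assumed binary call from some predictor


def is_common(variant: Variant, low: float = 0.01, high: float = 0.25) -> bool:
    """Apply the frequency window from the abstract: >= 1% and < 25%."""
    return low <= variant.allele_frequency < high


def specificity(variants: Iterable[Variant]) -> float:
    """Fraction of common (assumed benign) variants predicted benign.

    With an all-benign test set, each benign call is a true negative and
    each disease-causing call is a false positive, so this equals
    TN / (TN + FP).
    """
    common = [v for v in variants if is_common(v)]
    if not common:
        raise ValueError("no variants fall inside the frequency window")
    true_negatives = sum(1 for v in common if v.predicted_benign)
    return true_negatives / len(common)


# Made-up example data: four variants fall inside the window, two do not.
demo = [
    Variant("v1", 0.010, True),
    Variant("v2", 0.050, True),
    Variant("v3", 0.200, False),   # false positive: benign called disease-causing
    Variant("v4", 0.249, True),
    Variant("v5", 0.250, True),    # excluded: frequency is not < 25%
    Variant("v6", 0.005, False),   # excluded: frequency is below 1%
]

print(specificity(demo))  # 0.75
```

Specificity alone says nothing about how well a tool detects disease-causing variants, so a figure computed this way would normally be read alongside a sensitivity estimate from a separate pathogenic dataset.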

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Substitution / genetics
  • Computational Biology / methods*
  • Databases, Genetic
  • Exome
  • Forecasting / methods*
  • Gene Frequency / genetics
  • Genetic Variation
  • Humans
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*
  • Virulence

Grant support

MV acknowledges financial support from the Swedish Research Council (Vetenskapsrådet), grant VR 2015-02510. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.