PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions

PLoS Comput Biol. 2016 May 25;12(5):e1004962. doi: 10.1371/journal.pcbi.1004962. eCollection 2016 May.

Abstract

An important message from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited because their prediction mechanisms do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in variant databases. Moreover, most such methods provide numeric scores rather than binary predictions of deleteriousness or confidence scores, which users would understand more easily. We constructed three datasets covering different types of disease-related variants, divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that, for missense variants, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions and their consensus scores, in an easily interpretable format tailored to the specific features of each variant category. To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2.
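The idea of category-optimal decision thresholds can be illustrated with a minimal sketch. The tool names (`tool_a`, `tool_b`, `tool_c`), the threshold values, the assumption that a higher score means "more deleterious", and the unweighted vote used as a consensus are all hypothetical placeholders. They are not the thresholds, tools, or consensus formula derived in PredictSNP2; the sketch only shows how a single raw score can lead to a different binary call depending on the variant category.

```python
from typing import Dict

# Hypothetical thresholds keyed by tool, then by variant category.
# These values are placeholders, NOT the ones derived in PredictSNP2.
# Assumption: a score at or above the threshold means "deleterious".
Thresholds = Dict[str, Dict[str, float]]

EXAMPLE_THRESHOLDS: Thresholds = {
    "tool_a": {"missense": 0.60, "regulatory": 0.40},
    "tool_b": {"missense": 20.0, "regulatory": 12.0},
    "tool_c": {"missense": 0.50, "regulatory": 0.70},
}


def binary_prediction(tool: str, category: str, score: float,
                      thresholds: Thresholds) -> bool:
    """Map a raw numeric score to deleterious (True) or neutral (False)
    using the threshold chosen for this tool and variant category."""
    return score >= thresholds[tool][category]


def consensus(category: str, scores: Dict[str, float],
              thresholds: Thresholds) -> float:
    """Return the fraction of tools that call the variant deleterious.
    This simple unweighted vote is a stand-in, not the paper's method."""
    votes = [binary_prediction(tool, category, s, thresholds)
             for tool, s in scores.items()]
    return sum(votes) / len(votes)


if __name__ == "__main__":
    # The same raw scores, interpreted under two different categories.
    scores = {"tool_a": 0.55, "tool_b": 14.0, "tool_c": 0.65}
    for cat in ("missense", "regulatory"):
        print(cat, consensus(cat, scores, EXAMPLE_THRESHOLDS))
```

With these placeholder values, the same three scores give a consensus of 1/3 when the variant is treated as missense and 2/3 when it is treated as regulatory. A single global threshold per tool would hide this difference.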

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Databases, Nucleic Acid
  • Databases, Protein
  • Genetic Variation
  • Genome, Human
  • Genomics / statistics & numerical data
  • Humans
  • Polymorphism, Single Nucleotide*
  • Software*

Grants and funding

The work was supported by the Ministry of Education of the Czech Republic (LO1214 and LQ1605; http://www.msmt.cz), the European Commission within the Research Infrastructures programme of Horizon 2020 (ELIXIR-EXCELERATE 676559; ec.europa.eu/research) and the European Union Framework Programme (REGPOT 316345; ec.europa.eu/research). The work of MM and JZ was supported by the project Research and Application of Advanced Methods in ICT (FIT-S-14-2299; http://www.fit.vutbr.cz/). Computational resources were provided by CESNET and the CERIT Scientific Cloud (LM2015042 and LM2015085; http://www.msmt.cz) under the programme "Projects of Large Research, Development, and Innovations Infrastructures". The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.