PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations

PLoS Comput Biol. 2014 Jan;10(1):e1003440. doi: 10.1371/journal.pcbi.1003440. Epub 2014 Jan 16.

Abstract

Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for predicting the effects of mutations on protein function are important for analyzing single nucleotide variants and prioritizing them for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement are hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicates, inconsistencies and mutations previously used in the training of evaluated tools. The benchmark dataset containing over 43,000 mutations was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best performing tools were combined into a consensus classifier, PredictSNP, resulting in significantly improved prediction performance. At the same time, the consensus classifier returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp.
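
The two ideas in the abstract, cleaning a benchmark dataset and combining tool outputs by consensus, can be illustrated with a minimal sketch. It is not the PredictSNP implementation: the abstract does not describe the voting scheme, so a plain unweighted majority vote is assumed here, and the function names, the `(protein_id, mutation)` key format and the label strings are all hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping, Optional, Set, Tuple


def build_benchmark(
    records: Iterable[Tuple[Hashable, str]],
    training_keys: Set[Hashable],
) -> Dict[Hashable, str]:
    """Drop duplicates, conflicting annotations and training-set mutations.

    `records` holds (mutation_key, label) pairs; the key format is an
    assumption made for this sketch, e.g. (protein_id, mutation).
    """
    labels_by_key: Dict[Hashable, Set[str]] = {}
    for key, label in records:
        # Collecting labels in a set collapses exact duplicates.
        labels_by_key.setdefault(key, set()).add(label)
    return {
        key: next(iter(labels))
        for key, labels in labels_by_key.items()
        # More than one label means the sources disagree (inconsistency).
        if len(labels) == 1 and key not in training_keys
    }


def consensus_vote(predictions: Mapping[str, Optional[str]]) -> Optional[str]:
    """Unweighted majority vote over the tools that returned a prediction.

    A value of None means the tool produced no result for this mutation.
    Returns None when no tool answered or when the vote is tied.
    """
    votes = Counter(label for label in predictions.values() if label is not None)
    deleterious = votes.get("deleterious", 0)
    neutral = votes.get("neutral", 0)
    if deleterious == neutral:
        return None
    return "deleterious" if deleterious > neutral else "neutral"
```

Because missing predictions are skipped, the vote can still yield a result when an individual tool fails on a mutation, which is the kind of robustness the abstract attributes to consensus prediction. A weighted scheme that uses per-tool confidence would replace the simple counts.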

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Protein
  • Genetic Diseases, Inborn / genetics*
  • Genetic Variation
  • Genome, Human
  • Humans
  • Internet
  • Mutation*
  • Phylogeny
  • Polymorphism, Single Nucleotide*
  • Software

Grant support

The research of JS, AP and JD was supported by the project FNUSA-ICRC (CZ.1.05/1.1.00/02.0123) from the European Regional Development Fund. The work of JB was supported by the Program of “Employment of Best Young Scientists for International Cooperation Empowerment” (CZ1.07/2.3.00/30.0037), co-financed from the European Social Fund and the state budget of the Czech Republic. The work of JB, OS and JZ was supported by the project Security-Oriented Research in Information Technology (CEZ MSM0021630528) and the BUT FIT specific research grant (FIT-S-11-2). MetaCentrum is acknowledged for providing access to its computing facilities, supported by the Ministry of Education of the Czech Republic (LM2010005). CERIT-SC is acknowledged for providing access to its computing facilities under the program Center CERIT scientific Cloud (CZ.1.05/3.2.00/08.0144). The work of AP was supported by the Brno Ph.D. Talent Scholarship provided by Brno City Municipality. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.