The SAAP pipeline and database: tools to analyze the impact and predict the pathogenicity of mutations

Nouf S Al-Numair; Andrew C R Martin

doi:10.1186/1471-2164-14-S3-S4


BMC Genomics. 2013;14 Suppl 3(Suppl 3):S4. doi: 10.1186/1471-2164-14-S3-S4. Epub 2013 May 28.

Authors

Nouf S Al-Numair¹, Andrew C R Martin

Affiliation

¹ Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.

Abstract

Background: Understanding and predicting the effects of mutations on protein structure and phenotype is an increasingly important area. Genes for many genetically linked diseases are now routinely sequenced in the clinic. Previously we focused on understanding the structural effects of mutations, creating the SAAPdb resource.

Results: We have updated SAAPdb to include 41% more single nucleotide polymorphisms (SNPs) and 36% more pathogenic deviations (PDs). With the expanded dataset, introducing a hydrophobic residue on the surface, or a hydrophilic residue in the core, no longer shows a significant difference between SNPs and PDs. We have improved some of the analyses, significantly enhancing the analysis of clashes and of mutations to proline and from glycine. A new web interface has been developed, allowing users to analyze their own mutations. Finally, we have developed a machine learning method that gives a cross-validated accuracy of 0.846, considerably outperforming well-known methods, including SIFT and PolyPhen2, which give accuracies between 0.690 and 0.785.
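The abstract reports a cross-validated accuracy but does not describe the classifier or the validation protocol. As a hedged illustration of what such a figure measures, the sketch below computes accuracy as the fraction of correctly labelled mutations, (TP + TN) / N for a two-class problem, from out-of-fold predictions pooled across k folds. The function names, the contiguous unshuffled folds, and the majority-class `train_majority` baseline are hypothetical stand-ins chosen for illustration; they are not the SAAP method and do not reproduce the reported values.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

Label = Hashable
# Hypothetical interface: a trainer takes training samples and labels
# and returns a function that predicts a label for one sample.
Trainer = Callable[[list, list], Callable[[object], Label]]


def accuracy(y_true: Sequence[Label], y_pred: Sequence[Label]) -> float:
    """Fraction of predictions matching the true labels ((TP + TN) / N for binary labels)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def k_fold_indices(n: int, k: int) -> list[list[int]]:
    """Split indices 0..n-1 into k contiguous, near-equal folds (no shuffling)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(samples, labels, train: Trainer, k: int = 10) -> float:
    """Train on k-1 folds, predict the held-out fold, then score all pooled predictions once."""
    n = len(samples)
    y_pred = [None] * n
    for test_idx in k_fold_indices(n, k):
        test_set = set(test_idx)
        train_idx = [i for i in range(n) if i not in test_set]
        predict = train([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            y_pred[i] = predict(samples[i])
    return accuracy(labels, y_pred)


def train_majority(train_x, train_y):
    """Hypothetical baseline classifier: always predict the most common training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda _x: majority


if __name__ == "__main__":
    toy_labels = ["PD"] * 7 + ["SNP"] * 3  # made-up labels, not SAAPdb data
    print(cross_validated_accuracy(list(range(10)), toy_labels, train_majority, k=5))  # 0.7
```

Pooling the out-of-fold predictions before scoring is one common convention; averaging per-fold accuracies is another, and the two can differ when folds are unequal in size. With real mutation data the folds would normally be shuffled, and often stratified by class, before splitting.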

Conclusions: We have updated SAAPdb and improved its analyses; however, given the increasing rate at which mutation data are generated, we have also created a new analysis pipeline and web interface. Machine learning that uses the structural analysis results to predict pathogenicity considerably outperforms other methods.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Substitution / genetics
Artificial Intelligence
Computational Biology / methods*
Genetic Diseases, Inborn / genetics*
Humans
Internet
Mutation / genetics*
Phenotype*
Polymorphism, Single Nucleotide / genetics
Protein Conformation*
Proteins / genetics*
Software*

Substances

Proteins