Structure-based Method for Predicting Deleterious Missense SNPs

IEEE EMBS Int Conf Biomed Health Inform. 2019 May:2019:10.1109/bhi.2019.8834504. doi: 10.1109/bhi.2019.8834504. Epub 2019 Sep 12.

Abstract

Missense SNPs are key contributors to many Mendelian disorders and complex diseases. Identifying whether a single amino acid substitution will lead to pathological effects is important for interpreting personal genomes and for precision medicine. In this study, we describe a novel method for predicting whether a missense SNP is likely to have pathological effects. Our approach integrates sequence information, biophysical properties, and topological properties of protein structures. On our test dataset of 500 deleterious and 500 neutral variants, our method achieves an accuracy of 0.823, and the ROC curve of the model has an AUC of 0.910. Our method outperforms two well-known methods and is comparable to the widely used PolyPhen-2 method, while requiring a much smaller amount (approximately 25%) of training data. Our method can be used to aid in distinguishing driver from passenger mutations in cancer and in assessing missense mutations associated with rare diseases. It can also be used to identify mutations in rare diseases where only limited patient exome data exist.
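
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and ROC AUC can be computed for a binary deleterious/neutral classifier. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the eight toy labels and scores are all hypothetical and chosen only for illustration. On a balanced test set of 1,000 variants, an accuracy of 0.823 corresponds to 823 correct classifications.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of variants classified correctly at a given score threshold.

    labels: 1 = deleterious, 0 = neutral (hypothetical encoding).
    scores: predicted probability of being deleterious.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """ROC AUC computed as the probability that a randomly chosen deleterious
    variant scores higher than a randomly chosen neutral one (ties count 0.5).
    This pairwise form is equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data: four deleterious and four neutral variants.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print("accuracy:", accuracy(labels, scores))
print("ROC AUC:", roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures can differ noticeably for the same model.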