Non-silent single nucleotide genetic variants that substantially affect protein function and length, such as nonsense changes and insertion-deletion (indel) variants, are prevalent and frequently misclassified. The low sensitivity and specificity of existing variant effect predictors for nonsense and indel variants restrict their use in clinical applications. We propose the Pathogenic Mutation Prediction (PMPred) method to predict the pathogenicity of single nucleotide variations that impair protein function by prematurely terminating elongation of the protein during its synthesis. The prediction starts by monitoring the functional effects of the sequence change, measured as changes in Gene Ontology annotations predicted by an existing ensemble machine learning model (UniGOPred). This, in turn, reveals the mutations whose predicted function deviates significantly from that of the wild-type sequence. We have identified novel harmful mutations in patient data and present them as motivating case studies. We also show that our method has higher sensitivity and specificity than state-of-the-art methods, especially for single nucleotide variations that produce large functional changes in the final protein. As further validation, we performed a comparative docking study on one such variation that is misclassified by existing methods and, using the altered binding affinities, show how PMPred can correctly predict its pathogenicity when other tools miss it. PMPred is freely accessible as a web service at https://pmpred.kansil.org/, and the related code is available at https://github.com/kansil/PMPred.
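The idea described above can be illustrated with a minimal sketch, which is not the PMPred implementation. It shows how a single nucleotide change can introduce a premature stop codon and truncate the protein, and how two vectors of predicted Gene Ontology scores could be compared. The toy sequences, the reduced codon table, the score vectors, and the use of Pearson correlation as the comparison (suggested only by the keywords) are all assumptions made for illustration. The scores are made-up numbers, not UniGOPred output.

```python
from math import sqrt

# Reduced codon table covering only the codons in this toy example.
CODON_TABLE = {
    "ATG": "M", "AAA": "K", "CAG": "Q", "TGG": "W", "GAA": "E",
    "TAG": "*", "TAA": "*", "TGA": "*",
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical wild-type coding sequence: ATG AAA CAG TGG GAA TAA
WILD_TYPE_CDS = "ATGAAACAGTGGGAATAA"
# Nonsense variant: C>T at index 6 turns CAG (Q) into TAG (stop).
MUTANT_CDS = WILD_TYPE_CDS[:6] + "T" + WILD_TYPE_CDS[7:]

WILD_TYPE_PROTEIN = translate(WILD_TYPE_CDS)  # "MKQWE"
MUTANT_PROTEIN = translate(MUTANT_CDS)        # "MK"

# Made-up per-GO-term scores for each protein (illustrative only).
WILD_TYPE_GO_SCORES = [0.9, 0.8, 0.1, 0.7]
MUTANT_GO_SCORES = [0.2, 0.3, 0.6, 0.1]

# A low or negative correlation would indicate a large predicted functional shift.
FUNCTIONAL_SIMILARITY = pearson(WILD_TYPE_GO_SCORES, MUTANT_GO_SCORES)
```

In this sketch, a low correlation between the wild-type and mutant score vectors stands in for a large functional deviation. The abstract does not state how such a deviation is turned into a pathogenicity call, so no threshold is shown.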
Keywords: Gene ontology; Insertion-deletion variants; Mutation; Pearson correlation; Protein sequence; Variant pathogenicity prediction.
Copyright © 2023 Elsevier Ltd. All rights reserved.