Motivation: It becomes widely accepted that human cancer is a disease involving dynamic changes in the genome and that the missense mutations constitute the bulk of human genetic variations. A multitude of computational algorithms, especially the machine learning-based ones, has consequently been proposed to distinguish missense changes that contribute to the cancer progression ('driver' mutation) from those that do not ('passenger' mutation). However, the existing methods have multifaceted shortcomings, in the sense that they either adopt incomplete feature space or depend on protein structural databases which are usually far from integrated.
Results: In this article, we investigated multiple aspects of a missense mutation and identified a novel feature space that well distinguishes cancer-associated driver mutations from passenger ones. An index (DX score) was proposed to evaluate the discriminating capability of each feature, and a subset of these features which ranks top was selected to build the SVM classifier. Cross-validation showed that the classifier trained on our selected features significantly outperforms the existing ones both in precision and robustness. We applied our method to several datasets of missense mutations culled from published database and literature and obtained more reasonable results than previous studies.
Availability: The software is available online at http://www.methodisthealth.com/software and https://sites.google.com/site/drivermutationidentification/.
Supplementary information: Supplementary data are available at Bioinformatics online.