J Cheminform, 4 (1), 10

In-silico Predictive Mutagenicity Model Generation Using Supervised Learning Approaches


Abhik Seal et al. J Cheminform.

Abstract

Background: Experimental screening of chemical compounds for biological activity is a time-consuming and expensive practice. In silico predictive models permit inexpensive, rapid "virtual screening" to prioritize the selection of compounds for experimental testing. Both experimental and in silico screening can be used to test compounds for desirable or undesirable properties. Prior work on the prediction of mutagenicity has primarily involved the identification of toxicophores rather than whole-molecule predictive models. In this work, we examined a range of in silico classification models for predicting the mutagenic properties of compounds, including methods such as J48 and SMO, which have not previously been widely applied in cheminformatics.

Results: The Bursi mutagenicity data set containing 4337 compounds (Set 1) and a Benchmark data set of 6512 compounds (Set 2) were used as input data in this work. A third data set (Set 3) was prepared by merging the previous two sets. Classification algorithms including Naïve Bayes, Random Forest, J48 and SMO, run with 10-fold cross-validation and default parameters, were used for model generation on these data sets. Models built using the combined data set performed better than those developed from the Benchmark data set. Significantly, Random Forest outperformed the other classifiers on all the data sets, especially on Set 3, with 89.27% accuracy, 89% precision and a ROC area of 95.3%. To validate the developed models, two external data sets with mutagenicity data were tested: AID1189 gave 62% accuracy, 67% precision and 65% ROC area, and AID1194 gave 91% accuracy, 91% precision and 96.3% ROC area. A Random Forest model was also applied to approved drugs from DrugBank and metabolites from the ZINC database, with a true positive rate of almost 85%, showing the robustness of the model.
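The 10-fold cross-validation mentioned above can be illustrated with a minimal sketch. It is not the authors' Weka workflow: it assumes stratified folds (each fold keeps roughly the same class ratio), and the function names and toy labels are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share
        # of this class; the offset keeps fold sizes balanced across classes.
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, test


# Toy labels (hypothetical, not from the paper): 60 mutagens, 40 non-mutagens.
labels = ["mutagen"] * 60 + ["nonmutagen"] * 40
splits = list(cross_validation_splits(labels, k=10))
```

Each compound appears in exactly one test fold, so every prediction used for the final accuracy estimate comes from a model that did not see that compound during training.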

Conclusion: We have created a new mutagenicity benchmark data set with around 8,000 compounds. Our work shows that highly accurate predictive mutagenicity models can be built using machine learning methods based on chemical descriptors and trained on this set, and that these models complement toxicophore-based methods. Further, our work supports other recent literature in showing that Random Forest models generally outperform other comparable machine learning methods for this kind of application.

Figures

Figure 1
The diagram represents the knowledge workflow model of the Weka software environment.
Figure 2
Number of Set 1 compounds classified as TP, FN, FP and TN by the Naïve Bayes, Random Forest, J48 and SMO classifiers.
Figure 3
Number of Set 3 compounds classified as TP, FN, FP and TN by the Naïve Bayes, Random Forest, J48 and SMO classifiers.
Figure 4
Number of Set 3 compounds classified as TP, FN, FP and TN by the Naïve Bayes, Random Forest, J48 and SMO classifiers.
Figure 5
Set 3 variable importance graph.
Figure 6
Some compounds that are mutagenic but were predicted as non-mutagenic (false negatives) by Random Forest in the test set.
Figure 7
False positive compounds from the test sets.
Figure 8
Some drugs predicted as false positives.
Figure 9
Withdrawn drug compounds predicted as false negatives.
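
The TP, FN, FP and TN counts shown in the figures relate to the accuracy, precision and true positive rate quoted in the abstract through the standard definitions. The sketch below is only an illustration: the class name and the example counts are hypothetical, not values from the paper, and it computes precision for the positive (mutagen) class only, whereas the abstract does not state whether its precision is per-class or averaged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'mutagen' as the positive class."""
    tp: int  # mutagens predicted as mutagens
    fn: int  # mutagens predicted as non-mutagens
    fp: int  # non-mutagens predicted as mutagens
    tn: int  # non-mutagens predicted as non-mutagens

    @staticmethod
    def _ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    def accuracy(self):
        total = self.tp + self.fn + self.fp + self.tn
        return self._ratio(self.tp + self.tn, total)

    def precision(self):
        return self._ratio(self.tp, self.tp + self.fp)

    def true_positive_rate(self):
        return self._ratio(self.tp, self.tp + self.fn)


# Hypothetical counts for illustration only (not taken from the paper).
counts = ConfusionCounts(tp=85, fn=15, fp=10, tn=90)
accuracy = counts.accuracy()
precision = counts.precision()
tpr = counts.true_positive_rate()
```

With these made-up counts, 85 of 100 true mutagens are recovered, which corresponds to a true positive rate of 85%.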


