Machine learning (ML) and quantitative structure-activity relationship (QSAR) methods have been explored for predicting compounds with antibacterial activity and have shown impressive performance. It is desirable to test additional ML methods, to select the most representative sets of molecular descriptors, and to subject the resulting prediction models to rigorous evaluation. This work evaluated three ML methods, support vector classification (SVC), k-nearest neighbor (k-NN), and the C4.5 decision tree, which were trained and tested on 230 antibacterial and 381 nonantibacterial compounds. A well-established feature selection method was used to select representative molecular descriptors from a larger pool than that used in previously reported studies. The performance of the prediction models was assessed by 5-fold cross-validation and on an independent evaluation set. SVC produced the best prediction accuracies: 96.66% and 98.15% for antibacterial compounds, and 99.50% and 98.02% for nonantibacterial compounds, in 5-fold cross-validation and on the independent evaluation set, respectively. These accuracies are slightly better than those of previously reported ML and QSAR models and exceed those of the k-NN and C4.5 decision tree models developed in this work. Our study suggests that ML methods, particularly SVC, are potentially useful for facilitating the discovery of antibacterial agents.
© 2008 Wiley Periodicals, Inc.
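The evaluation protocol summarized in the abstract (5-fold cross-validation, with accuracy reported separately for antibacterial and nonantibacterial compounds) can be illustrated with a minimal Python sketch. Only the class sizes (230 and 381) come from the abstract. The function names `stratified_folds` and `per_class_accuracy` are hypothetical, and stratifying the folds by class is an assumption made here for illustration; the abstract does not say how the folds were constructed. No classifier is trained, so the sketch shows only the bookkeeping around a model, not the study's method.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with a similar class ratio in each.

    Stratification is an assumption of this sketch, not a detail from the study.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def per_class_accuracy(y_true, y_pred):
    """Return (accuracy on class 1, accuracy on class 0).

    With 1 = antibacterial and 0 = nonantibacterial, these are the fractions
    of each class that a model labels correctly.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg


# Class sizes from the abstract: 230 antibacterial (1), 381 nonantibacterial (0).
labels = [1] * 230 + [0] * 381
folds = stratified_folds(labels, k=5)

for n, test_idx in enumerate(folds):
    n_pos = sum(labels[i] for i in test_idx)
    print(f"fold {n}: {len(test_idx)} compounds, {n_pos} antibacterial")
```

In each round, one fold would serve as the test set and the remaining four as the training set; `per_class_accuracy` would then be applied to the held-out predictions of whichever classifier is being evaluated.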