QSAR classification model for antibacterial compounds and its use in virtual screening

J Chem Inf Model. 2012 Oct 22;52(10):2559-69. doi: 10.1021/ci300336v. Epub 2012 Oct 8.


As novel and drug-resistant bacterial strains continue to present an emerging health threat, the development of new antibacterial agents is critical. This includes improving existing antibacterial scaffolds as well as identifying novel ones. The aim of this study is to apply a Bayesian classification QSAR approach to rapidly screen chemical libraries for compounds predicted to have antibacterial activity. Toward this end, we assembled a data set of 317 known antibacterial compounds as well as a second data set of diverse, well-validated, non-antibacterial compounds from 215 PubChem Bioassays against various bacterial species. We constructed a Bayesian classification model using structural fingerprints and physicochemical property descriptors and achieved an accuracy of 84% and a precision of 86% on an independent test set in identifying antibacterial compounds. To demonstrate the practical applicability of the model in virtual screening, we screened an independent data set of ~200k compounds. The results show that the model can identify PubChem Bioassay actives among its top-ranked hits with an accuracy of up to ~76%, representing a 1.5- to 2-fold enrichment. The top screened hits included both known antibacterial scaffolds and novel scaffolds. Our study suggests that a well-validated Bayesian classification QSAR approach could complement other screening approaches in identifying novel and promising hits. The data sets used in constructing and validating this model have been made publicly available.
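The abstract does not specify which form of Bayesian classifier was used. The sketch below therefore assumes a Bernoulli naive Bayes with Laplace (add-one) smoothing over binary fingerprint bits, trained on toy four-bit fingerprints. The class `FingerprintNaiveBayes` and the helper functions are hypothetical names, not the authors' code or any library's API. The physicochemical descriptors mentioned above are omitted; continuous values would need binning or a different likelihood. The sketch also shows how accuracy and precision follow from the confusion counts, and one common definition of enrichment: the hit rate among the top-ranked compounds divided by the overall hit rate.

```python
"""Toy Bernoulli naive Bayes on binary fingerprints plus evaluation metrics.
Hypothetical illustration only; not the model described in the abstract."""
import math
from typing import Dict, List, Sequence

Fingerprint = Sequence[int]  # assumed: one 0/1 entry per structural key


class FingerprintNaiveBayes:
    """Naive Bayes over binary bits with Laplace smoothing (assumes both classes present)."""

    def fit(self, X: List[Fingerprint], y: List[int]) -> "FingerprintNaiveBayes":
        n_bits = len(X[0])
        self.log_prior: Dict[int, float] = {}
        self.log_on: Dict[int, List[float]] = {}
        self.log_off: Dict[int, List[float]] = {}
        for c in (0, 1):  # 1 = antibacterial (active), 0 = inactive
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            on, off = [], []
            for j in range(n_bits):
                count = sum(r[j] for r in rows)
                p = (count + 1) / (len(rows) + 2)  # smoothing keeps p strictly in (0, 1)
                on.append(math.log(p))
                off.append(math.log(1 - p))
            self.log_on[c], self.log_off[c] = on, off
        return self

    def score(self, x: Fingerprint) -> float:
        """Log-odds of active vs inactive; higher means more likely active."""
        totals = {}
        for c in (0, 1):
            s = self.log_prior[c]
            for j, bit in enumerate(x):
                s += self.log_on[c][j] if bit else self.log_off[c][j]
            totals[c] = s
        return totals[1] - totals[0]

    def predict(self, x: Fingerprint) -> int:
        return 1 if self.score(x) > 0 else 0


def accuracy_precision(y_true: List[int], y_pred: List[int]) -> tuple:
    """Accuracy = (TP + TN) / N; precision = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return (tp + tn) / len(y_true), precision


def enrichment_factor(scores: List[float], y_true: List[int], top_n: int) -> float:
    """Hit rate among the top_n highest-scoring compounds / overall hit rate."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    top_hits = sum(label for _, label in ranked[:top_n])
    base_rate = sum(y_true) / len(y_true)
    return (top_hits / top_n) / base_rate


if __name__ == "__main__":
    X_train = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],
               [0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1]]
    y_train = [1, 1, 1, 0, 0, 0]
    model = FingerprintNaiveBayes().fit(X_train, y_train)

    X_test = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
    y_test = [1, 0, 1, 0]
    preds = [model.predict(x) for x in X_test]
    print(accuracy_precision(y_test, preds))  # → (1.0, 1.0)
    scores = [model.score(x) for x in X_test]
    print(enrichment_factor(scores, y_test, top_n=2))  # → 2.0
```

Naive Bayes is a natural fit for this kind of sketch because each fingerprint bit contributes an independent log-likelihood term, so scoring a ~200k-compound library is a single linear pass. The independence assumption is rarely true for overlapping substructure keys, which is one reason the validation on an independent test set matters.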

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Anti-Bacterial Agents / chemistry*
  • Anti-Bacterial Agents / pharmacology
  • Bayes Theorem
  • Computer Simulation
  • Databases, Chemical
  • Drug Design
  • Drug Discovery
  • Gram-Negative Bacteria / drug effects
  • Gram-Positive Bacteria / drug effects
  • High-Throughput Screening Assays
  • Humans
  • Models, Chemical
  • Molecular Structure
  • Quantitative Structure-Activity Relationship*
  • ROC Curve
  • Reproducibility of Results
  • Small Molecule Libraries / chemistry*
  • Small Molecule Libraries / pharmacology
  • User-Computer Interface*

Substances

  • Anti-Bacterial Agents
  • Small Molecule Libraries