Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds

J Chem Inf Model. 2008 Jun;48(6):1227-37. doi: 10.1021/ci800022e. Epub 2008 Jun 6.


The virtual screening performance of support vector machines (SVM) depends on the diversity of the active and inactive compounds used for training. While diverse inactive compounds can be routinely generated, the number and diversity of known actives are typically low. We evaluated the performance of SVM trained on sparsely distributed actives in six MDDR biological target classes that contain large numbers of known actives (983-1645) and span high, intermediate, and low structural diversity (muscarinic M1 receptor agonists, NMDA receptor antagonists, thrombin inhibitors, HIV protease inhibitors, cephalosporins, and renin inhibitors). SVM trained on regularly sparse data sets of 100 actives show improved yields at substantially reduced false-hit rates compared to those of published studies and those of a Tanimoto-based similarity searching method based on the same data sets and molecular descriptors. SVM trained on very sparse data sets of 40 actives (2.4%-4.1% of the known actives) predicted 17.5%-39.5%, 23.0%-48.1%, and 70.2%-92.4% of the remaining 943-1605 actives in the high, intermediate, and low diversity classes, respectively, 13.8%-68.7% of which are outside the training compound families. SVM predicted 99.97% and 97.1% of the 9.997M PubChem and 167K remaining MDDR compounds, respectively, as inactive, and 2.6%-8.3% of the 19,495-38,483 MDDR compounds similar to the known actives as active. These results suggest that SVM has substantial capability in identifying novel active compounds from sparse active data sets at low false-hit rates.
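For readers unfamiliar with the similarity-searching baseline named above, the following is a minimal sketch of a Tanimoto-based search. It assumes compounds are represented as sets of "on" bit indices from a binary fingerprint; the abstract does not specify the descriptor encoding, so this representation, the function names `tanimoto` and `similarity_search`, and the threshold value are all illustrative assumptions rather than the authors' procedure.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def similarity_search(known_actives, library, threshold=0.9):
    """Return ids of library compounds whose highest Tanimoto similarity
    to any known active is at least `threshold` (hypothetical cutoff)."""
    hits = []
    for compound_id, fingerprint in library.items():
        best = max(
            (tanimoto(fingerprint, active) for active in known_actives),
            default=0.0,
        )
        if best >= threshold:
            hits.append(compound_id)
    return hits
```

In this sketch a compound is flagged as a hit when it is close enough to at least one known active, which is why such a search tends to stay within the training compound families, whereas a trained classifier may generalize beyond them.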

Publication types

  • Evaluation Study

MeSH terms

  • Artificial Intelligence*
  • Cephalosporins / chemistry
  • Cephalosporins / pharmacology
  • Drug Evaluation, Preclinical / methods*
  • Protease Inhibitors / chemistry
  • Protease Inhibitors / pharmacology
  • Receptors, Cell Surface / antagonists & inhibitors

Substances

  • Cephalosporins
  • Protease Inhibitors
  • Receptors, Cell Surface