Identifying biologically active compound classes using phenotypic screening data and sampling statistics

J Chem Inf Model. 2005 Nov-Dec;45(6):1824-36. doi: 10.1021/ci050087d.


Scoring the activity of compounds in phenotypic high-throughput assays presents a unique challenge because of the limited resolution and inherent measurement error of these assays. Techniques that leverage the structural similarity of compounds within an assay can be used to improve the hit-recovery rate from screening data. A technique is presented that uses clustering and sampling statistics to predict likely compound activity by scoring entire structural classes. A set of phenotypic assays performed against a commercially available compound library was used as a test set. The activity prediction scores produced by class scoring were more reproducible than individual assay measurements, and class scoring recovered known active compounds more efficiently than individual assay measurements because it produced fewer false positives. Known biologically active compounds were recovered 87% of the time using class scores, suggesting a low false-negative rate that compared well with that of individual assay measurements. In addition, class scoring suggested many weak and potentially novel classes of active compounds that individual assay measurements had overlooked.
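The abstract does not give the exact statistic used to score a structural class. As one illustration of "sampling statistics" applied to clusters, the sketch below assumes a hypergeometric test: given a library of N compounds containing K single-assay hits, it asks how likely it is that a random draw of n compounds (the size of the cluster) would contain at least k hits. A small tail probability flags a class as enriched for activity. The function names `hypergeom_tail` and `score_classes`, the boolean hit calls, and the choice of the hypergeometric test are all assumptions made for illustration, not the authors' published method.

```python
from collections import defaultdict
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    math.comb returns 0 when the lower argument exceeds the upper one,
    so impossible terms contribute nothing to the sum.
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def score_classes(cluster_of, is_hit):
    """Score each structural class by its hit enrichment (hypothetical helper).

    cluster_of: dict compound_id -> cluster_id (output of any clustering step)
    is_hit:     dict compound_id -> bool (single-assay hit call)
    Returns dict cluster_id -> tail probability (smaller = more enriched).
    """
    N = len(cluster_of)                               # library size
    K = sum(1 for cpd in cluster_of if is_hit[cpd])   # hits in the library
    members = defaultdict(list)
    for cpd, cl in cluster_of.items():
        members[cl].append(cpd)
    scores = {}
    for cl, cpds in members.items():
        n = len(cpds)                                 # class size
        k = sum(1 for cpd in cpds if is_hit[cpd])     # hits in the class
        scores[cl] = hypergeom_tail(k, n, K, N)
    return scores


# Toy library: class A = c0..c3, class B = c4..c9; hits are c0, c1, c2, c4.
cluster_of = {f"c{i}": ("A" if i < 4 else "B") for i in range(10)}
is_hit = {cpd: False for cpd in cluster_of}
for cpd in ("c0", "c1", "c2", "c4"):
    is_hit[cpd] = True

scores = score_classes(cluster_of, is_hit)
print(scores)
```

In this toy example, class A holds 3 of the 4 hits among its 4 members, giving a tail probability of 25/210 (about 0.119), while class B, with a single hit among 6 members, scores 209/210. Because every member of a class shares the class score, a compound that narrowly missed the hit threshold can still be prioritized if its class is enriched, which is one way class scoring could surface weak actives that individual measurements miss.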

MeSH terms

  • Actins / chemistry
  • Actins / drug effects
  • Algorithms
  • Cluster Analysis
  • Drug Evaluation, Preclinical / statistics & numerical data*
  • Endocytosis / drug effects
  • Entropy
  • Enzyme Inhibitors
  • Methyltransferases / antagonists & inhibitors
  • Microtubules / drug effects
  • Mitochondria / drug effects
  • Models, Statistical*
  • Phenotype
  • Structure-Activity Relationship
  • Terminology as Topic

Substances

  • Actins
  • Enzyme Inhibitors
  • Methyltransferases