Confirmation of primary active substances from high throughput screening of chemical and biological populations: a statistical approach and practical considerations

J Comb Chem. May-Jun 2000;2(3):258-65. doi: 10.1021/cc9900706.


Many biologically important substances are discovered by screening relevant chemical or biological libraries. The ability to find the active substances ("hits") in any random collection is largely determined by the quality of the assay and the screening conditions. When a large population is screened for a specific characteristic, each member of that population is usually tested only once. Because every measurement carries some error, follow-up tests are needed to confirm that each hit from the primary screen is truly active. In this report, we present a statistical model that predicts the reliability of hits from a primary test as a function of the assay error and the choice of hit threshold (hit limit). Using computational simulation of this model, we analyzed the hit confirmation rate together with the false positive rate (substances that initially fall above the hit limit but whose true activity is below it) and the false negative rate (substances that initially fall below the hit limit but whose true activity is in fact above it). This model can also be used in screen validation and post-screening data analysis. The statistical analysis presented here has broad implications and applies to the screening of any large population for any specific characteristic. Obvious applications include drug discovery, gene chip analysis, population biology, directed molecular evolution, biological panning, and combinatorial materials science.
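The abstract does not give the model's equations, so the following is only a minimal Monte Carlo sketch of the kind of simulation it describes, built on assumptions of our own rather than the authors' formulation: true activities are normally distributed, each measurement adds independent Gaussian assay error, and "confirmation" means a single independent re-test that again exceeds the hit limit. The function `simulate_screen`, its parameters, and the rate definitions (false positive rate as a fraction of primary hits, false negative rate as a fraction of truly active substances) are hypothetical choices made for illustration.

```python
import random


def simulate_screen(n=100_000, true_mean=0.0, true_sd=1.0,
                    assay_sd=0.3, hit_limit=2.0, seed=0):
    """Simulate a single-pass primary screen followed by one re-test of each hit.

    Hypothetical model (not the paper's): true activity ~ N(true_mean, true_sd),
    each measurement = true activity + N(0, assay_sd) error.
    """
    rng = random.Random(seed)
    hits = confirmed = false_pos = false_neg = truly_active = 0
    for _ in range(n):
        true_activity = rng.gauss(true_mean, true_sd)
        primary = true_activity + rng.gauss(0.0, assay_sd)  # one noisy primary test
        is_active = true_activity > hit_limit
        truly_active += int(is_active)
        if primary > hit_limit:
            hits += 1
            if not is_active:
                false_pos += 1  # above the limit by chance, truly below it
            # Confirmation: an independent re-measurement of the same substance.
            retest = true_activity + rng.gauss(0.0, assay_sd)
            if retest > hit_limit:
                confirmed += 1
        elif is_active:
            false_neg += 1  # truly above the limit but missed in the primary test
    return {
        "hits": hits,
        "confirmation_rate": confirmed / max(hits, 1),
        "false_positive_rate": false_pos / max(hits, 1),
        "false_negative_rate": false_neg / max(truly_active, 1),
    }


if __name__ == "__main__":
    # Sweep the hit limit at a fixed (assumed) assay error.
    for limit in (1.5, 2.0, 2.5, 3.0):
        r = simulate_screen(assay_sd=0.5, hit_limit=limit)
        print(limit, r["hits"], round(r["confirmation_rate"], 3),
              round(r["false_positive_rate"], 3), round(r["false_negative_rate"], 3))
```

Under these assumptions, increasing `assay_sd` lowers the confirmation rate and raises both error rates, which is the qualitative trade-off the abstract describes. Sweeping `hit_limit`, as in the usage loop, shows how the threshold choice shifts the balance between false positives and false negatives.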

MeSH terms

  • Biological Factors / analysis*
  • Biological Factors / pharmacology
  • Sensitivity and Specificity

Substances

  • Biological Factors