Statistical practice in high-throughput screening data analysis

Nat Biotechnol. 2006 Feb;24(2):167-75. doi: 10.1038/nbt1186.


High-throughput screening is an early critical step in drug discovery. Its aim is to screen a large number of diverse chemical compounds to identify candidate 'hits' rapidly and accurately. Few statistical tools are currently available, however, to detect quality hits with a high degree of confidence. We examine statistical aspects of data preprocessing and hit identification for primary screens. We focus on concerns related to positional effects of wells within plates, choice of hit threshold and the importance of minimizing false-positive and false-negative rates. We argue that replicate measurements are needed to verify assumptions of current methods and to suggest data analysis strategies when assumptions are not met. The integration of replicates with robust statistical methods in primary screens will facilitate the discovery of reliable hits, ultimately improving the sensitivity and specificity of the screening process.
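
As a rough illustration of what a robust hit-identification step can look like, the sketch below scores the wells of a single plate with a median/MAD-based z-score and flags wells beyond a cutoff. It is a minimal example under stated assumptions, not the procedure of the paper: the function names `robust_z_scores` and `call_hits`, the cutoff of 3, and the toy plate values are all hypothetical, and the sketch ignores positional (row and column) effects and replicate measurements, which the abstract identifies as central concerns.

```python
from statistics import median

# Consistency constant that makes the MAD comparable to a standard
# deviation when the data are approximately normal.
MAD_TO_SD = 1.4826


def robust_z_scores(values):
    """Center each value by the plate median and scale by 1.4826 * MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half of the wells equal the median; no usable scale.
        raise ValueError("MAD is zero; robust z-scores are undefined")
    scale = MAD_TO_SD * mad
    return [(v - center) / scale for v in values]


def call_hits(values, threshold=3.0):
    """Return indices of wells whose |robust z| meets the cutoff.

    The default threshold of 3 is an illustrative assumption, not a
    recommendation taken from the source.
    """
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) >= threshold]


# Hypothetical raw signals for eight wells of one plate (invented data).
plate = [100, 102, 98, 101, 99, 103, 97, 40]
hits = call_hits(plate)
```

The median and MAD are used here instead of the mean and standard deviation because a few strong hits would otherwise inflate the scale estimate and mask themselves. Moving the cutoff trades false positives against false negatives, which is the threshold-choice problem the abstract raises.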

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Assay / methods*
  • Biometry / methods*
  • Data Interpretation, Statistical*
  • Drug Design*
  • Drug Evaluation, Preclinical / methods*
  • Gene Expression Profiling / methods*
  • Guidelines as Topic
  • Microarray Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity