An open-source drug discovery platform enables ultra-large virtual screens

Nature. 2020 Apr;580(7805):663-668. doi: 10.1038/s41586-020-2117-z. Epub 2020 Mar 9.


On average, an approved drug currently costs US$2-3 billion and takes more than 10 years to develop [1]. In part, this is due to expensive and time-consuming wet-laboratory experiments, poor initial hit compounds and high attrition rates in the (pre-)clinical phases. Structure-based virtual screening has the potential to mitigate these problems: the quality of the hits improves with the number of compounds screened [2]. However, although large databases of compounds exist, carrying out large-scale structure-based virtual screening on computer clusters in an accessible, efficient and flexible manner has remained difficult. Here we describe VirtualFlow, a highly automated and versatile open-source platform with perfect scaling behaviour that is able to prepare and efficiently screen ultra-large libraries of compounds. VirtualFlow is able to use a variety of the most powerful docking programs. Using VirtualFlow, we prepared one of the largest freely available ready-to-dock ligand libraries, with more than 1.4 billion commercially available molecules. To demonstrate the power of VirtualFlow, we screened more than 1 billion compounds and identified a set of structurally diverse molecules that bind to KEAP1 with submicromolar affinity. One of the lead inhibitors (iKeap1) engages KEAP1 with nanomolar affinity (dissociation constant (Kd) = 114 nM) and disrupts the interaction between KEAP1 and the transcription factor NRF2. This illustrates the potential of VirtualFlow to access vast regions of the chemical space and identify molecules that bind with high affinity to target proteins.
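
To put the reported dissociation constant in energetic terms, the sketch below converts a Kd into a standard binding free energy using the textbook relation ΔG° = RT ln(Kd / 1 M). This is an illustration only, not part of VirtualFlow: the function name, the 1 M standard state and the temperature of 298.15 K are assumptions, since the abstract does not state the measurement conditions.

```python
import math

# Molar gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Return the standard binding free energy in kcal/mol.

    Uses dG = R * T * ln(Kd / 1 M). The default temperature (298.15 K)
    is an assumption, not a value taken from the paper.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


# Kd reported for iKeap1 binding to KEAP1: 114 nM.
dg_ikeap1 = binding_free_energy(114e-9)
print(f"dG(iKeap1) = {dg_ikeap1:.2f} kcal/mol")
```

Under these assumptions, 114 nM corresponds to roughly -9.5 kcal/mol. Each tenfold improvement in Kd adds about -1.4 kcal/mol at this temperature.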

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Access to Information
  • Automation / methods
  • Automation / standards
  • Cloud Computing
  • Computer Simulation
  • Databases, Chemical
  • Drug Discovery / methods*
  • Drug Discovery / standards
  • Drug Evaluation, Preclinical / methods*
  • Drug Evaluation, Preclinical / standards
  • Kelch-Like ECH-Associated Protein 1 / antagonists & inhibitors
  • Kelch-Like ECH-Associated Protein 1 / chemistry
  • Kelch-Like ECH-Associated Protein 1 / metabolism
  • Ligands
  • Molecular Docking Simulation / methods*
  • Molecular Docking Simulation / standards
  • Molecular Targeted Therapy
  • NF-E2-Related Factor 2 / metabolism
  • Reproducibility of Results
  • Software* / standards
  • Thermodynamics
  • User-Computer Interface*

Substances

  • KEAP1 protein, human
  • Kelch-Like ECH-Associated Protein 1
  • Ligands
  • NF-E2-Related Factor 2
  • NFE2L2 protein, human