Modeling the expansion of virtual screening libraries

Nat Chem Biol. 2023 Jun;19(6):712-718. doi: 10.1038/s41589-022-01234-w. Epub 2023 Jan 16.


Recently, 'tangible' virtual libraries have made billions of molecules readily available. Prioritizing these molecules for synthesis and testing demands computational approaches, such as docking. The success of these approaches may depend on library diversity, on the similarity of library molecules to bio-like molecules, and on how receptor fit and artifacts change with library size. We compared a library of 3 million 'in-stock' molecules with billion-plus tangible libraries. The bias toward bio-like molecules is 19,000-fold lower in the tangible library than in the 'in-stock' library. Similarly, thousands of high-ranking molecules, including experimental actives, from five ultra-large-library docking campaigns are dissimilar to bio-like molecules. Meanwhile, better-fitting molecules are found as the library grows, with the docking score improving log-linearly with library size. Finally, as library size increases, so does the number of rare molecules that rank artifactually well. Although the nature of these artifacts changes from target to target, the expectation of their occurrence does not, and simple strategies can minimize their impact.
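A log-linear relationship means that each tenfold increase in library size shifts the score by a roughly constant amount: score ≈ slope · log10(size) + intercept. The abstract does not report fitted coefficients, so the sketch below only shows how such a trend could be estimated from paired (library size, score) observations with an ordinary least-squares fit. It assumes a scoring function where more negative values mean better fit. The function names `fit_log_linear` and `predict_score` and all numbers in the usage example are hypothetical illustrations, not data or code from the study.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(library_sizes: Sequence[float],
                   scores: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of score = slope * log10(size) + intercept.

    Hypothetical helper: assumes one representative score per library size
    (e.g. the best docking score observed at that size).
    """
    if len(library_sizes) != len(scores) or len(library_sizes) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log10(n) for n in library_sizes]  # work on a log10 size axis
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("library sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx  # score change per tenfold increase in library size
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_score(size: float, slope: float, intercept: float) -> float:
    """Extrapolate the fitted trend to a new (hypothetical) library size."""
    return slope * math.log10(size) + intercept


if __name__ == "__main__":
    # Hypothetical, illustrative values only (not results from the paper).
    sizes = [1e6, 1e7, 1e8, 1e9]
    best_scores = [-50.0, -55.0, -60.0, -65.0]
    slope, intercept = fit_log_linear(sizes, best_scores)
    print(f"slope per 10x: {slope:.2f}, intercept: {intercept:.2f}")
    print(f"extrapolated at 1e10: {predict_score(1e10, slope, intercept):.2f}")
```

With these made-up points the fit gives a slope of -5 per tenfold increase, so a tenfold larger library would be expected to improve the score by about 5 units. On Python 3.10 and later, `statistics.linear_regression` applied to the log10-transformed sizes would give the same fit; the explicit version is kept here to make the log transform visible.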

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Libraries, Digital*
  • Molecular Docking Simulation