Parallel worlds of public and commercial bioactive chemistry data

J Med Chem. 2015 Mar 12;58(5):2068-76. doi: 10.1021/jm5011308. Epub 2014 Dec 4.


The availability of structures and linked bioactivity data in databases is powerfully enabling for drug discovery and chemical biology. However, we now review some confounding issues with the divergent expansions of public and commercial sources of chemical structures. These issues are associated not only with expanding patent extraction but also with increasingly large vendor collections, amassed via selection criteria that differ between SciFinder from Chemical Abstracts Service (CAS) and major public sources such as PubChem, ChemSpider, UniChem, and others. These increasingly massive collections may include both real and virtual compounds, as well as so-called prophetic compounds from patents. We address a range of issues raised by the challenges faced in resolving the NIH probe compounds. In addition, we highlight the confounding of prior-art searching by virtual compounds that could impact the composition-of-matter patentability of a new medicinal chemistry lead. Finally, we propose some potential solutions.

MeSH terms

  • Chemistry, Pharmaceutical*
  • Computational Biology / methods*
  • Databases, Factual*
  • Drug Discovery / methods*
  • Humans
  • Information Storage and Retrieval
  • Patents as Topic
  • Pharmaceutical Preparations / chemistry*
  • Pharmaceutical Preparations / metabolism*
  • Structure-Activity Relationship

Substances

  • Pharmaceutical Preparations