Context Aware Data-Driven Retrosynthetic Analysis

J Chem Inf Model. 2020 Jun 22;60(6):2728-2738. doi: 10.1021/acs.jcim.9b01141. Epub 2020 Apr 24.

Abstract

Modern drug discovery is an iterative process that relies on hypothesis generation, which exploits available data, and hypothesis testing, which produces the informative results needed for subsequent rounds of exploration. In this setting, hypothesis generation consists of designing chemical structures likely to meet the pharmaceutically relevant objectives of the discovery project, while hypothesis testing involves compound synthesis and biological assays to query the hypothesis. Although much attention has been paid to effective compound design, hypothesis generation efforts often lead to novel chemical structure designs with no established synthesis route. We introduce a chemical-context-aware, data-driven method, built upon millions of available reactions and with attractive run-time characteristics, that recommends synthetic routes matching a precedent-derived template. Coupled with modern automated synthesis platforms and available building block collections, the method enables drug discovery researchers to identify routes for target compounds that are easy to interpret and implement. We present results from this in-house computer-aided synthesis platform, termed ChemoPrint, demonstrating how such tools can bridge chemical synthesis knowledge with synthetic resources and facilitate hypothesis testing, thereby reducing the time required to complete an idea-to-data drug discovery cycle.

MeSH terms

  • Drug Discovery*