Exploiting cheminformatic and machine learning to navigate the available chemical space of potential small molecule inhibitors of SARS-CoV-2

Comput Struct Biotechnol J. 2021;19:424-438. doi: 10.1016/j.csbj.2020.12.028. Epub 2020 Dec 29.


The current life-threatening and tenacious pandemic of coronavirus disease 2019 (COVID-19) poses a significant global hazard owing to its high mortality rate, the resulting economic meltdown, and the disruption of everyday life. The rapid spread of COVID-19 demands countermeasures to combat this deadly virus. Currently, there are no drugs approved by the FDA to treat COVID-19. Therefore, discovering small-molecule therapeutics to treat COVID-19 infection is essential. So far, only a few small-molecule inhibitors of coronaviruses have been reported. There is a need to expand the small chemical space of coronavirus inhibitors by adding potent and selective scaffolds with anti-COVID activity. In this context, the large antiviral chemical space already available can be analyzed using cheminformatics and machine learning to unearth new scaffolds. For this purpose, we created three datasets: the "antiviral dataset" (N = 38,428), the "drug-like antiviral dataset" (N = 20,963), and the "anticorona dataset" (N = 433). We analyzed the 433 molecules of the "anticorona dataset" in terms of scaffold diversity, physicochemical descriptor distributions, principal component analysis, activity cliffs, R-group decomposition, and scaffold mapping. Murcko scaffold analysis showed that the "anticorona dataset" covers a broad range of diverse chemical scaffolds. However, physicochemical descriptor analysis and principal component analysis indicated negligible drug-like features among the "anticorona dataset" molecules. The "antiviral dataset" and "drug-like antiviral dataset" showed low scaffold diversity as measured by the Gini coefficient. Hierarchical clustering of the "antiviral dataset" against the "anticorona dataset" revealed little molecular similarity between the two.
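As an illustration of how a Gini coefficient can quantify scaffold diversity, the sketch below measures how unevenly molecules are distributed over their scaffolds: 0 means every scaffold holds the same number of molecules, and values closer to 1 mean a few scaffolds dominate (lower diversity). It assumes each molecule has already been reduced to a scaffold label (for example, a Murcko scaffold SMILES, which RDKit can produce with `MurckoScaffold.MurckoScaffoldSmiles`). The function names `gini_coefficient` and `scaffold_gini` are hypothetical, and the standard rank-weighted (Lorenz-curve) formula used here is not necessarily the exact formulation used in this study.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def gini_coefficient(counts: Sequence[int]) -> float:
    """Gini coefficient of non-negative counts (0 = perfectly even distribution)."""
    if not counts:
        raise ValueError("counts must be non-empty")
    values = sorted(counts)  # ascending order, as required by the Lorenz-curve form
    n = len(values)
    total = sum(values)
    if total == 0:
        raise ValueError("counts must not all be zero")
    # Rank-weighted sum: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def scaffold_gini(scaffold_per_molecule: Iterable[Hashable]) -> float:
    """Gini coefficient of how molecules are spread across their scaffolds.

    Assumption: each item is a precomputed scaffold label for one molecule.
    """
    counts = Counter(scaffold_per_molecule)  # molecules per distinct scaffold
    return gini_coefficient(list(counts.values()))


if __name__ == "__main__":
    # Hypothetical toy set: 8 molecules share a benzene scaffold,
    # one has a pyrrolidine scaffold and one an indole scaffold.
    labels = ["c1ccccc1"] * 8 + ["C1CCNC1", "c1ccc2[nH]ccc2c1"]
    print(round(scaffold_gini(labels), 4))  # prints 0.4667
```

In this toy example one scaffold accounts for 8 of 10 molecules, so the coefficient is well above 0, whereas three scaffolds with equal counts would give 0.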
We generated a library of frequent fragments and polypharmacological ligands targeting various essential viral proteins, such as the main protease, helicase, papain-like protease, and replicase polyprotein 1ab. The structural and chemical features of the "anticorona dataset" were further compared with those of SARS-CoV-2 repurposed drugs, FDA-approved drugs, natural products, and drugs currently in clinical trials. Using the machine learning tool DCA (DMax Chemistry Assistant), we converted the "anticorona dataset" into a hypothesis with significant functional biological relevance. Machine learning analysis suggested that the FDA-approved drugs Tizanidine HCl, Cefazolin, Raltegravir, Azilsartan, Acalabrutinib, Luliconazole, Sitagliptin, Meloxicam (Mobic), Succinyl sulfathiazole, Fluconazole, and Pranlukast could be repurposed as effective drugs for COVID-19. Fragment-based scaffold analysis and R-group decomposition identified the pyrrolidine and indole scaffolds as potent fragments for designing and synthesizing novel drug-like molecules targeting SARS-CoV-2. This comprehensive and systematic assessment of the entire chemical space of small-molecule antiviral therapeutics yielded critical insights into potentially privileged scaffolds that could aid the enrichment and rapid discovery of efficacious antiviral drugs for COVID-19.

Keywords: COVID, COronaVIrus Disease; COVID-19; Chemical space; FDA, Food and Drug Administration; Gini coefficient; Repurpose drugs; SARS-CoV-2; SARS-CoV-2, Severe Acute Respiratory Syndrome CoronaVirus-2; WHO, World Health Organization.