To improve the yield of drug discovery, we need a method that can be applied at the very beginning of the process and that accurately predicts a drug's side effects, indications, efficacy, and mode of action from its chemical structure alone. Existing predictive methods, however, do not comprehensively address these aspects of drug discovery. They also rely on features derived from extensive experimental information that is often unavailable for novel molecules. To address these issues, we developed MEDICASCY, a multilabel-based boosted random forest machine learning method that requires only the small molecule's chemical structure to predict drug side effects, indications, efficacy, and probable mode-of-action targets. Although it uses far less information, it performs comparably to, or significantly better than, existing approaches. In retrospective benchmarking on high-confidence predictions, MEDICASCY achieves about 78% precision and recall for predicting at least one severe side effect and 72% precision for drug efficacy. Experimental validation of MEDICASCY's efficacy predictions on novel molecules shows close to 80% precision for growth inhibition in ovarian, breast, and prostate cancer cell lines. Thus, MEDICASCY should improve the success rate for new drug approval. A web service for academic users is available at http://pwp.gatech.edu/cssb/MEDICASCY.
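The side-effect figure quoted above, precision and recall for predicting at least one severe side effect, can be read as a per-molecule binary task. The Python sketch that follows assumes that reading. A molecule counts as positive if its side-effect label set contains any label from a user-supplied "severe" set. Precision and recall are then computed over molecules. The function name `at_least_one_metrics`, the side-effect labels, and the toy data are hypothetical and for illustration only. The sketch does not reproduce MEDICASCY's benchmark protocol or how it selects high-confidence predictions.

```python
from typing import Sequence, Set, Tuple


def at_least_one_metrics(
    predicted: Sequence[Set[str]],
    actual: Sequence[Set[str]],
    severe: Set[str],
) -> Tuple[float, float]:
    """Precision and recall for the binary task 'has >= 1 severe side effect'.

    Hypothetical helper: each element of `predicted` / `actual` is the set of
    side-effect labels for one molecule. A molecule is positive if its label
    set intersects `severe`.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = fn = 0
    for pred, true in zip(predicted, actual):
        pred_pos = bool(pred & severe)  # predicted to have a severe effect
        true_pos = bool(true & severe)  # actually has a severe effect
        if pred_pos and true_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif true_pos:
            fn += 1
    # Guard against division by zero when there are no positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example with made-up labels (not data from the study).
severe = {"hepatotoxicity", "cardiotoxicity"}
predicted = [{"hepatotoxicity"}, {"nausea"}, {"cardiotoxicity"}, set()]
actual = [{"hepatotoxicity", "nausea"}, {"cardiotoxicity"}, set(), set()]
print(at_least_one_metrics(predicted, actual, severe))  # (0.5, 0.5)
```

In the toy data, the first molecule is a true positive, the second a false negative, and the third a false positive. Precision and recall are therefore both 0.5.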
Keywords: drug clinical trials; drug efficacy prediction; drug side effect prediction; indication; machine learning; mode of action protein.