Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology

Adrián Bazaga; Dan Leggate; Hendrik Weisser

doi:10.1038/s41598-020-67846-1

Sci Rep. 2020 Jul 1;10(1):10787. doi: 10.1038/s41598-020-67846-1.

Authors

Adrián Bazaga¹ ², Dan Leggate³, Hendrik Weisser⁴

Affiliations

¹ Department of Genetics, University of Cambridge, Cambridge, UK. ar989@cam.ac.uk.
² STORM Therapeutics Ltd, Cambridge, UK. ar989@cam.ac.uk.
³ STORM Therapeutics Ltd, Cambridge, UK.
⁴ STORM Therapeutics Ltd, Cambridge, UK. hendrik.weisser@stormtherapeutics.com.

Abstract

A major cause of failed drug discovery programs is suboptimal target selection, resulting in the development of drug candidates that are potent inhibitors but ineffective at treating the disease. In the genomics era, the availability of large biomedical datasets with genome-wide readouts has the potential to transform target selection and validation. In this study we investigate how computational intelligence methods can be applied to predict novel therapeutic targets in oncology. We compared different machine learning classifiers applied to the task of drug target classification for nine different human cancer types. For each cancer type, a set of "known" target genes was obtained, and equally sized sets of "non-targets" were sampled multiple times from the human protein-coding genes. Models were trained on mutation, gene expression (TCGA), and gene essentiality (DepMap) data. In addition, we generated a numerical embedding of the interaction network of protein-coding genes using deep network representation learning and included the results in the modeling. We assessed feature importance using a random forest classifier and performed feature selection by measuring permutation importance against a null distribution. Our best models achieved good generalization performance as measured by the AUROC metric. With the best model for each cancer type, we ran predictions on more than 15,000 protein-coding genes to identify potential novel targets. Our results indicate that this approach may be useful to inform early stages of the drug discovery pipeline.
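The sampling and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sample_non_targets` and `auroc`, the placeholder gene identifiers, and the toy scores are all assumptions made for illustration. The sketch draws several non-target sets, each the same size as the known-target set, and computes AUROC as the fraction of target/non-target pairs in which the target receives the higher score.

```python
import random


def sample_non_targets(all_genes, known_targets, n_repeats, seed=0):
    """Draw n_repeats non-target sets, each the same size as known_targets.

    Hypothetical helper: illustrates balanced negative sampling only.
    """
    rng = random.Random(seed)
    targets = set(known_targets)
    # Candidate negatives: every gene that is not a known target.
    candidates = sorted(g for g in set(all_genes) if g not in targets)
    if len(candidates) < len(targets):
        raise ValueError("not enough non-target genes to sample from")
    # Each repeat samples without replacement within the set.
    return [rng.sample(candidates, len(targets)) for _ in range(n_repeats)]


def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Placeholder gene identifiers and made-up scores, for illustration only.
    genes = [f"GENE_{i}" for i in range(50)]
    targets = genes[:5]
    rng = random.Random(42)
    score = {g: rng.random() + (0.5 if g in targets else 0.0) for g in genes}
    for negatives in sample_non_targets(genes, targets, n_repeats=3):
        value = auroc([score[g] for g in targets],
                      [score[g] for g in negatives])
        print(round(value, 3))
```

Repeating the negative sampling gives one AUROC per draw, so the spread across draws indicates how sensitive the estimate is to the particular non-target set chosen.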

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Databases, Genetic*
Drug Development*
Drug Discovery*
Gene Regulatory Networks*
Genome, Human*
Genome-Wide Association Study
Humans
Machine Learning
Medical Oncology
Models, Biological*
Neoplasm Proteins* / genetics
Neoplasm Proteins* / metabolism
Neoplasms* / drug therapy
Neoplasms* / genetics
Neoplasms* / metabolism

Substances

Neoplasm Proteins