Machine learning model to predict oncologic outcomes for drugs in randomized clinical trials

Int J Cancer. 2020 Nov 1;147(9):2537-2549. doi: 10.1002/ijc.33240. Epub 2020 Aug 19.


Predicting oncologic outcome is challenging due to the diversity of cancer histologies and the complex network of underlying biological factors. In this study, we assess whether machine learning (ML) can extract meaningful associations between oncologic outcome and clinical trial, drug-related biomarker and molecular profile information. We analyzed therapeutic clinical trials corresponding to 1102 oncologic outcomes from 104 758 cancer patients with advanced colorectal adenocarcinoma, pancreatic adenocarcinoma, melanoma and non-small-cell lung cancer. For each intervention arm, a dataset with the following attributes was curated: line of treatment; the number of cytotoxic chemotherapies, small-molecule inhibitors or monoclonal antibody agents; drug class; molecular alteration status of the clinical arm's population; cancer type; probability of drug sensitivity (PDS) (integrating the status of genomic, transcriptomic and proteomic biomarkers in the population of interest); and outcome. A total of 467 progression-free survival (PFS) and 369 overall survival (OS) data points were used as training sets to build our ML (random forest) model. Cross-validation on the PFS and OS training sets yielded correlation coefficients (r) of 0.82 and 0.70, respectively (outcome vs model's parameters). A total of 156 PFS and 110 OS data points were used as test sets. The Spearman correlation (rs) between predicted and actual outcomes was statistically significant (PFS: rs = 0.879, OS: rs = 0.878, P < .0001). The better outcome arm was predicted in 81% (PFS: N = 59/73, z = 5.24, P < .0001) and 71% (OS: N = 37/52, z = 2.91, P = .004) of randomized trials. Our algorithm's success in predicting clinical outcome may be exploitable to optimize the design of clinical trials of pharmaceutical agents.
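
The two evaluation statistics reported above (a Spearman rank correlation between predicted and actual outcomes, and a z-score for how often the better arm of a randomized trial was predicted) can be illustrated with a minimal standard-library Python sketch. The function names are hypothetical and this is not the authors' code. The z-score assumes a normal approximation to a binomial test against chance (p = 0.5) with a continuity correction; under that assumption the sketch reproduces the reported OS values (37/52, z = 2.91, P = .004) but gives about 5.15 rather than 5.24 for PFS (59/73), so the authors' exact procedure may differ.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(predicted, actual):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(predicted), average_ranks(actual)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def better_arm_z(correct, total, continuity=True):
    """z-score for `correct` of `total` trials versus chance (p = 0.5).

    Assumption: normal approximation to the binomial, optionally with a
    0.5 continuity correction. The source does not state which test was used.
    """
    diff = abs(correct - total / 2)
    if continuity:
        diff = max(diff - 0.5, 0.0)
    return diff / math.sqrt(total * 0.25)


def two_sided_p(z):
    """Two-sided P value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    z_os = better_arm_z(37, 52)
    print(f"OS: z = {z_os:.2f}, P = {two_sided_p(z_os):.3f}")
    # Toy values only, not data from the study.
    print(spearman([3.1, 5.0, 7.2, 9.4], [2.8, 6.1, 5.9, 10.3]))
```

Working from ranks rather than raw values makes the correlation insensitive to the scale of the survival measurements, which suits comparisons across trials with very different absolute PFS or OS.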

Keywords: clinical trials; drug-related biomarkers; machine learning; molecular profiles; outcome prediction.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antineoplastic Combined Chemotherapy Protocols / pharmacology*
  • Antineoplastic Combined Chemotherapy Protocols / therapeutic use
  • Biomarkers, Tumor / analysis
  • Biomarkers, Tumor / genetics*
  • Datasets as Topic
  • Drug Resistance, Neoplasm / genetics
  • Forecasting / methods
  • Humans
  • Machine Learning
  • Models, Genetic*
  • Mutation
  • Neoplasms / drug therapy*
  • Neoplasms / genetics
  • Neoplasms / mortality
  • Neoplasms / pathology
  • Prognosis
  • Progression-Free Survival
  • Randomized Controlled Trials as Topic*
  • Research Design


Substances

  • Biomarkers, Tumor