A method of gene expression data transfer from cell lines to cancer patients for machine-learning prediction of drug efficiency

Cell Cycle. 2018;17(4):486-491. doi: 10.1080/15384101.2017.1417706. Epub 2018 Jan 17.


Personalized medicine implies that distinct treatment methods are prescribed to individual patients according to several features that may be obtained from, e.g., the gene expression profile. The majority of machine learning methods suffer from a shortage of preceding cases, i.e., gene expression data on patients combined with the confirmed outcome of a known treatment. At the same time, there exist thousands of cell lines that were treated with hundreds of anti-cancer drugs in order to test the ability of these drugs to stop cell proliferation, and all of these cell line cultures were profiled in terms of their gene expression. Here we present a new machine learning approach that can predict the clinical efficiency of anti-cancer drugs for individual patients by transferring features obtained from the expression data of cell lines. The method was validated on three datasets for cancers (chronic myeloid leukemia, lung adenocarcinoma, and renal carcinoma) treated with targeted drugs: kinase inhibitors such as imatinib or sorafenib.
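The abstract does not spell out the transfer procedure, so the following is only a generic sketch, not the authors' method. It assumes that transfer can be approximated by (1) restricting cell lines and patients to the genes both datasets measure, (2) z-scoring each gene within each dataset to remove platform-specific scale, (3) training a linear support vector machine on cell lines labeled sensitive or resistant, and (4) scoring patients and summarizing performance with the ROC AUC. The published method may use other features (the keywords mention pathway activation scoring). All function names, the toy gene names and values, the regularization `lam`, and the Pegasos-style solver are hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical representation: one expression profile = {gene symbol: value}.
Profile = Dict[str, float]


def shared_genes(cell_lines: Sequence[Profile], patients: Sequence[Profile]) -> List[str]:
    """Genes measured in every cell line and every patient (the common feature space)."""
    common = set(cell_lines[0])
    for profile in [*cell_lines, *patients]:
        common &= set(profile)
    return sorted(common)


def standardize(profiles: Sequence[Profile], genes: List[str]) -> List[List[float]]:
    """Z-score each gene within one dataset so cell lines and patients share a scale."""
    rows = [[p[g] for g in genes] for p in profiles]
    n = len(rows)
    for j in range(len(genes)):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5 or 1.0  # guard constant genes
        for row in rows:
            row[j] = (row[j] - mean) / sd
    return rows


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.1, epochs: int = 100, seed: int = 0) -> List[float]:
    """Linear SVM without bias, fitted by Pegasos-style subgradient descent on the hinge loss.

    Labels must be +1 (sensitive) or -1 (resistant). The bias is omitted because the
    ROC AUC depends only on the ranking of scores, which a constant shift does not change.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # margin under the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularization step
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random responder outscores a random non-responder."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data: GENE_A tracks sensitivity, GENE_B is uninformative,
# GENE_C exists only in cell lines and GENE_D only in patients (both are dropped).
cell_lines = [
    {"GENE_A": 8.0, "GENE_B": 5.0, "GENE_C": 3.0},
    {"GENE_A": 7.5, "GENE_B": 4.0, "GENE_C": 2.0},
    {"GENE_A": 2.0, "GENE_B": 5.0, "GENE_C": 3.5},
    {"GENE_A": 2.5, "GENE_B": 4.0, "GENE_C": 2.5},
]
cell_labels = [1, 1, -1, -1]  # +1 = sensitive, -1 = resistant
patients = [  # measured on a different platform, hence a different scale
    {"GENE_A": 30.0, "GENE_B": 10.0, "GENE_D": 1.0},
    {"GENE_A": 12.0, "GENE_B": 10.0, "GENE_D": 2.0},
    {"GENE_A": 28.0, "GENE_B": 11.0, "GENE_D": 1.5},
    {"GENE_A": 10.0, "GENE_B": 11.0, "GENE_D": 0.5},
]
patient_labels = [1, -1, 1, -1]  # +1 = responder

genes = shared_genes(cell_lines, patients)
w = train_linear_svm(standardize(cell_lines, genes), cell_labels)
scores = [dot(w, x) for x in standardize(patients, genes)]
print(genes, roc_auc(scores, patient_labels))
```

Standardizing each dataset separately is one simple way to absorb the scale difference between cell-line and patient platforms; if calibrated class predictions rather than rankings are needed, a constant feature can be appended to each row to play the role of the bias.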

Keywords: Bioinformatics; cancer; cell lines; drug scoring; gene expression profiling; machine learning; pathway activation scoring; personalized medicine; support vector machines.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Antineoplastic Agents / pharmacology*
  • Area Under Curve
  • Databases, Factual
  • Gene Expression Regulation, Neoplastic / drug effects*
  • Humans
  • Kidney Neoplasms / drug therapy
  • Kidney Neoplasms / metabolism
  • Kidney Neoplasms / pathology
  • Machine Learning*
  • Precision Medicine
  • ROC Curve

Substances

  • Antineoplastic Agents

Grant support

This work was supported by the National Research Center "Kurchatov Institute" [grant number 2017-1025].