Predicting High-Impact Pharmacological Targets by Integrating Transcriptome and Text-Mining Features

J Pharm Pharm Sci. Oct-Dec 2016;19(4):475-495. doi: 10.18433/J3SC8X.


Purpose: Novel, "outside of the box" approaches are needed for evaluating candidate molecules, especially in oncology. Between 2000 and 2010, the efficiency of drug development fell to barely acceptable levels, and in the second decade of this century it has improved only marginally. This poor performance persists despite unprecedented progress in the development of high-throughput tools, computational methods, aggregated databases, drug repurposing programs and innovative chemistries. Here we tested the hypothesis that the economic impact of targeting a particular gene product is predictable a priori by combining transcriptome profiles with quantitative metrics reflecting the existing literature.

Methods: To extract classification features, the gene expression patterns of a posteriori high-impact and low-impact anti-cancer target sets were compared. To minimize the possible bias of text-mining, the number of manuscripts published prior to the first clinical trial or relevant review paper, and the first derivative of that number over the same interval, were collected and used as quantitative metrics of public interest.
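
The two literature metrics described above can be illustrated with a minimal sketch. It assumes that yearly publication counts per target are already available and that the "first derivative" is approximated by the average yearly change in the annual count over the interval; the abstract does not specify how the derivative was computed. The function name `literature_metrics`, the cutoff convention (strictly before the cutoff year) and the example counts are all hypothetical.

```python
from typing import Dict, Tuple


def literature_metrics(pubs_per_year: Dict[int, int],
                       cutoff_year: int) -> Tuple[int, float]:
    """Return (publication count before cutoff_year, average yearly change).

    cutoff_year stands for the year of the first clinical trial or
    relevant review paper (an assumption about how the interval is closed).
    """
    # Keep only the years strictly before the cutoff.
    years = sorted(y for y in pubs_per_year if y < cutoff_year)
    if not years:
        return 0, 0.0
    counts = [pubs_per_year[y] for y in years]
    total = sum(counts)
    if len(years) < 2:
        return total, 0.0
    # Finite-difference approximation of the first derivative:
    # change in annual count divided by the length of the interval.
    slope = (counts[-1] - counts[0]) / (years[-1] - years[0])
    return total, slope


# Hypothetical yearly counts for a single target (illustrative only).
example_counts = {1995: 2, 1996: 5, 1997: 9, 1998: 14, 1999: 30}
total, slope = literature_metrics(example_counts, cutoff_year=1999)
print(total, slope)
```

Truncating the counts at the cutoff year keeps attention generated by the clinical trial itself out of the feature, which is the bias the Methods paragraph aims to limit.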

Results: Combining the gene expression and literature mining features produced a 4-fold enrichment in high-impact targets, resulting in a favourable ROC curve analysis for the top impact targets. The dataset was enriched in the highest-impact anti-cancer targets, while demonstrating drastic differences in economic value between high-impact and low-impact targets. Known anti-cancer products of EGFR, ERBB2, CYP19A1/aromatase, MTOR, PTGS2, tubulin, VEGFA, BRAF, PGR, PDGFRA, SRC, REN, CSF1R, CTLA4 and HSP90AA1 genes received the highest scores for predicted impact, while microsomal steroid sulfatase, anticoagulant protein C, p53, CDKN2A, c-Jun, and TNFSF11 were highlighted as the most promising research-stage targets.
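
The two evaluation quantities named above, fold enrichment and the ROC curve, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: fold enrichment is taken as the fraction of high-impact targets among the top-scored candidates divided by their fraction in the whole set, and the area under the ROC curve is computed with the standard pairwise (Mann-Whitney) formulation. The labels and scores are invented for the example.

```python
from typing import Sequence


def fold_enrichment(labels: Sequence[int], scores: Sequence[float],
                    top_k: int) -> float:
    """Rate of high-impact targets (label 1) in the top_k, over the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:top_k]) / top_k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random high-impact target outscores a low-impact one."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    # Count pairwise wins; ties contribute one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = high impact) and predicted impact scores.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
enrichment = fold_enrichment(labels, scores, top_k=2)
auc = roc_auc(labels, scores)
print(round(enrichment, 2), round(auc, 3))
```

Both functions assume that each class has at least one member and that `top_k` is positive; a production version would need to guard against those cases.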

Conclusions: A significant cost reduction may be achieved by a priori impact assessment of targets and ligands before their development or repurposing. Expanding the suite of combination treatments could also decrease costs, while achieving a higher impact per developed ligand.

MeSH terms

  • Data Mining*
  • Drug Discovery / methods*
  • Humans
  • Ligands
  • Molecular Targeted Therapy*
  • Neoplasms / drug therapy*
  • Neoplasms / genetics*
  • ROC Curve
  • Transcriptome*

