Topologically inferring risk-active pathways toward precise cancer classification by directed random walk

Bioinformatics. 2013 Sep 1;29(17):2169-77. doi: 10.1093/bioinformatics/btt373. Epub 2013 Jul 10.


Motivation: The accurate prediction of disease status is a central challenge in clinical cancer research. Microarray-based gene biomarkers have been identified that predict outcome and outperform traditional clinical parameters. However, the robustness of individual gene biomarkers has been questioned because of their poor reproducibility between different cohorts of patients. Substantial progress in treatment requires advances in methods to identify robust biomarkers. Several methods incorporating pathway information have been proposed to identify robust pathway markers and build classifiers at the level of functional categories rather than of individual genes. However, current methods treat pathways as simple gene sets and ignore pathway topological information, which is essential to infer a more robust pathway activity.

Results: Here, we propose a directed random walk (DRW)-based method to infer pathway activity. DRW evaluates the topological importance of each gene by capturing the structural information embedded in the directed pathway network. The strategy of weighting genes by their topological importance greatly improved the reproducibility of pathway activities. Experiments on 18 cancer datasets showed that the proposed method was more accurate and robust overall than several existing gene-based and pathway-based classification methods. The resulting risk-active pathways are more reliable in guiding therapeutic selection and the development of pathway-specific therapeutic strategies.
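The idea of weighting genes by topological importance can be illustrated with a minimal sketch. The abstract does not give the update rule, the restart probability, the initial weights, or the activity formula, so everything below is an assumption: the sketch uses a generic random walk with restart on a directed graph, and the names `directed_random_walk`, `pathway_activity`, `restart`, and `w0` are hypothetical rather than taken from the DRW software.

```python
import numpy as np


def directed_random_walk(adj, w0, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed graph (illustrative only).

    adj[i, j] = 1 means a directed edge from gene i to gene j.
    w0 holds non-negative initial gene weights (assumed, e.g. from
    differential expression); restart=0.7 is an arbitrary choice here.
    """
    adj = np.asarray(adj, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    w0 = w0 / w0.sum()

    # Row-normalize the adjacency matrix. Genes with no outgoing edges
    # keep an all-zero row, so some probability mass is lost at them.
    out_deg = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)

    w = w0.copy()
    for _ in range(max_iter):
        # Follow an outgoing edge with probability (1 - restart),
        # otherwise jump back to the initial distribution.
        w_next = (1 - restart) * (trans.T @ w) + restart * w0
        if np.abs(w_next - w).sum() < tol:
            return w_next
        w = w_next
    return w


def pathway_activity(expr_z, weights):
    """Weight-scaled sum of z-scored expression for each sample.

    expr_z has shape (genes, samples); weights has shape (genes,).
    The normalization by the weight norm is an assumption of this sketch.
    """
    weights = np.asarray(weights, dtype=float)
    return weights @ np.asarray(expr_z, dtype=float) / np.sqrt((weights ** 2).sum())


# Toy pathway: gene 0 -> gene 1 -> gene 2, with uniform initial weights.
toy_adj = [[0, 1, 0],
           [0, 0, 1],
           [0, 0, 0]]
toy_weights = directed_random_walk(toy_adj, [1, 1, 1])

# Toy z-scored expression for 3 genes across 2 samples.
toy_expr = np.array([[1.0, -1.0],
                     [0.5, 0.0],
                     [2.0, 1.0]])
toy_activity = pathway_activity(toy_expr, toy_weights)
```

In this toy chain the walker moves along edge direction, so downstream genes accumulate more weight. Whether edges should be followed as given or reversed is a modelling choice that the abstract does not specify.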

Availability: DRW is freely available at

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers, Tumor / genetics
  • Biomarkers, Tumor / metabolism
  • Gene Expression Profiling*
  • Humans
  • Neoplasms / classification*
  • Neoplasms / genetics
  • Neoplasms / metabolism
  • Oligonucleotide Array Sequence Analysis
  • Reproducibility of Results
  • Risk
  • Signal Transduction*

Substances

  • Biomarkers, Tumor