A random walk-based method to identify driver genes by integrating the subcellular localization and variation frequency into bipartite graph

BMC Bioinformatics. 2019 May 14;20(1):238. doi: 10.1186/s12859-019-2847-9.

Abstract

Background: Cancer, a worldwide problem, is driven by genomic alterations. With the advent of high-throughput sequencing technology, huge amounts of genomic data are generated continuously; these data offer valuable information about cancer but also pose a major challenge to investigators. A major characteristic of cancer is heterogeneity, and most alterations are thought to be passenger mutations that make no contribution to cancer progression. Hence, identifying driver genes, which confer a selective growth advantage on tumor cells, from these large and noisy data remains an urgent task.

Results: Because previous network-based methods ignore some important biological properties of driver genes, and because gene interaction networks have low reliability, we propose a random walk method named Subdyquency. It integrates subcellular localization, variation frequency, and interactions with other dysregulated genes to improve the prediction accuracy for driver genes. We applied our model to three cancers: lung, prostate and breast cancer. The results show that our model can not only identify well-known important driver genes but also prioritize rare unknown driver genes. Moreover, compared with other existing methods, our method achieves higher precision, recall and F-score for most cancer types.
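
The general idea of a random walk with restart on a bipartite graph can be illustrated with a minimal sketch. This is not the Subdyquency implementation (see the repository under Availability for that); it is a generic walk between mutated genes and dysregulated genes under several assumptions. The function name `bipartite_random_walk`, the restart probability of 0.5, the convergence tolerance, and the toy inputs are all hypothetical. Edge weights are assumed to already encode whatever interaction and subcellular co-localization evidence is available, and the initial scores are assumed to be variation frequencies.

```python
import numpy as np


def bipartite_random_walk(weights, freq, restart=0.5, tol=1e-10, max_iter=1000):
    """Score mutated genes by a random walk with restart on a bipartite graph.

    weights: (n_mutated, n_dysregulated) non-negative edge weights
             (assumed input, e.g. interactions kept only within a shared compartment).
    freq:    length n_mutated initial scores (assumed: variation frequencies).
    Returns a length n_mutated vector of stationary scores that sums to 1.
    """
    W = np.asarray(weights, dtype=float)
    p0 = np.asarray(freq, dtype=float)
    p0 = p0 / p0.sum()  # normalise the start distribution

    row_sums = W.sum(axis=1, keepdims=True)
    col_sums = W.sum(axis=0, keepdims=True)
    # Each mutated gene spreads its score over its dysregulated neighbours.
    to_dys = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Each dysregulated gene spreads its score back over its mutated neighbours.
    to_mut = np.divide(W, col_sums, out=np.zeros_like(W), where=col_sums > 0)

    p = p0.copy()
    for _ in range(max_iter):
        q = to_dys.T @ p  # step 1: mutated -> dysregulated
        # step 2: dysregulated -> mutated, mixed with a restart to p0
        p_next = (1.0 - restart) * (to_mut @ q) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p


# Toy example (made-up numbers): 3 mutated genes, 3 dysregulated genes.
toy_weights = [[1, 1, 0],
               [0, 1, 0],
               [0, 0, 1]]
toy_freq = [0.5, 0.3, 0.2]
scores = bipartite_random_walk(toy_weights, toy_freq)
ranking = np.argsort(-scores)  # indices of mutated genes, best first
```

In this sketch a mutated gene scores highly when it starts with a high frequency and is linked to many dysregulated genes; the restart term keeps the ranking anchored to the initial frequencies.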

Conclusions: The results imply that driver genes tend to have higher variation frequency and to affect more dysregulated genes in the common significant compartment.

Availability: The source code can be obtained at https://github.com/weiba/Subdyquency .

Keywords: Driver genes; Dysregulated genes; Genomic expression; Random walk; Subcellular localization; Variation frequency.

MeSH terms

  • Gene Regulatory Networks / genetics*
  • Genomics / methods*
  • Humans
  • Reproducibility of Results