Due to involved disease mechanisms, many complex diseases such as cancer, demonstrate significant heterogeneity with varying behaviors, including different survival time, treatment responses, and recurrence rates. The aim of tumor stratification is to identify disease subtypes, which is an important first step towards precision medicine. Recent advances in profiling a large number of molecular variables such as in The Cancer Genome Atlas (TCGA), have enabled researchers to implement computational methods, including traditional clustering and bi-clustering algorithms, to systematically analyze high-throughput molecular measurements to identify tumor subtypes as well as their corresponding associated biomarkers. In this study we discuss critical issues and challenges in existing computational approaches for tumor stratification. We show that the problem can be formulated as finding densely connected sub-graphs (bi-cliques) in a bipartite graph representation of genomic data. We propose a novel algorithm that takes advantage of prior biology knowledge through a gene-gene interaction network to find such sub-graphs, which helps simultaneously identify both tumor subtypes and their corresponding genetic markers. Our experimental results show that our proposed method outperforms current state-of-the-art methods for tumor stratification.
Keywords: Bi-clique finding; Bi-partite graph; Graph regularization; Tumor stratification.
Copyright © 2015 Elsevier Ltd. All rights reserved.