Sparse overlapping group lasso for integrative multi-omics analysis

J Comput Biol. 2015 Feb;22(2):73-84. doi: 10.1089/cmb.2014.0197. Epub 2015 Jan 28.

Abstract

Gene networks and graphs are crucial tools for understanding the heterogeneous system of cancer, since cancer is a disease that involves not individual genes but combinations of genes associated with oncogenic processes. A goal of genomic data analysis via gene networks is to identify both gene networks and individual genes within the selected networks. Existing methods, however, perform only network selection, so all genes in the selected networks are included in the model. This leads to overfitting when uncovering driver genes, and the results are not biologically interpretable. To achieve both "groupwise sparsity" and "within-group sparsity" for identifying driver genes based on biological knowledge (i.e., predefined overlapping groups of features), we propose a sparse overlapping group lasso via duplicated predictors in an extended space. The proposed method effectively identifies driver genes and their interactions using known biological pathway information. Monte Carlo simulations and analysis of The Cancer Genome Atlas (TCGA) project data indicate that the proposed method is effective, in terms of feature selection and prediction accuracy, for fitting a regression model constructed with duplicated predictors in overlapping groups. In the TCGA data analysis, we uncover potential cancer driver genes via expression modules and gene networks constructed from multi-omics data, and we find strong evidence that the uncovered genes are cancer driver genes. The proposed method is a useful tool for identifying cancer driver genes and for integrative multi-omics analysis.
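The idea of "duplicated predictors in extended space" can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not give the exact objective. The sketch assumes a squared-error loss, a penalty that mixes an elementwise L1 term with a group L2 term weighted by the square root of the group size, and a plain proximal gradient solver. All function names and the parameters `lam` and `alpha` are hypothetical. A feature that belongs to several groups is copied once per group, so the groups no longer overlap in the extended design matrix; the coefficient of the original feature is recovered as the sum of the coefficients of its copies.

```python
import numpy as np

def expand_overlapping_groups(X, groups):
    """Copy columns so each (possibly overlapping) group owns its own block."""
    col_index = np.concatenate([np.asarray(g, dtype=int) for g in groups])
    sizes = [len(g) for g in groups]
    starts = np.cumsum([0] + sizes[:-1])
    slices = [slice(int(s), int(s) + n) for s, n in zip(starts, sizes)]
    return X[:, col_index], slices, col_index

def prox_sparse_group(v, slices, t_l1, t_group):
    """Soft-threshold every entry, then shrink each group block by its norm."""
    u = np.sign(v) * np.maximum(np.abs(v) - t_l1, 0.0)
    out = np.zeros_like(u)
    for s in slices:
        norm = np.linalg.norm(u[s])
        w = t_group * np.sqrt(s.stop - s.start)  # assumed sqrt(group size) weight
        if norm > w:
            out[s] = (1.0 - w / norm) * u[s]
    return out

def fit_sparse_overlapping_group_lasso(X, y, groups, lam=0.1, alpha=0.5, n_iter=500):
    """Proximal gradient on the extended (duplicated) design matrix.

    alpha = 1 gives a pure lasso penalty, alpha = 0 a pure group penalty
    (an assumed parameterisation, not taken from the paper).
    """
    X_ext, slices, col_index = expand_overlapping_groups(X, groups)
    n = X.shape[0]
    # Step size 1/L, where L is the Lipschitz constant of the loss gradient.
    step = n / (np.linalg.norm(X_ext, 2) ** 2)
    gamma = np.zeros(X_ext.shape[1])
    for _ in range(n_iter):
        grad = X_ext.T @ (X_ext @ gamma - y) / n
        gamma = prox_sparse_group(gamma - step * grad, slices,
                                  step * lam * alpha, step * lam * (1.0 - alpha))
    # Map back to the original feature space by summing over duplicates.
    beta = np.zeros(X.shape[1])
    np.add.at(beta, col_index, gamma)
    return beta, gamma, slices

# Toy usage: feature 1 belongs to both groups, so it is duplicated.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
y = 2.0 * X[:, 0]
groups = [[0, 1], [1, 2, 3]]
beta, gamma, slices = fit_sparse_overlapping_group_lasso(X, y, groups, lam=0.05)
```

In this sketch the group term can drop a whole group (groupwise sparsity), while the L1 term can zero individual copies inside a retained group (within-group sparsity), which is the combination the abstract describes.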

Keywords: gene networks; graph; group sparse regularization; multi-omics analysis; uncovering driver genes.

MeSH terms

  • Algorithms*
  • Gene Regulatory Networks*
  • Genome, Human
  • Humans
  • Oncogenes*
  • Proteome / genetics
  • Proteome / metabolism

Substances

  • Proteome