Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival

Bioinformatics. 2015 Aug 15;31(16):2607-13. doi: 10.1093/bioinformatics/btv164. Epub 2015 Mar 24.

Abstract

Motivation: Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is increasingly common to have multiple omics data measured from each individual. Furthermore, rich functional data are available, such as the predicted impact of mutations on protein coding, and gene/protein networks. However, integrating the complex information across the different omics and functional data remains challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures.

Results: An analysis pipeline is built for integrating genomic and transcriptomic alterations from whole-exome and RNA sequence data with functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of mutated potential driver genes found within and across patients. A driver-gene score (DGscore) is developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently mutated, have high or moderate mutational impact at the protein level, exhibit extreme expression, and be functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline is applied to 60 matched tumor and normal samples (each pair from the same patient) from The Cancer Genome Atlas breast-cancer project. In clinical validation, patients with high DGscores have worse survival than those with low scores (P = 0.001). Furthermore, the DGscore outperforms the established expression-based signatures MammaPrint and PAM50 in predicting patient survival. In conclusion, integration of mutation, expression and functional data allows identification of clinically relevant potential driver genes in cancer.
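
The four criteria a gene must meet to contribute to the DGscore can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `dg_scores`, the data layout, the thresholds (`min_patients`, `z_cut`, `min_de_neighbors`) and the choice to score a patient by simply counting qualifying genes are all assumptions made for illustration. The abstract states only that the score captures the cumulative effect of such genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    patient: str
    gene: str
    impact: str  # hypothetical labels: "HIGH", "MODERATE", "LOW"


def dg_scores(mutations, expr_z, network, de_genes,
              min_patients=2, z_cut=2.0, min_de_neighbors=2):
    """Count, per patient, the mutated genes meeting all four criteria.

    mutations : iterable of Mutation records
    expr_z    : dict (patient, gene) -> expression z-score (tumor vs. normal)
    network   : dict gene -> set of neighboring genes
    de_genes  : dict patient -> set of differentially expressed genes
    All thresholds are illustrative assumptions, not values from the paper.
    """
    mutations = list(mutations)

    # Criterion 2: keep only high- or moderate-impact mutations.
    patients_by_gene = {}
    for m in mutations:
        if m.impact in {"HIGH", "MODERATE"}:
            patients_by_gene.setdefault(m.gene, set()).add(m.patient)

    scores = {m.patient: 0 for m in mutations}
    for gene, patients in patients_by_gene.items():
        # Criterion 1: the gene is mutated in enough patients.
        if len(patients) < min_patients:
            continue
        for patient in patients:
            # Criterion 3: extreme expression in this patient.
            if abs(expr_z.get((patient, gene), 0.0)) < z_cut:
                continue
            # Criterion 4: enough differentially expressed network neighbors.
            neighbors = network.get(gene, set())
            n_de = len(neighbors & de_genes.get(patient, set()))
            if n_de >= min_de_neighbors:
                scores[patient] += 1
    return scores
```

In this sketch a gene counts toward a patient's score only when all four conditions hold for that patient, so a recurrently mutated gene with unremarkable expression adds nothing. A weighted sum would be an equally plausible reading of "cumulative effect"; the count is used here only because it is the simplest choice.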

Availability and implementation: The documented pipeline, including annotated sample scripts, is available at http://fafner.meb.ki.se/biostatwiki/driver-genes/.

Contact: yudi.pawitan@ki.se

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Biomarkers, Tumor / genetics*
  • Breast / metabolism
  • Breast Neoplasms / genetics*
  • Breast Neoplasms / mortality*
  • Breast Neoplasms / pathology
  • Carcinoma, Ductal, Breast / genetics
  • Carcinoma, Ductal, Breast / mortality
  • Carcinoma, Ductal, Breast / pathology
  • Case-Control Studies
  • Computational Biology / methods
  • Exome / genetics
  • Female
  • Gene Expression Profiling*
  • Gene Expression Regulation, Neoplastic*
  • Gene Regulatory Networks*
  • Genetic Predisposition to Disease
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Middle Aged
  • Mutation / genetics*
  • Neoplasm Invasiveness
  • Neoplasm Staging
  • Prognosis
  • Survival Rate

Substances

  • Biomarkers, Tumor