Integrating mutation and gene expression cross-sectional data to infer cancer progression

BMC Syst Biol. 2016 Jan 25:10:12. doi: 10.1186/s12918-016-0255-6.

Abstract

Background: A major problem in identifying the best therapeutic targets for cancer is the molecular heterogeneity of the disease. Cancer is often caused by an accumulation of mutations which produce irreversible damage to the cell's control mechanisms of survival and proliferation. Different mutations may affect these cellular mechanisms through a combination of molecular interactions which may be dynamically changing during cancer progression. It has been previously shown that cancer accumulates mutations over time. In this paper we address the problem of cancer heterogeneity by modeling cancer progression using somatic mutation and gene expression cross-sectional data.

Results: We propose a novel formulation for integrating somatic mutation and gene expression data to infer the temporal sequence of events from cross-sectional data. Using a mixed integer linear program, we model the interaction between groups of different mutated genes and the resulting modifications at the gene expression level. Our approach identifies a partition of mutation events which gradually produce gene expression changes in a partition of genes over time. The proposed formulation is tested using both simulated data and real breast cancer data with matched somatic mutations and gene expression measurements from The Cancer Genome Atlas. First, we classify the genes as oncogenes or tumor suppressors based on the frequency of driver mutations. As expected, the most frequently mutated genes in breast cancer are PIK3CA and TP53. Then, we select the genes with the most frequent driver mutations and a set of genes known to play roles in cancer development. Finally, we apply the proposed mixed integer linear program to identify the temporal order in which genes mutate and, simultaneously, the changes they produce at the gene expression level during cancer progression. In addition, we are able to identify known causal relationships between mutations and gene expression changes in the PI3K/AKT and TP53 pathways.
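
The gene-selection step described above (ranking genes by how often they carry mutations across samples) can be illustrated with a minimal sketch. This is not the authors' code: the function name, the input layout (a mapping from sample ID to a list of mutated genes), and the toy values are all assumptions made for illustration, and the sketch covers only the frequency ranking, not the oncogene/tumor suppressor classification or the mixed integer linear program.

```python
from collections import Counter


def rank_genes_by_mutation_frequency(sample_mutations, top_k=None):
    """Rank genes by the fraction of samples in which they are mutated.

    sample_mutations: dict mapping a sample ID to an iterable of gene
    symbols mutated in that sample (hypothetical input layout).
    Returns a list of (gene, fraction) pairs, most frequent first;
    ties are broken alphabetically by gene symbol.
    """
    n_samples = len(sample_mutations)
    if n_samples == 0:
        return []
    counts = Counter()
    for genes in sample_mutations.values():
        # Count each gene at most once per sample, even if it
        # carries several mutations in that sample.
        counts.update(set(genes))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranked = [(gene, count / n_samples) for gene, count in ordered]
    return ranked if top_k is None else ranked[:top_k]


# Invented toy data for illustration only; not taken from the paper
# or from The Cancer Genome Atlas.
toy_samples = {
    "S1": ["PIK3CA", "TP53"],
    "S2": ["PIK3CA"],
    "S3": ["TP53", "GENE_X"],
    "S4": ["PIK3CA", "PIK3CA"],
}
top_genes = rank_genes_by_mutation_frequency(toy_samples, top_k=2)
print(top_genes)
```

On real data, the input would be restricted to driver mutations before ranking, as the abstract describes; how drivers are called is outside the scope of this sketch.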

Conclusions: This paper proposes a new model to infer the temporal sequence in which mutations occur and lead to changes at the gene expression level during cancer progression. The approach is general and can be applied to any data sets with available somatic mutations and gene expression measurements.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Breast Neoplasms / genetics
  • Breast Neoplasms / metabolism
  • Breast Neoplasms / pathology
  • Computational Biology / methods*
  • Cross-Sectional Studies
  • Disease Progression*
  • Gene Regulatory Networks
  • Humans
  • Mutation*
  • Neoplasms / genetics*
  • Neoplasms / metabolism
  • Neoplasms / pathology*
  • Phosphatidylinositol 3-Kinases / metabolism
  • Proto-Oncogene Proteins c-akt / metabolism
  • Transcriptome*
  • Tumor Suppressor Protein p53 / metabolism

Substances

  • Tumor Suppressor Protein p53
  • Phosphatidylinositol 3-Kinases
  • Proto-Oncogene Proteins c-akt