Co-occurrence analysis of insertional mutagenesis data reveals cooperating oncogenes

Bioinformatics. 2007 Jul 1;23(13):i133-41. doi: 10.1093/bioinformatics/btm202.


Motivation: Cancers are caused by an accumulation of multiple independent mutations that collectively deregulate cellular pathways, such as those regulating cell division and cell death. The publicly available Retroviral Tagged Cancer Gene Database (RTCGD) contains the data of many insertional mutagenesis screens, in which virally induced mutations result in tumor formation in mice. The insertion loci therefore indicate the locations of putative cancer genes. Additionally, the presence of multiple independent insertions within one tumor hints at cooperation between the insertionally mutated genes. In this study we focus on the detection of statistically significant co-mutations.

Results: We propose a two-dimensional Gaussian Kernel Convolution method (2DGKC), a computational technique that identifies cooperating mutations in insertional mutagenesis data. We define the Common Co-occurrence of Insertions (CCI), signifying the co-mutations that are statistically significant across all screens in the RTCGD. Significance estimates are made on multiple scales, and the results are visualized in a scale space, thereby providing valuable extra information on the putative cooperation. The multidimensional analysis of the insertion data results in the discovery of 86 statistically significant co-mutations, indicating the presence of cooperating oncogenes that play a role in tumor development. Since oncogenes may cooperate with several members of a parallel pathway, we combined the co-occurrence data with gene family information to find significant cooperations between oncogenes and families of genes. We show, for instance, the interchangeable cooperation of Myc insertions with insertions in the Pim family.
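
The general idea of a two-dimensional Gaussian kernel convolution over co-occurring insertions can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `cooccurrence_density`, the toy insertion positions, the single linear coordinate axis, and the unnormalized kernel are all assumptions made for illustration, and no significance estimation is included.

```python
import itertools
import numpy as np

def cooccurrence_density(tumors, grid, scale):
    """Sum a 2D Gaussian kernel over every within-tumor pair of insertions.

    tumors: list of lists of insertion positions (hypothetical toy coordinates)
    grid:   1D array of positions at which the density is evaluated
    scale:  kernel width (standard deviation), in the same units as grid
    """
    density = np.zeros((grid.size, grid.size))
    for insertions in tumors:
        for a, b in itertools.combinations(sorted(insertions), 2):
            # 1D Gaussian profiles centred on each insertion of the pair
            ga = np.exp(-0.5 * ((grid - a) / scale) ** 2)
            gb = np.exp(-0.5 * ((grid - b) / scale) ** 2)
            # Separable 2D kernel, added at (a, b) and (b, a) so the map is symmetric
            density += np.outer(ga, gb) + np.outer(gb, ga)
    return density

# Toy data (invented): three tumors share insertions near 100 and 900,
# and one tumor has a single insertion and contributes no pair.
tumors = [[100, 900], [110, 905], [95, 890], [500]]
grid = np.arange(0, 1001, 10)

# Evaluate the density at several kernel widths, mimicking a multi-scale view.
for scale in (10, 30, 100):
    density = cooccurrence_density(tumors, grid, scale)
    i, j = np.unravel_index(np.argmax(density), density.shape)
    print(f"scale={scale}: peak at ({grid[i]}, {grid[j]})")
```

Peaks in such a density map mark pairs of loci that tend to be hit within the same tumor. Deciding whether a peak is statistically significant would additionally require a null distribution, which this sketch does not attempt to model.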

Availability: A list of the resulting CCIs is available at:

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Databases, Genetic
  • Models, Biological
  • Multigene Family / genetics*
  • Mutagenesis, Insertional / methods*
  • Neoplasms / genetics*
  • Neoplasms / metabolism*
  • Oncogene Proteins / genetics
  • Oncogene Proteins / metabolism*
  • Oncogenes / genetics*
  • Signal Transduction*


Substances

  • Oncogene Proteins