Design and analysis issues in genome-wide somatic mutation studies of cancer

Genomics. 2009 Jan;93(1):17-21. doi: 10.1016/j.ygeno.2008.07.005. Epub 2008 Aug 23.


The availability of the human genome sequence and progress in sequencing and bioinformatic technologies have enabled genome-wide investigation of somatic mutations in human cancers. This article briefly reviews challenges arising in the statistical analysis of mutational data of this kind. A first challenge is that of designing studies that efficiently allocate sequencing resources. We show that this can be addressed by two-stage designs and demonstrate via simulations that even relatively small studies can produce lists of candidate cancer genes that are highly informative for future research efforts. A second challenge is to distinguish mutated genes that are selected for by cancer (drivers) from mutated genes that have no role in the development of cancer and simply happened to mutate (passengers). We suggest that this question is best approached as a classification problem and discuss some of the difficulties of more traditional testing-based approaches. A third challenge is to identify the biological processes affected by the driver genes. This can be pursued by gene set analyses, which can reliably identify functional groups and pathways that are enriched for mutated genes even when the individual genes in those pathways or sets are not mutated frequently enough to provide conclusive evidence that they are drivers.
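The two-stage idea mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation: the function name, the rule for advancing a gene (at least one mutated tumor in each stage), and every numeric value (gene count, tumor counts, mutation rates) are assumptions chosen only for illustration. It sequences all genes in a few discovery tumors, re-sequences only the mutated genes in further validation tumors, and compares the sequencing effort with a one-stage design that sequences every gene in every tumor.

```python
import random


def simulate_two_stage(n_genes=2000, n_drivers=20, n_discovery=10,
                       n_validation=25, passenger_rate=0.005,
                       driver_rate=0.10, seed=0):
    """Toy two-stage design; all defaults are illustrative assumptions."""
    rng = random.Random(seed)
    # Hypothetical ground truth: genes 0..n_drivers-1 are drivers and
    # mutate at a higher per-tumor rate than passengers.
    rates = [driver_rate if g < n_drivers else passenger_rate
             for g in range(n_genes)]

    def mutated_tumors(rate, n_tumors):
        # Number of tumors in which the gene carries a mutation.
        return sum(rng.random() < rate for _ in range(n_tumors))

    # Stage 1 (discovery): sequence every gene in a small set of tumors.
    advanced = [g for g in range(n_genes)
                if mutated_tumors(rates[g], n_discovery) >= 1]
    # Stage 2 (validation): sequence only the advanced genes in new tumors.
    candidates = [g for g in advanced
                  if mutated_tumors(rates[g], n_validation) >= 1]

    return {
        "advanced": len(advanced),
        "candidates": len(candidates),
        "true_drivers": sum(g < n_drivers for g in candidates),
        # Effort measured in gene-by-tumor sequencing units.
        "cost_two_stage": n_genes * n_discovery + len(advanced) * n_validation,
        "cost_one_stage": n_genes * (n_discovery + n_validation),
    }


if __name__ == "__main__":
    for key, value in simulate_two_stage().items():
        print(f"{key}: {value}")
```

Under these assumed rates, most passenger genes drop out after the discovery stage, so the validation stage covers only a small fraction of the genome. How enriched the final candidate list is for drivers depends entirely on the assumed mutation rates and the advancement rule.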

Publication types

  • Research Support, Non-U.S. Gov't
  • Review

MeSH terms

  • DNA Mutational Analysis / methods*
  • Genes, Neoplasm*
  • Genome, Human*
  • Genomics*
  • Humans
  • Mutation / genetics
  • Neoplasm Proteins / chemistry
  • Neoplasm Proteins / genetics
  • Neoplasms / genetics*


Substances

  • Neoplasm Proteins