Nature. 2013 Jul 11;499(7457):214-218.
doi: 10.1038/nature12213. Epub 2013 Jun 16.

Mutational Heterogeneity in Cancer and the Search for New Cancer-Associated Genes

Michael S Lawrence [1]#, Petar Stojanov [1,2]#, Paz Polak [1,3,4]#, Gregory V Kryukov [1,3,4], Kristian Cibulskis [1], Andrey Sivachenko [1], Scott L Carter [1], Chip Stewart [1], Craig H Mermel [1,5], Steven A Roberts [6], Adam Kiezun [1], Peter S Hammerman [1,2], Aaron McKenna [1,7], Yotam Drier [1,3,5,8,9], Lihua Zou [1], Alex H Ramos [1], Trevor J Pugh [1,2,3], Nicolas Stransky [1], Elena Helman [1,10], Jaegil Kim [1], Carrie Sougnez [1], Lauren Ambrogio [1], Elizabeth Nickerson [1], Erica Shefler [1], Maria L Cortés [1], Daniel Auclair [1], Gordon Saksena [1], Douglas Voet [1], Michael Noble [1], Daniel DiCara [1], Pei Lin [1], Lee Lichtenstein [1], David I Heiman [1], Timothy Fennell [1], Marcin Imielinski [1,5], Bryan Hernandez [1], Eran Hodis [1,2], Sylvan Baca [1,2], Austin M Dulak [1,2], Jens Lohr [1,2], Dan-Avi Landau [1,2,11], Catherine J Wu [2,3], Jorge Melendez-Zajgla [12], Alfredo Hidalgo-Miranda [12], Amnon Koren [1,3], Steven A McCarroll [1,3], Jaume Mora [13], Brian Crompton [2,14], Robert Onofrio [1], Melissa Parkin [1], Wendy Winckler [1], Kristin Ardlie [1], Stacey B Gabriel [1], Charles W M Roberts [2,3,14], Jaclyn A Biegel [15], Kimberly Stegmaier [1,2,14], Adam J Bass [1,2,3], Levi A Garraway [1,2,3], Matthew Meyerson [1,2,3], Todd R Golub [1,2,3,8], Dmitry A Gordenin [6], Shamil Sunyaev [1,3,4], Eric S Lander [1,3,10], Gad Getz [1,5]
# Contributed equally.
Free PMC article

Michael S Lawrence et al. Nature. 2013;499(7457):214-218.


Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
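The false-positive mechanism described above can be illustrated with a simple per-gene test for excess mutations. The sketch below is not MutSigCV; it is a minimal illustration that assumes a Poisson approximation to the number of mutations in a gene and uses entirely hypothetical numbers (cohort size, gene length, a genome-wide background rate, and a 4-fold higher local rate for a gene that is, say, late-replicating and lowly expressed). The function names are illustrative only.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Intended for moderate lam only: math.exp(-lam) underflows above ~700.
    """
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def gene_pvalue(observed: int, coding_bases: int, n_patients: int,
                rate_per_base: float) -> float:
    """One-sided p-value for observing `observed` or more mutations in a gene.

    Expected count = bases in the gene x patients sequenced x assumed
    background mutation rate per base.
    """
    expected = coding_bases * n_patients * rate_per_base
    return poisson_upper_tail(observed, expected)


if __name__ == "__main__":
    # Hypothetical cohort and gene; none of these numbers come from the paper.
    n_patients = 500
    coding_bases = 10_000          # a long gene
    global_rate = 2e-6             # one rate (~2 per Mb) applied to every gene
    local_rate = 4 * global_rate   # assumed elevated rate for this particular gene
    observed = 25

    print("uniform background rate: p =",
          gene_pvalue(observed, coding_bases, n_patients, global_rate))
    print("gene-specific rate:      p =",
          gene_pvalue(observed, coding_bases, n_patients, local_rate))
```

With these toy inputs the same 25 mutations give a p-value well below 0.001 against the single genome-wide rate (10 expected) but a p-value close to 1 against the gene-specific rate (40 expected). That is the sense in which ignoring heterogeneity turns highly mutable genes into apparent hits.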


Figure 1
Somatic mutation frequencies observed in exomes from 3,083 tumor-normal pairs. Each dot corresponds to one tumor-normal pair, with vertical position indicating the total frequency of somatic mutations in the exome. Tumor types are ordered by their median somatic mutation frequency, with the lowest frequencies (left) found in hematological and pediatric tumors and the highest (right) in tumors induced by carcinogens such as tobacco smoke and UV light. Mutation frequency varies more than 1,000-fold between the lowest- and highest-frequency samples across cancer types, and also within several individual tumor types. The lower panel shows the relative proportions of the six possible base-pair substitutions, as indicated in the legend on the left. (See also Supplementary Table S2.)
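Two quantities in Figure 1 can be made concrete. The vertical axis is a rate (somatic mutations per megabase of sequenced territory), and the lower panel collapses the 12 possible single-base changes into six classes, because a change and its reverse complement on the other strand (for example G>A and C>T) are the same base-pair substitution. The sketch below assumes the common convention of naming each class by its pyrimidine (C or T) reference base; the function names and the 30-Mb covered territory in the example are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SIX_CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Map a single-base substitution to one of six strand-collapsed classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: report the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(mutations):
    """Relative proportion of each class among (ref, alt) pairs."""
    counts = Counter(substitution_class(r, a) for r, a in mutations)
    total = sum(counts.values())
    return {c: counts[c] / total for c in SIX_CLASSES}


def mutations_per_mb(n_mutations: int, covered_bases: int) -> float:
    """Somatic mutation frequency per megabase of covered sequence."""
    return n_mutations * 1e6 / covered_bases


if __name__ == "__main__":
    toy_sample = [("C", "T"), ("G", "A"), ("C", "A"), ("T", "G")]
    print(spectrum(toy_sample))               # C>T makes up half of this toy sample
    print(mutations_per_mb(90, 30_000_000))   # 3.0 mutations per Mb
```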
Figure 2
Radial spectrum plot of the 2,892 tumor samples having at least 10 coding mutations. The angular space is divided into compartments corresponding to the six factors discovered by non-negative matrix factorization (NMF; see Methods). The distance from the center represents the total mutation frequency. Different tumor types segregate into different compartments according to their mutation spectra. Notable examples are lung adenocarcinoma and lung squamous carcinoma (red; 2 o'clock position); melanoma (black; 12 o'clock position); stomach, esophageal, and colorectal cancers (various shades of green; 8 o'clock position); samples harboring mutations of the HPV or APOBEC signature (bladder, cervical, and head and neck cancers, marked in yellow, orange, and blue, respectively; 10 o'clock position); and AML and CLL samples sharing the Tp*A→T signature (4 o'clock position). (See also Supplementary Table S3.)
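The compartments in Figure 2 come from factors found with NMF. As a minimal sketch, assuming a samples × mutation-category count matrix V, the Lee and Seung multiplicative updates below find non-negative matrices W (per-sample factor weights) and H (per-factor mutation spectra) with V ≈ WH. This is a generic NMF routine in pure Python, not the authors' pipeline; the categories, rank, iteration count, and toy counts are assumptions, and a real analysis would use a numerical library and choose the number of factors carefully.

```python
import math
import random


def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, rank, iterations=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates that do not increase ||V - WH||_F;
    eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def frobenius_error(V, W, H):
    """||V - WH||_F, the quantity the updates decrease."""
    WH = matmul(W, H)
    return math.sqrt(sum((v - x) ** 2 for vr, xr in zip(V, WH) for v, x in zip(vr, xr)))


if __name__ == "__main__":
    # Toy counts: 4 hypothetical samples x 6 substitution classes.
    V = [[30, 2, 40, 1, 5, 2],
         [28, 3, 35, 2, 4, 1],
         [2, 1, 60, 1, 20, 1],
         [1, 2, 55, 2, 25, 2]]
    W, H = nmf(V, rank=2)
    print("reconstruction error:", frobenius_error(V, W, H))
```

In a plot like Figure 2, each row of W (how strongly a sample loads on each factor) would determine the sample's angular position; the exact mapping used in the paper is described in its Methods.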
Figure 3
Mutation rate varies widely across the genome and correlates with DNA replication time and expression level. (a,b) Mutation rate, replication time, and expression level plotted across selected regions of the genome. Red shows the total noncoding mutation rate calculated from whole-genome sequences of 126 samples (excluding exons). Blue shows replication time. Green shows the average expression level across 91 cell lines in the Cancer Cell Line Encyclopedia (CCLE), determined by RNA sequencing. (Note that low expression is at the top of the scale and high expression at the bottom, to emphasize the mutual correlations with the other variables.) Shown are (a) the entire chromosome 14 and (b) portions of chromosomes 1 and 8, with the locations of two specific loci: a cluster of 16 olfactory receptor genes on chr1 and the gene CSMD3 on chr8. These two loci have very high mutation rates, late replication times, and low expression levels. (The local mutation rate at CSMD3 is even higher than predicted from replication time and expression, suggesting contributions from additional factors, perhaps locally increased DNA breakage, since the locus is a known fragile site.) (c,d) Correlation of mutation rate with expression level and replication time for all 100-kb windows across the genome. (e,f) Cumulative distribution of various gene families as a function of expression level and replication time. Olfactory receptor genes, genes encoding long proteins (>4,000 aa), and genes spanning large genomic loci (>1 Mb) are significantly enriched towards lower expression and later replication. In contrast, known cancer genes (as listed in the Cancer Gene Census) trend toward slightly higher expression and earlier replication. (See also Supplementary Figure S9 and Supplementary Tables S4, S5, S6.)
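Panels (c,d) compare a per-window mutation rate with expression and replication time across 100-kb windows. The sketch below shows one straightforward way to set up such a comparison, assuming 0-based mutation positions pooled from all samples on a single chromosome, every base in a window being callable, and Pearson correlation as the statistic. The caption does not specify the correlation measure, and the function names and example values are hypothetical. A real analysis would also exclude exons (as in panels a,b) and adjust for per-window coverage.

```python
import math
from collections import Counter

WINDOW = 100_000  # 100-kb windows, as in panels (c,d)


def window_mutation_rates(positions, chrom_length, n_samples, window=WINDOW):
    """Mutations per base per sample in consecutive windows of one chromosome.

    Assumes every base is callable; the last window may be shorter than `window`.
    """
    if any(p < 0 or p >= chrom_length for p in positions):
        raise ValueError("mutation position outside the chromosome")
    n_windows = math.ceil(chrom_length / window)
    counts = Counter(p // window for p in positions)
    rates = []
    for w in range(n_windows):
        size = min(window, chrom_length - w * window)
        rates.append(counts.get(w, 0) / (size * n_samples))
    return rates


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical mutations pooled from 2 samples on a 300-kb sequence.
    positions = [1_200, 45_000, 130_000, 210_500, 220_000, 250_000, 299_999]
    rates = window_mutation_rates(positions, chrom_length=300_000, n_samples=2)
    replication_time = [0.2, 0.5, 0.9]  # assumed scale: larger = later replication
    print(rates)
    print("r =", pearson(rates, replication_time))
```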



    1. TCGA Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–8. - PMC - PubMed
    1. TCGA Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474:609–15. - PMC - PubMed
    1. TCGA Comprehensive Molecular Characterization of Human Colon and Rectal Cancer. Nature. 2012 - PMC - PubMed
    1. Ding L, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–75. - PMC - PubMed
    1. Stransky N, et al. The mutational landscape of head and neck squamous cell carcinoma. Science. 2011;333:1157–60. - PMC - PubMed
