BMC Genomics. 2013 May 4;14:302.
doi: 10.1186/1471-2164-14-302.

Identification of Somatic Mutations in Cancer Through Bayesian-based Analysis of Sequenced Genome Pairs

Free PMC article

Alexis Christoforides et al. BMC Genomics. 2013.


Background: The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) to study and characterize malignant tumors at unprecedented resolution. In cancer studies specifically, the goal is often to identify somatic mutations: changes specific to the tumor and absent from the individual's germline. However, false-positive and false-negative detections often result from insufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the misclassification of germline variation as tumor-specific.

Results: We have developed a generalized Bayesian analysis framework for matched tumor/normal samples to identify tumor-specific alterations such as single-nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology and discuss its application to other types of paired-tissue analysis, such as the detection of loss of heterozygosity and allelic imbalance. We also demonstrate high sensitivity and specificity in discovering simulated somatic mutations across various combinations of (a) genomic coverage and (b) emulated heterogeneity.
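To make the paired-sample idea concrete, the Python sketch below compares two hypotheses at a single locus: that the tumor and normal samples share one variant allele frequency (M0), or that each has its own (M1). It is a minimal illustration under stated assumptions, not Seurat's actual model: it assumes a uniform Beta(1, 1) prior on allele frequencies, treats read counts as beta-binomial, ignores sequencing error and mapping quality, and the function names (`log_bayes_factor`, `posterior_differs`) are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(alt: int, depth: int, a: float, b: float) -> float:
    """Log beta-binomial marginal likelihood of `alt` variant reads out of `depth`.

    The binomial coefficient is omitted because it is identical under both
    hypotheses and cancels in the Bayes factor.
    """
    return log_beta(a + alt, b + depth - alt) - log_beta(a, b)


def log_bayes_factor(alt_t: int, depth_t: int, alt_n: int, depth_n: int,
                     a: float = 1.0, b: float = 1.0) -> float:
    """log BF of M1 (tumor and normal have separate allele frequencies)
    versus M0 (both samples share a single allele frequency).
    Assumption: the same Beta(a, b) prior is used for every frequency."""
    m1 = log_marginal(alt_t, depth_t, a, b) + log_marginal(alt_n, depth_n, a, b)
    m0 = log_marginal(alt_t + alt_n, depth_t + depth_n, a, b)
    return m1 - m0


def posterior_differs(alt_t: int, depth_t: int, alt_n: int, depth_n: int,
                      prior: float = 0.5) -> float:
    """Posterior probability of M1, given a prior probability `prior` for M1."""
    log_odds = (log_bayes_factor(alt_t, depth_t, alt_n, depth_n)
                + log(prior) - log(1.0 - prior))
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Tumor shows 20/60 variant reads; matched normal shows 0/60.
    print(log_bayes_factor(20, 60, 0, 60), posterior_differs(20, 60, 0, 60, prior=1e-3))
    # Both samples look heterozygous (30/60 each): no evidence of a difference.
    print(log_bayes_factor(30, 60, 30, 60))
```

A strongly positive log Bayes factor (as for 20 of 60 tumor reads against 0 of 60 normal reads) favors a change between the samples. This test on its own also responds to LOH-like shifts, so a somatic point-mutation call in this sketch would additionally require little variant evidence in the normal sample.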

Conclusion: We present a Java-based implementation of our methods named Seurat, which is freely available for academic use. We demonstrate the discovery of different types of somatic change by applying Seurat to an experimentally derived cancer dataset, and discuss considerations and practices for the accurate detection of somatic events in cancer genomes. Seurat is available at


Figure 1
Performance of Seurat’s somatic point mutation detection with varying genomic coverage. Legend: The sensitivity (A) and false discovery rate (B) for Seurat’s somatic point mutation detection method, evaluated on simulated cancer genome data with no simulated normal tissue contamination. Each series represents the coverage used for the ‘normal’ genome data set, and the x-axis represents the ‘tumor’ genome average coverage.
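The two metrics plotted in Figure 1 follow their standard definitions: sensitivity = TP / (TP + FN) and false discovery rate = FP / (TP + FP), where TP, FP and FN are counted against the simulated mutations. The sketch below assumes calls and truth are matched by exact (chromosome, position) only; a real benchmark might also require matching alleles. The `Site` key and `evaluate_calls` name are hypothetical.

```python
from typing import Iterable, Tuple

Site = Tuple[str, int]  # hypothetical site key: (chromosome, 1-based position)


def evaluate_calls(called: Iterable[Site], truth: Iterable[Site]) -> Tuple[float, float]:
    """Return (sensitivity, false discovery rate) of `called` against simulated `truth`."""
    called_set, truth_set = set(called), set(truth)
    tp = len(called_set & truth_set)  # simulated mutations that were recovered
    fp = len(called_set - truth_set)  # calls with no simulated mutation behind them
    fn = len(truth_set - called_set)  # simulated mutations that were missed
    sensitivity = tp / (tp + fn) if truth_set else float("nan")
    # Convention assumed here: an empty call set has an FDR of 0.
    fdr = fp / (tp + fp) if called_set else 0.0
    return sensitivity, fdr


if __name__ == "__main__":
    truth = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 10)}
    called = {("chr1", 100), ("chr2", 50), ("chr5", 7)}
    print(evaluate_calls(called, truth))  # (0.5, 0.3333333333333333)
```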
Figure 2
Performance of somatic point mutation detection with varying tumor purity. Legend: The sensitivity (A) and false discovery rate (B) for Seurat, VarScan 2, Strelka and SomaticSniper, given tumor DNA data of varying simulated tumor purity. Seurat reaches 90% sensitivity at ~45% tumor purity in sequence data with an average genomic coverage of 128×.
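Why purity matters can be seen from simple arithmetic: in a biopsy mixing tumor and normal cells, the expected fraction of reads carrying a clonal heterozygous somatic variant shrinks with purity (purity / 2 for a diploid tumor without copy-number change). The sketch below is a simplified binomial model, not Seurat's method, and is not meant to reproduce the curves in Figure 2. It assumes clonal variants, no sequencing error, and a fixed depth equal to the average coverage (real per-site coverage varies); the `min_alt` call threshold and the function names are hypothetical.

```python
from math import comb


def expected_vaf(purity: float, mutant_copies: int = 1,
                 tumor_copies: int = 2, normal_copies: int = 2) -> float:
    """Expected fraction of reads carrying a clonal somatic variant in a mixed biopsy.

    Tumor cells contribute `mutant_copies` variant alleles out of `tumor_copies`;
    contaminating normal cells contribute `normal_copies` reference alleles.
    """
    tumor_alleles = purity * tumor_copies
    normal_alleles = (1.0 - purity) * normal_copies
    return purity * mutant_copies / (tumor_alleles + normal_alleles)


def prob_at_least(min_alt: int, depth: int, vaf: float) -> float:
    """P(X >= min_alt) for X ~ Binomial(depth, vaf), ignoring sequencing error."""
    below = sum(comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
                for k in range(min_alt))
    return 1.0 - below


if __name__ == "__main__":
    depth = 128   # average tumor coverage quoted in the Figure 2 legend
    min_alt = 5   # hypothetical minimum number of variant reads for a call
    for purity in (0.1, 0.25, 0.45, 1.0):
        vaf = expected_vaf(purity)
        p = prob_at_least(min_alt, depth, vaf)
        print(f"purity={purity:.2f} VAF={vaf:.3f} P(>= {min_alt} alt reads)={p:.3f}")
```

Keeping tumor and normal copy numbers as separate parameters lets the same formula cover copy-number changes in the tumor, which shift the expected allele fraction independently of purity.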
Figure 3
The effect of increased sequencing of the normal genome on Seurat’s somatic mutation detection. Legend: Demonstration of the effect of increased sequencing of the normal genome in a matched normal/tumor analysis using Seurat. We present three common scenarios: A) a locus with a true somatic variant but a low observed frequency of the variant allele, because of mapping difficulty, low purity of the tumor biopsy, or the variant being present only in a minor subclonal population; B) a locus with a potential false-positive call, because of erroneously aligned variant evidence; C) a locus with a variant genotype in the normal genome, but with a coincidental lack of evidence causing it to appear as a tumor-only variant. In all three scenarios, additional sequencing data for the normal genome updates the expectation of variant evidence (by altering the shape of the conjugate beta distribution) and consequently improves Seurat’s ability to correctly reject the last two cases and accept the first.
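The conjugate update mentioned in the Figure 3 legend can be illustrated in a few lines: with a Beta(α, β) belief about the normal sample's variant allele fraction and a binomial read-count likelihood, observing `alt` variant and `ref` reference reads gives a Beta(α + alt, β + ref) posterior. The sketch below assumes a uniform Beta(1, 1) prior, integer pseudo-counts, and no sequencing error; the `BetaPosterior` class and the 0.1 "present" threshold are hypothetical, not Seurat's parameterization. It shows how 0 variant reads out of 4 leaves substantial probability that the normal carries the variant, while deeper normal data resolve scenarios A and C.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) belief about the normal sample's variant allele fraction.

    Integer pseudo-counts are assumed so that tail probabilities can be computed
    exactly with a binomial sum (no special-function library needed).
    """
    alpha: int = 1  # uniform Beta(1, 1) prior by default
    beta: int = 1

    def update(self, alt_reads: int, ref_reads: int) -> "BetaPosterior":
        # Conjugacy: a binomial likelihood keeps the posterior in the Beta family.
        return BetaPosterior(self.alpha + alt_reads, self.beta + ref_reads)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))

    def prob_above(self, x: float) -> float:
        """P(fraction > x), via P(Beta(a, b) > x) = P(Binomial(a + b - 1, x) <= a - 1)."""
        n = self.alpha + self.beta - 1
        return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(self.alpha))


if __name__ == "__main__":
    prior = BetaPosterior()
    threshold = 0.1  # hypothetical allele fraction separating "present" from "absent"
    # Scenario C: the normal is truly heterozygous, but shallow data show 0 of 4 variant reads.
    shallow = prior.update(alt_reads=0, ref_reads=4)
    deep = prior.update(alt_reads=18, ref_reads=22)
    # Scenario A: a true somatic site, with 0 variant reads among 60 normal reads.
    clean = prior.update(alt_reads=0, ref_reads=60)
    for name, post in (("shallow", shallow), ("deep", deep), ("clean", clean)):
        print(name, round(post.mean(), 3), round(post.variance(), 4),
              round(post.prob_above(threshold), 4))
```

In this sketch the shallow normal still leaves P(fraction > 0.1) = 0.9^5 ≈ 0.59, so a tumor-only call at that locus is poorly supported; with 60 clean normal reads the same probability drops to 0.9^61, below 0.002.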
