Comparison of somatic mutation calling methods in amplicon and whole exome sequence data

BMC Genomics. 2014 Mar 28;15:244. doi: 10.1186/1471-2164-15-244.


Background: High-throughput sequencing is rapidly becoming common practice in clinical diagnosis and cancer research. Many algorithms have been developed for somatic single nucleotide variant (SNV) detection in matched tumor-normal DNA sequencing. Although numerous studies have compared the performance of various algorithms on exome data, there has not yet been a systematic evaluation using PCR-enriched amplicon data with a range of variant allele fractions. The gold standard variant set recently developed for the reference individual NA12878 by the NIST-led "Genome in a Bottle" Consortium (NIST-GIAB) provides a resource for evaluating admixtures with a range of SNV allele fractions.

Results: Using the NIST-GIAB gold standard, we compared the performance of five popular somatic SNV calling algorithms (GATK UnifiedGenotyper followed by simple subtraction, MuTect, Strelka, SomaticSniper and VarScan2) for matched tumor-normal amplicon and exome sequencing data.
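
The "simple subtraction" step paired with GATK UnifiedGenotyper is not spelled out in the abstract. One plausible reading, shown here only as a minimal sketch, is that tumor and normal samples are genotyped separately and any variant also present in the normal call set is removed from the tumor call set. The `Snv` record and the `subtract_calls` function are hypothetical names introduced for illustration; they are not part of GATK or of the authors' pipeline.

```python
from typing import Iterable, List, NamedTuple


class Snv(NamedTuple):
    """A single nucleotide variant keyed by site and alleles (hypothetical record)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_calls(tumor_calls: Iterable[Snv], normal_calls: Iterable[Snv]) -> List[Snv]:
    """Return tumor variants that are absent from the matched normal call set.

    Assumption: a variant counts as germline when the same chromosome,
    position, reference allele and alternate allele appear in the normal sample.
    """
    normal = set(normal_calls)
    # Preserve the input order of the tumor calls.
    return [variant for variant in tumor_calls if variant not in normal]


tumor = [Snv("chr1", 100, "A", "T"), Snv("chr1", 250, "C", "G"), Snv("chr2", 40, "G", "A")]
normal = [Snv("chr1", 100, "A", "T")]
somatic_candidates = subtract_calls(tumor, normal)
```

In this toy input the variant at chr1:100 is shared with the normal sample and is dropped, which leaves two somatic candidates. A subtraction of this kind does not model sequencing depth or allele fraction in the normal sample, which is one way it differs from a joint tumor-normal caller.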

Conclusions: We demonstrated that the five commonly used somatic SNV calling methods are applicable to both targeted amplicon and exome sequencing data. However, the sensitivities of these methods vary based on the allelic fraction of the mutation in the tumor sample. Our analysis can assist researchers in choosing a somatic SNV calling method suitable for their specific needs.
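
To make the dependence of sensitivity on allelic fraction concrete, the sketch below groups gold-standard variants by their expected allele fraction and reports, for each bin, the share that a caller recovered. It is an illustration under stated assumptions, not the evaluation procedure used in the study: the function name `sensitivity_by_fraction`, the bin edges, and the toy variants are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Set, Tuple


def sensitivity_by_fraction(
    truth_fraction: Dict[Hashable, float],
    called: Set[Hashable],
    edges: Sequence[float],
) -> Dict[Tuple[float, float], float]:
    """Sensitivity (true positives / all true variants) per allele-fraction bin.

    truth_fraction maps each gold-standard variant to its expected allele fraction.
    Bins are half-open intervals [low, high) built from consecutive edges;
    variants outside every bin are ignored.
    """
    found = defaultdict(int)
    total = defaultdict(int)
    for variant, fraction in truth_fraction.items():
        for low, high in zip(edges, edges[1:]):
            if low <= fraction < high:
                total[(low, high)] += 1
                if variant in called:
                    found[(low, high)] += 1
                break
    return {bin_: found[bin_] / total[bin_] for bin_ in total}


# Toy data: two low-fraction variants, one higher-fraction variant.
truth = {
    ("chr1", 100, "A", "T"): 0.05,
    ("chr1", 200, "C", "G"): 0.05,
    ("chr2", 50, "G", "A"): 0.4,
}
calls = {("chr1", 100, "A", "T"), ("chr2", 50, "G", "A")}
per_bin = sensitivity_by_fraction(truth, calls, edges=[0.0, 0.1, 0.5])
```

With this toy input, one of the two variants in the 0.0 to 0.1 bin is recovered and the single variant in the 0.1 to 0.5 bin is recovered. Comparing such per-bin values across callers is one way to choose a method suited to the allele fractions expected in a given tumor sample.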

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Nucleic Acid
  • Exome*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Mutation*
  • Point Mutation
  • ROC Curve
  • Sensitivity and Specificity
  • Software*