Intersect-then-combine approach: improving the performance of somatic variant calling in whole exome sequencing data using multiple aligners and callers

Genome Med. 2017 Apr 18;9(1):35. doi: 10.1186/s13073-017-0425-1.

Abstract

Bioinformatic analysis of genomic sequencing data to identify somatic mutations in cancer samples is far from achieving the required robustness and standardisation. In this study we generated a whole exome sequencing benchmark dataset using the platinum genome sample NA12878 and developed an intersect-then-combine (ITC) approach to increase the accuracy of calling single nucleotide variants (SNVs) and indels in tumour-normal pairs. We evaluated the effect of alignment, base quality recalibration, mutation caller and filtering on sensitivity and false positive rate. The ITC approach increased sensitivity by up to 17.1% without increasing the false positive rate per megabase (FPR/Mb), and its validity was confirmed in a set of clinical samples.
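The abstract does not spell out the set operations behind ITC. The sketch below is a minimal illustration under stated assumptions. It assumes that, for each caller (e.g. Mutect2, Strelka), only variants called on every aligner's output (e.g. BWA and Novoalign) are kept (intersect), and that these per-caller consensus sets are then merged by union across callers (combine). It also assumes variants are matched exactly on (chromosome, position, ref, alt). The function names, the toy data and the target size are hypothetical. Sensitivity is taken as true positives over truth-set size, and FPR/Mb as false positive calls divided by the target region size in megabases.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def intersect_then_combine(calls: Dict[str, Dict[str, Set[Variant]]]) -> Set[Variant]:
    """Sketch of ITC, assuming calls[caller][aligner] -> set of variant keys.

    Step 1 (intersect): per caller, keep variants found with every aligner.
    Step 2 (combine): take the union of those per-caller sets.
    """
    combined: Set[Variant] = set()
    for by_aligner in calls.values():
        aligner_sets = list(by_aligner.values())
        if not aligner_sets:
            continue  # a caller with no aligner outputs contributes nothing
        consensus = set.intersection(*aligner_sets)
        combined |= consensus
    return combined


def sensitivity(called: Set[Variant], truth: Set[Variant]) -> float:
    """Fraction of truth-set variants recovered by the call set."""
    return len(called & truth) / len(truth) if truth else 0.0


def fp_per_mb(called: Set[Variant], truth: Set[Variant], target_bp: int) -> float:
    """False positive calls per megabase of targeted sequence (assumed definition)."""
    return len(called - truth) / (target_bp / 1_000_000)


if __name__ == "__main__":
    # Toy example data (hypothetical, not from the study).
    truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "A")}
    calls = {
        "mutect2": {
            "bwa": {("chr1", 100, "A", "T"), ("chr2", 999, "T", "G")},
            "novoalign": {("chr1", 100, "A", "T")},
        },
        "strelka": {
            "bwa": {("chr1", 200, "G", "C"), ("chr3", 10, "A", "G")},
            "novoalign": {("chr1", 200, "G", "C")},
        },
    }
    itc = intersect_then_combine(calls)
    print(sorted(itc))
    print(sensitivity(itc, truth), fp_per_mb(itc, truth, target_bp=50_000_000))
```

In this toy case the aligner-specific calls on chr2 and chr3 are dropped by the per-caller intersection, while the union across callers keeps one true variant found only by each caller. One plausible reading of the name is that intersecting across aligners suppresses alignment artefacts and combining across callers recovers sensitivity, but this rationale is an interpretation, not a statement from the abstract.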

Keywords: BWA; Filtering; Mutect2; NA12878; Novoalign; Platinum genome; Somatic mutation; Strelka; Variant calling; Whole exome sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • DNA, Neoplasm
  • Exome
  • Genome, Human*
  • Humans
  • INDEL Mutation
  • Mutation*
  • Neoplasms / genetics*
  • Polymorphism, Single Nucleotide
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Neoplasm