SomaticSeq: An Ensemble and Machine Learning Method to Detect Somatic Mutations

Methods Mol Biol. 2020;2120:47-70. doi: 10.1007/978-1-0716-0327-7_4.

Abstract

A standard strategy for discovering somatic mutations in a cancer genome is to use next-generation sequencing (NGS) to sequence the tumor tissue and its matched normal sample (commonly blood or adjacent normal tissue) for side-by-side comparison. However, when interrogating entire genomes (or even just the coding regions), sequencing errors easily outnumber real somatic mutations by orders of magnitude. Here, we describe SomaticSeq, which combines the output of multiple somatic mutation detection algorithms and then uses machine learning to vastly improve the accuracy of the resulting somatic mutation call sets.
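
The idea of combining several callers and then learning which candidates to trust can be illustrated with a toy sketch. This is not SomaticSeq's code: the abstract does not name the callers, features, or learner it uses, so the three caller flags, the VAF feature, the data values, and the hand-written boosted decision stumps below are all assumptions made purely for illustration.

```python
import math

# Hypothetical toy data (not from the chapter). Each candidate site carries
# binary flags from three placeholder callers plus the tumor variant allele
# fraction (VAF). Label 1 = real somatic mutation, 0 = sequencing artifact.
TRAIN = [
    ((1, 1, 1, 0.30), 1),
    ((1, 1, 0, 0.25), 1),
    ((0, 1, 1, 0.20), 1),
    ((1, 0, 1, 0.02), 0),
    ((1, 0, 0, 0.03), 0),
    ((0, 0, 1, 0.01), 0),
    ((0, 1, 0, 0.15), 1),
    ((1, 1, 1, 0.04), 0),
]


def majority_vote(x):
    """Baseline: call a site somatic if at least 2 of the 3 callers report it."""
    return 1 if sum(x[:3]) >= 2 else 0


def stump(x, feat, thresh, polarity):
    """One-feature threshold rule returning +1 or -1."""
    return polarity if x[feat] > thresh else -polarity


def train_boosted_stumps(data, rounds=5):
    """Plain AdaBoost over decision stumps (a stand-in learner).

    Returns a list of (alpha, feat, thresh, polarity) tuples.
    """
    signed = [(x, 1 if y else -1) for x, y in data]
    weights = [1.0 / len(signed)] * len(signed)
    model = []
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best = None
        for feat in range(len(signed[0][0])):
            for thresh in sorted({x[feat] for x, _ in signed}):
                for polarity in (1, -1):
                    err = sum(w for w, (x, y) in zip(weights, signed)
                              if stump(x, feat, thresh, polarity) != y)
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, polarity)
        err, feat, thresh, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feat, thresh, polarity))
        # Up-weight the sites this stump got wrong, then renormalize.
        weights = [w * math.exp(-alpha * y * stump(x, feat, thresh, polarity))
                   for w, (x, y) in zip(weights, signed)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Weighted vote of all stumps; 1 = somatic, 0 = artifact."""
    score = sum(alpha * stump(x, f, t, p) for alpha, f, t, p in model)
    return 1 if score > 0 else 0


model = train_boosted_stumps(TRAIN)
vote_correct = sum(majority_vote(x) == y for x, y in TRAIN)
model_correct = sum(predict(model, x) == y for x, y in TRAIN)
print(f"majority vote:  {vote_correct}/{len(TRAIN)} correct")
print(f"boosted stumps: {model_correct}/{len(TRAIN)} correct")
```

On this contrived data, simple majority voting misclassifies three of the eight sites, including an artifact reported by all three callers, while the trained model separates them by also using the VAF feature. Real call sets are far larger and noisier, so the sketch shows only the shape of the approach, not its expected accuracy.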

Keywords: Bioinformatics; Ensemble method; Machine learning; Sequencing; Somatic mutations.

MeSH terms

  • Genome, Human
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Machine Learning*
  • Mutation*
  • Neoplasms / genetics*
  • Software*