Passenger mutations accurately classify human tumors

PLoS Comput Biol. 2019 Apr 15;15(4):e1006953. doi: 10.1371/journal.pcbi.1006953. eCollection 2019 Apr.

Abstract

Determining the cancer type and molecular subtype has important clinical implications. The primary site is, however, unknown for some malignancies discovered in the metastatic stage. Moreover, liquid biopsies may be used to screen for tumoral DNA, which upon detection needs to be assigned to a site of origin. Classifiers based on genomic features are a promising approach to prioritize the tumor anatomical site, type and subtype. We examined the predictive ability of causal (driver) somatic mutations in this task, comparing it against global patterns of non-selected (passenger) mutations, including features based on regional mutation density (RMD). In the task of distinguishing 18 cancer types, the driver mutations (mutated oncogenes or tumor suppressors, pathways and hotspots) classified 36% of the patients to the correct cancer type. In contrast, the features based on passenger mutations did so at 92% accuracy, with similar contributions from the RMD and the trinucleotide mutation spectra. The RMD and the spectra yielded predictions for distinct sets of patients. In particular, introducing the RMD features into a combined classification model increased the fraction of diagnosed patients by 50 percentage points (at 20% FDR). Furthermore, RMD was able to discriminate molecular subtypes and/or anatomical site of six major cancers. The advantage of passenger mutations was upheld under high rates of false negative mutation calls and with exome sequencing, even though overall accuracy decreased. We suggest that whole genome sequencing is valuable for classifying tumors because it captures global patterns emanating from mutational processes, which are informative of the underlying tumor biology.
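As an illustration of the two passenger-mutation feature families named in the abstract, the following minimal Python sketch derives RMD-style and trinucleotide-spectrum-style features from a list of somatic single-nucleotide variants. It is not the authors' pipeline: the 1 Mb window size, the input tuple layouts, the normalization to fractions, and all function names are assumptions made for this example only.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def regional_mutation_density(mutations, window_size=1_000_000):
    """Fraction of a tumor's mutations falling in each genomic window.

    mutations: iterable of (chromosome, position) pairs (assumed layout).
    window_size: assumed bin width in base pairs; hypothetical default.
    """
    counts = Counter((chrom, pos // window_size) for chrom, pos in mutations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {window: n / total for window, n in counts.items()}


def fold_to_pyrimidine(context, alt):
    """Express a substitution relative to a pyrimidine (C or T) reference base.

    context: 3-letter reference trinucleotide centered on the mutated base.
    alt: the mutated (alternate) base.
    """
    if context[1] in "CT":
        return context, alt
    # Purine reference: take the reverse complement of the context
    # and the complement of the alternate base.
    reverse_complement = "".join(COMPLEMENT[b] for b in reversed(context))
    return reverse_complement, COMPLEMENT[alt]


def trinucleotide_spectrum(snvs):
    """Relative frequency of each substitution-in-context class.

    snvs: iterable of (context, alt) pairs (assumed layout).
    Keys look like "T[C>T]A"; at most 96 distinct keys can occur.
    """
    counts = Counter()
    for context, alt in snvs:
        ctx, folded_alt = fold_to_pyrimidine(context, alt)
        counts[f"{ctx[0]}[{ctx[1]}>{folded_alt}]{ctx[2]}"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}
```

The resulting dictionaries could be turned into fixed-length vectors (one entry per window or per substitution class) and passed to any standard classifier; which classifier and which feature scaling the study used is not stated in the abstract.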

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • DNA, Neoplasm / classification
  • DNA, Neoplasm / genetics
  • Exome / genetics
  • Exome Sequencing / methods
  • Genomics
  • Humans
  • Machine Learning
  • Mutation / genetics
  • Neoplasms / classification*
  • Neoplasms / genetics*
  • Software
  • Whole Genome Sequencing / methods

Substances

  • DNA, Neoplasm

Grants and funding

FS was funded by the ERC StG 757700 HYPER-INSIGHT (https://erc.europa.eu/) and by the MINECO grant BFU2017-89833-P (http://www.ciencia.gob.es/portal/site/MICINN/). We acknowledge funding from the Severo Ochoa Center of Excellence award (http://www.ciencia.gob.es/portal/site/MICINN/excellentinstitutions) to the IRB Barcelona. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.