Predicting the functional consequences of somatic missense mutations found in tumors

Methods Mol Biol. 2014;1101:135-59. doi: 10.1007/978-1-62703-721-1_8.


Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM) is a computational method that uses supervised machine learning to prioritize somatic missense mutations detected in tumor sequencing studies. Missense mutations are a key mechanism by which important cellular behaviors, such as cell growth, proliferation, and survival, are disrupted in cancer. However, only a fraction of the missense mutations observed in tumor genomes are expected to be cancer causing. Distinguishing tumorigenic "driver" mutations from their neutral "passenger" counterparts is currently a pressing problem in cancer research.

CHASM trains a Random Forest classifier on driver mutations from the COSMIC database and uses background nucleotide substitution rates observed in tumor sequencing data to model tumor type-specific passenger mutations. Each missense mutation is represented by quantitative features that fall into five major categories: physicochemical properties of amino acid residues; scores derived from multiple sequence alignments of protein or DNA; region-based amino acid sequence composition; predicted properties of local protein structure; and annotations from the UniProt feature tables. Both a software package and a Web server implementation of CHASM are available to facilitate high-throughput prioritization of somatic missense mutations from large, multi-tumor exome sequencing studies. After ranking candidate driver mutations with CHASM, the vector of features describing each mutation can be used to suggest possible mechanisms by which mutations alter protein activity in tumorigenesis. This chapter details the application of both implementations of CHASM to tumor sequencing data.
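
The idea of modeling passenger mutations from background substitution rates can be illustrated with a minimal sketch. This is not the CHASM implementation: the rate table, its values, and the function name `sample_passenger_substitutions` are hypothetical, and the sketch assumes only that synthetic passenger substitutions are drawn with probability proportional to a tumor type-specific rate.

```python
import random

# Hypothetical relative substitution rates for one tumor type.
# The values are illustrative only, not taken from any dataset.
SUBSTITUTION_RATES = {
    ("C", "T"): 0.45,
    ("C", "A"): 0.15,
    ("C", "G"): 0.10,
    ("T", "C"): 0.15,
    ("T", "A"): 0.08,
    ("T", "G"): 0.07,
}


def sample_passenger_substitutions(rates, n, seed=0):
    """Draw n (ref, alt) substitutions with probability proportional to their rate."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    substitutions = list(rates)
    weights = [rates[s] for s in substitutions]
    return rng.choices(substitutions, weights=weights, k=n)


sampled = sample_passenger_substitutions(SUBSTITUTION_RATES, 1000)
```

In a classifier-training setting, substitutions sampled this way would stand in for the negative (passenger) class, so that the mix of negative examples reflects the mutation spectrum of the tumor type under study.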

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Artificial Intelligence
  • Computational Biology
  • DNA Mutational Analysis
  • Databases, Genetic
  • Genes, Neoplasm
  • Genetic Association Studies*
  • Humans
  • Models, Genetic
  • Molecular Sequence Annotation
  • Mutation, Missense*
  • Neoplasms / genetics*
  • Software*