Identification of Local Clusters of Mutation Hotspots in Cancer-Related Genes and Their Biological Relevance

IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1656-1662. doi: 10.1109/TCBB.2018.2813375. Epub 2018 Mar 16.


Mutation hotspots are either solitary amino acid residues or stretches of amino acids that show elevated mutation frequency in cancer-related genes, but their prevalence and biological relevance are not completely understood. Here, we developed MutClustSW, a mutation hotspot discovery method based on the Smith-Waterman algorithm, to identify mutation hotspots consisting of either single or clustered amino acid residues. We identified 181 missense mutation hotspots from the COSMIC and TCGA mutation databases. Of these, 77 (42.5 percent) were single amino acid residue hotspots, including well-known mutation hotspots such as IDH1 (p.R132) and BRAF (p.V600), and 104 (57.5 percent) were clusters or stretches of multiple amino acids; the hotspots in MUC2, EPPK1, KMT2C, and TP53 were larger than 50 amino acids. Twelve of 27 nonsense mutation hotspots (44.4 percent) were observed in four cancer-related genes, TP53, ARID1A, CDKN2A, and PTEN, suggesting that truncating mutations in some tumor suppressor genes are not randomly distributed as previously assumed. We also show that hotspot mutations have higher mutation allele frequency than non-hotspot mutations, and that the hotspot information can be used to prioritize cancer drivers. Together, the proposed algorithm and the mutation hotspot information can serve as valuable resources in the selection of functional driver mutations and associated genes.
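To illustrate how a Smith-Waterman-style local scan can recover both single-residue and clustered hotspots, the following is a minimal sketch and not the MutClustSW implementation. The abstract does not state the scoring scheme, so everything here is an assumption: each residue is scored as its mutation count minus a fixed penalty, the running score follows the local-alignment recurrence H_i = max(0, H_{i-1} + s_i), and the best segment is masked before the scan repeats. The names best_segment and find_hotspots and the parameters penalty and min_score are hypothetical.

```python
from typing import List, Tuple


def best_segment(scores: List[float]) -> Tuple[float, int, int]:
    """Return (score, start, end) of the highest-scoring contiguous run.

    Uses the 1D local-alignment recurrence H_i = max(0, H_{i-1} + s_i).
    Indices are 0-based and inclusive; start is -1 if no run scores above 0.
    """
    best = (0.0, -1, -1)
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        running = max(0.0, running + s)
        if running == 0.0:
            start = i + 1  # reset: the next candidate run begins after i
        elif running > best[0]:
            best = (running, start, i)
    return best


def find_hotspots(counts: List[int], penalty: float,
                  min_score: float) -> List[Tuple[int, int, float]]:
    """Greedily extract non-overlapping hotspot segments.

    counts[i] is the number of mutations observed at residue i + 1.
    The per-residue score (count - penalty) is an assumed scheme.
    Returns (first_residue, last_residue, score) with 1-based positions.
    """
    scores = [c - penalty for c in counts]
    hotspots = []
    while True:
        score, start, end = best_segment(scores)
        if start < 0 or score < min_score:
            break
        hotspots.append((start + 1, end + 1, score))
        for j in range(start, end + 1):
            scores[j] = float("-inf")  # mask so later scans cannot reuse it
    return hotspots


# Toy per-residue mutation counts for a 12-residue protein (made-up data).
toy_counts = [0, 0, 5, 0, 4, 6, 0, 0, 0, 0, 0, 9]
print(find_hotspots(toy_counts, penalty=2.0, min_score=5.0))
```

On this toy input the scan reports one clustered hotspot spanning residues 3 to 6 and one single-residue hotspot at position 12. The penalty controls how many weakly mutated residues a cluster may bridge before the running score drops to zero, so in practice it would need to be calibrated against a background mutation rate.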

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence / genetics
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Genetic
  • Humans
  • Mutation / genetics*
  • Neoplasms / genetics*