OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes

Bioinformatics. 2013 Sep 15;29(18):2238-44. doi: 10.1093/bioinformatics/btt395. Epub 2013 Jul 24.

Abstract

Motivation: Gain-of-function mutations often cluster in specific protein regions, a signal that those mutations provide an adaptive advantage to cancer cells and are consequently positively selected during clonal evolution of tumours. We sought to determine how widespread this feature is in cancer and whether it can be used to identify drivers.

Results: We have developed OncodriveCLUST, a method to identify genes with a significant bias towards mutation clustering within the protein sequence. This method constructs the background model by assessing coding-silent mutations, which are assumed not to be under positive selection and thus may reflect the baseline tendency of somatic mutations to be clustered. OncodriveCLUST analysis of the Catalogue of Somatic Mutations in Cancer retrieved a list of genes enriched in Cancer Gene Census genes, prioritizing those with dominant phenotypes but also highlighting some recessive cancer genes, which showed wider but still delimited mutation clusters. Assessment of datasets from The Cancer Genome Atlas demonstrated that OncodriveCLUST selected cancer genes that were missed by methods based on frequency and functional impact criteria. This underscores the benefit of combining approaches based on complementary principles to identify driver mutations. We propose OncodriveCLUST as an effective tool for that purpose.
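
The general idea described above (score how strongly a gene's protein-affecting mutations cluster, then compare that score with a background built from coding-silent mutations) can be illustrated with a minimal Python sketch. This is not the published OncodriveCLUST algorithm: the function names, the `min_count` and `max_gap` thresholds, the fraction-in-clusters score, and the z-score comparison are all assumptions chosen for illustration, and the positions are made-up toy data.

```python
from collections import Counter
from statistics import mean, pstdev


def find_clusters(positions, min_count=2, max_gap=5):
    """Group recurrently mutated protein positions that lie close together.

    min_count and max_gap are illustrative thresholds, not published values.
    """
    counts = Counter(positions)
    # Seed positions: residues hit by at least min_count mutations.
    seeds = sorted(p for p, c in counts.items() if c >= min_count)
    clusters = []
    for p in seeds:
        # Extend the current cluster if this seed is within max_gap residues
        # of the previous seed; otherwise start a new cluster.
        if clusters and p - clusters[-1][-1] <= max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters, counts


def clustering_score(positions, min_count=2, max_gap=5):
    """Fraction of a gene's mutations that fall on clustered positions."""
    if not positions:
        return 0.0
    clusters, counts = find_clusters(positions, min_count, max_gap)
    in_clusters = sum(counts[p] for cluster in clusters for p in cluster)
    return in_clusters / len(positions)


def z_score(observed, background_scores):
    """Compare an observed score with a background score distribution."""
    mu = mean(background_scores)
    sigma = pstdev(background_scores)
    if sigma == 0:
        return 0.0  # Degenerate background: no spread to compare against.
    return (observed - mu) / sigma


# Toy data: protein positions of non-silent mutations in one hypothetical gene.
nonsilent_positions = [12, 12, 12, 13, 13, 61, 61, 146, 200, 310]
gene_score = clustering_score(nonsilent_positions)

# Toy background: clustering scores computed from coding-silent mutations
# in several hypothetical genes.
silent_scores = [0.0, 0.1, 0.2, 0.1, 0.0]
gene_z = z_score(gene_score, silent_scores)
```

In this toy case, seven of the ten non-silent mutations fall on the clustered positions 12, 13 and 61, so the gene scores well above the silent-mutation background. A real analysis would need a principled significance test and multiple-testing correction, which this sketch omits.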

Availability: OncodriveCLUST has been implemented as a Python script and is freely available from http://bg.upf.edu/oncodriveclust

Contact: nuria.lopez@upf.edu or abel.gonzalez@upf.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis
  • Genes, Neoplasm*
  • Genomics
  • Humans
  • Mutation*
  • Neoplasm Proteins / genetics*
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • Neoplasm Proteins