Statistical method on nonrandom clustering with application to somatic mutations in cancer

BMC Bioinformatics. 2010 Jan 7;11:11. doi: 10.1186/1471-2105-11-11.


Background: Human cancer is caused by the accumulation of tumor-specific mutations in oncogenes and tumor suppressors that confer a selective growth advantage to cells. As a consequence of genomic instability and high levels of proliferation, many passenger mutations that do not contribute to the cancer phenotype arise alongside the mutations that drive oncogenesis. While several approaches have been developed to separate driver mutations from passengers, few can specifically identify activating driver mutations in oncogenes, which are more amenable to pharmacological intervention.

Results: We propose a new statistical method for detecting activating mutations in cancer by identifying nonrandom clusters of amino acid mutations in protein sequences. A probability model is derived using order statistics, assuming that the locations of amino acid mutations on a protein follow a uniform distribution. Our statistical measure is the difference between a pair of order statistics, which is equivalent to the size of an amino acid mutation cluster, and the probabilities are derived from exact and approximate distributions of this measure. Using data from the Catalog of Somatic Mutations in Cancer (COSMIC) database, we demonstrate that our method detects well-known clusters of activating mutations in KRAS, BRAF, PI3K, and beta-catenin. The method can also identify new cancer targets as well as gain-of-function mutations in tumor suppressors.
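The idea of scoring a cluster by the distance between two order statistics can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a continuous uniform null on the protein, under which the scaled distance between the i-th and k-th ordered positions of n mutations follows a Beta(k - i, n - (k - i) + 1) distribution. The function name `cluster_pvalue`, the toy positions, and the protein length of 200 are hypothetical and chosen only for illustration.

```python
from math import comb


def cluster_pvalue(positions, i, k, protein_length):
    """Probability, under a continuous uniform null, that the span between the
    i-th and k-th ordered mutation positions (1-based, i < k) is at most the
    observed span.

    Assumption: positions are treated as continuous, so the scaled span
    follows Beta(r, n - r + 1) with r = k - i. For integer parameters this
    Beta CDF equals the binomial tail P(Bin(n, c) >= r).
    """
    xs = sorted(positions)
    n = len(xs)
    if not (1 <= i < k <= n):
        raise ValueError("need 1 <= i < k <= number of mutations")
    r = k - i
    # Observed cluster size, scaled to the unit interval.
    c = (xs[k - 1] - xs[i - 1]) / protein_length
    # Binomial tail sum; identical to the regularized incomplete beta function.
    return sum(comb(n, j) * c**j * (1 - c) ** (n - j) for j in range(r, n + 1))


# Toy data (hypothetical): four mutations packed into residues 12-14,
# plus two scattered mutations, on a protein of length 200.
toy_positions = [12, 13, 13, 14, 61, 146]
p_tight = cluster_pvalue(toy_positions, 1, 4, 200)  # tight cluster
p_wide = cluster_pvalue(toy_positions, 1, 6, 200)   # span of all mutations
```

In this sketch a tight cluster yields a much smaller probability than the span covering all mutations. Two limitations follow from the assumptions: mutations at the same residue give a span of zero and hence a probability of zero, and every pair (i, k) is a separate test, so a real analysis would need a discrete treatment of positions and a multiple-testing correction.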

Conclusions: Our proposed method is useful for discovering activating driver mutations in cancer by identifying nonrandom clusters of somatic amino acid mutations in protein sequences.

MeSH terms

  • Cluster Analysis*
  • Genes, ras / genetics
  • Genome, Human
  • Humans
  • Models, Statistical*
  • Mutation*
  • Neoplasms / genetics*
  • Proto-Oncogene Proteins B-raf / genetics
  • beta Catenin / genetics


Substances

  • beta Catenin
  • BRAF protein, human
  • Proto-Oncogene Proteins B-raf