Coclustering of human cancer microarrays using Minimum Sum-Squared Residue coclustering

IEEE/ACM Trans Comput Biol Bioinform. 2008 Jul-Sep;5(3):385-400. doi: 10.1109/TCBB.2007.70268.

Abstract

It is widely agreed in microarray analysis that identifying potential local patterns, characterized by coherent groups of genes and conditions, may shed light on previously undetectable cellular processes of genes as well as macroscopic phenotypes of related samples. To cluster genes and conditions simultaneously, we previously developed a fast co-clustering algorithm, Minimum Sum-Squared Residue Co-clustering (MSSRCC), which employs an alternating minimization scheme and generates what we call co-clusters in a checkerboard structure. In this paper, we propose specific strategies that enable MSSRCC to escape poor local minima and resolve the degeneracy problem in partitional clustering algorithms. The strategies are binormalization, deterministic spectral initialization, and incremental local search. We assess the effects of these strategies on both synthetic gene expression datasets and real human cancer microarrays and provide empirical evidence that MSSRCC with the proposed strategies performs better than existing co-clustering and clustering algorithms. In particular, the combination of all three strategies leads to the best performance. Furthermore, we illustrate the coherence of the resulting co-clusters in a checkerboard structure, where genes in a co-cluster manifest the phenotype structure of the corresponding samples, and we evaluate the enrichment of functional annotations in Gene Ontology (GO).
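As a rough illustration of the objective named in the abstract, the sketch below computes a sum-squared residue for a fixed checkerboard partition of a gene-by-condition matrix. It is not the authors' implementation. The abstract does not state which residue definition is used, so the sketch assumes one common form, h_ij = a_ij - (row mean within block) - (column mean within block) + (block mean); the function name sum_squared_residue and its arguments are hypothetical.

```python
import numpy as np

def sum_squared_residue(A, row_labels, col_labels):
    """Sum of squared residues over all co-clusters (blocks).

    Assumption: residue h_ij = a_ij - a_iJ - a_Ij + a_IJ, where a_iJ and
    a_Ij are the row and column means inside the block and a_IJ is the
    block mean. The abstract itself does not specify this definition.
    """
    A = np.asarray(A, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    total = 0.0
    for r in np.unique(row_labels):
        rows = np.where(row_labels == r)[0]
        for c in np.unique(col_labels):
            cols = np.where(col_labels == c)[0]
            # Extract the block for row cluster r and column cluster c.
            block = A[np.ix_(rows, cols)]
            row_means = block.mean(axis=1, keepdims=True)
            col_means = block.mean(axis=0, keepdims=True)
            residue = block - row_means - col_means + block.mean()
            total += float((residue ** 2).sum())
    return total

# A matrix whose blocks are purely additive (row effect + column effect)
# has zero residue under the matching checkerboard partition.
additive = [[1, 2, 10, 11],
            [2, 3, 11, 12],
            [5, 6, 0, 1],
            [6, 7, 1, 2]]
zero_case = sum_squared_residue(additive, [0, 0, 1, 1], [0, 0, 1, 1])

# A 2x2 identity treated as a single block is not additive.
nonzero_case = sum_squared_residue([[1, 0], [0, 1]], [0, 0], [0, 0])
```

An alternating minimization scheme of the kind the abstract mentions would repeatedly reassign row labels with column labels held fixed, and then column labels with row labels held fixed, so as to reduce this quantity. The update rules themselves are not given in the abstract and are not sketched here.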

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Biomarkers, Tumor / analysis*
  • Cluster Analysis
  • Gene Expression Profiling / methods*
  • Humans
  • Least-Squares Analysis
  • Neoplasm Proteins / analysis*
  • Neoplasms / diagnosis*
  • Neoplasms / metabolism*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods*

Substances

  • Biomarkers, Tumor
  • Neoplasm Proteins