A semi-supervised approach to projected clustering with applications to microarray data

Int J Data Min Bioinform. 2009;3(3):229-59. doi: 10.1504/ijdmb.2009.026700.


Recent studies suggest that extremely low-dimensional projected clusters exist in real datasets. Here, we propose a new algorithm for identifying them. It combines object clustering with dimension selection and accepts domain knowledge as input to guide the clustering process. Theoretical and experimental results show that even a small amount of input knowledge can help detect clusters with only 1% of the relevant dimensions. We also show that this semi-supervised algorithm can perform knowledge-guided selective clustering when there are multiple meaningful object groupings. The algorithm is also shown to be effective in analysing a microarray dataset.
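
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes: alternating between assigning objects to clusters and selecting, per cluster, the dimensions in which its members are tightly grouped, with a few labeled objects acting as the input knowledge. It is not the authors' algorithm. The function names, the variance-ratio threshold `ratio`, and the distance measure are all assumptions made for illustration.

```python
from statistics import mean, pvariance

EPS = 1e-12  # guards against division by zero on constant dimensions


def select_dims(rows, global_var, ratio):
    """Assumed criterion: keep dimensions whose within-cluster variance
    is well below the variance of that dimension over the whole dataset."""
    if len(rows) < 2:
        return []
    return [j for j in range(len(global_var))
            if pvariance([x[j] for x in rows]) < ratio * global_var[j]]


def projected_distance(x, center, dims, global_var):
    """Mean squared deviation from the center, over the selected dimensions only."""
    return sum((x[j] - center[j]) ** 2 / max(global_var[j], EPS)
               for j in dims) / len(dims)


def projected_cluster(data, labeled, n_iter=10, ratio=0.5):
    """data: list of equal-length lists of floats.
    labeled: {cluster_id: [row indices]}, the hypothetical form of the
    domain knowledge; each cluster needs at least one labeled object."""
    d = len(data[0])
    global_var = [pvariance([x[j] for x in data]) for j in range(d)]
    # Start each cluster from its labeled objects.
    members = {c: [data[i] for i in idx] for c, idx in labeled.items()}
    assign = [None] * len(data)
    model = {}
    for _ in range(n_iter):
        # Dimension selection step: one dimension subset and center per cluster.
        model = {}
        for c, rows in members.items():
            dims = select_dims(rows, global_var, ratio) or list(range(d))
            center = [mean([x[j] for x in rows]) for j in range(d)]
            model[c] = (dims, center)
        # Object clustering step: nearest cluster in its own subspace.
        new_assign = [
            min(model, key=lambda c: projected_distance(
                x, model[c][1], model[c][0], global_var))
            for x in data
        ]
        # Labeled objects always keep their given cluster.
        for c, idx in labeled.items():
            for i in idx:
                new_assign[i] = c
        if new_assign == assign:
            break
        assign = new_assign
        members = {c: [data[i] for i, a in enumerate(assign) if a == c]
                   for c in labeled}
    return assign, {c: model[c][0] for c in model}
```

Called as `projected_cluster(data, {"A": [0, 1], "B": [4, 5]})`, the sketch returns one cluster label per object together with the dimensions selected for each cluster. Pinning the labeled objects to their clusters is one simple way to let input knowledge steer which grouping is found; the published method may use such knowledge quite differently.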

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Cluster Analysis*
  • Humans
  • Neoplasm Proteins / metabolism
  • Oligonucleotide Array Sequence Analysis / methods
  • Pattern Recognition, Automated*
  • Software