Resampling method for unsupervised estimation of cluster validity

Neural Comput. 2001 Nov;13(11):2573-93. doi: 10.1162/089976601753196030.

Abstract

We introduce a method for validating the results of cluster analysis. The method is based on resampling the available data. We introduce a figure of merit that measures how stable clustering solutions are against resampling; clusters that are stable against resampling give rise to local maxima of this figure of merit. The method is presented first for a one-dimensional data set, for which an analytic approximation of the figure of merit is derived and compared with numerical measurements. Next, its applicability is demonstrated for higher-dimensional data, including gene microarray expression data.
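As an illustration only, the following Python sketch scores cluster stability under resampling. Everything in it is an assumption rather than the paper's definition: the pairwise score (the fraction of point pairs that share a cluster in the full-data solution and still share one when only a random subsample is clustered), the stand-in clustering algorithm (single-linkage on 1-D data, implemented by cutting the sorted values at their k - 1 widest gaps), the subsample fraction of 0.8, the number of resamples, the synthetic data, and the hypothetical helper names `cluster_1d_largest_gaps` and `resampling_stability`.

```python
import random
from itertools import combinations


def cluster_1d_largest_gaps(points, k):
    """Hypothetical helper (assumed stand-in clusterer): single-linkage clustering
    of 1-D data into k clusters, i.e. cut the sorted values at their k-1 widest gaps."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    xs = [points[i] for i in order]
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    # sorted() is stable even with reverse=True, so equal gaps are taken leftmost first.
    cuts = set(sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[: k - 1])
    labels = [0] * len(points)
    label = 0
    for pos, idx in enumerate(order):
        labels[idx] = label
        if pos in cuts:  # gap `pos` lies between xs[pos] and xs[pos + 1]
            label += 1
    return labels


def resampling_stability(points, k, fraction=0.8, n_resamples=100, seed=0):
    """Assumed stability score in [0, 1]: averaged over random subsamples, the fraction
    of pairs co-clustered in the full-data solution that stay co-clustered when only
    the subsample is clustered."""
    rng = random.Random(seed)
    full = cluster_1d_largest_gaps(points, k)
    n = len(points)
    m = max(2, int(fraction * n))
    scores = []
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # subsample without replacement
        sub = cluster_1d_largest_gaps([points[i] for i in idx], k)
        together = kept = 0
        for a, b in combinations(range(m), 2):
            if full[idx[a]] == full[idx[b]]:
                together += 1
                if sub[a] == sub[b]:
                    kept += 1
        if together:
            scores.append(kept / together)
    return sum(scores) / len(scores) if scores else 0.0


# Synthetic toy data: three well-separated groups of ten integer points each.
data = list(range(0, 10)) + list(range(100, 110)) + list(range(200, 210))
for k in range(1, 6):
    print(k, round(resampling_stability(data, k), 3))
```

On this toy data, k = 1 scores 1.0 trivially (every pair always stays together), and k = 3 scores exactly 1.0 because every subsample still splits at the two wide between-group gaps. k = 2 and k = 4 are expected to score below 1, because the full-data and subsample solutions can disagree about which of two equally wide gaps, or which of many equal within-group gaps, to cut. The three groups therefore show up as a local maximum of the score; the trivial value at k = 1 is why the sketch looks for a local rather than a global maximum.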

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Cluster Analysis*
  • Models, Theoretical*
  • Oligonucleotide Array Sequence Analysis