Pairwise variable selection for high-dimensional model-based clustering

Jian Guo; Elizaveta Levina; George Michailidis; Ji Zhu

doi:10.1111/j.1541-0420.2009.01341.x


Biometrics. 2010 Sep;66(3):793-804. doi: 10.1111/j.1541-0420.2009.01341.x.

Authors

Jian Guo¹, Elizaveta Levina, George Michailidis, Ji Zhu

Affiliation

¹ Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, USA.

Abstract

Variable selection for clustering is an important and challenging problem in high-dimensional data analysis. Existing variable selection methods for model-based clustering select informative variables in a "one-in-all-out" manner; that is, a variable is selected if at least one pair of clusters is separable by this variable and removed if it cannot separate any of the clusters. In many applications, however, it is of interest to further establish exactly which clusters are separable by each informative variable. To address this question, we propose a pairwise variable selection method for high-dimensional model-based clustering. The method is based on a new pairwise penalty. Results on simulated and real data show that the new method performs better than alternative approaches that use ℓ₁ and ℓ∞ penalties and offers better interpretation.
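The abstract contrasts "one-in-all-out" selection with pairwise selection but does not state the form of any penalty. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes K estimated cluster means mu[k][j] on centered variables. It uses commonly cited forms of the ℓ₁ penalty (sum of |mu_kj|) and the ℓ∞ penalty (sum over variables of max_k |mu_kj|). As a stand-in for "a new pairwise penalty" it uses a hypothetical fusion-type form, the sum over variables j and cluster pairs k < k' of |mu_kj − mu_k'j|. All function names are hypothetical. The sketch shows how a pairwise view reports which cluster pairs each variable separates, where one-in-all-out reports only whether the variable is kept.

```python
from itertools import combinations


def l1_penalty(mu):
    """Assumed lasso-type form: sum over clusters k and variables j of |mu[k][j]|."""
    return sum(abs(x) for row in mu for x in row)


def linf_penalty(mu):
    """Assumed form: sum over variables j of max_k |mu[k][j]|."""
    n_vars = len(mu[0])
    return sum(max(abs(row[j]) for row in mu) for j in range(n_vars))


def pairwise_penalty(mu):
    """Hypothetical fusion-type form: sum over j and pairs k<l of |mu[k][j] - mu[l][j]|."""
    n_vars = len(mu[0])
    return sum(abs(mu[k][j] - mu[l][j])
               for j in range(n_vars)
               for k, l in combinations(range(len(mu)), 2))


def separable_pairs(mu, tol=1e-8):
    """For each variable j, list the cluster pairs (k, l) whose means differ by more than tol."""
    n_vars = len(mu[0])
    return [[(k, l) for k, l in combinations(range(len(mu)), 2)
             if abs(mu[k][j] - mu[l][j]) > tol]
            for j in range(n_vars)]


def one_in_all_out(mu, tol=1e-8):
    """Indices of variables kept because at least one cluster pair is separable."""
    return [j for j, pairs in enumerate(separable_pairs(mu, tol)) if pairs]


# Toy example: 3 clusters (rows) x 3 centered variables (columns).
mu = [[1.0, -2.0, 0.0],
      [1.0, 1.0, 0.0],
      [-2.0, 1.0, 0.0]]
print(one_in_all_out(mu))   # [0, 1]: variables 0 and 1 are "informative"
print(separable_pairs(mu))  # [[(0, 2), (1, 2)], [(0, 1), (0, 2)], []]
```

In this toy example, one-in-all-out keeps variables 0 and 1 without distinguishing them. The pairwise view shows that variable 0 separates cluster 2 from clusters 0 and 1, while variable 1 separates cluster 0 from clusters 1 and 2. A penalty that shrinks pairwise differences toward zero can fuse individual pairs of means, which is the kind of per-pair information the abstract describes.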

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Cluster Analysis*
Computer Simulation
Data Interpretation, Statistical*
Models, Statistical
