Pairwise Variable Selection for High-Dimensional Model-Based Clustering

doi:10.1111/J.1541-0420.2009.01341.X

Open Access Journal Article

Jian Guo, +3 more

- 01 Sep 2010 -

Biometrics

- Vol. 66, Iss: 3, pp 793-804

TLDR

A pairwise variable selection method for high-dimensional model-based clustering is proposed, based on a new pairwise penalty. It performs better than alternative approaches that use ℓ1 and ℓ∞ penalties and offers better interpretation.

Abstract:

Variable selection for clustering is an important and challenging problem in high-dimensional data analysis. Existing variable selection methods for model-based clustering select informative variables in a "one-in-all-out" manner; that is, a variable is selected if at least one pair of clusters is separable by this variable and removed if it cannot separate any of the clusters. In many applications, however, it is of interest to further establish exactly which clusters are separable by each informative variable. To address this question, we propose a pairwise variable selection method for high-dimensional model-based clustering. The method is based on a new pairwise penalty. Results on simulated and real data show that the new method performs better than alternative approaches that use ℓ1 and ℓ∞ penalties and offers better interpretation.
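The contrast between "one-in-all-out" selection and pairwise selection can be illustrated with a small sketch. The abstract does not spell out the penalty formulas, so the forms below are assumptions: the ℓ1 penalty is taken as the sum of absolute cluster means, the ℓ∞ penalty as the per-variable maximum absolute cluster mean, and the pairwise penalty as the sum of absolute differences between cluster means over all cluster pairs. The function names and the toy matrix `mu` are hypothetical and exist only for illustration.

```python
import numpy as np
from itertools import combinations

# mu is a (K clusters) x (p variables) matrix of cluster means.
# All penalty forms here are assumed for illustration only.

def l1_penalty(mu):
    # Sum of absolute cluster means over all clusters and variables.
    return np.abs(mu).sum()

def linf_penalty(mu):
    # For each variable, the largest absolute cluster mean; summed over variables.
    return np.abs(mu).max(axis=0).sum()

def pairwise_penalty(mu):
    # Sum over cluster pairs (k, l) and variables of |mu[k, j] - mu[l, j]|.
    n_clusters = mu.shape[0]
    return sum(np.abs(mu[k] - mu[l]).sum()
               for k, l in combinations(range(n_clusters), 2))

def separable_pairs(mu, tol=1e-8):
    # For each variable j, list the cluster pairs whose means differ on j.
    # An empty list means the variable separates no clusters at all.
    n_clusters, n_vars = mu.shape
    return {
        j: [(k, l) for k, l in combinations(range(n_clusters), 2)
            if abs(mu[k, j] - mu[l, j]) > tol]
        for j in range(n_vars)
    }

# Toy estimate: 3 clusters, 3 variables (hypothetical values).
mu = np.array([[0.0,  2.0, 0.0],
               [0.0,  2.0, 1.0],
               [0.0, -1.0, 1.0]])

pairs = separable_pairs(mu)
for j, sep in pairs.items():
    status = "uninformative" if not sep else f"separates cluster pairs {sep}"
    print(f"variable {j}: {status}")
```

In this toy example, variable 0 separates no clusters, variable 1 separates cluster 2 from clusters 0 and 1, and variable 2 separates cluster 0 from clusters 1 and 2. A "one-in-all-out" rule would only report variables 1 and 2 as informative, while the pairwise view also reports which cluster pairs each one distinguishes.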

Citations

Journal Article

A Concave Pairwise Fusion Approach to Subgroup Analysis

Shujie Ma, +1 more

- 03 May 2017 -

Journal of the American Statistical Asso...

TL;DR: In this paper, a penalized approach for subgroup analysis based on a regression model is proposed, in which heterogeneity is driven by unobserved latent factors and thus can be represented by using subject-specific intercepts.

Book

Model-Based Clustering and Classification for Data Science

Charles Bouveyron, +3 more

TL;DR: In this paper, the authors frame cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to central questions such as: How many clusters are there? Which method should I use? How should I handle outliers?

Journal Article

Regularized k-means clustering of high-dimensional data and its asymptotic consistency

Wei Sun, +2 more

- 01 Jan 2012 -

Electronic Journal of Statistics

Journal Article

Penalized model-based clustering with unconstrained covariance matrices.

Hui Zhou, +2 more

- 01 Jan 2009 -

Electronic Journal of Statistics

TL;DR: This article proposes a regularized Gaussian mixture model permitting a treatment of general covariance matrices, taking various dependencies into account, and derives an E-M algorithm utilizing the graphical lasso for parameter estimation, achieving better clustering and variable selection.

Journal Article

Variable Selection Methods for Model-based Clustering

Michael Fop, +1 more

- 02 Jul 2017 -

arXiv: Methodology

TL;DR: This review provides a summary of the methods developed for variable selection in model-based clustering and existing R packages implementing the different methods are indicated and illustrated in application to two data analysis examples.

References

Some methods for classification and analysis of multivariate observations

James B. MacQueen

TL;DR: The k-means procedure described in this paper partitions an N-dimensional population into k sets on the basis of a sample. It generalizes the ordinary sample mean and is shown to give partitions that are reasonably efficient in the sense of within-class variance.

Book

Finding Groups in Data: An Introduction to Cluster Analysis

Leonard Kaufman, +1 more

Book

Finding Groups in Data

Leonard Kaufman, +1 more

Journal Article

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

Jianqing Fan, +1 more

- 01 Dec 2001 -

Journal of the American Statistical Asso...

TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.

Book

Finite Mixture Models

Geoffrey J. McLachlan, +1 more

TL;DR: The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the mathematical and statistical literature.
