Open Access Journal Article

Pairwise Variable Selection for High-Dimensional Model-Based Clustering

Jian Guo, +3 more
01 Sep 2010, Vol. 66, Iss. 3, pp. 793-804
TLDR
A pairwise variable selection method for high-dimensional model-based clustering is proposed, based on a new pairwise penalty. It performs better than alternative approaches that use ℓ1 and ℓ∞ penalties and offers better interpretation.
Abstract
Variable selection for clustering is an important and challenging problem in high-dimensional data analysis. Existing variable selection methods for model-based clustering select informative variables in a "one-in-all-out" manner; that is, a variable is selected if at least one pair of clusters is separable by this variable and removed if it cannot separate any of the clusters. In many applications, however, it is of interest to further establish exactly which clusters are separable by each informative variable. To address this question, we propose a pairwise variable selection method for high-dimensional model-based clustering. The method is based on a new pairwise penalty. Results on simulated and real data show that the new method performs better than alternative approaches that use ℓ1 and ℓ∞ penalties and offers better interpretation.
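The contrast between "one-in-all-out" selection and pairwise selection can be illustrated with a small sketch. The abstract does not give the form of the penalty, so the code below assumes a fusion-type penalty that sums absolute differences between cluster means over every pair of clusters and every variable. The function names, the `lam` and `tol` parameters, and the example means are hypothetical, chosen only for illustration; this is not the authors' implementation.

```python
from itertools import combinations


def pairwise_fusion_penalty(means, lam=1.0):
    """Assumed penalty: lam * sum over cluster pairs (k, l) and variables j
    of |mu_kj - mu_lj|. `means` is a list of per-cluster mean vectors."""
    pairs = combinations(range(len(means)), 2)
    return lam * sum(
        abs(a - b) for k, l in pairs for a, b in zip(means[k], means[l])
    )


def separable_pairs(means, tol=1e-8):
    """For each variable, list the cluster pairs whose means differ on it."""
    n_clusters, n_vars = len(means), len(means[0])
    return {
        j: [
            (k, l)
            for k, l in combinations(range(n_clusters), 2)
            if abs(means[k][j] - means[l][j]) > tol
        ]
        for j in range(n_vars)
    }


# Hypothetical estimated means: 3 clusters (rows) by 3 variables (columns).
means = [
    [0.0, 2.0, 0.0],
    [0.0, 2.0, 1.5],
    [0.0, -1.0, 1.5],
]

pairs = separable_pairs(means)
# "One-in-all-out" view: a variable is kept if any cluster pair differs on it.
selected = [j for j, found in pairs.items() if found]
penalty = pairwise_fusion_penalty(means)
```

In this toy example the one-in-all-out view only reports that variables 1 and 2 are informative, whereas the pairwise view also records that variable 1 separates cluster 2 from the other two and variable 2 separates cluster 0 from the other two.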

