CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality

doi:10.1214/18-AOS1711

Journal Article

T. Tony Cai, +2 more

Annals of Statistics, Vol. 47, Iss. 3, pp. 1234-1267, 01 Jun 2019

TLDR

This paper studies clustering of high-dimensional Gaussian mixtures and proposes CHIME, a procedure that combines the EM algorithm with a direct estimation method for the sparse discriminant vector. In simulations, CHIME outperforms existing methods under a variety of settings.

Abstract:

Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess misclustering error and show that CHIME is minimax rate optimal. In addition, the optimality of the proposed estimator of the discriminant vector is also established. Simulation studies show that CHIME outperforms the existing methods under a variety of settings. The proposed CHIME procedure is also illustrated in an analysis of a glioblastoma gene expression data set and shown to have superior performance. Clustering of Gaussian mixtures in the conventional low-dimensional setting is also considered. The technical tools developed for the high-dimensional setting are used to establish the optimality of the clustering procedure that is based on the classical EM algorithm.
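
For orientation, the classical low-dimensional EM procedure mentioned at the end of the abstract can be sketched as follows. This is not the CHIME procedure itself: it is a minimal illustration that assumes two components with a shared covariance matrix (an assumption not stated in the abstract) and uses a plain matrix solve where CHIME, per the abstract, relies on a direct estimation method for the sparse discriminant vector. The function name `em_two_gaussians`, the median-split initialization, and the small ridge term are hypothetical choices made only for this sketch.

```python
import numpy as np

def em_two_gaussians(X, n_iter=100):
    """Classical EM for a two-component Gaussian mixture with shared covariance.

    Illustrative sketch only (not CHIME). Returns hard cluster labels and the
    estimated discriminant direction beta = Sigma^{-1} (mu2 - mu1).
    """
    n, p = X.shape
    # Hypothetical initialization: split at the median of the coordinate
    # with the largest sample variance.
    j = np.argmax(X.var(axis=0))
    upper = X[:, j] > np.median(X[:, j])
    mu1, mu2 = X[~upper].mean(axis=0), X[upper].mean(axis=0)
    ridge = 1e-6 * np.eye(p)  # keeps the covariance invertible
    sigma = np.atleast_2d(np.cov(X, rowvar=False)) + ridge
    w = 0.5  # mixing weight of component 2

    for _ in range(n_iter):
        # E-step: with a shared covariance, the posterior log-odds of
        # component 2 versus component 1 are linear in x.
        beta = np.linalg.solve(sigma, mu2 - mu1)
        log_odds = np.log(w / (1 - w)) + (X - (mu1 + mu2) / 2) @ beta
        gamma = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -30, 30)))

        # M-step: weighted updates of the weight, means, and covariance.
        w = float(np.clip(gamma.mean(), 1e-6, 1 - 1e-6))
        mu1 = ((1 - gamma)[:, None] * X).sum(axis=0) / (1 - gamma).sum()
        mu2 = (gamma[:, None] * X).sum(axis=0) / gamma.sum()
        r1, r2 = X - mu1, X - mu2
        sigma = ((r1 * (1 - gamma)[:, None]).T @ r1
                 + (r2 * gamma[:, None]).T @ r2) / n + ridge

    # Final linear rule: assign to component 2 when the log-odds are positive.
    beta = np.linalg.solve(sigma, mu2 - mu1)
    log_odds = np.log(w / (1 - w)) + (X - (mu1 + mu2) / 2) @ beta
    return (log_odds > 0).astype(int), beta
```

In this sketch the clustering rule depends on the data only through the linear score defined by `beta`, which is why estimating the discriminant vector well matters for the misclustering error. When the dimension is comparable to or larger than the sample size, the solve against `sigma` becomes unreliable, which is the regime the high-dimensional procedure in the paper is designed for.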
