Journal Article
CHIME: Clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality
T. Tony Cai, Jing Ma, Linjun Zhang
TL;DR: This paper studies clustering of high-dimensional Gaussian mixtures and proposes CHIME, a procedure based on the EM algorithm and a direct estimation method for the sparse discriminant vector. In simulations, CHIME outperforms existing methods under a variety of settings.
Abstract:
Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess misclustering error and show that CHIME is minimax rate optimal. In addition, the optimality of the proposed estimator of the discriminant vector is also established. Simulation studies show that CHIME outperforms the existing methods under a variety of settings. The proposed CHIME procedure is also illustrated in an analysis of a glioblastoma gene expression data set and shown to have superior performance. Clustering of Gaussian mixtures in the conventional low-dimensional setting is also considered. The technical tools developed for the high-dimensional setting are used to establish the optimality of the clustering procedure that is based on the classical EM algorithm.
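As a rough illustration of the low-dimensional case mentioned at the end of the abstract, the sketch below runs classical EM on a two-component Gaussian mixture with a shared covariance and clusters points with the linear discriminant rule, using the discriminant vector beta = Sigma^{-1}(mu2 - mu1). It is not the CHIME procedure: there is no sparsity constraint or direct l1-type estimation of beta, and the function name `em_two_gaussians`, the principal-component initialisation, the ridge term, and the iteration count are all assumptions made for this example.

```python
import numpy as np

def em_two_gaussians(X, n_iter=50):
    """Classical EM for a two-component Gaussian mixture with shared covariance.

    Illustrative sketch only: no sparsity is imposed, so this is NOT CHIME.
    Returns hard cluster labels (0/1) and the estimated discriminant vector.
    """
    n, p = X.shape

    # Assumed initialisation: split the data along the first principal component.
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    score = centered @ vt[0]
    mu1 = X[score <= 0].mean(axis=0)
    mu2 = X[score > 0].mean(axis=0)
    sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(p)
    w = 0.5  # mixing weight of component 2

    for _ in range(n_iter):
        # E-step: posterior probability of component 2. With a shared
        # covariance the log-odds are linear in x with slope beta.
        beta = np.linalg.solve(sigma, mu2 - mu1)
        logit = (X - (mu1 + mu2) / 2) @ beta + np.log(w / (1 - w))
        gamma = 1.0 / (1.0 + np.exp(-np.clip(logit, -30, 30)))

        # M-step: weighted updates of the weight, means, and pooled covariance.
        w = np.clip(gamma.mean(), 1e-6, 1 - 1e-6)
        mu1 = ((1 - gamma) @ X) / (1 - gamma).sum()
        mu2 = (gamma @ X) / gamma.sum()
        d1, d2 = X - mu1, X - mu2
        sigma = ((d1 * (1 - gamma)[:, None]).T @ d1
                 + (d2 * gamma[:, None]).T @ d2) / n
        sigma += 1e-6 * np.eye(p)  # small ridge for numerical stability

    # Final discriminant vector and hard assignments.
    beta = np.linalg.solve(sigma, mu2 - mu1)
    logit = (X - (mu1 + mu2) / 2) @ beta + np.log(w / (1 - w))
    labels = (logit > 0).astype(int)
    return labels, beta

# Usage on synthetic data (the dimensions and mean shift are arbitrary choices).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(size=(200, 5)),
    rng.normal(size=(200, 5)) + np.array([3.0, 3.0, 0.0, 0.0, 0.0]),
])
labels, beta = em_two_gaussians(X)
```

In the high-dimensional regime the abstract targets, solving for beta with a plain linear system as above would be unreliable, which is where a direct sparse estimator of the discriminant vector would replace the `np.linalg.solve` step.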
Citations
Journal Article
Singularity, misspecification and the convergence rate of EM
TL;DR: This work makes use of a careful form of localization in the associated empirical process, and develops a recursive argument to progressively sharpen the statistical rate of the EM algorithm in over-specified settings.
Posted Content
High-dimensional Linear Discriminant Analysis: Optimality, Adaptive Algorithm, and Missing Data
T. Tony Cai, Linjun Zhang
TL;DR: A data-driven and tuning-free classification rule, based on an adaptive constrained ℓ1-minimization approach, is proposed and analysed, and is shown to be simultaneously rate optimal over a collection of parameter spaces.
Journal Article
A Novel Brain MRI Image Segmentation Method Using an Improved Multi-View Fuzzy c-Means Clustering Algorithm.
TL;DR: Zhang et al. use an improved multi-view fuzzy c-means clustering algorithm (IMV-FCM) to improve the segmentation accuracy of brain MRI images.
Journal Article
A general frame for uncertainty propagation under multimodally distributed random variables
TL;DR: A general frame based on a new finite mixture model constructed by derivative lambda probability density function and polynomial chaos expansion method is put forward to efficiently solve the multimodal distribution propagation problem.