Open Access Journal Article

Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models

TL;DR
Model-based clustering using a family of Gaussian mixture models, with parsimonious factor analysis-like covariance structure, is described and an efficient algorithm for its implementation is presented, showing its effectiveness when compared to existing software.
About
This article was published in Computational Statistics & Data Analysis on 2010-03-01 and is open access. It has received 128 citations to date. The article focuses on the topics: Parallel algorithm & Mixture model.


Citations
Journal Article

Model-based clustering of microarray expression data via latent Gaussian mixture models

TL;DR: This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models; the approach gives very good performance relative to existing popular clustering techniques when applied to real gene expression microarray data.
Journal Article

Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions

TL;DR: A novel family of mixture models wherein each component is modeled using a multivariate t-distribution with an eigen-decomposed covariance structure is put forth, known as the tEIGEN family.
Journal Article

Mixtures of Shifted Asymmetric Laplace Distributions

TL;DR: This work marks an important step in the direction of non-Gaussian model-based clustering and classification; a variant of the EM algorithm is developed for parameter estimation by exploiting the relationship with the generalized inverse Gaussian distribution.
Journal Article

Extending mixtures of multivariate t-factor analyzers

TL;DR: The extension of the mixtures of multivariate t-factor analyzers model is described to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices to create a family of six mixture models, including parsimonious models.
Journal Article

A mixture of generalized hyperbolic distributions

TL;DR: The authors introduce a mixture of generalized hyperbolic distributions as an alternative to the ubiquitous mixture of Gaussian distributions and its near relatives, among which mixtures of multivariate t-distributions and mixtures of skew-t distributions predominate.
References
Journal Article

Estimating the Dimension of a Model

TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Book

Finite Mixture Models

TL;DR: The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the mathematical and statistical literature.
Journal Article

Objective Criteria for the Evaluation of Clustering Methods

TL;DR: This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.
Book

The EM algorithm and extensions

TL;DR: The EM Algorithm and Extensions describes the formulation of the EM algorithm, details its methodology, discusses its implementation, and illustrates applications in many statistical contexts, opening the door to the tremendous potential of this remarkably versatile statistical tool.
Frequently Asked Questions (11)
Q1. What are examples of problems that are trivially parallelizable?

Ray tracing in computer graphics, signal processing, brute force attacks in cryptography and gene sequence alignment are all examples of problems that are trivially parallelizable. 

Within a master-slave paradigm, the ideal situation occurs when the speed-up is directly proportional to the number of processors — this is known as linear speed-up. 
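
In symbols (the standard definition, stated here for concreteness): with $T(1)$ the run time on one processor and $T(p)$ the run time on $p$ processors, the speed-up is

$$
S(p) = \frac{T(1)}{T(p)},
$$

and linear speed-up corresponds to $S(p) = p$.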

Due to the strategy adopted for parallelization, it was necessary to write two functions: one for the master and one for the slaves. 
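
A minimal sketch of this master-slave structure is given below in Python with the mpi4py bindings. It is an illustrative reconstruction under stated assumptions, not the authors' implementation: fit_pgmm is a hypothetical stand-in for one serial AECM fit, and the job encoding is invented for the example.

```python
# Minimal master-slave sketch in Python with mpi4py; illustrative only,
# not the authors' implementation.  Run with, e.g.:
#   mpiexec -n 4 python pgmm_farm.py
from itertools import product

from mpi4py import MPI


def fit_pgmm(model, G, q):
    # Hypothetical stand-in for one serial AECM fit of PGMM covariance
    # structure `model` with G components and q latent factors; a real
    # implementation would run AECM to convergence and return the BIC.
    return -(model + G + q)


def master(comm, jobs):
    # Seed every slave with one job, then hand out the remaining jobs
    # as results come back, so faster processors receive more work.
    status, results = MPI.Status(), []
    it = iter(jobs)
    for rank in range(1, comm.Get_size()):
        comm.send(next(it, None), dest=rank)  # None means "nothing to do"
    for _ in range(len(jobs)):
        results.append(comm.recv(source=MPI.ANY_SOURCE, status=status))
        comm.send(next(it, None), dest=status.Get_source())
    return results


def slave(comm):
    # Fit whatever triple the master sends; a None job is the stop signal.
    while (job := comm.recv(source=0)) is not None:
        comm.send((job, fit_pgmm(*job)), dest=0)


if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        # One job per (model, G, q) triple: 8 structures, G and q in 1..5.
        triples = list(product(range(1, 9), range(1, 6), range(1, 6)))
        best = max(master(comm, triples), key=lambda r: r[1])
        print("best (model, G, q) by dummy BIC:", best)
    else:
        slave(comm)
```

Each slave blocks on comm.recv until the master hands it a triple, so a processor is re-used as soon as it finishes a job; this load balancing is the advantage of the master-slave design over a static split of the triples.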

Factor analysis (Spearman 1904) is a data reduction technique in which a p-dimensional real-valued data vector x is modelled using a q-dimensional vector of latent variables u, where q ≪ p. 
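
In symbols, the factor analysis model just described is

$$
\mathbf{x} = \boldsymbol{\mu} + \boldsymbol{\Lambda}\mathbf{u} + \boldsymbol{\varepsilon}, \qquad
\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_q), \qquad
\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}),
$$

where $\boldsymbol{\Lambda}$ is a $p \times q$ matrix of factor loadings and $\boldsymbol{\Psi}$ is a diagonal $p \times p$ matrix, so that marginally $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Lambda}\boldsymbol{\Lambda}' + \boldsymbol{\Psi})$.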

The PGMM family was fitted to the data for G ∈ {1, 2, ..., 5} and q ∈ {1, 2, ..., 5}, running the software from three random starting values, so that a total of 600 models were fitted (eight covariance structures × five values of G × five values of q × three starts). 

The AECM algorithm used for parameter estimation was parallelized within the master-slave paradigm using MPI and the resulting speed-up has been shown to be linear up to a certain point. 

In the E-step, the expected value of the complete-data log-likelihood is computed based on the current estimates of the model parameters and the complete-data vector, which is the vector of observed data plus missing data. 
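
In the usual EM notation this is the Q-function: writing $\mathbf{x}$ for the observed data, $\mathbf{z}$ for the missing data, and $\boldsymbol{\theta}^{(t)}$ for the current parameter estimate,

$$
Q\bigl(\boldsymbol{\theta} \mid \boldsymbol{\theta}^{(t)}\bigr)
= \mathbb{E}\Bigl[\ell_c\bigl(\boldsymbol{\theta} \mid \mathbf{x}, \mathbf{z}\bigr) \,\Big|\, \mathbf{x}, \boldsymbol{\theta}^{(t)}\Bigr],
$$

where $\ell_c$ denotes the complete-data log-likelihood; the subsequent M-step maximizes $Q$ over $\boldsymbol{\theta}$.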

The nature of the problem makes it trivially parallelizable: that is, each triple (M, G, q) can be sent to a different processor and processors can work independently of one another. 

Parallelization within a triple is not implemented here: the saving achieved by farming whole triples out to processors is already so great that any further within-triple parallelization could well cost more in overhead than it gains. 

The eight PGMMs were fitted to the data for G ∈ {1, 2, ..., 6} and q ∈ {1, 2, ..., 6}, with three random starts for each model (8 × 6 × 6 × 3 = 864 fits in all). 

These include parallel implementations of algorithms for kernel estimation (Racine 2002), linear models (Kontoghiorghes 2000, Yanev & Kontoghiorghes 2006), partial least squares (Milidiú & Rentería 2005), and regression submodels (Gatu et al. 2007).