
Showing papers on "Mixture model published in 2011"


Proceedings Article
28 Jun 2011
TL;DR: This paper proposes a new framework for learning from large-scale datasets based on iterative learning from small mini-batches; by adding the right amount of noise to a standard stochastic gradient optimization algorithm, it shows that the iterates converge to samples from the true posterior distribution as the stepsize is annealed.
Abstract: In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an inbuilt protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a "sampling threshold" and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
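To make the transition from optimization to sampling concrete, here is a minimal sketch of a stochastic gradient Langevin dynamics update on a toy posterior over a single Gaussian mean. The prior, batch size, and polynomial stepsize schedule are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N observations from a Gaussian with unknown mean theta.
N, true_theta = 10_000, 2.0
x = rng.normal(true_theta, 1.0, size=N)

def grad_log_post(theta, batch):
    """Stochastic gradient of the log posterior (N(0, 10) prior, unit-variance
    likelihood), with the likelihood term rescaled by N / len(batch)."""
    return -theta / 10.0 + (N / len(batch)) * np.sum(batch - theta)

theta, samples = 0.0, []
for t in range(1, 5001):
    eps = 1e-5 * t ** -0.55                # annealed stepsize
    batch = rng.choice(x, size=100)        # small mini-batch
    noise = rng.normal(0.0, np.sqrt(eps))  # injected Gaussian noise
    theta += 0.5 * eps * grad_log_post(theta, batch) + noise
    samples.append(theta)

print(np.mean(samples[2500:]))  # approximate posterior mean, close to 2.0
```

The injected noise variance is tied to the stepsize, which is what turns the late iterates into approximate posterior samples rather than a point estimate.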

2,080 citations


Journal ArticleDOI
TL;DR: Efficiency figures show that the proposed technique for motion detection outperforms recent and proven state-of-the-art methods in terms of both computation speed and detection rate.
Abstract: This paper presents a technique for motion detection that incorporates several innovative mechanisms. For example, our proposed technique stores, for each pixel, a set of values taken in the past at the same location or in the neighborhood. It then compares this set to the current pixel value in order to determine whether that pixel belongs to the background, and adapts the model by choosing randomly which values to substitute from the background model. This approach differs from those based upon the classical belief that the oldest values should be replaced first. Finally, when the pixel is found to be part of the background, its value is propagated into the background model of a neighboring pixel. We describe our method in full detail (including pseudo-code and the parameter values used) and compare it to other background subtraction techniques. Efficiency figures show that our method outperforms recent and proven state-of-the-art methods in terms of both computation speed and detection rate. We also analyze the performance of a downscaled version of our algorithm to the absolute minimum of one comparison and one byte of memory per pixel. It appears that even such a simplified version of our algorithm performs better than mainstream techniques.
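A minimal NumPy sketch of the per-pixel model described above, under illustrative parameter values: each pixel stores a set of past values, is classified by counting close samples, and is updated by overwriting a randomly chosen stored value rather than the oldest. The neighborhood propagation step the paper also uses is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SAMPLES, RADIUS, MIN_MATCHES, SUBSAMPLING = 20, 20, 2, 16  # illustrative

def init_model(first_frame):
    # Each pixel keeps N_SAMPLES past values, here seeded from the first frame.
    return np.repeat(first_frame[..., None], N_SAMPLES, axis=-1).astype(np.int16)

def segment_and_update(model, frame):
    """Classify each pixel and update the model in place."""
    diff = np.abs(model - frame[..., None].astype(np.int16))
    background = (diff < RADIUS).sum(axis=-1) >= MIN_MATCHES

    # Conservative random update: each background pixel has a 1/SUBSAMPLING
    # chance of overwriting one randomly chosen stored sample.
    h, w = frame.shape
    chosen = background & (rng.integers(0, SUBSAMPLING, (h, w)) == 0)
    ys, xs = np.nonzero(chosen)
    slots = rng.integers(0, N_SAMPLES, ys.size)
    model[ys, xs, slots] = frame[ys, xs]
    return ~background  # foreground mask

frame = rng.integers(0, 255, (120, 160), dtype=np.uint8)  # stand-in frame
model = init_model(frame)
mask = segment_and_update(model, frame)
print(mask.sum(), 'foreground pixels')
```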

1,777 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: A generic framework is proposed that allows whole-image restoration using any patch-based prior for which a MAP (or approximate MAP) estimate can be calculated, together with a generic, surprisingly simple Gaussian mixture prior learned from a set of natural images.
Abstract: Learning good image priors is of utmost importance for the study of vision, computer vision and image processing applications. Learning priors and optimizing over whole images can lead to tremendous computational challenges. In contrast, when we work with small image patches, it is possible to learn priors and perform patch restoration very efficiently. This raises three questions - do priors that give high likelihood to the data also lead to good performance in restoration? Can we use such patch based priors to restore a full image? Can we learn better patch priors? In this work we answer these questions. We compare the likelihood of several patch models and show that priors that give high likelihood to data perform better in patch restoration. Motivated by this result, we propose a generic framework which allows for whole image restoration using any patch based prior for which a MAP (or approximate MAP) estimate can be calculated. We show how to derive an appropriate cost function, how to optimize it and how to use it to restore whole images. Finally, we present a generic, surprisingly simple Gaussian Mixture prior, learned from a set of natural images. When used with the proposed framework, this Gaussian Mixture Model outperforms all other generic prior methods for image denoising, deblurring and inpainting.
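As a rough, self-contained illustration of patch-level MAP restoration under a Gaussian mixture prior, the sketch below denoises one patch by picking the most likely mixture component and applying its Wiener filter. Everything here is simplified: the prior is fit to synthetic correlated patches rather than natural images, the component is selected from the noisy patch, and the paper's optimization over all overlapping patches of a whole image is omitted.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Stand-in for a patch prior learned from natural images: a GMM fit to
# synthetic correlated 8x8 patches, purely to keep the example self-contained.
patches = rng.normal(size=(2000, 64)) @ rng.normal(size=(64, 64)) * 0.1
gmm = GaussianMixture(n_components=5, covariance_type='full').fit(patches)

def map_denoise_patch(y, gmm, sigma2):
    """Approximate MAP estimate of a clean patch from y = x + noise:
    choose the most likely component, then apply its Wiener filter."""
    k = gmm.predict(y[None, :])[0]
    mu, Sigma = gmm.means_[k], gmm.covariances_[k]
    W = Sigma @ np.linalg.inv(Sigma + sigma2 * np.eye(len(y)))
    return mu + W @ (y - mu)

clean = patches[0]
noisy = clean + rng.normal(0, 0.05, 64)
restored = map_denoise_patch(noisy, gmm, 0.05 ** 2)
print(np.linalg.norm(noisy - clean), np.linalg.norm(restored - clean))
```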

1,552 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A general, flexible mixture model is described for capturing contextual co-occurrence relations between parts, augmenting standard spring models that encode spatial relations; such relations are shown to capture notions of local rigidity.
Abstract: We describe a method for human pose estimation in static images based on a novel representation of part models. Notably, we do not use articulated limb parts, but rather capture orientation with a mixture of templates for each part. We describe a general, flexible mixture model for capturing contextual co-occurrence relations between parts, augmenting standard spring models that encode spatial relations. We show that such relations can capture notions of local rigidity. When co-occurrence and spatial relations are tree-structured, our model can be efficiently optimized with dynamic programming. We present experimental results on standard benchmarks for pose estimation that indicate our approach is the state-of-the-art system for pose estimation, outperforming past work by 50% while being orders of magnitude faster.

1,194 citations


Journal ArticleDOI
TL;DR: poLCA is a software package for the estimation of latent class and latent class regression models for polytomous outcome variables, implemented in the R statistical computing environment using expectation-maximization and Newton-Raphson algorithms to find maximum likelihood estimates of the model parameters.
Abstract: poLCA is a software package for the estimation of latent class and latent class regression models for polytomous outcome variables, implemented in the R statistical computing environment. Both models can be called using a single simple command line. The basic latent class model is a finite mixture model in which the component distributions are assumed to be multi-way cross-classification tables with all variables mutually independent. The latent class regression model further enables the researcher to estimate the effects of covariates on predicting latent class membership. poLCA uses expectation-maximization and Newton-Raphson algorithms to find maximum likelihood estimates of the model parameters.
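poLCA itself is an R package; as a language-neutral sketch of the model it estimates, the following Python code runs EM for a basic latent class model with binary indicators that are mutually independent within each class (synthetic data, no covariates, so the Newton-Raphson regression step is not needed).

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: 2 latent classes, 4 binary indicators, independent within class.
true_p = np.array([[0.9, 0.8, 0.2, 0.1],
                   [0.2, 0.1, 0.9, 0.8]])
z = rng.integers(0, 2, 1000)
X = (rng.random((1000, 4)) < true_p[z]).astype(float)

K, (n, d) = 2, X.shape
pi = np.full(K, 1 / K)                     # class prevalences
p = rng.uniform(0.25, 0.75, size=(K, d))   # item-response probabilities

for _ in range(200):
    # E-step: posterior class memberships under local independence.
    logj = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T + np.log(pi)
    logj -= logj.max(axis=1, keepdims=True)
    resp = np.exp(logj)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: weighted maximum likelihood updates.
    pi = resp.mean(axis=0)
    p = np.clip((resp.T @ X) / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)

print(np.round(pi, 2), np.round(p, 2), sep='\n')
```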

991 citations


Journal ArticleDOI
TL;DR: This paper presents a unified framework for the rigid and nonrigid point set registration problem in the presence of significant amounts of noise and outliers, and shows that the popular iterative closest point (ICP) method and several existing point set registration methods in the field are closely related and can be reinterpreted meaningfully in this general framework.
Abstract: In this paper, we present a unified framework for the rigid and nonrigid point set registration problem in the presence of significant amounts of noise and outliers. The key idea of this registration framework is to represent the input point sets using Gaussian mixture models. Then, the problem of point set registration is reformulated as the problem of aligning two Gaussian mixtures such that a statistical discrepancy measure between the two corresponding mixtures is minimized. We show that the popular iterative closest point (ICP) method and several existing point set registration methods in the field are closely related and can be reinterpreted meaningfully in our general framework. Our instantiation of this general framework is based on the L2 distance between two Gaussian mixtures, which has a closed-form expression and in turn leads to a computationally efficient registration algorithm. The resulting registration algorithm exhibits inherent statistical robustness, has an intuitive interpretation, and is simple to implement. We also provide theoretical and experimental comparisons with other robust methods for point set registration.
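The closed form the method builds on follows from the Gaussian product identity ∫ N(x; m1, S1) N(x; m2, S2) dx = N(m1; m2, S1 + S2). The sketch below evaluates the squared L2 distance between two small mixtures; the registration step itself, minimizing this quantity over a rigid or nonrigid transformation of one point set, is omitted.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gauss_overlap(m1, S1, m2, S2):
    # Closed form: integral of N(x; m1, S1) * N(x; m2, S2) dx.
    return multivariate_normal.pdf(m1, mean=m2, cov=S1 + S2)

def l2_distance_sq(w, mu, cov, v, nu, gam):
    """Squared L2 distance between two Gaussian mixtures, in closed form."""
    ff = sum(w[i] * w[j] * gauss_overlap(mu[i], cov[i], mu[j], cov[j])
             for i in range(len(w)) for j in range(len(w)))
    gg = sum(v[i] * v[j] * gauss_overlap(nu[i], gam[i], nu[j], gam[j])
             for i in range(len(v)) for j in range(len(v)))
    fg = sum(w[i] * v[j] * gauss_overlap(mu[i], cov[i], nu[j], gam[j])
             for i in range(len(w)) for j in range(len(v)))
    return ff - 2 * fg + gg

# Two 2-D mixtures; shifting the second one increases the distance.
I = np.eye(2)
w, mu, cov = [0.5, 0.5], [np.zeros(2), np.ones(2)], [I, I]
for shift in (0.0, 0.5, 1.0):
    nu = [m + shift for m in mu]
    print(shift, l2_distance_sq(w, mu, cov, [0.5, 0.5], nu, [I, I]))
```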

909 citations


Journal ArticleDOI
TL;DR: This review paper summarizes the available methods and results of endmember variability reduction in Spectral Mixture Analysis (SMA), drawing attention to the strong complementarity between the different techniques and suggesting that an integrated approach is necessary to effectively address endmember heterogeneity issues in SMA.

571 citations


Journal ArticleDOI
TL;DR: A more efficient version of the slice sampler for Dirichlet process mixture models described by Walker allows for the fitting of infinite mixture models with a wide range of prior specifications and considers priors defined through infinite sequences of independent positive random variables.
Abstract: We propose a more efficient version of the slice sampler for Dirichlet process mixture models described by Walker (Commun. Stat., Simul. Comput. 36:45–54, 2007). This new sampler allows for the fitting of infinite mixture models with a wide range of prior specifications. To illustrate this flexibility we consider priors defined through infinite sequences of independent positive random variables. Two applications are considered: density estimation using mixture models and hazard function estimation. In each case we show how the slice efficient sampler can be applied to make inference in the models. In the mixture case, two submodels are studied in detail. The first one assumes that the positive random variables are Gamma distributed and the second assumes that they are inverse-Gaussian distributed. Both priors have two hyperparameters and we consider their effect on the prior distribution of the number of occupied clusters in a sample. Extensive computational comparisons with alternative "conditional" simulation techniques for mixture models using the standard Dirichlet process prior and our new priors are made. The properties of the new priors are illustrated on a density estimation problem.

371 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: It is shown that an efficient model of color appearance in human vision, which contains a principled selection of parameters as well as an innate spatial pooling mechanism, can be generalized to obtain a saliency model that outperforms state-of-the-art models.
Abstract: Many successful models for predicting attention in a scene involve three main steps: convolution with a set of filters, a center-surround mechanism and spatial pooling to construct a saliency map. However, integrating spatial information and justifying the choice of various parameter values remain open problems. In this paper we show that an efficient model of color appearance in human vision, which contains a principled selection of parameters as well as an innate spatial pooling mechanism, can be generalized to obtain a saliency model that outperforms state-of-the-art models. Scale integration is achieved by an inverse wavelet transform over the set of scale-weighted center-surround responses. The scale-weighting function (termed ECSF) has been optimized to better replicate psychophysical data on color appearance, and the appropriate sizes of the center-surround inhibition windows have been determined by training a Gaussian Mixture Model on eye-fixation data, thus avoiding ad-hoc parameter selection. Additionally, we conclude that the extension of a color appearance model to saliency estimation adds to the evidence for a common low-level visual front-end for different visual tasks.

362 citations


Book
19 Mar 2011
TL;DR: This introduction to the expectation–maximization (EM) algorithm provides an intuitive and mathematically rigorous understanding of EM.
Abstract: This introduction to the expectation–maximization (EM) algorithm provides an intuitive and mathematically rigorous understanding of EM. Two of the most popular applications of EM are described in detail: estimating Gaussian mixture models (GMMs), and estimating hidden Markov models (HMMs). EM solutions are also derived for learning an optimal mixture of fixed models, for estimating the parameters of a compound Dirichlet distribution, and for disentangling superimposed signals. Practical issues that arise in the use of EM are discussed, as well as variants of the algorithm that help deal with these challenges.
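As a concrete companion to the first of those applications, here is a deliberately minimal EM loop for a two-component one-dimensional GMM on synthetic data (naive initialization, fixed iteration count, no convergence check).

```python
import numpy as np

rng = np.random.default_rng(4)

# Data from a two-component 1-D Gaussian mixture.
x = np.concatenate([rng.normal(-2, 0.7, 400), rng.normal(3, 1.2, 600)])

K = 2
w = np.full(K, 1 / K)        # mixing weights
mu = rng.choice(x, K)        # means, initialized at random data points
var = np.full(K, x.var())    # variances

for _ in range(100):
    # E-step: responsibilities r[n, k] = P(component k | x_n).
    dens = (w / np.sqrt(2 * np.pi * var)
            * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var))
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted maximum likelihood re-estimates.
    Nk = r.sum(axis=0)
    w = Nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(np.round(w, 2), np.round(mu, 2), np.round(np.sqrt(var), 2))
```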

314 citations


Journal ArticleDOI
TL;DR: The distance dependent Chinese restaurant process (ddCRP) as discussed by the authors is a flexible class of distributions over partitions that allows for dependencies between the elements, and can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity.
Abstract: We develop the distance dependent Chinese restaurant process, a flexible class of distributions over partitions that allows for dependencies between the elements. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies arising from time, space, and network connectivity. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both fully observed and latent mixture settings. We study its empirical performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data and network data. We also show that the distance dependent CRP representation of the traditional CRP mixture leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
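A small sketch of the generative side of this construction, under illustrative choices (sequential data, exponential decay): each customer links to an earlier customer with probability decaying in their time separation, or to itself with probability proportional to alpha, and the clusters are the connected components of the resulting link graph.

```python
import numpy as np

rng = np.random.default_rng(5)

def ddcrp_partition(times, alpha=1.0, decay=1.0):
    """Draw one partition from a sequential distance dependent CRP."""
    n = len(times)
    links = np.arange(n)
    for i in range(n):
        w = np.exp(-(times[i] - times[:i]) / decay)  # earlier customers
        w = np.append(w, alpha)                      # self-link
        links[i] = rng.choice(i + 1, p=w / w.sum())
    # Clusters = connected components: follow links to a fixed point.
    labels = links.copy()
    for _ in range(n):
        labels = labels[labels]
    return labels

times = np.sort(rng.uniform(0, 10, 30))
print(ddcrp_partition(times))
```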

Journal ArticleDOI
TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.

Journal ArticleDOI
TL;DR: In this article, the posterior distribution of a mixture model is studied in the presence of overfitting, where the number of components in the mixture is larger than the true number of components, a situation referred to as an overfitted mixture.
Abstract: We study the asymptotic behaviour of the posterior distribution in a mixture model when the number of components in the mixture is larger than the true number of components: a situation which is commonly referred to as an overfitted mixture. We prove in particular that quite generally the posterior distribution has a stable and interesting behaviour, since it tends to empty the extra components. This stability is achieved under some restriction on the prior, which can be used as a guideline for choosing the prior. Some simulations are presented to illustrate this behaviour.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: A new spatio-temporal context distribution feature of interest points for human action recognition, and a new learning algorithm, called Multiple Kernel Learning with Augmented Features (AFMKL), to learn an adapted classifier based on multiple kernels and the pre-learned classifiers of other action classes.
Abstract: We first propose a new spatio-temporal context distribution feature of interest points for human action recognition. Each action video is expressed as a set of relative XYT coordinates between pairwise interest points in a local region. We learn a global GMM (referred to as Universal Background Model, UBM) using the relative coordinate features from all the training videos, and then represent each video as the normalized parameters of a video-specific GMM adapted from the global GMM. In order to capture the spatio-temporal relationships at different levels, multiple GMMs are utilized to describe the context distributions of interest points over multi-scale local regions. To describe the appearance information of an action video, we also propose to use GMM to characterize the distribution of local appearance features from the cuboids centered around the interest points. Accordingly, an action video can be represented by two types of distribution features: 1) multiple GMM distributions of spatio-temporal context; 2) GMM distribution of local video appearance. To effectively fuse these two types of heterogeneous and complementary distribution features, we additionally propose a new learning algorithm, called Multiple Kernel Learning with Augmented Features (AFMKL), to learn an adapted classifier based on multiple kernels and the pre-learned classifiers of other action classes. Extensive experiments on KTH, multi-view IXMAS and complex UCF sports datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.

Journal ArticleDOI
Jianbo Yu
TL;DR: The experimental results indicate potential applications of LPP-based FE and Gaussian mixture model (GMM)-based negative log likelihood probability (NLLP) as effective tools for bearing performance degradation assessment.

Journal ArticleDOI
TL;DR: An innovative EM-like algorithm, namely, the Expectation Conditional Maximization for Point Registration (ECMPR) algorithm, is introduced, which allows the use of general covariance matrices for the mixture model components and improves over the isotropic covariance case.
Abstract: This paper addresses the issue of matching rigid and articulated shapes through probabilistic point registration. The problem is recast into a missing data framework where unknown correspondences are handled via mixture models. Adopting a maximum likelihood principle, we introduce an innovative EM-like algorithm, namely, the Expectation Conditional Maximization for Point Registration (ECMPR) algorithm. The algorithm allows the use of general covariance matrices for the mixture model components and improves over the isotropic covariance case. We analyze in detail the associated consequences in terms of estimation of the registration parameters, and propose an optimal method for estimating the rotational and translational parameters based on semidefinite positive relaxation. We extend rigid registration to articulated registration. Robustness is ensured by detecting and rejecting outliers through the addition of a uniform component to the Gaussian mixture model at hand. We provide an in-depth analysis of our method and compare it both theoretically and experimentally with other robust methods for point registration.

Journal ArticleDOI
TL;DR: In this paper, a new time series model is proposed that generalizes the Markov normal mixture model by letting the mixture components themselves be normal mixtures; it is a specific case of an artificial neural network model with two hidden layers.
Abstract: Motivated by the common problem of constructing predictive distributions for daily asset returns over horizons of one to several trading days, this article introduces a new model for time series. This model is a generalization of the Markov normal mixture model in which the mixture components are themselves normal mixtures, and it is a specific case of an artificial neural network model with two hidden layers. The article uses the model to construct predictive distributions of daily S&P 500 returns 1971–2005 and one-year maturity bond returns 1987–2007. For these time series the model compares favorably with ARCH and stochastic volatility models. The article concludes by using the model to form predictive distributions of one- to ten-day returns during volatile episodes for the S&P 500 and bond return series.

Proceedings ArticleDOI
22 May 2011
TL;DR: This work proposes a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice search task.
Abstract: The context-independent deep belief network (DBN) hidden Markov model (HMM) hybrid architecture has recently achieved promising results for phone recognition. In this work, we propose a context-dependent DBN-HMM system that dramatically outperforms strong Gaussian mixture model (GMM)-HMM baselines on a challenging, large vocabulary, spontaneous speech recognition dataset from the Bing mobile voice search task. Our system achieves absolute sentence accuracy improvements of 5.8% and 9.2% over GMM-HMMs trained using the minimum phone error rate (MPE) and maximum likelihood (ML) criteria, respectively, which translate to relative error reductions of 16.0% and 23.2%.

Journal ArticleDOI
TL;DR: This paper introduces a regularized probabilistic model based on manifold structure for data clustering, called the Laplacian regularized Gaussian Mixture Model (LapGMM), in which the data manifold is modeled by a nearest neighbor graph and the graph structure is incorporated in the maximum likelihood objective function.
Abstract: Gaussian Mixture Models (GMMs) are among the most statistically mature methods for clustering. Each cluster is represented by a Gaussian distribution. The clustering process thereby turns to estimate the parameters of the Gaussian mixture, usually by the Expectation-Maximization algorithm. In this paper, we consider the case where the probability distribution that generates the data is supported on a submanifold of the ambient space. It is natural to assume that if two points are close in the intrinsic geometry of the probability distribution, then their conditional probability distributions are similar. Specifically, we introduce a regularized probabilistic model based on manifold structure for data clustering, called Laplacian regularized Gaussian Mixture Model (LapGMM). The data manifold is modeled by a nearest neighbor graph, and the graph structure is incorporated in the maximum likelihood objective function. As a result, the obtained conditional probability distribution varies smoothly along the geodesics of the data manifold. Experimental results on real data sets demonstrate the effectiveness of the proposed approach.

Journal ArticleDOI
TL;DR: An approximation to the prior/posterior distribution of the parameters in the beta distribution is introduced and an analytically tractable (closed form) Bayesian approach to the parameter estimation is proposed.
Abstract: Bayesian estimation of the parameters in beta mixture models (BMM) is analytically intractable. The numerical solutions to simulate the posterior distribution are available, but incur high computational cost. In this paper, we introduce an approximation to the prior/posterior distribution of the parameters in the beta distribution and propose an analytically tractable (closed form) Bayesian approach to the parameter estimation. The approach is based on the variational inference (VI) framework. Following the principles of the VI framework and utilizing the relative convexity bound, the extended factorized approximation method is applied to approximate the distribution of the parameters in BMM. In a fully Bayesian model where all of the parameters of the BMM are considered as variables and assigned proper distributions, our approach can asymptotically find the optimal estimate of the parameters posterior distribution. Also, the model complexity can be determined based on the data. The closed-form solution is proposed so that no iterative numerical calculation is required. Meanwhile, our approach avoids the drawback of overfitting in the conventional expectation maximization algorithm. The good performance of this approach is verified by experiments with both synthetic and real data.

Journal ArticleDOI
TL;DR: It is shown that if the variance of the Gaussian noise is small in a certain sense, then the homology can be learned with high confidence by an algorithm that has a weak (linear) dependence on the ambient dimension.
Abstract: In this paper, we take a topological view of unsupervised learning. From this point of view, clustering may be interpreted as trying to find the number of connected components of any underlying geometrically structured probability distribution in a certain sense that we will make precise. We construct a geometrically structured probability distribution that seems appropriate for modeling data in very high dimensions. A special case of our construction is the mixture of Gaussians where there is Gaussian noise concentrated around a finite set of points (the means). More generally we consider Gaussian noise concentrated around a low dimensional manifold and discuss how to recover the homology of this underlying geometric core from data that do not lie on it. We show that if the variance of the Gaussian noise is small in a certain sense, then the homology can be learned with high confidence by an algorithm that has a weak (linear) dependence on the ambient dimension. Our algorithm has a natural interpretation as a spectral learning algorithm using a combinatorial Laplacian of a suitable data-derived simplicial complex.

Journal ArticleDOI
Yingli Tian, Rogerio Feris, Haowei Liu, Arun Hampapur, Ming-Ting Sun
01 Sep 2011
TL;DR: A new framework is presented that robustly and efficiently detects abandoned and removed objects based on background subtraction (BGS) and foreground analysis, complemented by tracking to reduce false positives.
Abstract: Tracking-based approaches for abandoned object detection often become unreliable in complex surveillance videos due to occlusions, lighting changes, and other factors. We present a new framework to robustly and efficiently detect abandoned and removed objects based on background subtraction (BGS) and foreground analysis, complemented by tracking to reduce false positives. In our system, the background is modeled by a mixture of three Gaussians. In order to handle complex situations, several improvements are implemented for shadow removal, quick lighting-change adaptation, fragment reduction, and keeping a stable update rate for video streams with different frame rates. Then, the same Gaussian mixture models used for BGS are employed to detect static foreground regions without extra computation cost. Furthermore, the types of the static regions (abandoned or removed) are determined by using a method that exploits context information about the foreground masks, which significantly outperforms previous edge-based techniques. Based on the type of the static regions and user-defined parameters (e.g., object size and abandoned time), a matching method is proposed to detect abandoned and removed objects. A person-detection process is also integrated to distinguish static objects from stationary people. The robustness and efficiency of the proposed method are tested on IBM Smart Surveillance Solutions for public safety applications in big cities and evaluated on several public databases, such as the Image Library for Intelligent Detection Systems (i-LIDS) and the IEEE Performance Evaluation of Tracking and Surveillance Workshop (PETS) 2006 datasets. The tests and evaluations demonstrate that our method runs efficiently in real time while being robust to quick lighting changes and occlusions in complex environments.

Journal ArticleDOI
TL;DR: In this article, the authors proposed an online kernel density estimation (KDE) method, which maintains and updates a non-parametric model of the observed data, from which the KDE can be calculated.

Proceedings Article
12 Dec 2011
TL;DR: It is proved that a weighted set of O(dk³/ε²) data points suffices for computing a (1 + ε)-approximation for the optimal model on the original n data points, which guarantees that models fitting the coreset will also provide a good fit for the original data set.
Abstract: How can we train a statistical mixture model on a massive data set? In this paper, we show how to construct coresets for mixtures of Gaussians and natural generalizations. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset will also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size independent of the size of the data set. More precisely, we prove that a weighted set of O(dk³/ε²) data points suffices for computing a (1 + ε)-approximation for the optimal model on the original n data points. Moreover, such coresets can be efficiently constructed in a map-reduce style computation, as well as in a streaming setting. Our results rely on a novel reduction of statistical estimation to problems in computational geometry, as well as new complexity results about mixtures of Gaussians. We empirically evaluate our algorithms on several real data sets, including a density estimation problem in the context of earthquake detection using accelerometers in mobile phones.
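The paper's guarantee rests on a careful sensitivity analysis; the sketch below shows only the generic importance-sampling idea behind such coresets, with a crude surrogate bound (a uniform term plus normalized distance to the mean) standing in for the real sensitivities. Weighting each sampled point by the inverse of its sampling probability keeps the weighted cost an unbiased estimate of the full-data cost.

```python
import numpy as np

rng = np.random.default_rng(6)

def coreset(X, m):
    """Importance-sampling coreset sketch: sample with probability
    proportional to a crude sensitivity bound, weight by 1/(m * prob)."""
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    s = 0.5 / len(X) + 0.5 * d2 / d2.sum()   # sampling distribution
    idx = rng.choice(len(X), size=m, p=s)
    return X[idx], 1.0 / (m * s[idx])        # points and weights

# Compare a quadratic cost on the full data vs. the weighted coreset.
X = np.vstack([rng.normal(c, 1.0, (5000, 2)) for c in (-4.0, 0.0, 4.0)])
C, wts = coreset(X, 300)
center = np.array([1.0, 1.0])
full_cost = ((X - center) ** 2).sum()
coreset_cost = (wts * ((C - center) ** 2).sum(axis=1)).sum()
print(full_cost, coreset_cost)
```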

Proceedings ArticleDOI
10 Apr 2011
TL;DR: Based on device-dependent, channel-invariant radio-metrics, a non-parametric Bayesian method is proposed to detect the number of devices as well as classify multiple devices in an unsupervised, passive manner.
Abstract: Each wireless device has a unique fingerprint, which can be utilized for device identification and intrusion detection. Most existing literature employs supervised learning techniques and assumes the number of devices is known. In this paper, based on device-dependent, channel-invariant radio-metrics, we propose a non-parametric Bayesian method to detect the number of devices as well as classify multiple devices in an unsupervised, passive manner. Specifically, the infinite Gaussian mixture model is used and a modified collapsed Gibbs sampling method is proposed. Sybil attacks and Masquerade attacks are investigated. We demonstrate the effectiveness of the proposed method on both simulated data and experimental measurements obtained with USRP2 and Zigbee devices.
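The paper's inference uses a modified collapsed Gibbs sampler for the infinite Gaussian mixture; as a readily runnable stand-in, the sketch below uses scikit-learn's variational Dirichlet-process mixture to recover the number of simulated "devices" without fixing it in advance. The feature dimension and cluster spreads are invented for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(7)

# Synthetic radio-metric feature vectors from an unknown number of devices.
n_devices = 4
centers = rng.normal(0, 5, size=(n_devices, 3))
X = np.vstack([rng.normal(c, 0.3, (200, 3)) for c in centers])

# Dirichlet-process GMM with a generous truncation level: components that
# are not needed receive negligible weight.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type='dirichlet_process',
    max_iter=500, random_state=0).fit(X)

print('estimated number of devices:', len(np.unique(dpgmm.predict(X))))
```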

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This is the first work showing that RBMs can be trained with almost no hyperparameter tuning to provide classification performance similar to or significantly better than mixture models (e.g., Gaussian mixture models).
Abstract: Informative image representations are important in achieving state-of-the-art performance in object recognition tasks. Among feature learning algorithms that are used to develop image representations, restricted Boltzmann machines (RBMs) have good expressive power and build effective representations. However, the difficulty of training RBMs has been a barrier to their wide use. To address this difficulty, we show the connections between mixture models and RBMs and present an efficient training method for RBMs that utilize these connections. To the best of our knowledge, this is the first work showing that RBMs can be trained with almost no hyperparameter tuning to provide classification performance similar to or significantly better than mixture models (e.g., Gaussian mixture models). Along with this efficient training, we evaluate the importance of convolutional training that can capture a larger spatial context with less redundancy, as compared to non-convolutional training. Overall, our method achieves state-of-the-art performance on both Caltech 101 / 256 datasets using a single type of feature.
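For reference, here is plain CD-1 training of a small binary RBM on toy bar patterns; this is the standard contrastive divergence baseline, not the paper's mixture-model-based training procedure or its convolutional variant.

```python
import numpy as np

rng = np.random.default_rng(8)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

n_vis, n_hid, lr = 36, 16, 0.05
W = 0.01 * rng.normal(size=(n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)

def batch(size=64):
    # Toy data: noisy vertical-bar patterns on a 6x6 grid.
    imgs = np.zeros((size, 6, 6))
    imgs[np.arange(size), :, rng.integers(0, 6, size)] = 1.0
    return (imgs.reshape(size, -1) + 0.1 * rng.random((size, 36)) > 0.5) * 1.0

for _ in range(500):
    v0 = batch()
    ph0 = sigmoid(v0 @ W + c)                    # positive phase
    h0 = (rng.random(ph0.shape) < ph0) * 1.0
    pv1 = sigmoid(h0 @ W.T + b)                  # one Gibbs step down...
    ph1 = sigmoid(pv1 @ W + c)                   # ...and back up
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)

print('mean reconstruction error:', np.abs(v0 - pv1).mean())
```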

Proceedings ArticleDOI
06 Dec 2011
TL;DR: This work proposes a 'learning-based' approach, WiGEM, in which the received signal strength is modeled as a Gaussian Mixture Model (GMM) and Expectation Maximization (EM) is used to learn the maximum likelihood estimates of the model parameters.
Abstract: We consider the problem of localizing a wireless client in an indoor environment based on the signal strength of its transmitted packets as received on stationary sniffers or access points. Several state-of-the-art indoor localization techniques have the drawback that they rely extensively on a labor-intensive 'training' phase that does not scale well. Use of unmodeled hardware with heterogeneous power levels further reduces the accuracy of these techniques. We propose a 'learning-based' approach, WiGEM, where the received signal strength is modeled as a Gaussian Mixture Model (GMM). Expectation Maximization (EM) is used to learn the maximum likelihood estimates of the model parameters. This approach enables us to localize a transmitting device based on the maximum a posteriori estimate. The key insight is to use the physics of wireless propagation, and exploit the signal strength constraints that exist for different transmit power levels. The learning approach not only avoids the labor-intensive training, but also makes the location estimates considerably robust in the face of heterogeneity and various time-varying phenomena. We present evaluations on two different indoor testbeds with multiple WiFi devices. We demonstrate that WiGEM's accuracy is on par with or better than state-of-the-art techniques but without requiring any training.

Journal ArticleDOI
TL;DR: A new flexible extreme value mixture model is proposed, combining a non-parametric kernel density estimator for the bulk of the distribution with an appropriate tail model, to overcome the lack of consistency of likelihood-based kernel bandwidth estimators when faced with heavy-tailed distributions.

Journal ArticleDOI
TL;DR: The extension of the mixtures of multivariate t-factor analyzers model is described to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices to create a family of six mixture models, including parsimonious models.
Abstract: Model-based clustering typically involves the development of a family of mixture models and the imposition of these models upon data. The best member of the family is then chosen using some criterion and the associated parameter estimates lead to predicted group memberships, or clusterings. This paper describes the extension of the mixtures of multivariate t-factor analyzers model to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices. The result is a family of six mixture models, including parsimonious models. Parameter estimates for this family of models are derived using an alternating expectation-conditional maximization algorithm and convergence is determined based on Aitken's acceleration. Model selection is carried out using the Bayesian information criterion (BIC) and the integrated completed likelihood (ICL). This novel family of mixture models is then applied to simulated and real data where clustering performance meets or exceeds that of established model-based clustering methods. The simulation studies include a comparison of the BIC and the ICL as model selection techniques for this novel family of models. Application to simulated data with larger dimensionality is also explored.

Journal ArticleDOI
TL;DR: The spectral characteristics of emotional signals are used to group emotions based on acoustic rather than psychological considerations, and the proposed multiple-feature hierarchical method for seven emotions improves performance over standard classifiers and fixed feature sets.