
Showing papers on "Mixture model published in 2006"


Book
Christopher M. Bishop1
17 Aug 2006
TL;DR: Probability distributions, linear models for regression, linear models for classification, neural networks, graphical models, mixture models and EM, sampling methods, continuous latent variables, and sequential data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,840 citations


Christopher M. Bishop1
01 Jan 2006
TL;DR: A textbook treatment covering probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations


Journal ArticleDOI
TL;DR: This work considers problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups, and considers a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process.
Abstract: We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes ...

3,755 citations
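In standard notation, the hierarchical construction described in the abstract can be written as the following generative process, which makes clear why atoms (and hence mixture components) are shared across groups:

```latex
G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H), \qquad
G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0) \;\;\text{for each group } j, \qquad
\theta_{ji} \mid G_j \sim G_j, \qquad
x_{ji} \mid \theta_{ji} \sim F(\theta_{ji}).
```

Because a draw G_0 from a Dirichlet process is discrete with probability one, each group-level measure G_j must place its mass on the atoms of G_0, which is exactly the sharing mechanism the abstract describes.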


Book
08 Aug 2006
TL;DR: This book should help newcomers to the field to understand how finite mixture and Markov switching models are formulated, what structures they imply on the data, what they could be used for, and how they are estimated.
Abstract: WINNER OF THE 2007 DEGROOT PRIZE! The prominence of finite mixture modelling is greater than ever. Many important statistical topics like clustering data, outlier treatment, or dealing with unobserved heterogeneity involve finite mixture models in some way or other. The area of potential applications goes beyond simple data analysis and extends to regression analysis and to non-linear time series analysis using Markov switching models. For more than a hundred years, since Karl Pearson showed in 1894 how to estimate the five parameters of a mixture of two normal distributions using the method of moments, statistical inference for finite mixture models has been a challenge to everybody who deals with them. In the past ten years, very powerful computational tools have emerged for dealing with these models, combining a Bayesian approach with recent Monte Carlo simulation techniques based on Markov chains. This book reviews these techniques and covers the most recent advances in the field, among them bridge sampling techniques and reversible jump Markov chain Monte Carlo methods. It is the first time that the Bayesian perspective of finite mixture modelling is systematically presented in book form. It is argued that the Bayesian approach provides much insight in this context and is easily implemented in practice. Although the main focus is on Bayesian inference, the author reviews several frequentist techniques, especially for selecting the number of components of a finite mixture model, and discusses some of their shortcomings compared to the Bayesian approach. The aim of this book is to impart the finite mixture and Markov switching approach to statistical modelling to a wide-ranging community. This includes not only statisticians, but also biologists, economists, engineers, financial agents, market researchers, medical researchers, or any other frequent users of statistical models. This book should help newcomers to the field to understand how finite mixture and Markov switching models are formulated, what structures they imply on the data, what they could be used for, and how they are estimated. Researchers familiar with the subject will also profit from reading this book. The presentation is rather informal without abandoning mathematical correctness. Prior knowledge of Bayesian inference and Monte Carlo simulation is useful but not required.

1,642 citations
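For reference, the Pearson problem mentioned above concerns the density

```latex
p(y) = \eta\, \mathcal{N}(y \mid \mu_1, \sigma_1^2) + (1 - \eta)\, \mathcal{N}(y \mid \mu_2, \sigma_2^2),
```

whose five parameters are (\eta, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2); the general finite mixtures treated in the book replace the two normal components with K components of essentially arbitrary parametric form.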


Journal ArticleDOI
TL;DR: This work presents recursive equations that are used to continually update the parameters of a Gaussian mixture model and to simultaneously select the appropriate number of components for each pixel; a simple non-parametric adaptive density estimation method is also presented.

1,483 citations
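To make the recursive flavor of such updates concrete, here is a minimal single-pixel sketch in the spirit of the abstract. It is illustrative only: the learning rate, the match threshold, and the handling of unmatched samples are assumptions, and the paper's mechanism for selecting the number of components (pruning components whose weights fall below a prior-driven threshold) is omitted.

```python
import numpy as np

def update_pixel_gmm(x, weights, means, variances, lr=0.005, match_sd=3.0):
    """One recursive update of a per-pixel 1-D Gaussian mixture.

    x         : new grayscale observation for this pixel
    weights   : (K,) mixture weights, updated in place
    means     : (K,) component means, updated in place
    variances : (K,) component variances, updated in place
    lr        : learning rate (exponential forgetting factor)
    match_sd  : match threshold in standard deviations (illustrative)
    """
    d = np.abs(x - means)
    matched = d / np.sqrt(variances) < match_sd
    # Ownership vector: the closest matching component claims the sample.
    o = np.zeros_like(weights)
    if matched.any():
        o[np.argmin(np.where(matched, d, np.inf))] = 1.0
    # Exponentially forgetting weight update, then renormalization.
    weights += lr * (o - weights)
    weights /= weights.sum()
    # Mean/variance updates for the owning component only.
    for k in np.flatnonzero(o):
        rho = lr / max(weights[k], 1e-8)
        delta = x - means[k]
        means[k] += rho * delta
        variances[k] += rho * (delta ** 2 - variances[k])
    return weights, means, variances
```

A background/foreground decision would then compare the new sample against the most heavily weighted, low-variance components, in the usual background-subtraction fashion.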


Journal ArticleDOI
TL;DR: A variational inference algorithm for DP mixtures is presented, along with experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and an application to a large-scale image analysis problem.
Abstract: Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Markov chain Monte Carlo (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.

1,471 citations
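The variational family used in this line of work is built on the stick-breaking representation of the DP (standard notation):

```latex
V_k \mid \alpha \sim \mathrm{Beta}(1, \alpha), \qquad
\pi_k = V_k \prod_{j=1}^{k-1} (1 - V_j), \qquad
G = \sum_{k=1}^{\infty} \pi_k\, \delta_{\eta_k}, \quad \eta_k \sim G_0.
```

Truncating the sum at a finite level and placing factorized variational distributions on the V_k, the atoms, and the cluster assignments turns posterior inference into a coordinate-ascent optimization.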


Journal ArticleDOI
TL;DR: This work examines the idea of using the GMM supervector in a support vector machine (SVM) classifier and proposes two new SVM kernels based on distance metrics between GMM models that produce excellent classification accuracy in a NIST speaker recognition evaluation task.
Abstract: Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. Recent methods in compensation for speaker and channel variability have proposed the idea of stacking the means of the GMM model to form a GMM mean supervector. We examine the idea of using the GMM supervector in a support vector machine (SVM) classifier. We propose two new SVM kernels based on distance metrics between GMM models. We show that these SVM kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.

1,081 citations
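A minimal sketch of the supervector construction, assuming diagonal UBM covariances; the scaling follows the commonly cited KL-divergence-motivated linear kernel, so this is one plausible instantiation of the kernels discussed, not a transcription of the paper's code:

```python
import numpy as np

def gmm_supervector(weights, means, diag_covs):
    """Stack MAP-adapted GMM means into a single normalized supervector.

    weights   : (K,)   UBM mixture weights
    means     : (K, D) speaker-adapted component means
    diag_covs : (K, D) shared UBM diagonal covariances
    """
    scaled = np.sqrt(weights)[:, None] * means / np.sqrt(diag_covs)
    return scaled.ravel()  # shape (K * D,)

# With this normalization, a plain inner product between two supervectors
# serves as the SVM kernel: K(a, b) = sv_a @ sv_b.
```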


Journal ArticleDOI
TL;DR: This work introduces a statistical formulation of this problem in terms of a simple mixture model, presents an instantiation of the framework for maximum entropy classifiers and their linear chain counterparts, and achieves improved performance on three real-world tasks across four data sets from the natural language processing domain.
Abstract: The most basic assumption used in statistical learning theory is that training data and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "in-domain" test data is drawn from a distribution that is related, but not identical, to the "out-of-domain" distribution of the training data. We consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. We introduce a statistical formulation of this problem in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. We present efficient inference algorithms for this special case based on the technique of conditional expectation maximization. Our experimental results show that our approach leads to improved performance on three real world tasks on four different data sets from the natural language processing domain.

894 citations
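Schematically, the mixture formulation treats each labeled source as a blend of a shared "general domain" distribution and a truly domain-specific one (the notation here is illustrative, not the paper's):

```latex
p_{\mathrm{in}}(x, y) = \lambda_{\mathrm{in}}\, q_{\mathrm{gen}}(x, y) + (1 - \lambda_{\mathrm{in}})\, q_{\mathrm{in}}(x, y), \qquad
p_{\mathrm{out}}(x, y) = \lambda_{\mathrm{out}}\, q_{\mathrm{gen}}(x, y) + (1 - \lambda_{\mathrm{out}})\, q_{\mathrm{out}}(x, y),
```

so the shared component q_gen is what lets plentiful out-of-domain data inform the scarce in-domain task; conditional EM handles the fact that the mixture assignments are unobserved.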


Journal ArticleDOI
TL;DR: Comprehensive experiments on urban vehicular traffic flow data of Beijing and comparisons with several other methods show that the Bayesian network is a very promising and effective approach for traffic flow modeling and forecasting, both for complete data and incomplete data.
Abstract: A new approach based on Bayesian networks for traffic flow forecasting is proposed. In this paper, traffic flows among adjacent road links in a transportation network are modeled as a Bayesian network. The joint probability distribution between the cause nodes (data utilized for forecasting) and the effect node (data to be forecasted) in a constructed Bayesian network is described as a Gaussian mixture model (GMM) whose parameters are estimated via the competitive expectation maximization (CEM) algorithm. Finally, traffic flow forecasting is performed under the criterion of minimum mean square error (MMSE). The approach departs from many existing traffic flow forecasting models in that it explicitly includes information from adjacent road links to analyze the trends of the current link statistically. Furthermore, it also encompasses the issue of traffic flow forecasting when incomplete data exist. Comprehensive experiments on urban vehicular traffic flow data of Beijing and comparisons with several other methods show that the Bayesian network is a very promising and effective approach for traffic flow modeling and forecasting, both for complete and incomplete data.

652 citations
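Under a joint GMM, the MMSE forecast is the conditional mean of the effect node given the cause nodes, which is available in closed form per component. A sketch with hypothetical variable names, using a scalar effect node for brevity:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mmse_forecast(x, pis, mus_x, mus_y, S_xx, S_yx):
    """MMSE forecast E[y | x] under a joint GMM over (x, y).

    x     : (Dx,)       observed cause-node data
    pis   : (K,)        mixture weights
    mus_x : (K, Dx)     x-part of component means
    mus_y : (K,)        y-part of component means (scalar effect node)
    S_xx  : (K, Dx, Dx) x-x covariance blocks
    S_yx  : (K, Dx)     y-x covariance blocks
    """
    # Responsibilities of each component for the observed x.
    lik = np.array([multivariate_normal.pdf(x, mus_x[k], S_xx[k])
                    for k in range(len(pis))])
    w = pis * lik
    w /= w.sum()
    # Per-component conditional means, then mixture-weighted average.
    cond = np.array([mus_y[k] + S_yx[k] @ np.linalg.solve(S_xx[k], x - mus_x[k])
                     for k in range(len(pis))])
    return float(w @ cond)
```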


Journal ArticleDOI
TL;DR: The present article proposes to employ another method, based on an analogy with statistical physics, called thermodynamic integration, which is applied to the comparison of several alternative models of amino-acid replacement, indicating that modeling pattern heterogeneity across sites tends to yield better models than standard empirical matrices.
Abstract: In the Bayesian paradigm, a common method for comparing two models is to compute the Bayes factor, defined as the ratio of their respective marginal likelihoods. In recent phylogenetic works, the numerical evaluation of marginal likelihoods has often been performed using the harmonic mean estimation procedure. In the present article, we propose to employ another method, based on an analogy with statistical physics, called thermodynamic integration. We describe the method, propose an implementation, and show on two analytical examples that this numerical method yields reliable estimates. In contrast, the harmonic mean estimator leads to a strong overestimation of the marginal likelihood, which is all the more pronounced as the model is higher dimensional. As a result, the harmonic mean estimator systematically favors more parameter-rich models, an artefact that might explain some recent puzzling observations, based on harmonic mean estimates, suggesting that Bayes factors tend to overscore complex models. Finally, we apply our method to the comparison of several alternative models of amino-acid replacement. We confirm our previous observations, indicating that modeling pattern heterogeneity across sites tends to yield better models than standard empirical matrices.

618 citations
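The method rests on the path-sampling identity: writing p_beta(theta) proportional to p(theta) L(theta)^beta for the "power posterior" at inverse temperature beta, the log marginal likelihood is

```latex
\log m(y) \;=\; \int_0^1 \mathbb{E}_{p_\beta}\!\bigl[\log L(\theta)\bigr]\, d\beta,
```

which is estimated by running MCMC at a ladder of beta values between 0 (prior) and 1 (posterior) and applying numerical quadrature; the harmonic mean estimator, by contrast, uses samples only at beta = 1.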


Journal ArticleDOI
TL;DR: This work considers the application of SVMs to speaker and language recognition, using a sequence kernel that compares sequences of feature vectors and produces a measure of similarity; the kernel builds upon a simpler mean-squared-error classifier to produce a more accurate system.

Journal ArticleDOI
TL;DR: Results from an empirical case study and a small Monte Carlo simulation show that failure to thoroughly consider the possible presence of local optima in the estimation of a growth mixture model can sometimes have serious consequences, possibly leading to adoption of an inferior solution that differs in substantively important ways from the actual maximum likelihood solution.
Abstract: Finite mixture models are well known to have poorly behaved likelihood functions featuring singularities and multiple optima. Growth mixture models may suffer from fewer of these problems, potentially benefiting from the structure imposed on the estimated class means and covariances by the specified growth model. As demonstrated here, however, local solutions may still be problematic. Results from an empirical case study and a small Monte Carlo simulation show that failure to thoroughly consider the possible presence of local optima in the estimation of a growth mixture model can sometimes have serious consequences, possibly leading to adoption of an inferior solution that differs in substantively important ways from the actual maximum likelihood solution. Often, the defaults of current software need to be overridden to thoroughly evaluate the parameter space and obtain confidence that the maximum likelihood solution has in fact been obtained.
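The practical safeguard the authors recommend, many random starts compared on achieved log-likelihood, can be illustrated with any mixture fit; here is a small sketch using scikit-learn's GaussianMixture as a stand-in for growth-mixture software (the data and the number of starts are arbitrary):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (150, 2))])

# Fit from many random starts and inspect the spread of solutions;
# a single default start can land on a local optimum.
fits = [GaussianMixture(n_components=2, n_init=1, init_params="random",
                        random_state=s).fit(X) for s in range(50)]
logliks = sorted(f.score(X) * len(X) for f in fits)
print(f"worst={logliks[0]:.1f}  best={logliks[-1]:.1f}")
```

A non-trivial gap between the worst and best achieved log-likelihoods is exactly the warning sign discussed above: the default single start may not have found the maximum likelihood solution.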

Book ChapterDOI
07 May 2006
TL;DR: This paper proposes Probabilistic LDA, a generative probability model with which one can both extract features and combine them for recognition, and shows applications to classification, hypothesis testing, class inference, and clustering.
Abstract: Linear dimensionality reduction methods, such as LDA, are often used in object recognition for feature extraction, but do not address the problem of how to use these features for recognition. In this paper, we propose Probabilistic LDA, a generative probability model with which we can both extract the features and combine them for recognition. The latent variables of PLDA represent both the class of the object and the view of the object within a class. By making examples of the same class share the class variable, we show how to train PLDA and use it for recognition on previously unseen classes. The usual LDA features are derived as a result of training PLDA, but in addition have a probability model attached to them, which automatically gives more weight to the more discriminative features. With PLDA, we can build a model of a previously unseen class from a single example, and can combine multiple examples for a better representation of the class. We show applications to classification, hypothesis testing, class inference, and clustering, on classes not observed during training.
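In outline, and glossing over details of the paper's parameterization, the generative picture is a two-level Gaussian model in which a latent class center is shared by all examples of a class:

```latex
y_c \sim \mathcal{N}(m, \Phi_b), \qquad x_{ci} \mid y_c \sim \mathcal{N}(y_c, \Phi_w),
```

with the between-class covariance \Phi_b carrying class identity and the within-class covariance \Phi_w the view. Recognition on unseen classes then reduces to probabilistic inference about whether examples share the same latent y_c, and the directions that simultaneously diagonalize \Phi_b and \Phi_w recover the usual LDA features.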

Book ChapterDOI
01 Jan 2006
TL;DR: This chapter provides a tutorial introduction to gradient-based optical flow estimation by discussing least-squares and robust estimators, iterative coarse-to-fine refinement, different forms of parametric motion models, different conservation assumptions, probabilistic formulations, and robust mixture models.
Abstract: This chapter provides a tutorial introduction to gradient-based optical flow estimation. We discuss least-squares and robust estimators, iterative coarse-to-fine refinement, different forms of parametric motion models, different conservation assumptions, probabilistic formulations, and robust mixture models.
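As a concrete instance of the least-squares estimator the chapter begins with, the classical single-window solution assumes brightness constancy, I_x u + I_y v + I_t = 0, over a small window and solves the over-determined system in the least-squares sense (a minimal sketch; window size and boundary handling are illustrative):

```python
import numpy as np

def lucas_kanade_flow(I1, I2, y, x, half=7):
    """Least-squares optical flow (u, v) in a window around pixel (y, x)."""
    Iy, Ix = np.gradient(I1.astype(float))    # spatial gradients
    It = I2.astype(float) - I1.astype(float)  # temporal derivative
    sl = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)  # (N, 2)
    b = -It[sl].ravel()
    # Solve the normal equations A^T A [u, v]^T = A^T b in the least-squares sense.
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```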

Journal ArticleDOI
TL;DR: Two criteria for finding the most convenient division of each class into a set of subclasses are derived, and the resulting method is shown to be always the best or comparable to the best of the alternatives evaluated.
Abstract: Over the years, many discriminant analysis (DA) algorithms have been proposed for the study of high-dimensional data in a large variety of problems. Each of these algorithms is tuned to a specific type of data distribution (that which best models the problem at hand). Unfortunately, in most problems the form of each class pdf is a priori unknown, and the selection of the DA algorithm that best fits our data is done by trial-and-error. Ideally, one would like to have a single formulation which can be used for most distribution types. This can be achieved by approximating the underlying distribution of each class with a mixture of Gaussians. In this approach, the major problem to be addressed is that of determining the optimal number of Gaussians per class, i.e., the number of subclasses. In this paper, two criteria able to find the most convenient division of each class into a set of subclasses are derived. Extensive experimental results are shown using five databases. Comparisons are given against linear discriminant analysis (LDA), direct LDA (DLDA), heteroscedastic LDA (HLDA), nonparametric DA (NDA), and kernel-based LDA (K-LDA). We show that our method is always the best or comparable to the best.

Journal ArticleDOI
TL;DR: The aims of this paper are to define Gaussian mixture models (GMMs) of colored texture on several feature spaces and to compare the performance of these models in various classification tasks, both with each other and with other models popular in the literature.

ReportDOI
01 Sep 2006
TL;DR: A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior.
Abstract: MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering and classification results. A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior. A web page with related links including license information can be found at http://www.stat.washington.edu/mclust.
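MCLUST itself is an R package; as a rough Python analogue of its core strategy, the sketch below selects among covariance structures and component counts by BIC (scikit-learn's covariance_type plays a role loosely analogous to MCLUST's model names, though the model family is much smaller):

```python
from sklearn.mixture import GaussianMixture

def best_gmm_by_bic(X, max_k=9, cov_types=("spherical", "diag", "tied", "full")):
    """Fit GMMs over a grid of covariance structures and K; return the lowest-BIC fit."""
    best, best_bic = None, float("inf")
    for cov in cov_types:
        for k in range(1, max_k + 1):
            gm = GaussianMixture(n_components=k, covariance_type=cov,
                                 random_state=0).fit(X)
            bic = gm.bic(X)  # sklearn's BIC is lower-is-better; mclust uses the opposite sign convention
            if bic < best_bic:
                best, best_bic = gm, bic
    return best, best_bic
```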


Journal ArticleDOI
TL;DR: An automated algorithm for tissue segmentation of noisy, low-contrast magnetic resonance (MR) images of the brain is presented and the applicability of the framework can be extended to diseased brains and neonatal brains.
Abstract: An automated algorithm for tissue segmentation of noisy, low-contrast magnetic resonance (MR) images of the brain is presented. A mixture model composed of a large number of Gaussians is used to represent the brain image. Each tissue is represented by a large number of Gaussian components to capture the complex tissue spatial layout. The intensity of a tissue is considered a global feature and is incorporated into the model through tying of all the related Gaussian parameters. The expectation-maximization (EM) algorithm is utilized to learn the parameter-tied, constrained Gaussian mixture model. An elaborate initialization scheme is suggested to link the set of Gaussians per tissue type, such that each Gaussian in the set has similar intensity characteristics with minimal overlapping spatial supports. Segmentation of the brain image is achieved by the affiliation of each voxel to the component of the model that maximized the a posteriori probability. The presented algorithm is used to segment three-dimensional, T1-weighted, simulated and real MR images of the brain into three different tissues, under varying noise conditions. Results are compared with state-of-the-art algorithms in the literature. The algorithm does not use an atlas for initialization or parameter learning. Registration processes are therefore not required and the applicability of the framework can be extended to diseased brains and neonatal brains.
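The final affiliation step corresponds to the MAP rule

```latex
\hat{k}(v) \;=\; \arg\max_{k}\; \pi_k\, \mathcal{N}\!\left(x_v \mid \mu_k, \Sigma_k\right),
```

after which voxel v inherits the tissue label of the Gaussian set to which component \hat{k}(v) belongs.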

Journal ArticleDOI
TL;DR: It is found that new hybrid mixture models are more suitable than latent class and factor (IRT) models; implications for DSM-V are discussed in terms of reporting results using both categories and dimensions.
Abstract: Aims: This paper discusses the representation of diagnostic criteria using categorical and dimensional statistical models. Conventional modeling using categorical or continuous latent variables in the form of latent class analysis and factor (IRT) analysis has limitations for the analysis of diagnostic criteria. Methods: New hybrid models are discussed which provide both categorical and dimensional representations in the same model using mixture models. Conventional and new models are applied and compared using recent data for Diagnostic and Statistical Manual of Mental Disorders version IV (DSM-IV) alcohol dependence and abuse criteria from the National Epidemiologic Survey on Alcohol and Related Conditions. Classification results from hybrid models are compared to the DSM-IV approach of using the number of diagnostic criteria fulfilled. Results: It is found that new hybrid mixture models are more suitable than latent class and factor (IRT) models. Conclusion: Implications for DSM-V are discussed in terms of reporting results using both categories and dimensions.

Journal ArticleDOI
TL;DR: This paper focuses on optimizing vector quantization (VQ) based speaker identification, which reduces the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process.
Abstract: In speaker identification, most of the computation originates from the distance or likelihood computations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models and the number of speakers. In this paper, we concentrate on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We apply the algorithms also to efficient cohort set search for score normalization in speaker verification. We obtain a speed-up factor of 16:1 in the case of VQ-based modeling with minor degradation in the identification accuracy, and 34:1 in the case of GMM-based modeling. An equal error rate of 7% can be reached in 0.84 s on average when the length of test utterance is 30.4 s.
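A compact sketch of the two speed-ups described above, pre-quantization of the test sequence and iterative speaker pruning; the codebook sizes, pruning schedule, and keep fraction are all illustrative assumptions rather than the paper's tuned settings:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def identify_speaker(test_feats, codebooks, pre_k=32, keep_frac=0.5, rounds=4):
    """VQ speaker ID with pre-quantization and iterative speaker pruning.

    test_feats : (T, D) test feature vectors
    codebooks  : list of (C, D) per-speaker VQ codebooks
    """
    # Pre-quantize the test sequence down to pre_k representative vectors.
    reps, _ = kmeans2(test_feats, pre_k, minit="++", seed=0)
    alive = list(range(len(codebooks)))
    scores = np.zeros(len(codebooks))
    for chunk in np.array_split(reps, rounds):
        for s in alive:
            # Accumulate quantization distortion against speaker s's codebook.
            d = ((chunk[:, None, :] - codebooks[s][None]) ** 2).sum(-1)
            scores[s] += d.min(axis=1).sum()
        # Prune the worst-scoring speakers after each partial pass.
        alive = sorted(alive, key=lambda s: scores[s])
        alive = alive[: max(1, int(len(alive) * keep_frac))]
    return alive[0]
```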

Proceedings ArticleDOI
06 Aug 2006
TL;DR: A more robust pseudo-feedback method based on statistical language models, which integrates the original query with feedback documents in a single probabilistic mixture model and regularizes the estimation of the language model parameters so that the information in the feedback documents can be gradually added to the original query.
Abstract: Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.
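The paper's regularized method builds on a two-component mixture in which feedback documents are generated partly by a feedback topic model and partly by a fixed collection (background) model. The sketch below shows the plain, unregularized EM for that mixture with a fixed mixing weight; the gradual regularization schedule that distinguishes the paper's method is not reproduced here:

```python
import numpy as np

def feedback_topic_em(counts, p_coll, lam=0.5, iters=50):
    """EM for the mixture p(w) = (1 - lam) p(w|theta_F) + lam p(w|C).

    counts : (V,) term counts pooled over the feedback documents
    p_coll : (V,) fixed collection (background) language model
    lam    : fixed background mixing weight (illustrative)
    """
    counts = np.asarray(counts, dtype=float)
    p_coll = np.asarray(p_coll, dtype=float)
    theta = np.full(len(p_coll), 1.0 / len(p_coll))  # uniform init
    for _ in range(iters):
        # E-step: probability each occurrence of w came from the topic model.
        z = (1 - lam) * theta / ((1 - lam) * theta + lam * p_coll)
        # M-step: re-estimate the feedback topic model from fractional counts.
        theta = counts * z
        theta /= theta.sum()
    return theta
```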

Journal ArticleDOI
TL;DR: Rather than traditional discrete emotion simulations, emotional speech is synthesized using "strong", "medium", and "weak" classifications; the results support the use of a neutral semantic content text in databases for emotional speech synthesis.
Abstract: Emotion is an important element in expressive speech synthesis. Unlike traditional discrete emotion simulations, this paper attempts to synthesize emotional speech by using "strong", "medium", and "weak" classifications. This paper tests different models, a linear modification model (LMM), a Gaussian mixture model (GMM), and a classification and regression tree (CART) model. The linear modification model makes direct modification of sentence F0 contours and syllabic durations from acoustic distributions of emotional speech, such as F0 topline, F0 baseline, durations, and intensities. Further analysis shows that emotional speech is also related to stress and linguistic information. Unlike the linear modification method, the GMM and CART models try to map the subtle prosody distributions between neutral and emotional speech. While the GMM just uses the features, the CART model integrates linguistic features into the mapping. A pitch target model which is optimized to describe Mandarin F0 contours is also introduced. For all conversion methods, a deviation of perceived expressiveness (DPE) measure is created to evaluate the expressiveness of the output speech. The results show that the LMM gives the worst results among the three methods. The GMM method is more suitable for a small training set, while the CART method gives the better emotional speech output if trained with a large context-balanced corpus. The methods discussed in this paper indicate ways to generate emotional speech in speech synthesis. The objective and subjective evaluation processes are also analyzed. These results support the use of a neutral semantic content text in databases for emotional speech synthesis.

Proceedings Article
04 Dec 2006
TL;DR: This work proposes a learning algorithm based on the goal of margin maximization in continuous density hidden Markov models for automatic speech recognition (ASR) using Gaussian mixture models, and obtains competitive results for phonetic recognition on the TIMIT speech corpus.
Abstract: We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.

Journal ArticleDOI
TL;DR: A novel time-domain feature extraction method is proposed for the GMM system to allow better detection of short-duration movements and to make the system more adaptable to a specific person or device.
Abstract: Accelerometry shows promise in providing an inexpensive but effective means of long-term ambulatory monitoring of elderly patients. The accurate classification of everyday movements should allow such a monitoring system to exhibit greater 'intelligence', improving its ability to detect and predict falls by forming a more specific picture of the activities of a person and thereby allowing more accurate tracking of the health parameters associated with those activities. With this in mind, this study aims to develop more robust and effective methods for the classification of postures and motions from data obtained using a single, waist-mounted, triaxial accelerometer; in particular, aiming to improve the flexibility and generality of the monitoring system, making it better able to detect and identify short-duration movements and more adaptable to a specific person or device. Two movement classification methods were investigated: a rule-based Heuristic system and a Gaussian mixture model (GMM)-based system. A novel time-domain feature extraction method is proposed for the GMM system to allow better detection of short-duration movements. A method for adapting the GMMs to compensate for the problem of limited user-specific training data is also proposed and investigated. Classification performance was considered in relation to data gathered in an unsupervised, directed routine conducted in a three-month field trial involving six elderly subjects. The GMM system was found to achieve a mean accuracy of 91.3%, distinguishing between three postures (sitting, standing and lying) and five movements (sit-to-stand, stand-to-sit, lie-to-stand, stand-to-lie and walking), compared to 71.1% achieved by the Heuristic system. The adaptation method was found to offer a mean accuracy of 92.2%; a relative improvement of 20.2% over tests without subject-specific data and 4.5% over tests using only a limited amount of subject-specific data. While limited to a restricted subset of possible motions and postures, these results provide a significant step in the search for a more robust and accurate ambulatory classification system.

Journal ArticleDOI
TL;DR: This paper addresses the problem of audio source separation with a single sensor, using a statistical model of the sources based on a learning step from samples of each source separately, during which Gaussian scaled mixture models (GSMM) are trained.
Abstract: In this paper, we address the problem of audio source separation with one single sensor, using a statistical model of the sources. The approach is based on a learning step from samples of each source separately, during which we train Gaussian scaled mixture models (GSMM). During the separation step, we derive maximum a posteriori (MAP) and/or posterior mean (PM) estimates of the sources, given the observed audio mixture (Bayesian framework). From the experimental point of view, we test and evaluate the method on real audio examples.

Journal ArticleDOI
TL;DR: A Bayesian method for mixture model training is presented that treats the feature selection and model selection problems jointly, optimizing simultaneously over the number of components, the saliency of the features, and the parameters of the mixture model.
Abstract: We present a Bayesian method for mixture model training that simultaneously treats the feature selection and the model selection problem. The method is based on the integration of a mixture model formulation that takes into account the saliency of the features and a Bayesian approach to mixture learning that can be used to estimate the number of mixture components. The proposed learning algorithm follows the variational framework and can simultaneously optimize over the number of components, the saliency of the features, and the parameters of the mixture model. Experimental results using high-dimensional artificial and real data illustrate the effectiveness of the method.

Journal ArticleDOI
TL;DR: The results show the improved sound quality obtained with the Student t prior and the better robustness of the Markov chain Monte Carlo approach to mixing matrices close to singularity.
Abstract: We present a Bayesian approach for blind separation of linear instantaneous mixtures of sources having a sparse representation in a given basis. The distributions of the coefficients of the sources in the basis are modeled by a Student t distribution, which can be expressed as a scale mixture of Gaussians, and a Gibbs sampler is derived to estimate the sources, the mixing matrix, the input noise variance and also the hyperparameters of the Student t distributions. The method allows for separation of underdetermined (more sources than sensors) noisy mixtures. Results are presented with audio signals using a modified discrete cosine transform basis and compared with a finite mixture of Gaussians prior approach. These results show the improved sound quality obtained with the Student t prior and the better robustness of the Markov chain Monte Carlo approach to mixing matrices close to singularity.
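The modeling device that makes the Gibbs sampler convenient is the scale-mixture representation of the Student t distribution:

```latex
\mathrm{St}(x \mid \nu) \;=\; \int_0^{\infty} \mathcal{N}\!\left(x \mid 0, \lambda^{-1}\right)\, \mathrm{Gamma}\!\left(\lambda \mid \tfrac{\nu}{2}, \tfrac{\nu}{2}\right) d\lambda,
```

so conditioning on the latent precisions \lambda restores a jointly Gaussian model, and every full conditional needed by the sampler has a standard form.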

Journal ArticleDOI
TL;DR: The Mixture Modeling (MIXMOD) program fits mixture models to a given data set for the purposes of density estimation, clustering or discriminant analysis, and fourteen different Gaussian models can be distinguished according to different assumptions regarding the component variance matrix eigenvalue decomposition.
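The fourteen Gaussian models referred to arise from the spectral parameterization of each component covariance matrix,

```latex
\Sigma_k \;=\; \lambda_k\, D_k\, A_k\, D_k^{\top},
```

where the scalar \lambda_k controls volume, the orthogonal matrix D_k orientation, and the normalized diagonal matrix A_k shape; constraining each factor to be equal across components, free, or fixed to the identity generates the model family.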

Proceedings ArticleDOI
14 May 2006
TL;DR: A framework for large margin classification by Gaussian mixture models (GMMs), which have many parallels to support vector machines (SVMs) but use ellipsoids to model classes instead of half-spaces is developed.
Abstract: We develop a framework for large margin classification by Gaussian mixture models (GMMs). Large margin GMMs have many parallels to support vector machines (SVMs) but use ellipsoids to model classes instead of half-spaces. Model parameters are trained discriminatively to maximize the margin of correct classification, as measured in terms of Mahalanobis distances. The required optimization is convex over the model's parameter space of positive semidefinite matrices and can be performed efficiently. Large margin GMMs are naturally suited to large problems in multiway classification; we apply them to phonetic classification and recognition on the TIMIT database. On both tasks, we obtain significant improvement over baseline systems trained by maximum likelihood estimation. For the problem of phonetic classification, our results are competitive with other state-of-the-art classifiers, such as hidden conditional random fields.
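In outline (single Gaussian per class; the mixture case replaces the distance with a softmin over components), each class c is scored by a Mahalanobis-type distance with a positive semidefinite parameter, and training enforces a unit margin:

```latex
d_c(x) = (x - \mu_c)^{\top} \Psi_c\, (x - \mu_c) + \theta_c, \qquad
d_c(x_n) \;\ge\; 1 + d_{y_n}(x_n) \quad \text{for all } c \neq y_n,
```

Because the constraints are linear in the matrices \Psi_c (after absorbing \mu_c and \theta_c into an augmented PSD parameter), the optimization is convex, which is the property the abstract highlights.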