
Showing papers on "Mixture model" published in 2010


Book
23 Apr 2010
TL;DR: This book provides an applied treatment of missing data analysis, covering traditional methods, maximum likelihood estimation and the EM algorithm, Bayesian estimation, multiple imputation, and models for missing-not-at-random data.
Abstract: Part 1. An Introduction to Missing Data. 1.1 Introduction. 1.2 Chapter Overview. 1.3 Missing Data Patterns. 1.4 A Conceptual Overview of Missing Data Theory. 1.5 A More Formal Description of Missing Data Theory. 1.6 Why Is the Missing Data Mechanism Important? 1.7 How Plausible Is the Missing at Random Mechanism? 1.8 An Inclusive Analysis Strategy. 1.9 Testing the Missing Completely at Random Mechanism. 1.10 Planned Missing Data Designs. 1.11 The Three-Form Design. 1.12 Planned Missing Data for Longitudinal Designs. 1.13 Conducting Power Analyses for Planned Missing Data Designs. 1.14 Data Analysis Example. 1.15 Summary. 1.16 Recommended Readings.
Part 2. Traditional Methods for Dealing with Missing Data. 2.1 Chapter Overview. 2.2 An Overview of Deletion Methods. 2.3 Listwise Deletion. 2.4 Pairwise Deletion. 2.5 An Overview of Single Imputation Techniques. 2.6 Arithmetic Mean Imputation. 2.7 Regression Imputation. 2.8 Stochastic Regression Imputation. 2.9 Hot-Deck Imputation. 2.10 Similar Response Pattern Imputation. 2.11 Averaging the Available Items. 2.12 Last Observation Carried Forward. 2.13 An Illustrative Simulation Study. 2.14 Summary. 2.15 Recommended Readings.
Part 3. An Introduction to Maximum Likelihood Estimation. 3.1 Chapter Overview. 3.2 The Univariate Normal Distribution. 3.3 The Sample Likelihood. 3.4 The Log-Likelihood. 3.5 Estimating Unknown Parameters. 3.6 The Role of First Derivatives. 3.7 Estimating Standard Errors. 3.8 Maximum Likelihood Estimation with Multivariate Normal Data. 3.9 A Bivariate Analysis Example. 3.10 Iterative Optimization Algorithms. 3.11 Significance Testing Using the Wald Statistic. 3.12 The Likelihood Ratio Test Statistic. 3.13 Should I Use the Wald Test or the Likelihood Ratio Statistic? 3.14 Data Analysis Example 1. 3.15 Data Analysis Example 2. 3.16 Summary. 3.17 Recommended Readings.
Part 4. Maximum Likelihood Missing Data Handling. 4.1 Chapter Overview. 4.2 The Missing Data Log-Likelihood. 4.3 How Do the Incomplete Data Records Improve Estimation? 4.4 An Illustrative Computer Simulation Study. 4.5 Estimating Standard Errors with Missing Data. 4.6 Observed Versus Expected Information. 4.7 A Bivariate Analysis Example. 4.8 An Illustrative Computer Simulation Study. 4.9 An Overview of the EM Algorithm. 4.10 A Detailed Description of the EM Algorithm. 4.11 A Bivariate Analysis Example. 4.12 Extending EM to Multivariate Data. 4.13 Maximum Likelihood Software Options. 4.14 Data Analysis Example 1. 4.15 Data Analysis Example 2. 4.16 Data Analysis Example 3. 4.17 Data Analysis Example 4. 4.18 Data Analysis Example 5. 4.19 Summary. 4.20 Recommended Readings.
Part 5. Improving the Accuracy of Maximum Likelihood Analyses. 5.1 Chapter Overview. 5.2 The Rationale for an Inclusive Analysis Strategy. 5.3 An Illustrative Computer Simulation Study. 5.4 Identifying a Set of Auxiliary Variables. 5.5 Incorporating Auxiliary Variables Into a Maximum Likelihood Analysis. 5.6 The Saturated Correlates Model. 5.7 The Impact of Non-Normal Data. 5.8 Robust Standard Errors. 5.9 Bootstrap Standard Errors. 5.10 The Rescaled Likelihood Ratio Test. 5.11 Bootstrapping the Likelihood Ratio Statistic. 5.12 Data Analysis Example 1. 5.13 Data Analysis Example 2. 5.14 Data Analysis Example 3. 5.15 Summary. 5.16 Recommended Readings.
Part 6. An Introduction to Bayesian Estimation. 6.1 Chapter Overview. 6.2 What Makes Bayesian Statistics Different? 6.3 A Conceptual Overview of Bayesian Estimation. 6.4 Bayes' Theorem. 6.5 An Analysis Example. 6.6 How Does Bayesian Estimation Apply to Multiple Imputation? 6.7 The Posterior Distribution of the Mean. 6.8 The Posterior Distribution of the Variance. 6.9 The Posterior Distribution of a Covariance Matrix. 6.10 Summary. 6.11 Recommended Readings.
Part 7. The Imputation Phase of Multiple Imputation. 7.1 Chapter Overview. 7.2 A Conceptual Description of the Imputation Phase. 7.3 A Bayesian Description of the Imputation Phase. 7.4 A Bivariate Analysis Example. 7.5 Data Augmentation with Multivariate Data. 7.6 Selecting Variables for Imputation. 7.7 The Meaning of Convergence. 7.8 Convergence Diagnostics. 7.9 Time-Series Plots. 7.10 Autocorrelation Function Plots. 7.11 Assessing Convergence from Alternate Starting Values. 7.12 Convergence Problems. 7.13 Generating the Final Set of Imputations. 7.14 How Many Data Sets Are Needed? 7.15 Summary. 7.16 Recommended Readings.
Part 8. The Analysis and Pooling Phases of Multiple Imputation. 8.1 Chapter Overview. 8.2 The Analysis Phase. 8.3 Combining Parameter Estimates in the Pooling Phase. 8.4 Transforming Parameter Estimates Prior to Combining. 8.5 Pooling Standard Errors. 8.6 The Fraction of Missing Information and the Relative Increase in Variance. 8.7 When Is Multiple Imputation Comparable to Maximum Likelihood? 8.8 An Illustrative Computer Simulation Study. 8.9 Significance Testing Using the t Statistic. 8.10 An Overview of Multiparameter Significance Tests. 8.11 Testing Multiple Parameters Using the D1 Statistic. 8.12 Testing Multiple Parameters by Combining Wald Tests. 8.13 Testing Multiple Parameters by Combining Likelihood Ratio Statistics. 8.14 Data Analysis Example 1. 8.15 Data Analysis Example 2. 8.16 Data Analysis Example 3. 8.17 Summary. 8.18 Recommended Readings.
Part 9. Practical Issues in Multiple Imputation. 9.1 Chapter Overview. 9.2 Dealing with Convergence Problems. 9.3 Dealing with Non-Normal Data. 9.4 To Round or Not to Round? 9.5 Preserving Interaction Effects. 9.6 Imputing Multiple-Item Questionnaires. 9.7 Alternate Imputation Algorithms. 9.8 Multiple Imputation Software Options. 9.9 Data Analysis Example 1. 9.10 Data Analysis Example 2. 9.11 Summary. 9.12 Recommended Readings.
Part 10. Models for Missing Not at Random Data. 10.1 Chapter Overview. 10.2 An Ad Hoc Approach to Dealing with MNAR Data. 10.3 The Theoretical Rationale for MNAR Models. 10.4 The Classic Selection Model. 10.5 Estimating the Selection Model. 10.6 Limitations of the Selection Model. 10.7 An Illustrative Analysis. 10.8 The Pattern Mixture Model. 10.9 Limitations of the Pattern Mixture Model. 10.10 An Overview of the Longitudinal Growth Model. 10.11 A Longitudinal Selection Model. 10.12 Random Coefficient Selection Models. 10.13 Pattern Mixture Models for Longitudinal Analyses. 10.14 Identification Strategies for Longitudinal Pattern Mixture Models. 10.15 Delta Method Standard Errors. 10.16 Overview of the Data Analysis Examples. 10.17 Data Analysis Example 1. 10.18 Data Analysis Example 2. 10.19 Data Analysis Example 3. 10.20 Data Analysis Example 4. 10.21 Summary. 10.22 Recommended Readings.
Part 11. Wrapping Things Up: Some Final Practical Considerations. 11.1 Chapter Overview. 11.2 Maximum Likelihood Software Options. 11.3 Multiple Imputation Software Options. 11.4 Choosing between Maximum Likelihood and Multiple Imputation. 11.5 Reporting the Results from a Missing Data Analysis. 11.6 Final Thoughts. 11.7 Recommended Readings.

3,910 citations


Journal ArticleDOI
TL;DR: A probabilistic method, called the Coherent Point Drift (CPD) algorithm, is introduced for both rigid and nonrigid point set registration, together with a fast algorithm that reduces the method's computational complexity to linear.
Abstract: Point set registration is a key component in many computer vision tasks. The goal of point set registration is to assign correspondences between two sets of points and to recover the transformation that maps one point set to the other. Multiple factors, including an unknown nonrigid spatial transformation, the large dimensionality of the point sets, noise, and outliers, make point set registration a challenging problem. We introduce a probabilistic method, called the Coherent Point Drift (CPD) algorithm, for both rigid and nonrigid point set registration. We consider the alignment of two point sets as a probability density estimation problem. We fit the Gaussian mixture model (GMM) centroids (representing the first point set) to the data (the second point set) by maximizing the likelihood. We force the GMM centroids to move coherently as a group to preserve the topological structure of the point sets. In the rigid case, we impose the coherence constraint by reparameterizing the GMM centroid locations with rigid parameters and derive a closed-form solution of the maximization step of the EM algorithm in arbitrary dimensions. In the nonrigid case, we impose the coherence constraint by regularizing the displacement field and using variational calculus to derive the optimal transformation. We also introduce a fast algorithm that reduces the method's computational complexity to linear. We test the CPD algorithm for both rigid and nonrigid transformations in the presence of noise, outliers, and missing points, where CPD shows accurate results and outperforms current state-of-the-art methods.
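Per EM iteration, the rigid case reduces to a soft-correspondence E-step followed by a weighted Procrustes M-step. The numpy sketch below is only in the spirit of that rigid case; the function name, the outlier weight `w`, and the omission of the paper's fast linear-time machinery are simplifications of my own, not the authors' implementation.

```python
import numpy as np

def rigid_cpd_sketch(X, Y, n_iter=50, w=0.1):
    """Toy EM registration in the spirit of rigid CPD (illustrative only).
    X: (N, D) target points; Y: (M, D) source points used as GMM centroids;
    w: assumed uniform-outlier weight."""
    N, D = X.shape
    M, _ = Y.shape
    R, s, t = np.eye(D), 1.0, np.zeros(D)
    sigma2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum() / (D * M * N)
    for _ in range(n_iter):
        TY = s * (Y @ R.T) + t                                  # transformed centroids
        d2 = ((X[:, None, :] - TY[None, :, :]) ** 2).sum(-1)    # (N, M) squared distances
        c = (2 * np.pi * sigma2) ** (D / 2) * w / (1 - w) * M / N
        num = np.exp(-d2 / (2 * sigma2))
        P = num / (num.sum(axis=1, keepdims=True) + c)          # E-step: soft correspondences
        Np = P.sum()
        mu_x = P.sum(axis=1) @ X / Np                           # weighted means
        mu_y = P.sum(axis=0) @ Y / Np
        Xh, Yh = X - mu_x, Y - mu_y
        A = Xh.T @ P @ Yh                                       # M-step: weighted Procrustes
        U, _, Vt = np.linalg.svd(A)
        C = np.eye(D)
        C[-1, -1] = np.linalg.det(U @ Vt)                       # keep a proper rotation
        R = U @ C @ Vt
        denom = (P.sum(axis=0) * (Yh ** 2).sum(axis=1)).sum()
        s = np.trace(A.T @ R) / denom
        t = mu_x - s * (R @ mu_y)
        sigma2 = max(((P.sum(axis=1) * (Xh ** 2).sum(axis=1)).sum()
                      - s * np.trace(A.T @ R)) / (Np * D), 1e-10)
    return R, s, t
```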

2,429 citations


Posted Content
TL;DR: In this article, a general framework for image inverse problems is introduced, based on Gaussian mixture models, estimated via a computationally efficient MAP-EM algorithm, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared to traditional sparse inverse problem techniques.
Abstract: A general framework for solving image inverse problems is introduced in this paper. The approach is based on Gaussian mixture models, estimated via a computationally efficient MAP-EM algorithm. A dual mathematical interpretation of the proposed framework with structured sparse estimation is described, which shows that the resulting piecewise linear estimate stabilizes the estimation when compared to traditional sparse inverse problem techniques. This interpretation also suggests an effective dictionary-motivated initialization for the MAP-EM algorithm. We demonstrate that in a number of image inverse problems, including inpainting, zooming, and deblurring, the same algorithm produces results that equal, often significantly exceed, or fall only marginally short of the best published ones, at a lower computational cost.

505 citations


Book ChapterDOI
01 Jan 2010
TL;DR: A statistical model can be called a latent class (LC) or mixture model if it assumes that some of its parameters differ across unobserved subgroups, LCs, or mixture components as mentioned in this paper.
Abstract: A statistical model can be called a latent class (LC) or mixture model if it assumes that some of its parameters differ across unobserved subgroups, LCs, or mixture components. This rather general idea has several seemingly unrelated applications, the most important of which are clustering, scaling, density estimation, and random-effects modeling. This article describes simple LC models for clustering, restricted LC models for scaling, and mixture regression models for nonparametric random-effects modeling, as well as gives an overview of recent developments in the field of LC analysis. Moreover, attention is paid to topics such as maximum likelihood estimation, identification issues, model selection, and software.
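As a concrete illustration of the simplest of these uses, a latent class model for clustering binary items can be fitted with a short EM loop. The sketch below assumes conditionally independent Bernoulli items; the function name, defaults, and initialization are invented here for illustration and are not from the article.

```python
import numpy as np

def latent_class_em(X, n_classes, n_iter=100, seed=0):
    """Minimal EM for a simple latent class clustering model: binary items,
    each class with its own vector of item-endorsement probabilities."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)                 # class proportions
    theta = rng.uniform(0.25, 0.75, size=(n_classes, p))     # item probabilities per class
    for _ in range(n_iter):
        # E-step: posterior class-membership probabilities for each respondent.
        log_lik = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
                   + np.log(pi))
        log_lik -= log_lik.max(axis=1, keepdims=True)
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update class proportions and item probabilities.
        Nk = resp.sum(axis=0)
        pi = Nk / n
        theta = np.clip((resp.T @ X) / Nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, resp
```

Restricted LC models for scaling and mixture regression models mentioned in the abstract add constraints or covariates on top of this same E-step/M-step structure.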

431 citations


Journal ArticleDOI
TL;DR: This paper describes a model-based expectation-maximization source separation and localization system for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording, and creates probabilistic spectrogram masks that can be used for source separation.
Abstract: This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and perceptual evaluation of speech quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.

317 citations


Journal ArticleDOI
TL;DR: depmixS4, as discussed by the authors, is a general framework for defining and estimating dependent mixture models in the R programming language, including standard Markov models, latent/hidden Markov models, and latent class and finite mixture distribution models.
Abstract: depmixS4 implements a general framework for defining and estimating dependent mixture models in the R programming language. This includes standard Markov models, latent/hidden Markov models, and latent class and finite mixture distribution models. The models can be fitted on mixed multivariate data with distributions from the glm family, the (logistic) multinomial, or the multivariate normal distribution. Other distributions can be added easily, and an example is provided with the exgaus distribution. Parameters are estimated by the expectation-maximization (EM) algorithm or, when (linear) constraints are imposed on the parameters, by direct numerical optimization with the Rsolnp or Rdonlp2 routines.
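For readers working in Python rather than R, a rough analogue of the hidden Markov part of this model family (not depmixS4 itself) can be fitted by EM with hmmlearn; the synthetic data and settings below are illustrative assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Illustrative data: two-dimensional observations from two regimes.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, size=(200, 2)),
                    rng.normal(3, 1, size=(200, 2))])

# EM-fitted dependent mixture (hidden Markov) model with 2 latent states.
model = GaussianHMM(n_components=2, covariance_type="full", n_iter=100)
model.fit(X)
states = model.predict(X)      # decoded state sequence
print(model.transmat_)         # estimated transition matrix
```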

312 citations


Journal ArticleDOI
TL;DR: This paper proposes first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion, which yields a unique soft clustering for each number of clusters less than or equal to K.
Abstract: Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K.
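A sketch of the two-stage idea, assuming scikit-learn's GaussianMixture for stage one and a straightforward greedy implementation of the entropy-based merging for stage two; the function name, `max_k`, and the exact bookkeeping are my own choices, and the paper's implementation may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bic_then_merge(X, max_k=10):
    """Stage 1: pick K by BIC. Stage 2: merge components hierarchically,
    at each step joining the pair that most reduces the classification entropy."""
    fits = [GaussianMixture(k, n_init=3, random_state=0).fit(X)
            for k in range(1, max_k + 1)]
    best = min(fits, key=lambda m: m.bic(X))
    tau = best.predict_proba(X)                  # posterior membership probabilities

    def plogp(p):
        return np.where(p > 0, p * np.log(p), 0.0)

    solutions = {tau.shape[1]: tau}              # one soft clustering per cluster count
    while tau.shape[1] > 1:
        K = tau.shape[1]
        best_pair, best_gain = None, -np.inf
        for j in range(K):
            for l in range(j + 1, K):
                merged = tau[:, j] + tau[:, l]
                # Entropy decrease obtained by merging columns j and l.
                gain = (-plogp(tau[:, j]) - plogp(tau[:, l]) + plogp(merged)).sum()
                if gain > best_gain:
                    best_gain, best_pair = gain, (j, l)
        j, l = best_pair
        keep = [c for c in range(K) if c not in best_pair]
        tau = np.column_stack([tau[:, keep], tau[:, j] + tau[:, l]])
        solutions[tau.shape[1]] = tau
    return best, solutions
```

Each intermediate `tau` is the soft clustering for that number of clusters, matching the abstract's description of one solution for every cluster count up to K.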

299 citations


Journal ArticleDOI
TL;DR: In this article, a probabilistic approach for statistical modeling of the loads in distribution networks is presented, where the probability density functions (pdfs) of loads at different buses show a number of variations and cannot be represented by any specific distribution.
Abstract: This paper presents a probabilistic approach for statistical modeling of the loads in distribution networks. In a distribution network, the probability density functions (pdfs) of loads at different buses show a number of variations and cannot be represented by any specific distribution. The approach presented in this paper represents all the load pdfs through a Gaussian mixture model (GMM). The expectation maximization (EM) algorithm is used to obtain the parameters of the mixture components. The performance of the method is demonstrated on a 95-bus generic distribution network model.
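A minimal sketch of the modeling step, using scikit-learn's EM implementation on synthetic stand-in load samples; the data and the choice of three components are assumptions for illustration, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative stand-in for measured bus loads (kW); real feeder data would be used instead.
rng = np.random.default_rng(1)
loads = np.concatenate([rng.normal(40, 5, 600),        # e.g. base-load hours
                        rng.normal(85, 12, 400)])      # e.g. evening peak

# EM estimation of the mixture parameters for the load pdf.
gmm = GaussianMixture(n_components=3, random_state=0).fit(loads.reshape(-1, 1))
print(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel())
```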

292 citations


Journal ArticleDOI
30 Jun 2010 - Test
TL;DR: In this paper, a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than sample size is considered.
Abstract: We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than the sample size. We propose an l1-penalized maximum likelihood estimator in an appropriate parameterization. This kind of estimation belongs to a class of problems where optimization and theory for non-convex functions are needed. This distinguishes it very clearly from high-dimensional estimation with convex loss or objective functions, as, for example, with the Lasso in linear or generalized linear models. Mixture models represent a prime and important example where non-convexity arises.

276 citations


Journal ArticleDOI
TL;DR: A detailed review of mixture models and model-based clustering, which provide a convenient yet formal framework for clustering and classification.
Abstract: Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and, lately, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends in the area, as well as open problems, are also discussed.

263 citations


Proceedings ArticleDOI
25 Jul 2010
TL;DR: This paper proposes an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers, and applies the model to both synthetic data and DBLP data sets to demonstrate the importance of this concept, as well as the effectiveness and efficiency of the proposed approach.
Abstract: Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews and so on. In these datasets (called "information networks"), closely related objects that share the same properties or interests form a community. For example, a community in the blogosphere could be users mostly interested in cell phone reviews and news. Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored. An example could be a low-income person being friends with many rich people even though his income is not anomalously low when considered over the entire population. This paper first introduces the concept of community outliers (interesting points or rising stars for a more positive sense), and then shows that well-known baseline approaches without considering links or community information cannot find these community outliers. We propose an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF). Maximizing the data likelihood and the posterior of the model gives the solution to the outlier inference problem. We apply the model on both synthetic data and DBLP data sets, and the results demonstrate the importance of this concept, as well as the effectiveness and efficiency of the proposed approach.

Proceedings ArticleDOI
23 Oct 2010
TL;DR: This paper gives the first polynomial time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture, and proves that the exponential dependence of the running time on the number of Gaussians is necessary.
Abstract: Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate the mixture parameters. We give an algorithm for this problem that has running time and data requirements polynomial in the dimension and the inverse of the desired accuracy, with provably minimal assumptions on the Gaussians. As a simple consequence of our learning algorithm, we give the first polynomial time algorithm for proper density estimation for mixtures of k Gaussians that needs no assumptions on the mixture. It was open whether proper density estimation was even statistically possible (with no assumptions) given only polynomially many samples, let alone whether it could be computationally efficient. The building blocks of our algorithm are based on the work (Kalai et al., STOC 2010) that gives an efficient algorithm for learning mixtures of two Gaussians by considering a series of projections down to one dimension, and applying the method of moments to each univariate projection. A major technical hurdle in the previous work is showing that one can efficiently learn univariate mixtures of two Gaussians. In contrast, because pathological scenarios can arise when considering projections of mixtures of more than two Gaussians, the bulk of the work in this paper concerns how to leverage a weaker algorithm for learning univariate mixtures (of many Gaussians) to learn in high dimensions. Our algorithm employs hierarchical clustering and rescaling, together with methods for backtracking and recovering from the failures that can occur in our univariate algorithm. Finally, while the running time and data requirements of our algorithm depend exponentially on the number of Gaussians in the mixture, we prove that such a dependence is necessary.

Journal ArticleDOI
TL;DR: This work develops Bayesian inference based on data augmentation and Markov chain Monte Carlo (MCMC) sampling that allows robust modeling of high-dimensional multimodal and asymmetric data generated by popular biotechnological platforms such as flow cytometry.
Abstract: Skew-normal and skew-t distributions have proved to be useful for capturing skewness and kurtosis in data directly without transformation. Recently, finite mixtures of such distributions have been considered as a more general tool for handling heterogeneous data involving asymmetric behaviors across subpopulations. We consider such mixture models for both univariate as well as multivariate data. This allows robust modeling of high-dimensional multimodal and asymmetric data generated by popular biotechnological platforms such as flow cytometry. We develop Bayesian inference based on data augmentation and Markov chain Monte Carlo (MCMC) sampling. In addition to the latent allocations, data augmentation is based on a stochastic representation of the skew-normal distribution in terms of a random-effects model with truncated normal random effects. For finite mixtures of skew normals, this leads to a Gibbs sampling scheme that draws from standard densities only. This MCMC scheme is extended to mixtures of skew-t distributions based on representing the skew-t distribution as a scale mixture of skew normals. As an important application of our new method, we demonstrate how it provides a new computational framework for automated analysis of high-dimensional flow cytometric data. Using multivariate skew-normal and skew-t mixture models, we could model non-Gaussian cell populations rigorously and directly without transformation or projection to lower dimensions.
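The data augmentation rests on the standard stochastic representation of a skew-normal variate as a half-normal (truncated-normal) random effect plus independent Gaussian noise. A small sanity-check sketch follows; the shape parameter value is an arbitrary choice, not taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 4.0                                      # illustrative skewness parameter
delta = alpha / np.sqrt(1 + alpha ** 2)

u0 = np.abs(rng.standard_normal(100_000))        # half-normal (truncated-normal) random effect
u1 = rng.standard_normal(100_000)                # independent Gaussian noise
z = delta * u0 + np.sqrt(1 - delta ** 2) * u1    # skew-normal(alpha) samples

# Cross-check against scipy's skew-normal distribution: the KS statistic should be small.
print(stats.kstest(z, stats.skewnorm(alpha).cdf).statistic)
```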

Journal ArticleDOI
TL;DR: This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models, which gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.
Abstract: Motivation: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation–maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets. Results: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data. Availability: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info Contact: pmcnicho@uoguelph.ca

Journal ArticleDOI
TL;DR: A new signal model is proposed where the leading vocal part is explicitly represented by a specific source/filter model and reaches state-of-the-art performances on all test sets.
Abstract: Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability of focusing on a specific source in order to extract relevant information. In this paper, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that aim, we propose a new signal model where the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the estimation of the different parameters is done within a maximum-likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performances on all test sets.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: An acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space, and this style of acoustic model allows for a much more compact representation.
Abstract: We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.
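The structural idea can be written in a few lines: each state carries only a low-dimensional vector, and globally shared projections expand it into that state's Gaussian means and mixture weights. The dimensions below are arbitrary illustrations (far smaller than a real system), not the paper's configuration.

```python
import numpy as np

D, S, I, J = 39, 40, 100, 200         # feature dim, subspace dim, Gaussians, phonetic states
rng = np.random.default_rng(0)

M = rng.normal(size=(I, D, S))        # globally shared mean-projection matrices M_i
w = rng.normal(size=(I, S))           # globally shared weight-projection vectors w_i
v = rng.normal(size=(J, S))           # per-state subspace vectors v_j

means = np.einsum("ids,js->jid", M, v)             # state means mu_{ji} = M_i v_j, shape (J, I, D)
logits = v @ w.T                                    # shape (J, I)
weights = np.exp(logits - logits.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)       # state mixture weights via softmax over i
```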

Book ChapterDOI
05 Sep 2010
TL;DR: This work proposes to use a mixture of holistic templates (e.g. HOG) and discriminative learning for joint viewpoint classification and category detection, and shows great promise compared to a number of baselines including discrete nearest neighbor and linear regression.
Abstract: Object viewpoint classification aims at predicting an approximate 3D pose of objects in a scene and is receiving increasing attention. State-of-the-art approaches to viewpoint classification use generative models to capture relations between object parts. In this work we propose to use a mixture of holistic templates (e.g. HOG) and discriminative learning for joint viewpoint classification and category detection. Inspired by the work of Felzenszwalb et al. 2009, we discriminatively train multiple components simultaneously for each object category. A large number of components are learned in the mixture and they are associated with canonical viewpoints of the object through different levels of supervision, being fully supervised, semi-supervised, or unsupervised. We show that discriminative learning is capable of producing mixture components that directly provide robust viewpoint classification, significantly outperforming the state of the art: we improve the viewpoint accuracy on the Savarese et al. 3D Object database from 57% to 74%, and that on the VOC 2006 car database from 73% to 86%. In addition, the mixture-of-templates approach to object viewpoint/pose has a natural extension to the continuous case by discriminatively learning a linear appearance model locally at each discrete view. We evaluate continuous viewpoint estimation on a dataset of everyday objects collected using IMUs for groundtruth annotation: our mixture model shows great promise compared to a number of baselines including discrete nearest neighbor and linear regression.

Proceedings ArticleDOI
14 Mar 2010
TL;DR: This work reports experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages.
Abstract: Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of “universal phone set” that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages. We use a model called a “Subspace Gaussian Mixture Model” where states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.

Proceedings Article
21 Jun 2010
TL;DR: The IBP compound Dirichlet process (ICD) is developed, a Bayesian nonparametric prior that decouples across-data prevalence and within-data proportion in a mixed membership model and shows superior performance over the HDP-based topic model.
Abstract: The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric mixed membership model—each data point is modeled with a collection of components of different proportions. Though powerful, the HDP makes an assumption that the probability of a component being exhibited by a data point is positively correlated with its proportion within that data point. This might be an undesirable assumption. For example, in topic modeling, a topic (component) might be rare throughout the corpus but dominant within those documents (data points) where it occurs. We develop the IBP compound Dirichlet process (ICD), a Bayesian nonparametric prior that decouples across-data prevalence and within-data proportion in a mixed membership model. The ICD combines properties from the HDP and the Indian buffet process (IBP), a Bayesian nonparametric prior on binary matrices. The ICD assigns a subset of the shared mixture components to each data point. This subset, the data point's "focus", is determined independently from the amount that each of its components contribute. We develop an ICD mixture model for text, the focused topic model (FTM), and show superior performance over the HDP-based topic model.

Journal ArticleDOI
TL;DR: In this article, the load probability density function (pdf) in the distribution network shows a number of variations at different nodes and cannot be represented by any specific distribution, and an approach to utilise the loads as pseudo-measurements for the purpose of distribution system state estimation (DSSE).
Abstract: This study presents an approach to utilise the loads as pseudo-measurements for the purpose of distribution system state estimation (DSSE). The load probability density function (pdf) in the distribution network shows a number of variations at different nodes and cannot be represented by any specific distribution. The approach presented in this study represents all the load pdfs through the Gaussian mixture model (GMM). The expectation maximisation (EM) algorithm is used to obtain the parameters of the mixture components. The standard weighted least squares (WLS) algorithm utilises these load models as pseudo-measurements. The effectiveness of WLS is assessed through some statistical measures such as bias, consistency and quality of the estimates in a 95-bus generic distribution network model.
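For intuition, the linear weighted-least-squares step at the core of such an estimator looks as follows; GMM-derived pseudo-measurements simply enter the measurement vector with larger variances (smaller weights). The matrices and numbers are illustrative only, and a real DSSE problem is nonlinear and solved iteratively.

```python
import numpy as np

H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                 # measurement model z ~= H x (illustrative)
z = np.array([1.02, 0.97, 2.05])           # real measurements plus one pseudo-measurement
var = np.array([1e-4, 1e-4, 1e-2])         # the pseudo-measurement carries a larger variance
W = np.diag(1.0 / var)                     # weights are inverse variances

x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)   # WLS state estimate
print(x_hat)
```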

Proceedings Article
11 Jul 2010
TL;DR: A Bayesian matrix factorization model that performs regression against side information known about the data in addition to the observations is introduced and applied to the Netflix Prize problem of predicting movie ratings given an incomplete user-movie ratings matrix.
Abstract: Matrix factorization is a fundamental technique in machine learning that is applicable to collaborative filtering, information retrieval and many other areas. In collaborative filtering and many other tasks, the objective is to fill in missing elements of a sparse data matrix. One of the biggest challenges in this case is filling in a column or row of the matrix with very few observations. In this paper we introduce a Bayesian matrix factorization model that performs regression against side information known about the data in addition to the observations. The side information helps by adding observed entries to the factored matrices. We also introduce a nonparametric mixture model for the prior of the rows and columns of the factored matrices that gives a different regularization for each latent class. Besides providing a richer prior, the posterior distribution of mixture assignments reveals the latent classes. Using Gibbs sampling for inference, we apply our model to the Netflix Prize problem of predicting movie ratings given an incomplete user-movie ratings matrix. Incorporating rating information with gathered metadata information, our Bayesian approach outperforms other matrix factorization techniques even when using fewer dimensions.

30 Jan 2010
TL;DR: A recent survey of different statistical methods used in background modeling, focusing on the first generation methods: Single Gaussian, Mixture of Gaussians, Kernel Density Estimation and Subspace Learning using PCA.
Abstract: Background modeling is often used in the context of moving object detection from static cameras. Numerous methods have been developed over recent years, and the most used are the statistical ones. The purpose of this chapter is to provide a recent survey of these different statistical methods. For this, we have classified them in terms of generation, following the years of publication and the statistical tools used. We then focus on the first-generation methods: Single Gaussian, Mixture of Gaussians, Kernel Density Estimation and Subspace Learning using PCA. These original methods are reviewed, and their different improvements are then classified in terms of strategies. After analyzing the strategies and identifying their limitations, we conclude with several promising directions for future research.
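One widely available descendant of the Mixture-of-Gaussians line surveyed here is OpenCV's MOG2 background subtractor; a minimal usage sketch follows, where the video path and parameter values are placeholders.

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")                      # placeholder video path
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                           detectShadows=True)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = mog2.apply(frame)            # per-pixel foreground/background decision
    cv2.imshow("foreground", fg_mask)
    if cv2.waitKey(30) & 0xFF == 27:       # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```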

Journal ArticleDOI
TL;DR: This paper used autoregressive moving average models, as well as the Yule-Walker and Burg methods, to extract the power density spectrum from representative signal samples, and found that Burg's method for spectrum estimation together with a support vector machine classifier yields the best classification results.
Abstract: The analysis of electroencephalograms continues to be a problem due to our limited understanding of the signal origin. This limited understanding leads to ill-defined models, which in turn make it hard to design effective evaluation methods. Despite these shortcomings, electroencephalogram analysis is a valuable tool in the evaluation of neurological disorders and the evaluation of overall cerebral activity. We compared different model-based power spectral density estimation methods and different classification methods. Specifically, we used autoregressive moving average models, as well as the Yule-Walker and Burg methods, to extract the power density spectrum from representative signal samples. Local maxima and minima were detected from these spectra. In this paper, the locations of these extrema are used as input to different classifiers. The three classifiers we used were: Gaussian mixture model, artificial neural network, and support vector machine. The classification results are documented with confusion matrices and compared with receiver operating characteristic curves. We found that Burg's method for spectrum estimation together with a support vector machine classifier yields the best classification results. This combination reaches a classification rate of 93.33%, the sensitivity is 98.33% and the specificity is 96.67%.
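For reference, a Yule-Walker autoregressive spectrum of the kind compared in the paper can be computed in a few lines; the model order and frequency grid below are arbitrary choices, and the resulting spectral peak locations would then feed a classifier such as a support vector machine.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_psd(x, order=8, n_freq=256):
    """Sketch of a Yule-Walker AR power spectral density estimate."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    a = solve_toeplitz(r[:-1], r[1:])            # AR coefficients from the Toeplitz system
    sigma2 = r[0] - np.dot(a, r[1:])             # driving-noise variance
    freqs = np.linspace(0, 0.5, n_freq)          # cycles per sample
    e = np.exp(-2j * np.pi * np.outer(freqs, np.arange(1, order + 1)))
    psd = sigma2 / np.abs(1 - e @ a) ** 2        # AR spectrum sigma^2 / |A(e^{jw})|^2
    return freqs, psd
```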

Journal ArticleDOI
TL;DR: An unsupervised Bayesian classification scheme for separating overlapped nuclei and a segmentation approach that incorporates a priori knowledge about the regular shape of clumped nuclei to yield more accurate segmentation results.
Abstract: In a fully automatic cell extraction process, one of the main issues to overcome is the problem related to extracting overlapped nuclei since such nuclei will often affect the quantitative analysis of cell images. In this paper, we present an unsupervised Bayesian classification scheme for separating overlapped nuclei. The proposed approach first involves applying the distance transform to overlapped nuclei. The topographic surface generated by distance transform is viewed as a mixture of Gaussians in the proposed algorithm. In order to learn the distribution of the topographic surface, the parametric expectation-maximization (EM) algorithm is employed. Cluster validation is performed to determine how many nuclei are overlapped. Our segmentation approach incorporates a priori knowledge about the regular shape of clumped nuclei to yield more accurate segmentation results. Experimental results show that the proposed method yields superior segmentation performance, compared to those produced by conventional schemes.
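An illustrative sketch of the core idea, assuming scipy and scikit-learn: treat the distance-transform surface of a clump as a Gaussian mixture and label pixels by the fitted components. This is not the paper's code; BIC stands in here for the paper's cluster-validation step, and the pixel-repetition weighting is a crude approximation of fitting the topographic surface.

```python
import numpy as np
from scipy import ndimage
from sklearn.mixture import GaussianMixture

def split_clump(mask, n_max=4):
    """Split a binary clump of overlapped nuclei into per-nucleus labels."""
    dist = ndimage.distance_transform_edt(mask)             # topographic surface
    ys, xs = np.nonzero(mask)
    coords = np.column_stack([ys, xs]).astype(float)
    # Weight pixels by surface height by repeating samples proportionally.
    reps = np.maximum(dist[ys, xs].round().astype(int), 1)
    samples = np.repeat(coords, reps, axis=0)
    fits = [GaussianMixture(k, random_state=0).fit(samples) for k in range(1, n_max + 1)]
    best = min(fits, key=lambda m: m.bic(samples))           # stand-in for cluster validation
    labels = np.full(mask.shape, -1, dtype=int)
    labels[ys, xs] = best.predict(coords)                    # one label per separated nucleus
    return labels, best.n_components
```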

Journal ArticleDOI
TL;DR: A Bayesian hierarchical model is proposed to fulfil the aims of estimating the characteristics that bring a team to lose or win a game, or to predict the score of a particular match, and a more complex mixture model is specified that results in a better fit to the observed data.
Abstract: The problem of modelling football data has become increasingly popular in the last few years and many different models have been proposed with the aim of estimating the characteristics that bring a team to lose or win a game, or to predict the score of a particular match. We propose a Bayesian hierarchical model to fulfil both these aims and test its predictive strength based on data about the Italian Serie A 1991-1992 championship. To overcome the issue of overshrinkage produced by the Bayesian hierarchical model, we specify a more complex mixture model that results in a better fit to the observed data. We test its performance using an example of the Italian Serie A 2007-2008 championship.

Journal ArticleDOI
TL;DR: This paper compares the choice of conjugate and non-conjugate base distributions on a particular class of DPM models, the Dirichlet process Gaussian mixture model (DPGMM), and shows that better density models can result from using a wider class of priors with no or only a modest increase in computational effort.
Abstract: In the Bayesian mixture modeling framework it is possible to infer the necessary number of components to model the data and therefore it is unnecessary to explicitly restrict the number of components. Nonparametric mixture models sidestep the problem of finding the "correct" number of mixture components by assuming infinitely many components. In this paper Dirichlet process mixture (DPM) models are cast as infinite mixture models and inference using Markov chain Monte Carlo is described. The specification of the priors on the model parameters is often guided by mathematical and practical convenience. The primary goal of this paper is to compare the choice of conjugate and non-conjugate base distributions on a particular class of DPM models which is widely used in applications, the Dirichlet process Gaussian mixture model (DPGMM). We compare computational efficiency and modeling performance of DPGMM defined using a conjugate and a conditionally conjugate base distribution. We show that better density models can result from using a wider class of priors with no or only a modest increase in computational effort.
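A variational cousin of the DPGMM (not the MCMC samplers compared in the paper) is available in scikit-learn, where a truncated Dirichlet-process prior prunes unneeded components rather than fixing their number; the data and hyperparameters below are illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.4, size=(300, 2)) for loc in (-2.0, 0.0, 2.5)])

dpgmm = BayesianGaussianMixture(
    n_components=15,                                   # truncation level, not the true K
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.5,                    # plays the role of the DP concentration
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

print(np.sort(dpgmm.weights_)[::-1][:6])               # most mass ends up on ~3 components
```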

Journal ArticleDOI
TL;DR: Convergence rates of Bayesian density estimators based on finite location-scale mixtures of exponential power distributions are studied to derive posterior concentration rates, with priors based on these mixture models.
Abstract: We study convergence rates of Bayesian density estimators based on finite location-scale mixtures of exponential power distributions. We construct approximations of β-Hölder densities by continuous mixtures of exponential power distributions, leading to approximations of the β-Hölder densities by finite mixtures. These results are then used to derive posterior concentration rates, with priors based on these mixture models. The rates are minimax (up to a log n term) and, since the priors are independent of the smoothness, the rates are adaptive to the smoothness.

Journal ArticleDOI
TL;DR: Model-based clustering using a family of Gaussian mixture models, with parsimonious factor analysis-like covariance structure, is described and an efficient algorithm for its implementation is presented, showing its effectiveness when compared to existing software.

Book ChapterDOI
05 Sep 2010
TL;DR: A mixture sparse coding model is described that can produce high-dimensional sparse representations very efficiently, works well with linear classifiers, and effectively encourages data that are similar to each other to enjoy similar sparse representations.
Abstract: Sparse coding of sensory data has recently attracted notable attention in research of learning useful features from the unlabeled data. Empirical studies show that mapping the data into a significantly higher-dimensional space with sparse coding can lead to superior classification performance. However, computationally it is challenging to learn a set of highly over-complete dictionary bases and to encode the test data with the learned bases. In this paper, we describe a mixture sparse coding model that can produce high-dimensional sparse representations very efficiently. Besides the computational advantage, the model effectively encourages data that are similar to each other to enjoy similar sparse representations. What's more, the proposed model can be regarded as an approximation to the recently proposed local coordinate coding (LCC), which states that sparse coding can approximately learn the nonlinear manifold of the sensory data in a locally linear manner. Therefore, the feature learned by the mixture sparse coding model works pretty well with linear classifiers. We apply the proposed model to PASCAL VOC 2007 and 2009 datasets for the classification task, both achieving state-of-the-art performances.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel method for the discovery and statistical representation of motion patterns in a scene observed by a static camera, and an algorithm for learning these patterns from dense optical flow in a hierarchical, unsupervised fashion.
Abstract: We present a novel method for the discovery and statistical representation of motion patterns in a scene observed by a static camera. Related methods involving learning of patterns of activity rely on trajectories obtained from object detection and tracking systems, which are unreliable in complex scenes of crowded motion. We propose a mixture model representation of salient patterns of optical flow, and present an algorithm for learning these patterns from dense optical flow in a hierarchical, unsupervised fashion. Using low level cues of noisy optical flow, K-means is employed to initialize a Gaussian mixture model for temporally segmented clips of video. The components of this mixture are then filtered and instances of motion patterns are computed using a simple motion model, by linking components across space and time. Motion patterns are then initialized and membership of instances in different motion patterns is established by using KL divergence between mixture distributions of pattern instances. Finally, a pixel level representation of motion patterns is proposed by deriving conditional expectation of optical flow. Results of extensive experiments are presented for multiple surveillance sequences containing numerous patterns involving both pedestrian and vehicular traffic.
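Comparing pattern instances by KL divergence typically builds on the closed-form divergence between individual Gaussian components, since the KL divergence between full mixtures has no closed form and is usually approximated (for example by Monte Carlo or component matching). A small sketch with invented values:

```python
import numpy as np

def kl_gaussians(m0, S0, m1, S1):
    """Closed-form KL divergence KL(N(m0, S0) || N(m1, S1))."""
    d = m0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Example: two 2-D flow components with illustrative parameters.
m0, S0 = np.array([1.0, 0.0]), np.eye(2) * 0.2
m1, S1 = np.array([0.8, 0.3]), np.eye(2) * 0.3
print(kl_gaussians(m0, S0, m1, S1))
```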