
Showing papers on "Mixture model published in 2007"


Journal ArticleDOI
TL;DR: Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.
Abstract: Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, one unresolved issue in the application of mixture models is that there is not one commonly accepted statistical indicator for deciding on the number of classes in a study population. This article presents the results of a simulation study that examines the performance of likelihood-based tests and the traditionally used information criteria (ICs) for determining the number of classes in mixture modeling. We look at the performance of these tests and indexes for 3 types of mixture models: latent class analysis (LCA), a factor mixture model (FMA), and a growth mixture model (GMM). We evaluate the ability of the tests and indexes to correctly identify the number of classes at three different sample sizes (n = 200, 500, 1,000). Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.

7,716 citations
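As a minimal illustration of the information-criterion side of this comparison (not the study's actual simulation design), one can fit Gaussian mixtures for a range of class counts and pick the count that minimizes the BIC. The sketch below uses scikit-learn and invented two-class data; the bootstrap likelihood ratio test is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulate a two-class population (illustrative settings, not those of the study).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),
               rng.normal(3.0, 1.0, size=(500, 2))])

# Fit mixtures with 1..5 components and compare BIC values;
# the number of components minimizing BIC is the selected class count.
bics = []
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append(gmm.bic(X))
    print(f"k={k}  BIC={bics[-1]:.1f}")
print("selected number of classes:", int(np.argmin(bics)) + 1)
```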


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This work shows that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms, and proposes to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images.
Abstract: Within the field of pattern classification, the Fisher kernel is a powerful framework which combines the strengths of generative and discriminative approaches. The idea is to characterize a signal with a gradient vector derived from a generative probability model and to subsequently feed this representation to a discriminative classifier. We propose to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images. We show that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms. Our approach demonstrates excellent performance on two challenging databases: an in-house database of 19 object/scene categories and the recently released VOC 2006 database. It is also very practical: it has low computational needs both at training and test time and vocabularies trained on one set of categories can be applied to another set without any significant loss in performance.

1,874 citations
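A hedged sketch of the core computation: the gradient of an image's average descriptor log-likelihood with respect to the means of a GMM visual vocabulary, which forms one block of the full Fisher vector. The weight and variance blocks and the exact Fisher-information normalization used by the authors are omitted, and all data here are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(gmm, descriptors):
    """Gradient of the average log-likelihood w.r.t. the GMM means
    (one block of the full Fisher vector; weight/variance blocks omitted)."""
    T = descriptors.shape[0]
    gamma = gmm.predict_proba(descriptors)             # (T, K) posteriors
    diff = descriptors[:, None, :] - gmm.means_[None]  # (T, K, D)
    sigma = np.sqrt(gmm.covariances_)                  # diagonal std devs, (K, D)
    grad = (gamma[:, :, None] * diff / sigma[None]).sum(axis=0) / T
    grad /= np.sqrt(gmm.weights_)[:, None]             # approximate normalization
    return grad.ravel()

# Toy "visual vocabulary": a GMM fitted on random low-level descriptors.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(2000, 8))
vocab = GaussianMixture(n_components=16, covariance_type="diag",
                        random_state=0).fit(train_descriptors)

image_descriptors = rng.normal(size=(300, 8))
fv = fisher_vector_means(vocab, image_descriptors)
print(fv.shape)   # (16 * 8,) -> fed to a discriminative classifier
```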


Journal ArticleDOI
TL;DR: In this paper, a Bayesian method is proposed to account for measurement errors in linear regression of astronomical data. The method is based on deriving a likelihood function for the measured data and focuses on the case when the intrinsic distribution of the independent variables can be approximated using a mixture of Gaussian functions.
Abstract: I describe a Bayesian method to account for measurement errors in linear regression of astronomical data. The method allows for heteroscedastic and possibly correlated measurement errors and intrinsic scatter in the regression relationship. The method is based on deriving a likelihood function for the measured data, and I focus on the case when the intrinsic distribution of the independent variables can be approximated using a mixture of Gaussian functions. I generalize the method to incorporate multiple independent variables, nondetections, and selection effects (e.g., Malmquist bias). A Gibbs sampler is described for simulating random draws from the probability distribution of the parameters, given the observed data. I use simulation to compare the method with other common estimators. The simulations illustrate that the Gaussian mixture model outperforms other common estimators and can effectively give constraints on the regression parameters, even when the measurement errors dominate the observed scatter, source detection fraction is low, or the intrinsic distribution of the independent variables is not a mixture of Gaussian functions. I conclude by using this method to fit the X-ray spectral slope as a function of Eddington ratio using a sample of 39 z ≲ 0.8 radio-quiet quasars. I confirm the correlation seen by other authors between the radio-quiet quasar X-ray spectral slope and the Eddington ratio, where the X-ray spectral slope softens as the Eddington ratio increases. IDL routines are made available for performing the regression.

1,264 citations
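A minimal sketch of the likelihood idea in the simplest case: a single Gaussian intrinsic component, known homoscedastic measurement errors, and direct maximum likelihood instead of the paper's Gibbs sampler; nondetections and selection effects are ignored and all parameter values are invented. The point is that the observed (x, y) pairs are then jointly Gaussian, so the marginal likelihood can be written down and maximized.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200
# True parameters (illustrative); the paper's full model also allows a Gaussian
# *mixture* for the intrinsic covariate, nondetections and selection effects.
alpha, beta, sigma, mu, tau = 1.0, 0.5, 0.3, 0.0, 1.0
sx, sy = 0.5, 0.4                      # known measurement-error std devs

xi = rng.normal(mu, tau, n)            # intrinsic covariate
eta = alpha + beta * xi + rng.normal(0, sigma, n)
x = xi + rng.normal(0, sx, n)          # measured covariate
y = eta + rng.normal(0, sy, n)         # measured response

def neg_loglike(theta):
    a, b, log_s, m, log_t = theta
    s2, t2 = np.exp(2 * log_s), np.exp(2 * log_t)
    mean = np.array([m, a + b * m])
    cov = np.array([[t2 + sx**2,          b * t2],
                    [b * t2, b**2 * t2 + s2 + sy**2]])
    return -multivariate_normal(mean, cov).logpdf(np.column_stack([x, y])).sum()

fit = minimize(neg_loglike, x0=np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 5000})
print("estimated slope:", fit.x[1])                 # close to 0.5 despite the errors
print("naive OLS slope:", np.polyfit(x, y, 1)[0])   # attenuated toward zero
```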


Journal ArticleDOI
01 Apr 2007
TL;DR: A programming-by-demonstration framework for generically extracting the relevant features of a given task and for addressing the problem of generalizing the acquired knowledge to different contexts is presented.
Abstract: We present a programming-by-demonstration framework for generically extracting the relevant features of a given task and for addressing the problem of generalizing the acquired knowledge to different contexts. We validate the architecture through a series of experiments, in which a human demonstrator teaches a humanoid robot simple manipulatory tasks. A probability-based estimation of the relevance is suggested by first projecting the motion data onto a generic latent space using principal component analysis. The resulting signals are encoded using a mixture of Gaussian/Bernoulli distributions (Gaussian mixture model/Bernoulli mixture model). This provides a measure of the spatio-temporal correlations across the different modalities collected from the robot, which can be used to determine a metric of the imitation performance. The trajectories are then generalized using Gaussian mixture regression. Finally, we analytically compute the trajectory which optimizes the imitation metric and use this to generalize the skill to different contexts.

1,089 citations
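The trajectory-generalization step can be sketched as Gaussian mixture regression: fit a joint GMM over (time, position) pairs from several demonstrations and take the conditional expectation of position given time. The toy data and component count below are illustrative, and the latent-space projection and imitation metric from the paper are not shown.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def gmr(gmm, t):
    """Gaussian mixture regression: E[y | x=t] under a joint GMM over (x, y)."""
    mx, my = gmm.means_[:, 0], gmm.means_[:, 1]
    cxx = gmm.covariances_[:, 0, 0]
    cxy = gmm.covariances_[:, 0, 1]
    # responsibilities of each component for the input value t
    h = gmm.weights_ * norm.pdf(t[:, None], loc=mx, scale=np.sqrt(cxx))
    h /= h.sum(axis=1, keepdims=True)
    # component-wise conditional means, blended by the responsibilities
    cond = my + (cxy / cxx) * (t[:, None] - mx)
    return (h * cond).sum(axis=1)

# Toy demonstrations: several noisy executions of one trajectory.
rng = np.random.default_rng(0)
t = np.tile(np.linspace(0, 1, 100), 5)
y = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=t.size)

joint = GaussianMixture(n_components=6, random_state=0).fit(np.column_stack([t, y]))
t_query = np.linspace(0, 1, 50)
y_hat = gmr(joint, t_query)           # generalized, smoothed trajectory
print(y_hat[:5])
```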


Proceedings ArticleDOI
John R. Hershey, Peder A. Olsen
15 Apr 2007
TL;DR: Two new methods, the variational approximation and the variational upper bound, are introduced and compared to existing methods; the benefits of each are weighed against the others and the performance of each is evaluated through numerical experiments.
Abstract: The Kullback-Leibler (KL) divergence is a widely used tool in statistics and pattern recognition. The KL divergence between two Gaussian mixture models (GMMs) is frequently needed in the fields of speech and image recognition. Unfortunately the KL divergence between two GMMs is not analytically tractable, nor does any efficient computational algorithm exist. Some techniques cope with this problem by replacing the KL divergence with other functions that can be computed efficiently. We introduce two new methods, the variational approximation and the variational upper bound, and compare them to existing methods. We discuss seven different techniques in total and weigh the benefits of each one against the others. To conclude, we evaluate the performance of each one through numerical experiments.

998 citations
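A hedged sketch of the variational approximation for KL(f||g) between two GMMs, in the exponentiated-pairwise-Gaussian-KL form, checked against a simple Monte Carlo estimate. This is an illustrative reimplementation rather than the authors' code, and the variational upper bound is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.special import logsumexp

def kl_gauss(m0, S0, m1, S1):
    """KL divergence between two full-covariance Gaussians."""
    d = m0.size
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def kl_gmm_variational(f, g):
    """Variational approximation to KL(f || g) for two fitted GaussianMixtures
    (pairwise Gaussian KLs combined through log-sum-exp; a sketch, not the
    authors' reference implementation)."""
    total = 0.0
    for wa, ma, Sa in zip(f.weights_, f.means_, f.covariances_):
        num = logsumexp([np.log(wf) - kl_gauss(ma, Sa, mf, Sf)
                         for wf, mf, Sf in zip(f.weights_, f.means_, f.covariances_)])
        den = logsumexp([np.log(wg) - kl_gauss(ma, Sa, mg, Sg)
                         for wg, mg, Sg in zip(g.weights_, g.means_, g.covariances_)])
        total += wa * (num - den)
    return total

rng = np.random.default_rng(0)
f = GaussianMixture(3, random_state=0).fit(rng.normal(0.0, 1.0, (500, 2)))
g = GaussianMixture(3, random_state=0).fit(rng.normal(0.5, 1.2, (500, 2)))

# Monte Carlo reference estimate of KL(f || g) for comparison.
X, _ = f.sample(20000)
print("variational:", kl_gmm_variational(f, g))
print("Monte Carlo:", np.mean(f.score_samples(X) - g.score_samples(X)))
```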


Journal ArticleDOI
TL;DR: In this article, a Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers, and a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory is proposed.
Abstract: In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, the deterioration of speech quality is caused by some problems: 1) appropriate spectral movements are not always caused by the frame-based conversion process, and 2) the converted spectra are excessively smoothed by statistical modeling. In order to address those problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used for realizing the appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in view of both speech quality and conversion accuracy for speaker individuality.

914 citations


Book ChapterDOI
01 Oct 2007
TL;DR: This work describes a technique for comparing distributions without the need for density estimation as an intermediate step, which relies on mapping the distributions into a reproducing kernel Hilbert space.
Abstract: We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation.

909 citations
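A minimal sketch of the kernel-based comparison for the two-sample setting: a biased (V-statistic) estimate of the squared maximum mean discrepancy under an RBF kernel, on synthetic data. Bandwidth selection, the unbiased estimator and the test thresholds developed in this line of work are left out.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd_rbf(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of the squared Maximum Mean Discrepancy
    between samples X and Y under an RBF kernel."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
Y_same = rng.normal(0.0, 1.0, size=(500, 2))
Y_diff = rng.normal(0.5, 1.0, size=(500, 2))

print("same distribution:     ", mmd_rbf(X, Y_same))   # close to zero
print("different distribution:", mmd_rbf(X, Y_diff))   # clearly larger
```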


Journal ArticleDOI
TL;DR: A new model is proposed, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean ‘social space’, and the actors’ locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster.
Abstract: Summary. Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity, meaning that two actors that have ties to a third actor are more likely to be tied than actors that do not, homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean ‘social space’, and the actors’ locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters that are present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data by using the model.

785 citations
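The paper points to the R package latentnet for fitting; as a language-consistent illustration, here is a small simulation from the generative side of the latent position cluster model (cluster centres, spread and the intercept are invented), showing how mixture-distributed latent positions plus distance-dependent tie probabilities produce clustered networks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 60, 3
beta0 = 2.0                                   # intercept (illustrative value)

# Latent positions drawn from a mixture of spherical Gaussians (the clusters).
centers = np.array([[0, 0], [4, 0], [2, 3.5]], dtype=float)
labels = rng.integers(0, K, size=n)
z = centers[labels] + 0.6 * rng.normal(size=(n, 2))

# Tie probability decreases with latent-space distance: logit p_ij = beta0 - d_ij.
d = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
p = 1.0 / (1.0 + np.exp(-(beta0 - d)))
A = (rng.random((n, n)) < p).astype(int)
np.fill_diagonal(A, 0)

# Within-cluster ties should be much denser than between-cluster ties.
same = labels[:, None] == labels[None, :]
print("within-cluster density: ", A[same].mean())
print("between-cluster density:", A[~same].mean())
```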


Journal ArticleDOI
TL;DR: A general technique for detecting structural features in large-scale network data that works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes is described.
Abstract: Networks are widely used in the biological, physical, and social sciences as a concise mathematical representation of the topology of systems of interacting components. Understanding the structure of these networks is one of the outstanding challenges in the study of complex systems. Here we describe a general technique for detecting structural features in large-scale network data that works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes. Using the machinery of probabilistic mixture models and the expectation–maximization algorithm, we show that it is possible to detect, without prior knowledge of what we are looking for, a very broad range of types of structure in networks. We give a number of examples demonstrating how the method can be used to shed light on the properties of real-world networks, including social and information networks.

577 citations
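A minimal EM sketch in the spirit of the method: each node is softly assigned to a class, and each class is characterized by its probability of connecting to every other node. The parameterization and the toy network below are illustrative rather than the paper's exact formulation.

```python
import numpy as np

def network_mixture_em(A, K, n_iter=200, seed=0):
    """EM for a simple mixture model over nodes' connection patterns:
    class r emits edges to node j with probability proportional to theta[r, j]."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    deg = A.sum(axis=1)
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(n), size=K)        # (K, n), rows sum to 1
    for _ in range(n_iter):
        # E-step: responsibilities q[i, r] proportional to pi_r * prod_j theta[r, j]^A[i, j]
        logq = np.log(pi)[None, :] + A @ np.log(theta + 1e-12).T
        logq -= logq.max(axis=1, keepdims=True)
        q = np.exp(logq)
        q /= q.sum(axis=1, keepdims=True)
        # M-step: update class proportions and connection profiles
        pi = q.mean(axis=0)
        theta = (q.T @ A) / (q.T @ deg)[:, None]
    return q, pi, theta

# Toy network with two groups whose members connect mostly within the group.
rng = np.random.default_rng(1)
n = 40
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], 0.4, 0.05)
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T                                          # undirected, no self-loops

q, pi, theta = network_mixture_em(A, K=2)
print(q.argmax(axis=1))                              # recovered group structure
```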


01 Jan 2007
TL;DR: A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior.
Abstract: MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering and classification results. A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior. A web page with related links including license information can be found at http://www.stat.washington.edu/mclust.

494 citations


Journal ArticleDOI
TL;DR: A modified version of BIC is proposed, where the likelihood is evaluated at the MAP instead of the MLE, and the resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE.
Abstract: Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as the result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum a posteriori (MAP) estimator, also found by the EM algorithm. For choosing the number of components and the model parameterization, we propose a modified version of BIC, where the likelihood is evaluated at the MAP instead of the MLE. We use a highly dispersed proper conjugate prior, containing a small fraction of one observation's worth of information. The resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE, EM and BIC.
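A hedged one-dimensional sketch of the idea: replace the MLE variance update in EM with the posterior mode under a conjugate (here inverse-gamma) prior, so a component can never collapse to zero variance. The prior form and hyperparameters are illustrative, not the highly dispersed conjugate prior specified in the paper.

```python
import numpy as np

def map_em_gmm_1d(x, K, a0=2.0, b0=0.1, n_iter=100, seed=0):
    """1-D EM for a normal mixture where the variance M-step is the posterior
    mode under an inverse-gamma(a0, b0) prior, preventing the singularities
    that plain MLE can produce when a component collapses onto one point."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities
        logp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step with MAP variance update
        nk = r.sum(axis=0)
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        ss = (r * (x[:, None] - mu) ** 2).sum(axis=0)
        var = (ss + 2 * b0) / (nk + 2 * a0 + 2)   # posterior mode, never zero
    return w, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 1.0, 100)])
print(map_em_gmm_1d(x, K=2))
```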

Journal ArticleDOI
TL;DR: In this article, the authors investigate the performance of factor mixture models for the analysis of multivariate data obtained from a population consisting of distinct latent classes and focus on covariate effects, model size and class-specific versus class-invariant parameters.
Abstract: Factor mixture models are designed for the analysis of multivariate data obtained from a population consisting of distinct latent classes. A common factor model is assumed to hold within each of the latent classes. Factor mixture modeling involves obtaining estimates of the model parameters, and may also be used to assign subjects to their most likely latent class. This simulation study investigates aspects of model performance such as parameter coverage and correct class membership assignment and focuses on covariate effects, model size, and class-specific versus class-invariant parameters. When fitting true models, parameter coverage is good for most parameters even for the smallest class separation investigated in this study (0.5 SD between 2 classes). The same holds for convergence rates. Correct class assignment is unsatisfactory for the small class separation without covariates, but improves dramatically with increasing separation, covariate effects, or both. Model performance is not influe...

Proceedings ArticleDOI
20 Jun 2007
TL;DR: This work considers the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes chosen randomly from a fixed but unknown distribution, using a hierarchical Bayesian infinite mixture model.
Abstract: We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknown distribution. We model the distribution over MDPs using a hierarchical Bayesian infinite mixture model. For each novel MDP, we use the previously learned distribution as an informed prior for model-based Bayesian reinforcement learning. The hierarchical Bayesian framework provides a strong prior that allows us to rapidly infer the characteristics of new environments based on previous environments, while the use of a nonparametric model allows us to quickly adapt to environments we have not encountered before. In addition, the use of infinite mixtures allows for the model to automatically learn the number of underlying MDP components. We evaluate our approach and show that it leads to significant speedups in convergence to an optimal policy after observing only a small number of tasks.

Journal Article
TL;DR: A penalized likelihood approach with an L1 penalty function is proposed, automatically realizing variable selection via thresholding and delivering a sparse solution in model-based clustering analysis with a common diagonal covariance matrix.
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.

Journal ArticleDOI
TL;DR: The paper considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors, and proposes a kernel‐based weighting scheme that incorporates weights that are dependent on the distance between subjects’ predictor values.
Abstract: Summary. The paper considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a non-parametric mixture of regression models, with the mixture distribution changing with predictors. A class of weighted mixture of Dirichlet process priors is proposed for the uncountable collection of mixture distributions. It is shown that this specification results in a generalized Polya urn scheme, which incorporates weights that are dependent on the distance between subjects’ predictor values. To allow local dependence in the mixture distributions, we propose a kernel-based weighting scheme. A Gibbs sampling algorithm is developed for posterior computation. The methods are illustrated by using simulated data examples and an epidemiologic application.

Journal ArticleDOI
TL;DR: This paper studies the RSVM from the viewpoint of sampling design, its robustness, and the spectral analysis of the reduced kernel, which indicates that the approximation kernels can retain most of the relevant information for learning tasks in the full kernel.
Abstract: In dealing with large data sets, the reduced support vector machine (RSVM) was proposed for the practical objective to overcome some computational difficulties as well as to reduce the model complexity. In this paper, we study the RSVM from the viewpoint of sampling design, its robustness, and the spectral analysis of the reduced kernel. We consider the nonlinear separating surface as a mixture of kernels. Instead of a full model, the RSVM uses a reduced mixture with kernels sampled from a certain candidate set. Our main results center on two major themes. One is the robustness of the random subset mixture model. The other is the spectral analysis of the reduced kernel. The robustness is judged by a few criteria as follows: 1) model variation measure; 2) model bias (deviation) between the reduced model and the full model; and 3) test power in distinguishing the reduced model from the full one. For the spectral analysis, we compare the eigenstructures of the full kernel matrix and the approximation kernel matrix. The approximation kernels are generated by uniform random subsets. The small discrepancies between them indicate that the approximation kernels can retain most of the relevant information for learning tasks in the full kernel. We focus on some statistical theory of the reduced set method mainly in the context of the RSVM. The use of a uniform random subset is not limited to the RSVM. This approach can act as a supplemental algorithm on top of a basic optimization algorithm, wherein the actual optimization takes place on the subset-approximated data. The statistical properties discussed in this paper are still valid.
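A small numerical sketch of the spectral comparison: the leading eigenvalues of the full RBF kernel matrix versus those of the kernel on a uniform random subset, rescaled for the size difference. The data, kernel bandwidth and subset size are invented, and the RSVM training itself is not shown.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
n, m = 1000, 100                       # full sample size, random subset size
X = rng.normal(size=(n, 5))

K_full = rbf_kernel(X, X, gamma=0.2)
subset = rng.choice(n, size=m, replace=False)      # uniform random subset
K_sub = rbf_kernel(X[subset], X[subset], gamma=0.2)

# Leading eigenvalues of the full kernel vs. the (rescaled) subset kernel:
# small discrepancies indicate the reduced kernel keeps most of the
# spectral information relevant for learning.
ev_full = np.sort(np.linalg.eigvalsh(K_full))[::-1][:10]
ev_sub = (n / m) * np.sort(np.linalg.eigvalsh(K_sub))[::-1][:10]
print(np.round(ev_full, 1))
print(np.round(ev_sub, 1))
```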

Proceedings ArticleDOI
23 Jun 2007
TL;DR: A mixture-model approach to adapting a Statistical Machine Translation System for new domains is described, using weights that depend on text distances to mixture components; the best variants achieve gains of approximately one BLEU percentage point over a state-of-the-art non-adapted baseline system.
Abstract: We describe a mixture-model approach to adapting a Statistical Machine Translation System for new domains, using weights that depend on text distances to mixture components. We investigate a number of variants on this approach, including cross-domain versus dynamic adaptation; linear versus log-linear mixtures; language and translation model adaptation; different methods of assigning weights; and granularity of the source unit being adapted to. The best methods achieve gains of approximately one BLEU percentage point over a state-of-the-art non-adapted baseline system.
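A toy sketch of the linear-mixture, distance-based-weighting idea using unigram language models: component corpora are scored by similarity to a small in-domain text and combined with proportional weights. All corpora, the similarity measure (cosine over relative frequencies) and the smoothing floor are invented; the paper's translation-model adaptation and log-linear variants are not shown.

```python
import math
from collections import Counter

def unigram(text):
    counts = Counter(text.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine(c1, c2):
    num = sum(c1[w] * c2.get(w, 0.0) for w in c1)
    den = math.sqrt(sum(v * v for v in c1.values())) * \
          math.sqrt(sum(v * v for v in c2.values()))
    return num / den if den else 0.0

# Component corpora from different domains and a small in-domain dev text
# (all text is toy/illustrative).
corpora = {
    "news":   "the parliament passed the budget the minister said",
    "tech":   "the model translates the sentence using the neural network",
    "sports": "the team won the match and the league title",
}
dev = "the system translates the sentence with a statistical model"

models = {d: unigram(t) for d, t in corpora.items()}
# Weights proportional to the similarity between the dev text and each component.
sims = {d: cosine(unigram(dev), m) for d, m in models.items()}
z = sum(sims.values())
weights = {d: s / z for d, s in sims.items()}
print(weights)

# Linear mixture: p(w) = sum_d weight_d * p_d(w), with a tiny floor for unseen words.
def p_mix(word):
    return sum(weights[d] * models[d].get(word, 1e-6) for d in models)

print(p_mix("translates"), p_mix("budget"))
```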

Journal ArticleDOI
TL;DR: This paper presents novel classification algorithms for recognizing object activity using object motion trajectory, and uses hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology.
Abstract: Motion trajectories provide rich spatiotemporal information about an object's activity. This paper presents novel classification algorithms for recognizing object activity using object motion trajectory. In the proposed classification system, trajectories are segmented at points of change in curvature, and the subtrajectories are represented by their principal component analysis (PCA) coefficients. We first present a framework to robustly estimate the multivariate probability density function based on PCA coefficients of the subtrajectories using Gaussian mixture models (GMMs). We show that GMM-based modeling alone cannot capture the temporal relations and ordering between underlying entities. To address this issue, we use hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology (e.g., left-right versus ergodic). Experiments using a database of over 5700 complex trajectories (obtained from UCI-KDD data archives and Columbia University Multimedia Group) subdivided into 85 different classes demonstrate the superiority of our proposed HMM-based scheme using PCA coefficients of subtrajectories in comparison with other techniques in the literature.
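A hedged sketch of the first stage only: PCA coefficients of (sub)trajectories are modelled with one GMM per class, and a test trajectory is assigned to the class with the highest class-conditional likelihood. The curvature-based segmentation and the HMM stage that captures temporal ordering are omitted, and the trajectory data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def make_trajectories(n, freq):
    """Toy fixed-length trajectories for one class (illustrative data)."""
    t = np.linspace(0, 1, 50)
    return np.array([np.sin(2 * np.pi * freq * t + rng.normal(0, 0.3))
                     + 0.05 * rng.normal(size=t.size) for _ in range(n)])

classes = {0: make_trajectories(100, 1.0), 1: make_trajectories(100, 2.0)}

# Shared PCA over all trajectories, then one GMM per class on the coefficients.
pca = PCA(n_components=5).fit(np.vstack(list(classes.values())))
gmms = {c: GaussianMixture(3, random_state=0).fit(pca.transform(X))
        for c, X in classes.items()}

def classify(traj):
    coeff = pca.transform(traj.reshape(1, -1))
    scores = {c: g.score(coeff) for c, g in gmms.items()}
    return max(scores, key=scores.get)

test = make_trajectories(20, 2.0)
print(np.mean([classify(tr) == 1 for tr in test]))   # classification accuracy
```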

Journal ArticleDOI
TL;DR: A rigorous Bayesian framework is proposed for which it is proved asymptotic consistency of the maximum a posteriori estimate and which leads to an effective iterative estimation algorithm of the geometric and photometric parameters in the small sample setting.
Abstract: Summary. The problem of estimating probabilistic deformable template models in the field of computer vision or of probabilistic atlases in the field of computational anatomy has not yet received a coherent statistical formulation and remains a challenge. We provide a careful definition and analysis of a well-defined statistical model based on dense deformable templates for grey level images of deformable objects. We propose a rigorous Bayesian framework for which we prove asymptotic consistency of the maximum a posteriori estimate and which leads to an effective iterative estimation algorithm of the geometric and photometric parameters in the small sample setting. The model is extended to mixtures of finite numbers of such components leading to a fine description of the photometric and geometric variations of an object class. We illustrate some of the ideas with images of handwritten digits and apply the estimated models to classification through maximum likelihood.

Journal ArticleDOI
TL;DR: A Bayesian non-parametric approach is taken, adopting a hierarchical model with a suitable non-parametric prior obtained from a generalized gamma process, to solve the problem of determining the number of components in a mixture model.
Abstract: Summary. The paper deals with the problem of determining the number of components in a mixture model. We take a Bayesian non-parametric approach and adopt a hierarchical model with a suitable non-parametric prior for the latent structure. A commonly used model for such a problem is the mixture of Dirichlet process model. Here, we replace the Dirichlet process with a more general non-parametric prior obtained from a generalized gamma process. The basic feature of this model is that it yields a partition structure for the latent variables which is of Gibbs type. This relates to the well-known (exchangeable) product partition models. If compared with the usual mixture of Dirichlet process model the advantage of the generalization that we are examining relies on the availability of an additional parameter σ belonging to the interval (0,1): it is shown that such a parameter greatly influences the clustering behaviour of the model. A value of σ that is close to 1 generates a large number of clusters, most of which are of small size. Then, a reinforcement mechanism which is driven by σ acts on the mass allocation by penalizing clusters of small size and favouring those few groups containing a large number of elements. These features turn out to be very useful in the context of mixture modelling. Since it is difficult to specify a priori the reinforcement rate, it is reasonable to specify a prior for σ. Hence, the strength of the reinforcement mechanism is controlled by the data.

Journal ArticleDOI
TL;DR: A penalized likelihood approach for variable selection in FMR models is introduced, with penalties that depend on the size of the regression coefficients and the mixture structure; it requires much less computing power than existing methods.
Abstract: In the applications of finite mixture of regression (FMR) models, often many covariates are used, and their contributions to the response variable vary from one component to another of the mixture model. This creates a complex variable selection problem. Existing methods, such as the Akaike information criterion and the Bayes information criterion, are computationally expensive as the number of covariates and components in the mixture model increases. In this article we introduce a penalized likelihood approach for variable selection in FMR models. The new method introduces penalties that depend on the size of the regression coefficients and the mixture structure. The new method is shown to be consistent for variable selection. A data-adaptive method for selecting tuning parameters and an EM algorithm for efficient numerical computations are developed. Simulations show that the method performs very well and requires much less computing power than existing methods. The new method is illustrated by analyzin...

Book ChapterDOI
02 Apr 2007
TL;DR: This paper proposes and develops a general probabilistic framework for studying the expert finding problem and derives two families of generative models (candidate generation models and topic generation models) from the framework that subsume most existing language models proposed for expert finding.
Abstract: A common task in many applications is to find persons who are knowledgeable about a given topic (i.e., expert finding). In this paper, we propose and develop a general probabilistic framework for studying the expert finding problem and derive two families of generative models (candidate generation models and topic generation models) from the framework. These models subsume most existing language models proposed for expert finding. We further propose several techniques to improve the estimation of the proposed models, including incorporating topic expansion, using a mixture model to model candidate mentions in the supporting documents, and defining an email count-based prior in the topic generation model. Our experiments show that the proposed estimation strategies are all effective in improving retrieval accuracy.

Journal Article
TL;DR: In this paper, the problem of analyzing a mixture of skew normal distributions from the likelihood-based and Bayesian perspectives is addressed, and a fully Bayesian approach using the Markov chain Monte Carlo method is developed to carry out posterior analyses.
Abstract: Normal mixture models provide the most popular framework for modelling heterogeneity in a population with continuous outcomes arising in a variety of subclasses. In the last two decades, the skew normal distribution has been shown beneficial in dealing with asymmetric data in various theoretic and applied problems. In this article, we address the problem of analyzing a mixture of skew normal distributions from the likelihood-based and Bayesian perspectives, respectively. Computational techniques using EM-type algorithms are employed for iteratively computing maximum likelihood estimates. Also, a fully Bayesian approach using the Markov chain Monte Carlo method is developed to carry out posterior analyses. Numerical results are illustrated through two examples.

Proceedings Article
06 Jan 2007
TL;DR: A number of variational Bayesian approximations to the Dirichlet process (DP) mixture model are studied and a novel collapsed VB approximation where mixture weights are marginalized out is considered.
Abstract: Nonparametric Bayesian mixture models, in particular Dirichlet process (DP) mixture models, have shown great promise for density estimation and data clustering. Given the size of today's datasets, computational efficiency becomes an essential ingredient in the applicability of these techniques to real world data. We study and experimentally compare a number of variational Bayesian (VB) approximations to the DP mixture model. In particular we consider the standard VB approximation where parameters are assumed to be independent from cluster assignment variables, and a novel collapsed VB approximation where mixture weights are marginalized out. For both VB approximations we consider two different ways to approximate the DP, by truncating the stick-breaking construction, and by using a finite mixture model with a symmetric Dirichlet prior.
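For the standard (uncollapsed) VB approximation with a truncated stick-breaking construction, scikit-learn's BayesianGaussianMixture provides a ready-made implementation; the sketch below uses invented data, and the collapsed variant studied in the paper is not available there.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.4, size=(300, 2)) for m in (-2.0, 0.0, 2.5)])

# Standard (uncollapsed) VB for a DP mixture, approximated by truncating the
# stick-breaking construction at n_components; unused sticks get ~zero weight.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,
    max_iter=500,
    random_state=0,
).fit(X)

print(np.round(dpgmm.weights_, 3))          # only ~3 components carry weight
print((dpgmm.weights_ > 0.01).sum(), "effective clusters")
```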

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new distribution for local precipitation via a probability mixture model of Gamma and Generalized Pareto (GP) distributions, which was tested on real and simulated data, and also compared to classical rainfall densities.
Abstract: Downscaling precipitation is a difficult challenge for the climate community. We propose and study a new stochastic weather typing approach to perform such a task. In addition to providing accurate small and medium precipitation, our procedure possesses built-in features that allow us to model adequately extreme precipitation distributions. First, we propose a new distribution for local precipitation via a probability mixture model of Gamma and Generalized Pareto (GP) distributions. The latter one stems from Extreme Value Theory (EVT). The performance of this mixture is tested on real and simulated data, and also compared to classical rainfall densities. Then our downscaling method, extending the recently developed nonhomogeneous stochastic weather typing approach, is presented. It can be summarized as a three-step program. First, regional weather precipitation patterns are constructed through a hierarchical ascending clustering method. Second, daily transitions among our precipitation patterns are represented by a nonhomogeneous Markov model influenced by large-scale atmospheric variables like NCEP reanalyses. Third, conditionally on these regional patterns, precipitation occurrence and intensity distributions are modeled as statistical mixtures. Precipitation amplitudes are assumed to follow our mixture of Gamma and GP densities. The proposed downscaling approach is applied to 37 weather stations in Illinois and compared to various possible parameterizations and to a direct modeling. Model selection procedures show that choosing one GP distribution shape parameter per pattern for all stations provides the best rainfall representation amongst all tested models. This work highlights the importance of EVT distributions to improve the modeling and downscaling of local extreme precipitations.
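A simplified sketch of the hybrid density: a Gamma component for the bulk of the precipitation distribution mixed with a Generalized Pareto component for the heavy upper tail, fitted by direct maximum likelihood on synthetic rainfall. The constant (non-dynamic) mixing weight and the parameterization are illustrative simplifications of the paper's construction.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def mix_pdf(x, w, gamma_shape, gamma_scale, gp_shape, gp_scale):
    """Simple two-component density: Gamma for the bulk, Generalized Pareto
    for the heavy upper tail (fixed weight w; illustrative simplification)."""
    return (w * stats.gamma.pdf(x, gamma_shape, scale=gamma_scale)
            + (1 - w) * stats.genpareto.pdf(x, gp_shape, scale=gp_scale))

# Synthetic "precipitation": mostly moderate amounts plus occasional extremes.
rng = np.random.default_rng(0)
rain = np.concatenate([rng.gamma(2.0, 3.0, size=1800),
                       stats.genpareto.rvs(0.2, scale=8.0, size=200, random_state=1)])

def nll(theta):
    w = 1 / (1 + np.exp(-theta[0]))              # keep weight in (0, 1)
    pars = np.exp(theta[1:])                     # keep shapes/scales positive
    p = mix_pdf(rain, w, *pars[:2], *pars[2:])
    return -np.log(p + 1e-300).sum()

fit = minimize(nll, x0=np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 5000})
w_hat = 1 / (1 + np.exp(-fit.x[0]))
print("estimated bulk weight:", round(w_hat, 2))
print("estimated parameters:", np.round(np.exp(fit.x[1:]), 2))
```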

Journal ArticleDOI
TL;DR: The proposed method was successfully applied to Brodatz mosaic image segmentation and fabric defect detection, and can be extended to unsupervised texture segmentation using a Kullback-Leibler divergence between two Gaussian mixtures.

Proceedings ArticleDOI
14 May 2007
TL;DR: This work contributes an approach for interactive policy learning through expert demonstration that allows an agent to actively request and effectively represent demonstration examples, and introduces the confident execution approach, which focuses learning on relevant parts of the domain by enabling the agent to identify the need for and request demonstrations for specific part of the state space.
Abstract: We contribute an approach for interactive policy learning through expert demonstration that allows an agent to actively request and effectively represent demonstration examples. In order to address the inherent uncertainty of human demonstration, we represent the policy as a set of Gaussian mixture models (GMMs), where each model, with multiple Gaussian components, corresponds to a single action. Incrementally received demonstration examples are used as training data for the GMM set. We then introduce our confident execution approach, which focuses learning on relevant parts of the domain by enabling the agent to identify the need for and request demonstrations for specific parts of the state space. The agent selects between demonstration and autonomous execution based on statistical analysis of the uncertainty of the learned Gaussian mixture set. As it achieves proficiency at its task and gains confidence in its actions, the agent operates with increasing autonomy, eliminating the need for unnecessary demonstrations of already acquired behavior, and reducing both the training time and the demonstration workload of the expert. We validate our approach with experiments in simulated and real robot domains.

Journal ArticleDOI
John Geweke
TL;DR: The mixture model likelihood function is invariant with respect to permutation of the components of the mixture, and simple and widely used Markov chain Monte Carlo algorithms with data augmentation reliably recover the entire posterior distribution.

Journal ArticleDOI
TL;DR: This article proposes a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings and presents analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates.
Abstract: A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.

Book ChapterDOI
20 Oct 2007
TL;DR: Two methods to improve on the base system are proposed: describing the original time-independent features over a period of time using polynomial parametrisation, and replacing the SVM with a hybrid SVM/Hidden Markov Model (HMM) classifier to model time in the classifier; both techniques are shown to contribute to an improved classification accuracy.
Abstract: The analysis of facial expression temporal dynamics is of great importance for many real-world applications. Being able to automatically analyse facial muscle actions (Action Units, AUs) in terms of recognising their neutral, onset, apex and offset phases would greatly benefit application areas as diverse as medicine, gaming and security. The base system in this paper uses Support Vector Machines (SVMs) and a set of simple geometrical features derived from automatically detected and tracked facial feature point data to segment a facial action into its temporal phases. We propose here two methods to improve on this base system in terms of classification accuracy. The first technique describes the original time-independent set of features over a period of time using polynomial parametrisation. The second technique replaces the SVM with a hybrid SVM/Hidden Markov Model (HMM) classifier to model time in the classifier. Our results show that both techniques contribute to an improved classification accuracy. Modeling the temporal dynamics by the hybrid SVM-HMM classifier attained a statistically significant increase of recall and precision by 4.5% and 7.0%, respectively.