
Showing papers on "Mixture model published in 2007"


Journal ArticleDOI
TL;DR: Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.
Abstract: Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, one unresolved issue in the application of mixture models is that there is not one commonly accepted statistical indicator for deciding on the number of classes in a study population. This article presents the results of a simulation study that examines the performance of likelihood-based tests and the traditionally used information criteria (ICs) for determining the number of classes in mixture modeling. We look at the performance of these tests and indexes for 3 types of mixture models: latent class analysis (LCA), a factor mixture model (FMA), and a growth mixture model (GMM). We evaluate the ability of the tests and indexes to correctly identify the number of classes at three different sample sizes (n = 200, 500, 1,000). Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered.

7,716 citations
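As a minimal illustration of the information-criterion side of this comparison (not the study's actual simulation design), one can fit Gaussian mixtures for a range of class counts and pick the count that minimizes the BIC. The sketch below uses scikit-learn and invented two-class data; the bootstrap likelihood ratio test is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Simulate a two-class population (illustrative settings, not those of the study).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),
               rng.normal(3.0, 1.0, size=(500, 2))])

# Fit mixtures with 1..5 components and compare BIC values;
# the number of components minimizing BIC is the selected class count.
bics = []
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append(gmm.bic(X))
    print(f"k={k}  BIC={bics[-1]:.1f}")
print("selected number of classes:", int(np.argmin(bics)) + 1)
```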


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This work shows that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms, and proposes to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images.
Abstract: Within the field of pattern classification, the Fisher kernel is a powerful framework which combines the strengths of generative and discriminative approaches. The idea is to characterize a signal with a gradient vector derived from a generative probability model and to subsequently feed this representation to a discriminative classifier. We propose to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images. We show that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms. Our approach demonstrates excellent performance on two challenging databases: an in-house database of 19 object/scene categories and the recently released VOC 2006 database. It is also very practical: it has low computational needs both at training and test time and vocabularies trained on one set of categories can be applied to another set without any significant loss in performance.

1,874 citations
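A hedged sketch of the core computation: the gradient of an image's average descriptor log-likelihood with respect to the means of a GMM visual vocabulary, which forms one block of the full Fisher vector. The weight and variance blocks and the exact Fisher-information normalization used by the authors are omitted, and all data here are synthetic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(gmm, descriptors):
    """Gradient of the average log-likelihood w.r.t. the GMM means
    (one block of the full Fisher vector; weight/variance blocks omitted)."""
    T = descriptors.shape[0]
    gamma = gmm.predict_proba(descriptors)             # (T, K) posteriors
    diff = descriptors[:, None, :] - gmm.means_[None]  # (T, K, D)
    sigma = np.sqrt(gmm.covariances_)                  # diagonal std devs, (K, D)
    grad = (gamma[:, :, None] * diff / sigma[None]).sum(axis=0) / T
    grad /= np.sqrt(gmm.weights_)[:, None]             # approximate normalization
    return grad.ravel()

# Toy "visual vocabulary": a GMM fitted on random low-level descriptors.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(2000, 8))
vocab = GaussianMixture(n_components=16, covariance_type="diag",
                        random_state=0).fit(train_descriptors)

image_descriptors = rng.normal(size=(300, 8))
fv = fisher_vector_means(vocab, image_descriptors)
print(fv.shape)   # (16 * 8,) -> fed to a discriminative classifier
```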


Journal ArticleDOI
TL;DR: In this paper, a Bayesian method is proposed to account for measurement errors in linear regression of astronomical data. The method is based on deriving a likelihood function for the measured data and focuses on the case when the intrinsic distribution of the independent variables can be approximated using a mixture of Gaussian functions.
Abstract: I describe a Bayesian method to account for measurement errors in linear regression of astronomical data. The method allows for heteroscedastic and possibly correlated measurement errors and intrinsic scatter in the regression relationship. The method is based on deriving a likelihood function for the measured data, and I focus on the case when the intrinsic distribution of the independent variables can be approximated using a mixture of Gaussian functions. I generalize the method to incorporate multiple independent variables, nondetections, and selection effects (e.g., Malmquist bias). A Gibbs sampler is described for simulating random draws from the probability distribution of the parameters, given the observed data. I use simulation to compare the method with other common estimators. The simulations illustrate that the Gaussian mixture model outperforms other common estimators and can effectively give constraints on the regression parameters, even when the measurement errors dominate the observed scatter, source detection fraction is low, or the intrinsic distribution of the independent variables is not a mixture of Gaussian functions. I conclude by using this method to fit the X-ray spectral slope as a function of Eddington ratio using a sample of 39 z ≲ 0.8 radio-quiet quasars. I confirm the correlation seen by other authors between the radio-quiet quasar X-ray spectral slope and the Eddington ratio, where the X-ray spectral slope softens as the Eddington ratio increases. IDL routines are made available for performing the regression.

1,264 citations
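A minimal sketch of the likelihood idea in the simplest case: a single Gaussian intrinsic component, known homoscedastic measurement errors, and direct maximum likelihood instead of the paper's Gibbs sampler; nondetections and selection effects are ignored and all parameter values are invented. The point is that the observed (x, y) pairs are then jointly Gaussian, so the marginal likelihood can be written down and maximized.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 200
# True parameters (illustrative); the paper's full model also allows a Gaussian
# *mixture* for the intrinsic covariate, nondetections and selection effects.
alpha, beta, sigma, mu, tau = 1.0, 0.5, 0.3, 0.0, 1.0
sx, sy = 0.5, 0.4                      # known measurement-error std devs

xi = rng.normal(mu, tau, n)            # intrinsic covariate
eta = alpha + beta * xi + rng.normal(0, sigma, n)
x = xi + rng.normal(0, sx, n)          # measured covariate
y = eta + rng.normal(0, sy, n)         # measured response

def neg_loglike(theta):
    a, b, log_s, m, log_t = theta
    s2, t2 = np.exp(2 * log_s), np.exp(2 * log_t)
    mean = np.array([m, a + b * m])
    cov = np.array([[t2 + sx**2,          b * t2],
                    [b * t2, b**2 * t2 + s2 + sy**2]])
    return -multivariate_normal(mean, cov).logpdf(np.column_stack([x, y])).sum()

fit = minimize(neg_loglike, x0=np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 5000})
print("estimated slope:", fit.x[1])                 # close to 0.5 despite the errors
print("naive OLS slope:", np.polyfit(x, y, 1)[0])   # attenuated toward zero
```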


Journal ArticleDOI
01 Apr 2007
TL;DR: A programming-by-demonstration framework for generically extracting the relevant features of a given task and for addressing the problem of generalizing the acquired knowledge to different contexts is presented.
Abstract: We present a programming-by-demonstration framework for generically extracting the relevant features of a given task and for addressing the problem of generalizing the acquired knowledge to different contexts. We validate the architecture through a series of experiments, in which a human demonstrator teaches a humanoid robot simple manipulatory tasks. A probability-based estimation of the relevance is suggested by first projecting the motion data onto a generic latent space using principal component analysis. The resulting signals are encoded using a mixture of Gaussian/Bernoulli distributions (Gaussian mixture model/Bernoulli mixture model). This provides a measure of the spatio-temporal correlations across the different modalities collected from the robot, which can be used to determine a metric of the imitation performance. The trajectories are then generalized using Gaussian mixture regression. Finally, we analytically compute the trajectory which optimizes the imitation metric and use this to generalize the skill to different contexts.

1,089 citations
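The trajectory-generalization step can be sketched as Gaussian mixture regression: fit a joint GMM over (time, position) pairs from several demonstrations and take the conditional expectation of position given time. The toy data and component count below are illustrative, and the latent-space projection and imitation metric from the paper are not shown.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def gmr(gmm, t):
    """Gaussian mixture regression: E[y | x=t] under a joint GMM over (x, y)."""
    mx, my = gmm.means_[:, 0], gmm.means_[:, 1]
    cxx = gmm.covariances_[:, 0, 0]
    cxy = gmm.covariances_[:, 0, 1]
    # responsibilities of each component for the input value t
    h = gmm.weights_ * norm.pdf(t[:, None], loc=mx, scale=np.sqrt(cxx))
    h /= h.sum(axis=1, keepdims=True)
    # component-wise conditional means, blended by the responsibilities
    cond = my + (cxy / cxx) * (t[:, None] - mx)
    return (h * cond).sum(axis=1)

# Toy demonstrations: several noisy executions of one trajectory.
rng = np.random.default_rng(0)
t = np.tile(np.linspace(0, 1, 100), 5)
y = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=t.size)

joint = GaussianMixture(n_components=6, random_state=0).fit(np.column_stack([t, y]))
t_query = np.linspace(0, 1, 50)
y_hat = gmr(joint, t_query)           # generalized, smoothed trajectory
print(y_hat[:5])
```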


Proceedings ArticleDOI
John R. Hershey, Peder A. Olsen
15 Apr 2007
TL;DR: Two new methods, the variational approximation and the variational upper bound, are introduced and compared to existing methods; the benefits of each are weighed against the others and the performance of each is evaluated through numerical experiments.
Abstract: The Kullback-Leibler (KL) divergence is a widely used tool in statistics and pattern recognition. The KL divergence between two Gaussian mixture models (GMMs) is frequently needed in the fields of speech and image recognition. Unfortunately the KL divergence between two GMMs is not analytically tractable, nor does any efficient computational algorithm exist. Some techniques cope with this problem by replacing the KL divergence with other functions that can be computed efficiently. We introduce two new methods, the variational approximation and the variational upper bound, and compare them to existing methods. We discuss seven different techniques in total and weigh the benefits of each one against the others. To conclude, we evaluate the performance of each one through numerical experiments.

998 citations
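A hedged sketch of the variational approximation for KL(f||g) between two GMMs, in the exponentiated-pairwise-Gaussian-KL form, checked against a simple Monte Carlo estimate. This is an illustrative reimplementation rather than the authors' code, and the variational upper bound is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.special import logsumexp

def kl_gauss(m0, S0, m1, S1):
    """KL divergence between two full-covariance Gaussians."""
    d = m0.size
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def kl_gmm_variational(f, g):
    """Variational approximation to KL(f || g) for two fitted GaussianMixtures
    (pairwise Gaussian KLs combined through log-sum-exp; a sketch, not the
    authors' reference implementation)."""
    total = 0.0
    for wa, ma, Sa in zip(f.weights_, f.means_, f.covariances_):
        num = logsumexp([np.log(wf) - kl_gauss(ma, Sa, mf, Sf)
                         for wf, mf, Sf in zip(f.weights_, f.means_, f.covariances_)])
        den = logsumexp([np.log(wg) - kl_gauss(ma, Sa, mg, Sg)
                         for wg, mg, Sg in zip(g.weights_, g.means_, g.covariances_)])
        total += wa * (num - den)
    return total

rng = np.random.default_rng(0)
f = GaussianMixture(3, random_state=0).fit(rng.normal(0.0, 1.0, (500, 2)))
g = GaussianMixture(3, random_state=0).fit(rng.normal(0.5, 1.2, (500, 2)))

# Monte Carlo reference estimate of KL(f || g) for comparison.
X, _ = f.sample(20000)
print("variational:", kl_gmm_variational(f, g))
print("Monte Carlo:", np.mean(f.score_samples(X) - g.score_samples(X)))
```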


Journal ArticleDOI
TL;DR: In this article, a Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers, and a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory is proposed.
Abstract: In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed for performing spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error. Although it is reasonably effective, the deterioration of speech quality is caused by some problems: 1) appropriate spectral movements are not always caused by the frame-based conversion process, and 2) the converted spectra are excessively smoothed by statistical modeling. In order to address those problems, we propose a conversion method based on the maximum-likelihood estimation of a spectral parameter trajectory. Not only static but also dynamic feature statistics are used for realizing the appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the performance of VC can be dramatically improved by the proposed method in view of both speech quality and conversion accuracy for speaker individuality.

914 citations


Book ChapterDOI
01 Oct 2007
TL;DR: This work describes a technique for comparing distributions without the need for density estimation as an intermediate step, which relies on mapping the distributions into a reproducing kernel Hilbert space.
Abstract: We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation.

909 citations
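A minimal sketch of the kernel-based comparison for the two-sample setting: a biased (V-statistic) estimate of the squared maximum mean discrepancy under an RBF kernel, on synthetic data. Bandwidth selection, the unbiased estimator and the test thresholds developed in this line of work are left out.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mmd_rbf(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of the squared Maximum Mean Discrepancy
    between samples X and Y under an RBF kernel."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
Y_same = rng.normal(0.0, 1.0, size=(500, 2))
Y_diff = rng.normal(0.5, 1.0, size=(500, 2))

print("same distribution:     ", mmd_rbf(X, Y_same))   # close to zero
print("different distribution:", mmd_rbf(X, Y_diff))   # clearly larger
```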


Journal ArticleDOI
TL;DR: A new model is proposed, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean ‘social space’, and the actors’ locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster.
Abstract: Summary. Network models are widely used to represent relations between interacting units or actors. Network data often exhibit transitivity, meaning that two actors that have ties to a third actor are more likely to be tied than actors that do not, homophily by attributes of the actors or dyads, and clustering. Interest often focuses on finding clusters of actors or ties, and the number of groups in the data is typically unknown. We propose a new model, the latent position cluster model, under which the probability of a tie between two actors depends on the distance between them in an unobserved Euclidean ‘social space’, and the actors’ locations in the latent social space arise from a mixture of distributions, each corresponding to a cluster. We propose two estimation methods: a two-stage maximum likelihood method and a fully Bayesian method that uses Markov chain Monte Carlo sampling. The former is quicker and simpler, but the latter performs better. We also propose a Bayesian way of determining the number of clusters that are present by using approximate conditional Bayes factors. Our model represents transitivity, homophily by attributes and clustering simultaneously and does not require the number of clusters to be known. The model makes it easy to simulate realistic networks with clustering, which are potentially useful as inputs to models of more complex systems of which the network is part, such as epidemic models of infectious disease. We apply the model to two networks of social relations. A free software package in the R statistical language, latentnet, is available to analyse data by using the model.

785 citations
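The paper points to the R package latentnet for fitting; as a language-consistent illustration, here is a small simulation from the generative side of the latent position cluster model (cluster centres, spread and the intercept are invented), showing how mixture-distributed latent positions plus distance-dependent tie probabilities produce clustered networks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 60, 3
beta0 = 2.0                                   # intercept (illustrative value)

# Latent positions drawn from a mixture of spherical Gaussians (the clusters).
centers = np.array([[0, 0], [4, 0], [2, 3.5]], dtype=float)
labels = rng.integers(0, K, size=n)
z = centers[labels] + 0.6 * rng.normal(size=(n, 2))

# Tie probability decreases with latent-space distance: logit p_ij = beta0 - d_ij.
d = np.linalg.norm(z[:, None] - z[None, :], axis=-1)
p = 1.0 / (1.0 + np.exp(-(beta0 - d)))
A = (rng.random((n, n)) < p).astype(int)
np.fill_diagonal(A, 0)

# Within-cluster ties should be much denser than between-cluster ties.
same = labels[:, None] == labels[None, :]
print("within-cluster density: ", A[same].mean())
print("between-cluster density:", A[~same].mean())
```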


Journal ArticleDOI
TL;DR: A general technique for detecting structural features in large-scale network data that works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes is described.
Abstract: Networks are widely used in the biological, physical, and social sciences as a concise mathematical representation of the topology of systems of interacting components. Understanding the structure of these networks is one of the outstanding challenges in the study of complex systems. Here we describe a general technique for detecting structural features in large-scale network data that works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes. Using the machinery of probabilistic mixture models and the expectation–maximization algorithm, we show that it is possible to detect, without prior knowledge of what we are looking for, a very broad range of types of structure in networks. We give a number of examples demonstrating how the method can be used to shed light on the properties of real-world networks, including social and information networks.

577 citations
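A minimal EM sketch in the spirit of the method: each node is softly assigned to a class, and each class is characterized by its probability of connecting to every other node. The parameterization and the toy network below are illustrative rather than the paper's exact formulation.

```python
import numpy as np

def network_mixture_em(A, K, n_iter=200, seed=0):
    """EM for a simple mixture model over nodes' connection patterns:
    class r emits edges to node j with probability proportional to theta[r, j]."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    deg = A.sum(axis=1)
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(n), size=K)        # (K, n), rows sum to 1
    for _ in range(n_iter):
        # E-step: responsibilities q[i, r] proportional to pi_r * prod_j theta[r, j]^A[i, j]
        logq = np.log(pi)[None, :] + A @ np.log(theta + 1e-12).T
        logq -= logq.max(axis=1, keepdims=True)
        q = np.exp(logq)
        q /= q.sum(axis=1, keepdims=True)
        # M-step: update class proportions and connection profiles
        pi = q.mean(axis=0)
        theta = (q.T @ A) / (q.T @ deg)[:, None]
    return q, pi, theta

# Toy network with two groups whose members connect mostly within the group.
rng = np.random.default_rng(1)
n = 40
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], 0.4, 0.05)
A = (rng.random((n, n)) < P).astype(int)
A = np.triu(A, 1)
A = A + A.T                                          # undirected, no self-loops

q, pi, theta = network_mixture_em(A, K=2)
print(q.argmax(axis=1))                              # recovered group structure
```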


01 Jan 2007
TL;DR: A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior.
Abstract: MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering and classification results. A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior. A web page with related links including license information can be found at http://www.stat.washington.edu/mclust.

494 citations


Journal ArticleDOI
TL;DR: A modified version of BIC is proposed, where the likelihood is evaluated at the MAP instead of the MLE, and the resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE.
Abstract: Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as the result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum a posteriori (MAP) estimator, also found by the EM algorithm. For choosing the number of components and the model parameterization, we propose a modified version of BIC, where the likelihood is evaluated at the MAP instead of the MLE. We use a highly dispersed proper conjugate prior, containing a small fraction of one observation's worth of information. The resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE, EM and BIC.
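A hedged one-dimensional sketch of the idea: replace the MLE variance update in EM with the posterior mode under a conjugate (here inverse-gamma) prior, so a component can never collapse to zero variance. The prior form and hyperparameters are illustrative, not the highly dispersed conjugate prior specified in the paper.

```python
import numpy as np

def map_em_gmm_1d(x, K, a0=2.0, b0=0.1, n_iter=100, seed=0):
    """1-D EM for a normal mixture where the variance M-step is the posterior
    mode under an inverse-gamma(a0, b0) prior, preventing the singularities
    that plain MLE can produce when a component collapses onto one point."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)
    var = np.full(K, x.var())
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities
        logp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step with MAP variance update
        nk = r.sum(axis=0)
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        ss = (r * (x[:, None] - mu) ** 2).sum(axis=0)
        var = (ss + 2 * b0) / (nk + 2 * a0 + 2)   # posterior mode, never zero
    return w, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 1.0, 100)])
print(map_em_gmm_1d(x, K=2))
```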

Journal ArticleDOI
TL;DR: In this article, the authors investigate the performance of factor mixture models for the analysis of multivariate data obtained from a population consisting of distinct latent classes and focus on covariate effects, model size and class-specific versus class-invariant parameters.
Abstract: Factor mixture models are designed for the analysis of multivariate data obtained from a population consisting of distinct latent classes. A common factor model is assumed to hold within each of the latent classes. Factor mixture modeling involves obtaining estimates of the model parameters, and may also be used to assign subjects to their most likely latent class. This simulation study investigates aspects of model performance such as parameter coverage and correct class membership assignment and focuses on covariate effects, model size, and class-specific versus class-invariant parameters. When fitting true models, parameter coverage is good for most parameters even for the smallest class separation investigated in this study (0.5 SD between 2 classes). The same holds for convergence rates. Correct class assignment is unsatisfactory for the small class separation without covariates, but improves dramatically with increasing separation, covariate effects, or both. Model performance is not influe...

Proceedings ArticleDOI
20 Jun 2007
TL;DR: This work considers the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes chosen randomly from a fixed but unknown distribution, using a hierarchical Bayesian infinite mixture model.
Abstract: We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknown distribution. We model the distribution over MDPs using a hierarchical Bayesian infinite mixture model. For each novel MDP, we use the previously learned distribution as an informed prior for model-based Bayesian reinforcement learning. The hierarchical Bayesian framework provides a strong prior that allows us to rapidly infer the characteristics of new environments based on previous environments, while the use of a nonparametric model allows us to quickly adapt to environments we have not encountered before. In addition, the use of infinite mixtures allows for the model to automatically learn the number of underlying MDP components. We evaluate our approach and show that it leads to significant speedups in convergence to an optimal policy after observing only a small number of tasks.

Journal Article
TL;DR: A penalized likelihood approach with an L1 penalty function is proposed, automatically realizing variable selection via thresholding and delivering a sparse solution in model-based clustering analysis with a common diagonal covariance matrix.
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.

Journal ArticleDOI
TL;DR: The paper considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors, and proposes a kernel‐based weighting scheme that incorporates weights that are dependent on the distance between subjects’ predictor values.
Abstract: Summary. The paper considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a non-parametric mixture of regression models, with the mixture distribution changing with predictors. A class of weighted mixture of Dirichlet process priors is proposed for the uncountable collection of mixture distributions. It is shown that this specification results in a generalized Polya urn scheme, which incorporates weights that are dependent on the distance between subjects’ predictor values. To allow local dependence in the mixture distributions, we propose a kernel-based weighting scheme. A Gibbs sampling algorithm is developed for posterior computation. The methods are illustrated by using simulated data examples and an epidemiologic application.

Journal ArticleDOI
TL;DR: This paper studies the RSVM from the viewpoint of sampling design, its robustness, and the spectral analysis of the reduced kernel, which indicates that the approximation kernels can retain most of the relevant information for learning tasks in the full kernel.
Abstract: In dealing with large data sets, the reduced support vector machine (RSVM) was proposed for the practical objective to overcome some computational difficulties as well as to reduce the model complexity. In this paper, we study the RSVM from the viewpoint of sampling design, its robustness, and the spectral analysis of the reduced kernel. We consider the nonlinear separating surface as a mixture of kernels. Instead of a full model, the RSVM uses a reduced mixture with kernels sampled from a certain candidate set. Our main results center on two major themes. One is the robustness of the random subset mixture model. The other is the spectral analysis of the reduced kernel. The robustness is judged by a few criteria as follows: 1) model variation measure; 2) model bias (deviation) between the reduced model and the full model; and 3) test power in distinguishing the reduced model from the full one. For the spectral analysis, we compare the eigenstructures of the full kernel matrix and the approximation kernel matrix. The approximation kernels are generated by uniform random subsets. The small discrepancies between them indicate that the approximation kernels can retain most of the relevant information for learning tasks in the full kernel. We focus on some statistical theory of the reduced set method mainly in the context of the RSVM. The use of a uniform random subset is not limited to the RSVM. This approach can act as a supplemental algorithm on top of a basic optimization algorithm, wherein the actual optimization takes place on the subset-approximated data. The statistical properties discussed in this paper are still valid.
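A small numerical sketch of the spectral comparison: the leading eigenvalues of the full RBF kernel matrix versus those of the kernel on a uniform random subset, rescaled for the size difference. The data, kernel bandwidth and subset size are invented, and the RSVM training itself is not shown.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
n, m = 1000, 100                       # full sample size, random subset size
X = rng.normal(size=(n, 5))

K_full = rbf_kernel(X, X, gamma=0.2)
subset = rng.choice(n, size=m, replace=False)      # uniform random subset
K_sub = rbf_kernel(X[subset], X[subset], gamma=0.2)

# Leading eigenvalues of the full kernel vs. the (rescaled) subset kernel:
# small discrepancies indicate the reduced kernel keeps most of the
# spectral information relevant for learning.
ev_full = np.sort(np.linalg.eigvalsh(K_full))[::-1][:10]
ev_sub = (n / m) * np.sort(np.linalg.eigvalsh(K_sub))[::-1][:10]
print(np.round(ev_full, 1))
print(np.round(ev_sub, 1))
```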

Proceedings ArticleDOI
23 Jun 2007
TL;DR: A mixture-model approach to adapting a Statistical Machine Translation System for new domains is described, using weights that depend on text distances to mixture components; the best variants achieve gains of approximately one BLEU percentage point over a state-of-the-art non-adapted baseline system.
Abstract: We describe a mixture-model approach to adapting a Statistical Machine Translation System for new domains, using weights that depend on text distances to mixture components. We investigate a number of variants on this approach, including cross-domain versus dynamic adaptation; linear versus log-linear mixtures; language and translation model adaptation; different methods of assigning weights; and granularity of the source unit being adapted to. The best methods achieve gains of approximately one BLEU percentage point over a state-of-the-art non-adapted baseline system.
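A toy sketch of the linear-mixture, distance-based-weighting idea using unigram language models: component corpora are scored by similarity to a small in-domain text and combined with proportional weights. All corpora, the similarity measure (cosine over relative frequencies) and the smoothing floor are invented; the paper's translation-model adaptation and log-linear variants are not shown.

```python
import math
from collections import Counter

def unigram(text):
    counts = Counter(text.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def cosine(c1, c2):
    num = sum(c1[w] * c2.get(w, 0.0) for w in c1)
    den = math.sqrt(sum(v * v for v in c1.values())) * \
          math.sqrt(sum(v * v for v in c2.values()))
    return num / den if den else 0.0

# Component corpora from different domains and a small in-domain dev text
# (all text is toy/illustrative).
corpora = {
    "news":   "the parliament passed the budget the minister said",
    "tech":   "the model translates the sentence using the neural network",
    "sports": "the team won the match and the league title",
}
dev = "the system translates the sentence with a statistical model"

models = {d: unigram(t) for d, t in corpora.items()}
# Weights proportional to the similarity between the dev text and each component.
sims = {d: cosine(unigram(dev), m) for d, m in models.items()}
z = sum(sims.values())
weights = {d: s / z for d, s in sims.items()}
print(weights)

# Linear mixture: p(w) = sum_d weight_d * p_d(w), with a tiny floor for unseen words.
def p_mix(word):
    return sum(weights[d] * models[d].get(word, 1e-6) for d in models)

print(p_mix("translates"), p_mix("budget"))
```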

Journal ArticleDOI
TL;DR: This paper presents novel classification algorithms for recognizing object activity using object motion trajectory, and uses hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology.
Abstract: Motion trajectories provide rich spatiotemporal information about an object's activity. This paper presents novel classification algorithms for recognizing object activity using object motion trajectory. In the proposed classification system, trajectories are segmented at points of change in curvature, and the subtrajectories are represented by their principal component analysis (PCA) coefficients. We first present a framework to robustly estimate the multivariate probability density function based on PCA coefficients of the subtrajectories using Gaussian mixture models (GMMs). We show that GMM-based modeling alone cannot capture the temporal relations and ordering between underlying entities. To address this issue, we use hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology (e.g., left-right versus ergodic). Experiments using a database of over 5700 complex trajectories (obtained from UCI-KDD data archives and Columbia University Multimedia Group) subdivided into 85 different classes demonstrate the superiority of our proposed HMM-based scheme using PCA coefficients of subtrajectories in comparison with other techniques in the literature.
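A hedged sketch of the first stage only: PCA coefficients of (sub)trajectories are modelled with one GMM per class, and a test trajectory is assigned to the class with the highest class-conditional likelihood. The curvature-based segmentation and the HMM stage that captures temporal ordering are omitted, and the trajectory data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def make_trajectories(n, freq):
    """Toy fixed-length trajectories for one class (illustrative data)."""
    t = np.linspace(0, 1, 50)
    return np.array([np.sin(2 * np.pi * freq * t + rng.normal(0, 0.3))
                     + 0.05 * rng.normal(size=t.size) for _ in range(n)])

classes = {0: make_trajectories(100, 1.0), 1: make_trajectories(100, 2.0)}

# Shared PCA over all trajectories, then one GMM per class on the coefficients.
pca = PCA(n_components=5).fit(np.vstack(list(classes.values())))
gmms = {c: GaussianMixture(3, random_state=0).fit(pca.transform(X))
        for c, X in classes.items()}

def classify(traj):
    coeff = pca.transform(traj.reshape(1, -1))
    scores = {c: g.score(coeff) for c, g in gmms.items()}
    return max(scores, key=scores.get)

test = make_trajectories(20, 2.0)
print(np.mean([classify(tr) == 1 for tr in test]))   # classification accuracy
```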

Journal ArticleDOI
TL;DR: A rigorous Bayesian framework is proposed for which it is proved asymptotic consistency of the maximum a posteriori estimate and which leads to an effective iterative estimation algorithm of the geometric and photometric parameters in the small sample setting.
Abstract: Summary. The problem of estimating probabilistic deformable template models in the field of computer vision or of probabilistic atlases in the field of computational anatomy has not yet received a coherent statistical formulation and remains a challenge. We provide a careful definition and analysis of a well-defined statistical model based on dense deformable templates for grey level images of deformable objects. We propose a rigorous Bayesian framework for which we prove asymptotic consistency of the maximum a posteriori estimate and which leads to an effective iterative estimation algorithm of the geometric and photometric parameters in the small sample setting. The model is extended to mixtures of finite numbers of such components leading to a fine description of the photometric and geometric variations of an object class. We illustrate some of the ideas with images of handwritten digits and apply the estimated models to classification through maximum likelihood.

Journal ArticleDOI
TL;DR: A Bayesian non-parametric approach is taken, adopting a hierarchical model with a suitable non-parametric prior obtained from a generalized gamma process, to solve the problem of determining the number of components in a mixture model.
Abstract: Summary. The paper deals with the problem of determining the number of components in a mixture model. We take a Bayesian non-parametric approach and adopt a hierarchical model with a suitable non-parametric prior for the latent structure. A commonly used model for such a problem is the mixture of Dirichlet process model. Here, we replace the Dirichlet process with a more general non-parametric prior obtained from a generalized gamma process. The basic feature of this model is that it yields a partition structure for the latent variables which is of Gibbs type. This relates to the well-known (exchangeable) product partition models. If compared with the usual mixture of Dirichlet process model the advantage of the generalization that we are examining relies on the availability of an additional parameter σ belonging to the interval (0,1): it is shown that such a parameter greatly influences the clustering behaviour of the model. A value of σ that is close to 1 generates a large number of clusters, most of which are of small size. Then, a reinforcement mechanism which is driven by σ acts on the mass allocation by penalizing clusters of small size and favouring those few groups containing a large number of elements. These features turn out to be very useful in the context of mixture modelling. Since it is difficult to specify a priori the reinforcement rate, it is reasonable to specify a prior for σ. Hence, the strength of the reinforcement mechanism is controlled by the data.

Journal ArticleDOI
TL;DR: A penalized likelihood approach for variable selection in FMR models is introduced, with penalties that depend on the size of the regression coefficients and the mixture structure; it requires much less computing power than existing methods.
Abstract: In the applications of finite mixture of regression (FMR) models, often many covariates are used, and their contributions to the response variable vary from one component to another of the mixture model. This creates a complex variable selection problem. Existing methods, such as the Akaike information criterion and the Bayes information criterion, are computationally expensive as the number of covariates and components in the mixture model increases. In this article we introduce a penalized likelihood approach for variable selection in FMR models. The new method introduces penalties that depend on the size of the regression coefficients and the mixture structure. The new method is shown to be consistent for variable selection. A data-adaptive method for selecting tuning parameters and an EM algorithm for efficient numerical computations are developed. Simulations show that the method performs very well and requires much less computing power than existing methods. The new method is illustrated by analyzin...

Book ChapterDOI
02 Apr 2007
TL;DR: This paper proposes and develops a general probabilistic framework for studying the expert finding problem and derives two families of generative models (candidate generation models and topic generation models) from the framework that subsume most existing language models proposed for expert finding.
Abstract: A common task in many applications is to find persons who are knowledgeable about a given topic (i.e., expert finding). In this paper, we propose and develop a general probabilistic framework for studying the expert finding problem and derive two families of generative models (candidate generation models and topic generation models) from the framework. These models subsume most existing language models proposed for expert finding. We further propose several techniques to improve the estimation of the proposed models, including incorporating topic expansion, using a mixture model to model candidate mentions in the supporting documents, and defining an email count-based prior in the topic generation model. Our experiments show that the proposed estimation strategies are all effective in improving retrieval accuracy.

Journal Article
TL;DR: In this paper, the problem of analyzing a mixture of skew normal distributions from the likelihood-based and Bayesian perspectives is addressed, and a fully Bayesian approach using the Markov chain Monte Carlo method is developed to carry out posterior analyses.
Abstract: Normal mixture models provide the most popular framework for modelling heterogeneity in a population with continuous outcomes arising in a variety of subclasses. In the last two decades, the skew normal distribution has been shown beneficial in dealing with asymmetric data in various theoretic and applied problems. In this article, we address the problem of analyzing a mixture of skew normal distributions from the likelihood-based and Bayesian perspectives, respectively. Computational techniques using EM-type algorithms are employed for iteratively computing maximum likelihood estimates. Also, a fully Bayesian approach using the Markov chain Monte Carlo method is developed to carry out posterior analyses. Numerical results are illustrated through two examples.

Proceedings Article
06 Jan 2007
TL;DR: A number of variational Bayesian approximations to the Dirichlet process (DP) mixture model are studied and a novel collapsed VB approximation where mixture weights are marginalized out is considered.
Abstract: Nonparametric Bayesian mixture models, in particular Dirichlet process (DP) mixture models, have shown great promise for density estimation and data clustering. Given the size of today's datasets, computational efficiency becomes an essential ingredient in the applicability of these techniques to real world data. We study and experimentally compare a number of variational Bayesian (VB) approximations to the DP mixture model. In particular we consider the standard VB approximation where parameters are assumed to be independent from cluster assignment variables, and a novel collapsed VB approximation where mixture weights are marginalized out. For both VB approximations we consider two different ways to approximate the DP, by truncating the stick-breaking construction, and by using a finite mixture model with a symmetric Dirichlet prior.
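For the standard (uncollapsed) VB approximation with a truncated stick-breaking construction, scikit-learn's BayesianGaussianMixture provides a ready-made implementation; the sketch below uses invented data, and the collapsed variant studied in the paper is not available there.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.4, size=(300, 2)) for m in (-2.0, 0.0, 2.5)])

# Standard (uncollapsed) VB for a DP mixture, approximated by truncating the
# stick-breaking construction at n_components; unused sticks get ~zero weight.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,
    max_iter=500,
    random_state=0,
).fit(X)

print(np.round(dpgmm.weights_, 3))          # only ~3 components carry weight
print((dpgmm.weights_ > 0.01).sum(), "effective clusters")
```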

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new distribution for local precipitation via a probability mixture model of Gamma and Generalized Pareto (GP) distributions, which was tested on real and simulated data, and also compared to classical rainfall densities.
Abstract: Downscaling precipitation is a difficult challenge for the climate community. We propose and study a new stochastic weather typing approach to perform such a task. In addition to providing accurate small and medium precipitation, our procedure possesses built-in features that allow us to model adequately extreme precipitation distributions. First, we propose a new distribution for local precipitation via a probability mixture model of Gamma and Generalized Pareto (GP) distributions. The latter one stems from Extreme Value Theory (EVT). The performance of this mixture is tested on real and simulated data, and also compared to classical rainfall densities. Then our downscaling method, extending the recently developed nonhomogeneous stochastic weather typing approach, is presented. It can be summarized as a three-step program. First, regional weather precipitation patterns are constructed through a hierarchical ascending clustering method. Second, daily transitions among our precipitation patterns are represented by a nonhomogeneous Markov model influenced by large-scale atmospheric variables like NCEP reanalyses. Third, conditionally on these regional patterns, precipitation occurrence and intensity distributions are modeled as statistical mixtures. Precipitation amplitudes are assumed to follow our mixture of Gamma and GP densities. The proposed downscaling approach is applied to 37 weather stations in Illinois and compared to various possible parameterizations and to a direct modeling. Model selection procedures show that choosing one GP distribution shape parameter per pattern for all stations provides the best rainfall representation amongst all tested models. This work highlights the importance of EVT distributions to improve the modeling and downscaling of local extreme precipitations.
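A simplified sketch of the hybrid density: a Gamma component for the bulk of the precipitation distribution mixed with a Generalized Pareto component for the heavy upper tail, fitted by direct maximum likelihood on synthetic rainfall. The constant (non-dynamic) mixing weight and the parameterization are illustrative simplifications of the paper's construction.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def mix_pdf(x, w, gamma_shape, gamma_scale, gp_shape, gp_scale):
    """Simple two-component density: Gamma for the bulk, Generalized Pareto
    for the heavy upper tail (fixed weight w; illustrative simplification)."""
    return (w * stats.gamma.pdf(x, gamma_shape, scale=gamma_scale)
            + (1 - w) * stats.genpareto.pdf(x, gp_shape, scale=gp_scale))

# Synthetic "precipitation": mostly moderate amounts plus occasional extremes.
rng = np.random.default_rng(0)
rain = np.concatenate([rng.gamma(2.0, 3.0, size=1800),
                       stats.genpareto.rvs(0.2, scale=8.0, size=200, random_state=1)])

def nll(theta):
    w = 1 / (1 + np.exp(-theta[0]))              # keep weight in (0, 1)
    pars = np.exp(theta[1:])                     # keep shapes/scales positive
    p = mix_pdf(rain, w, *pars[:2], *pars[2:])
    return -np.log(p + 1e-300).sum()

fit = minimize(nll, x0=np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 5000})
w_hat = 1 / (1 + np.exp(-fit.x[0]))
print("estimated bulk weight:", round(w_hat, 2))
print("estimated parameters:", np.round(np.exp(fit.x[1:]), 2))
```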

Journal ArticleDOI
TL;DR: The proposed method was successfully applied to Brodatz mosaic image segmentation and fabric defect detection, and can be extended to unsupervised texture segmentation using a Kullback-Leibler divergence between two Gaussian mixtures.

Proceedings ArticleDOI
14 May 2007
TL;DR: This work contributes an approach for interactive policy learning through expert demonstration that allows an agent to actively request and effectively represent demonstration examples, and introduces the confident execution approach, which focuses learning on relevant parts of the domain by enabling the agent to identify the need for and request demonstrations for specific part of the state space.
Abstract: We contribute an approach for interactive policy learning through expert demonstration that allows an agent to actively request and effectively represent demonstration examples. In order to address the inherent uncertainty of human demonstration, we represent the policy as a set of Gaussian mixture models (GMMs), where each model, with multiple Gaussian components, corresponds to a single action. Incrementally received demonstration examples are used as training data for the GMM set. We then introduce our confident execution approach, which focuses learning on relevant parts of the domain by enabling the agent to identify the need for and request demonstrations for specific parts of the state space. The agent selects between demonstration and autonomous execution based on statistical analysis of the uncertainty of the learned Gaussian mixture set. As it achieves proficiency at its task and gains confidence in its actions, the agent operates with increasing autonomy, eliminating the need for unnecessary demonstrations of already acquired behavior, and reducing both the training time and the demonstration workload of the expert. We validate our approach with experiments in simulated and real robot domains.

Journal ArticleDOI
John Geweke
TL;DR: The mixture model likelihood function is invariant with respect to permutation of the components of the mixture, and simple and widely used Markov chain Monte Carlo algorithms with data augmentation reliably recover the entire posterior distribution.

Journal ArticleDOI
TL;DR: This article proposes a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings and presents analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates.
Abstract: A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.

Book ChapterDOI
20 Oct 2007
TL;DR: Two methods to improve on the base system are proposed: describing the original time-independent features over a period of time using polynomial parametrisation, and replacing the SVM with a hybrid SVM/Hidden Markov Model (HMM) classifier to model time in the classifier; both techniques are shown to contribute to an improved classification accuracy.
Abstract: The analysis of facial expression temporal dynamics is of great importance for many real-world applications. Being able to automatically analyse facial muscle actions (Action Units, AUs) in terms of recognising their neutral, onset, apex and offset phases would greatly benefit application areas as diverse as medicine, gaming and security. The base system in this paper uses Support Vector Machines (SVMs) and a set of simple geometrical features derived from automatically detected and tracked facial feature point data to segment a facial action into its temporal phases. We propose here two methods to improve on this base system in terms of classification accuracy. The first technique describes the original time-independent set of features over a period of time using polynomial parametrisation. The second technique replaces the SVM with a hybrid SVM/Hidden Markov Model (HMM) classifier to model time in the classifier. Our results show that both techniques contribute to an improved classification accuracy. Modeling the temporal dynamics by the hybrid SVM-HMM classifier attained a statistically significant increase of recall and precision by 4.5% and 7.0%, respectively.