
Showing papers on "Mixture model published in 2009"


01 Jan 2009
TL;DR: Gaussian Mixture Model parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model.
Abstract: Definition: A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocal-tract related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model.
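To make the EM procedure described above concrete, here is a minimal numpy sketch of EM for a two-component univariate GMM. The data, starting values, and iteration count are all invented for illustration; a production system would use multivariate densities and convergence checks.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians (means -2 and 3)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

# Initial guesses for mixture weights, means, and variances
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    r = w * gauss(x[:, None], mu, var)          # shape (n, 2)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, variances from responsibilities
    n_k = r.sum(axis=0)
    w = n_k / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
```

After a few dozen iterations the estimated means should sit close to the generating means -2 and 3.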

1,323 citations


Book
01 Jan 2009
TL;DR: A book-length introduction to item response theory, developing the Rasch and one-, two-, and three-parameter logistic models, models for ordered and nominal polytomous data (partial credit, rating scale, generalized partial credit, graded response, nominal response, and multiple-choice models), multidimensional IRT, parameter estimation, linking and equating, and differential item functioning.
Abstract: Symbols and Acronyms
Part 1 Introduction to Measurement: Measurement Some Measurement Issues Item Response Theory Classical Test Theory Latent Class Analysis Summary
Part 2 The One-Parameter Model: Conceptual Development of the Rasch Model The One-Parameter Model The One-Parameter Logistic Model and the Rasch Model Assumptions underlying the Model An Empirical Data Set: The Mathematics Data Set Conceptually Estimating an Individual's Location Some Pragmatic Characteristics of Maximum Likelihood Estimates The Standard Error of Estimate and Information An Instrument's Estimation Capacity Summary
Part 3 Joint Maximum Likelihood Parameter Estimation: Joint Maximum Likelihood Estimation Indeterminacy of Parameter Estimates How Large a Calibration Sample? Example: Application of the Rasch Model to the Mathematics Data, JMLE Summary
Part 4 Marginal Maximum Likelihood Parameter Estimation: Marginal Maximum Likelihood Estimation Estimating an Individual's Location: Expected A Posteriori Example: Application of the Rasch Model to the Mathematics Data, MMLE Metric Transformation and the Total Characteristic Function Summary
Part 5 The Two-Parameter Model: Conceptual Development of the Two-Parameter Model Information for the Two-Parameter Model Conceptual Parameter Estimation for the 2PL Model How Large a Calibration Sample? Metric Transformation, 2PL Model Example: Application of the 2PL Model to the Mathematics Data, MMLE Information and Relative Efficiency Summary
Part 6 The Three-Parameter Model: Conceptual Development of the Three-Parameter Model Additional Comments about the Pseudo-Guessing Parameter Conceptual Estimation for the 3PL Model How Large a Calibration Sample? Assessing Conditional Independence Example: Application of the 3PL Model to the Mathematics Data, MMLE Assessing Person Fit: Appropriateness Measurement Information for the Three-Parameter Model Metric Transformation, 3PL Model Handling Missing Responses Issues to Consider in Selecting among the 1PL, 2PL, and 3PL Models Summary
Part 7 Rasch Models for Ordered Polytomous Data: Conceptual Development of the Partial Credit Model Conceptual Parameter Estimation of the PC Model Example: Application of the PC Model to a Reasoning Ability Instrument, MMLE The Rating Scale Model Conceptual Estimation of the RS Model Example: Application of the RS Model to an Attitudes toward Condom Scale, JMLE How Large a Calibration Sample? Information for the PC and RS Models Metric Transformation, PC and RS Models Summary
Part 8 Non-Rasch Models for Ordered Polytomous Data: The Generalized Partial Credit Model Example: Application of the GPC Model to a Reasoning Ability Instrument, MMLE Conceptual Development of the Graded Response Model How Large a Calibration Sample? Example: Application of the GR Model to an Attitudes toward Condom Scale, MMLE Information for Graded Data Metric Transformation, GPC and GR Models Summary
Part 9 Models for Nominal Polytomous Data: Conceptual Development of the Nominal Response Model How Large a Calibration Sample? Example: Application of the NR Model to a Science Test, MMLE Example: Mixed Model Calibration of the Science Test-NR and PC Models, MMLE Example: NR and PC Mixed Model Calibration of the Science Test, Collapsed Options, MMLE Information for the NR Model Metric Transformation, NR Model Conceptual Development of the Multiple-Choice Model Example: Application of the MC Model to a Science Test, MMLE Example: Application of the BS Model to a Science Test, MMLE Summary
Part 10 Models for Multidimensional Data: Conceptual Development of a Multidimensional IRT Model Multidimensional Item Location and Discrimination Item Vectors and Vector Graphs The Multidimensional Three-Parameter Logistic Model Assumptions of the MIRT Model Estimation of the M2PL Model Information for the M2PL Model Indeterminacy in MIRT Metric Transformation, M2PL Model Example: Application of the M2PL Model, Normal-Ogive Harmonic Analysis Robust Method Obtaining Person Location Estimates Summary
Part 11 Linking and Equating: Equating Defined Equating: Data Collection Phase Equating: Transformation Phase Example: Application of the Total Characteristic Function Equating Summary
Part 12 Differential Item Functioning: Differential Item Functioning and Item Bias Mantel-Haenszel Chi-Square The TSW Likelihood Ratio Test Logistic Regression Example: DIF Analysis Summary
Appendix A: Maximum Likelihood Estimation of Person Locations: Estimating an Individual's Location: Empirical Maximum Likelihood Estimation Estimating an Individual's Location: Newton's Method for MLE Revisiting Zero Variance Binary Response Patterns
Appendix B: Maximum Likelihood Estimation of Item Locations
Appendix C: The Normal Ogive Models: Conceptual Development of the Normal Ogive Model The Relationship between IRT Statistics and Traditional Item Analysis Indices Relationship of the Two-Parameter Normal Ogive and Logistic Models Extending the Two-Parameter Normal Ogive Model to a Multidimensional Space
Appendix D: Computerized Adaptive Testing: A Brief History Fixed-Branching Techniques Variable-Branching Techniques Advantages of Variable-Branching over Fixed-Branching Methods IRT-Based Variable-Branching Adaptive Testing Algorithm
Appendix E: Miscellanea: Linear Logistic Test Model (LLTM) Using Principal Axis for Estimating Item Discrimination Infinite Item Discrimination Parameter Estimates Example: NOHARM Unidimensional Calibration An Approximate Chi-Square Statistic for NOHARM Mixture Models Relative Efficiency, Monotonicity, and Information FORTRAN Formats Example: Mixed Model Calibration of the Science Test-NR and 2PL Models, MMLE Example: Mixed Model Calibration of the Science Test-NR and GR Models, MMLE Odds, Odds Ratios, and Logits The Person Response Function Linking: A Temperature Analogy Example Should DIF Analyses Be Based on Latent Classes? The Separation and Reliability Indices Dependency in Traditional Item Statistics and Observed Scores

1,296 citations


Journal ArticleDOI
TL;DR: The mixtools package for R provides a set of functions for analyzing a variety of finite mixture models, which include both traditional methods, such as EM algorithms for univariate and multivariate normal mixtures, and newer methods that reflect some recent research in finite mixture models.
Abstract: The mixtools package for R provides a set of functions for analyzing a variety of finite mixture models. These functions include both traditional methods, such as EM algorithms for univariate and multivariate normal mixtures, and newer methods that reflect some recent research in finite mixture models. In the latter category, mixtools provides algorithms for estimating parameters in a wide range of different mixture-of-regression contexts, in multinomial mixtures such as those arising from discretizing continuous multivariate data, in nonparametric situations where the multivariate component densities are completely unspecified, and in semiparametric situations such as a univariate location mixture of symmetric but otherwise unspecified densities. Many of the algorithms of the mixtools package are EM algorithms or are based on EM-like ideas, so this article includes an overview of EM algorithms for finite mixture models.

1,079 citations


Book
28 Apr 2009
TL;DR: A book-length treatment of hidden Markov models for time series, covering likelihood evaluation, parameter estimation by direct maximization and the EM algorithm, model selection and checking, state decoding, and Bayesian inference, with applications ranging from earthquake counts and geyser eruptions to animal behaviour and financial series.
Abstract: MODEL STRUCTURE, PROPERTIES, AND METHODS
Mixture Distributions and Markov Chains: Introduction Independent mixture models Markov chains
Hidden Markov Models: Definition and Properties: A simple hidden Markov model The basics The likelihood
Estimation by Direct Maximization of the Likelihood: Introduction Scaling the likelihood computation Maximization subject to constraints Other problems Example: earthquakes Standard errors and confidence intervals Example: parametric bootstrap
Estimation by the EM Algorithm: Forward and backward probabilities The EM algorithm Examples of EM applied to Poisson HMMs Discussion
Forecasting, Decoding, and State Prediction: Conditional distributions Forecast distributions Decoding State prediction
Model Selection and Checking: Model selection by AIC and BIC Model checking with pseudo-residuals Examples Discussion
Bayesian Inference for Poisson HMMs: Applying the Gibbs sampler to Poisson HMMs Bayesian estimation of the number of states Example: earthquakes Discussion
Extensions of the Basic Hidden Markov Model: Introduction HMMs with general univariate state-dependent distribution HMMs based on a second-order Markov chain HMMs for multivariate series Series which depend on covariates Models with additional dependencies
APPLICATIONS
Epileptic Seizures: Introduction Models fitted Model checking by pseudo-residuals
Eruptions of the Old Faithful Geyser: Introduction Binary time series of short and long eruptions Normal HMMs for durations and waiting times Bivariate model for durations and waiting times
Drosophila Speed and Change of Direction: Introduction Von Mises distributions Von Mises HMMs for the two subjects Circular autocorrelation functions Bivariate model
Wind Direction at Koeberg: Introduction Wind direction as classified into 16 categories Wind direction as a circular variable
Models for Financial Series: Thinly traded shares Multivariate HMM for returns on four shares Stochastic volatility models
Births at Edendale Hospital: Introduction Models for the proportion Caesarean Models for the total number of deliveries Conclusion
Cape Town Homicides and Suicides: Introduction Firearm homicides as a proportion of all homicides, suicides, and legal intervention homicides The number of firearm homicides Firearm homicide and suicide proportions Proportion in each of the five categories
Animal-Behavior Model with Feedback: Introduction The model Likelihood evaluation Parameter estimation by maximum likelihood Model checking Inferring the underlying state Models for a heterogeneous group of subjects Other modifications or extensions Application to caterpillar feeding behavior Discussion
Appendix A: Examples of R code: Stationary Poisson HMM, numerical maximization More on Poisson HMMs, including EM Bivariate normal state-dependent distributions Categorical HMM, constrained optimization
Appendix B: Some Proofs: Factorization needed for forward probabilities Two results for backward probabilities Conditional independence of X(1:t) and X((t+1):T)
References Author Index Subject Index
Exercises appear at the end of most chapters.
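The likelihood evaluation covered in this book can be illustrated with a small sketch: the scaled forward algorithm for a Poisson HMM. The two-state parameters below (initial distribution, transition matrix, state-dependent Poisson means) are purely illustrative.

```python
import numpy as np
from math import factorial

# Hypothetical 2-state Poisson HMM; all parameter values are illustrative
delta = np.array([0.5, 0.5])            # initial state distribution
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])          # transition probability matrix
lam = np.array([1.0, 5.0])              # state-dependent Poisson means

def pois(k, lam):
    return np.exp(-lam) * lam ** k / factorial(k)

def hmm_loglik(obs):
    """Scaled forward algorithm: log-likelihood of an observation sequence."""
    alpha = delta * pois(obs[0], lam)
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()                # rescale to avoid numerical underflow
    for k in obs[1:]:
        alpha = (alpha @ Gamma) * pois(k, lam)
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()
    return ll

ll = hmm_loglik([0, 1, 0, 6, 5, 7, 1, 0])
```

The per-step rescaling of the forward vector is the "scaling the likelihood computation" device the table of contents refers to.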

876 citations


Journal ArticleDOI
TL;DR: A novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes with many kinds of activities co-occurring, and three hierarchical Bayesian models are proposed that advance existing language models, such as LDA and HDP.
Abstract: We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple "atomic" activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multi-agent interactions are modeled as distributions over atomic activities. These models are learnt in an unsupervised way. Given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models, Latent Dirichlet Allocation (LDA) mixture model, Hierarchical Dirichlet Process (HDP) mixture model, and Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They advance existing language models, such as LDA [1] and HDP [2]. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of activities co-occurring. Without tracking and human labeling effort, our framework completes many challenging visual surveillance tasks of broad interest such as: (1) discovering typical atomic activities and interactions; (2) segmenting long video sequences into different interactions; (3) segmenting motions into different activities; (4) detecting abnormality; and (5) supporting high-level queries on activities and interactions.

522 citations


Journal ArticleDOI
TL;DR: Two inherent characteristics of hyperspectral data, piecewise smoothness of spectral data and sparseness of abundance fraction of every material, are introduced to nonnegative matrix factorization (NMF) and the gradient-based optimization algorithm is presented and the monotonic convergence of the algorithm is proved.
Abstract: Hyperspectral unmixing is a process to identify the constituent materials and estimate the corresponding fractions from the mixture. During the last few years, nonnegative matrix factorization (NMF), as a suitable candidate for the linear spectral mixture model, has been applied to unmix hyperspectral data. Unfortunately, the local minima caused by the nonconvexity of the objective function make the solution nonunique, so the nonnegativity constraint alone is not sufficient to yield a well-defined problem. Therefore, in this paper, two inherent characteristics of hyperspectral data, piecewise smoothness (both temporal and spatial) of spectral data and sparseness of abundance fraction of every material, are introduced to NMF. The adaptive potential function from discontinuity adaptive Markov random field model is used to describe the smoothness constraint while preserving discontinuities in spectral data. At the same time, two NMF algorithms, nonsmooth NMF and NMF with sparseness constraint, are used to quantify the degree of sparseness of material abundances. A gradient-based optimization algorithm is presented, and the monotonic convergence of the algorithm is proved. Three important facts are exploited in our method: First, both the spectra and abundances are nonnegative; second, the variation of the material spectra and abundance images is piecewise smooth in wavelength and spatial spaces, respectively; third, the abundance distribution of each material is almost sparse in the scene. Experiments using synthetic and real data demonstrate that the proposed algorithm provides an effective unsupervised technique for hyperspectral unmixing.
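As background for the NMF-based unmixing above, here is a minimal sketch of plain NMF with the classical Lee-Seung multiplicative updates (not the constrained algorithm of the paper, which adds smoothness and sparseness terms). The synthetic "spectra" and sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 3 nonnegative endmember "spectra" (rows of S) mixed by
# nonnegative abundances A, so the observed matrix is V = A @ S (rank 3)
S_true = rng.random((3, 20))
A_true = rng.random((50, 3))
V = A_true @ S_true

# Random nonnegative initialisation
A = rng.random((50, 3))
S = rng.random((3, 20))

eps = 1e-9
for _ in range(1000):
    # Lee-Seung multiplicative updates for the Frobenius objective ||V - AS||^2;
    # the updates preserve nonnegativity of both factors
    S *= (A.T @ V) / (A.T @ A @ S + eps)
    A *= (V @ S.T) / (A @ S @ S.T + eps)

err = np.linalg.norm(V - A @ S) / np.linalg.norm(V)
```

Because V is exactly rank 3 and nonnegative, the relative reconstruction error drops to a small value; the nonuniqueness the abstract mentions shows up as different (A, S) pairs reaching similar error.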

389 citations


Proceedings ArticleDOI
01 Dec 2009
TL;DR: An unsupervised learning framework is presented to address the problem of detecting spoken keywords by using segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances and obtaining the keyword detection result.
Abstract: In this paper, we present an unsupervised learning framework to address the problem of detecting spoken keywords. Without any transcription information, a Gaussian Mixture Model is trained to label speech frames with a Gaussian posteriorgram. Given one or more spoken examples of a keyword, we use segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances. The keyword detection result is then obtained by ranking the distortion scores of all the test utterances. We examine the TIMIT corpus as a development set to tune the parameters in our system, and the MIT Lecture corpus for more substantial evaluation. The results demonstrate the viability and effectiveness of our unsupervised learning framework on the keyword spotting task.
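The posteriorgram comparison step can be sketched with a tiny dynamic time warping routine. Following the general posteriorgram-matching idea (the frame distance and toy vectors below are illustrative, not the paper's exact configuration), frames are posterior distributions over GMM components and the frame-level distance is the negative log inner product.

```python
import numpy as np

def dtw(P, Q):
    """Dynamic time warping between two posteriorgram sequences; the frame
    distance is -log of the posterior inner product."""
    n, m = len(P), len(Q)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = -np.log(max(P[i - 1] @ Q[j - 1], 1e-12))
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)            # length-normalised alignment cost

# Toy posteriorgrams over 3 Gaussian components (each row sums to 1)
keyword = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
match   = np.array([[0.8, 0.1, 0.1],  [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
other   = np.array([[0.1, 0.1, 0.8],  [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])
```

An utterance whose posteriorgram follows the keyword's component sequence scores a lower alignment cost than one that does not, which is what the ranking by distortion scores relies on.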

350 citations


01 Jan 2009
TL;DR: This dissertation extends the original two-dimensional NDT registration algorithm of Biber and Straßer to 3D, introduces a number of improvements, and proposes to use a combination of local visual features and Colour-NDT for robust registration of coloured 3D scans.
Abstract: This dissertation is concerned with three-dimensional (3D) sensing and 3D scan representation. Three-dimensional records are important tools in several disciplines, such as medical imaging, archaeology, and mobile robotics. This dissertation proposes the normal-distributions transform, NDT, as a general 3D surface representation with applications in scan registration, localisation, loop detection, and surface-structure analysis. After applying NDT, the surface is represented by a smooth function with analytic derivatives. This representation has several attractive properties. The smooth function representation makes it possible to use standard numerical optimisation methods, such as Newton's method, for 3D registration. This dissertation extends the original two-dimensional NDT registration algorithm of Biber and Straßer to 3D and introduces a number of improvements. The 3D-NDT scan-registration algorithm is compared to current de facto standard registration algorithms. 3D-NDT scan registration with the proposed extensions is shown to be more robust, more accurate, and faster than the popular ICP algorithm. An additional benefit is that 3D-NDT registration provides a confidence measure of the result with little additional effort. Furthermore, a kernel-based extension to 3D-NDT for registering coloured data is proposed. Approaches based on local visual features typically use only a small fraction of the available 3D points for registration. In contrast, Colour-NDT uses all of the available 3D data. The dissertation proposes to use a combination of local visual features and Colour-NDT for robust registration of coloured 3D scans. Also building on NDT, a novel approach using 3D laser scans to perform appearance-based loop detection for mobile robots is proposed. Loop detection is an important problem in the SLAM (simultaneous localisation and mapping) domain.
The proposed approach uses only the appearance of 3D point clouds to detect loops and requires no pose information. It exploits the NDT surface representation to create histograms based on local surface orientation and smoothness. The surface-shape histograms compress the input data by two to three orders of magnitude. Because of the high compression rate, the histograms can be matched efficiently to compare the appearance of two scans. Rotation invariance is achieved by aligning scans with respect to dominant surface orientations. In order to automatically determine the threshold that separates scans at loop closures from nonoverlapping ones, the proposed approach uses expectation maximisation to fit a Gamma mixture model to the output similarity measures. In order to enable more high-level tasks, it is desirable to extract semantic information from 3D models. One important task where such 3D surface analysis is useful is boulder detection for mining vehicles. This dissertation presents a method, also inspired by NDT, that provides clues as to where the pile is, where the bucket should be placed for loading, and where there are obstacles. The points of 3D point clouds are classified based on the surrounding surface roughness and orientation. Other potential applications include extraction of drivable paths over uneven surfaces.
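The core NDT idea (summarising each voxel of a point cloud by a local Gaussian) can be sketched in a few lines. The cell size, minimum point count, and synthetic cloud below are illustrative choices, not the dissertation's settings.

```python
import numpy as np

def ndt_cells(points, cell_size=1.0):
    """Normal-distributions transform: partition a point cloud into a voxel
    grid and summarise each cell by the mean and covariance of its points."""
    cells = {}
    keys = np.floor(points / cell_size).astype(int)
    for key, p in zip(map(tuple, keys), points):
        cells.setdefault(key, []).append(p)
    stats = {}
    for key, pts in cells.items():
        pts = np.asarray(pts)
        if len(pts) >= 3:               # need a few points for a covariance
            stats[key] = (pts.mean(axis=0), np.cov(pts.T))
    return stats

rng = np.random.default_rng(2)
# Synthetic cloud concentrated inside the unit voxel
cloud = rng.normal([0.5, 0.5, 0.5], 0.1, size=(100, 3))
stats = ndt_cells(cloud)
```

Each cell's (mean, covariance) pair is what turns the discrete cloud into the smooth, differentiable density that registration then optimises over.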

313 citations


Journal ArticleDOI
TL;DR: In this paper, a general approach for establishing identifiability of hidden class models utilizing algebraic arguments is presented, and applied to a diverse set of models including mixtures of finite and nonparametric product distributions, hidden Markov models, and random graph mixture models.
Abstract: While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.

308 citations


Book ChapterDOI
Peter Schlattmann
01 Jan 2009

305 citations


Posted Content
TL;DR: The distance dependent Chinese restaurant process (CRP) is developed, a flexible class of distributions over partitions that allows for non-exchangeability and can be used to model many kinds of dependencies between data in infinite clustering models.
Abstract: We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for non-exchangeability. This class can be used to model many kinds of dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show its alternative formulation of the traditional CRP leads to a faster-mixing Gibbs sampling algorithm than the one based on the original formulation.
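The generative rule of the distance dependent CRP can be sketched directly: each customer links to another customer with probability proportional to a decay of their distance (or to itself with probability proportional to a concentration parameter), and clusters are the connected components of the link graph. The decay function, concentration value, and toy data below are illustrative choices.

```python
import numpy as np

def dd_crp_partition(distances, alpha=1.0, decay=lambda d: np.exp(-d), seed=0):
    """Draw one partition from the distance dependent CRP: customer i links
    to customer j with probability proportional to decay(d_ij), or to itself
    with probability proportional to alpha; clusters are the connected
    components of the resulting link graph."""
    rng = np.random.default_rng(seed)
    n = len(distances)
    links = []
    for i in range(n):
        p = decay(np.asarray(distances[i], dtype=float))
        p[i] = alpha                    # self-link mass
        links.append(rng.choice(n, p=p / p.sum()))
    # Clusters = connected components of the link graph (union-find)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return np.array([find(i) for i in range(n)])

# Customers on a line: two well-separated groups of five
pos = np.array([0, 0.1, 0.2, 0.3, 0.4, 10, 10.1, 10.2, 10.3, 10.4])
D = np.abs(pos[:, None] - pos[None, :])
labels = dd_crp_partition(D, alpha=0.5)
```

With an exponential decay, customers overwhelmingly link within their own group, so nearby points tend to land in the same cluster, which is exactly the non-exchangeable behaviour the prior is designed to express.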

Journal ArticleDOI
TL;DR: The results show that the nonlinear model of spectral unmixing outperforms the linear model, especially in the scenes with translucent crown on a white background.
Abstract: The spectral unmixing of mixed pixels is a key factor in remote sensing images, especially for hyperspectral imagery. A commonly used approach to spectral unmixing has been linear unmixing. However, the question of whether linear or nonlinear processes dominate spectral signatures of mixed pixels is still an unresolved matter. In this study, we put forward a new nonlinear model for inferring end-member fractions within hyperspectral scenes. This study focuses on comparing the nonlinear model with a linear model. A detailed comparative analysis of the fractions 'sunlit crown', 'sunlit background' and 'shadow' between the two methods was carried out through visualization, and comparing with supervised classification using a database of laboratory simulated-forest scenes. Our results show that the nonlinear model of spectral unmixing outperforms the linear model, especially in the scenes with translucent crown on a white background. A nonlinear mixture model is needed to account for the multiple scattering between tree crowns and background.
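The linear baseline the study compares against is simple enough to sketch: a pixel is modelled as a nonnegative, sum-to-one combination of endmember spectra, and fractions are recovered by inverting that linear system. The endmember matrix and fractions below are invented for illustration (a real unmixer would enforce nonnegativity during the fit, e.g. with NNLS).

```python
import numpy as np

# Hypothetical endmember spectra over four bands
# (columns: sunlit crown, sunlit background, shadow; values illustrative)
E = np.array([[0.80, 0.10, 0.05],
              [0.60, 0.30, 0.10],
              [0.20, 0.70, 0.05],
              [0.10, 0.80, 0.02]])

true_f = np.array([0.5, 0.3, 0.2])       # true endmember fractions
pixel = E @ true_f                       # noiseless linear mixture

# Least-squares inversion of the linear mixture model, then projection
# back onto the nonnegative, sum-to-one simplex
f = np.linalg.lstsq(E, pixel, rcond=None)[0]
f = np.clip(f, 0, None)
f /= f.sum()
```

The nonlinear model of the paper adds multiple-scattering terms to this forward model; the point of the comparison is that the plain linear inversion above misestimates fractions when such scattering is present.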

Journal ArticleDOI
TL;DR: This article develops a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables, and shows this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation.
Abstract: Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorporation of latent Gaussian random variables or parametric latent class models. The goal of this article is to develop a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables. This support condition ensures that we are not restricting the dependence structure a priori. We show this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation. Methods for nonparametric testing of violations of independence are proposed, and the methods are applied to model positional dependence within transcription factor binding motifs.

Journal ArticleDOI
TL;DR: A model generalizing the model of Raftery and Dean (2006) is proposed to specify the role of each variable, which does not need any prior assumptions about the linear link between the selected and discarded variables.
Abstract: This article is concerned with variable selection for cluster analysis. The problem is regarded as a model selection problem in the model-based cluster analysis context. A model generalizing the model of Raftery and Dean (2006, Journal of the American Statistical Association 101, 168-178) is proposed to specify the role of each variable. This model does not need any prior assumptions about the linear link between the selected and discarded variables. Models are compared with Bayesian information criterion. Variable role is obtained through an algorithm embedding two backward stepwise algorithms for variable selection for clustering and linear regression. The model identifiability is established and the consistency of the resulting criterion is proved under regularity conditions. Numerical experiments on simulated datasets and a genomic application highlight the interest of the procedure.

Journal ArticleDOI
TL;DR: The authors present several currently available modeling options, all of which make appropriate distributional assumptions for the observed categorical data, and describe a longitudinal latent class analysis, which requires fewer assumptions than the first 3.
Abstract: Analyzing problem-behavior trajectories can be difficult. The data are generally categorical and often quite skewed, violating distributional assumptions of standard normal-theory statistical models. In this article, the authors present several currently available modeling options, all of which make appropriate distributional assumptions for the observed categorical data. Three are based on the generalized linear model: a hierarchical generalized linear model, a growth mixture model, and a latent class growth analysis. They also describe a longitudinal latent class analysis, which requires fewer assumptions than the first 3. Finally, they illustrate all of the models using actual longitudinal adolescent alcohol-use data. They guide the reader through the model-selection process, comparing the results in terms of convergence properties, fit and residuals, parsimony, and interpretability. Advances in computing and statistical software have made the tools for these types of analyses readily accessible to most researchers. Using appropriate models for categorical data will lead to more accurate and reliable results, and their application in real data settings could contribute to substantive advancements in the field of development and the science of prevention.

Journal ArticleDOI
TL;DR: A comprehensive Bayesian non-parametric analysis of random probabilities obtained by normalizing random measures with independent increments (NRMI), which allows the derivation of a generalized Blackwell–MacQueen sampling scheme, then adapted to cover also mixture models driven by general NRMIs.
Abstract: . One of the main research areas in Bayesian Nonparametrics is the proposal and study of priors which generalize the Dirichlet process. In this paper, we provide a comprehensive Bayesian non-parametric analysis of random probabilities which are obtained by normalizing random measures with independent increments (NRMI). Special cases of these priors have already shown to be useful for statistical applications such as mixture models and species sampling problems. However, in order to fully exploit these priors, the derivation of the posterior distribution of NRMIs is crucial: here we achieve this goal and, indeed, provide explicit and tractable expressions suitable for practical implementation. The posterior distribution of an NRMI turns out to be a mixture with respect to the distribution of a specific latent variable. The analysis is completed by the derivation of the corresponding predictive distributions and by a thorough investigation of the marginal structure. These results allow to derive a generalized Blackwell–MacQueen sampling scheme, which is then adapted to cover also mixture models driven by general NRMIs.

Proceedings ArticleDOI
02 Nov 2009
TL;DR: This work systematically compares five representative state-of-the-art methods for estimating query language models with pseudo feedback in ad hoc information retrieval, and proposes several heuristics that are intuitively related to the good retrieval performance of an estimation method.
Abstract: We systematically compare five representative state-of-the-art methods for estimating query language models with pseudo feedback in ad hoc information retrieval, including two variants of the relevance language model, two variants of the mixture feedback model, and the divergence minimization estimation method. Our experiment results show that a variant of the relevance model and a variant of the mixture model tend to outperform other methods. We further propose several heuristics that are intuitively related to the good retrieval performance of an estimation method, and show that the variations in how these heuristics are implemented in different methods provide a good explanation of many empirical observations.
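The common final step of these feedback methods can be sketched as a simple interpolation: the original maximum-likelihood query model is mixed with a model estimated from the feedback documents. The term distributions and interpolation weight below are toy values for illustration; the methods compared in the paper differ mainly in how the feedback model itself is estimated.

```python
# Maximum-likelihood query model and a hypothetical feedback model
# (toy term probabilities; each distribution sums to 1)
query_ml = {"car": 0.5, "engine": 0.5}
feedback = {"car": 0.3, "engine": 0.2, "motor": 0.3, "vehicle": 0.2}

lam = 0.5                               # feedback interpolation weight
vocab = set(query_ml) | set(feedback)
# Interpolated query model: (1 - lam) * p_ml(w|Q) + lam * p(w|F)
expanded = {w: (1 - lam) * query_ml.get(w, 0.0) + lam * feedback.get(w, 0.0)
            for w in vocab}
```

The expanded model keeps mass on the original query terms while introducing feedback terms like "motor", which is the query-expansion effect the retrieval experiments measure.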

Journal ArticleDOI
TL;DR: A statistical framework for the word-spotting problem which employs hidden Markov models (HMMs) to model keywords and a Gaussian mixture model (GMM) for score normalization is introduced.

Journal ArticleDOI
TL;DR: A novel batch process monitoring approach handles batch processes with multiple operation phases by combining the Gaussian mixture model (GMM) with hybrid unfolding of a multiway data matrix to partition all the sampling points into different clusters.
Abstract: A novel batch process monitoring approach is proposed in this article to handle batch processes with multiple operation phases. The basic idea is to combine the Gaussian mixture model (GMM) with hybrid unfolding of a multiway data matrix to partition all the sampling points into different clusters. Then, two sequential cluster alignments are used to adjust clusters so that each of them only contains consecutive sampling instants, and all the training batches at the same sampling time belong to the same cluster. The identified multiple clusters correspond to different operation phases in the batch process. Further, a localized probability index is defined to examine each sampling point of a monitored batch relative to its corresponding operation phase. Subsequently, the occurrence and duration of process faults can be detected in this way. The proposed batch monitoring approach is applied to a simulated penicillin fermentation process and compared with the conventional multiway principal component analysis...

Journal ArticleDOI
TL;DR: In this article, the authors propose finite mixture regression models as an alternative formulation for capturing heterogeneity in crash count models, motivated by the observation that in highway safety studies the sources of dispersion can be varied and unknown to the transportation analyst.

Proceedings Article
06 Jul 2009
TL;DR: This paper presents a first attempt to evaluate two previously proposed methods for statistical anomaly detection in sea traffic, namely the Gaussian Mixture Model and the adaptive Kernel Density Estimator, and indicates that KDE more accurately captures finer details of normal data.
Abstract: This paper presents a first attempt to evaluate two previously proposed methods for statistical anomaly detection in sea traffic, namely the Gaussian Mixture Model (GMM) and the adaptive Kernel Density Estimator (KDE). A novel performance measure related to anomaly detection, together with an intermediate performance measure related to normalcy modeling, are proposed and evaluated using recorded AIS data of vessel traffic and simulated anomalous trajectories. The normalcy modeling evaluation indicates that KDE more accurately captures finer details of normal data. Yet, results from anomaly detection show no significant difference between the two techniques and the performance of both is considered suboptimal. Part of the explanation is that the methods are based on a rather artificial division of data into geographical cells. The paper therefore discusses other clustering approaches based on more informed features of data and more background knowledge regarding the structure and natural classes of the data.
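The two normalcy models being compared can be sketched with scikit-learn stand-ins (a minimal sketch on made-up 2-D data, not the AIS features or cell-based setup of the paper; the threshold is illustrative): fit both models to "normal" points and flag any point whose log-likelihood falls below a threshold.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
normal = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2))  # "recorded traffic"

gmm = GaussianMixture(n_components=3, random_state=0).fit(normal)
kde = KernelDensity(bandwidth=0.3).fit(normal)

def is_anomalous(model, point, threshold=-6.0):
    """Flag a point whose log-likelihood under the normalcy model is low."""
    return model.score_samples(np.asarray(point).reshape(1, -1))[0] < threshold
```

Both models expose the same `score_samples` interface, which is what makes the paper's side-by-side comparison straightforward: only the density estimate differs, not the detection rule.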

Proceedings ArticleDOI
01 Sep 2009
TL;DR: A new image representation is proposed to capture both the appearance and spatial information for image classification applications, and it is shown that the traditional histogram representation and spatial pyramid matching are special cases of the hierarchical Gaussianization.
Abstract: In this paper, we propose a new image representation to capture both the appearance and spatial information for image classification applications. First, we model the feature vectors, from the whole corpus, from each image and at each individual patch, in a Bayesian hierarchical framework using mixtures of Gaussians. After such a hierarchical Gaussianization, each image is represented by a Gaussian mixture model (GMM) for its appearance, and several Gaussian maps for its spatial layout. Then we extract the appearance information from the GMM parameters, and the spatial information from global and local statistics over Gaussian maps. Finally, we employ a supervised dimension reduction technique called DAP (discriminant attribute projection) to remove noise directions and to further enhance the discriminating power of our representation. We justify that the traditional histogram representation and the spatial pyramid matching are special cases of our hierarchical Gaussianization. We compare our new representation with other approaches in scene classification, object recognition and face recognition, and our performance ranks among the top in all three tasks.

Journal ArticleDOI
TL;DR: A robust salient region detection framework based on the color and orientation distribution in images is presented; experiments are carried out on a large image database annotated with "ground-truth" salient regions, provided by Microsoft Research Asia, enabling robust objective-level comparisons with other salient region detection algorithms.
Abstract: We present a robust salient region detection framework based on the color and orientation distribution in images. The proposed framework consists of a color saliency framework and an orientation saliency framework. The color saliency framework detects salient regions based on the spatial distribution of the component colors in the image space and their remoteness in the color space. The dominant hues in the image are used to initialize an expectation-maximization (EM) algorithm to fit a Gaussian mixture model in the hue-saturation (H-S) space. The mixture of Gaussians framework in H-S space is used to compute the inter-cluster distance in the H-S domain as well as the relative spread among the corresponding colors in the spatial domain. Orientation saliency framework detects salient regions in images based on the global and local behavior of different orientations in the image. The oriented spectral information from the Fourier transform of the local patches in the image is used to obtain the local orientation histogram of the image. Salient regions are further detected by identifying spatially confined orientations and with the local patches that possess high orientation entropy contrast. The final saliency map is selected as either color saliency map or orientation saliency map by automatically identifying which of the maps leads to the correct identification of the salient region. The experiments are carried out on a large image database annotated with "ground-truth" salient regions, provided by Microsoft Research Asia, which enables us to conduct robust objective level comparisons with other salient region detection algorithms.

Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work presents an approach to visual tracking based on dividing a target into multiple regions, or fragments, represented by a Gaussian mixture model in a joint feature-spatial space, with each ellipsoid corresponding to a different fragment.
Abstract: We present an approach to visual tracking based on dividing a target into multiple regions, or fragments. The target is represented by a Gaussian mixture model in a joint feature-spatial space, with each ellipsoid corresponding to a different fragment. The fragments are automatically adapted to the image data, being selected by an efficient region-growing procedure and updated according to a weighted average of the past and present image statistics. Modeling of target and background is performed in a Chan-Vese manner, using the framework of level sets to preserve accurate boundaries of the target. The extracted target boundaries are used to learn the dynamic shape of the target over time, enabling tracking to continue under total occlusion. Experimental results on a number of challenging sequences demonstrate the effectiveness of the technique.

Journal ArticleDOI
TL;DR: In this paper, the EM-test for homogeneity is applied to finite normal mixtures, and it is shown that the limiting distribution is a simple function of the $0.5\chi^2_0+0.5\chi^2_1$ and $\chi^2_1$ distributions when the mixing variances are equal but unknown.
Abstract: Normal mixture distributions are arguably the most important mixture models, and also the most technically challenging. The likelihood function of the normal mixture model is unbounded based on a set of random samples, unless an artificial bound is placed on its component variance parameter. Moreover, the model is not strongly identifiable so it is hard to differentiate between over dispersion caused by the presence of a mixture and that caused by a large variance, and it has infinite Fisher information with respect to mixing proportions. There has been extensive research on finite normal mixture models, but much of it addresses merely consistency of the point estimation or useful practical procedures, and many results require undesirable restrictions on the parameter space. We show that an EM-test for homogeneity is effective at overcoming many challenges in the context of finite normal mixtures. We find that the limiting distribution of the EM-test is a simple function of the $0.5\chi^2_0+0.5\chi^2_1$ and $\chi^2_1$ distributions when the mixing variances are equal but unknown and the $\chi^2_2$ when variances are unequal and unknown. Simulations show that the limiting distributions approximate the finite sample distribution satisfactorily. Two genetic examples are used to illustrate the application of the EM-test.
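A small numeric illustration (ours, not from the paper) of how the quoted limiting null distributions would be used: the p-value under the $0.5\chi^2_0+0.5\chi^2_1$ mixture halves the $\chi^2_1$ tail probability, because the $\chi^2_0$ component is a point mass at zero and never exceeds a positive test statistic.

```python
from scipy.stats import chi2

def pvalue_mixture(t):
    """P(T > t) under 0.5*chi2_0 + 0.5*chi2_1; chi2_0 is a point mass at 0."""
    return 1.0 if t < 0 else 0.5 * chi2.sf(t, df=1)

def pvalue_chi2(t, df):
    """P(T > t) under a plain chi-squared, e.g. df=2 for unequal variances."""
    return chi2.sf(t, df=df)

t = 2.71  # roughly the 95th percentile of the mixture distribution
p_equal_var = pvalue_mixture(t)     # approx. 0.05
p_unequal_var = pvalue_chi2(t, 2)   # larger tail under chi2_2
```

So the same test-statistic value is borderline significant at the 5% level under the equal-variance limit, but not under the unequal-variance $\chi^2_2$ limit.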

Proceedings ArticleDOI
01 Sep 2009
TL;DR: A novel variant of the RANSAC algorithm that is much more efficient, in particular when dealing with problems with low inlier ratios, and serves as a general framework that works well with three possible grouping strategies investigated in this paper, including a novel optical flow based clustering approach.
Abstract: We present a novel variant of the RANSAC algorithm that is much more efficient, in particular when dealing with problems with low inlier ratios. Our algorithm assumes that there exists some grouping in the data, based on which we introduce a new binomial mixture model rather than the simple binomial model as used in RANSAC. We prove that in the new model it is more efficient to sample data from a smaller number of groups and groups with more tentative correspondences, which leads to a new sampling procedure that uses progressive numbers of groups. We demonstrate our algorithm on two classical geometric vision problems: wide-baseline matching and camera resectioning. The experiments show that the algorithm serves as a general framework that works well with three possible grouping strategies investigated in this paper, including a novel optical flow based clustering approach. The results show that our algorithm is able to achieve a significant performance gain compared to the standard RANSAC and PROSAC.
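For reference, the standard RANSAC baseline that this work improves on can be sketched as follows (a minimal line-fitting example on made-up data; the paper's binomial mixture model and group-based progressive sampling are not shown):

```python
import numpy as np

def ransac_line(points, iters=200, tol=0.1, rng=np.random.default_rng(0)):
    """Fit y = a*x + b robustly: sample 2 points, count inliers, keep best."""
    best_inliers, best_model = 0, None
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:  # skip degenerate vertical samples
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = int((residuals < tol).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (a, b)
    return best_model, best_inliers

# 80 inliers on y = 2x + 1 plus 40 gross outliers
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 80)
inl = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.02, 80)])
out = rng.uniform(-5, 5, (40, 2))
(a, b), n = ransac_line(np.vstack([inl, out]))
```

The paper's point is precisely that this uniform two-point sampling wastes iterations when the inlier ratio is low; sampling within a few well-populated groups raises the chance of drawing an all-inlier sample.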

Proceedings ArticleDOI
19 Apr 2009
TL;DR: A system that detects human falls in the home environment, distinguishing them from competing noise, by using only the audio signal from a single far-field microphone, using a Gaussian mixture model (GMM) supervector, whose Euclidean distance measures the pairwise difference between audio segments.
Abstract: We present a system that detects human falls in the home environment, distinguishing them from competing noise, by using only the audio signal from a single far-field microphone. The proposed system models each fall or noise segment by means of a Gaussian mixture model (GMM) supervector, whose Euclidean distance measures the pairwise difference between audio segments. A support vector machine built on a kernel between GMM supervectors is employed to classify audio segments into falls and various types of noise. Experiments on a dataset of human falls, collected as part of the Netcarity project, show that the method improves fall classification F-score to 67% from 59% of a baseline GMM classifier. The approach also effectively addresses the more difficult fall detection problem, where audio segment boundaries are unknown. Specifically, we employ it to reclassify confusable segments produced by a dynamic programming scheme based on traditional GMMs. Such post-processing improves a fall detection accuracy metric by 5% relative.
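The GMM-supervector idea can be sketched as follows (names and data are ours, not the paper's; real systems typically MAP-adapt segment GMMs from a universal background model rather than fitting each segment from scratch): stack the component means of a per-segment GMM into one fixed-length vector, then compare segments by Euclidean distance.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def supervector(segment, n_components=2):
    """Stack the (canonically ordered) GMM component means into one vector."""
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(segment)
    order = np.argsort(gmm.means_[:, 0])  # fix component order for comparability
    return gmm.means_[order].ravel()

rng = np.random.default_rng(0)
def bimodal(lo, hi):
    """Synthetic 1-D 'feature stream' drawn from two well-separated modes."""
    return np.concatenate(
        [rng.normal(lo, 0.3, 100), rng.normal(hi, 0.3, 100)]).reshape(-1, 1)

fall_a, fall_b, noise = bimodal(0, 5), bimodal(0, 5), bimodal(2, 3)
d_same = np.linalg.norm(supervector(fall_a) - supervector(fall_b))
d_diff = np.linalg.norm(supervector(fall_a) - supervector(noise))
# segments of the same kind end up closer in supervector space
```

In the paper this distance structure is exploited through an SVM kernel over supervectors rather than by raw nearest-neighbor comparison.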

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper presents an online statistical learning approach to model the background appearance variations under cast shadows based on the bi-illuminant dichromatic reflection model, and derives physics-based color features under the assumptions of constant ambient illumination and light sources with common spectral power distributions.
Abstract: Cast shadows induced by moving objects often cause serious problems to many vision applications. We present in this paper an online statistical learning approach to model the background appearance variations under cast shadows. Based on the bi-illuminant (i.e. direct light sources and ambient illumination) dichromatic reflection model, we derive physics-based color features under the assumptions of constant ambient illumination and light sources with common spectral power distributions. We first use one Gaussian mixture model (GMM) to learn the color features, which are constant regardless of the background surfaces or illuminant colors in a scene. Then, we build up one pixel based GMM for each pixel to learn the local shadow features. To overcome the slow convergence rate in the conventional GMM learning, we update the pixel-based GMMs through confidence-rated learning. The proposed method can rapidly learn model parameters in an unsupervised way and adapt to illumination conditions or environment changes. Furthermore, we demonstrate that our method is robust to scenes with few foreground activities and videos captured at low or unsteady frame rates.

Journal ArticleDOI
TL;DR: An unsupervised approach for feature selection and extraction in mixtures of generalized Dirichlet (GD) distributions that is able to extract independent and non-Gaussian features without loss of accuracy is presented.
Abstract: This paper presents an unsupervised approach for feature selection and extraction in mixtures of generalized Dirichlet (GD) distributions. Our method defines a new mixture model that is able to extract independent and non-Gaussian features without loss of accuracy. The proposed model is learned using the expectation-maximization algorithm by minimizing the message length of the data set. Experimental results show the merits of the proposed methodology in the categorization of object images.

Journal ArticleDOI
TL;DR: Using the generalized Bayesian theorem, an extension of Bayes' theorem in the belief function framework, a criterion generalizing the likelihood function is derived, allowing this approach to exploit partial information about class labels.