
Showing papers on "Mixture model" published in 1999


Proceedings ArticleDOI
23 Jun 1999
TL;DR: This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model, resulting in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes.
Abstract: A common method for real-time segmentation of moving regions in image sequences involves "background subtraction", or thresholding the error between an estimate of the image without moving objects and the current image. The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. This paper discusses modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified based on whether the Gaussian distribution which represents it most effectively is considered part of the background model. This results in a stable, real-time outdoor tracker which reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. This system has been run almost continuously for 16 months, 24 hours a day, through rain and snow.
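
For intuition, here is a minimal single-pixel sketch of this kind of on-line mixture update in NumPy. The constants (K, ALPHA, T, the matching threshold) and the use of a single simplified learning rate are illustrative choices, not the paper's exact formulation.

import numpy as np

# Minimal single-pixel sketch of an adaptive mixture-of-Gaussians background
# model (grayscale intensities; illustrative parameter values).
K, ALPHA, T, MATCH_SIGMA, INIT_VAR = 3, 0.01, 0.7, 2.5, 900.0

mu = np.zeros(K)            # component means
var = np.full(K, INIT_VAR)  # component variances
w = np.full(K, 1.0 / K)     # component weights

def update(x):
    """Update the mixture with new intensity x; return True if x looks like background."""
    global mu, var, w
    matched = np.abs(x - mu) < MATCH_SIGMA * np.sqrt(var)
    if matched.any():
        k = int(np.argmax(matched))                 # first matching component (simplified)
        w = (1 - ALPHA) * w
        w[k] += ALPHA
        rho = ALPHA                                  # simplified second learning rate
        mu[k] += rho * (x - mu[k])
        var[k] += rho * ((x - mu[k]) ** 2 - var[k])
    else:
        k = int(np.argmin(w / np.sqrt(var)))         # replace the least probable component
        mu[k], var[k], w[k] = x, INIT_VAR, ALPHA
    w /= w.sum()
    # Components with high weight and low variance are deemed background.
    order = np.argsort(-w / np.sqrt(var))
    n_bg = int(np.searchsorted(np.cumsum(w[order]), T)) + 1
    background = set(order[:n_bg].tolist())
    return bool(matched.any()) and int(np.argmax(matched)) in background

# Feed a stream of intensities; the classifier stabilizes after a short burn-in.
for x in [22, 21, 23, 22, 180, 22, 21]:
    print(x, update(x))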

7,660 citations


Journal ArticleDOI
TL;DR: PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model, which leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm.
Abstract: Principal component analysis (PCA) is one of the most popular techniques for processing, compressing, and visualizing data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Therefore, previous attempts to formulate mixture models for PCA have been ad hoc to some extent. In this article, PCA is formulated within a maximum likelihood framework, based on a specific form of gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analyzers, whose parameters can be determined using an expectation-maximization algorithm. We discuss the advantages of this model in the context of clustering, density modeling, and local dimensionality reduction, and we demonstrate its application to image compression and handwritten digit recognition.
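
For a single probabilistic PCA model the maximum-likelihood solution is available in closed form from the eigendecomposition of the sample covariance (the mixture case replaces this with an EM loop over component responsibilities). A small NumPy sketch of that closed form, with our own variable names:

import numpy as np

def ppca_ml(X, q):
    """Closed-form ML estimates for probabilistic PCA with q latent dimensions (q < d).

    Returns the mean, the d x q loading matrix W, and the isotropic noise variance sigma2.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False)             # sample covariance
    evals, evecs = np.linalg.eigh(S)             # ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]   # descending order
    sigma2 = evals[q:].mean()                    # average of the discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2

# Hypothetical usage on synthetic data with a 2-dimensional latent structure:
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(500, 10))
mu, W, sigma2 = ppca_ml(X, q=2)
print(W.shape, round(sigma2, 4))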

1,927 citations


Journal ArticleDOI
TL;DR: The research is motivated by a repeated measurement study using a random coefficient model to assess the influence of latent growth trajectory class membership on the probability of a binary disease outcome.
Abstract: Summary. This paper discusses the analysis of an extended finite mixture model where the latent classes corresponding to the mixture components for one set of observed variables influence a second set of observed variables. The research is motivated by a repeated measurement study using a random coefficient model to assess the influence of latent growth trajectory class membership on the probability of a binary disease outcome. More generally, this model can be seen as a combination of latent class modeling and conventional mixture modeling. The EM algorithm is used for estimation. As an illustration, a random-coefficient growth model for the prediction of alcohol dependence from three latent classes of heavy alcohol use trajectories among young adults is analyzed.

1,377 citations


Proceedings Article
29 Nov 1999
TL;DR: This paper presents an infinite Gaussian mixture model which neatly sidesteps the difficult problem of finding the "right" number of mixture components and uses an efficient parameter-free Markov Chain that relies entirely on Gibbs sampling.
Abstract: In a Bayesian mixture model it is not necessary a priori to limit the number of components to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the "right" number of mixture components. Inference in the model is done using an efficient parameter-free Markov Chain that relies entirely on Gibbs sampling.
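
As a rough illustration of the underlying idea, the sketch below runs a collapsed Gibbs sampler for a one-dimensional Dirichlet-process mixture of Gaussians with known observation variance and a conjugate Gaussian prior on component means. This is a simplified stand-in for the paper's fully hierarchical model; alpha, sigma2, and tau2 are illustrative hyperparameters.

import numpy as np

def dp_gmm_gibbs(x, iters=200, alpha=1.0, sigma2=1.0, tau2=10.0, seed=0):
    """Toy collapsed Gibbs sampler for a 1-D DP mixture of Gaussians.

    Known observation variance sigma2 and an N(0, tau2) prior on component means.
    Returns the final assignment of points to components.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    z = np.zeros(n, dtype=int)                       # start with a single component
    for _ in range(iters):
        for i in range(n):
            z[i] = -1                                # remove point i from its component
            labels, counts = np.unique(z[z >= 0], return_counts=True)
            logp = []
            for lab, cnt in zip(labels, counts):
                s = x[z == lab].sum()
                post_var = 1.0 / (cnt / sigma2 + 1.0 / tau2)
                post_mean = post_var * s / sigma2
                pred_var = post_var + sigma2         # posterior predictive variance
                logp.append(np.log(cnt) - 0.5 * np.log(2 * np.pi * pred_var)
                            - 0.5 * (x[i] - post_mean) ** 2 / pred_var)
            pred_var = tau2 + sigma2                 # predictive under a brand-new component
            logp.append(np.log(alpha) - 0.5 * np.log(2 * np.pi * pred_var)
                        - 0.5 * x[i] ** 2 / pred_var)
            logp = np.asarray(logp)
            p = np.exp(logp - logp.max())
            choice = int(rng.choice(len(p), p=p / p.sum()))
            z[i] = labels[choice] if choice < len(labels) else (labels.max() + 1 if len(labels) else 0)
        z = np.unique(z, return_inverse=True)[1]     # relabel components compactly
    return z

# Hypothetical usage: two well-separated groups should recover roughly two components.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-5, 1, 50), rng.normal(5, 1, 50)])
print(len(np.unique(dp_gmm_gibbs(data))))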

1,278 citations


Proceedings Article
29 Nov 1999
TL;DR: This paper presents a novel practical framework for Bayesian model averaging and model selection in probabilistic graphical models that approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner.
Abstract: This paper presents a novel practical framework for Bayesian model averaging and model selection in probabilistic graphical models. Our approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner. These posteriors fall out of a free-form optimization procedure, which naturally incorporates conjugate priors. Unlike in large sample approximations, the posteriors are generally non-Gaussian and no Hessian needs to be computed. Predictive quantities are obtained analytically. The resulting algorithm generalizes the standard Expectation Maximization algorithm, and its convergence is guaranteed. We demonstrate that this approach can be applied to a large class of models in several domains, including mixture models and source separation.
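
As a present-day analogue (not the paper's own algorithm or software), scikit-learn's BayesianGaussianMixture fits a Gaussian mixture by variational inference and lets the approximate posterior drive superfluous component weights toward zero, which gives the same flavor of built-in model selection:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Rough analogue of variational Bayesian mixture fitting: start with more
# components than needed and let the variational posterior prune the excess.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-4, 1, (200, 2)), rng.normal(4, 1, (200, 2))])

vb = BayesianGaussianMixture(n_components=10, weight_concentration_prior=0.1,
                             max_iter=500, random_state=0).fit(X)
print(np.round(vb.weights_, 3))   # most weights collapse toward zero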

870 citations


Proceedings ArticleDOI
17 Oct 1999
TL;DR: This work presents the first provably correct algorithm for learning a mixture of Gaussians, which returns the true centers of the Gaussians to within the precision specified by the user with high probability.
Abstract: Mixtures of Gaussians are among the most fundamental and widely used statistical models. Current techniques for learning such mixtures from data are local search heuristics with weak performance guarantees. We present the first provably correct algorithm for learning a mixture of Gaussians. This algorithm is very simple and returns the true centers of the Gaussians to within the precision specified by the user with high probability. It runs in time only linear in the dimension of the data and polynomial in the number of Gaussians.
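
The underlying recipe is to randomly project the data to a low-dimensional space, cluster there, and read the cluster centers back in the original space. The sketch below is a loose NumPy illustration of that project-then-cluster idea, using a naive farthest-point seeding; it is not the paper's algorithm and carries none of its guarantees.

import numpy as np

def project_and_estimate_centers(X, k, proj_dim, seed=0):
    """Loose 'project, cluster, map back' illustration for well-separated spherical Gaussians."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    R = rng.normal(size=(d, proj_dim)) / np.sqrt(proj_dim)   # Gaussian random projection
    Y = X @ R
    # Naive seeding in the projected space: pick k mutually far-apart points.
    seeds = [int(rng.integers(n))]
    for _ in range(k - 1):
        dist = np.min(np.linalg.norm(Y[:, None, :] - Y[seeds][None, :, :], axis=2), axis=1)
        seeds.append(int(np.argmax(dist)))
    centers_low = Y[seeds]
    labels = np.argmin(np.linalg.norm(Y[:, None, :] - centers_low[None, :, :], axis=2), axis=1)
    # Centers in the original space: average of the points assigned to each cluster.
    return np.vstack([X[labels == j].mean(axis=0) for j in range(k)])

# Hypothetical usage with three well-separated spherical Gaussians in 50 dimensions:
rng = np.random.default_rng(1)
true_centers = rng.normal(scale=10, size=(3, 50))
X = np.vstack([c + rng.normal(size=(200, 50)) for c in true_centers])
print(np.round(project_and_estimate_centers(X, k=3, proj_dim=5), 1)[:, :3])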

757 citations


Journal ArticleDOI
TL;DR: An expectation-maximization (EM) algorithm is presented, which performs unsupervised learning of an associated probabilistic model of the mixing situation and is shown to be superior to ICA since it can learn arbitrary source densities from the data.
Abstract: We introduce the independent factor analysis (IFA) method for recovering independent hidden sources from their observed mixtures. IFA generalizes and unifies ordinary factor analysis (FA), principal component analysis (PCA), and independent component analysis (ICA), and can handle not only square noiseless mixing but also the general case where the number of mixtures differs from the number of sources and the data are noisy. IFA is a two-step procedure. In the first step, the source densities, mixing matrix, and noise covariance are estimated from the observed data by maximum likelihood. For this purpose we present an expectation-maximization (EM) algorithm, which performs unsupervised learning of an associated probabilistic model of the mixing situation. Each source in our model is described by a mixture of gaussians; thus, all the probabilistic calculations can be performed analytically. In the second step, the sources are reconstructed from the observed data by an optimal nonlinear estimator. A variational approximation of this algorithm is derived for cases with a large number of sources, where the exact algorithm becomes intractable. Our IFA algorithm reduces to the one for ordinary FA when the sources become gaussian, and to an EM algorithm for PCA in the zero-noise limit. We derive an additional EM algorithm specifically for noiseless IFA. This algorithm is shown to be superior to ICA since it can learn arbitrary source densities from the data. Beyond blind separation, IFA can be used for modeling multidimensional data by a highly constrained mixture of gaussians and as a tool for nonlinear signal encoding.

573 citations


Journal ArticleDOI
TL;DR: MCLUST is a software package for cluster analysis written in Fortran and interfaced to the S-PLUS commercial software package, and includes functions that combine hierarchical clustering, EM, and the Bayesian Information Criterion (BIC) in a comprehensive clustering strategy.
Abstract: MCLUST is a software package for cluster analysis written in Fortran and interfaced to the S-PLUS commercial software package. It implements parameterized Gaussian hierarchical clustering algorithms and the EM algorithm for parameterized Gaussian mixture models, with the possible addition of a Poisson noise term. MCLUST also includes functions that combine hierarchical clustering, EM, and the Bayesian Information Criterion (BIC) in a comprehensive clustering strategy. Methods of this type have shown promise in a number of practical applications, including character recognition, tissue segmentation, minefield and seismic fault detection, identification of textile flaws from images, and classification of astronomical data. A web page with related links is also available.
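
MCLUST itself is Fortran interfaced to S-PLUS; an analogous model-based clustering loop in Python (scikit-learn standing in for MCLUST) that scores parameterized Gaussian mixtures by BIC might look like:

import numpy as np
from sklearn.mixture import GaussianMixture

# Analogue (scikit-learn, not MCLUST) of scoring parameterized Gaussian mixtures
# by BIC over a grid of component counts and covariance structures.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (150, 2)), rng.normal(3, 0.5, (150, 2))])

best = None
for k in range(1, 7):
    for cov in ("spherical", "diag", "full"):        # rough stand-in for MCLUST's model family
        gm = GaussianMixture(n_components=k, covariance_type=cov, random_state=0).fit(X)
        bic = gm.bic(X)                              # lower BIC is better in scikit-learn's convention
        if best is None or bic < best[0]:
            best = (bic, k, cov)
print("best BIC %.1f with k=%d, covariance=%s" % best)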

519 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider right-censored survival data for populations with a surviving (cure) fraction and propose a model that is quite different from the standard mixture model for cure rates.
Abstract: We consider Bayesian methods for right-censored survival data for populations with a surviving (cure) fraction. We propose a model that is quite different from the standard mixture model for cure rates. We provide a natural motivation and interpretation of the model and derive several novel properties of it. First, we show that the model has a proportional hazards structure, with the covariates depending naturally on the cure rate. Second, we derive several properties of the hazard function for the proposed model and establish mathematical relationships with the mixture model for cure rates. Prior elicitation is discussed in detail, and classes of noninformative and informative prior distributions are proposed. Several theoretical properties of the proposed priors and resulting posteriors are derived, and comparisons are made to the standard mixture model. A real dataset from a melanoma clinical trial is discussed in detail.
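
For reference, a promotion-time (non-mixture) cure model of this kind has, in our notation, population survival and hazard functions

S_{\mathrm{pop}}(t) = \exp\{-\theta F(t)\}, \qquad h_{\mathrm{pop}}(t) = \theta f(t), \qquad \lim_{t \to \infty} S_{\mathrm{pop}}(t) = e^{-\theta},

where F is a proper distribution function with density f. Covariates entering through \theta give the proportional hazards structure, and e^{-\theta} plays the role of the cure fraction, in contrast with the standard mixture formulation S(t) = \pi + (1 - \pi) S_0(t).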

444 citations


Journal ArticleDOI
TL;DR: The use of adaptive Gaussian mixtures to model the colour distributions of objects is described to perform robust, real-time tracking under varying illumination, viewing geometry and camera parameters.

368 citations


Journal ArticleDOI
TL;DR: In this article, the shape variation displayed by a class of objects can be represented as a probability density function, allowing us to determine plausible and implausible examples of the class using a mixture of gaussians.

Book
01 Nov 1999
TL;DR: Covers the theory of nonparametric mixture models, algorithms, the likelihood ratio test for the number of components, moment estimators of the variance of the mixing distribution, and C.A.MAN applications such as meta-analysis and disease mapping.
Abstract: Theory of nonparametric mixture models; algorithms; the likelihood ratio test for the number of components; C.A.MAN application: meta-analysis; moment estimators of the variance of the mixing distribution; C.A.MAN application: disease mapping; other C.A.MAN applications.

Journal ArticleDOI
TL;DR: This paper develops a topic-dependent, sentence-level mixture language model which takes advantage of the topic constraints in a sentence or article, and introduces topic-dependent dynamic adaptation techniques in the framework of the mixture model, using n-gram caches and content word unigram caches.
Abstract: Standard statistical language models use n-grams to capture local dependencies, or use dynamic modeling techniques to track dependencies within an article. In this paper, we investigate a new statistical language model that captures topic-related dependencies of words within and across sentences. First, we develop a topic-dependent, sentence-level mixture language model which takes advantage of the topic constraints in a sentence or article. Second, we introduce topic-dependent dynamic adaptation techniques in the framework of the mixture model, using n-gram caches and content word unigram caches. Experiments with the static (or unadapted) mixture model on the North American Business (NAB) task show a 21% reduction in perplexity and a 3-4% improvement in recognition accuracy over a general n-gram model, giving a larger gain than that obtained with supervised dynamic cache modeling. Further experiments on the Switchboard corpus also showed a small improvement in performance with the sentence-level mixture model. Cache modeling techniques introduced in the mixture framework contributed a further 14% reduction in perplexity and a small improvement in recognition accuracy on the NAB task for both supervised and unsupervised adaptation.
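
In our notation, the static sentence-level mixture assigns a word sequence w_1, ..., w_N the probability

P(w_1, \dots, w_N) = \sum_{k=1}^{K} \lambda_k \prod_{i=1}^{N} P_k(w_i \mid w_{i-n+1}, \dots, w_{i-1}),

i.e., topic-specific n-gram models are mixed at the sentence level rather than at the word level; the adaptive variants additionally interpolate each P_k with n-gram and content-word unigram cache estimates.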

Journal ArticleDOI
TL;DR: In this article, the authors consider mixtures of multivariate normals where the expected value for each component depends on possibly nonnormal regressor variables and the expected values and covariance matrices of the mixture components are parameterized using conditional mean- and covariance-structures.
Abstract: Models and parameters of finite mixtures of multivariate normal densities conditional on regressor variables are specified and estimated. We consider mixtures of multivariate normals where the expected value for each component depends on possibly nonnormal regressor variables. The expected values and covariance matrices of the mixture components are parameterized using conditional mean- and covariance-structures. We discuss the construction of the likelihood function, estimation of the mixture model with regressors using three different EM algorithms, estimation of the asymptotic covariance matrix of parameters and testing for the number of mixture components. In addition to simulation studies, data on food preferences are analyzed.
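
A compact EM sketch for the simplest special case, a mixture of univariate-response linear regressions with component-specific noise variances (the paper treats the general multivariate case with conditional mean and covariance structures); variable names are our own:

import numpy as np

def em_mixture_regression(X, y, k, iters=100, seed=0):
    """EM for a mixture of linear regressions y ~ N(X beta_j, s2_j), j = 1..k."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = rng.normal(size=(k, d))
    s2 = np.ones(k)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities of each component for each observation.
        resid = y[:, None] - X @ beta.T                                   # (n, k)
        logp = np.log(pi) - 0.5 * np.log(2 * np.pi * s2) - 0.5 * resid ** 2 / s2
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares and weighted variance per component.
        for j in range(k):
            w = r[:, j]
            Xw = X * w[:, None]
            beta[j] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
            s2[j] = np.sum(w * (y - X @ beta[j]) ** 2) / w.sum()
        pi = r.mean(axis=0)
    return beta, s2, pi

# Hypothetical usage: two regression lines with different slopes.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 400)
X = np.column_stack([np.ones_like(x), x])
y = np.where(rng.random(400) < 0.5, 2 + 3 * x, -1 - 2 * x) + 0.3 * rng.normal(size=400)
print(np.round(em_mixture_regression(X, y, k=2)[0], 2))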

Proceedings Article
29 Nov 1999
TL;DR: A new method for multivariate density estimation is developed based on the Support Vector Method (SVM) solution of inverse ill-posed problems that compared favorably to both Parzen's method and the Gaussian Mixture Model method.
Abstract: A new method for multivariate density estimation is developed based on the Support Vector Method (SVM) solution of inverse ill-posed problems. The solution has the form of a mixture of densities. This method with Gaussian kernels compared favorably to both Parzen's method and the Gaussian Mixture Model method. For synthetic data we achieve more accurate estimates for densities of 2, 6, 12, and 40 dimensions.

Journal ArticleDOI
TL;DR: The three-cluster result is found to be robust with respect to variations in data preprocessing and data analysis parameters, and thus yields clear evidence for three clusters in the NH 700-mb data.
Abstract: A mixture model is a flexible probability density estimation technique, consisting of a linear combination of k component densities. Such a model is applied to estimate clustering in Northern Hemisphere (NH) 700-mb geopotential height anomalies. A key feature of this approach is its ability to estimate a posterior probability distribution for k, the number of clusters, given the data and the model. The number of clusters that is most likely to fit the data is thus determined objectively. A dataset of 44 winters of NH 700-mb fields is projected onto its two leading empirical orthogonal functions (EOFs) and analyzed using mixtures of Gaussian components. Cross-validated likelihood is used to determine the best value of k, the number of clusters. The posterior probability so determined peaks at k = 3 and thus yields clear evidence for three clusters in the NH 700-mb data. The three-cluster result is found to be robust with respect to variations in data preprocessing and data analysis parameters. The...
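
The model-selection step, choosing k by cross-validated likelihood, can be mimicked with the held-out log-likelihood of a fitted Gaussian mixture; the sketch below uses scikit-learn and synthetic two-dimensional data standing in for the leading-EOF plane of the height anomalies.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

# Choose the number of mixture components k by cross-validated held-out log-likelihood.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.7, (120, 2)) for m in ((-2, 0), (2, 1), (0, -2))])

scores = {}
for k in range(1, 7):
    fold_scores = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        gm = GaussianMixture(n_components=k, random_state=0).fit(X[train])
        fold_scores.append(gm.score(X[test]))      # mean held-out log-likelihood per sample
    scores[k] = np.mean(fold_scores)
print(max(scores, key=scores.get), scores)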

Journal ArticleDOI
TL;DR: The model characterizes the sequence of measurements by assuming that its probability density function depends on the state of an underlying Markov chain, and the parameter vector includes distribution parameters and transition probabilities between the states.
Abstract: The analysis of routinely collected surveillance data is an important challenge in public health practice. We present a method based on a hidden Markov model for monitoring such time series. The model characterizes the sequence of measurements by assuming that its probability density function depends on the state of an underlying Markov chain. The parameter vector includes distribution parameters and transition probabilities between the states. Maximum likelihood estimates are obtained with a modified EM algorithm. Extensions are provided to take into account trend and seasonality in the data. The method is demonstrated on two examples: the first seeks to characterize influenza-like illness incidence rates with a mixture of Gaussian distributions, and the other, poliomyelitis counts with mixture of Poisson distributions. The results justify a wider use of this method for analysing surveillance data.
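
A minimal sketch of the likelihood computation for the Poisson-emission case, via the scaled forward algorithm; the transition matrix, rates, and counts below are illustrative, and the trend and seasonality extensions are omitted.

import numpy as np
from scipy.stats import poisson

def hmm_poisson_loglik(counts, trans, rates, init):
    """Log-likelihood of count data under an HMM with Poisson emissions (scaled forward pass)."""
    alpha = init * poisson.pmf(counts[0], rates)
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for y in counts[1:]:
        alpha = (alpha @ trans) * poisson.pmf(y, rates)   # predict, then weight by emission
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik

# Illustrative two-state model: an endemic state (low rate) and an epidemic state (high rate).
trans = np.array([[0.95, 0.05],
                  [0.20, 0.80]])
rates = np.array([2.0, 15.0])
init = np.array([0.9, 0.1])
counts = np.array([1, 3, 2, 0, 4, 18, 22, 17, 5, 2, 1])
print(hmm_poisson_loglik(counts, trans, rates, init))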

Journal ArticleDOI
TL;DR: A Bayesian nonparametric procedure for density estimation, for data in a closed, bounded interval, say [0,1], using a prior based on Bernstein polynomials to express the density as a mixture of given beta densities, with random weights and a random number of components.
Abstract: We propose a Bayesian nonparametric procedure for density estimation, for data in a closed, bounded interval, say [0,1]. To this aim, we use a prior based on Bernstein polynomials. This corresponds to expressing the density of the data as a mixture of given beta densities, with random weights and a random number of components. The density estimate is then obtained as the corresponding predictive density function. Comparison with classical and Bayesian kernel estimates is provided. The proposed procedure is illustrated in an example; an MCMC algorithm for approximating the estimate is also discussed.
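
In this construction the density on [0,1] is, in our notation,

f(x) = \sum_{j=1}^{k} w_{j,k}\, \mathrm{Beta}(x \mid j,\; k - j + 1), \qquad w_{j,k} \ge 0, \quad \sum_{j=1}^{k} w_{j,k} = 1,

with the weights obtained as increments of a random distribution function, the order k given its own prior, and the density estimate taken as the posterior predictive density.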

Journal ArticleDOI
TL;DR: An algorithm called EMMIX is described that automatically undertakes the fitting of normal or t-component mixture models to multivariate data, using maximum likelihood via the EM algorithm, including the provision of suitable initial values if not supplied by the user.
Abstract: We consider the fitting of normal or t-component mixture models to multivariate data, using maximum likelihood via the EM algorithm. This approach requires the specification of an initial estimate of the vector of unknown parameters, or equivalently, of an initial classification of the data with respect to the components of the mixture model under fit. We describe an algorithm called EMMIX that automatically undertakes this fitting, including the provision of suitable initial values if not supplied by the user. The EMMIX algorithm has several options, including the option to carry out a resampling-based test for the number of components in the mixture model.

Journal ArticleDOI
TL;DR: In this paper, a modified entropy criterion for choosing the number of clusters arising from a mixture model is presented, extending the original criterion, which is not valid for deciding between one cluster and more than one.

Journal ArticleDOI
TL;DR: This paper describes an approach based on a relatively new technique, support vector machines (SVMs), and contrasts this with more established algorithms such as linear spectral mixture models (LSMM) and artificial neural networks (ANN).

Journal ArticleDOI
TL;DR: In this article, three approaches for reliability modelling of continuous state devices are presented: one uses a random process to fit model parameters of a statistical distribution as functions of time, the second uses the general path model to fit parameters of the model as functions of time, and the third uses multiple linear regression to fit the distribution of lifetime directly.
Abstract: Three approaches for reliability modelling of continuous state devices are presented in this paper. One uses the random process to fit model parameters of a statistical distribution as functions of time. This approach allows the data set to be from any general distribution. The second approach uses the general path model to fit parameters of the model as functions of time. The relationship between the random process model and the general path model is illustrated. The third approach uses multiple linear regression to fit the distribution of lifetime directly. This approach has less restriction on the degradation data to be analyzed. All three approaches are illustrated with examples. Finally a mixture model is proposed which can be used to model both catastrophic failures and degradation failures. This mixture model also shows engineers how to design experiments to collect both hard failure data and soft failure data. Topics for further investigation in continuous device reliability modelling include further investigation of the mixture model, application of these models to practical situations, and using complex statistical distributions to fit degradation data.

Journal ArticleDOI
TL;DR: Comparative analysis of various algorithms that use data from the Landsat Thematic Mapper (TM) satellite to estimate mixtures of vegetation types within forest stands concludes that the new ARTMAP mixture system produces the most accurate overall results.

Journal ArticleDOI
TL;DR: In this paper, a class of Bayesian multiscale models (BMSM's) for one-dimensional inhomogeneous Poisson processes is introduced, where the focus is on estimating the (discretized) intensity function underlying the process.
Abstract: I introduce a class of Bayesian multiscale models (BMSM's) for one-dimensional inhomogeneous Poisson processes. The focus is on estimating the (discretized) intensity function underlying the process. Unlike the usual transform-based approach at the heart of most wavelet-based methods for Gaussian data, these BMSM's are constructed using recursive dyadic partitions (RDP's) within an entirely likelihood-based framework. Each RDP may be associated with a binary tree, and a new multiscale prior distribution is introduced for the unknown intensity through the placement of mixture distributions at each of the nodes of the tree. The concept of model mixing is then applied to a complete collection of such trees. In addition to allowing for the inclusion of full location/scale information in the model, this last step also is fundamental both in inducing stationarity in the prior distribution and in enabling a given intensity function to be approximated at the resolution of the data. Under squared-error lo...

Book ChapterDOI
26 Jul 1999
TL;DR: This paper proposes a new minimum description length (MDL) type criterion, termed MMDL (for mixture MDL), to select the number of components of the model, based on the identification of an "equivalent sample size" for each component, which does not coincide with the full sample size.
Abstract: Consider the problem of fitting a finite Gaussian mixture, with an unknown number of components, to observed data. This paper proposes a new minimum description length (MDL) type criterion, termed MMDL (for mixture MDL), to select the number of components of the model. MMDL is based on the identification of an "equivalent sample size", for each component, which does not coincide with the full sample size. We also introduce an algorithm based on the standard expectation-maximization (EM) approach together with a new agglomerative step, called agglomerative EM (AEM). The experiments here reported have shown that MMDL outperforms existing criteria of comparable computational cost. The good behavior of AEM, namely its good robustness with respect to initialization, is also illustrated experimentally.

Proceedings Article
01 Jan 1999
TL;DR: In this article, a spectral-domain speech enhancement algorithm is proposed based on a mixture model for the short-time spectrum of the clean speech signal, and on a maximum assumption in the production of the noisy speech spectrum.
Abstract: We present a spectral-domain speech enhancement algorithm. The new algorithm is based on a mixture model for the short-time spectrum of the clean speech signal, and on a maximum assumption in the production of the noisy speech spectrum. In the past this model was used in the context of noise robust speech recognition. In this paper we show that this model is also effective for improving the quality of speech signals corrupted by additive noise. The computational requirements of the algorithm can be significantly reduced, essentially without paying performance penalties, by incorporating a dual codebook scheme with tied variances. Experiments, using recorded speech signals and actual noise sources, show that in spite of its low computational requirements, the algorithm shows improved performance compared to alternative speech enhancement algorithms.


Journal ArticleDOI
TL;DR: Spectral matching and linear mixture modeling techniques have been applied to synthetic imagery and AVIRIS SWIR imagery of a semi-arid rangeland in order to determine their effectiveness as mapping tools, the synergism between the two methods, and their advantages and limitations for rangeland resource exploitation and management, as mentioned in this paper.

Proceedings Article
29 Nov 1999
TL;DR: This work describes a method for learning an overcomplete set of basis functions for the purpose of modeling sparse structure in images and shows that when the prior is in such a form, there exist efficient methods for learning the basis functions as well as the parameters of the prior.
Abstract: We describe a method for learning an overcomplete set of basis functions for the purpose of modeling sparse structure in images. The sparsity of the basis function coefficients is modeled with a mixture-of-Gaussians distribution. One Gaussian captures nonactive coefficients with a small-variance distribution centered at zero, while one or more other Gaussians capture active coefficients with a large-variance distribution. We show that when the prior is in such a form, there exist efficient methods for learning the basis functions as well as the parameters of the prior. The performance of the algorithm is demonstrated on a number of test cases and also on natural images. The basis functions learned on natural images are similar to those obtained with other methods, but the sparse form of the coefficient distribution is much better described. Also, since the parameters of the prior are adapted to the data, no assumption about sparse structure in the images need be made a priori, rather it is learned from the data.
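
Written out, the coefficient prior described above is, for each coefficient a_i (our notation),

p(a_i) = \pi_i\, \mathcal{N}(a_i \mid 0, \sigma^2_{0}) + (1 - \pi_i)\, \mathcal{N}(a_i \mid 0, \sigma^2_{1}), \qquad \sigma^2_{0} \ll \sigma^2_{1},

with the small-variance component capturing inactive coefficients and one or more large-variance components capturing active ones; both the basis functions and these prior parameters are adapted to the data.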