
Showing papers on "Mixture model published in 1993"


Journal ArticleDOI
TL;DR: In this paper, the Gibbs sampler is used to indirectly sample from the multinomial posterior distribution on the set of possible subset choices to identify the promising subsets by their more frequent appearance in the Gibbs sample.
Abstract: A crucial problem in building a multiple regression model is the selection of predictors to include. The main thrust of this article is to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets. This procedure entails embedding the regression setup in a hierarchical normal mixture model where latent variables are used to identify subset choices. In this framework the promising subsets of predictors can be identified as those with higher posterior probability. The computational burden is then alleviated by using the Gibbs sampler to indirectly sample from this multinomial posterior distribution on the set of possible subset choices. Those subsets with higher probability—the promising ones—can then be identified by their more frequent appearance in the Gibbs sample.
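
To make the latent-variable idea concrete, here is a minimal, illustrative Gibbs sketch in the spirit of this stochastic search variable selection procedure, not the authors' implementation: coefficients get a two-component normal ("spike"/"slab") prior selected by binary indicators, and the sampler alternates between coefficients and indicators. All tuning constants and variable names below are assumptions for the toy example.

```python
# Toy spike-and-slab Gibbs sketch (illustrative constants, known error variance).
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

sigma2, tau2, c2, prior_pi = 1.0, 0.01, 100.0, 0.5
gamma = np.ones(p, dtype=int)          # 1 = "slab" (variable in the model)
counts = np.zeros(p)

def norm_pdf(x, var):
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2 * np.pi * var)

for it in range(2000):
    # beta | gamma, y ~ N(m, V) with a diagonal spike/slab prior variance
    d = np.where(gamma == 1, c2 * tau2, tau2)
    V = np.linalg.inv(X.T @ X / sigma2 + np.diag(1.0 / d))
    m = V @ X.T @ y / sigma2
    beta = rng.multivariate_normal(m, V)
    # gamma_j | beta_j: Bernoulli with odds given by the two prior densities
    slab = prior_pi * norm_pdf(beta, c2 * tau2)
    spike = (1 - prior_pi) * norm_pdf(beta, tau2)
    gamma = (rng.uniform(size=p) < slab / (slab + spike)).astype(int)
    if it >= 500:                      # discard burn-in
        counts += gamma

# Promising subsets appear as frequently selected indicators in the sample.
print("posterior inclusion frequencies:", counts / 1500)
```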

2,780 citations


Book ChapterDOI
01 Aug 1993
TL;DR: An expectation-maximization (EM) algorithm for adjusting the parameters of the tree-structured architecture for supervised learning is presented and an online learning algorithm in which the parameters are updated incrementally is developed.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an expectation-maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an online learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.
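
A one-level mixture-of-experts EM in the spirit of this architecture can be sketched briefly; the paper's model is a full tree of gates with GLIM components, so this flat version with linear-Gaussian experts, a softmax gate updated by gradient steps, and made-up data is a simplification.

```python
# Flat (one-level) mixture-of-experts EM sketch with two linear experts.
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-3, 3, size=n)
y = np.where(x < 0, 2 * x + 1, -x + 4) + 0.3 * rng.normal(size=n)
X = np.column_stack([x, np.ones(n)])             # inputs with a bias column

K = 2
W = rng.normal(size=(K, 2))                      # expert regression weights
V = np.zeros((K, 2))                             # gating-network weights
sigma2 = 1.0

for _ in range(100):
    # E-step: posterior responsibility of each expert for each point
    gate = np.exp(X @ V.T)
    gate /= gate.sum(axis=1, keepdims=True)
    lik = np.exp(-0.5 * (y[:, None] - X @ W.T) ** 2 / sigma2)
    h = gate * lik
    h /= h.sum(axis=1, keepdims=True)
    # M-step: weighted least squares per expert, gradient step for the gate
    for k in range(K):
        Xw = X * h[:, k:k+1]
        W[k] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    sigma2 = np.sum(h * (y[:, None] - X @ W.T) ** 2) / n
    V += 0.1 * (h - gate).T @ X / n              # ascend expected log-likelihood

print("learned expert weights (slope, intercept per expert):\n", W)
```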

1,689 citations


Proceedings ArticleDOI
15 Jun 1993
TL;DR: A new approach based on the use of a probabilistic mixture model to explicitly represent multiple motions within a patch is presented, which can provide robust estimates of the optical flow values in the presence of outliers and multiple motions.
Abstract: The computation of optical flow relies on merging information available over an image patch to form an estimate of 2-D image velocity at a point. This merging process raises many issues. These include the treatment of outliers in component velocity measurements and the modeling of multiple motions within a patch which arise from occlusion boundaries or transparency. A new approach for dealing with these issues is presented. It is based on the use of a probabilistic mixture model to explicitly represent multiple motions within a patch. A simple extension of the EM-algorithm is used to compute a maximum likelihood estimate for the various motion parameters. Preliminary experiments indicate that this approach is computationally efficient, and that it can provide robust estimates of the optical flow values in the presence of outliers and multiple motions.
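
A toy EM for a mixture of two translational motions plus a uniform outlier component makes the idea concrete; the actual method works from component-velocity constraints within a patch, so the noisy 2-D velocity samples below are a simplifying assumption.

```python
# Illustrative EM: two translational motions plus a uniform outlier component.
import numpy as np

rng = np.random.default_rng(2)
v = np.vstack([rng.normal([1.0, 0.0], 0.1, size=(60, 2)),    # motion A
               rng.normal([-0.5, 0.8], 0.1, size=(40, 2)),   # motion B
               rng.uniform(-3, 3, size=(10, 2))])            # outliers

mu = np.array([[0.5, 0.5], [-0.5, -0.5]])   # initial motion estimates
pi = np.array([0.45, 0.45, 0.10])           # two motions + outlier weight
sigma2, outlier_density = 0.05, 1.0 / 36.0  # uniform density over [-3,3]^2

for _ in range(50):
    # E-step: ownership probabilities for each velocity sample
    d2 = ((v[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    p = np.empty((len(v), 3))
    p[:, :2] = pi[:2] * np.exp(-0.5 * d2 / sigma2) / (2 * np.pi * sigma2)
    p[:, 2] = pi[2] * outlier_density
    p /= p.sum(axis=1, keepdims=True)
    # M-step: each motion is the ownership-weighted mean velocity
    for k in range(2):
        mu[k] = (p[:, k:k+1] * v).sum(0) / p[:, k].sum()
    pi = p.mean(axis=0)

print("recovered motions:", mu, "mixing weights:", pi)
```

The explicit outlier component is what gives the robustness the abstract emphasizes: stray measurements are absorbed by the uniform density instead of dragging the motion estimates.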

278 citations


Book ChapterDOI
01 Jan 1993
TL;DR: The informational complexity (ICOMP) criterion of IFIM of this author is derived and proposed as a new criterion for choosing the number of clusters in the mixture-model and the significance of ICOMP is illustrated.
Abstract: This paper considers the problem of choosing the number of component clusters of individuals within the context of the standard mixture of multivariate normal distributions. Often the number of mixture clusters K is unknown but varying, and needs to be estimated. A two-stage iterative maximum-likelihood procedure is used as a clustering criterion to estimate the parameters of the mixture-model under several different covariance structures. An approximate component-wise inverse-Fisher information matrix (IFIM) for the mixture-model is obtained. Then the informational complexity (ICOMP) criterion of IFIM of this author (Bozdogan 1988, 1990a, 1990b) is derived and proposed as a new criterion for choosing the number of clusters in the mixture-model. For comparative purposes, Akaike’s (1973) information criterion (AIC) and Rissanen’s (1978) minimum description length (MDL) criterion are also introduced and derived for the mixture-model. Numerical examples are shown on simulated multivariate normal data sets with a known number of mixture clusters to illustrate the significance of ICOMP in choosing the number of clusters and the best fitting model.
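
ICOMP itself is not available in standard libraries, but the comparison loop over candidate K with AIC and MDL (which coincides with BIC for this purpose) is easy to reproduce; a sketch using scikit-learn's GaussianMixture on simulated data with a known number of clusters:

```python
# Select the number of mixture clusters K by minimizing AIC / BIC (= MDL here).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 0.5, size=(100, 2))
               for m in ([0, 0], [3, 3], [0, 4])])   # true K = 3

for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    print(f"K={k}  AIC={gm.aic(X):9.1f}  BIC/MDL={gm.bic(X):9.1f}")
# The K minimizing each criterion is the selected number of clusters.
```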

191 citations


Proceedings ArticleDOI
27 Apr 1993
TL;DR: A segmental speech model is used to develop a secondary processing algorithm that rescores putative events hypothesized by a primary HMM word spotter to try to improve performance by discriminating true keywords from false alarms.
Abstract: The authors present a segmental speech model that explicitly models the dynamics in a variable-duration speech segment by using a time-varying trajectory model of the speech features in the segment. Each speech segment is represented by a set of statistics which includes a time-varying trajectory, a residual error covariance around the trajectory, and the number of frames in the segment. These statistics replace the frames in the segment and become the data that are modeled by either HMMs (hidden Markov models) or mixture models. This segment model is used to develop a secondary processing algorithm that rescores putative events hypothesized by a primary HMM word spotter to try to improve performance by discriminating true keywords from false alarms. This algorithm is evaluated on a keyword spotting task using the Road Rally Database, and performance is shown to improve significantly over that of the primary word spotter. The segmental model is also used on a TIMIT vowel classification task to evaluate its modeling capability.
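
The segment statistics described above can be sketched under illustrative assumptions (a linear trajectory basis and time normalized to [0, 1] are choices made here, not necessarily the paper's):

```python
# Summarize a variable-length segment of feature frames by a fitted trajectory,
# the residual covariance around it, and the frame count.
import numpy as np

def segment_stats(frames):
    """frames: (n_frames, n_dims) array for one variable-length segment."""
    n = len(frames)
    t = np.linspace(0.0, 1.0, n)                     # normalized time in segment
    T = np.column_stack([np.ones(n), t])             # linear trajectory basis
    B, *_ = np.linalg.lstsq(T, frames, rcond=None)   # per-dim intercept/slope
    resid = frames - T @ B
    cov = resid.T @ resid / max(n - 2, 1)            # residual covariance
    return B, cov, n                                 # the data the models see

stats = segment_stats(np.random.default_rng(4).normal(size=(17, 12)))
print([s.shape if hasattr(s, "shape") else s for s in stats])
```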

125 citations


Journal ArticleDOI
TL;DR: A latent distribution model is presented that includes parameters that characterize bias, category definitions, and measurement error for each rater or test, and provides a general approach for mixture analysis using two or more ordered-category measures.
Abstract: This article presents a latent distribution model for the analysis of agreement on dichotomous or ordered category ratings. The model includes parameters that characterize bias, category definitions, and measurement error for each rater or test. Parameter estimates can be used to evaluate rater performance and to improve classification or measurement with use of multiple ratings. A simple maximum likelihood estimation procedure is described. Two examples illustrate the approach. Although considered in the context of analyzing rater agreement, the model provides a general approach for mixture analysis using two or more ordered-category measures.

94 citations


Journal ArticleDOI
TL;DR: By this approach the finite mixture model is embedded within the general framework of generalized linear models (GLMs) and the proposed EM algorithm can be readily done in statistical packages with facilities for GLMs.
Abstract: A generalized linear finite mixture model and an EM algorithm to fit the model to data are described. By this approach the finite mixture model is embedded within the general framework of generalized linear models (GLMs). Implementation of the proposed EM algorithm can be readily done in statistical packages with facilities for GLMs. A practical example is presented where a generalized linear finite mixture model of ten Weibull distributions is adopted. The example is concerned with the flow cytometric measurement of the DNA content of spermatids in a mutant mouse, which shows non-disjunction of specific chromosomes during meiosis.
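
The E/M structure the authors describe, where each M-step is a weighted GLM fit in a standard package, can be sketched with statsmodels; Weibull components as in the paper's example are not a stock GLM family there, so this illustration uses a two-component Poisson mixture, keeping the same EM-around-GLM structure.

```python
# EM for a two-component finite mixture where each M-step is a weighted GLM fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 1, n)
X = sm.add_constant(x)                              # columns: [const, x]
z = rng.uniform(size=n) < 0.4                       # latent component labels
y = rng.poisson(np.exp(np.where(z, 0.5 + 2 * x, 2.0 - x)))

beta = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
pi = np.array([0.5, 0.5])
for _ in range(30):
    # E-step: responsibilities from current fits (y! omitted: it cancels)
    mu = [np.exp(X @ b) for b in beta]
    dens = np.column_stack([pi[k] * np.exp(-mu[k]) * mu[k] ** y
                            for k in range(2)])
    w = dens / dens.sum(axis=1, keepdims=True)
    # M-step: one responsibility-weighted Poisson GLM per component
    for k in range(2):
        fit = sm.GLM(y, X, family=sm.families.Poisson(),
                     freq_weights=w[:, k]).fit()
        beta[k] = fit.params
    pi = w.mean(axis=0)

print("mixing weights:", pi)
```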

84 citations


Journal ArticleDOI
TL;DR: An alternative approach using mixture models to identify population heterogeneity and map construction within an empirical Bayes framework is described and a map is presented for hepatitis B data from Berlin in 1989.
Abstract: The analysis and recognition of disease clustering in space and its representation on a map is one of the oldest problems in epidemiology. Some traditional methods of constructing such a map are presented. An alternative approach using mixture models to identify population heterogeneity and map construction within an empirical Bayes framework is described. For hepatitis B data from Berlin in 1989, a map is presented and the different methods are evaluated using a parametric bootstrap approach.

79 citations


Journal ArticleDOI
TL;DR: The asymptotic performance of the recursive, nonparametric method, dubbed “adaptive mixtures” for its data-driven development of a mixture model approximation to the true density, is investigated using the method of sieves.

70 citations


Journal ArticleDOI
TL;DR: Two new models that handle surfaces with discontinuities are proposed: the first detects and rejects discontinuities, while the second develops a mixture of expert interpolators that learns to invoke specialized, asymmetric interpolators that do not cross the discontinuities.
Abstract: We have previously described an unsupervised learning procedure that discovers spatially coherent properties of the world by maximizing the information that parameters extracted from different parts of the sensory input convey about some common underlying cause. When given random dot stereograms of curved surfaces, this procedure learns to extract surface depth because that is the property that is coherent across space. It also learns how to interpolate the depth at one location from the depths at nearby locations (Becker and Hinton 1992b). In this paper, we propose two new models that handle surfaces with discontinuities. The first model attempts to detect cases of discontinuities and reject them. The second model develops a mixture of expert interpolators. It learns to detect the locations of discontinuities and to invoke specialized, asymmetric interpolators that do not cross the discontinuities.

58 citations


Journal ArticleDOI
TL;DR: The semicontinuous hidden Markov model was extended to incorporate multiple code-books and it was found that the SCHMM can have a large number of free parameters in comparison with the discrete HMM because of its smoothing ability.

Journal ArticleDOI
TL;DR: In this paper, a mixture of Laplace and Weibull distributions is used to model price changes in real estate prices in France and a statistical inference for such mixture models is given.
Abstract: B. Mandelbrot and E. Fama in the sixties, and W. Ziemba in the seventies, suggested stable laws for modeling stock returns and commodity prices. Geometric stable distributions, with Laplace distribution playing the role of a "normal" law, have been found to give better fit to such data. We study the "stability" properties of Laplace and a mixture of Laplace and Weibull and discuss the statistical inference for such mixture models. Application of the mixture distribution to modeling price changes in real estate prices in France is given.

Journal ArticleDOI
TL;DR: In this paper, a flexible class of stochastic mixture models for the analysis and interpretation of individual differences in recurrent choice and other types of count data is introduced, which are derived by specifying elements of the choice process at the individual level.
Abstract: This paper introduces a flexible class of stochastic mixture models for the analysis and interpretation of individual differences in recurrent choice and other types of count data. These choice models are derived by specifying elements of the choice process at the individual level. Probability distributions are introduced to describe variations in the choice process among individuals and to obtain a representation of the aggregate choice behavior. Due to the explicit consideration of random effect sources, the choice models are parsimonious and readily interpretable. An easy to implement EM algorithm is presented for parameter estimation. Two applications illustrate the proposed approach.

Journal ArticleDOI
TL;DR: This expository report describes a novel approach to the unconstrained identification of components within a mixture, and demonstrates the usefulness of the technique in the context of both simulations and the analysis of distributions of synaptic potential signals.

Proceedings Article
Eric Saund
29 Nov 1993
TL;DR: This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data, and demonstrates the algorithm's ability to discover coherent multiple causal representations in noisy test data and in images of printed characters.
Abstract: This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the standard mixture model, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. A crucial issue is the mixing-function for combining beliefs from different cluster-centers in order to generate data reconstructions whose errors are minimized both during recognition and learning. We demonstrate a weakness inherent to the popular weighted sum followed by sigmoid squashing, and offer an alternative form of the nonlinearity. Results are presented demonstrating the algorithm's ability to discover coherent multiple causal representations in noisy test data and in images of printed characters.
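
The paper derives its own alternative nonlinearity, which is not reproduced here; a noisy-OR combination is one standard OR-like mixing function that at least illustrates the contrast with a weighted sum followed by a sigmoid.

```python
# Contrast two mixing functions for combining assertions from hidden causes.
import numpy as np

def sigmoid_mix(m, C):
    # weighted sum + sigmoid: a pixel a single cause asserts at 0.9 is
    # reconstructed as sigmoid(0.9) ~ 0.71, and agreement adds linearly
    return 1.0 / (1.0 + np.exp(-(m @ C)))

def noisy_or_mix(m, C):
    # each active cause independently "turns on" a pixel; activity saturates
    return 1.0 - np.prod(1.0 - m[:, None] * C, axis=0)

C = np.array([[0.9, 0.9, 0.0],
              [0.0, 0.9, 0.9]])   # two causes, each asserting two of 3 pixels
m = np.array([1.0, 1.0])          # both causes fully active
print(sigmoid_mix(m, C))   # asserted pixels come out ~0.71, agreement ~0.86
print(noisy_or_mix(m, C))  # asserted pixels come out at 0.9, agreement 0.99
```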

Journal ArticleDOI
TL;DR: In this paper, the basic properties of multivariate finite mixture distributions are summarized and a distinction between models in which a gradient exists in at least one component and those in which there are no gradients within individual stellar population components is made.
Abstract: Multivariate finite mixture models provide a framework for analyzing and interpreting observational data that arise from the overlapping spatial, kinematical, chemical, and age distributions of stellar populations. In this paper the basic properties of multivariate finite mixture distributions are summarized. Particular emphasis is placed on the interpretation of observed overall gradients, and on the distinction between models in which a gradient exists in at least one component and those in which there are no gradients within individual stellar population components. Central to this discussion are posterior mixing proportions, which describe how the mixing proportions for an observed sample of stars vary as functions of metal abundance, age, etc., and conditional mixture distributions, which describe the relationships that exist among the variables.
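
Posterior mixing proportions are simply the posterior component probabilities viewed as a function of the observed variable; a sketch for a hypothetical two-population model of a single abundance variable (all parameter values invented for illustration):

```python
# Posterior mixing proportion curve for a two-component 1-D normal mixture.
import numpy as np

def norm_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

feh = np.linspace(-3, 1, 9)                          # metallicity grid
pi, mu, sd = [0.3, 0.7], [-1.5, -0.2], [0.4, 0.3]    # halo-like vs disk-like
num = pi[0] * norm_pdf(feh, mu[0], sd[0])
den = num + pi[1] * norm_pdf(feh, mu[1], sd[1])
for f, p in zip(feh, num / den):
    print(f"[Fe/H]={f:5.1f}  P(population 1 | data)={p:.3f}")
```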

Proceedings ArticleDOI
27 Apr 1993
TL;DR: A mechanism for implementing mixtures at a phone-subsegment (microsegment) level for continuous word recognition based on the stochastic segment model (SSM) is presented, and results suggest that there is a tradeoff in using mixture models and trajectory models, associated with the level of detail of the modeling unit.
Abstract: A mechanism for implementing mixtures at a phone-subsegment (microsegment) level for continuous word recognition based on the stochastic segment model (SSM) is presented. The tradeoffs between trajectory and mixture modeling in segment-based word recognition are investigated. Experimental results are reported on DARPA's speaker-independent Resource Management corpus. The results obtained suggest that there is a tradeoff in using mixture models and trajectory models, associated with the level of detail of the modeling unit. The results support the use of whole segment models in the context-dependent case, and microsegment-level (and possibly segment-level) mixtures rather than frame-level mixtures.

Journal ArticleDOI
TL;DR: In this paper, an EM-algorithm is proposed to study a mixture model for the analysis of count data and iterative procedures for estimating the parameters are given for various discrete distributions including Binomial, Negative Binomial and Poisson.

Posted Content
TL;DR: In this paper, an approach is developed that accommodates heterogeneity in Poisson regression models for count data, assuming that heterogeneity arises from a distribution of both the intercept and the coefficients of the explanatory variables.
Abstract: In this paper an approach is developed that accommodates heterogeneity in Poisson regression models for count data. The model developed assumes that heterogeneity arises from a distribution of both the intercept and the coefficients of the explanatory variables. We assume that the mixing distribution is discrete, resulting in a finite mixture model formulation. An EM algorithm for estimation is described, and the algorithm is applied to data on customer purchases of books offered through direct mail. Our model is compared empirically to a number of other approaches that deal with heterogeneity in Poisson regression models.

Journal ArticleDOI
TL;DR: The authors compared the performance of a recently proposed multiprocess mixture model and a more traditional random walk time-varying parameter model in the face of structural shifts and outliers.
Abstract: This Monte Carlo study compares the performance of a recently proposed multiprocess mixture model and a more traditional random walk time-varying parameter model in the face of structural shifts and outliers. The mixture model performs well and the latter model performs poorly. This finding is of general interest since investigators often adopt random-walk time-varying parameter models to accommodate potential regime shifts in regression relationships. The findings suggest that the time-varying parameter estimation procedure is unlikely to find abrupt shifts, since the time-varying parameter estimates are contaminated by the outliers and regime shifts.

Journal ArticleDOI
01 Dec 1993-Test
TL;DR: Prior Feedback as discussed by the authors uses conjugate priors on each component of the mixture and is called Prior Feedback because the hyperparameters of these conjugate priors are iteratively replaced by the corresponding posterior values until convergence is attained.
Abstract: In this paper, we show how Gibbs sampling can provide a reliable approximation for Bayesian estimation of the parameters of a mixture distribution. Moreover, we deduce from the Bayesian approach an alternative derivation of maximum likelihood estimators in this setting, where standard noninformative approaches do not apply. Our method uses conjugate priors on each component of the mixture and is called Prior Feedback because the hyperparameters of these conjugate priors are iteratively replaced by the corresponding posterior values until convergence is attained. We illustrate the appeal of this method through an astrophysical example, where the small sample size prohibits the use of standard maximum likelihood methods. A second example shows that Prior Feedback is also able to reject an unrealistic mixture model.
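
Prior Feedback in its simplest conjugate setting, a normal mean with known variance, is a few lines: the prior hyperparameters are repeatedly replaced by their posterior values, and the iteration drives the prior variance to zero and the prior mean to the MLE (here, the sample mean). A minimal sketch with made-up data:

```python
# Prior Feedback iteration for a normal mean with a conjugate normal prior.
import numpy as np

x = np.array([1.2, 0.7, 1.9, 1.4, 0.8])
sigma2 = 1.0                       # known observation variance
m, v = 0.0, 10.0                   # initial prior hyperparameters

for _ in range(100):
    # replace (m, v) by the posterior mean and variance given the data
    prec = 1.0 / v + len(x) / sigma2
    m, v = (m / v + x.sum() / sigma2) / prec, 1.0 / prec

print("Prior Feedback limit:", m, " sample mean (MLE):", x.mean())
```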

01 Jan 1993
TL;DR: In this article, an unconstrained step-wise regression method was proposed to analyze the mixture of spectral signatures that contribute to a pixel response, made possible by a redesign of the design matrix.
Abstract: The linear mixing model, used to analyze the mixture of spectral signatures that contribute to a pixel response, has been considered by a number of investigators. Shimabukuro and Smith [1] address the issue of numerical solutions to the problem of constraining the mixture proportions to the zero-one interval and ensuring that the sum of proportions is equal to one. This note suggests a method whereby unconstrained step-wise regression may be used as an alternative to constrained regression, made possible by a redesign of the design matrix. In addition, each constraint represents a testable hypothesis, providing information that may be used in endmember or component selection, or in evaluating the overall model fit. The linear mixture model (using general regression terminology) is given by y = Xβ + ε, subject to the constraints that the elements of β are nonnegative and sum to one.
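
A minimal sketch of one such redesign, with hypothetical endmember data: substituting b_p = 1 - (b_1 + ... + b_{p-1}) removes the sum-to-one constraint from the fit, which becomes unconstrained regression on differenced columns.

```python
# Unconstrained regression for the sum-to-one linear mixing model.
import numpy as np

rng = np.random.default_rng(6)
S = rng.uniform(0, 1, size=(8, 3))          # 8 bands x 3 endmember spectra
b_true = np.array([0.2, 0.5, 0.3])          # proportions summing to one
y = S @ b_true + 0.01 * rng.normal(size=8)  # observed pixel spectrum

D = S[:, :2] - S[:, 2:3]                    # differenced design matrix
b12, *_ = np.linalg.lstsq(D, y - S[:, 2], rcond=None)
b = np.append(b12, 1.0 - b12.sum())         # proportions sum to one exactly
print("estimated proportions:", b)
```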

Proceedings ArticleDOI
25 Oct 1993
TL;DR: Though the learning is based on the winner-take-all scheme, one of the novelties of BAYESNET lies in the nondeterministic winner selection: unlike the usual nearest-neighbor selection, the network selects the winner according to the probability obtained by the current estimation.
Abstract: This paper proposes a new approach to pattern classification problems using a neural network, BAYESNET. The network is designed to identify the class of an unlabelled pattern using the Bayesian decision theory. Since the theory explicitly requires the information of pattern distributions, the network has the capability of learning probability density functions (pdfs) of classes. To estimate pdfs, we adopt parametric estimation with a Gaussian mixture model: a class is assumed to be composed of a number of subclasses, each of whose patterns has a Gaussian distribution. The BAYESNET learning includes two subprocesses, the initialization process and the main learning process. In the initialization process, the number of subclasses for each class is determined by automatic generation and elimination of subclasses. Generation occurs when the samples assigned to a subclass turn out to have an unexpected distribution, as checked by the chi-square test. A subclass is eliminated when it contributes negligibly to forming the class pdf. The role of the main learning process is to fine-tune the parameters of classes and subclasses. Though the learning is based on the winner-take-all scheme, one of the novelties of BAYESNET lies in its nondeterministic winner selection. Unlike the usual nearest-neighbor selection, our network selects the winner according to the probability obtained by the current estimation. Due to this selection rule, the final parameter values are assured to agree with the maximum likelihood estimate. Besides, biases are added to the winner selection in order to help avoid local minima.
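
The nondeterministic winner selection can be sketched as follows, with isotropic Gaussian subclasses and invented parameters for brevity: the winner is drawn with probability equal to its posterior responsibility under the current mixture estimate, rather than by a nearest-neighbor arg-min.

```python
# Probabilistic winner selection under the current Gaussian mixture estimate.
import numpy as np

rng = np.random.default_rng(7)

def pick_winner(x, means, weights, sigma2=1.0):
    d2 = ((means - x) ** 2).sum(axis=1)
    post = weights * np.exp(-0.5 * d2 / sigma2)
    post /= post.sum()
    return rng.choice(len(means), p=post)    # probabilistic, not arg-min

means = np.array([[0.0, 0.0], [2.0, 0.0]])
weights = np.array([0.5, 0.5])
wins = [pick_winner(np.array([0.9, 0.0]), means, weights) for _ in range(1000)]
print("share won by subclass 0:", np.mean(np.array(wins) == 0))
```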

Journal ArticleDOI
TL;DR: This work presents a model where the observed image is a mixture of an arbitrary and discrete noise process with the true but unknown image, and develops a filtering algorithm used to remove the noise.

Journal ArticleDOI
TL;DR: This article deals with spatial smoothness constraints, which have been found useful in analyzing sequences of emission tomography images, and an estimation methodology using penalized likelihood with multiple smoothing parameters is proposed.
Abstract: The following problem arises in computer vision, diagnostic medical imaging, and remote sensing: At each pixel in an image a vector of observations is measured, and the distribution of these measurements is approximated by a mixture model. The goal is to estimate the mixing proportions of the classes by pixel in the image together with any unknown parameters in the latent distributions. In many problems of this type, it is appropriate to incorporate constraints on mixing proportions. This article deals with spatial smoothness constraints, which have been found useful in analyzing sequences of emission tomography images. An estimation methodology using penalized likelihood with multiple smoothing parameters is proposed. Numerical methods for implementing this methodology are developed. This includes an importance sampling technique for approximating the effective degrees of freedom of the solution. The methodology is illustrated with an application to the analysis of a dynamic emission tomography study.

Proceedings Article
29 Nov 1993
TL;DR: It is shown that the conventional back-propagation (BPP) algorithm for neural network regression is robust to leverages, but not to outliers, and a robust model is to model the error as a mixture of normal distribution.
Abstract: In this paper, it is shown that the conventional back-propagation (BPP) algorithm for neural network regression is robust to leverages (data with x corrupted), but not to outliers (data with y corrupted). A robust model is to model the error as a mixture of normal distribution. The influence function for this mixture model is calculated and the condition for the model to be robust to outliers is given. EM algorithm [5] is used to estimate the parameter. The usefulness of model selection criteria is also discussed. Illustrative simulations are performed.
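
A miniature version of this robustification: the error is modeled as a mixture of a narrow "inlier" normal and a wide "outlier" normal, and EM down-weights points assigned to the wide component. A linear model stands in for the neural network, and the variances below are illustrative assumptions.

```python
# Robust regression via a two-component normal mixture error model.
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(-1, 1, n)
y = 3 * x + 0.1 * rng.normal(size=n)
y[:10] += rng.normal(0, 5, size=10)          # y-corrupted outliers
X = np.column_stack([x, np.ones(n)])

w = np.zeros(2)
s2 = np.array([0.01, 25.0])                  # inlier / outlier variances
pi = np.array([0.9, 0.1])
for _ in range(50):
    r = y - X @ w
    dens = pi * np.exp(-0.5 * r[:, None] ** 2 / s2) / np.sqrt(2 * np.pi * s2)
    h = dens[:, 0] / dens.sum(axis=1)        # P(inlier | residual)
    wt = h / s2[0] + (1 - h) / s2[1]         # effective per-point weights
    Xw = X * wt[:, None]
    w = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # weighted least squares
    pi = np.array([h.mean(), 1 - h.mean()])

print("fit slope/intercept:", w)
```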

01 Jan 1993
TL;DR: Given a set of images, each of which contains one instance of a small but unknown set of objects imaged from a random viewpoint, it is shown how to perform unsupervised learning to discover the object classes.
Abstract: Given a set of images, each of which contains one instance of a small but unknown set of objects imaged from a random viewpoint, we show how to perform unsupervised learning to discover the object classes. To group the data into objects we use a mixture model which is trained with the EM algorithm. We have investigated characterizing the probability distribution for the features of each object either in terms of an object model or by a Gaussian distribution. We compare the performance of these two approaches on a dataset containing six different stick-animals, and on a dataset consisting of seven hand gestures.

Journal ArticleDOI
TL;DR: In this paper, the distribution of the likelihood ratio for testing whether or not one is sampling from a mixture of two distributions or from a single distribution is studied, where some information is available on the variation range of the parameters of populations.
Abstract: The aim of this paper is to study the distribution of the likelihood ratio for testing whether one is sampling from a mixture of two distributions or from a single distribution. We study the case where some information is available on the variation range of the parameters of the populations. First we study the simplest case, in which the difference between the means of the two populations is known. We show certain distortions between theoretical and simulation results. Secondly, we show how this distortion spreads to the situation where this difference belongs to an interval. Finally, we give an example concerning the detection of major genes in animal populations.
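
A small Monte Carlo in the spirit of the simplest case studied, the separation between component means known: fix the two means, estimate only the mixing proportion under the alternative, and simulate the likelihood-ratio statistic under the null. All constants here are illustrative.

```python
# Simulate the null distribution of the LR statistic for 1 vs 2 components
# with known component means (0 and delta); only the mixing proportion is fit.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik_mix(p, x, delta):
    # normalizing constants omitted: they cancel in the likelihood ratio
    f = (1 - p) * np.exp(-0.5 * x**2) + p * np.exp(-0.5 * (x - delta) ** 2)
    return -np.log(f).sum()

rng = np.random.default_rng(9)
delta, n, lrs = 2.0, 100, []
for _ in range(500):
    x = rng.normal(size=n)                       # data from H0: single normal
    res = minimize_scalar(neg_loglik_mix, bounds=(0.0, 1.0),
                          args=(x, delta), method="bounded")
    lrs.append(2 * (neg_loglik_mix(0.0, x, delta) - res.fun))
print("95th percentile of simulated LR:", np.quantile(lrs, 0.95))
```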

Journal ArticleDOI
TL;DR: The data strongly suggest that the distributions of several measures of eating behavior are composed of four component distributions, consistent with the possibility of major gene effects for eating behavior.
Abstract: This investigation tested whether distributions of certain aspects of eating behavior were consistent with the notion of a “mixture model”; that is, two or more distinct commingled component distributions, consistent with the possibility of major gene action. Undergraduates (n=901) completed self-report trait measures of hunger, disinhibition, and dietary restraint. Variables were residualized for gender and age and transformed to remove skewness. Residualized transformed distributions were tested for departure from unimodality with Hartigan's (14) dip statistic. The distributions of all three aspects of eating behavior were significantly non-unimodal. Next, component multivariate normal distributions were estimated via maximum likelihood. Likelihood ratio tests were employed to compare nested models. A mixture of four distributions with unequal variance-covariance matrices fit significantly better than any more parsimonious model. In sum, these data strongly suggest that the distributions of several measures of eating behavior are composed of four component distributions. This finding is consistent with the possibility of major gene effects for eating behavior.

Proceedings ArticleDOI
29 Oct 1993
TL;DR: Experimental results show that the proposed algorithm is more robust against variations in training samples than the conventional supervised Gaussian maximum likelihood classifier.
Abstract: A new method for classification of multi-spectral data is proposed. This method is based on fitting mixtures of multivariate Gaussian components to training and unlabeled samples by using the EM algorithm. Through a backtracking search strategy with appropriate depth bounds, a series of mixture models are compared. The validity of the candidate models is evaluated by considering their description lengths and allocation rates. The most suitable model is selected and the multi-spectral data are classified accordingly. The EM algorithm is mapped onto a massively parallel computer system to reduce the computational cost. Experimental results show that the proposed algorithm is more robust against variations in training samples than the conventional supervised Gaussian maximum likelihood classifier.