
Showing papers on "Mixture model" published in 1995


Journal ArticleDOI
TL;DR: The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity and is shown to outperform the other speaker modeling techniques on an identical 16 speaker telephone speech task.
Abstract: This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterances from unconstrained conversational speech and robustness to degradations produced by transmission over a telephone channel. A complete experimental evaluation of the Gaussian mixture speaker model is conducted on a 49 speaker, conversational telephone speech database. The experiments examine algorithmic issues (initialization, variance limiting, model order selection), spectral variability robustness techniques, large population performance, and comparisons to other speaker modeling techniques (uni-modal Gaussian, VQ codebook, tied Gaussian mixture, and radial basis functions). The Gaussian mixture speaker model attains 96.8% identification accuracy using 5 second clean speech utterances and 80.8% accuracy using 15 second telephone speech utterances with a 49 speaker population and is shown to outperform the other speaker modeling techniques on an identical 16 speaker telephone speech task.
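
As an editorial aside, the pipeline described above maps naturally onto modern tooling. Below is a minimal sketch of text-independent speaker identification with per-speaker diagonal-covariance GMMs, assuming scikit-learn and synthetic stand-ins for the paper's cepstral features; `reg_covar` loosely plays the role of the paper's variance limiting. This is an illustration, not the paper's implementation.

```python
# Minimal sketch of GMM-based speaker identification (not the paper's code).
# Assumes per-speaker matrices of spectral feature vectors, e.g. mel-cepstra.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_components=32):
    """Fit one diagonal-covariance GMM per speaker.

    reg_covar plays a role similar to the paper's variance limiting: it keeps
    mixture variances from collapsing on sparse training data.
    """
    models = {}
    for speaker, X in features_by_speaker.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              reg_covar=1e-3, random_state=0)
        models[speaker] = gmm.fit(X)
    return models

def identify(models, X_test):
    # Classify an utterance by the highest average frame log-likelihood.
    scores = {spk: gmm.score(X_test) for spk, gmm in models.items()}
    return max(scores, key=scores.get)

# Toy usage with two synthetic "speakers"
rng = np.random.default_rng(0)
train = {"A": rng.normal(0, 1, (500, 12)), "B": rng.normal(1, 1, (500, 12))}
models = train_speaker_models(train, n_components=4)
print(identify(models, rng.normal(1, 1, (200, 12))))  # expect "B"
```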

3,134 citations




Journal ArticleDOI
TL;DR: A unified information-geometrical framework for studying stochastic models of neural networks is presented, focusing on the EM and em algorithms and proving a condition that guarantees their equivalence.

339 citations


Proceedings ArticleDOI
20 Jun 1995
TL;DR: This paper presents one such formulation of layered motion representation, based on maximum likelihood estimation (MLE) of mixture models and the minimum description length (MDL) encoding principle, and examines how many motion models adequately describe the image motion.
Abstract: Representing and modeling the motion and spatial support of multiple objects and surfaces from motion video sequences is an important intermediate step towards dynamic image understanding. One such representation, called layered representation, has recently been proposed. Although a number of algorithms have been developed for computing these representations, there has not been a consolidated effort into developing a precise mathematical formulation of the problem. This paper presents one such formulation based on maximum likelihood estimation (MLE) of mixture models and the minimum description length (MDL) encoding principle. The three major issues in layered motion representation are: (i) how many motion models adequately describe image motion, (ii) what are the motion model parameters, and (iii) what is the spatial support layer for each motion model.
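
For issue (i), here is a hedged sketch of MDL-style model-order selection: BIC serves as a two-part code-length proxy, and one-dimensional Gaussian mixtures stand in for the paper's parametric motion models. The data and names are illustrative only.

```python
# Hedged sketch: choosing the number of mixture components with an
# MDL/BIC-style criterion, in the spirit of issue (i) above. Real layered
# motion estimation would fit parametric motion models to optical flow;
# here plain Gaussian mixtures stand in for those models.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
flow = np.concatenate([rng.normal(-2, 0.3, 300),
                       rng.normal(0, 0.3, 300),
                       rng.normal(3, 0.3, 300)]).reshape(-1, 1)

best_k, best_cost = None, np.inf
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(flow)
    cost = gmm.bic(flow)  # -2 log L + (#params) log N, a two-part code length
    if cost < best_cost:
        best_k, best_cost = k, cost
print("selected number of motion layers:", best_k)  # expect 3
```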

315 citations


Journal ArticleDOI
TL;DR: This work generalizes the McCullagh and Nelder approach to a latent class framework and demonstrates how this approach handles many of the existing latent class regression procedures as special cases, as well as a host of other parametric specifications in the exponential family heretofore not mentioned in the latent class literature.
Abstract: A mixture model approach is developed that simultaneously estimates the posterior membership probabilities of observations to a number of unobservable groups or latent classes, and the parameters of a generalized linear model which relates the observations, distributed according to some member of the exponential family, to a set of specified covariates within each class. We demonstrate how this approach handles many of the existing latent class regression procedures as special cases, as well as a host of other parametric specifications in the exponential family heretofore not mentioned in the latent class literature. As such we generalize the McCullagh and Nelder approach to a latent class framework. The parameters are estimated using maximum likelihood, and an EM algorithm for estimation is provided. A Monte Carlo study of the performance of the algorithm for several distributions is provided, and the model is illustrated in two empirical applications.
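
A minimal EM sketch of the latent class regression idea, shown for the Gaussian member of the exponential family (a two-class mixture of linear regressions); the general GLM case replaces the weighted least-squares M-step with a weighted GLM fit. All names and data below are illustrative, not the paper's.

```python
# Minimal EM sketch for a latent-class regression, Gaussian special case.
import numpy as np

def em_mixture_regression(X, y, n_iter=100):
    n = len(y)
    pi = np.array([0.5, 0.5])
    beta = np.array([[0.0, 1.0], [0.0, -1.0]])
    sigma = np.array([1.0, 1.0])
    Xd = np.column_stack([np.ones(n), X])
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities
        dens = np.stack([
            pi[k] * np.exp(-0.5 * ((y - Xd @ beta[k]) / sigma[k])**2) / sigma[k]
            for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M-step: weighted least squares within each latent class
        for k in range(2):
            w = resp[k]
            beta[k] = np.linalg.solve(Xd.T @ (w[:, None] * Xd), Xd.T @ (w * y))
            sigma[k] = np.sqrt(np.sum(w * (y - Xd @ beta[k])**2) / w.sum())
        pi = resp.mean(axis=1)
    return pi, beta, sigma

rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, 400)
z = rng.random(400) < 0.4
y = np.where(z, 2 + 3 * X, -1 - 2 * X) + rng.normal(0, 0.3, 400)
print(em_mixture_regression(X, y))
```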

314 citations


Journal ArticleDOI
TL;DR: It is shown that the EM algorithm can be regarded as a variable metric algorithm whose search direction has a positive projection on the gradient of the log likelihood, and an acceleration technique is developed that yields a significant speedup in simulation experiments.
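
The claim admits a compact restatement; the notation below is assumed for illustration and is not taken from the paper itself:

```latex
% Hedged restatement: if the EM update can be written as a positive-definite
% rescaling P of the log-likelihood gradient, the step is an ascent direction.
\theta^{(t+1)} = \theta^{(t)} + P\!\left(\theta^{(t)}\right) \nabla \ell\!\left(\theta^{(t)}\right),
\qquad P \ \text{positive definite}
\;\Longrightarrow\;
\left(\theta^{(t+1)} - \theta^{(t)}\right)^{\!\top} \nabla \ell\!\left(\theta^{(t)}\right)
= \nabla \ell^{\top} P \, \nabla \ell > 0
\quad \text{whenever } \nabla \ell \neq 0 .
```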

278 citations


Journal ArticleDOI
TL;DR: The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982); a logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part of the model.
Abstract: A mixture model is an attractive approach for analyzing failure time data in which there are thought to be two groups of subjects, those who could eventually develop the endpoint and those who could not develop the endpoint. The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982). A logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part of the model. The estimator arises naturally out of the EM algorithm approach for fitting failure time mixture models as described by Larson and Dinse (1985). The procedure is applied to some experimental data from radiation biology and is evaluated in a Monte Carlo simulation study. The simulation study suggests the semi-parametric procedure is almost as efficient as the correct fully parametric procedure for estimating the regression coefficient in the incidence part of the model, but less efficient for estimating the latency distribution.
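
A rough sketch of the EM scheme the abstract describes: a logistic model for incidence fit on fractional class memberships, and a weighted Kaplan-Meier estimate for latency among the (fractionally) susceptible. The row-duplication trick and the toy data are ours, not the paper's.

```python
# Rough EM sketch of a semi-parametric cure-rate mixture (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier survival curve as a step function."""
    order = np.argsort(times)
    t, d, w = times[order], events[order], weights[order]
    at_risk = np.cumsum(w[::-1])[::-1]
    surv, s = {}, 1.0
    for i in range(len(t)):
        if d[i]:
            s *= 1.0 - w[i] / at_risk[i]
        surv[t[i]] = s
    return lambda x: min((v for k, v in surv.items() if k <= x), default=1.0)

def fit_cure_model(X, times, events, n_iter=20):
    n = len(times)
    w = np.where(events, 1.0, 0.5)     # P(susceptible) for each subject
    for _ in range(n_iter):
        # M-step (incidence): logistic fit on fractional memberships,
        # implemented by duplicating rows with weights w and 1 - w.
        Xdup = np.vstack([X, X])
        ydup = np.concatenate([np.ones(n), np.zeros(n)])
        wdup = np.concatenate([w, 1.0 - w])
        inc = LogisticRegression().fit(Xdup, ydup, sample_weight=wdup)
        # M-step (latency): weighted KM among the fractionally susceptible
        S = weighted_km(times, events, w)
        # E-step: update P(susceptible) for censored subjects
        pi = inc.predict_proba(X)[:, 1]
        Svals = np.array([S(t) for t in times])
        w = np.where(events, 1.0, pi * Svals / (1 - pi + pi * Svals))
    return inc, S

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 1))
cured = rng.random(300) > 1 / (1 + np.exp(-X[:, 0]))   # incidence depends on X
latent = rng.exponential(1.0, 300)
cens = rng.exponential(2.0, 300)
times = np.where(cured, cens, np.minimum(latent, cens))
events = ((~cured) & (latent <= cens)).astype(float)
inc, S = fit_cure_model(X, times, events)
print("incidence coefficient:", inc.coef_)
```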

234 citations


Journal ArticleDOI
TL;DR: A general strategy is outlined for accurately estimating false-match rates for each possible cutoff weight, using a model in which the distribution of observed weights is viewed as a mixture of weights for true matches and weights for false matches.
Abstract: Specifying a record-linkage procedure requires both (1) a method for measuring closeness of agreement between records, typically a scalar weight, and (2) a rule for deciding when to classify records as matches or nonmatches based on the weights. Here we outline a general strategy for the second problem, that is, for accurately estimating false-match rates for each possible cutoff weight. The strategy uses a model where the distribution of observed weights is viewed as a mixture of weights for true matches and weights for false matches. An EM algorithm for fitting mixtures of transformed-normal distributions is used to find posterior modes; associated posterior variability is due to uncertainty about specific normalizing transformations as well as uncertainty in the parameters of the mixture model, the latter being calculated using the SEM algorithm. This mixture-model calibration method is shown to perform well in an applied setting with census data. Further, a simulation experiment reveals that...
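
A hedged sketch of the calibration idea: fit a two-component normal mixture to observed linkage weights and read off the estimated false-match rate above any cutoff. The paper additionally transforms the weights and propagates parameter uncertainty with the SEM algorithm; both refinements are omitted here, and the data are synthetic.

```python
# Hedged sketch: false-match rate above a cutoff from a two-component
# normal mixture fit to observed linkage weights.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
weights = np.concatenate([rng.normal(-3, 1, 5000),    # false matches
                          rng.normal(4, 1.5, 500)])   # true matches

gmm = GaussianMixture(n_components=2, random_state=0).fit(weights.reshape(-1, 1))
means = gmm.means_.ravel()
sds = np.sqrt(gmm.covariances_.ravel())
p = gmm.weights_
lo = np.argmin(means)            # component with smaller mean = false matches

def false_match_rate(cutoff):
    # P(false AND weight > cutoff) / P(weight > cutoff)
    tails = p * norm.sf(cutoff, loc=means, scale=sds)
    return tails[lo] / tails.sum()

for c in [0.0, 1.0, 2.0]:
    print(c, false_match_rate(c))
```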

179 citations


Journal ArticleDOI
Eric Saund
TL;DR: A formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data is presented; it employs an objective function and iterative gradient descent learning algorithm resembling the conventional mixture model, and its ability to discover coherent multiple causal representations is demonstrated on several experimental data sets.
Abstract: This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the "hard" k-means clustering algorithm and the "soft" mixture model, each of which assumes that a single hidden event generates each data point, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. We employ an objective function and iterative gradient descent learning algorithm resembling the conventional mixture model. A crucial issue is the mixing function for combining beliefs from different cluster centers in order to generate data predictions whose errors are minimized both during recognition and learning. The mixing function constitutes a prior assumption about underlying structural regularities of the data domain; we demonstrate a weakness inherent to the popular weighted sum followed by sigmoid squashing, and offer alternative forms of the nonlinearity for two types of data domain. Results are presented demonstrating the algorithm's ability to discover coherent multiple causal representations in several experimental data sets.
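
A small illustration of the mixing-function issue, assuming a noisy-OR combination as the alternative nonlinearity; the paper's exact objective and update rules are not reproduced, and the weights below are hand-picked for the demonstration.

```python
# Illustrative sketch: two ways of combining beliefs from hidden causes
# into predictions for binary data (assumed forms, not Saund's exact model).
import numpy as np

def sigmoid_of_sum(m, W):
    # conventional mixing: squash a weighted sum of cause activities
    return 1.0 / (1.0 + np.exp(-(m @ W)))

def noisy_or(m, Q):
    # each active cause i independently turns on dimension j with prob Q[i, j]
    return 1.0 - np.prod(1.0 - m[:, None] * Q, axis=0)

Q = np.array([[0.9, 0.0],             # cause 0 explains dimension 0
              [0.0, 0.9]])            # cause 1 explains dimension 1
W = np.array([[2.0, -2.0],
              [-2.0, 2.0]])
m_single, m_both = np.array([1.0, 0.0]), np.array([1.0, 1.0])

print(noisy_or(m_single, Q))          # [0.9, 0.0]
print(noisy_or(m_both, Q))            # [0.9, 0.9] -- causes combine disjunctively
print(sigmoid_of_sum(m_both, W))      # [0.5, 0.5] -- the second cause drags
                                      # confident predictions back toward 0.5
```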

127 citations


Proceedings Article
27 Nov 1995
TL;DR: Two regularization methods that can be used to improve the generalization capabilities of Gaussian mixture density estimates are compared: a Bayesian prior on the parameter space and ensemble averaging, including Breiman's "bagging", which has recently been found to produce impressive results for classification networks.
Abstract: We compare two regularization methods which can be used to improve the generalization capabilities of Gaussian mixture density estimates. The first method uses a Bayesian prior on the parameter space. We derive EM (Expectation Maximization) update rules which maximize the a posteriori parameter probability. In the second approach we apply ensemble averaging to density estimation. This includes Breiman's "bagging", which recently has been found to produce impressive results for classification networks.
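
A hedged sketch of the two regularizers, with scikit-learn's variational BayesianGaussianMixture standing in for the paper's MAP-EM updates; the bagging side averages the estimated densities, not the parameters.

```python
# Hedged sketch: (1) prior-regularized mixture density estimate and
# (2) bagged mixture density estimates on bootstrap resamples.
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.default_rng(5)
X = np.concatenate([rng.normal(-2, 1, 60), rng.normal(2, 1, 60)]).reshape(-1, 1)
grid = np.linspace(-6, 6, 200).reshape(-1, 1)

# (1) Bayesian prior on the parameters (variational stand-in for MAP-EM)
bayes = BayesianGaussianMixture(n_components=5, random_state=0).fit(X)
density_bayes = np.exp(bayes.score_samples(grid))

# (2) bagging: fit on bootstrap resamples and average the densities
densities = []
for b in range(25):
    Xb = X[rng.integers(0, len(X), len(X))]
    gmm = GaussianMixture(n_components=5, reg_covar=1e-3, random_state=b).fit(Xb)
    densities.append(np.exp(gmm.score_samples(grid)))
density_bagged = np.mean(densities, axis=0)
print(density_bayes[:3], density_bagged[:3])
```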

79 citations


Journal ArticleDOI
TL;DR: This work proposes several solutions to implement the 'SEMcm algorithm' (SEM for censored mixture), showing in particular that one of these procedures solves numerical problems arising with the EMcm algorithm and mixtures of nonexponential-type distributions.

Posted Content
TL;DR: In this paper, the authors use a finite mixture model to investigate pro-son bias in child health outcomes in Bangladesh, finding that while pooled regression washes out evidence of bias, the mixture model reveals systematic girl-boy differences in health outcomes.
Abstract: Many interesting economic hypotheses entail differences in behaviors of groups within a population, but analyses of pooled samples shed only partial light on underlying segmentations. Finite mixture models are considered as an alternative to methods based on pooling. Robustness checks using t-regressions and a Bayesian analogue to the likelihood ratio test for model evaluation are developed. The methodology is used to investigate pro-son bias in child health outcomes in Bangladesh. While regression analysis on the entire sample appears to wash out evidence of bias, the mixture models reveal systematic girl-boy differences in health outcomes.

Journal ArticleDOI
TL;DR: Four mixture models are fit within a framework of Bayesian model monitoring using posterior predictive checks, where the distinctions between models arise from assumptions about the variance of the shifted observations and the exchangeability of schizophrenic individuals.
Abstract: Reaction times for schizophrenic individuals in a simple visual tracking experiment can be substantially more variable than for non-schizophrenic individuals. Current psychological theory suggests that at least some of this extra variability arises from an attentional lapse that delays some, but not all, of each schizophrenic's reaction times. Based on this theory, we pursue models in which measurements from non-schizophrenics arise from a normal linear model with a separate mean for each individual, whereas measurements from schizophrenics arise from a mixture of (i) a component analogous to the distribution of response times for non-schizophrenics and (ii) a mean-shifted component. We fit four mixture models within this framework, where the distinctions between models arise from assumptions about the variance of the shifted observations and the exchangeability of schizophrenic individuals. Some of these models can be fit by maximum likelihood using the EM algorithm, and all can be fit using the ECM algorithm, where the covariance matrices associated with the parameters are calculated by the SEM and SECM algorithms, respectively. Bayesian model monitoring using posterior predictive checks is invoked to discard models that fail to reproduce certain observed features of the data and to stimulate the development of better models.
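
A minimal EM sketch for the simplest of these variants: a common-variance mixture of a normal component and a mean-shifted component, with no hierarchy over individuals. The synthetic reaction times and parameter names are ours.

```python
# Minimal EM for a mixture of N(mu, sigma^2) and N(mu + tau, sigma^2),
# i.e. ordinary responses plus delayed ("attentional lapse") responses.
import numpy as np
from scipy.stats import norm

def em_shifted_mixture(y, n_iter=200):
    mu, tau, sigma, lam = y.mean(), y.std(), y.std(), 0.5
    for _ in range(n_iter):
        # E-step: posterior probability each time is a delayed response
        d1 = (1 - lam) * norm.pdf(y, mu, sigma)
        d2 = lam * norm.pdf(y, mu + tau, sigma)
        r = d2 / (d1 + d2)
        # M-step: closed-form updates
        lam = r.mean()
        mu = np.mean(y - r * tau)
        tau = np.sum(r * (y - mu)) / r.sum()
        sigma = np.sqrt(np.mean((1 - r) * (y - mu)**2 + r * (y - mu - tau)**2))
    return mu, tau, sigma, lam

rng = np.random.default_rng(6)
delayed = rng.random(500) < 0.3
y = rng.normal(400, 30, 500) + np.where(delayed, 150, 0)   # toy reaction times
print(em_shifted_mixture(y))  # expect tau near 150, lam near 0.3
```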

Journal ArticleDOI
TL;DR: A mixture of two or more exponential distributions is used to split behaviour into bouts, with a bout criterion calculated from maximum likelihood parameter estimates; the sample size required to obtain reasonable estimates is investigated using simulated data and found to depend on the ratio between the densities of the two exponential processes and the proportion in which they are mixed.
Abstract: One method of splitting behaviour into bouts is to model the data as a mixture of two (or more) exponential distributions and to calculate a bout criterion from the resulting parameter estimates. The parameter estimates under a mixture model can be obtained using a maximum likelihood approach. The sample size required to obtain reasonable estimates of the parameters using this approach is investigated using simulated data, and found to depend on the ratio between the densities of the two exponential processes and the proportion in which they are mixed. The use of likelihood ratio tests in helping to determine whether the data occur in bouts is also described and illustrated.
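
A hedged sketch of the procedure: EM for a two-exponential mixture of gap lengths, followed by a bout criterion taken as the gap length at which the two weighted densities cross. Details of the original analysis may differ; data and names are illustrative.

```python
# Hedged sketch: two-exponential mixture fit by EM, plus a bout criterion.
import numpy as np

def fit_two_exponentials(x, n_iter=300):
    p, lam1, lam2 = 0.5, 2.0 / x.mean(), 0.5 / x.mean()
    for _ in range(n_iter):
        d1 = p * lam1 * np.exp(-lam1 * x)         # within-bout process
        d2 = (1 - p) * lam2 * np.exp(-lam2 * x)   # between-bout process
        r = d1 / (d1 + d2)                        # E-step responsibilities
        p = r.mean()                              # M-step
        lam1 = r.sum() / np.sum(r * x)
        lam2 = (1 - r).sum() / np.sum((1 - r) * x)
    return p, lam1, lam2

def bout_criterion(p, lam1, lam2):
    # solve p*lam1*exp(-lam1*t) = (1-p)*lam2*exp(-lam2*t) for t
    return np.log((p * lam1) / ((1 - p) * lam2)) / (lam1 - lam2)

rng = np.random.default_rng(7)
x = np.where(rng.random(2000) < 0.7,
             rng.exponential(0.5, 2000),    # short within-bout gaps
             rng.exponential(10.0, 2000))   # long between-bout gaps
p, l1, l2 = fit_two_exponentials(x)
print("bout criterion:", bout_criterion(p, l1, l2))
```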

Journal ArticleDOI
Yoram Singer
27 Nov 1995
TL;DR: An online learning algorithm that efficiently infers the structure and estimates the parameters of each probabilistic transducer in the mixture is devised and an application of the model for inducing a noun phrase recognizer is presented.
Abstract: We describe and analyze a mixture model for supervised learning of probabilistic transducers. We devise an online learning algorithm that efficiently infers the structure and estimates the parameters of each probabilistic transducer in the mixture. Theoretical analysis and comparative simulations indicate that the learning algorithm tracks the best transducer from an arbitrarily large (possibly infinite) pool of models. We also present an application of the model for inducing a noun phrase recognizer.
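
The mixture machinery can be sketched in miniature: keep a posterior weight per candidate model, multiply in each model's predictive probability as symbols arrive, and predict with the weighted mixture; the extra log-loss relative to the best model is then at most the log of the number of models. Plain Bernoulli "experts" stand in here for the paper's probabilistic transducers, which are additionally grown online.

```python
# Hedged sketch of an online Bayes mixture over candidate models.
import numpy as np

class OnlineMixture:
    def __init__(self, experts):
        self.experts = experts                 # each maps symbol -> probability
        self.logw = np.zeros(len(experts))     # log mixture weights

    def predict(self, symbol):
        w = np.exp(self.logw - self.logw.max())
        w /= w.sum()
        return float(np.dot(w, [e(symbol) for e in self.experts]))

    def update(self, symbol):
        # Bayes rule: weight_i *= P_i(symbol); the best expert dominates,
        # with cumulative extra log-loss at most log(#experts).
        self.logw += np.log([e(symbol) for e in self.experts])

experts = [lambda s, q=q: q if s == 1 else 1 - q for q in (0.2, 0.5, 0.8)]
mix = OnlineMixture(experts)
rng = np.random.default_rng(8)
for symbol in (rng.random(500) < 0.8).astype(int):
    mix.update(symbol)
print("P(next=1):", mix.predict(1))  # should approach the 0.8-expert's value
```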

Journal ArticleDOI
TL;DR: This paper argues for a categorical conceptualisation of temperamental characteristics and applies a finite mixture model appropriate to this view to two sets of longitudinal observations of infants and young children, providing a good description of the observed predictive relation between behavioural profiles of children at 4 months and the degree of behavioural signs of fear at 14 months.
Abstract: Temperamental characteristics can be conceptualised as continuous dimensions or qualitative categories. The continuous versus categorical question concerns the underlying temperamental characteristics and not the measured variables, which can be recorded in either continuous or categorical forms. This paper argues for a categorical conceptualisation of temperamental characteristics and applies a finite mixture model appropriate to this view to two sets of longitudinal observations of infants and young children. This statistical approach provides a good description of the observed predictive relation between behavioural profiles of children at 4 months and the degree of behavioural signs of fear at 14 months. An advantage of the mixture model approach to this data, relative to more standard approaches to developmental data, is that because it takes into account an a priori theory, it can be used to address improvements and refinements to theories and experimental designs in a straightforward manner.

Journal ArticleDOI
TL;DR: This paper uses a transformation to determine an approximate asymptotic distribution of the test statistic under a mixture model and recommends the routine use of an admixture model with a critical lod score of 3·44 for gene searches.
Abstract: Linkage analysis has contributed to the localization of many human disease genes. The presence of locus heterogeneity reduces statistical power and can prejudice the detection of linkage if the analysis assumes homogeneity. Nevertheless, mixed genetic models are not routinely used in gene searches. The null distribution of the test statistic is not uniquely defined. In this paper, a transformation is used to determine an approximate asymptotic distribution of the test statistic under a mixture model. The equivalent critical values of the test are computed and the performance of the test under various levels of heterogeneity and family size is investigated. For gene searches, we recommend the routine use of an admixture model with a critical lod score of 3·44.
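
A hedged sketch of the admixture-lod computation: with alpha the proportion of linked families and LR_i each family's likelihood ratio for linkage, maximize the sum of log10(alpha * LR_i + 1 - alpha) over alpha. The per-family lods below are invented for illustration.

```python
# Hedged sketch of a heterogeneity (admixture) lod score, "HLOD".
import numpy as np

def hlod(family_lods, alphas=np.linspace(0.01, 1.0, 100)):
    lr = 10.0 ** np.asarray(family_lods)      # per-family likelihood ratios
    scores = [np.sum(np.log10(a * lr + 1 - a)) for a in alphas]
    i = int(np.argmax(scores))
    return scores[i], alphas[i]

score, alpha = hlod([1.2, 0.9, -0.4, 1.5, -0.2, 0.8])
print(f"HLOD = {score:.2f} at alpha = {alpha:.2f}")
# Under heterogeneity the paper recommends declaring linkage only when this
# statistic exceeds 3.44, rather than the classical threshold of 3.
```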

Journal ArticleDOI
TL;DR: In this paper, a mixture model with two gamma distributions is proposed for the analysis of overdispersed repeated count data, where the counts have independent Poisson distributions conditional on the Poisson parameter whose distribution is a mixture of gamma distributions.
Abstract: Repeated count data showing overdispersion are commonly analysed by using a Poisson model with varying intensity parameter, resulting in a mixed model. A mixed model with a gamma distribution for the Poisson parameter does not adequately fit a data set on 721 children's spelling errors. An alternative approach is a latent class or mixture model in which the distribution of the intensity parameter is a step function. This gives a solution with many classes that is difficult to interpret. A combination of the two models, resulting in a mixture model with two gamma distributions, however, fits the data very well. Moreover, it yields a substantively satisfactory interpretation: two heterogeneous classes of 'good' and 'poor' spelling children can be identified. Therefore, mixture models for the analysis of overdispersed repeated count data are proposed, where the counts have independent Poisson distributions conditional on the Poisson parameter whose distribution is a mixture of gamma distributions. Combining marginal maximum likelihood methods and the EM algorithm leads to straightforward estimation of the models, for which goodness-of-fit tests are also presented.
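
The marginal implied by the abstract, a Poisson whose intensity follows a two-gamma mixture, is a two-component negative binomial mixture. Below is a hedged sketch fitting it by direct likelihood maximization rather than the paper's marginal-ML-plus-EM scheme; the data are synthetic.

```python
# Hedged sketch: gamma(shape=a, rate=b) intensity for a Poisson count gives
# a negative binomial with n=a, p=b/(b+1); a two-gamma mixture gives a
# two-component NB mixture, fit here by direct MLE.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom

def negloglik(params, y):
    logit_p, la1, lb1, la2, lb2 = params       # unconstrained parameterization
    p = 1 / (1 + np.exp(-logit_p))
    f1 = nbinom.pmf(y, np.exp(la1), np.exp(lb1) / (np.exp(lb1) + 1))
    f2 = nbinom.pmf(y, np.exp(la2), np.exp(lb2) / (np.exp(lb2) + 1))
    return -np.sum(np.log(p * f1 + (1 - p) * f2 + 1e-300))

rng = np.random.default_rng(9)
good = rng.random(721) < 0.6                     # two latent spelling classes
lam = np.where(good, rng.gamma(2.0, 1.0, 721),   # low error intensity
                     rng.gamma(6.0, 2.0, 721))   # high error intensity
y = rng.poisson(lam)

res = minimize(negloglik, x0=[0.0, 0.5, 0.0, 1.5, -0.5], args=(y,),
               method="Nelder-Mead", options={"maxiter": 5000})
print("converged:", res.success, "neg. log-likelihood:", res.fun)
```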

Proceedings ArticleDOI
27 Nov 1995
TL;DR: Experiments show that considerably high classification performance can be realized even for small training sample sizes, and that the structure of the network is easily determined by the incorporated statistical model.
Abstract: The present paper proposes a new probabilistic neural network based on a log-linearized Gaussian mixture model, which can estimate a posteriori probability for pattern classification problems. Although the structure of the proposed network represents a statistical model, a forward calculation and a backward learning rule based on maximum likelihood estimation can be defined in the same manner as in the error back-propagation neural network model. Experiments show that considerably high classification performance can be realized even for small training sample sizes, and that the structure of the network is easily determined by the incorporated statistical model.
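
A hedged sketch of why such a network is "log-linearized": a GMM class posterior is a softmax over terms that are linear in quadratic features of the input, so it can be computed, and trained by gradient methods, like a one-layer network. The weights below are hand-set for illustration and are not the paper's.

```python
# Hedged sketch: GMM posterior as a log-linear "network" with softmax output.
import numpy as np

def quadratic_features(x):
    # [1, x, vec(x x^T)] -- enough to express any Gaussian log-density
    return np.concatenate([[1.0], x, np.outer(x, x).ravel()])

def gmm_posterior_as_network(x, W):
    # One weight row per mixture component; with one component per class the
    # softmax over rows is exactly the GMM's posterior class probability.
    z = W @ quadratic_features(x)
    e = np.exp(z - z.max())
    return e / e.sum()

# Two classes, one component each, 2-D input: rows encode log N(x; mu, I)
# up to a shared constant, for mu = (0, 0) and mu = (3, 3).
W = np.array([
    [0.0,  0.0, 0.0, -0.5, 0.0, 0.0, -0.5],
    [-9.0, 3.0, 3.0, -0.5, 0.0, 0.0, -0.5],
])
print(gmm_posterior_as_network(np.array([0.1, -0.2]), W))  # class 0 wins
print(gmm_posterior_as_network(np.array([2.9, 3.2]), W))   # class 1 wins
```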

Journal ArticleDOI
Chulho Jung
TL;DR: The authors develop a method of forecasting foreign exchange rates with a normal mixture model (NMM), which initially establishes a set of exchange rate models and switches from one model to another probabilistically, depending on supply shocks or government policy changes.
Abstract: Develops a method of forecasting foreign exchange rates by a normal mixture model (NMM). Initially establishes a set of exchange rate models and switches from one model to another probabilistically, depending on supply shocks or government policy changes. By assuming that the population distribution of the foreign exchange rate is a mixture of normal distributions, these models can then be estimated simultaneously. Uses the estimated parameters of the model to forecast the foreign exchange rate; four foreign exchange rate models are used to estimate the NMM. The out-of-sample forecasting results show that the mean squared error (MSE) of the forecast can be decreased dramatically by using the NMM, compared with the MSE of the best forecast from each separate model.

Journal ArticleDOI
TL;DR: A heuristic approach is considered, based on the use of the EM algorithm and nonparametric density estimation with a sequential increase in the number of components of the mixture, which is a discrete mixture of Gaussian distributions.
Abstract: The paper is devoted to the problem of statistical estimation of a multivariate distribution density, which is a discrete mixture of Gaussian distributions. A heuristic approach is considered, based on the use of the EM algorithm and nonparametric density estimation with a sequential increase in the number of components of the mixture. Criteria for testing of model adequacy are discussed.

Proceedings ArticleDOI
27 Nov 1995
TL;DR: The max-min propagation neural network model is considered as a hierarchical mixture of experts by replacing the max (min) units with softmax functions, and a gradient ascent algorithm and an expectation-maximization algorithm are presented.
Abstract: The max-min propagation neural network model is considered as a hierarchical mixture of experts by replacing the max (min) units with softmax functions. The resulting mixture is different from the model of Jordan and Jacobs, but we exploit the similarities between the two models to derive a probability model. Learning is treated as a maximum-likelihood problem; in particular we present a gradient ascent algorithm and an expectation-maximization algorithm. Simulation results on the parity problem and the majority problem are reported.

Journal ArticleDOI
TL;DR: In this article, the Gibbs sampler and adaptive rejection sampling methods for log-concave densities were used for Bayesian analysis of two overdispersed generalized Poisson models.
Abstract: In this paper, we consider the Bayesian analysis of two overdispersed Poisson models. The first is an overdispersed generalized Poisson model. The second is an ordinary Poisson and overdispersed generalized Poisson mixture model. Shoukri and Consul (1989, Communications in Statistics: Simulation and Computation, 18, 1465-1480) have previously considered a limited form of approximate Bayesian analysis for the first of these two models requiring the use of Pearson curves and the assumption that a certain model parameter has support on a finite number of values. By way of comparison, this paper demonstrates how a full Bayesian analysis of either model may proceed by making use of the Gibbs sampler and adaptive rejection sampling methods for log-concave densities. The methodology is illustrated with an application to a biological data set.

01 Jan 1995
TL;DR: This work introduces a theoretically consistent, segment-level posterior distribution model using context-dependent models, which models intra-segment correlation indirectly using a mixture of segment-length models, each of which uses conditionally independent time samples.
Abstract: This dissertation presents alternative parametric statistical models of phonetically-based segments for use in continuous speech recognition (CSR). A categorization of segment modeling approaches is proposed according to two characteristics: the assumed form of the probability distribution and the representation chosen for segment observations. The question of distribution form divides models into two groups: those based on conditional probability densities of feature given label and those using a posteriori probabilities of label given feature. The second characteristic concerns whether a model uses a variable or fixed-length representation of observed speech segments. The choices for both characteristics have important implications, particularly for context modeling and score normalization. In this work, specific segment models are developed in order to understand the benefits and limitations that follow from these choices. Mixture distributions are a particular type of conditional density with appealing modeling properties. Under a special case of segment models using variable-length representations and conditional densities, various forms of Gaussian mixture models are examined for the individual samples of the feature sequence. Within this framework, a systematic comparison of both existing and novel mixture modeling techniques is conducted. Parameter-tying alternatives for frame-level mixtures are explored and good performance is demonstrated with this approach. Within the conditional-density variable-length framework, a generalization of mixture distributions that captures properties of the complete segment is proposed in the form of a segment-level mixture model. This approach models intra-segment correlation indirectly using a mixture of segment-length models, each of which uses conditionally independent time samples. Parameter estimation formulae are derived and the model is explored experimentally. The alternative assumption of modeling based on a posteriori probabilities is examined through the development of a recognition formalism using classification and segmentation scoring. Posterior distributions have been less well studied than conditional densities in the context of CSR, and this work introduces a theoretically consistent, segment-level posterior distribution model using context-dependent models. Issues concerning fixed versus variable-length representations and segmentation scoring are explored experimentally. Finally, some general conclusions are drawn concerning the practical and theoretical trade-offs for the models examined.

Journal ArticleDOI
TL;DR: The goal of this research is the development of effective visualization techniques to portray the mixture model parameters as they change in time in an inherently high-dimensional process.
Abstract: This article focuses on recent work that analyzes the expectation maximization (EM) evolution of mixtures-based estimators. The goal of this research is the development of effective visualization techniques to portray the mixture model parameters as they change in time. This is an inherently high-dimensional process. Techniques are presented that portray the time evolution of univariate, bivariate, and trivariate finite and adaptive mixtures estimators. Adaptive mixtures is a recently developed variable bandwidth kernel estimator where each of the kernels is not constrained to reside at a sample location. The future role of these techniques in developing new versions of the adaptive mixtures procedure is also discussed.

Journal ArticleDOI
TL;DR: It is shown that a mixture model can be converted into a series system; the IFR and DFR preservabilities for mixture models are studied, and various profust reliability bounds are given.

Proceedings ArticleDOI
27 Nov 1995
TL;DR: The author presents the EM (Expectation-Maximization) algorithm to estimate parameters of a mixture model that can be applied to the recognition of multiple objects in an image plane.
Abstract: Proposes a mixture model that can be applied to the recognition of multiple objects in an image plane. The model consists of modules of any shape; each module is a probability density function of data points with scale and shift parameters, and the modules are combined with weight probabilities. The author presents the EM (Expectation-Maximization) algorithm to estimate those parameters, and also modifies the algorithm for the case in which data points are restricted to an attention window.

Journal ArticleDOI
TL;DR: In this article, the authors developed a multiprocess dynamic Poisson model for estimating and forecasting a Poisson random variable with a time-varying parameter, similar to Harrison and Stevens' model.
Abstract: This article develops the multiprocess dynamic Poisson model for estimating and forecasting a Poisson random variable with a time-varying parameter. Its characteristics are similar to the multiprocess dynamic linear model of Harrison and Stevens. Its precision increases when the parameter remains unchanged, it reacts quickly to real parameter changes, and it is not sensitive to outliers. But the observation distribution is Poisson instead of normal, so the gamma conjugate family is used. Perturbation distributions and observation error distributions are not required, because the extrapolated conditional parameter distributions and conditional observations distributions are found directly in the gamma conjugate family. The conditional posterior distribution is found by Bayes's theorem. The theorem of Pena and Guttman about the optimal condensing of the mixture of normal distributions to a single normal distribution in terms of minimizing Kullback-Leibler distance is generalized to the optimal conde...
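
A hedged sketch of the conjugate recursion the abstract leans on: a Gamma(a, b) prior on a Poisson rate updates to Gamma(a + y, b + 1) after observing y, with a negative binomial predictive. A full multiprocess model would carry one such recursion per change/no-change hypothesis and mix them; only the single-model step is shown here.

```python
# Hedged sketch of gamma-Poisson conjugate filtering (single-model case).
import numpy as np

def gamma_poisson_update(a, b, y):
    """One Bayesian filtering step for a Poisson observation y."""
    return a + y, b + 1.0

a, b = 1.0, 1.0                       # Gamma prior on the Poisson rate
rng = np.random.default_rng(10)
for y in rng.poisson(4.0, 50):        # constant true rate of 4
    a, b = gamma_poisson_update(a, b, y)
print("posterior mean rate:", a / b)  # should be near 4
```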

Posted ContentDOI
TL;DR: A mixture model involving the inverse Gaussian distribution and its length-biased version is studied from a Bayesian viewpoint; using proper priors, the Bayes estimates of the parameters of the model are derived and the results are applied to the aircraft data of Proschan (1963, Technometrics, 5, 375-383).
Abstract: In this paper a mixture model involving the inverse Gaussian distribution and its length-biased version is studied from a Bayesian viewpoint. Using proper priors, the Bayes estimates of the parameters of the model are derived and the results are applied to the aircraft data of Proschan (1963, Technometrics, 5, 375-383). The posterior distributions of the parameters are expressed in terms of the confluent hypergeometric function and the modified Bessel function of the third kind. The integral involved in the expression of the estimate of the mean is evaluated by numerical techniques.

Journal ArticleDOI
TL;DR: In this article, the authors extend the class of zero-order threshold autoregressive models to a much richer class of mixture models, which have the important property of duality corresponding to time reversal.
Abstract: In this paper we extend the class of zero-order threshold autoregressive models to a much richer class of mixture models. The new class has the important property of duality which, as we show, corresponds to time reversal. We are then able to obtain the time reversals of the zero-order threshold models and to characterise the time-reversible members of this subclass. These turn out to be quite trivial. The complete stationary distributional structure is given, as are various moments, in particular the autocovariance function. This is shown to be of ARMA type. Finally we give two examples, the second of which extends from the finite to the countable mixture case. The general theory for this extension will be given elsewhere.