
Showing papers on "Mixture model published in 2003"


Proceedings ArticleDOI
28 Jul 2003
TL;DR: Three hierarchical probabilistic mixture models are presented for describing annotated data with multiple types, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type.
Abstract: We consider the problem of modeling annotated data---data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models which aim to describe such data, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type. We conduct experiments on the Corel database of images and captions, assessing performance in terms of held-out likelihood, automatic annotation, and text-based image retrieval.
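
The correspondence-LDA generative story is compact enough to sketch. Below is a hedged toy rendering of the process the abstract describes, assuming Gaussian region features and a finite vocabulary; all names, shapes, and distributions are illustrative stand-ins, not the paper's exact specification. The key structural point is that each caption word is conditioned on one of the image's region factors, which is what ties the annotation to the primary type.

```python
# Hypothetical sketch of a correspondence-LDA-style generative process.
import numpy as np

def corr_lda_sample(alpha, region_means, word_probs, n_regions, n_words, rng):
    # alpha: (K,) Dirichlet parameter; region_means: (K, d) toy region model;
    # word_probs: (K, V) per-factor word distributions (rows sum to 1).
    theta = rng.dirichlet(alpha)                          # image-level proportions
    z = rng.choice(len(alpha), size=n_regions, p=theta)   # latent factor per region
    regions = [rng.normal(region_means[k], 1.0) for k in z]   # toy region features
    y = rng.integers(n_regions, size=n_words)             # each word picks a region
    words = [rng.choice(word_probs.shape[1], p=word_probs[z[m]]) for m in y]
    return regions, words
```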

1,199 citations


Journal ArticleDOI
TL;DR: It is demonstrated that multiple trajectory classes can be estimated and appear optimal for nonnormal data even when only 1 group exists in the population.
Abstract: Growth mixture models are often used to determine if subgroups exist within the population that follow qualitatively distinct developmental trajectories. However, statistical theory developed for finite normal mixture models suggests that latent trajectory classes can be estimated even in the absence of population heterogeneity if the distribution of the repeated measures is nonnormal. By drawing on this theory, this article demonstrates that multiple trajectory classes can be estimated and appear optimal for nonnormal data even when only 1 group exists in the population. Further, the within-class parameter estimates obtained from these models are largely uninterpretable. Significant predictive relationships may be obscured or spurious relationships identified. The implications of these results for applied research are highlighted, and future directions for quantitative developments are suggested.
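
The phenomenon the paper warns about is easy to reproduce. A minimal sketch, assuming a skewed single-population dataset and scikit-learn's GaussianMixture as a stand-in for a growth mixture model: information criteria will often prefer more than one class even though one group generated the data.

```python
# Illustrative demo (not the authors' code): fit normal mixtures to skewed
# data from a single population; BIC can favor spurious extra classes.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=(500, 1))  # one skewed group

for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, random_state=0).fit(x)
    print(k, gm.bic(x))  # lower BIC "wins"; skewness can favor k > 1
```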

922 citations


Journal ArticleDOI
TL;DR: Simple methods for choosing sensible starting values for the EM algorithm for maximum likelihood parameter estimation in mixture models are compared; simple random initialization, probably the most common way of initiating EM, is often outperformed by strategies using CEM, SEM or short runs of EM before running EM.
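
A hedged sketch of the "short runs of EM" strategy the summary refers to, with scikit-learn's GaussianMixture standing in for a generic EM implementation and all function names illustrative: run a few EM iterations from many random starts, then continue full EM from the best short run.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def em_with_short_runs(x, k, n_starts=10, short_iters=5, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        gm = GaussianMixture(n_components=k, max_iter=short_iters,
                             init_params="random",
                             random_state=int(rng.integers(1 << 31)))
        gm.fit(x)  # may warn about non-convergence; that is expected here
        if best is None or gm.lower_bound_ > best.lower_bound_:
            best = gm
    # Long EM run warm-started from the best short run's parameters.
    final = GaussianMixture(n_components=k, max_iter=500,
                            weights_init=best.weights_,
                            means_init=best.means_,
                            precisions_init=best.precisions_)
    return final.fit(x)
```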

619 citations


Journal ArticleDOI
TL;DR: A Bayesian approach is adopted in which some of the model parameters are shared and others are more loosely connected through a joint prior distribution that can be learned from the data, combining the best parts of both the statistical multilevel approach and the neural network machinery.
Abstract: Modeling a collection of similar regression or classification tasks can be improved by making the tasks 'learn from each other'. In machine learning, this subject is approached through 'multitask learning', where parallel tasks are modeled as multiple outputs of the same network. In multilevel analysis this is generally implemented through the mixed-effects linear model where a distinction is made between 'fixed effects', which are the same for all tasks, and 'random effects', which may vary between tasks. In the present article we will adopt a Bayesian approach in which some of the model parameters are shared (the same for all tasks) and others more loosely connected through a joint prior distribution that can be learned from the data. We seek in this way to combine the best parts of both the statistical multilevel approach and the neural network machinery. The standard assumption expressed in both approaches is that each task can learn equally well from any other task. In this article we extend the model by allowing more differentiation in the similarities between tasks. One such extension is to make the prior mean depend on higher-level task characteristics. More unsupervised clustering of tasks is obtained if we go from a single Gaussian prior to a mixture of Gaussians. This can be further generalized to a mixture of experts architecture with the gates depending on task characteristics. All three extensions are demonstrated through application both on an artificial data set and on two real-world problems, one a school problem and the other involving single-copy newspaper sales.

610 citations


Proceedings ArticleDOI
06 Jul 2003
TL;DR: The paper addresses the design of working recognition engines and the results achieved with the two alternatives, and describes a speech corpus consisting of acted and spontaneous emotion samples in German and English.
Abstract: In this contribution we introduce speech emotion recognition by use of continuous hidden Markov models. Two methods are proposed and compared throughout the paper. In the first method, a global statistics framework of an utterance is classified by Gaussian mixture models using derived features of the raw pitch and energy contour of the speech signal. The second method introduces increased temporal complexity, applying continuous hidden Markov models with several states and low-level instantaneous features instead of global statistics. The paper addresses the design of working recognition engines and the results achieved with the two alternatives. A speech corpus consisting of acted and spontaneous emotion samples in German and English is described in detail. Both engines were trained and tested on this same corpus. Recognition rates for seven discrete emotions exceeded 86%. As a basis for comparison, human judges classifying the same corpus achieved a recognition rate of 79.8%.
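
As a rough illustration of the first (global statistics) method, here is a hedged sketch assuming one Gaussian mixture model per emotion class trained on utterance-level pitch/energy statistics, with a maximum-likelihood decision rule; the feature extraction step and all names are placeholders, not the authors' pipeline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(features_by_emotion, n_components=4):
    # features_by_emotion: dict emotion -> (n_utterances, n_features) array
    # of per-utterance pitch/energy statistics (assumed precomputed).
    return {emo: GaussianMixture(n_components, random_state=0).fit(f)
            for emo, f in features_by_emotion.items()}

def classify(gmms, utterance_features):
    # Pick the emotion whose GMM assigns the highest log-likelihood.
    scores = {emo: gm.score(utterance_features.reshape(1, -1))
              for emo, gm in gmms.items()}
    return max(scores, key=scores.get)
```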

599 citations


Book
01 Jan 2003
TL;DR: The Basis for, and Advantages of, Bayesian Model Estimation via Repeated Sampling are explained, and Models for Spatial Outcomes and Geographical Association are described.
Abstract: Preface. The Basis for, and Advantages of, Bayesian Model Estimation via Repeated Sampling. Hierarchical Mixture Models. Regression Models. Analysis of Multi-Level Data. Models for Time Series. Analysis of Panel Data. Models for Spatial Outcomes and Geographical Association. Structural Equation and Latent Variable Models. Survival and Event History Models. Modelling and Establishing Causal Relations: Epidemiological Methods and Models. Index.

596 citations


Proceedings Article
13 Oct 2003
TL;DR: This paper proposes to model the target distribution as a nonparametric mixture model and presents the general tracking recursion for this case; a Monte Carlo implementation of the recursion leads to a mixture of particle filters that interact only in the computation of the mixture weights, yielding an efficient numerical algorithm.
Abstract: In recent years particle filters have become a tremendously popular tool to perform tracking for non-linear and/or non-Gaussian models. This is due to their simplicity, generality and success over a wide range of challenging applications. Particle filters, and Monte Carlo methods in general, are however poor at consistently maintaining the multi-modality of the target distributions that may arise due to ambiguity or the presence of multiple objects. To address this shortcoming this paper proposes to model the target distribution as a non-parametric mixture model, and presents the general tracking recursion in this case. It is shown how a Monte Carlo implementation of the general recursion leads to a mixture of particle filters that interact only in the computation of the mixture weights, thus leading to an efficient numerical algorithm, where all the results pertaining to standard particle filters apply. The ability of the new method to maintain posterior multi-modality is illustrated on a synthetic example and a real world tracking problem involving the tracking of football players in a video sequence.
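
The interaction structure the abstract describes can be sketched compactly: each mixture component carries its own particle set, the per-component filters run independently, and only the mixture weights couple them. The fragment below is an illustrative sketch, not the authors' algorithm; `propagate` and `likelihood` are assumed user-supplied model functions, and resampling is omitted for brevity.

```python
import numpy as np

def mixture_pf_step(components, pi, propagate, likelihood, y):
    # components: list of (particles, weights) per mixture component;
    # pi: (M,) mixture weights; y: current observation.
    new_pi = np.empty(len(components))
    for m, (parts, w) in enumerate(components):
        parts = propagate(parts)              # per-component prediction
        lik = likelihood(y, parts)            # p(y | particle), elementwise
        unnorm = w * lik
        new_pi[m] = pi[m] * unnorm.sum()      # component "evidence" term
        components[m] = (parts, unnorm / unnorm.sum())
    return components, new_pi / new_pi.sum()  # only coupling is here
```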

448 citations


Journal ArticleDOI
TL;DR: A general empirical Bayes modelling approach which allows for replicate expression profiles in multiple conditions is proposed and used in a study of mammary cancer in the rat, where four distinct patterns of expression are possible.
Abstract: DNA microarrays provide for unprecedented large-scale views of gene expression and, as a result, have emerged as a fundamental measurement tool in the study of diverse biological systems. Statistical questions abound, but many traditional data analytic approaches do not apply, in large part because thousands of individual genes are measured with relatively little replication. Empirical Bayes methods provide a natural approach to microarray data analysis because they can significantly reduce the dimensionality of an inference problem while compensating for relatively few replicates by using information across the array. We propose a general empirical Bayes modelling approach which allows for replicate expression profiles in multiple conditions. The hierarchical mixture model accounts for differences among genes in their average expression levels, differential expression for a given gene among cell types, and measurement fluctuations. Two distinct parameterizations are considered: a model based on Gamma distributed measurements and one based on log-normally distributed measurements. False discovery rate and related operating characteristics of the methodology are assessed in a simulation study. We also show how the posterior odds of differential expression in one version of the model is related to the ratio of the arithmetic mean to the geometric mean of the two sample means. The methodology is used in a study of mammary cancer in the rat, where four distinct patterns of expression are possible.
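
The arithmetic-to-geometric-mean connection mentioned at the end is worth a tiny numeric illustration: the ratio equals 1 when the two sample means agree and grows with their disparity, which is why it can track the posterior odds of differential expression.

```python
# Toy illustration of the AM/GM ratio of two sample means (not the
# paper's full posterior-odds formula, just the quantity it highlights).
import math

for m1, m2 in [(10.0, 10.0), (10.0, 20.0), (10.0, 100.0)]:
    am = (m1 + m2) / 2
    gm = math.sqrt(m1 * m2)
    print(m1, m2, round(am / gm, 3))   # 1.0, 1.061, 1.739
```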

389 citations


Journal ArticleDOI
TL;DR: A heuristic is proposed for searching for the optimal component to insert in the greedy learning of Gaussian mixtures; the resulting algorithm can be particularly useful when the optimal number of mixture components is unknown.
Abstract: This article concerns the greedy learning of gaussian mixtures. In the greedy approach, mixture components are inserted into the mixture one after the other. We propose a heuristic for searching for the optimal component to insert. In a randomized manner, a set of candidate new components is generated. For each of these candidates, we find the locally optimal new component and insert it into the existing mixture. The resulting algorithm resolves the sensitivity to initialization of state-of-the-art methods, like expectation maximization, and has running time linear in the number of data points and quadratic in the (final) number of mixture components. Due to its greedy nature, the algorithm can be particularly useful when the optimal number of mixture components is unknown. Experimental results comparing the proposed algorithm to other methods on density estimation and texture segmentation are provided.
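
A hedged sketch of the greedy scheme: grow the mixture one component at a time, trying several candidate insertions and keeping the one that most improves the log-likelihood. Seeding candidates at random data points is a simplification of the paper's search heuristic, and scikit-learn's EM stands in for the paper's update rules.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def greedy_gmm(x, k_max, n_candidates=10, seed=0):
    rng = np.random.default_rng(seed)
    gm = GaussianMixture(n_components=1).fit(x)
    for k in range(2, k_max + 1):
        best = None
        for _ in range(n_candidates):
            # Candidate: keep current means, seed one new component
            # at a randomly chosen data point (illustrative heuristic).
            means = np.vstack([gm.means_, x[rng.integers(len(x))]])
            cand = GaussianMixture(n_components=k, means_init=means).fit(x)
            if best is None or cand.lower_bound_ > best.lower_bound_:
                best = cand
        gm = best
    return gm
```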

380 citations


Proceedings ArticleDOI
01 Jan 2003
TL;DR: In this paper, the target distribution is modeled as a nonparametric mixture model, and the general tracking recursion is used to maintain the posterior multimodality of the target distributions.
Abstract: In recent years particle filters have become a tremendously popular tool to perform tracking for nonlinear and/or non-Gaussian models. This is due to their simplicity, generality and success over a wide range of challenging applications. Particle filters, and Monte Carlo methods in general, are however poor at consistently maintaining the multimodality of the target distributions that may arise due to ambiguity or the presence of multiple objects. To address this shortcoming this paper proposes to model the target distribution as a nonparametric mixture model, and presents the general tracking recursion in this case. It is shown how a Monte Carlo implementation of the general recursion leads to a mixture of particle filters that interact only in the computation of the mixture weights, thus leading to an efficient numerical algorithm, where all the results pertaining to standard particle filters apply. The ability of the new method to maintain posterior multimodality is illustrated on a synthetic example and a real world tracking problem involving the tracking of football players in a video sequence.

374 citations


Journal ArticleDOI
TL;DR: An adaptation of a new expectation-maximization-based competitive mixture decomposition algorithm is introduced, and it is shown that it efficiently and reliably performs mixture decompositions of t-distributions.

Proceedings Article
09 Dec 2003
TL;DR: A generative latent variable model for rating-based collaborative filtering, called the User Rating Profile model (URP), is presented; it represents each user as a mixture of user attitudes, with the mixing proportions distributed according to a Dirichlet random variable.
Abstract: In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model [7], and LDA [1], but has clear advantages over each.
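
URP's generative process, as described above, is easy to state in code. A minimal sketch, with all array shapes and names as illustrative assumptions: per-user mixing proportions come from a Dirichlet, each item draws a user attitude, and the rating comes from that attitude's preference pattern.

```python
import numpy as np

def sample_user_profile(alpha, beta, rng):
    # alpha: (n_attitudes,) Dirichlet parameter.
    # beta: (n_attitudes, n_items, n_rating_values) preference patterns;
    #       beta[z, j] must be a probability vector over rating values.
    n_attitudes, n_items, n_vals = beta.shape
    theta = rng.dirichlet(alpha)                 # user's attitude mixture
    ratings = np.empty(n_items, dtype=int)
    for j in range(n_items):
        z = rng.choice(n_attitudes, p=theta)     # attitude for this item
        ratings[j] = rng.choice(n_vals, p=beta[z, j])
    return ratings
```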

Journal ArticleDOI
TL;DR: The utility of the bounded cumulative hazard model in cure rate estimation is considered, which is an appealing alternative to the widely used two-component mixture model, and is particularly suitable for semiparametric and Bayesian methods of statistical inference.
Abstract: This article considers the utility of the bounded cumulative hazard model in cure rate estimation, which is an appealing alternative to the widely used two-component mixture model. This approach has the following distinct advantages: (1) It allows for a natural way to extend the proportional hazards regression model, leading to a wide class of extended hazard regression models. (2) In some settings the model can be interpreted in terms of biologically meaningful parameters. (3) The model structure is particularly suitable for semiparametric and Bayesian methods of statistical inference. Notwithstanding the fact that the model has been around for less than a decade, a large body of theoretical results and applications has been reported to date. This review article is intended to give a big picture of these modeling techniques and associated statistical problems. These issues are discussed in the context of survival data in cancer.

Journal ArticleDOI
TL;DR: It is shown that all currently used latent variable models can be mapped into equivalent mixture models, which facilitates their simulation, statistical fitting and the study of their large portfolio properties.
Abstract: We analyse the mathematical structure of portfolio credit risk models with particular regard to the modelling of dependence between default events in these models. We explore the role of copulas in latent variable models (the approach that underlies KMV and CreditMetrics) and use non-Gaussian copulas to present extensions to standard industry models. We explore the role of the mixing distribution in Bernoulli mixture models (the approach underlying CreditRisk+) and derive large portfolio approximations for the loss distribution. We show that all currently used latent variable models can be mapped into equivalent mixture models, which facilitates their simulation, statistical fitting and the study of their large portfolio properties. Finally, we develop and test several approaches to model calibration based on the Bernoulli mixture representation; we find that maximum likelihood estimation of parametric mixture models generally outperforms simple moment estimation methods. JEL Subject Classification: G31, G11, C15
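
A one-factor Bernoulli mixture makes the dependence mechanism concrete: conditional on a mixing variable, obligors default independently, and mixing over that variable induces dependence and a heavy-tailed loss distribution. The logit link and parameter values below are assumptions for illustration, not a calibrated model from the paper.

```python
import numpy as np

def simulate_portfolio_losses(n_obligors, n_sims, mu=-3.0, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    psi = rng.normal(mu, sigma, size=n_sims)   # common mixing variable
    p = 1.0 / (1.0 + np.exp(-psi))             # conditional default prob
    defaults = rng.binomial(n_obligors, p)     # conditionally independent
    return defaults / n_obligors               # loss fractions per scenario

losses = simulate_portfolio_losses(1000, 100_000)
print(np.quantile(losses, 0.99))  # a tail quantile of the loss distribution
```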

Proceedings Article
21 Aug 2003
TL;DR: FMM extends existing partitioning/clustering algorithms for collaborative filtering by clustering both users and items together simultaneously without assuming that each user and item should only belong to a single cluster.
Abstract: This paper presents a flexible mixture model (FMM) for collaborative filtering. FMM extends existing partitioning/clustering algorithms for collaborative filtering by clustering both users and items together simultaneously without assuming that each user and item should only belong to a single cluster. Furthermore, with the introduction of 'preference' nodes, the proposed framework is able to explicitly model how users rate items, which can vary dramatically, even among the users with similar tastes on items. Empirical study over two datasets of movie ratings has shown that our new algorithm outperforms five other collaborative filtering algorithms substantially.

Journal ArticleDOI
TL;DR: MCLUST is a software package for model-based clustering, density estimation and discriminant analysis, interfaced to the S-PLUS commercial software and the R language, that implements parameterized Gaussian hierarchical clustering algorithms and the EM algorithm for parameterized Gaussian mixture models with the possible addition of a Poisson noise term.
Abstract: MCLUST is a software package for model-based clustering, density estimation and discriminant analysis interfaced to the S-PLUS commercial software and the R language. It implements parameterized Gaussian hierarchical clustering algorithms and the EM algorithm for parameterized Gaussian mixture models with the possible addition of a Poisson noise term. Also included are functions that combine hierarchical clustering, EM and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation, and discriminant analysis. MCLUST provides functionality for displaying and visualizing clustering and classification results. A web page with related links can be found at http://www.stat.washington.edu/mclust.
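
MCLUST itself is S-PLUS/R software, but its core strategy (fit parameterized Gaussian mixtures over a grid of covariance structures and component counts, then select by BIC) can be sketched in Python with scikit-learn as a stand-in:

```python
# Not MCLUST; a Python sketch of its model-selection strategy.
import numpy as np
from sklearn.mixture import GaussianMixture

def mclust_like_selection(x, max_k=9):
    best, best_bic = None, np.inf
    # Covariance forms play the role of MCLUST's model parameterizations.
    for cov in ("spherical", "diag", "tied", "full"):
        for k in range(1, max_k + 1):
            gm = GaussianMixture(k, covariance_type=cov, n_init=3,
                                 random_state=0).fit(x)
            if gm.bic(x) < best_bic:
                best, best_bic = gm, gm.bic(x)
    return best
```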

Proceedings ArticleDOI
13 Oct 2003
TL;DR: A Dynamically Multi-Linked Hidden Markov Model (DML-HMM) is developed to interpret group activities involving multiple objects captured in an outdoor scene based on the discovery of salient dynamic interlinks among multiple temporal events using DPNs.
Abstract: Dynamic Probabilistic Networks (DPNs) are exploited for modeling the temporal relationships among a set of different object temporal events in the scene for a coherent and robust scene-level behaviour interpretation. In particular, we develop a Dynamically Multi-Linked Hidden Markov Model (DML-HMM) to interpret group activities involving multiple objects captured in an outdoor scene. The model is based on the discovery of salient dynamic interlinks among multiple temporal events using DPNs. Object temporal events are detected and labeled using Gaussian Mixture Models with automatic model order selection. A DML-HMM is built using Schwarz's Bayesian Information Criterion based factorisation resulting in its topology being intrinsically determined by the underlying causality and temporal order among different object events. Our experiments demonstrate that its performance on modelling group activities in a noisy outdoor scene is superior compared to that of a Multi-Observation Hidden Markov Model (MOHMM), a Parallel Hidden Markov Model (PaHMM) and a Coupled Hidden Markov Model (CHMM).

Proceedings Article
09 Dec 2003
TL;DR: This work develops a framework to incorporate side information in the form of equivalence constraints into the model estimation procedure, and demonstrates that such side information can lead to considerable improvement in clustering tasks.
Abstract: Density estimation with Gaussian Mixture Models is a popular generative technique used also for clustering. We develop a framework to incorporate side information in the form of equivalence constraints into the model estimation procedure. Equivalence constraints are defined on pairs of data points, indicating whether the points arise from the same source (positive constraints) or from different sources (negative constraints). Such constraints can be gathered automatically in some learning problems, and are a natural form of supervision in others. For the estimation of model parameters we present a closed form EM procedure which handles positive constraints, and a Generalized EM procedure using a Markov net which handles negative constraints. Using publicly available data sets we demonstrate that such side information can lead to considerable improvement in clustering tasks, and that our algorithm is preferable to two other suggested methods using the same type of side information.
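
For the positive-constraint case, the closed-form E-step has a simple intuition: points known to share a source (a "chunklet") receive one joint assignment, so responsibilities use the product of the members' likelihoods. A hedged sketch of that step only (negative constraints, which need the paper's Markov-net GEM procedure, are omitted):

```python
import numpy as np
from scipy.stats import multivariate_normal

def chunklet_responsibilities(chunklets, weights, means, covs):
    # chunklets: list of (n_i, d) arrays; each chunklet shares one label.
    K = len(weights)
    resp = []
    for pts in chunklets:
        # log p(chunklet | k) sums member log-likelihoods (product rule).
        log_r = np.array([np.log(weights[k]) +
                          multivariate_normal.logpdf(pts, means[k], covs[k]).sum()
                          for k in range(K)])
        log_r -= log_r.max()          # stabilize before exponentiating
        r = np.exp(log_r)
        resp.append(r / r.sum())
    return np.array(resp)             # (n_chunklets, K)
```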

Journal ArticleDOI
Robert Nowak
TL;DR: The paper presents a distributed expectation-maximization (EM) algorithm for estimating the Gaussian components, which are common to the environment and sensor network as a whole, as well as the mixing probabilities that may vary from node to node.
Abstract: The paper considers the problem of density estimation and clustering in distributed sensor networks. It is assumed that each node in the network senses an environment that can be described as a mixture of some elementary conditions. The measurements are thus statistically modeled with a mixture of Gaussians, where each Gaussian component corresponds to one of the elementary conditions. The paper presents a distributed expectation-maximization (EM) algorithm for estimating the Gaussian components, which are common to the environment and sensor network as a whole, as well as the mixing probabilities that may vary from node to node. The algorithm produces an estimate (in terms of a Gaussian mixture approximation) of the density of the sensor data without requiring the data to be transmitted to and processed at a central location. Alternatively, the algorithm can be viewed as a distributed processing strategy for clustering the sensor data into components corresponding to predominant environmental features sensed by the network. The convergence of the distributed EM algorithm is investigated, and simulations demonstrate the potential of this approach to sensor network data analysis.
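
The communication pattern can be sketched: each node runs a local E-step on its own data and transmits only fixed-size sufficient statistics, from which the shared Gaussian parameters are re-estimated; mixing weights remain node-specific. The message-passing schedule and all names below are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np
from scipy.stats import multivariate_normal

def local_e_step(x, pi_node, means, covs):
    # x: this node's (n, d) data; pi_node: node-specific mixing weights.
    K = len(pi_node)
    log_r = np.stack([np.log(pi_node[k]) +
                      multivariate_normal.logpdf(x, means[k], covs[k])
                      for k in range(K)], axis=1)
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # Fixed-size summaries small enough to transmit between nodes:
    return r.sum(0), r.T @ x, np.einsum('nk,nd,ne->kde', r, x, x)

def merge_and_m_step(stats):
    # stats: per-node (Nk, Sx, Sxx) tuples; pooling yields shared params.
    Nk = sum(s[0] for s in stats)
    Sx = sum(s[1] for s in stats)
    Sxx = sum(s[2] for s in stats)
    means = Sx / Nk[:, None]
    covs = Sxx / Nk[:, None, None] - np.einsum('kd,ke->kde', means, means)
    return means, covs
```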

Journal ArticleDOI
TL;DR: This work focuses on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data.

Journal ArticleDOI
TL;DR: This work evaluates several clustering algorithms that incorporate repeated measurements and shows that algorithms that take advantage of repeated measurements yield more accurate and more stable clusters.
Abstract: Clustering is a common methodology for the analysis of array data, and many research laboratories are generating array data with repeated measurements. We evaluated several clustering algorithms that incorporate repeated measurements, and show that algorithms that take advantage of repeated measurements yield more accurate and more stable clusters. In particular, we show that the infinite mixture model-based approach with a built-in error model produces superior results.

Proceedings Article
21 Aug 2003
TL;DR: This paper analyzes the performance of semi-supervised learning of mixture models and shows that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error.
Abstract: This paper analyzes the performance of semi-supervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. We present a mathematical analysis of this "degradation" phenomenon and show that it is due to the fact that bias may be adversely affected by unlabeled data. We discuss the impact of these theoretical results to practical situations.

Journal ArticleDOI
TL;DR: It is shown how the expectation maximization algorithm can be used to jointly learn clusters, while at the same time inferring the transformation associated with each input, by approximating the nonlinear transformation manifold by a discrete set of points.
Abstract: Clustering is a simple, effective way to derive useful representations of data, such as images and videos. Clustering explains the input as one of several prototypes, plus noise. In situations where each input has been randomly transformed (e.g., by translation, rotation, and shearing in images and videos), clustering techniques tend to extract cluster centers that account for variations in the input due to transformations, instead of more interesting and potentially useful structure. For example, if images from a video sequence of a person walking across a cluttered background are clustered, it would be more useful for the different clusters to represent different poses and expressions, instead of different positions of the person and different configurations of the background clutter. We describe a way to add transformation invariance to mixture models, by approximating the nonlinear transformation manifold by a discrete set of points. We show how the expectation maximization algorithm can be used to jointly learn clusters, while at the same time inferring the transformation associated with each input. We compare this technique with other methods for filtering noisy images obtained from a scanning electron microscope, clustering images from videos of faces into different categories of identification and pose and removing foreground obstructions from video. We also demonstrate that the new technique is quite insensitive to initial conditions and works better than standard techniques, even when the standard techniques are provided with extra data.

Journal ArticleDOI
TL;DR: An unsupervised learning algorithm that can obtain a probabilistic model of an object composed of a collection of parts automatically from unlabeled training data is presented.
Abstract: An unsupervised learning algorithm that can obtain a probabilistic model of an object composed of a collection of parts (a moving human body in our examples) automatically from unlabeled training data is presented. The training data include both useful "foreground" features as well as features that arise from irrelevant background clutter - the correspondence between parts and detected features is unknown. The joint probability density function of the parts is represented by a mixture of decomposable triangulated graphs which allow for fast detection. To learn the model structure as well as model parameters, an EM-like algorithm is developed where the labeling of the data (part assignments) is treated as hidden variables. The unsupervised learning technique is not limited to decomposable triangulated graphs. The efficiency and effectiveness of our algorithm is demonstrated by applying it to generate models of human motion automatically from unlabeled image sequences, and testing the learned models on a variety of sequences.

Journal ArticleDOI
TL;DR: In this paper, Bayesian inference for switching regression models and their generalizations can be achieved by the specification of loss functions which overcome the label switching problem common to all mixture models.
Abstract: This article shows how Bayesian inference for switching regression models and their generalizations can be achieved by the specification of loss functions which overcome the label switching problem common to all mixture models. We also derive an extension to models where the number of components in the mixture is unknown, based on the birth-and-death technique developed in recent literature. The methods are illustrated on various real datasets.

Journal ArticleDOI
TL;DR: A parameterization of the beta-binomial mixture is developed that provides sensible inferences about the size of a closed population when probabilities of capture or detection vary among individuals.
Abstract: We develop a parameterization of the beta-binomial mixture that provides sensible inferences about the size of a closed population when probabilities of capture or detection vary among individuals. Three classes of mixture models (beta-binomial, logistic-normal, and latent-class) are fitted to recaptures of snowshoe hares for estimating abundance and to counts of bird species for estimating species richness. In both sets of data, rates of detection appear to vary more among individuals (animals or species) than among sampling occasions or locations. The estimates of population size and species richness are sensitive to model-specific assumptions about the latent distribution of individual rates of detection. We demonstrate using simulation experiments that conventional diagnostics for assessing model adequacy, such as deviance, cannot be relied on for selecting classes of mixture models that produce valid inferences about population size. Prior knowledge about sources of individual heterogeneity in detection rates, if available, should be used to help select among classes of mixture models that are to be used for inference.
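
For reference, the marginal that underlies the first of these mixture classes: the probability that an individual whose detection probability is Beta(a, b)-distributed is detected y times in T occasions. A minimal sketch using the standard beta-binomial parameterization, which is an assumption and not necessarily the authors' exact one:

```python
import numpy as np
from scipy.special import betaln, comb

def beta_binomial_pmf(y, T, a, b):
    # P(Y = y) = C(T, y) * B(y + a, T - y + b) / B(a, b),
    # i.e. a binomial with its success probability integrated out
    # against a Beta(a, b) distribution of individual detection rates.
    return comb(T, y) * np.exp(betaln(y + a, T - y + b) - betaln(a, b))
```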

Journal ArticleDOI
TL;DR: The use of features derived from multiresolution analysis of speech and the Teager energy operator for classification of drivers' speech under stressed conditions and the problem of choosing a suitable temporal scale for representing categorical differences in the data is addressed.

Journal Article
TL;DR: The class of species sampling mixture models is introduced in this article as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions.
Abstract: The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction with Pitman (1995, 1996), we derive characterizations of the posterior distribution in terms of a posterior partition distribution that extend the results of Lo (1984) for the Dirichlet process. These results provide a better understanding of models and have both theoretical and practical applications. To facilitate the use of our models we generalize the work in Brunner, Chan, James and Lo (2001) by extending their weighted Chinese restaurant (WCR) Monte Carlo procedure, an i.i.d. sequential importance sampling (SIS) procedure for approximating posterior mean functionals based on the Dirichlet process, to the case of approximation of mean functionals and additionally their posterior laws in species sampling mixture models. We also discuss collapsed Gibbs sampling, Polya urn Gibbs sampling and a Polya urn SIS scheme. Our framework allows for numerous applications, including multiplicative counting process models subject to weighted gamma processes, as well as nonparametric and semiparametric hierarchical models based on the Dirichlet process, its two-parameter extension, the Pitman-Yor process and finite dimensional Dirichlet priors.
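
The Polya-urn step driving the samplers discussed here reduces, in the Dirichlet process case, to the familiar Chinese restaurant rule; a minimal sketch of that special case (the general species sampling priors treated in the paper modify these assignment probabilities):

```python
import numpy as np

def crp_assign(cluster_sizes, concentration, rng):
    # A new observation joins an existing cluster with probability
    # proportional to its size, or opens a new cluster with probability
    # proportional to the concentration parameter.
    probs = np.append(cluster_sizes, concentration).astype(float)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)   # last index = new cluster
```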

Journal ArticleDOI
TL;DR: This work proposes a nonparametric statistical approach, called the mixture model method (MMM), to handle the problem when there are a small number of replicates under each experimental condition; the distributions of a t-type test statistic and its null statistic are estimated using finite normal mixture models.
Abstract: An exciting biological advancement over the past few years is the use of microarray technologies to measure simultaneously the expression levels of thousands of genes. The bottleneck now is how to extract useful information from the resulting large amounts of data. An important and common task in analyzing microarray data is to identify genes with altered expression under two experimental conditions. We propose a nonparametric statistical approach, called the mixture model method (MMM), to handle the problem when there are a small number of replicates under each experimental condition. Specifically, we propose estimating the distributions of a t-type test statistic and its null statistic using finite normal mixture models. A comparison of these two distributions by means of a likelihood ratio test, or simply using the tail distribution of the null statistic, can identify genes with significantly changed expression. Several methods are proposed to effectively control the false positives. The methodology is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle ear infection.
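
A hedged sketch of the MMM idea: fit normal mixtures to the observed t-type statistics and to the null statistics, then flag genes where the estimated null-to-observed density ratio is small. The cutoff rule below is an illustration, not the paper's exact false-positive control.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mmm_flag(z, z_null, n_components=3, cutoff=0.1):
    # z: observed t-type statistics; z_null: null statistics (both 1-D).
    f = GaussianMixture(n_components).fit(z.reshape(-1, 1))
    f0 = GaussianMixture(n_components).fit(z_null.reshape(-1, 1))
    lr = np.exp(f0.score_samples(z.reshape(-1, 1))
                - f.score_samples(z.reshape(-1, 1)))
    return lr < cutoff   # small null-to-observed ratio => changed gene
```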

Journal ArticleDOI
TL;DR: In this article, an extension of the EM algorithm reintroduced additive separability, thus allowing one to estimate parameters sequentially during each maximization step, and they showed that, relative to full information maximum likelihood, their sequential estimator can generate large computational savings with little loss of efficiency.
Abstract: A popular way to account for unobserved heterogeneity is to assume that the data are drawn from a finite mixture distribution. A barrier to using finite mixture models is that parameters that could previously be estimated in stages must now be estimated jointly: using mixture distributions destroys any additive separability of the log-likelihood function. We show, however, that an extension of the EM algorithm reintroduces additive separability, thus allowing one to estimate parameters sequentially during each maximization step. In establishing this result, we develop a broad class of estimators for mixture models. Returning to the likelihood problem, we show that, relative to full information maximum likelihood, our sequential estimator can generate large computational savings with little loss of efficiency.