Expectation–maximization algorithm

About: Expectation–maximization algorithm is a research topic. Over the lifetime, 11,823 publications have been published within this topic receiving 528,693 citations. The topic is also known as: EM algorithm & Expectation Maximization.


Papers
Proceedings ArticleDOI
25 Oct 2008
TL;DR: This paper compares a variety of different Bayesian estimators for Hidden Markov Model POS taggers with various numbers of hidden states on data sets of different sizes and finds that Variational Bayes was the fastest of all the estimators, especially on large data sets, and that explicit Gibbs samplers were generally faster than their collapsed counterparts on large data sets.
Abstract: There is growing interest in applying Bayesian techniques to NLP problems. There are a number of different estimators for Bayesian models, and it is useful to know what kinds of tasks each does well on. This paper compares a variety of different Bayesian estimators for Hidden Markov Model POS taggers with various numbers of hidden states on data sets of different sizes. Recent papers have given contradictory results when comparing Bayesian estimators to Expectation Maximization (EM) for unsupervised HMM POS tagging, and we show that the difference in reported results is largely due to differences in the size of the training data and the number of states in the HMM. We investigate a variety of samplers for HMMs, including some that these earlier papers did not study. We find that all of the Gibbs samplers do well with small data sets and few states, and that Variational Bayes does well on large data sets and is competitive with the Gibbs samplers. In terms of times of convergence, we find that Variational Bayes was the fastest of all the estimators, especially on large data sets, and that explicit Gibbs samplers (both pointwise and sentence-blocked) were generally faster than their collapsed counterparts on large data sets.

107 citations
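
The EM baseline that this comparison refers to for unsupervised HMM POS tagging is the Baum-Welch algorithm. Below is a minimal sketch of that baseline in NumPy for a single integer-coded observation sequence; it is illustrative only, does not implement the Gibbs or Variational Bayes estimators studied in the paper, and all function names and the toy data are made up for the example.

```python
# Minimal sketch of the EM (Baum-Welch) baseline for a discrete HMM,
# assuming one integer-coded observation sequence. Illustrative only.
import numpy as np

def forward_backward(obs, pi, A, B):
    """E-step helper: scaled forward-backward pass for one sequence."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K)); beta = np.zeros((T, K)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    return alpha, beta, c

def baum_welch(obs, K, V, n_iter=50, seed=0, eps=1e-12):
    """EM for an HMM with K hidden states and V observation symbols."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    A = rng.dirichlet(np.ones(K), size=K)   # transition probabilities
    B = rng.dirichlet(np.ones(V), size=K)   # emission probabilities
    for _ in range(n_iter):
        # E-step: posterior state marginals (gamma) and transition counts (xi).
        alpha, beta, c = forward_backward(obs, pi, A, B)
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((K, K))
        for t in range(len(obs) - 1):
            xi += (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
        # M-step: renormalize expected counts (eps guards against empty states).
        pi = gamma[0]
        A = (xi + eps) / (xi + eps).sum(axis=1, keepdims=True)
        B_new = np.full((K, V), eps)
        for t, o in enumerate(obs):
            B_new[:, o] += gamma[t]
        B = B_new / B_new.sum(axis=1, keepdims=True)
    return pi, A, B

# Toy usage (illustrative): 2 hidden states, 3 observation symbols.
obs = np.array([0, 1, 2, 2, 1, 0, 0, 1, 2, 1])
pi, A, B = baum_welch(obs, K=2, V=3)
```

Here the E-step is the scaled forward-backward pass and the M-step simply renormalizes the expected transition and emission counts; the Bayesian estimators in the paper replace these point estimates with sampling or variational posteriors.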

Journal ArticleDOI
TL;DR: This paper generalizes the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation-Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual d-dimensional uncertainty covariance and has unique missing data properties.
Abstract: We generalize the well-known mixtures of Gaussians approach to density estimation and the accompanying Expectation-Maximization technique for finding the maximum likelihood parameters of the mixture to the case where each data point carries an individual d-dimensional uncertainty covariance and has unique missing data properties. This algorithm reconstructs the error-deconvolved or "underlying" distribution function common to all samples, even when the individual data points are samples from different distributions, obtained by convolving the underlying distribution with the heteroskedastic uncertainty distribution of the data point and projecting out the missing data directions. We show how this basic algorithm can be extended with conjugate priors on all of the model parameters and a "split-and-merge" procedure designed to avoid local maxima of the likelihood. We demonstrate the full method by applying it to the problem of inferring the three-dimensional velocity distribution of stars near the Sun from noisy two-dimensional, transverse velocity measurements from the Hipparcos satellite.

107 citations
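
For reference, the base algorithm being generalized here is ordinary maximum-likelihood EM for a mixture of Gaussians. The sketch below shows that standard case only; it does not implement the paper's per-point uncertainty covariances, missing-data projections, conjugate priors, or split-and-merge moves, and the function name and toy usage are illustrative.

```python
# Standard maximum-likelihood EM for a K-component Gaussian mixture
# (the base case generalized by the paper above). Illustrative sketch.
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, K, n_iter=100, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    means = X[rng.choice(n, K, replace=False)]            # init at data points
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | x_i).
        r = np.column_stack([
            w * multivariate_normal.pdf(X, mean=m, cov=c)
            for w, m, c in zip(weights, means, covs)
        ])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates.
        Nk = r.sum(axis=0)
        weights = Nk / n
        means = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return weights, means, covs

# Toy usage (illustrative): two well-separated 2-D clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (300, 2)), rng.normal(5.0, 1.0, (300, 2))])
w, mu, Sigma = gmm_em(X, K=2)
```

The paper's extension replaces the plain component densities in the E-step with components convolved with each data point's own uncertainty covariance, which is what makes the recovered mixture an error-deconvolved estimate.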

Patent
23 May 2002
TL;DR: In this paper, a visual motion analysis method that uses multiple layered global motion models to both detect and reliably track an arbitrary number of moving objects appearing in image sequences is presented, where each global model includes a background layer and one or more foreground polybones, each foreground polybone including a parametric shape model, an appearance model, and a motion model describing an associated moving object.
Abstract: A visual motion analysis method that uses multiple layered global motion models to both detect and reliably track an arbitrary number of moving objects appearing in image sequences. Each global model includes a background layer and one or more foreground "polybones", each foreground polybone including a parametric shape model, an appearance model, and a motion model describing an associated moving object. Each polybone includes an exclusive spatial support region and a probabilistic boundary region, and is assigned an explicit depth ordering. Multiple global models having different numbers of layers, depth orderings, motions, etc., corresponding to detected objects are generated, refined using, for example, an EM algorithm, and then ranked/compared. Initial guesses for the model parameters are drawn from a proposal distribution over the set of potential (likely) models. Bayesian model selection is used to compare/rank the different models, and models having relatively high posterior probability are retained for subsequent analysis.

107 citations
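
As a reading aid only, here is one way the per-layer state described above could be organized in code. Every field name and type below is a guess for illustration; the patent text does not prescribe an implementation.

```python
# Hypothetical data layout for one foreground "polybone" layer as described
# in the patent text (shape, appearance, and motion models; exclusive spatial
# support; probabilistic boundary; explicit depth order). Illustrative only.
from dataclasses import dataclass
import numpy as np

@dataclass
class Polybone:
    shape_params: np.ndarray    # parametric shape model
    appearance: np.ndarray      # appearance model (e.g. an image template)
    motion_params: np.ndarray   # motion model of the associated moving object
    support_mask: np.ndarray    # exclusive spatial support region
    boundary_width: float       # width of the probabilistic boundary region
    depth_order: int            # explicit layer ordering relative to other layers
```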

Journal ArticleDOI
TL;DR: A (supervised) logistic regression algorithm for the classification of incomplete data is developed and is extended to the semisupervised case by incorporating graph-based regularization.
Abstract: We address the incomplete-data problem in which feature vectors to be classified are missing data (features). A (supervised) logistic regression algorithm for the classification of incomplete data is developed. Single or multiple imputation for the missing data is avoided by performing analytic integration with an estimated conditional density function (conditioned on the observed data). Conditional density functions are estimated using a Gaussian mixture model (GMM), with parameter estimation performed using both expectation-maximization (EM) and variational Bayesian EM (VB-EM). The proposed supervised algorithm is then extended to the semisupervised case by incorporating graph-based regularization. The semisupervised algorithm utilizes all available data, both incomplete and complete, as well as labeled and unlabeled. Experimental results of the proposed classification algorithms are shown.

106 citations
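
The imputation-free step in this approach rests on conditioning the estimated density on the observed features. As a rough sketch (assuming, for simplicity, a single Gaussian component rather than the full GMM, and omitting the integration against the logistic likelihood), the conditional moments can be computed as below; the helper name, indices, and toy numbers are illustrative.

```python
# Sketch of the conditioning building block: moments of p(x_missing | x_observed)
# under a single Gaussian component. The paper's classifier mixes such
# conditionals over GMM components and integrates its logistic likelihood
# against them; that part is not shown here. Illustrative only.
import numpy as np

def gaussian_conditional(mu, Sigma, x_obs, obs_idx, mis_idx):
    mu_o, mu_m = mu[obs_idx], mu[mis_idx]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    S_mm = Sigma[np.ix_(mis_idx, mis_idx)]
    gain = np.linalg.solve(S_oo, S_mo.T).T        # S_mo @ inv(S_oo)
    cond_mean = mu_m + gain @ (x_obs - mu_o)      # E[x_mis | x_obs]
    cond_cov = S_mm - gain @ S_mo.T               # Cov[x_mis | x_obs]
    return cond_mean, cond_cov

# Toy usage (illustrative): 3 features, the last one missing.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.4], [0.2, 0.4, 1.0]])
m, C = gaussian_conditional(mu, Sigma, x_obs=np.array([0.5, 1.5]),
                            obs_idx=[0, 1], mis_idx=[2])
```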

01 Jan 2005
TL;DR: It is shown that under general, simple, verifiable conditions, any EM sequence is convergent if the maximizer at the M-step is unique; this condition is almost always satisfied in practice.
Abstract: It is well known that the likelihood sequence of the EM algorithm is non-decreasing and convergent (Dempster, Laird and Rubin (1977)), and that the limit points of the EM algorithm are stationary points of the likelihood (Wu (1983)), but the issue of the convergence of the EM sequence itself has not been completely settled. In this paper we close this gap and show that under general, simple, verifiable conditions, any EM sequence is convergent. In pathological cases we show that the sequence is cycling in the limit among a finite number of stationary points with equal likelihood. The results apply equally to the optimization transfer class of algorithms (MM algorithm) of Lange, Hunter, and Yang (2000). Two different EM algorithms constructed on the same dataset illustrate the convergence and the cyclic behavior. This paper contains new results concerning the convergence of the EM algorithm. The EM algorithm was brought into the limelight by Dempster, Laird and Rubin (1977) as a general iterative method of computing the maximum likelihood estimator by maximizing a simpler likelihood on an augmented data space. However, the problem of the convergence of the algorithm has not been satisfactorily resolved. Wu (1983), the main theoretical contribution in this area, showed that the limit points of the EM algorithm are stationary points of the likelihood, and that when the likelihood is unimodal, any EM sequence is convergent. Boyles (1983) has a number of results along similar lines. These results still allow the possibility of a non-convergent EM sequence when the likelihood is not unimodal. More importantly, the EM algorithm is useful when the likelihood is hard to obtain directly; for these cases, the unimodality of the likelihood is very difficult to verify. Here we give simple, general, verifiable conditions for convergence: our main result (Theorem 3) is that any EM sequence is convergent if the maximizer at the M-step is unique. This condition is almost always satisfied in practice (otherwise the particular EM data augmentation scheme would …

106 citations
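
For orientation, the well-known ascent property referred to at the start of this abstract can be stated compactly in generic notation, with observed data x, latent data z, and parameters \(\theta\):
\[
Q(\theta \mid \theta^{(t)}) = \mathbb{E}\left[\log p(x, z \mid \theta) \mid x, \theta^{(t)}\right],
\qquad
\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)}).
\]
Writing \(\log L(\theta) = Q(\theta \mid \theta^{(t)}) - H(\theta \mid \theta^{(t)})\) with \(H(\theta \mid \theta^{(t)}) = \mathbb{E}[\log p(z \mid x, \theta) \mid x, \theta^{(t)}]\), the M-step gives \(Q(\theta^{(t+1)} \mid \theta^{(t)}) \ge Q(\theta^{(t)} \mid \theta^{(t)})\) and Jensen's inequality gives \(H(\theta^{(t+1)} \mid \theta^{(t)}) \le H(\theta^{(t)} \mid \theta^{(t)})\), hence
\[
\log L(\theta^{(t+1)}) \ge \log L(\theta^{(t)}).
\]
The paper's contribution is the further step from this monotonicity of the likelihood to convergence of the parameter sequence \(\{\theta^{(t)}\}\) itself, under the condition that the M-step maximizer is unique.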


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations, 91% related
Deep learning: 79.8K papers, 2.1M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 84% related
Artificial neural network: 207K papers, 4.5M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    114
2022    245
2021    438
2020    410
2019    484
2018    519