SciSpace (formerly Typeset)
Topic

Expectation–maximization algorithm

About: Expectation–maximization algorithm is a research topic. Over its lifetime, 11,823 publications have appeared within this topic, receiving 528,693 citations. The topic is also known as: EM algorithm & Expectation Maximization.
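For orientation, the core E-step/M-step loop behind the papers below can be sketched in its simplest common setting, a two-component one-dimensional Gaussian mixture (the data, initial values, and component count here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two well-separated Gaussian clusters (illustrative only)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Initial guesses for mixture weights, means, and standard deviations
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    r = pi * normal_pdf(x[:, None], mu, sigma)        # shape (n, 2)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibility-weighted data
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

# mu now approximates the true component means (about -2 and 3)
```

Each iteration is guaranteed not to decrease the observed-data log-likelihood, which is the defining property of EM.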


Papers
Journal ArticleDOI
TL;DR: A systematic probabilistic framework is presented that leads to both optimal and near-optimal OFDM detection schemes in the presence of unknown phase noise (PHN), and the expectation-maximization algorithm is shown to be a special case of the variational-inference-based joint estimator.
Abstract: This paper studies the mitigation of phase noise (PHN) in orthogonal frequency-division multiplexing (OFDM) data detection. We present a systematic probabilistic framework that leads to both optimal and near-optimal OFDM detection schemes in the presence of unknown PHN. In contrast to the conventional approach that cancels the common (average) PHN, our aim is to jointly estimate the complete PHN sequence and the data symbol sequence. We derive a family of low-complexity OFDM detectors for this purpose. The theoretical foundation on which these detectors are based is called variational inference, an approximate probabilistic inference technique associated with the minimization of variational free energy. In deriving the proposed schemes, we also point out that the expectation-maximization algorithm is a special case of the variational-inference-based joint estimator. Further complexity reduction is obtained using the conjugate gradient (CG) method, and only a few CG iterations are needed to closely approach the output of the ideal joint estimator.

114 citations

Journal ArticleDOI
TL;DR: The purpose of this paper is to compare the EM and MCMC approaches in three cases of different complexity; the examples include model order selection, continuous-time HMMs and variants of HMMs in which the observed data depends on many hidden variables in an overlapping fashion.
Abstract: Hidden Markov models (HMMs) and related models have become standard in statistics during the last 15–20 years, with applications in diverse areas like speech and other statistical signal processing, hydrology, financial statistics and econometrics, bioinformatics, etc. Inference in HMMs is traditionally carried out using the EM algorithm, but examples of Bayesian estimation, generally implemented through Markov chain Monte Carlo (MCMC) sampling, are also frequent in the HMM literature. The purpose of this paper is to compare the EM and MCMC approaches in three cases of different complexity; the examples include model order selection, continuous-time HMMs, and variants of HMMs in which the observed data depend on many hidden variables in an overlapping fashion. All these examples in some way or another originate from real-data applications. Neither EM nor MCMC analysis of HMMs is a black-box methodology without need for user interaction, and we illustrate some of the problems one may expect to encounter, such as poor mixing and long computation times.

114 citations

Journal ArticleDOI
TL;DR: In this paper, the null and non-null asymptotic distributions of the Wald test, the Lagrange multiplier test (Rao's efficient score test), and the likelihood ratio test are obtained.
Abstract: Statistical inference for a system of simultaneous, nonlinear, implicit equations is discussed. The discussion considers inference as an adjunct to maximum likelihood estimation rather than in a general setting. The null and non-null asymptotic distributions of the Wald test, the Lagrange multiplier test (Rao's efficient score test), and the likelihood ratio test are obtained. Several refinements in the existing theory of maximum likelihood estimation are accomplished as intermediate steps.

It is necessary to compute the power of a statistical test in two instances. The first is in the design of an experiment. In design, one is obliged to verify, prior to the expenditure of resources, that an experimental effect would be detected with a reasonably high probability. Peak-load electricity pricing experiments come immediately to mind as examples. The second is when failure to reject a null hypothesis is used to claim that the data support the null hypothesis. To validate this claim, it must be shown that candidate alternatives would have been detected with a reasonably high probability. This article sets forth formulas for asymptotic approximations of power for tests commonly used in connection with maximum likelihood estimation for a system of simultaneous, nonlinear, implicit equations. The reader who is interested only in this result should skim Sections 2 and 5 to become familiar with the notation and then read Section 6. See Gallant and Jorgenson [7] for similar formulas if two- and three-stage estimation methods are employed instead. Several refinements in the existing theory of maximum likelihood estimation (Amemiya [2]) are accomplished as intermediate steps in the derivation of the asymptotic approximations. They are as follows. In any theory of nonlinear statistical analysis, various sequences of random functions must converge uniformly in their argument. However, merely listing these sequences and assuming uniform convergence is not very helpful to the practitioner. Conditions which are easily recognized as obtaining or not obtaining in an application are preferable. Here, the notion of Cesàro-summable sequences is used to show that uniform convergence obtains if the log-likelihood and its derivatives are dominated by integrable functions. If normal errors are imposed, then it is shown that the requisite domination may be stated in terms of the structural model itself. The critical assumption is that the limit of the log-likelihood must have a unique maximum. This implies strong consistency of the maximum likelihood estimator itself, not merely that there exists a solution of the first-order conditions which is
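As a toy illustration of the likelihood ratio machinery the abstract discusses, consider a normal mean with known variance, far simpler than the paper's simultaneous-equations setting (the data below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, 100)        # synthetic data with true mean 0.5

# Test H0: mu = 0 against H1: mu unrestricted, with sigma = 1 known.
# Here -2 * (loglik under H0 - loglik under H1) simplifies to n * xbar^2.
n, xbar = len(x), x.mean()
lr_stat = n * xbar ** 2
reject = lr_stat > 3.84              # 5% critical value of chi-square(1)
```

Under the null, the statistic is asymptotically chi-square with one degree of freedom; the paper's formulas approximate its non-null distribution, which is what a power calculation needs.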

114 citations

Journal ArticleDOI
TL;DR: A framework is proposed that formulates the minimum-mean-square-error (MMSE)-based solution in the context of multiple simultaneous speakers and background noise, and highlights the importance of estimating the activities of the speakers.
Abstract: We propose a new framework for joint multichannel speech source separation and acoustic noise reduction. In this framework, we start by formulating the minimum-mean-square error (MMSE)-based solution in the context of multiple simultaneous speakers and background noise, and outline the importance of the estimation of the activities of the speakers. The latter is accurately achieved by introducing a latent variable that takes N+1 possible discrete states for a mixture of N speech signals plus additive noise. Each state characterizes the dominance of one of the N+1 signals. We determine the posterior probability of this latent variable, and show how it plays a twofold role in the MMSE-based speech enhancement. First, it allows the extraction of the second order statistics of the noise and each of the speech signals from the noisy data. These statistics are needed to formulate the multichannel Wiener-based filters (including the minimum variance distortionless response). Second, it weighs the outputs of these linear filters to shape the spectral contents of the signals' estimates following the associated target speakers' activities. We use the spatial and spectral cues contained in the multichannel recordings of the sound mixtures to compute the posterior probability of this latent variable. The spatial cue is acquired by using the normalized observation vector whose distribution is well approximated by a Gaussian-mixture-like model, while the spectral cue can be captured by using a pre-trained Gaussian mixture model for the log-spectra of speech. The parameters of the investigated models and the speakers' activities (posterior probabilities of the different states of the latent variable) are estimated via expectation maximization. Experimental results including comparisons with the well-known independent component analysis and masking are provided to demonstrate the efficiency of the proposed framework.

114 citations

Journal ArticleDOI
TL;DR: In this paper, a new Fourier–von Mises image model is identified, in which phase differences between Fourier-transformed images have von Mises distributions, and null-set distortion criteria are proposed, with each criterion uniquely minimized by a particular set of polynomial functions.
Abstract: A warping is a function that deforms images by mapping between image domains. The choice of function is formulated statistically as maximum penalized likelihood, where the likelihood measures the similarity between images after warping and the penalty is a measure of distortion of a warping. The paper addresses two issues simultaneously: how to choose the warping function and how to assess the alignment. A new Fourier–von Mises image model is identified, with phase differences between Fourier-transformed images having von Mises distributions. New null-set distortion criteria are also proposed, with each criterion uniquely minimized by a particular set of polynomial functions. A conjugate gradient algorithm is used to estimate the warping function, which is numerically approximated by a piecewise bilinear function. The method is motivated by, and used to solve, three applied problems: to register a remotely sensed image with a map, to align microscope images obtained by using different optics, and to discriminate between species of fish from photographic images.
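The role of Fourier phase differences in alignment can be illustrated with plain phase correlation, a simpler relative of the paper's Fourier–von Mises model (the image and shift below are synthetic, and only a global translation is recovered, not a general warp):

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.random((64, 64))
shifted = np.roll(img, (5, 3), axis=(0, 1))       # circularly shift by (5, 3)

# The phase of the cross-power spectrum encodes the translation between images
F1, F2 = np.fft.fft2(img), np.fft.fft2(shifted)
cross = np.conj(F1) * F2
corr = np.fft.ifft2(cross / np.abs(cross)).real   # keep phase only, then invert
shift = np.unravel_index(corr.argmax(), corr.shape)
# shift recovers the applied translation, (5, 3)
```

Discarding the magnitudes and keeping only the phase is what makes the correlation peak sharp; the paper goes further by modeling the residual phase differences as von Mises distributed.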

114 citations


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations (91% related)
Deep learning: 79.8K papers, 2.1M citations (84% related)
Support vector machine: 73.6K papers, 1.7M citations (84% related)
Cluster analysis: 146.5K papers, 2.9M citations (84% related)
Artificial neural network: 207K papers, 4.5M citations (82% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    114
2022    245
2021    438
2020    410
2019    484
2018    519