scispace - formally typeset
Topic

Expectation–maximization algorithm

About: Expectation–maximization algorithm is a research topic. Over the lifetime, 11823 publications have been published within this topic receiving 528693 citations. The topic is also known as: EM algorithm & Expectation Maximization.
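The EM algorithm alternates an E-step, which computes the expected values of latent variables under the current parameters, with an M-step, which re-maximizes the likelihood given those expectations. As a minimal self-contained sketch (illustrative only, not drawn from any paper below), here is EM for a two-component one-dimensional Gaussian mixture; the initialisation scheme and all names are hypothetical choices:

```python
import math
import random

def em_gmm_1d(data, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture.

    E-step: compute responsibilities (posterior component probabilities).
    M-step: re-estimate weights, means, and variances from them.
    """
    # Crude initialisation from the data range (a hypothetical choice).
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [pi[k] / math.sqrt(2 * math.pi * var[k])
                    * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    for k in range(2)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: weighted maximum-likelihood updates.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk + 1e-6
    return pi, mu, var

random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(5.0, 1.0) for _ in range(200)])
pi, mu, var = em_gmm_1d(data)
```

With two well-separated modes the estimated means settle near 0 and 5 and the mixing weights near 0.5 each; each iteration is guaranteed not to decrease the likelihood, which is the defining property of EM.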


Papers
Proceedings ArticleDOI
25 Jun 2006
TL;DR: This paper introduces robust probabilistic principal component analysis and robust probabilistic canonical correlation analysis based on a Student-t density model.
Abstract: Principal components and canonical correlations are at the root of many exploratory data mining techniques and provide standard pre-processing tools in machine learning. Lately, probabilistic reformulations of these methods have been proposed (Roweis, 1998; Tipping & Bishop, 1999b; Bach & Jordan, 2005). They are based on a Gaussian density model and are therefore, like their non-probabilistic counterpart, very sensitive to atypical observations. In this paper, we introduce robust probabilistic principal component analysis and robust probabilistic canonical correlation analysis. Both are based on a Student-t density model. The resulting probabilistic reformulations are more suitable in practice as they handle outliers in a natural way. We compute maximum likelihood estimates of the parameters by means of the EM algorithm.
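The robustness of the Student-t model comes from viewing it as a Gaussian whose variance is scaled by a latent Gamma variable; the E-step then yields per-observation weights that shrink towards zero for atypical points. A one-dimensional location-estimation sketch of this mechanism (a simplified illustration of the idea, not the paper's multivariate PCA/CCA formulation; all names are hypothetical):

```python
def studentt_location_em(data, nu=3.0, n_iter=50):
    """EM estimate of a Student-t location parameter (1-D sketch).

    The E-step computes the expected latent precision scale for each
    observation; outliers receive small weights, giving robustness.
    """
    mu = sum(data) / len(data)          # start from the non-robust mean
    sigma2 = sum((x - mu) ** 2 for x in data) / len(data)
    for _ in range(n_iter):
        # E-step: expected latent scale weight per observation.
        w = [(nu + 1.0) / (nu + (x - mu) ** 2 / sigma2) for x in data]
        # M-step: weighted updates of location and scale.
        mu = sum(wi * x for wi, x in zip(w, data)) / sum(w)
        sigma2 = sum(wi * (x - mu) ** 2
                     for wi, x in zip(w, data)) / len(data)
    return mu

data = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, 50.0]   # one gross outlier
mu_robust = studentt_location_em(data)
```

The ordinary sample mean of this data is pulled above 7 by the single outlier, whereas the Student-t EM estimate stays near the bulk of the data around 0, which is exactly the "handles outliers in a natural way" behaviour the abstract describes.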

97 citations

Journal ArticleDOI
16 Dec 1997
TL;DR: Viewing mixture decomposition as probabilistic clustering rather than parametric estimation enables both fuzzy and crisp measures of cluster validity for this problem; the expectation-maximization algorithm is used to find clusters in the data.
Abstract: We study indices for choosing the correct number of components in a mixture of normal distributions. Previous studies have been confined to indices based wholly on probabilistic models. Viewing mixture decomposition as probabilistic clustering (where the emphasis is on partitioning for geometric substructure) as opposed to parametric estimation enables us to introduce both fuzzy and crisp measures of cluster validity for this problem. We presume the underlying samples to be unlabeled, and use the expectation-maximization (EM) algorithm to find clusters in the data. We test 16 probabilistic, 3 fuzzy and 4 crisp indices on 12 data sets that are samples from bivariate normal mixtures having either 3 or 6 components. Over three-run averages based on different initializations of EM, 10 of the 23 indices tested for choosing the right number of mixture components were correct in at least 9 of the 12 trials. Among these were the fuzzy index of Xie-Beni, the crisp Davies-Bouldin index, and two crisp indices that are recent generalizations of Dunn's index.
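One of the crisp validity indices the paper reports, Davies-Bouldin, compares within-cluster scatter to between-centroid separation, with lower values indicating a better partition. A small one-dimensional sketch (hypothetical helper names and toy data; the paper's own experiments are on bivariate mixtures):

```python
def davies_bouldin(clusters):
    """Crisp Davies-Bouldin index for a 1-D hard partition (lower = better).

    clusters: list of lists of points, one inner list per cluster.
    """
    cents = [sum(c) / len(c) for c in clusters]          # centroids
    scatter = [sum(abs(x - m) for x in c) / len(c)       # mean deviation
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    db = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        worst = max((scatter[i] + scatter[j]) / abs(cents[i] - cents[j])
                    for j in range(k) if j != i)
        db += worst
    return db / k

tight = [[0.0, 0.1, 0.2], [10.0, 10.1, 10.2]]   # compact, well separated
loose = [[0.0, 4.0, 8.0], [9.0, 13.0, 17.0]]    # diffuse, overlapping
```

The compact, well-separated partition scores much lower than the diffuse one, which is how the index is used to pick the number of mixture components: fit EM for several candidate counts and prefer the partition with the smallest index.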

97 citations

Journal ArticleDOI
Neil Shephard1
TL;DR: New strategies for implementing maximum likelihood estimation of nonlinear time series models are suggested; they make use of recent work on the EM algorithm and iterative simulation techniques, and are applied to fitting stochastic variance models to exchange rate data.
Abstract: New strategies for the implementation of maximum likelihood estimation of nonlinear time series models are suggested. They make use of recent work on the EM algorithm and iterative simulation techniques. The estimation procedures are applied to the problem of fitting stochastic variance models to exchange rate data.

97 citations

Journal ArticleDOI
TL;DR: A new approach to maximum-likelihood analysis of complex DNA histograms via the EM algorithm works very well, converging to reasonable values for all parameters; simulations from the estimated models were used to investigate bias, variance, and correlations of the estimates.
Abstract: Flow cytometric DNA measurements yield the amount of DNA for each of a large number of cells. A DNA histogram normally consists of a mixture of one or more constellations of G0/G1-, S-, G2/M-phase cells, together with internal standards, debris, background noise, and one or more populations of clumped cells. We have modelled typical DNA histograms as a mixed distribution with Gaussian densities for the G0/G1 and G2/M phases, an S-phase density, assumed to be uniform between the G0/G1 and G2/M peaks, observed with a Gaussian error, and with Gaussian densities for standards of chicken and trout red blood cells. The debris is modelled as a truncated exponential distribution, and we also have included a uniform background noise distribution over the whole observation interval. We have explored a new approach for maximum-likelihood analyses of complex DNA histograms by the application of the EM algorithm. This algorithm was used for four observed DNA histograms of varying complexity. Our results show that the algorithm works very well, and it converges to reasonable values for all parameters. In simulations from the estimated models, we have investigated bias, variance, and correlations of the estimates.

97 citations

Journal ArticleDOI
TL;DR: Segmentation accuracy was examined with three two-sample validation metrics against an estimated composite latent gold standard, derived from several experts' manual segmentations by an EM algorithm; the automated segmentation yielded satisfactory accuracy with varied optimal thresholds.
Abstract: The validity of brain tumour segmentation is an important issue in image processing because it has a direct impact on surgical planning. We examined the segmentation accuracy based on three two-sample validation metrics against the estimated composite latent gold standard, which was derived from several experts' manual segmentations by an EM algorithm. The distribution functions of the tumour and control pixel data were parametrically assumed to be a mixture of two beta distributions with different shape parameters. We estimated the corresponding receiver operating characteristic curve, Dice similarity coefficient, and mutual information, over all possible decision thresholds. Based on each validation metric, an optimal threshold was then computed via maximization. We illustrated these methods on MR imaging data from nine brain tumour cases of three different tumour types, each consisting of a large number of pixels. The automated segmentation yielded satisfactory accuracy with varied optimal thresholds. The performances of these validation metrics were also investigated via Monte Carlo simulation. Extensions of incorporating spatial correlation structures using a Markov random field model were considered.
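The Dice similarity coefficient used above measures the overlap between a candidate segmentation and the gold standard, and the optimal decision threshold is the one maximizing the metric over all candidates. A toy sketch of that threshold search, with hypothetical pixel scores and helper names (not the paper's beta-mixture model):

```python
def dice(seg, truth):
    """Dice similarity coefficient between two binary masks (lists of 0/1)."""
    inter = sum(a & b for a, b in zip(seg, truth))
    return 2.0 * inter / (sum(seg) + sum(truth))

def best_threshold(scores, truth, thresholds):
    """Pick the decision threshold maximising Dice against the gold
    standard -- a simplified sketch of the validation scheme above."""
    return max(thresholds,
               key=lambda t: dice([1 if s >= t else 0 for s in scores],
                                  truth))

# Hypothetical per-pixel tumour scores and a reference labelling.
scores = [0.1, 0.2, 0.8, 0.9, 0.4, 0.7, 0.3, 0.95]
truth  = [0,   0,   1,   1,   0,   1,   0,   1]
t_star = best_threshold(scores, truth, [0.1 * i for i in range(1, 10)])
```

On this toy data the classes separate perfectly, so the selected threshold attains a Dice coefficient of 1; on real pixel data the maximum is below 1 and, as the abstract notes, the optimal threshold varies from case to case.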

97 citations


Network Information
Related Topics (5)
Estimator
97.3K papers, 2.6M citations
91% related
Deep learning
79.8K papers, 2.1M citations
84% related
Support vector machine
73.6K papers, 1.7M citations
84% related
Cluster analysis
146.5K papers, 2.9M citations
84% related
Artificial neural network
207K papers, 4.5M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  114
2022  245
2021  438
2020  410
2019  484
2018  519