
Showing papers on "Expectation–maximization algorithm published in 2000"


Book
02 Oct 2000
TL;DR: The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the mathematical and statistical literature.
Abstract: The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and ge...

8,095 citations


Journal ArticleDOI
TL;DR: A likelihood-based density modification approach is developed that can incorporate expected electron-density information from a wide variety of sources.
Abstract: A likelihood-based approach to density modification is developed that can be applied to a wide variety of cases where some information about the electron density at various points in the unit cell is available. The key to the approach consists of developing likelihood functions that represent the probability that a particular value of electron density is consistent with prior expectations for the electron density at that point in the unit cell. These likelihood functions are then combined with likelihood functions based on experimental observations and with others containing any prior knowledge about structure factors to form a combined likelihood function for each structure factor. A simple and general approach to maximizing the combined likelihood function is developed. It is found that this likelihood-based approach yields greater phase improvement in model and real test cases than either conventional solvent flattening and histogram matching or a recent reciprocal-space solvent-flattening procedure [Terwilliger (1999), Acta Cryst. D55, 1863–1871].

1,671 citations


Journal ArticleDOI
TL;DR: The authors propose two automatic techniques (based on the Bayes theory) for the analysis of the difference image; one allows an automatic selection of the decision threshold that minimizes the overall change detection error probability under the assumption that pixels in the difference image are independent of one another.
Abstract: One of the main problems related to unsupervised change detection methods based on the "difference image" lies in the lack of efficient automatic techniques for discriminating between changed and unchanged pixels in the difference image. Such discrimination is usually performed by using empirical strategies or manual trial-and-error procedures, which affect both the accuracy and the reliability of the change-detection process. To overcome such drawbacks, in this paper, the authors propose two automatic techniques (based on the Bayes theory) for the analysis of the difference image. One allows an automatic selection of the decision threshold that minimizes the overall change detection error probability under the assumption that pixels in the difference image are independent of one another. The other analyzes the difference image by considering the spatial-contextual information included in the neighborhood of each pixel. In particular, an approach based on Markov Random Fields (MRFs) that exploits interpixel class dependency contexts is presented. Both proposed techniques require the knowledge of the statistical distributions of the changed and unchanged pixels in the difference image. To perform an unsupervised estimation of the statistical terms that characterize these distributions, they propose an iterative method based on the Expectation-Maximization (EM) algorithm. Experimental results confirm the effectiveness of both proposed techniques.

1,218 citations
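A minimal Python sketch of the unsupervised ingredient described in the abstract above: fit a two-component Gaussian mixture to the difference-image values by EM, then place the decision threshold where the two weighted densities cross (the Bayes rule under the independence assumption). Function names and the initialization are illustrative, not taken from the paper, and the MRF-based spatial-contextual variant is not covered.

```python
# Sketch (not the authors' code): EM for a two-component Gaussian mixture on
# difference-image values, followed by Bayes-optimal threshold selection.
import numpy as np

def em_two_gaussians(x, n_iter=100):
    """Fit p(x) = w0*N(mu0, s0^2) + w1*N(mu1, s1^2) by EM."""
    mu = np.percentile(x, [25, 75]).astype(float)    # crude init: low/high values
    sigma = np.array([x.std(), x.std()]) + 1e-6
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for every pixel
        dens = np.stack([w[k] * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2)
                         / (sigma[k] * np.sqrt(2 * np.pi)) for k in range(2)])
        resp = dens / dens.sum(axis=0, keepdims=True)
        # M-step: weighted means, variances and mixing proportions
        nk = resp.sum(axis=1)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk) + 1e-9
        w = nk / x.size
    return w, mu, sigma

def bayes_threshold(w, mu, sigma, n_grid=1000):
    """Decision threshold where the two weighted densities cross (Bayes rule)."""
    lo, hi = sorted(mu)
    grid = np.linspace(lo, hi, n_grid)
    d = [w[k] * np.exp(-0.5 * ((grid - mu[k]) / sigma[k]) ** 2) / sigma[k]
         for k in range(2)]
    return grid[np.argmin(np.abs(d[0] - d[1]))]

# usage: diff = np.abs(img_t2 - img_t1).ravel(); pixels above the returned
# threshold are labeled "changed", the rest "unchanged"
```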


Journal ArticleDOI
TL;DR: The Latent Moderated Structural Equations (LMS) approach introduced in this paper is a new method for the analysis of the general interaction model that utilizes the mixture distribution and provides ML estimation of model parameters by adapting the EM algorithm.
Abstract: In the context of structural equation modeling, a general interaction model with multiple latent interaction effects is introduced. A stochastic analysis represents the nonnormal distribution of the joint indicator vector as a finite mixture of normal distributions. The Latent Moderated Structural Equations (LMS) approach is a new method developed for the analysis of the general interaction model that utilizes the mixture distribution and provides a ML estimation of model parameters by adapting the EM algorithm. The finite sample properties and the robustness of LMS are discussed. Finally, the applicability of the new method is illustrated by an empirical example.

1,122 citations


Journal ArticleDOI
TL;DR: The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
Abstract: Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.

903 citations
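A hedged sketch of fitting a mixture of multivariate t distributions in Python. For simplicity the degrees of freedom nu are held fixed, so the extra CM-step of the paper's ECM algorithm (which updates nu) is omitted and the iteration reduces to a plain EM with latent scale weights; all names are illustrative.

```python
# Sketch: EM for a mixture of multivariate t distributions with fixed nu.
import numpy as np
from scipy.special import gammaln

def t_logpdf(X, mu, Sigma, nu):
    """Log density of a multivariate t and the squared Mahalanobis distances."""
    p = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)            # whitened residuals
    delta = (z ** 2).sum(axis=0)                  # squared Mahalanobis distances
    logdet = 2 * np.log(np.diag(L)).sum()
    logpdf = (gammaln((nu + p) / 2) - gammaln(nu / 2)
              - 0.5 * (p * np.log(nu * np.pi) + logdet)
              - 0.5 * (nu + p) * np.log1p(delta / nu))
    return logpdf, delta

def fit_t_mixture(X, K, nu=4.0, n_iter=100, seed=0):
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mus = X[rng.choice(n, K, replace=False)]
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(p)] * K)
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        logp, deltas = zip(*[t_logpdf(X, mus[k], Sigmas[k], nu) for k in range(K)])
        logp = np.array(logp) + np.log(pis)[:, None]
        tau = np.exp(logp - logp.max(axis=0))      # E-step: responsibilities
        tau /= tau.sum(axis=0)
        u = (nu + p) / (nu + np.array(deltas))     # latent scale weights
        for k in range(K):                         # M-step
            w = tau[k] * u[k]
            mus[k] = w @ X / w.sum()
            R = X - mus[k]
            Sigmas[k] = (R.T * w) @ R / tau[k].sum() + 1e-6 * np.eye(p)
        pis = tau.sum(axis=1) / n
    return pis, mus, Sigmas
```

The down-weighting of points with large Mahalanobis distances (through u) is what gives the t mixture its robustness to the atypical observations mentioned in the abstract.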


Book ChapterDOI
26 Jun 2000
TL;DR: A method to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition that achieves very good classification results on human faces and rear views of cars.
Abstract: We present a method to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition. We focus on a particular type of model where objects are represented as flexible constellations of rigid parts (features). The variability within a class is represented by a joint probability density function (pdf) on the shape of the constellation and the output of part detectors. In a first stage, the method automatically identifies distinctive parts in the training set by applying a clustering algorithm to patterns selected by an interest operator. It then learns the statistical shape model using expectation maximization. The method achieves very good classification results on human faces and rear views of cars.

736 citations


Journal ArticleDOI
TL;DR: Maximum likelihood techniques for the joint estimation of the incidence and latency regression parameters in this model are developed using the nonparametric form of the likelihood and an EM algorithm and are applied to a data set of tonsil cancer patients treated with radiation therapy.
Abstract: Some failure time data come from a population that consists of some subjects who are susceptible to and others who are nonsusceptible to the event of interest. The data typically have heavy censoring at the end of the follow-up period, and a standard survival analysis would not always be appropriate. In such situations where there is good scientific or empirical evidence of a nonsusceptible population, the mixture or cure model can be used (Farewell, 1982, Biometrics 38, 1041–1046). It assumes a binary distribution to model the incidence probability and a parametric failure time distribution to model the latency. Kuk and Chen (1992, Biometrika 79, 531–541) extended the model by using Cox's proportional hazards regression for the latency. We develop maximum likelihood techniques for the joint estimation of the incidence and latency regression parameters in this model using the nonparametric form of the likelihood and an EM algorithm. A zero-tail constraint is used to reduce the near nonidentifiability of the problem. The inverse of the observed information matrix is used to compute the standard errors. A simulation study shows that the methods are competitive to the parametric methods under ideal conditions and are generally better when censoring from loss to follow-up is heavy. The methods are applied to a data set of tonsil cancer patients treated with radiation therapy.

549 citations


Journal ArticleDOI
TL;DR: Finite mixture partitions of animals and/or samples are used to give a unified linear-logistic framework for fitting all eight models of Otis et al. by maximum likelihood.
Abstract: Agresti (1994, Biometrics 50, 494–500) and Norris and Pollock (1996a, Biometrics 52, 639–649) suggested using methods of finite mixtures to partition the animals in a closed capture-recapture experiment into two or more groups with relatively homogeneous capture probabilities. This enabled them to fit the models Mh, Mbh (Norris and Pollock), and Mth (Agresti) of Otis et al. (1978, Wildlife Monographs 62, 1–135). In this article, finite mixture partitions of animals and/or samples are used to give a unified linear-logistic framework for fitting all eight models of Otis et al. by maximum likelihood. Likelihood ratio tests are available for model comparisons. For many data sets, a simple dichotomy of animals is enough to substantially correct for heterogeneity-induced bias in the estimation of population size, although there is the option of fitting more than two groups if the data warrant it.

516 citations


Journal ArticleDOI
TL;DR: A new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes is introduced and the results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
Abstract: We introduce a new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time-series models— hidden Markov models and linear dynamical systems—and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs, Jordan, Nowlan, & Hinton, 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact expectation maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log-likelihood and makes use of both the forward and backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems. We tested the algorithm on artificial data sets and a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching state-space models.

478 citations


Proceedings ArticleDOI
13 Jun 2000
TL;DR: A method is presented to recover 3D scene structure and camera motion from multiple images without the need for correspondence information by means of an algorithm which iteratively refines a probability distribution over the set of all correspondence assignments.
Abstract: A method is presented to recover 3D scene structure and camera motion from multiple images without the need for correspondence information. The problem is framed as finding the maximum likelihood structure and motion given only the 2D measurements, integrating over all possible assignments of 3D features to 2D measurements. This goal is achieved by means of an algorithm which iteratively refines a probability distribution over the set of all correspondence assignments. At each iteration a new structure from motion problem is solved, using as input a set of 'virtual measurements' derived from this probability distribution. The distribution needed can be efficiently obtained by Markov Chain Monte Carlo sampling. The approach is cast within the framework of Expectation-Maximization, which guarantees convergence to a local maximizer of the likelihood. The algorithm works well in practice, as will be demonstrated using results on several real image sequences.

340 citations


Journal ArticleDOI
TL;DR: It is shown that the on-line EM algorithm is equivalent to the batch EM algorithm if a specific scheduling of the discount factor is employed and can be considered as a stochastic approximation method to find the maximum likelihood estimator.
Abstract: A normalized gaussian network (NGnet) (Moody & Darken, 1989) is a network of local linear regression units. The model softly partitions the input space by normalized gaussian functions, and each local unit linearly approximates the output within the partition. In this article, we propose a new on-line EM algorithm for the NGnet, which is derived from the batch EM algorithm (Xu, Jordan, & Hinton 1995), by introducing a discount factor. We show that the on-line EM algorithm is equivalent to the batch EM algorithm if a specific scheduling of the discount factor is employed. In addition, we show that the on-line EM algorithm can be considered as a stochastic approximation method to find the maximum likelihood estimator. A new regularization method is proposed in order to deal with a singular input distribution. In order to manage dynamic environments, where the input-output distribution of data changes over time, unit manipulation mechanisms such as unit production, unit deletion, and unit division are also introduced based on probabilistic interpretation. Experimental results show that our approach is suitable for function approximation problems in dynamic environments. We also apply our on-line EM algorithm to robot dynamics problems and compare our algorithm with the mixtures-of-experts family.
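A simplified illustration of the discounted sufficient-statistics idea, reduced to a one-dimensional Gaussian mixture rather than the NGnet with its local linear-regression units; unit production, deletion, and division are not reproduced. The abstract's equivalence to batch EM relies on a specific schedule for the discount factor; here a fixed discount is used for simplicity, and the class name and step-size choice are assumptions of this sketch.

```python
# Sketch: online EM for a 1-D Gaussian mixture with discounted statistics.
import numpy as np

class OnlineGMM:
    def __init__(self, means, sigma=1.0):
        self.mu = np.array(means, float)
        self.var = np.full(len(means), sigma ** 2)
        K = len(means)
        # discounted sufficient statistics: <<1>>, <<x>>, <<x^2>> per unit
        self.s0 = np.full(K, 1.0 / K)
        self.s1 = self.mu * self.s0
        self.s2 = (self.var + self.mu ** 2) * self.s0
        self.t = 0                       # sample counter (could drive a decaying eta)

    def step(self, x, discount=0.99):
        self.t += 1
        eta = 1.0 - discount             # step size tied to the discount factor
        # E-step for the single new sample: posterior responsibilities
        w = self.s0 / self.s0.sum()
        logp = (np.log(w) - 0.5 * np.log(2 * np.pi * self.var)
                - 0.5 * (x - self.mu) ** 2 / self.var)
        r = np.exp(logp - logp.max())
        r /= r.sum()
        # discounted update of the sufficient statistics
        self.s0 = (1 - eta) * self.s0 + eta * r
        self.s1 = (1 - eta) * self.s1 + eta * r * x
        self.s2 = (1 - eta) * self.s2 + eta * r * x ** 2
        # M-step: parameters recomputed from the current statistics
        self.mu = self.s1 / self.s0
        self.var = np.maximum(self.s2 / self.s0 - self.mu ** 2, 1e-6)
```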

Journal ArticleDOI
TL;DR: In this paper, an iterative algorithm for finding sample quantiles without sorting is presented, and a generalization of the algorithm to nonlinear quantile regression is explored; the approach is termed an MM, or majorize-minimize, algorithm.
Abstract: Quantile regression is an increasingly popular method for estimating the quantiles of a distribution conditional on the values of covariates. Regression quantiles are robust against the influence of outliers and, taken several at a time, they give a more complete picture of the conditional distribution than a single estimate of the center. This article first presents an iterative algorithm for finding sample quantiles without sorting and then explores a generalization of the algorithm to nonlinear quantile regression. Our quantile regression algorithm is termed an MM, or majorize—minimize, algorithm because it entails majorizing the objective function by a quadratic function followed by minimizing that quadratic. The algorithm is conceptually simple and easy to code, and our numerical tests suggest that it is computationally competitive with a recent interior point algorithm for most problems.
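A short sketch of the MM idea for linear quantile regression: the check function is majorized at the current residuals by a quadratic, so each iteration is a weighted least-squares solve. The fixed small eps guarding against zero residuals is a simplification of the perturbed objective the article analyzes, and the function name is illustrative.

```python
# Sketch: MM (majorize-minimize) iteration for linear quantile regression.
# Minimizes sum_i rho_tau(y_i - x_i' beta) via successive quadratic majorizers.
import numpy as np

def mm_quantile_regression(X, y, tau=0.5, eps=1e-6, n_iter=200, tol=1e-10):
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # least-squares start
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / (np.abs(r) + eps)                    # weights from the majorizer
        A = X.T @ (X * w[:, None])
        b = X.T @ (w * y + (2 * tau - 1))              # asymmetry enters linearly
        beta_new = np.linalg.solve(A, b)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# usage: include a column of ones in X for an intercept; tau=0.5 gives median regression
```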

01 Jan 2000
TL;DR: A greedy algorithm for learning a Gaussian mixture which is capable of achieving solutions superior to EM with k components in terms of the likelihood of a test set.

Journal ArticleDOI
TL;DR: The Gaussian mixture transition distribution model is generalized to the mixture autoregressive (MAR) model for the modelling of non‐linear time series and appears to capture features of the data better than other competing models do.
Abstract: We generalize the Gaussian mixture transition distribution (GMTD) model introduced by Le and co-workers to the mixture autoregressive (MAR) model for the modelling of non-linear time series. The models consist of a mixture of K stationary or non-stationary AR components. The advantages of the MAR model over the GMTD model include a wider range of shape changing predictive distributions and the ability to handle cycles and conditional heteroscedasticity in the time series. The stationarity conditions and autocorrelation function are derived. The estimation is easily done via a simple EM algorithm and the model selection problem is addressed. The shape changing feature of the conditional distributions makes these models capable of modelling time series with multimodal conditional distributions and with heteroscedasticity. The models are applied to two real data sets and compared with other competing models. The MAR models appear to capture features of the data better than other competing models do.
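A sketch of the EM step for a K-component mixture autoregressive model, maximizing the conditional likelihood given the first p observations: the E-step computes the posterior probability that each component generated each observation, and the M-step is a weighted least-squares update of the AR coefficients and innovation variances. This is an illustrative reconstruction, not the authors' code, and model selection is not addressed.

```python
# Sketch: EM for a K-component MAR model with common AR order p.
# Component k: y_t ~ N(phi_k0 + sum_j phi_kj * y_{t-j}, sigma_k^2), weight alpha_k.
import numpy as np

def fit_mar(y, K=2, p=1, n_iter=200, seed=0):
    y = np.asarray(y, float)
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.column_stack([np.ones(T - p)]
                        + [y[p - j - 1:T - j - 1] for j in range(p)])  # lagged design
    z = y[p:]
    phi = rng.normal(scale=0.1, size=(K, p + 1))
    phi[:, 0] = z.mean() + rng.normal(scale=z.std(), size=K)  # spread the intercepts
    sig2 = np.full(K, z.var())
    alpha = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior probability that component k generated y_t
        m = X @ phi.T                                   # (n, K) conditional means
        logp = (np.log(alpha) - 0.5 * np.log(2 * np.pi * sig2)
                - 0.5 * (z[:, None] - m) ** 2 / sig2)
        tau = np.exp(logp - logp.max(axis=1, keepdims=True))
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component
        for k in range(K):
            W = tau[:, k]
            A = X.T @ (X * W[:, None])
            b = X.T @ (W * z)
            phi[k] = np.linalg.solve(A + 1e-8 * np.eye(p + 1), b)
            resid = z - X @ phi[k]
            sig2[k] = (W * resid ** 2).sum() / W.sum()
        alpha = tau.mean(axis=0)
    return alpha, phi, sig2
```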

Journal ArticleDOI
TL;DR: A unified maximum likelihood method for estimating the parameters of the generalized latent trait model is presented, and the scoring of individuals on the latent dimensions is discussed.
Abstract: In this paper we discuss a general model framework within which manifest variables with different distributions in the exponential family can be analyzed with a latent trait model. A unified maximum likelihood method for estimating the parameters of the generalized latent trait model will be presented. We discuss in addition the scoring of individuals on the latent dimensions. The general framework presented allows, not only the analysis of manifest variables all of one type but also the simultaneous analysis of a collection of variables with different distributions. The approach used analyzes the data as they are by making assumptions about the distribution of the manifest variables directly.

Journal ArticleDOI
TL;DR: Some asymptotic results for the Stochastic EM algorithm are given, and estimation based on this algorithm is discussed, along with implementation issues and the possibility of allowing unidentified parameters in the algorithm.
Abstract: The EM algorithm is a much used tool for maximum likelihood estimation in missing or incomplete data problems. However, calculating the conditional expectation required in the E-step of the algorithm may be infeasible, especially when this expectation is a large sum or a high-dimensional integral. Instead the expectation can be estimated by simulation. This is the common idea in the stochastic EM algorithm and the Monte Carlo EM algorithm.
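A small sketch of the simulated E-step idea in its simplest setting: stochastic EM for a two-component Gaussian mixture, where the conditional expectation is replaced by drawing the missing component labels from their conditional distribution and the M-step is an ordinary complete-data update. The burn-in averaging at the end is a common convention for this class of algorithms, not something prescribed by the abstract.

```python
# Sketch: stochastic EM for a two-component univariate Gaussian mixture.
import numpy as np

def stochastic_em(x, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    mu = np.percentile(x, [25, 75]).astype(float)
    sig2 = np.full(2, x.var())
    w = np.array([0.5, 0.5])
    trace = []
    for _ in range(n_iter):
        # simulated E-step: posterior of label 1, then a Bernoulli draw per point
        log0 = np.log(w[0]) - 0.5 * np.log(sig2[0]) - 0.5 * (x - mu[0]) ** 2 / sig2[0]
        log1 = np.log(w[1]) - 0.5 * np.log(sig2[1]) - 0.5 * (x - mu[1]) ** 2 / sig2[1]
        p1 = 1.0 / (1.0 + np.exp(log0 - log1))
        lab = rng.random(x.size) < p1                  # True -> component 1
        # M-step on the completed data
        for k, idx in enumerate([~lab, lab]):
            if idx.sum() > 1:
                mu[k] = x[idx].mean()
                sig2[k] = x[idx].var() + 1e-9
                w[k] = idx.mean()
        w /= w.sum()
        trace.append(mu.copy())
    # the estimator is usually an average over iterations after a burn-in period
    return np.mean(trace[n_iter // 2:], axis=0), mu, sig2, w
```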

Proceedings ArticleDOI
11 Jun 2000
TL;DR: The authors formulate feature registration problems as maximum likelihood or Bayesian maximum a posteriori estimation problems using mixture models, embedding the EM algorithm within a deterministic annealing scheme in order to directly control the fuzziness of the correspondences.
Abstract: The authors formulate feature registration problems as maximum likelihood or Bayesian maximum a posteriori estimation problems using mixture models. An EM-like algorithm is proposed to jointly solve for the feature correspondences as well as the geometric transformations. A novel aspect of the authors' approach is the embedding of the EM algorithm within a deterministic annealing scheme in order to directly control the fuzziness of the correspondences. The resulting algorithm-termed mixture point matching (MPM)-can solve for both rigid and high dimensional (thin-plate spline-based) non-rigid transformations between point sets in the presence of noise and outliers. The authors demonstrate the algorithm's performance on 2D and 3D data.
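A simplified sketch of the EM-with-annealing idea restricted to rigid 2-D registration (rotation plus translation): soft correspondences are computed at the current temperature, a Procrustes step updates the transformation, and the temperature is lowered. The paper's outlier handling and the thin-plate-spline non-rigid case are not reproduced, and the annealing schedule and names are assumptions of this sketch.

```python
# Sketch: annealed soft-correspondence rigid point matching in 2-D.
import numpy as np

def rigid_register(X, Y, t_init=1.0, t_final=1e-3, rate=0.93):
    """Align point set X (n x 2) to Y (m x 2); returns rotation R and translation t."""
    R, t = np.eye(2), np.zeros(2)
    temp = t_init
    while temp > t_final:
        Xt = X @ R.T + t
        # E-step: soft correspondences between transformed X and Y
        d2 = ((Xt[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        M = np.exp(-d2 / (2 * temp))
        M /= M.sum(axis=1, keepdims=True) + 1e-12
        V = M @ Y                                   # "virtual" target for each x_i
        # M-step: Procrustes fit of R, t to the virtual correspondences
        wx, wv = X.mean(axis=0), V.mean(axis=0)
        H = (X - wx).T @ (V - wv)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
        R = Vt.T @ D @ U.T
        t = wv - wx @ R.T
        temp *= rate                                # deterministic annealing
    return R, t
```

High temperature keeps the correspondences fuzzy and the updates conservative; as the temperature drops, the assignments harden toward one-to-one matches.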

Book ChapterDOI
26 Jun 2000
TL;DR: A novel observation density for the particle filter which models the statistical dependence of neighboring pixels based on a Markov random field is presented, and the effectiveness of both the low level process and the observation likelihood is demonstrated.
Abstract: A new probabilistic background model based on a Hidden Markov Model is presented. The hidden states of the model enable discrimination between foreground, background and shadow. This model functions as a low level process for a car tracker. A particle filter is employed as a stochastic filter for the car tracker. The use of a particle filter allows the incorporation of the information from the low level process via importance sampling. A novel observation density for the particle filter which models the statistical dependence of neighboring pixels based on a Markov random field is presented. The effectiveness of both the low level process and the observation likelihood is demonstrated.

Journal ArticleDOI
TL;DR: A Gauss Markov random field (GMRF) model is used for textured areas, with an adaptive neighborhood system for edge preservation between uniform areas; an expectation maximization algorithm estimates the texture parameters that provide the highest evidence.
Abstract: Basic textures as they appear, especially in high resolution SAR images, are affected by multiplicative speckle noise and should be preserved by despeckling algorithms. Sharp edges between different regions and strong scatterers also must be preserved. To despeckle images, the authors use a maximum a posteriori (MAP) estimation of the cross section, choosing between different prior models. The proposed approach uses a Gauss Markov random field (GMRF) model for textured areas and allows an adaptive neighborhood system for edge preservation between uniform areas. In order to obtain the best possible texture reconstruction, an expectation maximization algorithm is used to estimate the texture parameters that provide the highest evidence. Borders between homogeneous areas are detected with a stochastic region-growing algorithm, locally determining the neighborhood system of the Gauss Markov prior. Smoothed strong scatterers are found in the ratio image of the data and the filtering result and are replaced in the image. In this way, texture, edges between homogeneous regions, and strong scatterers are well reconstructed and preserved. Additionally, the estimated model parameters can be used for further image interpretation methods.

Journal ArticleDOI
TL;DR: It is found that an optimal single-stage VQ can operate at approximately 3 bits less than a state-of-the-art LSF-based 2-split VQ.
Abstract: We model the underlying probability density function of vectors in a database as a Gaussian mixture (GM) model. The model is employed for high rate vector quantization analysis and for design of vector quantizers. It is shown that the high rate formulas accurately predict the performance of model-based quantizers. We propose a novel method for optimizing GM model parameters for high rate performance, and an extension to the EM algorithm for densities having bounded support is also presented. The methods are applied to quantization of LPC parameters in speech coding and we present new high rate analysis results for band-limited spectral distortion and outlier statistics. In practical terms, we find that an optimal single-stage VQ can operate at approximately 3 bits less than a state-of-the-art LSF-based 2-split VQ.

Journal ArticleDOI
01 Dec 2000
TL;DR: A new kernel rule has been developed for road sign classification using the Laplace probability density, and an Expectation–Maximization algorithm is used to maximize the pseudo-likelihood function.
Abstract: Driver support systems (DSS) of intelligent vehicles will predict potentially dangerous situations in heavy traffic, help with navigation and vehicle guidance and interact with a human driver. Important information necessary for traffic situation understanding is presented by road signs. A new kernel rule has been developed for road sign classification using the Laplace probability density. Smoothing parameters of the Laplace kernel are optimized by the pseudo-likelihood cross-validation method. To maximize the pseudo-likelihood function, an Expectation–Maximization algorithm is used. The algorithm has been tested on a dataset with more than 4900 noisy images. A comparison to other classification methods is also given.

Journal ArticleDOI
TL;DR: New theoretical results show that the EM/MPM algorithm can be expected to achieve the goal of minimizing the expected value of the number of misclassified pixels, to the extent that the EM estimates of the model parameters are close to the true values of the model parameters.
Abstract: In this paper we present new results relative to the "expectation-maximization/maximization of the posterior marginals" (EM/MPM) algorithm for simultaneous parameter estimation and segmentation of textured images. The EM/MPM algorithm uses a Markov random field model for the pixel class labels and alternately approximates the MPM estimate of the pixel class labels and estimates parameters of the observed image model. The goal of the EM/MPM algorithm is to minimize the expected value of the number of misclassified pixels. We present new theoretical results in this paper which show that the algorithm can be expected to achieve this goal, to the extent that the EM estimates of the model parameters are close to the true values of the model parameters. We also present new experimental results demonstrating the performance of the EM/MPM algorithm.

BookDOI
01 Jan 2000
TL;DR: This monograph develops the probability models for genetic data on related individuals, from the meiosis level to data on extended pedigrees, with emphasis on joint models for data at multiple genetic loci, such as arise in modern genome scan studies.
Abstract: This monograph is based on lectures presented at a 1999 CBMS Summer Research Conference. It develops the probability models for genetic data on related individuals, from the meiosis level to data on extended pedigrees. The focus is on simple Mendelian traits, such as DNA markers, but on joint models for data at multiple genetic loci, such as arise in modern genome scan studies. The statistical approach is that of likelihood, maximum likelihood estimation, and methods for the analysis of latent-variable and hidden-Markov models including the EM algorithm, the Baum algorithm, and Monte Carlo imputation methods.


Journal ArticleDOI
TL;DR: Suboptimal algorithms based on the model provide progressive classification that is much faster than the algorithm based on single-resolution hidden Markov models.
Abstract: This paper treats a multiresolution hidden Markov model for classifying images. Each image is represented by feature vectors at several resolutions, which are statistically dependent as modeled by the underlying state process, a multiscale Markov mesh. Unknowns in the model are estimated by maximum likelihood, in particular by employing the expectation-maximization algorithm. An image is classified by finding the optimal set of states with maximum a posteriori probability. States are then mapped into classes. The multiresolution model enables multiscale information about context to be incorporated into classification. Suboptimal algorithms based on the model provide progressive classification that is much faster than the algorithm based on single-resolution hidden Markov models.

Proceedings Article
30 Jun 2000
TL;DR: It is shown that, given data from a mixture of k well-separated spherical Gaussians in Rn, a simple two-round variant of EM will, with high probability, learn the centers of the Gaussians to near-optimal precision, if the dimension is high.
Abstract: We show that, given data from a mixture of k well-separated spherical Gaussians in Rn, a simple two-round variant of EM will, with high probability, learn the centers of the Gaussians to near-optimal precision, if the dimension is high (n ≫ log k). We relate this to previous theoretical and empirical work on the EM algorithm.
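For concreteness, a plain EM iteration specialized to a mixture of spherical Gaussians in R^n, the model setting of the abstract. This is not the authors' two-round procedure (which starts from an over-provisioned random initialization and prunes centers between rounds); it only illustrates the per-round update being analyzed.

```python
# Sketch: EM iterations for a mixture of spherical Gaussians (covariances sig2[k] * I).
import numpy as np

def em_spherical(X, mus, sig2, w, n_iter=2):
    n, d = X.shape
    for _ in range(n_iter):
        # E-step: responsibilities (the constant -d/2 log 2pi cancels and is dropped)
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)     # (n, K)
        logp = np.log(w) - 0.5 * d * np.log(sig2) - 0.5 * d2 / sig2
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: means, per-component spherical variances, mixing weights
        nk = r.sum(axis=0)
        mus = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(-1)
        sig2 = (r * d2).sum(axis=0) / (d * nk)
        w = nk / n
    return mus, sig2, w
```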

Journal ArticleDOI
TL;DR: Adopting the expectation-maximization (EM) algorithm for use in computing the maximum a posteriori (MAP) estimate corresponding to the model, it is found that the model permits remarkably simple, closed-form expressions for the EM update equations.
Abstract: This paper describes a statistical multiscale modeling and analysis framework for linear inverse problems involving Poisson data. The framework itself is founded upon a multiscale analysis associated with recursive partitioning of the underlying intensity, a corresponding multiscale factorization of the likelihood (induced by this analysis), and a choice of prior probability distribution made to match this factorization by modeling the "splits" in the underlying partition. The class of priors used here has the interesting feature that the "noninformative" member yields the traditional maximum-likelihood solution; other choices are made to reflect prior belief as to the smoothness of the unknown intensity. Adopting the expectation-maximization (EM) algorithm for use in computing the maximum a posteriori (MAP) estimate corresponding to our model, we find that our model permits remarkably simple, closed-form expressions for the EM update equations. The behavior of our EM algorithm is examined, and it is shown that convergence to the global MAP estimate can be guaranteed. Applications in emission computed tomography and astronomical energy spectral analysis demonstrate the potential of the new approach.
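A sketch of the classical ML-EM (Richardson-Lucy) multiplicative update for a Poisson linear inverse problem, which the abstract identifies as the solution recovered by the "noninformative" member of its prior class; the paper's multiscale MAP updates themselves are not reproduced here, and the function name is illustrative.

```python
# Sketch: maximum-likelihood EM update for y ~ Poisson(A @ lam).
import numpy as np

def ml_em_poisson(A, y, n_iter=100, lam0=None):
    A = np.asarray(A, float)
    y = np.asarray(y, float)
    lam = np.ones(A.shape[1]) if lam0 is None else np.asarray(lam0, float)
    col_sums = A.sum(axis=0) + 1e-12               # sensitivity normalization
    for _ in range(n_iter):
        pred = A @ lam + 1e-12                     # expected counts under current lam
        lam = lam * (A.T @ (y / pred)) / col_sums  # multiplicative EM update
    return lam
```

Each iteration preserves nonnegativity of the intensity and increases the Poisson log-likelihood, which is why this update is the standard baseline in emission tomography.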

Journal ArticleDOI
TL;DR: The fast iterative algorithm for metal artifact reduction corrects the intermediate reconstruction according to subsets of projections and produces satisfactory image quality at a much higher speed than previously published iterative algorithms.

Proceedings ArticleDOI
26 Mar 2000
TL;DR: Two methods using mixtures of linear sub-spaces for face detection in gray level images are presented; one uses a mixture of factor analyzers, and the other uses Kohonen's self-organizing map for clustering and Fisher linear discriminant to find the optimal projection for pattern classification.
Abstract: We present two methods using mixtures of linear sub-spaces for face detection in gray level images. One method uses a mixture of factor analyzers to concurrently perform clustering and, within each cluster, perform local dimensionality reduction. The parameters of the mixture model are estimated using an EM algorithm. A face is detected if the probability of an input sample is above a predefined threshold. The other mixture of subspaces method uses Kohonen's self-organizing map for clustering and Fisher linear discriminant to find the optimal projection for pattern classification, and a Gaussian distribution to model the class-conditioned density function of the projected samples for each class. The parameters of the class-conditioned density functions are maximum likelihood estimates and the decision rule is also based on maximum likelihood. A wide range of face images including ones in different poses, with different expressions and under different lighting conditions are used as the training set to capture the variations of human faces. Our methods have been tested on three sets of 225 images which contain 871 faces. Experimental results on the first two datasets show that our methods perform as well as the best methods in the literature, yet have fewer false detects.

Journal ArticleDOI
TL;DR: In this paper, a statistical coarticulatory model is presented for spontaneous speech recognition, where knowledge of the dynamic, target-directed behavior in the vocal tract resonance is incorporated into the model design, training, and in likelihood computation.
Abstract: A statistical coarticulatory model is presented for spontaneous speech recognition, where knowledge of the dynamic, target-directed behavior in the vocal tract resonance is incorporated into the model design, training, and in likelihood computation. The principal advantage of the new model over the conventional HMM is the use of a compact, internal structure that parsimoniously represents long-span context dependence in the observable domain of speech acoustics without using additional, context-dependent model parameters. The new model is formulated mathematically as a constrained, nonstationary, and nonlinear dynamic system, for which a version of the generalized EM algorithm is developed and implemented for automatically learning the compact set of model parameters. A series of experiments for speech recognition and model synthesis using spontaneous speech data from the Switchboard corpus are reported. The promise of the new model is demonstrated by showing its consistently superior performance over a state-of-the-art benchmark HMM system under controlled experimental conditions. Experiments on model synthesis and analysis shed insight into the mechanism underlying such superiority in terms of the target-directed behavior and of the long-span context-dependence property, both inherent in the designed structure of the new dynamic model of speech.