
Showing papers on "Expectation–maximization algorithm published in 2002"


Journal ArticleDOI
TL;DR: The novelty of the approach is that it does not use a model selection criterion to choose one among a set of preestimated candidate models; instead, it seamlessly integrates estimation and model selection in a single algorithm.
Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach.
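The core of any such method is the standard EM update for a finite Gaussian mixture, which the paper extends with integrated model selection. Below is a minimal NumPy sketch of those plain E- and M-steps only; it does not reproduce the paper's component-annihilation/model-selection logic, and all function and variable names are illustrative.

```python
import numpy as np

def em_gmm(X, k, n_iter=100, seed=0):
    """Basic EM for a k-component Gaussian mixture (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k, replace=False)]                 # crude initialization
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            _, logdet = np.linalg.slogdet(covs[j])
            maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(covs[j]), diff)
            log_r[:, j] = np.log(weights[j]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        log_r -= log_r.max(axis=1, keepdims=True)              # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and covariances
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs
```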

2,182 citations


Journal ArticleDOI
TL;DR: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues; relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues, consistent either with the external classification of the tissues or with background biological knowledge of these data sets.
Abstract: Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.

571 citations


Book ChapterDOI
28 May 2002
TL;DR: It is shown that EM-ICP robustly aligns the barycenters and inertia moments when the variance is high, while it tends toward the accurate ICP for a small variance; a multi-scale approach with an annealing scheme on this parameter is therefore used to combine robustness and accuracy.
Abstract: We investigate in this article the rigid registration of large sets of points, generally sampled from surfaces. We formulate this problem as a general Maximum-Likelihood (ML) estimation of the transformation and the matches. We show that, in the specific case of a Gaussian noise, it corresponds to the Iterative Closest Point algorithm (ICP) with the Mahalanobis distance. Then, considering matches as a hidden variable, we obtain a slightly more complex criterion that can be efficiently solved using Expectation-Maximization (EM) principles. In the case of a Gaussian noise, this new method corresponds to an ICP with multiple matches weighted by normalized Gaussian weights, giving birth to the EM-ICP acronym of the method. The variance of the Gaussian noise is a new parameter that can be viewed as a "scale or blurring factor" on our point clouds. We show that EM-ICP robustly aligns the barycenters and inertia moments with a high variance, while it tends toward the accurate ICP for a small variance. Thus, the idea is to use a multi-scale approach using an annealing scheme on this parameter to combine robustness and accuracy. Moreover, we show that at each "scale", the criterion can be efficiently approximated using a simple decimation of one point set, which drastically speeds up the algorithm. Experiments on real data demonstrate a spectacular improvement in the performance of EM-ICP w.r.t. the standard ICP algorithm in terms of robustness (a factor of 3 to 4) and speed (a factor of 10 to 20), with similar performance in precision. Though the multiscale scheme is only justified with EM, it can also be used to improve ICP, in which case its performance then reaches that of EM when the data are not too noisy.
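A rough sketch of the idea, under simplifying assumptions: the E-step softly matches every source point to all target points with normalized Gaussian weights governed by the variance, and the M-step here fits a rigid transform to the weighted barycenters of those matches (a common simplification of the exact M-step). The annealing schedule, names, and the brute-force distance computation are illustrative choices, not the authors' implementation.

```python
import numpy as np

def em_icp_step(src, tgt, R, t, sigma2):
    """One EM iteration: soft Gaussian matching (E) + weighted rigid fit (M)."""
    moved = src @ R.T + t
    # E-step: normalized Gaussian weights over all candidate matches
    d2 = ((moved[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)   # (n_src, n_tgt)
    w = np.exp(-0.5 * d2 / sigma2)
    w /= w.sum(axis=1, keepdims=True) + 1e-12

    # M-step (simplified): Kabsch fit of src onto the weighted "virtual" matches
    virt = w @ tgt                                   # weighted barycenter per source point
    mu_s, mu_v = src.mean(0), virt.mean(0)
    H = (src - mu_s).T @ (virt - mu_v)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
    R_new = Vt.T @ D @ U.T
    t_new = mu_v - R_new @ mu_s
    return R_new, t_new

def em_icp(src, tgt, sigma2=1.0, anneal=0.8, n_iter=30):
    """Coarse-to-fine EM-ICP-style registration with a geometric annealing schedule."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(n_iter):
        R, t = em_icp_step(src, tgt, R, t, sigma2)
        sigma2 = max(sigma2 * anneal, 1e-4)          # shrink the "scale" parameter
    return R, t
```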

470 citations


Proceedings Article
01 Aug 2002
TL;DR: This paper shows that the simple variational methods of Blei et al. (2001) can lead to inaccurate inferences and biased learning for the generative aspect model, and develops an alternative approach that leads to higher accuracy at comparable cost.
Abstract: The generative aspect model is an extension of the multinomial model for text that allows word probabilities to vary stochastically across documents. Previous results with aspect models have been promising, but hindered by the computational difficulty of carrying out inference and learning. This paper demonstrates that the simple variational methods of Blei et al. (2001) can lead to inaccurate inferences and biased learning for the generative aspect model. We develop an alternative approach that leads to higher accuracy at comparable cost. An extension of Expectation-Propagation is used for inference and then embedded in an EM algorithm for learning. Experimental results are presented for both synthetic and real data sets.

428 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed speckle reduction algorithm outperforms standard wavelet denoising techniques in terms of the signal-to-noise ratio and the equivalent-number-of-looks measures in most cases and achieves better performance than the refined Lee filter.
Abstract: The granular appearance of speckle noise in synthetic aperture radar (SAR) imagery makes it very difficult to visually and automatically interpret SAR data. Therefore, speckle reduction is a prerequisite for many SAR image processing tasks. In this paper, we develop a speckle reduction algorithm by fusing the wavelet Bayesian denoising technique with Markov-random-field-based image regularization. Wavelet coefficients are modeled independently and identically by a two-state Gaussian mixture model, while their spatial dependence is characterized by a Markov random field imposed on the hidden state of Gaussian mixtures. The Expectation-Maximization algorithm is used to estimate hyperparameters and specify the mixture model, and the iterated-conditional-modes method is implemented to optimize the state configuration. The noise-free wavelet coefficients are finally estimated by a shrinkage function based on local weighted averaging of the Bayesian estimator. Experimental results show that the proposed method outperforms standard wavelet denoising techniques in terms of the signal-to-noise ratio and the equivalent-number-of-looks measures in most cases. It also achieves better performance than the refined Lee filter.

414 citations


Journal ArticleDOI
TL;DR: An adaptive semiparametric technique for the unsupervised estimation of the statistical terms associated with the gray levels of changed and unchanged pixels in a difference image is presented and a change detection map is generated.
Abstract: A novel automatic approach to the unsupervised identification of changes in multitemporal remote-sensing images is proposed. This approach, unlike classical ones, is based on the formulation of the unsupervised change-detection problem in terms of Bayesian decision theory. In this context, an adaptive semiparametric technique for the unsupervised estimation of the statistical terms associated with the gray levels of changed and unchanged pixels in a difference image is presented. Such a technique exploits the effectiveness of two theoretically well-founded estimation procedures: the reduced Parzen estimate (RPE) procedure and the expectation-maximization (EM) algorithm. Then, thanks to the resulting estimates and to a Markov random field (MRF) approach used to model the spatial-contextual information contained in the multitemporal images considered, a change detection map is generated. The adaptive semiparametric nature of the proposed technique allows its application to different kinds of remote-sensing images. Experimental results, obtained on two sets of multitemporal remote-sensing images acquired by two different sensors, confirm the validity of the proposed approach.

407 citations


Journal ArticleDOI
TL;DR: The proposed method, based on the EM algorithm, is shown to be superior to the standard procedure for a priori probability estimation, and the classifier with adjusted outputs always performs better than the unadjusted classifier in terms of classification accuracy when the a priori probability conditions differ between the training set and the real-world data.
Abstract: It sometimes happens (for instance in case control studies) that a classifier is trained on a data set that does not reflect the true a priori probabilities of the target classes on real-world data. This may have a negative effect on the classification accuracy obtained on the real-world data set, especially when the classifier's decisions are based on the a posteriori probabilities of class membership. Indeed, in this case, the trained classifier provides estimates of the a posteriori probabilities that are not valid for this real-world data set (they rely on the a priori probabilities of the training set). Applying the classifier as is (without correcting its outputs with respect to these new conditions) on this new data set may thus be suboptimal. In this note, we present a simple iterative procedure for adjusting the outputs of the trained classifier with respect to these new a priori probabilities without having to refit the model, even when these probabilities are not known in advance. As a by-product, estimates of the new a priori probabilities are also obtained. This iterative algorithm is a straightforward instance of the expectation-maximization (EM) algorithm and is shown to maximize the likelihood of the new data. Thereafter, we discuss a statistical test that can be applied to decide if the a priori class probabilities have changed from the training set to the real-world data. The procedure is illustrated on different classification problems involving a multilayer neural network, and comparisons with a standard procedure for a priori probability estimation are provided. Our original method, based on the EM algorithm, is shown to be superior to the standard one for a priori probability estimation. Experimental results also indicate that the classifier with adjusted outputs always performs better than the original one in terms of classification accuracy, when the a priori probability conditions differ from the training set to the real-world data. The gain in classification accuracy can be significant.
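A minimal sketch of the iterative adjustment described above, assuming `posteriors` holds the outputs of an already trained probabilistic classifier that are valid under the training-set priors; names are illustrative.

```python
import numpy as np

def adjust_priors(posteriors, train_priors, n_iter=100, tol=1e-8):
    """EM re-estimation of new a priori class probabilities.

    posteriors   : (n_samples, n_classes) classifier outputs on the new data,
                   valid under the training-set priors.
    train_priors : (n_classes,) class frequencies in the training set.
    Returns (new_priors, adjusted_posteriors).
    """
    new_priors = np.asarray(train_priors, dtype=float).copy()
    for _ in range(n_iter):
        # E-step: correct the posteriors for the current prior estimate
        adj = posteriors * (new_priors / train_priors)
        adj /= adj.sum(axis=1, keepdims=True)
        # M-step: new priors = average adjusted posterior over the new data
        updated = adj.mean(axis=0)
        if np.abs(updated - new_priors).max() < tol:
            new_priors = updated
            break
        new_priors = updated
    return new_priors, adj
```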

356 citations


Journal Article
TL;DR: A general overview of analytic and iterative methods of reconstruction in SPECT is presented, with a special focus on filtered backprojection (FBP), conjugate gradient, maximum likelihood expectation maximization, and maximum a posteriori expectation maximization algorithms.
Abstract: Images of the inside of the human body can be obtained noninvasively using tomographic acquisition and processing techniques. In particular, these techniques are commonly used to obtain images of a γ-emitter distribution after its administration in the human body. The reconstructed images are obtained given a set of their projections, acquired using rotating gamma cameras. A general overview of analytic and iterative methods of reconstruction in SPECT is presented with a special focus on filtered backprojection (FBP), conjugate gradient, maximum likelihood expectation maximization, and maximum a posteriori expectation maximization algorithms. The FBP algorithm is faster than iterative algorithms, with the latter providing a framework for accurately modeling the emission and detection processes.
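For reference, the maximum likelihood expectation maximization (MLEM) update has a compact multiplicative form. The sketch below applies it to a generic dense system matrix, which is only a toy stand-in for a realistic SPECT projector with attenuation and detector-response modeling.

```python
import numpy as np

def mlem(A, y, n_iter=50):
    """MLEM reconstruction: x_{k+1} = x_k / (A^T 1) * A^T (y / (A x_k)).

    A : (n_bins, n_voxels) system matrix (probability that a decay in voxel j
        is detected in projection bin i) -- a toy dense stand-in here.
    y : (n_bins,) measured projection counts.
    """
    x = np.ones(A.shape[1])                    # uniform initial image
    sens = A.sum(axis=0) + 1e-12               # sensitivity image A^T 1
    for _ in range(n_iter):
        proj = A @ x + 1e-12                   # forward projection
        x *= (A.T @ (y / proj)) / sens         # multiplicative EM update
    return x
```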

348 citations


Journal ArticleDOI
TL;DR: The method is demonstrated using an input-state-output model of the hemodynamic coupling between experimentally designed causes or factors in fMRI studies and the ensuing BOLD response, and extends classical inference to more plausible inferences about the parameters of the model given the data.

315 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present new computational techniques for multivariate longitudinal or clustered data with missing values by applying a multivariate extension of a popular linear mixed-effects model, creating multiple imputations of missing values for subsequent analyses by a straightforward and effective Markov chain Monte Carlo procedure.
Abstract: This article presents new computational techniques for multivariate longitudinal or clustered data with missing values. Current methodology for linear mixed-effects models can accommodate imbalance or missing data in a single response variable, but it cannot handle missing values in multiple responses or additional covariates. Applying a multivariate extension of a popular linear mixed-effects model, we create multiple imputations of missing values for subsequent analyses by a straightforward and effective Markov chain Monte Carlo procedure. We also derive and implement a new EM algorithm for parameter estimation which converges more rapidly than traditional EM algorithms because it does not treat the random effects as “missing data,” but integrates them out of the likelihood function analytically. These techniques are illustrated on models for adolescent alcohol use in a large school-based prevention trial.

310 citations


Journal ArticleDOI
01 Oct 2002
TL;DR: A new clustering algorithm is proposed, based on the expectation-maximization (EM) identification of Gaussian mixture models, which is applied to two well-known benchmark problems: the MPG prediction and a simulated second-order nonlinear process.
Abstract: The construction of interpretable Takagi-Sugeno (TS) fuzzy models by means of clustering is addressed. First, it is shown how the antecedent fuzzy sets and the corresponding consequent parameters of the TS model can be derived from clusters obtained by the Gath-Geva (GG) algorithm. To preserve the partitioning of the antecedent space, linearly transformed input variables can be used in the model. This may, however, complicate the interpretation of the rules. To form an easily interpretable model that does not use the transformed input variables, a new clustering algorithm is proposed, based on the expectation-maximization (EM) identification of Gaussian mixture models. This new technique is applied to two well-known benchmark problems: the MPG (miles per gallon) prediction and a simulated second-order nonlinear process. The obtained results are compared with results from the literature.

Journal ArticleDOI
TL;DR: In this paper, a greedy algorithm for learning a Gaussian mixture is proposed, which uses a combination of global and local search each time a new component is added to the mixture and achieves solutions superior to EM with k components in terms of the likelihood of a test set.
Abstract: Learning a Gaussian mixture with a local algorithm like EM can be difficult because (i) the true number of mixing components is usually unknown, (ii) there is no generally accepted method for parameter initialization, and (iii) the algorithm can get trapped in one of the many local maxima of the likelihood function. In this paper we propose a greedy algorithm for learning a Gaussian mixture which tries to overcome these limitations. In particular, starting with a single component and adding components sequentially until a maximum number k, the algorithm is capable of achieving solutions superior to EM with k components in terms of the likelihood of a test set. The algorithm is based on recent theoretical results on incremental mixture density estimation, and uses a combination of global and local search each time a new component is added to the mixture.
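A simplified illustration of the greedy strategy (not the authors' exact combination of global and local search): grow the mixture one component at a time, warm-starting each EM run from the previous solution plus a new component placed on the worst-modeled training point, and track held-out likelihood. The sketch reuses scikit-learn's GaussianMixture for the inner EM runs; the insertion heuristic is an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def greedy_gmm(X_train, X_test, k_max=10):
    """Greedily grow a Gaussian mixture, tracking held-out log-likelihood."""
    best = None
    means = X_train[np.random.default_rng(0).choice(len(X_train), 1)]
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, means_init=means, max_iter=200)
        gm.fit(X_train)                       # local EM from the warm-started means
        score = gm.score(X_test)              # mean held-out log-likelihood per point
        print(f"k={k}: test log-likelihood per point = {score:.3f}")
        if best is None or score > best[0]:
            best = (score, gm)
        # insertion heuristic: put the next mean on the worst-modeled training point
        worst = X_train[np.argmin(gm.score_samples(X_train))]
        means = np.vstack([gm.means_, worst[None, :]])
    return best[1]
```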

Journal ArticleDOI
TL;DR: The authors compare and contrast five approaches for dealing with missing data and suggest that mean substitution was the least effective and that regression with an error term and the EM algorithm produced estimates closest to those of the original variables.
Abstract: Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a simulated yet realistic data set with missing at random (MAR) data. The authors compare and contrast five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the expectation maximization [EM] algorithm) for dealing with missing data, and compare the effects of each method on descriptive statistics and correlation coefficients for the imputed data (n = 96) and the entire sample (n = 492) when imputed data are included. All methods had limitations, although our findings suggest that mean substitution was the least effective and that regression with an error term and the EM algorithm produced estimates closest to those of the original variables.

Journal ArticleDOI
TL;DR: The mixture transition distribution (MTD) model was introduced by Raftery in 1985 for the modeling of high-order Markov chains with a finite state space; since then it has been generalized and successfully applied to a range of situations, including the analysis of wind directions, DNA sequences and social behavior.
Abstract: The mixture transition distribution model (MTD) was introduced in 1985 by Raftery for the modeling of high-order Markov chains with a finite state space. Since then it has been generalized and successfully applied to a range of situations, including the analysis of wind directions, DNA sequences and social behavior. Here we review the MTD model and the developments since 1985. We first introduce the basic principle and then we present several extensions, including general state spaces and spatial statistics. Following that, we review methods for estimating the model parameters. Finally, a review of different types of applications shows the practical interest of the MTD model.
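For reference, the basic MTD model writes the high-order transition probability as a lag-weighted mixture of contributions through a single transition matrix; in the usual notation (not copied from this review):

```latex
% MTD model of order l on a finite state space:
% each lag g contributes through the same transition matrix Q = (q_{ij}),
% mixed with nonnegative weights \lambda_g that sum to one.
P(X_t = i_0 \mid X_{t-1} = i_1, \dots, X_{t-l} = i_l)
  = \sum_{g=1}^{l} \lambda_g \, q_{i_g i_0},
\qquad \sum_{g=1}^{l} \lambda_g = 1, \quad \lambda_g \ge 0 .
```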

Journal ArticleDOI
TL;DR: This work proposes a likelihood-based approach that requires only the assumption that the random effects have a smooth density; implementation via the EM algorithm is described, and the performance and benefits for uncovering noteworthy features are illustrated.
Abstract: Joint models for a time-to-event (e.g., survival) and a longitudinal response have generated considerable recent interest. The longitudinal data are assumed to follow a mixed effects model, and a proportional hazards model depending on the longitudinal random effects and other covariates is assumed for the survival endpoint. Interest may focus on inference on the longitudinal data process, which is informatively censored, or on the hazard relationship. Several methods for fitting such models have been proposed, most requiring a parametric distributional assumption (normality) on the random effects. A natural concern is sensitivity to violation of this assumption; moreover, a restrictive distributional assumption may obscure key features in the data. We investigate these issues through our proposal of a likelihood-based approach that requires only the assumption that the random effects have a smooth density. Implementation via the EM algorithm is described, and performance and the benefits for uncovering noteworthy features are illustrated by application to data from an HIV clinical trial and by simulation.

Journal ArticleDOI
TL;DR: This work presents an algorithm that meets these demands, the one-pass list-mode expectation maximization (OPL-EM) algorithm, which operates directly on list-mode data, passes through the data once only, accounts for finite resolution effects in the system model, and can also include regularization.
Abstract: High-resolution three-dimensional (3-D) positron emission tomography (PET) scanners with high count rate performance, such as the quad-high density avalanche chamber (HIDAC), place new demands on image reconstruction algorithms due to the large quantities of high-precision list-mode data which are produced. Therefore, a reconstruction algorithm is required which can, in a practical time frame, reconstruct into very large image arrays (submillimeter voxels, which range over a large field of view) whilst preferably retaining the precision of the data. This work presents an algorithm which meets these demands: one-pass list-mode expectation maximization (OPL-EM) algorithm. The algorithm operates directly on list-mode data, passes through the data once only, accounts for finite resolution effects in the system model, and can also include regularization. The algorithm performs multiple image updates during its single pass through the list-mode data, corresponding to the number of subsets that the data have been split into. The algorithm has been assessed using list-mode data from a quad-HIDAC and is compared to the analytic reconstruction method 3-D reprojection (RP) with 3-D filtered backprojection.

Journal ArticleDOI
TL;DR: In this paper, the EM algorithm is used to determine the maximum likelihood estimates when the data are progressively Type II censored, and the asymptotic variances and covariances of the ML estimates are computed by means of the missing information principle.

Journal ArticleDOI
TL;DR: This book discusses using the likelihood function for both modeling and inference, providing a nice introduction to a variety of topics; it can serve as a good initial exposure to possibly new concepts without overwhelming the reader with details.
Abstract: As the title indicates, this book discusses using the likelihood function for both modeling and inference. It is written as a textbook with a fair number of examples. The author conveniently provides code using the statistical package R for all relevant examples on his web site. He assumes a list of prerequisites that would typically be covered in the first year of a master’s degree in statistics (or possibly in a solid undergraduate program in statistics). A good background in probability and theory of statistics, familiarity with applied statistics (such as tests of hypotheses, confidence intervals, least squares and p values), and calculus are prerequisites for using this book. The author presents interesting philosophical discussions in Chapters 1 and 7. In Chapter 1 he explains the differences between a Bayesian versus frequentist approach to statistical inference. He states that the likelihood approach is a compromise between these two approaches and that it could be called a Fisherian approach. He argues that the likelihood approach is non-Bayesian yet has Bayesian aspects and that it has frequentist features but also some nonfrequentist aspects. He references Fisher throughout the book. In Chapter 7 the author discusses the controversial informal likelihood principle, “two datasets (regardless of experimental source) with the same likelihood should lead to the same conclusions.” It is hard to be convinced that how data were collected does not affect conclusions. Chapters 2 and 3 provide definitions and properties for likelihood functions. Some advanced technical topics are addressed in Chapters 8, 9, and 12, including score function, Fisher information, minimum variance unbiased estimation, consistency of maximum likelihood estimators, goodness-of-fit tests, and the EM algorithm. Six chapters deal with modeling. Chapter 4 presents the basic models, binomial and Poisson, with some applications. Chapter 6 focuses on regression models, including normal linear, logistic, Poisson, nonnormal, and exponential family, and deals with the related issues of deviance, iteratively weighted least squares, and the Box–Cox transformations. Chapter 11 covers models with complex data structure, including models for time series data, models for survival data, and some specialized Poisson models. Chapter 14 examines quasi-likelihood models, Chapter 17 covers random and mixed effects models, and Chapter 18 introduces the concept of nonparametric smoothing. The remaining chapters put more emphasis on inference. Chapter 5 deals with frequentist properties including bias of point estimates, p values, confidence intervals, confidence intervals via bootstrapping, and exact inference for binomial and Poisson models. Chapter 10 handles nuisance parameters using marginal and conditional likelihood, modified profile likelihood, and estimated likelihood methods. Chapter 13 covers the robustness of a specified likelihood. Chapter 15 introduces empirical likelihood concepts, and Chapter 16 addresses random parameters. This book works fine as a textbook, providing a nice introduction to a variety of topics. For engineers, this book can also serve as a good initial exposure to possibly new concepts without overwhelming them with details. But when applying a specific topic covered in this book to real problems, a more specialized book with greater depth and/or more practical examples may be desired.

Journal ArticleDOI
TL;DR: A Monte Carlo version of the EM gradient algorithm is developed for maximum likelihood estimation of model parameters, and it is shown that minimum mean-squared error (MMSE) prediction can be done in a linear fashion in spatial GLMMs, analogous to linear kriging.
Abstract: We use spatial generalized linear mixed models (GLMM) to model non-Gaussian spatial variables that are observed at sampling locations in a continuous area. In many applications, prediction of random effects in a spatial GLMM is of great practical interest. We show that the minimum mean-squared error (MMSE) prediction can be done in a linear fashion in spatial GLMMs analogous to linear kriging. We develop a Monte Carlo version of the EM gradient algorithm for maximum likelihood estimation of model parameters. A by-product of this approach is that it also produces the MMSE estimates for the realized random effects at the sampled sites. This method is illustrated through a simulation study and is also applied to a real data set on plant root diseases to obtain a map of disease severity that can facilitate the practice of precision agriculture.

Journal ArticleDOI
TL;DR: The bootstrap is discussed as a method to assess the uncertainty of the maximum likelihood estimate and to construct confidence intervals for functions of the transition matrix such as expected survival.
Abstract: Discrete-time Markov chains have been successfully used to investigate treatment programs and health care protocols for chronic diseases. In these situations, the transition matrix, which describes the natural progression of the disease, is often estimated from a cohort observed at common intervals. Estimation of the matrix, however, is often complicated by the complex relationship among transition probabilities. This paper summarizes methods to obtain the maximum likelihood estimate of the transition matrix when the cycle length of the model coincides with the observation interval, the cycle length does not coincide with the observation interval, and when the observation intervals are unequal in length. In addition, the bootstrap is discussed as a method to assess the uncertainty of the maximum likelihood estimate and to construct confidence intervals for functions of the transition matrix such as expected survival. Copyright © 2002 John Wiley & Sons, Ltd.
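When the observation interval coincides with the cycle length, the maximum likelihood estimate is simply the row-normalized matrix of observed transition counts, and bootstrap confidence intervals for functions of the matrix can be obtained by resampling whole subject trajectories. A minimal sketch with an arbitrary user-supplied functional; names are illustrative.

```python
import numpy as np

def mle_transition_matrix(sequences, n_states):
    """Row-normalized transition counts = MLE when observations align with cycles."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_totals > 0, row_totals, 1.0)   # guard empty rows

def bootstrap_ci(sequences, n_states, functional, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for any scalar functional of the transition matrix,
    resampling whole subject trajectories with replacement."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(sequences), len(sequences))
        sample = [sequences[i] for i in idx]
        stats.append(functional(mle_transition_matrix(sample, n_states)))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# usage sketch: CI for a single transition probability
# lo, hi = bootstrap_ci(list_of_state_sequences, 3, lambda P: P[0, 2])
```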

Journal ArticleDOI
TL;DR: This paper presents a deterministic algorithm that approximately optimizes the objective function using the split-and-merge operations previously proposed within the maximum likelihood framework, and applies the method to mixture-of-experts models to show experimentally that it can find the optimal number of experts of an MoE while avoiding local maxima.

Journal ArticleDOI
TL;DR: An integration of rough-set-theoretic knowledge extraction, the Expectation Maximization (EM) algorithm, and minimal spanning tree (MST) clustering is described; rough-set theory helps in faster convergence and in avoiding the local minima problem, thereby enhancing the performance of EM.
Abstract: The problem of segmentation of multispectral satellite images is addressed. An integration of rough-set-theoretic knowledge extraction, the Expectation Maximization (EM) algorithm, and minimal spanning tree (MST) clustering is described. EM provides the statistical model of the data and handles the associated measurement and representation uncertainties. Rough-set theory helps in faster convergence and in avoiding the local minima problem, thereby enhancing the performance of EM. For rough-set-theoretic rule generation, each band is discretized using fuzzy-correlation-based gray-level thresholding. MST enables determination of nonconvex clusters. Since this is applied on Gaussians, determined by granules, rather than on the original data points, time required is very low. These features are demonstrated on two IRS-1A four-band images. Comparison with related methods is made in terms of computation time and a cluster quality measure.

Journal ArticleDOI
TL;DR: Results indicate that sample sizes significantly larger than 100 should be used to obtain reliable estimates through maximum likelihood, and the appropriateness of using asymptotic methods is examined.
Abstract: Continuing increases in computing power and availability mean that many maximum likelihood estimation (MLE) problems previously thought intractable or too computationally difficult can now be tackled numerically. However, ML parameter estimation for distributions whose only analytical expression is as quantile functions has received little attention. Numerical MLE procedures for parameters of new families of distributions, the g-and-k and the generalized g-and-h distributions, are presented and investigated here. Simulation studies are included, and the appropriateness of using asymptotic methods examined. Because of the generality of these distributions, the investigations are not only into numerical MLE for these distributions, but are also an initial investigation into the performance and problems for numerical MLE applied to quantile-defined distributions in general. Datasets are also fitted using the procedures here. Results indicate that sample sizes significantly larger than 100 should be used to obtain reliable estimates through maximum likelihood.
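The key computational point is that for a quantile-defined distribution the density must itself be evaluated numerically, since f(x) = 1/Q'(u) with u solving Q(u) = x. The sketch below illustrates this for the g-and-k case using its commonly cited quantile function (treated here as an assumption, as is the crude numerical differentiation); it is a sketch of the general approach, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import brentq, minimize
from scipy.stats import norm

def quantile_gk(u, a, b, g, k, c=0.8):
    """g-and-k quantile function in its commonly cited form (assumption)."""
    z = norm.ppf(u)
    return a + b * (1 + c * np.tanh(g * z / 2)) * z * (1 + z ** 2) ** k

def neg_loglik(theta, x):
    """Numerical negative log-likelihood via f(x) = 1 / Q'(u), where Q(u) = x."""
    a, b, g, k = theta
    if b <= 0 or k <= -0.5:                       # usual parameter constraints
        return np.inf
    nll = 0.0
    for xi in x:
        try:
            u = brentq(lambda v: quantile_gk(v, a, b, g, k) - xi, 1e-9, 1 - 1e-9)
        except ValueError:                        # xi outside the bracketed range
            return np.inf
        du = min(1e-6, u / 2, (1 - u) / 2)        # central-difference step inside (0, 1)
        dQ = (quantile_gk(u + du, a, b, g, k) - quantile_gk(u - du, a, b, g, k)) / (2 * du)
        nll += np.log(dQ)                         # -log f(xi) = log Q'(u)
    return nll

# usage sketch:
# x = np.asarray(data)
# res = minimize(neg_loglik, x0=[np.median(x), x.std(), 0.0, 0.1], args=(x,),
#                method='Nelder-Mead')
```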

Journal ArticleDOI
TL;DR: In this article, an EM-type algorithm is provided for maximum likelihood estimation of the normal-inverse Gaussian distribution, which overcomes numerical difficulties that occur when standard numerical techniques are used.

Journal ArticleDOI
TL;DR: A new algorithm is presented that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.
Abstract: A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.

Journal ArticleDOI
Shy Shoham
TL;DR: New robust clustering algorithms are presented, which significantly improve upon the noise and initialization sensitivity of traditional mixture decomposition algorithms, and simplify the determination of the optimal number of clusters in the data set.

Patent
23 May 2002
TL;DR: In this paper, a visual motion analysis method that uses multiple layered global motion models to both detect and reliably track an arbitrary number of moving objects appearing in image sequences is presented, where each global model includes a background layer and one or more foreground polybones, each foreground polybone including a parametric shape model, an appearance model, and a motion model describing an associated moving object.
Abstract: A visual motion analysis method that uses multiple layered global motion models to both detect and reliably track an arbitrary number of moving objects appearing in image sequences. Each global model includes a background layer and one or more foreground “polybones”, each foreground polybone including a parametric shape model, an appearance model, and a motion model describing an associated moving object. Each polybone includes an exclusive spatial support region and a probabilistic boundary region, and is assigned an explicit depth ordering. Multiple global models having different numbers of layers, depth orderings, motions, etc., corresponding to detected objects are generated, refined using, for example, an EM algorithm, and then ranked/compared. Initial guesses for the model parameters are drawn from a proposal distribution over the set of potential (likely) models. Bayesian model selection is used to compare/rank the different models, and models having relatively high posterior probability are retained for subsequent analysis.

Journal ArticleDOI
TL;DR: An expectation maximization algorithm is derived for maximum-likelihood training of substitution rate matrices from multiple sequence alignments that can be used to train hidden substitution models, where the structural context of a residue is treated as a hidden variable that can evolve over time.

Journal ArticleDOI
TL;DR: In this article, an EM type algorithm is developed for maximum likelihood estimation of a general nonlinear structural equation model, where the E-step is completed by a Metropolis-Hastings algorithm and the M-step can be completed efficiently by simple conditional maximization.
Abstract: The existing maximum likelihood theory and its computer software in structural equation modeling are established based on linear relationships among manifest variables and latent variables. However, models with nonlinear relationships are often encountered in social and behavioral sciences. In this article, an EM type algorithm is developed for maximum likelihood estimation of a general nonlinear structural equation model. To avoid computation of the complicated multiple integrals involved, the E-step is completed by a Metropolis-Hastings algorithm. It is shown that the M-step can be completed efficiently by simple conditional maximization. Standard errors of the maximum likelihood estimates are obtained via Louis's formula. The methodology is illustrated with results from a simulation study and two real examples.

Journal ArticleDOI
TL;DR: The use of conditional maximum-likelihood training for the TSBN is investigated and it is found that this gives rise to improved classification performance over the ML-trained TSBN.
Abstract: We are concerned with the problem of image segmentation, in which each pixel is assigned to one of a predefined finite number of labels. In Bayesian image analysis, this requires fusing together local predictions for the class labels with a prior model of label images. Following the work of Bouman and Shapiro (1994), we consider the use of tree-structured belief networks (TSBNs) as prior models. The parameters in the TSBN are trained using a maximum-likelihood objective function with the EM algorithm and the resulting model is evaluated by calculating how efficiently it codes label images. A number of authors have used Gaussian mixture models to connect the label field to the image data. We compare this approach to the scaled-likelihood method of Smyth (1994) and Morgan and Bourlard (1995), where local predictions of pixel classification from neural networks are fused with the TSBN prior. Our results show a higher performance is obtained with the neural networks. We evaluate the classification results obtained and emphasize not only the maximum a posteriori segmentation, but also the uncertainty, as evidenced e.g., by the pixelwise posterior marginal entropies. We also investigate the use of conditional maximum-likelihood training for the TSBN and find that this gives rise to improved classification performance over the ML-trained TSBN.