
Showing papers on "Expectation–maximization algorithm published in 1995"


Posted Content
TL;DR: The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated.
Abstract: We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The statistical modeling techniques introduced in this paper differ from those common to much of the natural language processing literature since there is no probabilistic finite state or push-down automaton on which the model is built. Our approach also differs from the techniques common to the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches including decision trees and Boltzmann machines are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing. Key words: random field, Kullback-Leibler divergence, iterative scaling, divergence geometry, maximum entropy, EM algorithm, statistical learning, clustering, word morphology, natural language processing
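To make the "iterative scaling" step concrete, here is a minimal generalized iterative scaling (GIS) sketch for a small exponential/maximum-entropy model; the tiny domain, binary features, slack feature, and empirical distribution are all hypothetical, and the paper's greedy feature-induction machinery is not shown.

```python
import numpy as np

# Minimal sketch of generalized iterative scaling (GIS) for a small
# maximum-entropy model p(x) proportional to exp(sum_j w_j f_j(x)).
# Domain, features, slack feature and empirical distribution are hypothetical.

X = range(8)                                                       # toy discrete domain
F = np.array([[int(b) for b in f"{int(x):03b}"] for x in X], float)  # three binary features per x
C = F.sum(axis=1).max()
F = np.hstack([F, C - F.sum(axis=1, keepdims=True)])               # slack feature: rows now sum to C

p_emp = np.array([0.05, 0.05, 0.1, 0.1, 0.2, 0.2, 0.15, 0.15])     # empirical distribution over X
target = p_emp @ F                                                 # empirical feature expectations

w = np.zeros(F.shape[1])
for _ in range(500):
    p = np.exp(F @ w)
    p /= p.sum()                                                   # current model distribution
    w += np.log(target / (p @ F)) / C                              # GIS update on the weights
```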

1,140 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider four different approximations to the log-likelihood, comparing their computational and statistical properties, and conclude that the linear mixed-effects (LME) approximation suggested by Lindstrom and Bates, t
Abstract: Nonlinear mixed-effects models have received a great deal of attention in the statistical literature in recent years because of the flexibility they offer in handling the unbalanced repeated-measures data that arise in different areas of investigation, such as pharmacokinetics and economics. Several different methods for estimating the parameters in nonlinear mixed-effects model have been proposed. We concentrate here on two of them—maximum likelihood and restricted maximum likelihood. A rather complex numerical issue for (restricted) maximum likelihood estimation in nonlinear mixed-effects models is the evaluation of the log-likelihood function of the data, because it involves the evaluation of a multiple integral that, in most cases, does not have a closed-form expression. We consider here four different approximations to the log-likelihood, comparing their computational and statistical properties. We conclude that the linear mixed-effects (LME) approximation suggested by Lindstrom and Bates, t...
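For context, the integral that the four approximations target is the marginal likelihood of a nonlinear mixed-effects model, written here in standard (not necessarily the authors') notation with random effects $b_i \sim N(0, \Psi)$:

$$L(\beta, \sigma^2, \Psi) = \prod_{i=1}^{M}\int p\bigl(y_i \mid \beta, b_i, \sigma^2\bigr)\, p\bigl(b_i \mid \Psi\bigr)\, db_i,$$

which in general has no closed form because the response is nonlinear in the random effects $b_i$.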

1,073 citations


Journal ArticleDOI
TL;DR: In this article, a strategy of using an average information matrix is shown to be computationally convenient and efficient for estimating variance components by restricted maximum likelihood (REML) in the mixed linear model.
Abstract: A strategy of using an average information matrix is shown to be computationally convenient and efficient for estimating variance components by restricted maximum likelihood (REML) in the mixed linear model. Three applications are described. The motivation for the algorithm was the estimation of variance components in the analysis of wheat variety means from 1,071 experiments representing 10 years and 60 locations in New South Wales. We also apply the algorithm to the analysis of designed experiments by incomplete block analysis and spatial analysis of field experiments.
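In generic mixed-model notation (not the authors' exact symbols), with $V$ the marginal covariance of $y$, $\dot V_i = \partial V/\partial\theta_i$, and $P = V^{-1} - V^{-1}X(X^{\top}V^{-1}X)^{-1}X^{\top}V^{-1}$, the REML score and the average-information matrix used in place of the observed or expected information are, schematically,

$$s_i = -\tfrac{1}{2}\Bigl[\operatorname{tr}\bigl(P\dot V_i\bigr) - y^{\top}P\dot V_i P y\Bigr], \qquad \bigl(\mathcal{I}_A\bigr)_{ij} = \tfrac{1}{2}\, y^{\top}P\dot V_i P\dot V_j P y,$$

with Newton-type updates $\theta \leftarrow \theta + \mathcal{I}_A^{-1}s$. Averaging the observed and expected information leaves only the data-dependent quadratic form, avoiding the trace terms that make the exact information matrices expensive.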

868 citations


Journal ArticleDOI
TL;DR: It is shown how the computational scheme of Lauritzen and Spiegelhalter (1988) can be exploited to perform the E-step of the EM algorithm when applied to finding maximum likelihood estimates or penalized maximum likelihood estimates in hierarchical log-linear models and recursive models for contingency tables with missing data.

788 citations


Book ChapterDOI
01 Jan 1995

603 citations


Journal ArticleDOI
TL;DR: The new method is a natural extension of the EM for maximizing likelihood with concave priors for emission tomography and convergence proofs are given.
Abstract: The maximum likelihood (ML) expectation maximization (EM) approach in emission tomography has been very popular in medical imaging for several years. In spite of this, no satisfactory convergent modifications have been proposed for the regularized approach. Here, a modification of the EM algorithm is presented. The new method is a natural extension of the EM for maximizing likelihood with concave priors. Convergence proofs are given.
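For orientation, the unregularized ML-EM iteration that this kind of modification builds on has the familiar multiplicative form; the sketch below is a toy illustration with a hypothetical system matrix and simulated Poisson data, not the paper's regularized algorithm.

```python
import numpy as np

# Minimal sketch of the classical (unregularized) ML-EM iteration for emission
# tomography, the starting point extended to concave priors in the paper.
# The system matrix A, true image and Poisson counts y are hypothetical toy data.

rng = np.random.default_rng(0)
A = rng.random((40, 16))                        # 40 detector bins, 16 image pixels
y = rng.poisson(A @ rng.gamma(2.0, size=16))    # simulated Poisson projection data

sens = A.sum(axis=0)                            # sensitivity image (column sums of A)
lam = np.ones(16)                               # initial emission estimate
for _ in range(100):
    ratio = y / np.maximum(A @ lam, 1e-12)
    lam = lam / sens * (A.T @ ratio)            # multiplicative ML-EM update
```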

506 citations


Journal ArticleDOI
TL;DR: This EM gradient algorithm approximately solves the M-step of the EM algorithm by one iteration of Newton's method, and the proof of global convergence applies and improves existing theory for the EM algorithm.
Abstract: In many problems of maximum likelihood estimation, it is impossible to carry out either the E-step or the M-step of the EM algorithm. The present paper introduces a gradient algorithm that is closely related to the EM algorithm. This EM gradient algorithm approximately solves the M-step of the EM algorithm by one iteration of Newton's method. Since Newton's method converges quickly, the local properties of the EM gradient algorithm are almost identical with those of the EM algorithm. Any strict local maximum point of the observed likelihood locally attracts the EM and EM gradient algorithms at the same rate of convergence, and near the maximum point the EM gradient algorithm always produces an increase in the likelihood. With proper modification the EM gradient algorithm also exhibits global convergence properties that are similar to those of the EM algorithm. Our proof of global convergence applies and improves existing theory for the EM algorithm. These theoretical points are reinforced by a discussion of three realistic examples illustrating how the EM gradient algorithm can succeed where the EM algorithm is intractable.
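Schematically, and in generic notation rather than the paper's exact symbols, the update replaces the exact M-step by one Newton step on the Q-function, using the EM identity that the gradients of $Q$ and of the observed log-likelihood $L$ agree at the current iterate:

$$\theta^{(k+1)} = \theta^{(k)} - \left[d^{2}Q\bigl(\theta^{(k)} \mid \theta^{(k)}\bigr)\right]^{-1}\nabla L\bigl(\theta^{(k)}\bigr), \qquad \nabla Q\bigl(\theta^{(k)} \mid \theta^{(k)}\bigr) = \nabla L\bigl(\theta^{(k)}\bigr).$$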

371 citations


Journal ArticleDOI
TL;DR: Preliminary numerical testing of the algorithms on simulated data suggests that the convex algorithm and the ad hoc gradient algorithm are computationally superior to the EM algorithm.
Abstract: This paper reviews and compares three maximum likelihood algorithms for transmission tomography. One of these algorithms is the EM algorithm, one is based on a convexity argument devised by De Pierro (see IEEE Trans. Med. Imaging, vol.12, p.328-333, 1993) in the context of emission tomography, and one is an ad hoc gradient algorithm. The algorithms enjoy desirable local and global convergence properties and combine gracefully with Bayesian smoothing priors. Preliminary numerical testing of the algorithms on simulated data suggests that the convex algorithm and the ad hoc gradient algorithm are computationally superior to the EM algorithm. This superiority stems from the larger number of exponentiations required by the EM algorithm. The convex and gradient algorithms are well adapted to parallel computing.

368 citations


Journal ArticleDOI
TL;DR: A unified information geometrical framework for studying stochastic models of neural networks is presented, focusing on the EM and em algorithms, and a condition that guarantees their equivalence is proved.

339 citations


Journal ArticleDOI
TL;DR: This work generalizes the McCullagh and Nelder approach to a latent class framework and demonstrates how this approach handles many of the existing latent class regression procedures as special cases, as well as a host of other parametric specifications in the exponential family heretofore not mentioned in the latent class literature.
Abstract: A mixture model approach is developed that simultaneously estimates the posterior membership probabilities of observations to a number of unobservable groups or latent classes, and the parameters of a generalized linear model which relates the observations, distributed according to some member of the exponential family, to a set of specified covariates within each class. We demonstrate how this approach handles many of the existing latent class regression procedures as special cases, as well as a host of other parametric specifications in the exponential family heretofore not mentioned in the latent class literature. As such we generalize the McCullagh and Nelder approach to a latent class framework. The parameters are estimated using maximum likelihood, and an EM algorithm for estimation is provided. A Monte Carlo study of the performance of the algorithm for several distributions is provided, and the model is illustrated in two empirical applications.
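As a concrete illustration of the latent-class GLM idea, the sketch below runs EM for a two-class mixture of Gaussian (identity-link) regressions, one simple member of the family the paper covers; the simulated data, starting values, and common error variance are assumptions of this toy example, not the paper's general procedure.

```python
import numpy as np

# Minimal sketch of EM for a two-class mixture of Gaussian (identity-link)
# regressions.  E-step: posterior class memberships; M-step: weighted least
# squares per class plus updated mixing proportion and common error variance.

rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, n)])
z = rng.random(n) < 0.6
y = np.where(z, X @ [1.0, 2.0], X @ [-1.0, -0.5]) + rng.normal(0, 0.3, n)

pi, betas, sigma = 0.5, [np.zeros(2), np.ones(2)], 1.0
for _ in range(100):
    dens = [np.exp(-(y - X @ b) ** 2 / (2 * sigma ** 2)) for b in betas]
    r = pi * dens[0] / (pi * dens[0] + (1 - pi) * dens[1] + 1e-300)      # E-step: responsibilities
    for k, w in enumerate([r, 1 - r]):                                   # M-step: weighted least squares
        betas[k] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    resid = [y - X @ b for b in betas]
    sigma = np.sqrt((r * resid[0] ** 2 + (1 - r) * resid[1] ** 2).mean())
    pi = r.mean()
```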

314 citations


Journal ArticleDOI
TL;DR: It is shown that the EM algorithm can be regarded as a variable metric algorithm whose search direction has a positive projection on the gradient of the log likelihood; an acceleration technique is also developed that yields a significant speedup in simulation experiments.

Journal ArticleDOI
TL;DR: The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982): a logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part of the model.
Abstract: A mixture model is an attractive approach for analyzing failure time data in which there are thought to be two groups of subjects, those who could eventually develop the endpoint and those who could not develop the endpoint. The proposed model is a semi-parametric generalization of the mixture model of Farewell (1982). A logistic regression model is proposed for the incidence part of the model, and a Kaplan-Meier type approach is used to estimate the latency part of the model. The estimator arises naturally out of the EM algorithm approach for fitting failure time mixture models as described by Larson and Dinse (1985). The procedure is applied to some experimental data from radiation biology and is evaluated in a Monte Carlo simulation study. The simulation study suggests the semi-parametric procedure is almost as efficient as the correct fully parametric procedure for estimating the regression coefficient in the incidence, but less efficient for estimating the latency distribution.

Journal ArticleDOI
TL;DR: A general strategy is outlined for accurately estimating false-match rates at each possible cutoff weight, using a model in which the distribution of observed weights is viewed as a mixture of weights for true matches and weights for false matches.
Abstract: Specifying a record-linkage procedure requires both (1) a method for measuring closeness of agreement between records, typically a scalar weight, and (2) a rule for deciding when to classify records as matches or nonmatches based on the weights. Here we outline a general strategy for the second problem, that is, for accurately estimating false-match rates for each possible cutoff weight. The strategy uses a model where the distribution of observed weights is viewed as a mixture of weights for true matches and weights for false matches. An EM algorithm for fitting mixtures of transformed-normal distributions is used to find posterior modes; associated posterior variability is due to uncertainty about specific normalizing transformations as well as uncertainty in the parameters of the mixture model, the latter being calculated using the SEM algorithm. This mixture-model calibration method is shown to perform well in an applied setting with census data. Further, a simulation experiment reveals that...
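In this kind of mixture calibration, once the two component weight distributions and the mixing proportion $\hat\pi$ of true matches have been estimated, the false-match rate among record pairs declared matches at cutoff $c$ can be read off as (our notation, not the paper's):

$$\widehat{\mathrm{FMR}}(c) = \frac{(1-\hat\pi)\,\widehat{\Pr}(W \ge c \mid \text{false match})}{\hat\pi\,\widehat{\Pr}(W \ge c \mid \text{true match}) + (1-\hat\pi)\,\widehat{\Pr}(W \ge c \mid \text{false match})}.$$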

Journal ArticleDOI
S. H. Manglos, G. M. Gagne, A. Krol, F. D. Thomas, R. Narayanaswamy
TL;DR: The existing transmission maximum-likelihood algorithm (TRML) is accurate but too slow; with ordered subsets, high-quality iterative reconstruction becomes available in clinically practical reconstruction times.
Abstract: An iterative algorithm is presented for accelerated reconstruction of cone beam transmission CT data (CBCT). CBCT supplies an attenuation map for SPECT attenuation compensation and anatomical correlation. Iterative algorithms are necessary to reduce truncation artifacts and 3D reconstruction artifacts. An existing transmission maximum-likelihood algorithm (TRML) is accurate but the reconstruction time is too long. The new algorithm is a modified EM algorithm, based on ordered subsets (OSEM). OSEM was evaluated in comparison to TRML using a thorax phantom and a 3D Defrise phantom. A wide range of image measures were evaluated, including spatial resolution, noise, log likelihood, region quantification, truncation artifact removal, and 3D artifact removal. For appropriate subset size, OSEM produced essentially the same image as TRML, but required only one-tenth as many iterations. Thus, adequate images were available in two to four iterations (20-30 min on a SPARC 2 workstation). Further, OSEM still approximately maximizes likelihood: divergence occurs only for very high (and clinically irrelevant) iterations. Ordered subsets are likely to be useful in other geometries (fan and parallel) and for emission CT as well. Therefore, with ordered subsets, high-quality iterative reconstruction is now available in clinically practical reconstruction times.
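The sketch below shows the ordered-subsets device in its simplest, emission-style form: one ML-EM-type multiplicative update per block of projections rather than per full pass. It is a toy illustration under assumed data; the paper applies the same idea to a transmission likelihood and cone beam geometry.

```python
import numpy as np

# Minimal sketch of an ordered-subsets EM (OSEM) sweep on toy emission-style
# data: the system matrix, counts and subset split are hypothetical.

rng = np.random.default_rng(2)
A = rng.random((60, 25))                            # 60 rays, 25 voxels
y = rng.poisson(A @ rng.gamma(2.0, size=25))        # simulated projection counts

subsets = np.array_split(np.arange(60), 6)          # 6 ordered subsets of rays
lam = np.ones(25)
for _ in range(4):                                  # a few passes over all subsets
    for s in subsets:
        As, ys = A[s], y[s]
        ratio = ys / np.maximum(As @ lam, 1e-12)
        lam = lam / np.maximum(As.sum(axis=0), 1e-12) * (As.T @ ratio)   # per-subset update
```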

Proceedings Article
27 Nov 1995
TL;DR: Two regularization methods that can improve the generalization capabilities of Gaussian mixture density estimates are compared: a Bayesian prior on the parameter space, and ensemble averaging, including Breiman's "bagging", which has recently been found to produce impressive results for classification networks.
Abstract: We compare two regularization methods which can be used to improve the generalization capabilities of Gaussian mixture density estimates. The first method uses a Bayesian prior on the parameter space. We derive EM (Expectation Maximization) update rules which maximize the a posteriori parameter probability. In the second approach we apply ensemble averaging to density estimation. This includes Breiman's "bagging", which recently has been found to produce impressive results for classification networks.
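A minimal sketch of the ensemble-averaging route, assuming scikit-learn's GaussianMixture for the per-replicate EM fits; the data, grid, number of replicates, and two-component choice are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Minimal sketch of "bagging" for density estimation: fit a Gaussian mixture by
# EM on several bootstrap resamples and average the resulting densities.

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 0.5, 150)])[:, None]
grid = np.linspace(-6, 6, 200)[:, None]

densities = []
for b in range(20):                                   # 20 bootstrap replicates
    xb = x[rng.integers(0, len(x), len(x))]
    gm = GaussianMixture(n_components=2, random_state=b).fit(xb)
    densities.append(np.exp(gm.score_samples(grid)))  # density of this replicate on the grid
bagged = np.mean(densities, axis=0)                   # bagged (averaged) density estimate
```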

Journal ArticleDOI
TL;DR: This article proposes an EM-like algorithm for estimating, by maximum likelihood, the population parameters of a nonlinear mixed-effect model given sparse individual data using a linearization about those Bayesian estimates.
Abstract: This article proposes an EM-like algorithm for estimating, by maximum likelihood, the population parameters of a nonlinear mixed-effect model given sparse individual data. The first step involves Bayesian estimation of the individual parameters. During the second step, population parameters are estimated using a linearization about those Bayesian estimates. This algorithm (implemented in P-PHARM) is evaluated on simulated data mimicking pharmacokinetic analyses and compared to the First-Order method and the First-Order Conditional Estimates method (both implemented in NONMEM). The accuracy of the results, within a few iterations, shows the estimation capabilities of the proposed approach.

Journal ArticleDOI
TL;DR: The authors address the problem of maximum likelihood estimation of dependence tree models with missing observations, using the expectation-maximization algorithm, which involves computing observation probabilities with an iterative "upward-downward" algorithm.
Abstract: A dependence tree is a model for the joint probability distribution of an n-dimensional random vector, which requires a relatively small number of free parameters by making Markov-like assumptions on the tree. The authors address the problem of maximum likelihood estimation of dependence tree models with missing observations, using the expectation-maximization algorithm. The solution involves computing observation probabilities with an iterative "upward-downward" algorithm, which is similar to an algorithm proposed for belief propagation in causal trees, a special case of Bayesian networks.

Proceedings Article
Bo Thiesson
20 Aug 1995
TL;DR: This paper considers statistical batch learning of the probability tables on the basis of incomplete data and expert knowledge and proposes a new class of models that allows a great variety of local functional restrictions to be imposed on the statistical model.
Abstract: Probabilistic expert systems based on Bayesian networks (BNs) require initial specification of both a qualitative graphical structure and quantitative assessment of conditional probability tables. This paper considers statistical batch learning of the probability tables on the basis of incomplete data and expert knowledge. The EM algorithm with a generalized conjugate gradient acceleration method has been dedicated to quantification of BNs by maximum posterior likelihood estimation for a super-class of the recursive graphical models. This new class of models allows a great variety of local functional restrictions to be imposed on the statistical model, which thereby extends the control and applicability of the constructed method for quantifying BNs.

Journal ArticleDOI
G. Agrò
TL;DR: In this article, the problem of obtaining maximum likelihood estimates for the three parameters of the exponential power function is addressed, and the information matrix is derived and the covariance matrix is presented; the regularity conditions which ensure asymptotic normality and efficiency are examined.
Abstract: This paper addresses the problem of obtaining maximum likelihood estimates for the three parameters of the exponential power function; the information matrix is derived and the covariance matrix is here presented; the regularity conditions which ensure asymptotic normality and efficiency are examined. A numerical investigation is performed for exploring the bias and variance of the maximum likelihood estimates and their dependence on sample size and shape parameter.
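For reference, one common parameterization of the exponential power (generalized error) density with location $\mu$, scale $\sigma$, and shape $p$ is shown below; conventions for the scale and shape parameters vary across the literature, so this may differ from the paper's exact form:

$$f(x;\mu,\sigma,p) = \frac{1}{2\,\sigma\,p^{1/p}\,\Gamma(1+1/p)}\exp\!\left(-\frac{|x-\mu|^{p}}{p\,\sigma^{p}}\right), \qquad -\infty < x < \infty,$$

which reduces to the normal density for $p = 2$ and to the Laplace density for $p = 1$.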

Journal ArticleDOI
TL;DR: This work proposes several solutions to implement the ‘SEMcm algorithm’ (SEM for censored mixture), showing in particular that one of these procedures solves numerical problems arising with the EMcm algorithm and mixtures of nonexponential-type distributions.

Proceedings ArticleDOI
Ananth Sankar, Chin-Hui Lee
09 May 1995
TL;DR: A maximum likelihood (ML) stochastic matching approach is presented to decrease the acoustic mismatch between a test utterance Y and a given set of speech hidden Markov models, so as to reduce the recognition performance degradation caused by possible distortions in the test utterance.
Abstract: We present a maximum likelihood (ML) stochastic matching approach to decrease the acoustic mismatch between a test utterance Y and a given set of speech hidden Markov models $\Lambda_X$ so as to reduce the recognition performance degradation caused by possible distortions in the test utterance. This mismatch may be reduced in two ways: (1) by an inverse distortion function $F_{\nu}(\cdot)$ that maps Y into an utterance X which matches better with the models $\Lambda_X$, and (2) by a model transformation function $G_{\eta}(\cdot)$ that maps $\Lambda_X$ to the transformed model $\Lambda_Y$ which matches better with the utterance Y. The functional form of the transformations depends upon our prior knowledge about the mismatch, and the parameters are estimated along with the recognized string in a maximum likelihood manner using the EM algorithm. Experimental results verify the efficacy of the approach in improving the performance of a continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels.

Proceedings ArticleDOI
09 May 1995
TL;DR: A complete iterative solution to the Rician parameter estimation problem by means of the EM (expectation-maximization) algorithm is presented.
Abstract: A polarimetric synthetic aperture radar (SAR) forms a complex vector-valued image where each pixel comprises the polarization-dependent reflectivity of a portion of a target or scene. The most common statistical model for this type of image is the zero-mean, circularly-symmetric, multivariate, complex Gaussian model. A logical generalization of this model is a circularly-symmetric, multivariate, complex Rician model which results from having a nonzero-mean complex target reflectivity. Direct maximum-likelihood estimation of the Rician model parameters is infeasible, since setting derivatives equal to zero results in an intractable system of coupled nonlinear equations. The contribution of the paper is a complete iterative solution to the Rician parameter estimation problem by means of the EM (expectation-maximization) algorithm.

Journal ArticleDOI
TL;DR: The restricted EM algorithm for maximum likelihood estimation under linear restrictions on the parameters.
Abstract: The EM algorithm is one of the most powerful algorithms for obtaining maximum likelihood estimates for many incomplete-data problems. But when the parameters must satisfy a set of linear restrictions, the EM algorithm may be too complicated to apply directly. In this article we propose maximum likelihood estimation procedures under a set of linear restrictions for situations in which the EM algorithm could be used if there were no such restrictions on the parameters. We develop a modification to the EM algorithm, which we call the restricted EM algorithm, incorporating the linear restrictions on the parameters. This algorithm is easily updated by using the code for the complete data information matrix and the code for the usual EM algorithm. Major applications of the restricted EM algorithm are to construct likelihood ratio tests and profile likelihood confidence intervals. We illustrate the procedure with two models: a variance component model and a bivariate normal model.
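To give a sense of the modification, suppose the linear restrictions are $A\theta = b$ and the complete-data information matrix is $I_c$. When the Q-function is (approximately) quadratic with curvature $I_c$, the restricted M-step can be written as a projection of the unrestricted update $\hat\theta$ onto the constraint set; this is generic notation and a simplification, not the authors' exact derivation:

$$\tilde\theta = \hat\theta - I_c^{-1}A^{\top}\bigl(A\,I_c^{-1}A^{\top}\bigr)^{-1}\bigl(A\hat\theta - b\bigr).$$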

Journal ArticleDOI
TL;DR: This paper discusses the non-parametric estimation of a distribution function based on incomplete data for which the measurement origin of a survival time or the date of enrollment in a study is known only to belong to an interval and the survival time of interest itself is observed from a truncated distribution.
Abstract: In this paper we discuss the non-parametric estimation of a distribution function based on incomplete data for which the measurement origin of a survival time or the date of enrollment in a study is known only to belong to an interval. Also the survival time of interest itself is observed from a truncated distribution and is known only to lie in an interval. To estimate the distribution function, a simple self-consistency algorithm, a generalization of Turnbull's (1976, Journal of the Royal Statistical Society, Series B 38, 290-295) self-consistency algorithm, is proposed. This method is then used to analyze two AIDS cohort studies, for which direct use of the EM algorithm (Dempster, Laird and Rubin, 1977, Journal of the Royal Statistical Society, Series B 39, 1-38), which is computationally complicated, has previously been the usual method of analysis.

Journal ArticleDOI
TL;DR: This is Part II of a series concerning the PLS kernel algorithm for data sets with many variables and few objects; here the issues of cross-validation and missing data are investigated.
Abstract: This is Part II of a series concerning the PLS kernel algorithm for data sets with many variables and few objects. Here the issues of cross-validation and missing data are investigated. Both partial and full cross-validation are evaluated in terms of predictive residuals and speed and are illustrated on real examples. Two related approaches to the solution of the missing data problem are presented. One is a full EM algorithm and the second a reduced EM algorithm which applies when the number of missing values is small. The two examples are multivariate calibration data sets. The first set consists of UV-visible data measured on mixtures of four metal ions. The second example consists of FT-IR measurements on mixtures consisting of four different organic substances.
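The sketch below illustrates the general EM-style fill-in idea for missing values in a data matrix (impute, refit, re-impute); it uses a generic low-rank fit on toy data and is not the paper's reduced or full EM inside the PLS kernel algorithm.

```python
import numpy as np

# Minimal sketch of an EM-style fill-in loop for missing values in a data
# matrix: impute, fit a low-rank model, re-impute from the fit, repeat.

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 8)) @ rng.normal(size=(8, 8))     # toy data matrix
mask = rng.random(X.shape) < 0.1                           # ~10% entries missing
Xobs = np.where(mask, np.nan, X)

Xfill = np.where(mask, np.nanmean(Xobs, axis=0), Xobs)     # start from column means
for _ in range(50):
    U, s, Vt = np.linalg.svd(Xfill, full_matrices=False)
    Xhat = (U[:, :2] * s[:2]) @ Vt[:2]                     # rank-2 reconstruction (model fit)
    Xfill = np.where(mask, Xhat, Xobs)                     # re-impute the missing cells
```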

Journal ArticleDOI
TL;DR: Simple computational formulas for the E-step of this EM algorithm are provided in the case of two-level data, and recent experience with its implementation is discussed.
Abstract: Recent investigations have revealed that widely available software can be adapted to provide maximum likelihood (ML) estimates for a general class of multilevel covariance structure models if the data are balanced (e.g. equal numbers of students in each of many schools). When the data are unbalanced, finding ML estimates is more challenging. However, by viewing the ‘complete data’ as balanced, one can calculate ML estimates for unbalanced data by constructing a comparatively simple EM algorithm. Computation using standard single-level structural equation software performs the ‘M step’, and an auxiliary program computes the E-step. Simple computational formulas for this E-step are provided in the case of two-level data, and recent experience with its implementation discussed. Asymptotic standard errors are found by computing the observed-data information matrix at convergence.

Journal ArticleDOI
TL;DR: Four mixture models, distinguished by assumptions about the variance of the shifted observations and the exchangeability of schizophrenic individuals, are fit and then screened by Bayesian model monitoring using posterior predictive checks.
Abstract: Reaction times for schizophrenic individuals in a simple visual tracking experiment can be substantially more variable than for non-schizophrenic individuals. Current psychological theory suggests that at least some of this extra variability arises from an attentional lapse that delays some, but not all, of each schizophrenic's reaction times. Based on this theory, we pursue models in which measurements from non-schizophrenics arise from a normal linear model with a separate mean for each individual, whereas measurements from schizophrenics arise from a mixture of (i) a component analogous to the distribution of response times for non-schizophrenics and (ii) a mean-shifted component. We fit four mixture models within this framework, where the distinctions between models arise from assumptions about the variance of the shifted observations and the exchangeability of schizophrenic individuals. Some of these models can be fit by maximum likelihood using the EM algorithm, and all can be fit using the ECM algorithm, where the covariance matrices associated with the parameters are calculated by the SEM and SECM algorithms, respectively. Bayesian model monitoring using posterior predictive checks is invoked to discard models that fail to reproduce certain observed features of the data and to stimulate the development of better models.

Journal ArticleDOI
R. Samadani
TL;DR: This correspondence describes an algorithm for estimating the proportions of classes in an SAR image by first assuming that the image consists of a mixture of a known number of different pixel types; the parameters of the resulting mixture distribution are then estimated using the EM algorithm.
Abstract: This correspondence describes an algorithm for estimating the proportions of classes in an SAR image by first assuming that an image consists of a mixture of a known number of different pixel types. A maximum likelihood estimate of the parameters of the resulting mixture distribution is then evaluated using the EM algorithm. An advantage of the finite mixtures approach is that the quantities of interest, the proportions, are directly estimated. The technique is applied to aircraft synthetic aperture radar (SAR) images of sea ice. In addition to finding the proportions of the classes, knowledge of the mixture components allows image displays tailored to a user's requirements.
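A minimal sketch of the finite-mixtures idea for proportion estimation: with the per-class pixel densities held fixed for brevity (the paper estimates the full mixture), EM reduces to alternating responsibilities and proportion updates. The pixel data and class densities below are hypothetical.

```python
import numpy as np

# Minimal sketch of EM for the mixing proportions of a finite mixture with the
# component (pixel-class) densities treated as known.

rng = np.random.default_rng(5)
pixels = np.concatenate([rng.normal(0.2, 0.05, 700), rng.normal(0.6, 0.08, 300)])

def normal_pdf(x, mu, sd):
    return np.exp(-(x - mu) ** 2 / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))

dens = np.column_stack([normal_pdf(pixels, 0.2, 0.05), normal_pdf(pixels, 0.6, 0.08)])
props = np.array([0.5, 0.5])
for _ in range(200):
    r = props * dens
    r /= r.sum(axis=1, keepdims=True)   # E-step: class responsibilities per pixel
    props = r.mean(axis=0)              # M-step: updated class proportions
print(props)                            # close to the simulated 0.7 / 0.3 split
```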

Journal ArticleDOI
TL;DR: In this paper, the authors employ a maximum smoothed likelihood formalism inspired by a nonlinearly smoothed version of the EMS algorithm of Silverman, Jones, Wilson and Nychka.
Abstract: We consider the problem of estimating a pdf $f$ from samples $X_1, X_2, \ldots, X_n$ of a random variable with pdf $\mathscr{K}f$, where $\mathscr{K}$ is a compact integral operator. We employ a maximum smoothed likelihood formalism inspired by a nonlinearly smoothed version of the EMS algorithm of Silverman, Jones, Wilson and Nychka. We show that this nonlinearly smoothed algorithm is itself an EM algorithm, which helps explain the strong convergence properties of the algorithm. For the case of (standard) density estimation, that is, the case where $\mathscr{K}$ is the identity, the method yields the standard kernel density estimators. The maximum smoothed likelihood density estimation technique is a regularization technique. We prove an inequality which implies the stability and convergence of the regularization method for the large sample asymptotic problem. Under minimal assumptions it also implies the a.s. convergence of the finite sample density estimate via a uniform version of the strong law of large numbers. Under extra regularity conditions we get a.s. convergence rates via a uniform version of the law of the iterated logarithm (under stronger conditions than usual).
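Schematically, writing $k(y \mid x)$ for the kernel of $\mathscr{K}$ and $\hat g_n$ for the empirical distribution of the observations, a smoothed EM (EMS-type) iteration alternates the usual EM step for the indirect problem with a smoothing operator $\mathcal{S}$; this is generic notation, and the paper's nonlinear smoothing modifies the form of $\mathcal{S}$:

$$f^{(k+1)}(x) = \mathcal{S}\!\left[f^{(k)}(x)\int\frac{k(y \mid x)}{(\mathscr{K}f^{(k)})(y)}\,d\hat g_n(y)\right].$$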

Journal ArticleDOI
TL;DR: A maximum a posteriori (MAP) approach for iterative reconstruction based on a weighted least-squares conjugate gradient (WLS-CG) algorithm is derived; the MAP-CG algorithm requires 10%-25% of the processing time of EM techniques and provides images of comparable or superior quality.
Abstract: We have derived a maximum a posteriori (MAP) approach for iterative reconstruction based on a weighted least‐squares conjugate gradient (WLS‐CG) algorithm. The WLS‐CG algorithm has been shown to have initial convergence rates up to 10× faster than the maximum‐likelihood expectation maximization (ML‐EM) algorithm, but WLS‐CG suffers from rapidly increasing image noise at higher iteration numbers. In our MAP‐CG algorithm, the increasing noise is controlled by a Gibbs smoothing prior, resulting in stable, convergent solutions. Our formulation assumes a Gaussian noise model for the likelihood function. When a linear transformation of the pixel space is performed (the ‘‘relaxation’’ acceleration method), the MAP‐CG algorithm obtains a low‐noise, stable solution (one that does not change with further iterations) in 10–30 iterations, compared to 100–200 iterations for MAP‐EM. Each iteration of MAP‐CG requires approximately the same amount of processing time as one iteration of ML‐EM or MAP‐EM. We show that the use of an initial image estimate obtained from a single iteration of the Chang method helps the algorithm to converge faster when acceleration is not used, but does not help when acceleration is applied. While both the WLS‐CG and MAP‐CG methods suffer from the potential for obtaining negative pixel values in the iterated image estimates, the use of the Gibbs prior substantially reduces the number of pixels with negative values and restricts them to regions of little or no activity. We use SPECT data from simulated hot‐sphere phantoms and from patient studies to demonstrate the advantages of the MAP‐CG algorithm. We conclude that the MAP‐CG algorithm requires 10%–25% of the processing time of EM techniques, and provides images of comparable or superior quality.
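In generic notation (not necessarily the authors' exact formulation), the Gaussian noise model plus Gibbs smoothing prior leads to minimizing a penalized weighted-least-squares objective by conjugate gradient:

$$\Phi(x) = \tfrac{1}{2}\,(y - Ax)^{\top}W\,(y - Ax) + \beta\sum_{j}\sum_{k\in N_j} w_{jk}\,\phi(x_j - x_k),$$

where $W$ is the diagonal weight matrix implied by the noise model, $N_j$ is the neighborhood of pixel $j$, and $\phi$ is the Gibbs potential controlling smoothness.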