
Showing papers on "Expectation–maximization algorithm published in 1994"


Journal ArticleDOI
TL;DR: Ordered subsets EM (OS-EM) provides a restoration imposing a natural positivity condition and with close links to the EM algorithm, applicable in both single photon (SPECT) and positron emission tomography (PET).
Abstract: The authors define ordered subset processing for standard algorithms (such as expectation maximization, EM) for image restoration from projections. Ordered subsets methods group projection data into an ordered sequence of subsets (or blocks). An iteration of ordered subsets EM is defined as a single pass through all the subsets, in each subset using the current estimate to initialize application of EM with that data subset. This approach is similar in concept to block-Kaczmarz methods introduced by Eggermont et al. (1981) for iterative reconstruction. Simultaneous iterative reconstruction (SIRT) and multiplicative algebraic reconstruction (MART) techniques are well known special cases. Ordered subsets EM (OS-EM) provides a restoration imposing a natural positivity condition and with close links to the EM algorithm. OS-EM is applicable in both single photon (SPECT) and positron emission tomography (PET). In simulation studies in SPECT, the OS-EM algorithm provides an order-of-magnitude acceleration over EM, with restoration quality maintained.

3,740 citations
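
As an illustration (not the authors' implementation), the following is a minimal numpy sketch of a single OS-EM pass in the spirit of the abstract above, assuming a hypothetical dense system matrix A (projection bins by image pixels), measured counts y, and a current non-negative image estimate lam.

```python
# Minimal sketch of one ordered-subsets EM (OS-EM) iteration for emission tomography.
# A, y, lam and the subset grouping are illustrative assumptions, not the paper's code.
import numpy as np

def osem_iteration(A, y, lam, n_subsets=4, eps=1e-12):
    """One OS-EM pass: a single sweep through all ordered subsets of projections."""
    n_proj = A.shape[0]
    # Group projection rows into an ordered sequence of subsets (blocks).
    subsets = np.array_split(np.arange(n_proj), n_subsets)
    for idx in subsets:
        A_s, y_s = A[idx], y[idx]
        forward = A_s @ lam                      # forward projection with current estimate
        ratio = y_s / np.maximum(forward, eps)   # measured / estimated counts
        # Multiplicative EM-style update using only this subset's data;
        # positivity of lam is preserved automatically.
        lam = lam * (A_s.T @ ratio) / np.maximum(A_s.T @ np.ones_like(y_s), eps)
    return lam

# Toy usage with random data (illustration only).
rng = np.random.default_rng(0)
A = rng.random((64, 32))
true_img = rng.random(32)
y = rng.poisson(A @ true_img).astype(float)
lam = np.ones(32)
for _ in range(10):
    lam = osem_iteration(A, y, lam)
```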


Journal ArticleDOI
TL;DR: An Expectation-Maximization (EM) algorithm is presented for adjusting the parameters of the tree-structured architecture for supervised learning, along with an on-line learning algorithm in which the parameters are updated incrementally.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM's). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an on-line learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.

2,418 citations


Journal ArticleDOI
TL;DR: This article introduces an alternative procedure that involves imputing the missing data sequentially and computing appropriate importance sampling weights, and in many applications this new procedure works very well without the need for iterations.
Abstract: For missing data problems, Tanner and Wong have described a data augmentation procedure that approximates the actual posterior distribution of the parameter vector by a mixture of complete data posteriors. Their method of constructing the complete data sets is closely related to the Gibbs sampler. Both require iterations and, as with the EM algorithm, convergence can be slow. We introduce in this article an alternative procedure that involves imputing the missing data sequentially and computing appropriate importance sampling weights. In many applications this new procedure works very well without the need for iterations. Sensitivity analysis, influence analysis, and updating with new data can be performed cheaply. Bayesian prediction and model selection can also be incorporated. Examples taken from a wide range of applications are used for illustration.

1,166 citations


Journal ArticleDOI
TL;DR: The paper describes the space-alternating generalized EM (SAGE) method, which updates the parameters sequentially by alternating between several small hidden-data spaces defined by the algorithm designer; it proves that the sequence of estimates monotonically increases the penalized-likelihood objective, derives asymptotic convergence rates, and provides sufficient conditions for monotone convergence in norm.
Abstract: The expectation-maximization (EM) method can facilitate maximizing likelihood functions that arise in statistical estimation problems. In the classical EM paradigm, one iteratively maximizes the conditional log-likelihood of a single unobservable complete data space, rather than maximizing the intractable likelihood function for the measured or incomplete data. EM algorithms update all parameters simultaneously, which has two drawbacks: 1) slow convergence, and 2) difficult maximization steps due to coupling when smoothness penalties are used. The paper describes the space-alternating generalized EM (SAGE) method, which updates the parameters sequentially by alternating between several small hidden-data spaces defined by the algorithm designer. The authors prove that the sequence of estimates monotonically increases the penalized-likelihood objective, derive asymptotic convergence rates, and provide sufficient conditions for monotone convergence in norm. Two signal processing applications illustrate the method: estimation of superimposed signals in Gaussian noise, and image reconstruction from Poisson measurements. In both applications, the SAGE algorithms easily accommodate smoothness penalties and converge faster than the EM algorithms.

1,083 citations


Journal ArticleDOI
TL;DR: In this paper, the posterior distribution and Bayes estimators are evaluated by Gibbs sampling, relying on the missing data structure of the mixture model. The data augmentation method is shown to converge geometrically, since a duality principle transfers properties from the discrete missing data chain to the parameters.
Abstract: SUMMARY A formal Bayesian analysis of a mixture model usually leads to intractable calculations, since the posterior distribution takes into account all the partitions of the sample. We present approximation methods which evaluate the posterior distribution and Bayes estimators by Gibbs sampling, relying on the missing data structure of the mixture model. The data augmentation method is shown to converge geometrically, since a duality principle transfers properties from the discrete missing data chain to the parameters. The fully conditional Gibbs alternative is shown to be ergodic and geometric convergence is established in the normal case. We also consider non-informative approximations associated with improper priors, assuming that the sample corresponds exactly to a k-component mixture.

895 citations
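
A minimal sketch of the kind of Gibbs sampler described above, for a k-component univariate normal mixture with known common variance. The conjugate priors (normal on the component means, symmetric Dirichlet on the weights) are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of Gibbs sampling for a k-component univariate normal mixture,
# exploiting the missing-data (allocation) structure of the mixture model.
# Assumptions: known common variance sigma2, N(0, tau2) priors on the means,
# and a symmetric Dirichlet(alpha) prior on the mixture weights.
import numpy as np

def gibbs_normal_mixture(x, k=2, n_iter=500, sigma2=1.0, tau2=10.0, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=k)
    w = np.full(k, 1.0 / k)
    draws = []
    for _ in range(n_iter):
        # 1) Sample the missing allocations z_i from their full conditional.
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2 + np.log(w)[None, :]
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(k, p=pi) for pi in p])
        # 2) Sample the weights from their Dirichlet full conditional.
        counts = np.bincount(z, minlength=k)
        w = rng.dirichlet(alpha + counts)
        # 3) Sample each component mean from its normal full conditional.
        for j in range(k):
            xj = x[z == j]
            prec = len(xj) / sigma2 + 1.0 / tau2
            mean = (xj.sum() / sigma2) / prec
            mu[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        draws.append((w.copy(), mu.copy()))
    return draws

# Toy usage: two well-separated components.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])
samples = gibbs_normal_mixture(x)
```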


Journal ArticleDOI
TL;DR: In this article, a multidisciplinary review of empirical, statistical learning from a graphical model perspective is presented, including decomposition, differentiation, and manipulation of probability models from the exponential family.
Abstract: This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decomposition techniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms.

617 citations


Journal ArticleDOI
TL;DR: ECME is a generalisation of the ECM algorithm, which is itself an extension of the EM algorithm (Dempster, Laird & Rubin, 1977); it is obtained by replacing some CM-steps of ECM, which maximise the constrained expected complete-data log-likelihood function, with steps that maximise the correspondingly constrained actual likelihood function.
Abstract: A generalisation of the ECM algorithm (Meng & Rubin, 1993), which is itself an extension of the EM algorithm (Dempster, Laird & Rubin, 1977), can be obtained by replacing some CM-steps of ECM, which maximise the constrained expected complete-data log-likelihood function, with steps that maximise the correspondingly constrained actual likelihood function. This algorithm, which we call the ECME algorithm, for Expectation/Conditional Maximisation Either, shares with both EM and ECM their stable monotone convergence and basic simplicity of implementation relative to competing faster converging methods. Moreover, ECME can have a substantially faster convergence rate than either EM or ECM, measured using either the number of iterations or actual computer time.

604 citations


Journal ArticleDOI
TL;DR: In this article, a new class of pattern-mixture models for the situation where missingness is assumed to depend on an arbitrary unspecified function of a linear combination of the two variables is described.
Abstract: SUMMARY Likelihood-based methods are developed for analyzing a random sample on two continuous variables when values of one of the variables are missing. Normal maximum likelihood estimates when values are missing completely at random were derived by Anderson (1957). They are also maximum likelihood providing the missing-data mechanism is ignorable, in Rubin's (1976) sense that the mechanism depends only on observed data. A new class of pattern-mixture models (Little, 1993) is described for the situation where missingness is assumed to depend on an arbitrary unspecified function of a linear combination of the two variables. Maximum likelihood for models in this class is straightforward, and yields the estimates of Anderson (1957) when missingness depends solely on the completely observed variable, and the estimates of Brown (1990) when missingness depends solely on the incompletely observed variable. Another choice of linear combination yields estimates from complete-case analysis. Large-sample and Bayesian methods are described for this model. The data do not supply information about the ratio of the coefficients of the linear combination that controls missingness. If this ratio is not well determined based on prior knowledge, a prior distribution can be specified, and Bayesian inference is then readily accomplished. Alternatively, sensitivity of inferences can be displayed for a variety of choices of the ratio.

395 citations


Journal ArticleDOI
TL;DR: The theory of expectation-maximization can be used as a basis for calculation of objective figures of merit for image quality over a wide range of conditions in emission tomography.
Abstract: The expectation-maximization (EM) algorithm is an important tool for maximum-likelihood (ML) estimation and image reconstruction, especially in medical imaging. It is a non-linear iterative algorithm that attempts to find the ML estimate of the object that produced a data set. The convergence of the algorithm and other deterministic properties are well established, but relatively little is known about how noise in the data influences noise in the final reconstructed image. In this paper we present a detailed treatment of these statistical properties. The specific application we have in mind is image reconstruction in emission tomography, but the results are valid for any application of the EM algorithm in which the data set can be described by Poisson statistics. We show that the probability density function for the grey level at a pixel in the image is well approximated by a log-normal law. An expression is derived for the variance of the grey level and for pixel-to-pixel covariance. The variance increases rapidly with iteration number at first, but eventually saturates as the ML estimate is approached. Moreover, the variance at any iteration number has a factor proportional to the square of the mean image (though other factors may also depend on the mean image), so a map of the standard deviation resembles the object itself. Thus low-intensity regions of the image tend to have low noise. By contrast, linear reconstruction methods, such as filtered back-projection in tomography, show a much more global noise pattern, with high-intensity regions of the object contributing to noise at rather distant low-intensity regions. The theoretical results of this paper depend on two approximations, but in the second paper in this series we demonstrate through Monte Carlo simulation that the approximations are justified over a wide range of conditions in emission tomography. The theory can, therefore, be used as a basis for calculation of objective figures of merit for image quality.

388 citations


Proceedings Article
01 Jan 1994
TL;DR: A recurrent architecture having a modular structure is introduced; it has similarities to hidden Markov models, but supports a recurrent-network processing style and allows the supervised learning paradigm to be exploited while using maximum likelihood estimation.
Abstract: We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and allows the supervised learning paradigm to be exploited while using maximum likelihood estimation.

344 citations


Journal ArticleDOI
TL;DR: A monograph covering the logic of maximum likelihood, a general modeling framework using maximum likelihood methods, basic estimation techniques, and further empirical examples.
Abstract: Contents: Introduction; The Logic of Maximum Likelihood; A General Modeling Framework Using Maximum Likelihood Methods; An Introduction to Basic Estimation Techniques; Further Empirical Examples; Additional Likelihoods; Conclusions.

Journal ArticleDOI
TL;DR: An iterative method for joint channel parameter estimation and symbol selection via the Baum-Welch algorithm, or equivalently the Expectation-Maximization (EM) algorithm, is presented; the calculated likelihood functions easily give optimum decisions on information symbols that minimize the symbol error probability.
Abstract: We present an iterative method for joint channel parameter estimation and symbol selection via the Baum-Welch algorithm, or equivalently the Expectation-Maximization (EM) algorithm. Channel parameters, including noise variance, are estimated using a maximum likelihood criterion. The Markovian properties of the channel state sequence enable us to calculate the required likelihood using a forward-backward algorithm. The calculated likelihood functions can easily give optimum decisions on information symbols which minimize the symbol error probability. The proposed receiver can be used for both linear and nonlinear channels. It improves the system throughput by making savings in the transmission of known symbols, usually employed for channel identification. Simulation results which show fast convergence are presented.
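
A minimal sketch of the scaled forward-backward recursions the abstract relies on, for a generic discrete-state channel with additive Gaussian observation noise. The transition matrix, state means, and noise variance below are hypothetical placeholders rather than the paper's channel model.

```python
# Minimal sketch of scaled forward-backward recursions for a discrete-state channel/HMM.
# T is a hypothetical k x k transition matrix, means are per-state observation means,
# sigma2 is the observation noise variance, and pi0 is the initial state distribution.
import numpy as np

def forward_backward(y, T, means, sigma2, pi0):
    k, n = len(means), len(y)
    # Per-sample Gaussian emission likelihoods under each state.
    lik = np.exp(-0.5 * (y[:, None] - means[None, :]) ** 2 / sigma2)
    lik /= np.sqrt(2 * np.pi * sigma2)
    alpha = np.zeros((n, k)); beta = np.zeros((n, k)); c = np.zeros(n)
    # Forward pass (scaled to avoid numerical underflow).
    alpha[0] = pi0 * lik[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * lik[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    # Backward pass, reusing the forward scaling constants.
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = (T @ (lik[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                        # posterior state probabilities
    gamma /= gamma.sum(axis=1, keepdims=True)
    loglik = np.log(c).sum()
    return gamma, loglik

# Toy usage: 2-state channel; per-symbol MAP decisions minimise symbol error probability.
T = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([-1.0, 1.0])
pi0 = np.array([0.5, 0.5])
rng = np.random.default_rng(0)
y = rng.normal(np.repeat([-1.0, 1.0, -1.0], 20), 0.5)
gamma, loglik = forward_backward(y, T, means, 0.25, pi0)
decisions = gamma.argmax(axis=1)
```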

Journal ArticleDOI
TL;DR: In this paper, the authors consider the case where branch probabilities are products of nonnegative integer powers in the parameters, 0 ≤ θ_s ≤ 1, and their complements, 1 - θ_s, and show that the EM algorithm necessarily converges to a local maximum.
Abstract: Multinomial processing tree models assume that an observed behavior category can arise from one or more processing sequences represented as branches in a tree. These models form a subclass of parametric, multinomial models, and they provide a substantively motivated alternative to loglinear models. We consider the usual case where branch probabilities are products of nonnegative integer powers in the parameters, 0 ≤ θ_s ≤ 1, and their complements, 1 - θ_s. A version of the EM algorithm is constructed that has very strong properties. First, the E-step and the M-step are both analytic and computationally easy; therefore, a fast PC program can be constructed for obtaining MLEs for large numbers of parameters. Second, a closed form expression for the observed Fisher information matrix is obtained for the entire class. Third, it is proved that the algorithm necessarily converges to a local maximum, and this is a stronger result than for the exponential family as a whole. Fourth, we show how the algorithm can handle quite general hypothesis tests concerning restrictions on the model parameters. Fifth, we extend the algorithm to handle the Read and Cressie power divergence family of goodness-of-fit statistics. The paper includes an example to illustrate some of these results.

Proceedings Article
01 Jan 1994
TL;DR: An alternative model for mixtures of experts is proposed that uses a different parametric form for the gating network, is trained by the EM algorithm, and yields faster convergence.
Abstract: We propose an alternative model for mixtures of experts which uses a different parametric form for the gating network. The modified model is trained by the EM algorithm. In comparison with earlier models--trained by either EM or gradient ascent--there is no need to select a learning stepsize. We report simulation experiments which show that the new architecture yields faster convergence. We also apply the new model to two problem domains: piecewise nonlinear function approximation and the combination of multiple previously trained classifiers.
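
A minimal sketch in the spirit of the model above, as a hedged simplification rather than the paper's exact architecture: a one-dimensional mixture of linear experts whose gating network is a Gaussian mixture over the input, so that both the E-step and the M-step have closed-form updates and no learning stepsize is needed.

```python
# Minimal sketch of EM for a mixture of linear experts with a Gaussian gating network.
# The model and its parameterisation are illustrative assumptions.
import numpy as np

def em_mixture_of_experts(x, y, k=2, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Gating parameters: mixing weights and input means/variances per expert.
    alpha = np.full(k, 1.0 / k); m = rng.normal(size=k); s2 = np.ones(k)
    # Expert parameters: per-expert linear regression (slope a, intercept b) and noise variance.
    a = rng.normal(size=k); b = np.zeros(k); tau2 = np.ones(k)
    for _ in range(n_iter):
        # E-step: responsibility of expert j for the pair (x_i, y_i).
        gate = alpha * np.exp(-0.5 * (x[:, None] - m) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
        pred = a * x[:, None] + b
        lik = np.exp(-0.5 * (y[:, None] - pred) ** 2 / tau2) / np.sqrt(2 * np.pi * tau2)
        r = gate * lik
        r /= r.sum(axis=1, keepdims=True)
        nk = r.sum(axis=0)
        # M-step: closed-form updates (weighted moments and weighted least squares).
        alpha = nk / len(x)
        m = (r * x[:, None]).sum(axis=0) / nk
        s2 = (r * (x[:, None] - m) ** 2).sum(axis=0) / nk
        for j in range(k):
            w = r[:, j]
            xb, yb = np.average(x, weights=w), np.average(y, weights=w)
            a[j] = np.sum(w * (x - xb) * (y - yb)) / np.sum(w * (x - xb) ** 2)
            b[j] = yb - a[j] * xb
            tau2[j] = np.sum(w * (y - a[j] * x - b[j]) ** 2) / nk[j]
    return alpha, m, s2, a, b, tau2

# Toy usage: two linear regimes switched by the input.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 300)
y = np.where(x < 0, 2 * x + 1, -x + 2) + rng.normal(0, 0.3, 300)
params = em_mixture_of_experts(x, y)
```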

ReportDOI
01 Dec 1994
TL;DR: A set of algorithms is described that handles clustering, classification, and function approximation from incomplete data in a principled and efficient manner, making two distinct appeals to the Expectation-Maximization principle.
Abstract: Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives---the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data in a principled and efficient manner.
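
As a simplified illustration of likelihood-based learning from incomplete data (a single Gaussian rather than the mixture models treated in the report), the following sketch runs EM for the mean and covariance of a multivariate normal when some entries of the data matrix are missing (NaN).

```python
# Minimal sketch of EM for a multivariate normal with missing entries (NaN).
# This is a hedged simplification of the incomplete-data setting discussed above.
import numpy as np

def em_gaussian_missing(X, n_iter=50):
    n, d = X.shape
    mu = np.nanmean(X, axis=0)
    Sigma = np.diag(np.nanvar(X, axis=0)) + 1e-6 * np.eye(d)
    for _ in range(n_iter):
        Ex = np.zeros((n, d))
        Exx = np.zeros((d, d))
        for i in range(n):
            obs = ~np.isnan(X[i]); mis = ~obs
            xi = X[i].copy()
            C_mis = None
            if mis.any():
                if obs.any():
                    # E-step: conditional mean/covariance of the missing block given the observed block.
                    S_oo = Sigma[np.ix_(obs, obs)]
                    S_mo = Sigma[np.ix_(mis, obs)]
                    coef = np.linalg.solve(S_oo, S_mo.T).T      # regression coefficients
                    xi[mis] = mu[mis] + coef @ (X[i, obs] - mu[obs])
                    C_mis = Sigma[np.ix_(mis, mis)] - coef @ S_mo.T
                else:
                    xi[mis] = mu[mis]
                    C_mis = Sigma[np.ix_(mis, mis)]
            Ex[i] = xi
            outer = np.outer(xi, xi)
            if C_mis is not None:
                outer[np.ix_(mis, mis)] += C_mis               # add conditional covariance
            Exx += outer
        # M-step: update mean and covariance from the expected sufficient statistics.
        mu = Ex.mean(axis=0)
        Sigma = Exx / n - np.outer(mu, mu)
    return mu, Sigma

# Toy usage: bivariate normal data with ~20% of entries missing at random.
rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.6], [0.6, 2.0]], size=200)
X[rng.random(X.shape) < 0.2] = np.nan
mu_hat, Sigma_hat = em_gaussian_missing(X)
```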

Journal ArticleDOI
TL;DR: In this article, the authors consider a class of probit-normal models for binary data and describe ML and REML estimation of variance components for that class as well as best prediction for the realized values of the random effects.
Abstract: We consider a class of probit-normal models for binary data and describe ML and REML estimation of variance components for that class as well as best prediction for the realized values of the random effects. ML estimates are calculated using an EM algorithm; for complicated models EM includes a Gibbs step. The computations are illustrated through two examples.

Journal ArticleDOI
TL;DR: Two solutions are proposed to solve the problem of model parameter estimation from incomplete data: a Monte Carlo scheme and a scheme related to Besag's (1986) iterated conditional mode (ICM) method, both of which make use of Markov random-field modeling assumptions.
Abstract: An unsupervised stochastic model-based approach to image segmentation is described, and some of its properties investigated. In this approach, the problem of model parameter estimation is formulated as a problem of parameter estimation from incomplete data, and the expectation-maximization (EM) algorithm is used to determine a maximum-likelihood (ML) estimate. Previously, the use of the EM algorithm in this application has encountered difficulties since an analytical expression for the conditional expectations required in the EM procedure is generally unavailable, except for the simplest models. In this paper, two solutions are proposed to solve this problem: a Monte Carlo scheme and a scheme related to Besag's (1986) iterated conditional mode (ICM) method. Both schemes make use of Markov random-field modeling assumptions. Examples are provided to illustrate the implementation of the EM algorithm for several general classes of image models. Experimental results on both synthetic and real images are provided.

Journal ArticleDOI
TL;DR: In this paper, the authors apply standard convex optimization techniques to the analysis of interval censored data and provide easily verifiable conditions for the self-consistent estimator proposed by Turnbull (1976) to be a maximum likelihood estimator and for checking whether the maximum likelihood estimate is unique.
Abstract: SUMMARY Standard convex optimization techniques are applied to the analysis of interval censored data. These methods provide easily verifiable conditions for the self-consistent estimator proposed by Turnbull (1976) to be a maximum likelihood estimator and for checking whether the maximum likelihood estimate is unique. A sufficient condition is given for the almost sure convergence of the maximum likelihood estimator to the true underlying distribution function.
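
A minimal sketch of the self-consistency (EM) iteration proposed by Turnbull (1976) for interval-censored data, with the candidate support restricted to a fixed grid for simplicity; the grid and the toy intervals are illustrative assumptions.

```python
# Minimal sketch of Turnbull's self-consistency (EM) iteration for interval-censored data:
# each observation lies in an interval [l_i, r_i], and we estimate probability masses p_j
# on a fixed grid of candidate support points s.
import numpy as np

def self_consistency(intervals, s, n_iter=200):
    intervals = np.asarray(intervals, dtype=float)
    m = len(s)
    # Indicator matrix: does support point s_j fall inside observation i's interval?
    A = (s[None, :] >= intervals[:, [0]]) & (s[None, :] <= intervals[:, [1]])
    p = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        denom = A @ p                          # probability mass covering each interval
        # Combined E/M step: expected fraction of observations attributed to each s_j.
        p = (A * p[None, :] / denom[:, None]).mean(axis=0)
    return p

# Toy usage: a few intervals on a grid over [0, 5].
s = np.linspace(0, 5, 21)
intervals = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0), (2.0, 5.0), (0.0, 0.5)]
p_hat = self_consistency(intervals, s)
```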

Journal ArticleDOI
TL;DR: In this article, the fundamental result on the rate of convergence of the EM algorithm is generalized to the ECM algorithm, a more flexible and applicable iterative algorithm proposed recently by Meng and Rubin, and an example shows that intuitions accurate for complete-data iterative algorithms may not be trustworthy in the presence of missing data.
Abstract: The fundamental result on the rate of convergence of the EM algorithm has proven to be theoretically valuable and practically useful. Here, this result is generalized to the ECM algorithm, a more flexible and applicable iterative algorithm proposed recently by Meng and Rubin. Results on the rate of convergence of variations of ECM are also presented. An example is given to show that intuitions accurate for complete-data iterative algorithms may not be trustworthy in the presence of missing data.

Journal ArticleDOI
TL;DR: In this article, it is shown that the EM algorithm, with the empirical distribution function F_n as a seed, converges to the unique solution F̂ of the self-consistency equations; the consistency of every iterate of the EM a...
Abstract: A ranked set sample consists entirely of independently distributed order statistics and can occur naturally in many experimental settings, including problems in reliability. When each ranked set from which an order statistic is drawn is of the same size, and when the statistic of each fixed order is sampled the same number of times, the ranked set sample is said to be balanced. Stokes and Sager have shown that the edf F_n of a balanced ranked set sample from the cdf F is an unbiased estimator of F and is more precise than the edf of a simple random sample of the same size. The nonparametric maximum likelihood estimator (MLE) F̂ of F is studied in this article. Its existence and uniqueness is demonstrated, and a general numerical procedure is presented and is shown to converge to F̂. If the ranked set sample is balanced, it is shown that the EM algorithm, with F_n as a seed, converges to the unique solution F̂ of the problem's self-consistency equations; the consistency of every iterate of the EM a...

Journal ArticleDOI
TL;DR: In this paper, a method is proposed for statistically evaluating the accuracy levels of maximum likelihood classifications and representing them graphically based on the concept that the heterogeneity of the membership probabilities can be taken as an indicator of the confidence for the classification, such a parameter is estimated for all pixels as relative probability entropy and represented in a separate channel.
Abstract: A method is proposed for statistically evaluating the accuracy levels of maximum likelihood classifications and representing them graphically. Based on the concept that the heterogeneity of maximum likelihood membership probabilities can be taken as an indicator of the confidence for the classification, such a parameter is estimated for all pixels as relative probability entropy and represented in a separate channel. After a brief presentation of the statistical basis of the methodology, this is applied to a conventional and two modified maximum likelihood classifications in a case study using Landsat TM scenes. The results demonstrate the efficiency of the approach and, particularly, its usefulness for operational applications.
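
A minimal sketch of the per-pixel confidence measure described above, computed as the relative entropy of the class-membership probabilities; the array of membership probabilities is a hypothetical input, and the exact normalisation used in the paper may differ.

```python
# Minimal sketch: relative probability entropy of maximum-likelihood membership probabilities.
# probs is a hypothetical (n_pixels, n_classes) array of posterior membership probabilities.
import numpy as np

def relative_probability_entropy(probs, eps=1e-12):
    p = np.clip(probs, eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p)).sum(axis=1)
    # Normalise by the maximum possible entropy (uniform memberships), so that
    # 0 means a fully confident classification and 1 a completely ambiguous pixel.
    return entropy / np.log(p.shape[1])
```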

Journal ArticleDOI
TL;DR: There are strong relationships between approaches to optimization and learning based on statistical physics or mixtures of experts, and the EM algorithm can be interpreted as converging either to a local maximum of the mixture model or to a saddle point solution of the statistical physics system.
Abstract: We show that there are strong relationships between approaches to optimization and learning based on statistical physics or mixtures of experts. In particular, the EM algorithm can be interpreted as converging either to a local maximum of the mixture model or to a saddle point solution to the statistical physics system. An advantage of the statistical physics approach is that it naturally gives rise to a heuristic continuation method, deterministic annealing, for finding good solutions.

Journal ArticleDOI
TL;DR: Evaluations using both Monte Carlo simulations and phantom studies on the Siemens 953B scanner suggest that the expectation-maximization algorithm yields unbiased images with significantly lower variances than filtered-backprojection when the images are reconstructed to the intrinsic resolution.
Abstract: The expectation-maximization (EM) algorithm for computing maximum-likelihood estimates of transmission images in positron-emission tomography (PET) (see K. Lange and R. Carson, J. Comput. Assist. Tomogr., vol.8, no.2, p.306-16, 1984) is extended to include measurement error, accidental coincidences and Compton scatter. A method for accomplishing the maximization step using one step of Newton's method is proposed. The algorithm is regularized with the method of sieves. Evaluations using both Monte Carlo simulations and phantom studies on the Siemens 953B scanner suggest that the algorithm yields unbiased images with significantly lower variances than filtered-backprojection when the images are reconstructed to the intrinsic resolution. Large features in the images converge in under 200 iterations while the smallest features required up to 2,000 iterations. All but the smallest features in typical transmission scans converge in approximately 250 iterations. The initial implementation of the algorithm requires 50 sec per iteration on a DECStation 5000.

Journal ArticleDOI
TL;DR: In this paper, it is shown that maximum likelihood estimates of the location vector and scatter matrix for a multivariate t-distribution in p dimensions with v ≥ 1 degrees of freedom can be identified with the maximum likelihood estimates for a scatter-only estimation problem from a (p+1)-dimensional multivariate t-distribution with v − 1 > 0 degrees of freedom; existence and uniqueness results for the location-scatter problem then follow from the scatter-only formulation.
Abstract: It is shown that maximum likelihood estimates of the location vector and scatter matrix for a multivariate t-distribution in p dimensions with v ≥ 1 degrees of freedom can be identified with the maximum likelihood estimates for a scatter-only estimation problem from a (p+1)-dimensional multivariate t-distribution with v − 1 > 0 degrees of freedom. The t-distribution is the only distribution for which this dual formulation is possible. Since the existence and uniqueness properties of maximum likelihood estimates are straightforward to prove for general scatter-only problems, we are able to immediately deduce existence and uniqueness results for the trickier location-scatter problem in the special case of the t-distribution. Each of these two formulations gives rise to an EM algorithm to maximize the likelihood, though the two algorithms are slightly different. The limiting Cauchy case v = 1 requires some special treatment.
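
A minimal sketch of the location-scatter EM iteration for a p-dimensional multivariate t-distribution with fixed degrees of freedom v, corresponding to the first of the two formulations discussed above; the fixed-v assumption and the toy data are simplifications for illustration.

```python
# Minimal sketch of EM for the location vector and scatter matrix of a multivariate
# t-distribution with fixed degrees of freedom v.
import numpy as np

def em_multivariate_t(X, v=3.0, n_iter=100):
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - mu
        # E-step: expected precision weights given the current (mu, Sigma).
        d2 = np.einsum('ij,ij->i', diff @ np.linalg.inv(Sigma), diff)
        w = (v + p) / (v + d2)
        # M-step: weighted location and scatter updates.
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / n
    return mu, Sigma

# Toy usage: heavy-tailed data from a t-like scale mixture of normals.
rng = np.random.default_rng(3)
Z = rng.standard_normal((300, 2))
u = rng.chisquare(3.0, size=300) / 3.0
X = np.array([1.0, -1.0]) + Z / np.sqrt(u)[:, None]
mu_hat, Sigma_hat = em_multivariate_t(X, v=3.0)
```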

Journal ArticleDOI
TL;DR: This article presents a general description of how and when the componentwise rates of convergence of the EM algorithm differ, as well as their relationships with the global rate, and provides an example, a standard contaminated normal model, to show that such phenomena are not necessarily pathological, but can occur in useful statistical models.

Proceedings Article
01 Jan 1994
TL;DR: A deterministic annealing variant of the EM algorithm is presented for maximum likelihood parameter estimation problems, in which the EM process is reformulated as the problem of minimizing the thermodynamic free energy using the principle of maximum entropy and a statistical-mechanics analogy.
Abstract: We present a deterministic annealing variant of the EM algorithm for maximum likelihood parameter estimation problems. In our approach, the EM process is reformulated as the problem of minimizing the thermodynamic free energy, using the principle of maximum entropy and a statistical-mechanics analogy. Unlike simulated annealing approaches, this minimization is performed deterministically. Moreover, the derived algorithm, unlike the conventional EM algorithm, can obtain better estimates, largely independent of the initial parameter values.
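
A minimal sketch of a deterministic-annealing EM schedule for a univariate two-component Gaussian mixture, in the spirit of the abstract above: the E-step responsibilities are tempered by an inverse temperature that is gradually raised towards 1. The annealing schedule and the mixture model are illustrative choices, not the authors' exact formulation.

```python
# Minimal sketch of a deterministic-annealing EM (DAEM) schedule for a 2-component
# univariate Gaussian mixture; beta is the inverse temperature.
import numpy as np

def daem_gmm(x, betas=(0.1, 0.25, 0.5, 0.75, 1.0), iters_per_beta=30):
    rng = np.random.default_rng(0)
    mu = rng.normal(size=2); sigma2 = np.array([1.0, 1.0]); w = np.array([0.5, 0.5])
    for beta in betas:
        for _ in range(iters_per_beta):
            # Tempered E-step: responsibilities proportional to (w_j N(x | mu_j, sigma2_j))^beta.
            logp = (np.log(w) - 0.5 * np.log(2 * np.pi * sigma2)
                    - 0.5 * (x[:, None] - mu) ** 2 / sigma2)
            r = np.exp(beta * (logp - logp.max(axis=1, keepdims=True)))
            r /= r.sum(axis=1, keepdims=True)
            # M-step: standard weighted updates.
            nk = r.sum(axis=0)
            w = nk / len(x)
            mu = (r * x[:, None]).sum(axis=0) / nk
            sigma2 = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, sigma2

# Toy usage.
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 0.5, 150)])
w_hat, mu_hat, s2_hat = daem_gmm(x)
```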

Journal ArticleDOI
TL;DR: In this article, the authors proposed a Bayesian method that uses smoothing constants to adjust the pseudo-observed cell frequencies so that the solution is not on the boundary, which produces boundary estimates for the expected cell frequencies of the nonrespondents.
Abstract: When categorical outcomes are subject to nonignorable nonresponse, log-linear models may be used to adjust for the nonresponse. The models are fitted to the data in an augmented frequency table in which one index corresponds to whether or not the subject is a respondent. The likelihood function is maximized over pseudo-observed cell frequencies with respect to this log-linear model using an EM algorithm. Each E step of the EM algorithm determines the pseudo-observed cell frequencies, and the M step yields the maximum likelihood estimators (MLE's) of these pseudo-observed cell frequencies. This approach may produce boundary estimates for the expected cell frequencies of the nonrespondents. In these cases the estimators of the log-linear model parameters are not uniquely determined and may be unstable. Following the approach of Clogg et al., we propose a Bayesian method that uses smoothing constants to adjust the pseudo-observed cell frequencies so that the solution is not on the boundary. The role...

Journal ArticleDOI
TL;DR: Brian Williams and Chris Dye argue that the method of maximum likelihood is generally preferable to least squares, giving the best estimates of the parameters for data with any given error distribution, and that the calculations are no more difficult than for least-squares fitting.

Journal ArticleDOI
TL;DR: In this article, the asymptotic distribution of the restricted maximum likelihood estimator of the variance components in a general mixed model is explored, and central limit theorems are obtained using elementary arguments with only mild conditions on the covariates in the fixed part of the model and without having to assume that the data are either normally or spherically symmetrically distributed.
Abstract: Summary This paper explores the asymptotic distribution of the restricted maximum likelihood estimator of the variance components in a general mixed model. Restricting attention to hierarchical models, central limit theorems are obtained using elementary arguments with only mild conditions on the covariates in the fixed part of the model and without having to assume that the data are either normally or spherically symmetrically distributed. Further, the REML and maximum likelihood estimators are shown to be asymptotically equivalent in this general framework, and the asymptotic distribution of the weighted least squares estimator (based on the REML estimator) of the fixed effect parameters is derived.

Journal ArticleDOI
TL;DR: The present paper addresses the problem of finding a maximum likelihood estimator (MLE) of an origin-destination matrix and of journey time statistics when passage time data are available, in the multiple origin/destination case.
Abstract: Previous techniques for analysing partial registration plate data are firstly reviewed. These generally fall into one of two broad categories: statistically based methods for single origin, single destination problems; and simple-minded, deterministic approaches using vehicle passage time data (i.e., the times at which vehicles pass the observation points), for surveys with multiple origins and destinations. The present paper addresses the problem of finding a maximum likelihood estimator (MLE) of an origin-destination matrix and of journey time statistics when passage time data are available, in the multiple origin/destination case. These estimators possess the well-known large sample properties of asymptotic unbiasedness, normality, and efficiency. The proposed approach also has the advantage over the deterministic methods (used in most existing registration plate matching packages) of simultaneously analysing all possible matches between all origins and destinations, rather than considering them in some arbitrary, priority order. Since the MLEs cannot be obtained analytically, alternative numerical techniques for determining them are evaluated, with respect to their convergence properties and computational efficiency. The most appropriate of these (based on a general-purpose statistical algorithm for "missing data" problems) is described in greater detail, including issues relevant to its computer implementation. Selected results from a more comprehensive simulation study are used to illustrate the performance of the maximum likelihood approach. In the (limited) results reported, the MLEs are seen to have considerably smaller mean square errors than the deterministic methods mentioned above, but are only marginally superior to the estimators produced by an efficient heuristic technique proposed previously by the author. Further empirical work would, however, be required to establish that the patterns observed in these simulations are examples of more general phenomena. Finally, possible extensions to the method and future research directions are discussed.