
Showing papers on "Expectation–maximization algorithm published in 1996"


Journal ArticleDOI
T.K. Moon1
TL;DR: The EM (expectation-maximization) algorithm is ideally suited to problems of parameter estimation, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observation.
Abstract: A common task in signal processing is the estimation of the parameters of a probability distribution function. Perhaps the most frequently encountered estimation problem is the estimation of the mean of a signal in noise. In many parameter estimation problems the situation is more complicated because direct access to the data necessary to estimate the parameters is impossible, or some of the data are missing. Such difficulties arise when an outcome is a result of an accumulation of simpler outcomes, or when outcomes are clumped together, for example, in a binning or histogram operation. There may also be data dropouts or clustering in such a way that the number of underlying data points is unknown (censoring and/or truncation). The EM (expectation-maximization) algorithm is ideally suited to problems of this sort, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observation. The EM algorithm is presented at a level suitable for signal processing practitioners who have had some exposure to estimation theory.

2,573 citations
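To make the missing-data idea above concrete, here is a minimal EM sketch for a two-component one-dimensional Gaussian mixture, where the unobserved component labels play the role of the hidden data. This is a generic Python illustration, not code from the tutorial, and it assumes a common variance shared by both components.

```python
import numpy as np

def em_two_gaussians(x, n_iter=100):
    """Toy EM for a two-component 1-D Gaussian mixture with a shared variance.

    The unobserved component labels are the 'missing data': each observed x
    arises from a many-to-one mapping (label, value) -> value.
    """
    pi, mu1, mu2, var = 0.5, x.min(), x.max(), x.var()   # crude initialisation
    for _ in range(n_iter):
        # E-step: responsibility of component 1 (shared-variance constants cancel)
        d1 = pi * np.exp(-(x - mu1) ** 2 / (2 * var))
        d2 = (1 - pi) * np.exp(-(x - mu2) ** 2 / (2 * var))
        r = d1 / (d1 + d2)
        # M-step: weighted maximum-likelihood re-estimates
        pi = r.mean()
        mu1 = np.sum(r * x) / np.sum(r)
        mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
        var = np.sum(r * (x - mu1) ** 2 + (1 - r) * (x - mu2) ** 2) / x.size
    return pi, mu1, mu2, var

# example: recover the parameters of a simulated 0.3/0.7 mixture
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])
print(em_two_gaussians(x))
```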


Journal ArticleDOI
TL;DR: The mathematical connection between the Expectation-Maximization (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures is built up and an explicit expression for the matrix is provided.
Abstract: We build up the mathematical connection between the “Expectation-Maximization” (EM) algorithm and gradient-based approaches for maximum likelihood learning of finite gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P, and we provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results analyzing the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of gaussian mixture models.

849 citations
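As a hedged illustration of the kind of identity the paper develops (this is only the simplest special case: spherical components with a fixed, common variance sigma^2 and fixed mixing weights, updating the means alone), the EM re-estimate of each mean is exactly a gradient step premultiplied by a positive definite matrix, which in this special case reduces to a scaled identity:

```latex
% Log-likelihood and its gradient with respect to the k-th mean, with
% responsibilities r_{nk} = \pi_k \mathcal{N}(x_n;\mu_k,\sigma^2 I) / \sum_j \pi_j \mathcal{N}(x_n;\mu_j,\sigma^2 I):
\mathcal{L} = \sum_n \log \sum_k \pi_k\, \mathcal{N}(x_n;\mu_k,\sigma^2 I),
\qquad
\frac{\partial \mathcal{L}}{\partial \mu_k} = \frac{1}{\sigma^2}\sum_n r_{nk}\,(x_n - \mu_k).

% The EM mean update is a preconditioned gradient step, with the
% projection matrix reducing here to P_k = (\sigma^2 / \sum_n r_{nk})\, I:
\mu_k^{\text{new}} = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}
 = \mu_k + \frac{\sigma^2}{\sum_n r_{nk}}\,
   \frac{\partial \mathcal{L}}{\partial \mu_k}.
```

The paper's matrix P covers the full parameter vector (mixing weights, means, and covariances); the diagonal form above is only the means-only case, shown for intuition.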


Journal ArticleDOI
Y. Vardi1
TL;DR: In this article, the problem of estimating the node-to-node traffic intensity from repeated measurements of traffic on the links of a network is formulated and discussed under Poisson assumptions and two types of traffic-routing regimens: deterministic (a fixed known path between each directed pair of nodes) and Markovian (a random path between a pair of vertices, determined according to a known Markov chain fixed for that pair).
Abstract: The problem of estimating the node-to-node traffic intensity from repeated measurements of traffic on the links of a network is formulated and discussed under Poisson assumptions and two types of traffic-routing regimens: deterministic (a fixed known path between each directed pair of nodes) and Markovian (a random path between each directed pair of nodes, determined according to a known Markov chain fixed for that pair). Maximum likelihood estimation and related approximations are discussed, and computational difficulties are pointed out. A detailed methodology is presented for estimates based on the method of moments. The estimates are derived algorithmically, taking advantage of the fact that the first and second moment equations give rise to a linear inverse problem with positivity restrictions that can be approached by an EM algorithm, resulting in a particularly simple solution to a hard problem. A small simulation study is carried out.

801 citations
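The abstract's observation that the problem reduces to a positivity-constrained linear inverse problem amenable to EM can be illustrated with the standard multiplicative EM update for a Poisson linear model y ~ Poisson(A lam). This is a generic sketch (known routing matrix A with no all-zero rows or columns, deterministic routing), not the paper's full method-of-moments procedure; the toy matrix and names are illustrative only.

```python
import numpy as np

def em_poisson_linear_inverse(A, y, n_iter=500):
    """Multiplicative EM update for y ~ Poisson(A @ lam), with A >= 0, lam >= 0.

    In the network-tomography reading, lam holds the unknown origin-destination
    intensities, A encodes which OD pairs traverse which links (deterministic
    routing), and y holds (averaged) link counts. Assumes A has no all-zero
    rows or columns.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = np.ones(A.shape[1])            # strictly positive starting point
    col_sums = A.sum(axis=0)
    for _ in range(n_iter):
        pred = A @ lam                   # expected link counts
        lam = lam * (A.T @ (y / pred)) / col_sums
    return lam

# toy example: 3 links observing 2 origin-destination flows
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
print(em_poisson_linear_inverse(A, y=np.array([10.0, 5.0, 15.0])))
```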


Journal ArticleDOI
TL;DR: This paper fits Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered.
Abstract: Fisher-Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class. In this paper, we fit Gaussian mixtures to each class to facilitate effective classification in non-normal settings, especially when the classes are clustered. Low dimensional views are an important by-product of LDA-our new techniques inherit this feature. We can control the within-class spread of the subclass centres relative to the between-class spread. Our technique for fitting these models permits a natural blend with nonparametric versions of LDA.

791 citations
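A hedged sketch of the basic idea follows: fit one Gaussian mixture per class and classify by the largest prior-weighted class density. It leans on scikit-learn's GaussianMixture rather than the authors' fitting procedure, and it omits the low-dimensional views, the control of within-class spread, and the nonparametric blend discussed in the abstract; the class name and parameters are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class SimpleMixtureDiscriminant:
    """Minimal mixture discriminant classifier: one Gaussian mixture per class.

    Classification is by the largest (class prior) x (class mixture density);
    this captures only the core idea of the abstract, not the full method.
    """
    def __init__(self, n_components=3):
        self.n_components = n_components

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.models_ = [GaussianMixture(self.n_components, random_state=0).fit(X[y == c])
                        for c in self.classes_]
        return self

    def predict(self, X):
        # log p(x | class) + log prior, maximised over classes
        scores = np.column_stack([m.score_samples(X) + np.log(p)
                                  for m, p in zip(self.models_, self.priors_)])
        return self.classes_[scores.argmax(axis=1)]
```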


21 May 1996
TL;DR: This work presents an exact Expectation-Maximization algorithm for determining the parameters of this mixture of factor analyzers, which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians.
Abstract: Factor analysis, a statistical method for modeling the covariance structure of high dimensional data using a small number of latent variables, can be extended by allowing different local factor models in different regions of the input space. This results in a model which concurrently performs clustering and dimensionality reduction, and can be thought of as a reduced dimension mixture of Gaussians. We present an exact Expectation-Maximization algorithm for fitting the parameters of this mixture of factor analyzers.

705 citations
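For intuition, here is a small sketch of the generative model the abstract describes: a mixture whose components are factor analyzers, each with a low-rank loading matrix plus diagonal noise. It only samples from the model and does not implement the paper's exact EM fitting algorithm; all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mfa(n, weights, means, loadings, psi):
    """Draw samples from a mixture of factor analyzers (generative view only).

    Component k: x = means[k] + loadings[k] @ z + e, with z ~ N(0, I_q) and
    e ~ N(0, diag(psi[k])). Each component is therefore a Gaussian with the
    low-rank-plus-diagonal covariance loadings[k] loadings[k]^T + diag(psi[k]).
    """
    K = len(weights)
    ks = rng.choice(K, size=n, p=weights)
    out = []
    for k in ks:
        L = loadings[k]                       # (p, q) factor loading matrix
        z = rng.standard_normal(L.shape[1])   # latent factors
        e = rng.normal(0.0, np.sqrt(psi[k]))  # independent per-coordinate noise
        out.append(means[k] + L @ z + e)
    return np.array(out)
```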


Journal Article
TL;DR: An extended EM algorithm is used to minimize the information divergence (maximize the relative entropy) in the density approximation case and fits to Weibull, log normal, and Erlang distributions are used as illustrations of the latter.
Abstract: Estimation from sample data and density approximation with phase-type distributions are considered. Maximum likelihood estimation via the EM algorithm is discussed and performed for some data sets. An extended EM algorithm is used to minimize the information divergence (maximize the relative entropy) in the density approximation case. Fits to Weibull, log normal, and Erlang distributions are used as illustrations of the latter.

690 citations


Journal ArticleDOI
01 Apr 1996-Heredity
TL;DR: It is concluded that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.
Abstract: We generalize an approach suggested by Hill (Heredity, 33, 229-239, 1974) for testing for significant association among alleles at two loci when only genotype and not haplotype frequencies are available. The principle is to use the Expectation-Maximization (EM) algorithm to resolve double heterozygotes into haplotypes and then apply a likelihood ratio test in order to determine whether the resolutions of haplotypes are significantly nonrandom, which is equivalent to testing whether there is statistically significant linkage disequilibrium between loci. The EM algorithm in this case relies on the assumption that genotype frequencies at each locus are in Hardy-Weinberg proportions. This method can accommodate X-linked loci and samples from haplodiploid species. We use three methods for testing significance of the likelihood ratio: the empirical distribution in a large number of randomized data sets, the X2 approximation for the distribution of likelihood ratios, and the Z2 test. The performance of each method is evaluated by applying it to simulated data sets and comparing the tail probability with the tail probability from Fisher's exact test applied to the actual haplotype data. For realistic sample sizes (50-150 individuals) all three methods perform well with two or three alleles per locus, but only the empirical distribution is adequate when there are five to eight alleles per locus, as is typical of hypervariable loci such as microsatellites. The method is applied to a data set of 32 microsatellite loci in a Finnish population and the results confirm the theoretical predictions. We conclude that with highly polymorphic loci, the EM algorithm does lead to a useful test for linkage disequilibrium, but that it is necessary to find the empirical distribution of likelihood ratios in order to perform a test of significance correctly.

556 citations
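As a concrete and deliberately minimal illustration of the resolution step the abstract describes, here is EM for haplotype frequencies at two biallelic loci under Hardy-Weinberg proportions: only the double heterozygotes are phase-ambiguous, and the E-step splits them between the two possible phase configurations. The paper's method additionally handles multi-allelic and X-linked loci and adds the likelihood ratio testing machinery, none of which is reproduced in this generic sketch.

```python
import numpy as np

def em_haplotype_freqs(n, n_iter=50):
    """EM estimate of two-locus haplotype frequencies from genotype counts.

    n[i][j] = number of individuals with i copies of allele A (locus 1) and
    j copies of allele B (locus 2), i, j in {0, 1, 2}.
    Haplotype order in the output: AB, Ab, aB, ab.
    """
    n = np.asarray(n, dtype=float)
    p = np.full(4, 0.25)                  # initial haplotype frequencies
    for _ in range(n_iter):
        c = np.zeros(4)
        # Unambiguous contributions (each individual carries two haplotypes);
        # only the double-heterozygote cell n[1,1] has unknown phase.
        c[0] += 2 * n[2, 2] + n[2, 1] + n[1, 2]      # AB
        c[1] += 2 * n[2, 0] + n[2, 1] + n[1, 0]      # Ab
        c[2] += 2 * n[0, 2] + n[0, 1] + n[1, 2]      # aB
        c[3] += 2 * n[0, 0] + n[0, 1] + n[1, 0]      # ab
        # E-step: split double heterozygotes between the AB/ab and Ab/aB phases
        w = p[0] * p[3] / (p[0] * p[3] + p[1] * p[2])
        c[0] += w * n[1, 1]; c[3] += w * n[1, 1]
        c[1] += (1 - w) * n[1, 1]; c[2] += (1 - w) * n[1, 1]
        # M-step: renormalise the expected haplotype counts
        p = c / c.sum()
    return p
```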


Journal ArticleDOI
TL;DR: A new, full Bayesian approach based on the method of Gibbs sampling is developed, and it is shown that the latent variables, one for each observation, can be simulated from their joint distribution given the data and the remaining parameters.

532 citations


Journal ArticleDOI
TL;DR: This work proposes a new approach to statistically optimal image reconstruction based on direct optimization of the MAP criterion, which requires approximately the same amount of computation per iteration as EM-based approaches, but the new method converges much more rapidly.
Abstract: Over the past years there has been considerable interest in statistically optimal reconstruction of cross-sectional images from tomographic data. In particular, a variety of such algorithms have been proposed for maximum a posteriori (MAP) reconstruction from emission tomographic data. While MAP estimation requires the solution of an optimization problem, most existing reconstruction algorithms take an indirect approach based on the expectation maximization (EM) algorithm. We propose a new approach to statistically optimal image reconstruction based on direct optimization of the MAP criterion. The key to this direct optimization approach is greedy pixel-wise computations known as iterative coordinate descent (ICD). We propose a novel method for computing the ICD updates, which we call ICD/Newton-Raphson. We show that ICD/Newton-Raphson requires approximately the same amount of computation per iteration as EM-based approaches, but the new method converges much more rapidly (in our experiments, typically five to ten iterations). Other advantages of the ICD/Newton-Raphson method are that it is easily applied to MAP estimation of transmission tomograms, and typical convex constraints, such as positivity, are easily incorporated.

493 citations


Journal ArticleDOI
01 Oct 1996
TL;DR: The authors present a row-action maximum likelihood algorithm (RAMLA) as an alternative to the EM algorithm for maximizing the Poisson likelihood in ECT and show that their modification converges to an ML solution whereas the standard OS-EM does not.
Abstract: The maximum likelihood (ML) approach to estimating the radioactive distribution in the body cross section has become very popular among researchers in emission computed tomography (ECT) since it has been shown to provide very good images compared to those produced with the conventional filtered backprojection (FBP) algorithm. The expectation maximization (EM) algorithm is an often-used iterative approach for maximizing the Poisson likelihood in ECT because of its attractive theoretical and practical properties. Its major disadvantage is that, due to its slow rate of convergence, a large amount of computation is often required to achieve an acceptable image. Here, the authors present a row-action maximum likelihood algorithm (RAMLA) as an alternative to the EM algorithm for maximizing the Poisson likelihood in ECT. The authors deduce the convergence properties of this algorithm and demonstrate by way of computer simulations that the early iterates of RAMLA increase the Poisson likelihood in ECT an order of magnitude faster than the standard EM algorithm. Specifically, the authors show that, from the point of view of measuring total radionuclide uptake in simulated brain phantoms, iterations 1, 2, 3, and 4 of RAMLA perform at least as well as iterations 45, 60, 70, and 80, respectively, of EM. Moreover, the authors show that iterations 1, 2, 3, and 4 of RAMLA achieve likelihood values comparable to those of iterations 45, 60, 70, and 80, respectively, of EM. The authors also present a modified version of a recent fast ordered subsets EM (OS-EM) algorithm and show that RAMLA is a special case of this modified OS-EM. Furthermore, the authors show that their modification converges to an ML solution whereas the standard OS-EM does not.

434 citations
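For orientation, here is a minimal ordered-subsets-style EM sketch for the Poisson model y ~ Poisson(Ax) used in ECT. According to the abstract, RAMLA can be viewed as a special case of a modified OS-EM with row-by-row (row-action) updates and a relaxation schedule; that schedule is not reproduced here, and the system matrix and subset scheme below are illustrative assumptions only.

```python
import numpy as np

def os_em(A, y, n_subsets=8, n_iter=5):
    """Ordered-subsets EM sketch for emission data y ~ Poisson(A @ x).

    A[i, j] = probability that an emission in voxel j is detected in bin i.
    Each sub-iteration applies the multiplicative EM update using only one
    subset of the projection rows, which accelerates early convergence.
    Assumes every voxel is seen by at least one row of each subset.
    """
    x = np.ones(A.shape[1])
    subsets = np.array_split(np.arange(A.shape[0]), n_subsets)
    for _ in range(n_iter):
        for s in subsets:
            As, ys = A[s], y[s]
            x = x / As.sum(axis=0) * (As.T @ (ys / (As @ x)))
    return x
```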


Journal ArticleDOI
TL;DR: In experiments with synthetic noise-free and additive noisy projection data of dental phantoms, it is found that both simultaneous iterative algorithms produce superior image quality as compared to filtered backprojection after linearly fitting projection gaps.
Abstract: Iterative deblurring methods using the expectation maximization (EM) formulation and the algebraic reconstruction technique (ART), respectively, are adapted for metal artifact reduction in medical computed tomography (CT). In experiments with synthetic noise-free and additive noisy projection data of dental phantoms, it is found that both simultaneous iterative algorithms produce superior image quality as compared to filtered backprojection after linearly fitting projection gaps. Furthermore, the EM-type algorithm converges faster than the ART-type algorithm in terms of either the I-divergence or Euclidean distance between ideal and reprojected data in the authors' simulation. Also, for a given iteration number, the EM-type deblurring method produces better image clarity but stronger noise than the ART-type reconstruction. The computational complexity of EM- and ART-based iterative deblurring is essentially the same, dominated by reprojection and backprojection. Relevant practical and theoretical issues are discussed.

Journal ArticleDOI
TL;DR: The simulation demonstrated that maximum likelihood estimation and multiple imputation methods produce the most efficient and least biased estimates of variances and covariances for normally distributed and slightly skewed data when data are missing completely at random (MCAR).
Abstract: Researchers often face a dilemma: Should they collect little data and emphasize quality, or much data at the expense of quality? The utility of the 3-form design coupled with maximum likelihood methods for estimation of missing values was evaluated. In 3-form design surveys, four sets of items, X, A, B, and C, are administered: Each third of the subjects receives X and one combination of two other item sets - AB, BC, or AC. Variances and covariances were estimated with pairwise deletion, mean replacement, single imputation, multiple imputation, raw data maximum likelihood, multiple-group covariance structure modeling, and Expectation-Maximization (EM) algorithm estimation. The simulation demonstrated that maximum likelihood estimation and multiple imputation methods produce the most efficient and least biased estimates of variances and covariances for normally distributed and slightly skewed data when data are missing completely at random (MCAR). Pairwise deletion provided equally unbiased estimates but was less efficient than ML procedures. Further simulation results demonstrated that non-maximum likelihood methods break down when data are not missing completely at random. Application of these methods with empirical drug use data resulted in similar covariance matrices for pairwise and EM estimation; however, ML estimation produced better and more efficient regression estimates. Maximum likelihood estimation or multiple imputation procedures, which are now becoming more readily available, are always recommended. In order to maximize the efficiency of the ML parameter estimates, it is recommended that scale items be split across forms rather than being left intact within forms.
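A hedged sketch of the "EM algorithm estimation" compared in the abstract: EM for the mean vector and covariance matrix of a multivariate normal when some entries are missing, assuming the data are missing (completely) at random. This is a generic textbook-style implementation, not the authors' 3-form analysis code.

```python
import numpy as np

def em_mvn_missing(X, n_iter=100):
    """EM for the mean and covariance of a multivariate normal with missing
    entries (coded as np.nan), under an MCAR/MAR assumption.

    E-step: replace each missing block by its conditional expectation given
    the observed entries and accumulate the conditional covariance.
    M-step: recompute the mean vector and covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = np.nanmean(X, axis=0)
    Sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            miss = np.isnan(X[i]); obs = ~miss
            x = X[i].copy()
            C = np.zeros((p, p))
            if miss.any():
                if obs.any():
                    reg = Sigma[np.ix_(miss, obs)] @ np.linalg.inv(Sigma[np.ix_(obs, obs)])
                    x[miss] = mu[miss] + reg @ (X[i, obs] - mu[obs])
                    C[np.ix_(miss, miss)] = (Sigma[np.ix_(miss, miss)]
                                             - reg @ Sigma[np.ix_(obs, miss)])
                else:                      # row entirely missing
                    x[:] = mu
                    C = Sigma.copy()
            sum_x += x
            sum_xx += np.outer(x, x) + C
        mu = sum_x / n
        Sigma = sum_xx / n - np.outer(mu, mu)
    return mu, Sigma
```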

Journal ArticleDOI
TL;DR: It is demonstrated that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem and able to map input sequences to output sequences, using the same processing style as recurrent neural networks.
Abstract: We consider problems of sequence processing and propose a solution based on a discrete-state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call input-output hidden Markov model (IOHMM). It can be trained by the expectation-maximization (EM) or generalized EM (GEM) algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: This work shows how to add spatial constraints to the mixture formulations and presents a variant of the EM algorithm that makes use of both the form and the motion constraints and estimates the number of segments given knowledge about the level of model failure expected in the sequence.
Abstract: Describing a video sequence in terms of a small number of coherently moving segments is useful for tasks ranging from video compression to event perception. A promising approach is to view the motion segmentation problem in a mixture estimation framework. However, existing formulations generally use only the motion data and thus fail to make use of static cues when segmenting the sequence. Furthermore, the number of models is either specified in advance or estimated outside the mixture model framework. In this work we address both of these issues. We show how to add spatial constraints to the mixture formulations and present a variant of the EM algorithm that makes use of both the form and the motion constraints. Moreover, this algorithm estimates the number of segments given knowledge about the level of model failure expected in the sequence. The algorithm's performance is illustrated on synthetic and real image sequences.

Journal ArticleDOI
TL;DR: MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors, utilizing both the EM algorithm and a Fisher-scoring solution.

Journal ArticleDOI
TL;DR: In this paper, a nonparametric maximum likelihood estimator (MLE) of F is derived for any specified population size N; the nonparametric MLE of the (N, F) pair, and thus of N, is then determined.
Abstract: We conduct nonparametric maximum likelihood estimation under two common heterogeneous closed population capture-recapture models. Our models specify mixture models (as did previous researchers' models) which have a common generating distribution, say F, for the capture probabilities. Using Lindsay and Roeder's (1992, Journal of the American Statistical Association 87, 785-794) mixture model results and the EM algorithm, a nonparametric maximum likelihood estimator (MLE) of F for any specified population size N is obtained. Then, the nonparametric MLE of the (N, F) pair and thus for N is determined. Perhaps most importantly, since our MLE pair maximizes the likelihood under the entire nonparametric probability model, it provides an excellent foundation for estimating properties of estimators, conducting a goodness-of-fit test, and performing a likelihood ratio test. These are illustrated in the paper.

Journal ArticleDOI
TL;DR: The methods are illustrated on a data set involving alternative dosage regimens for the treatment of schizophrenia using haloperidol and on a regression example, where the new methods are compared with complete-case analysis and maximum likelihood for a probit selection model.
Abstract: Pattern-mixture models stratify incomplete data by the pattern of missing values and formulate distinct models within each stratum. Pattern-mixture models are developed for analyzing a random sample on continuous variables y(1), y(2) when values of y(2) are nonrandomly missing. Methods for scalar y(1) and y(2) are here generalized to vector y(1) and y(2) with additional fixed covariates x. Parameters in these models are identified by alternative assumptions about the missing-data mechanism. Models may be underidentified (in which case additional assumptions are needed), just-identified, or overidentified. Maximum likelihood and Bayesian methods are developed for the latter two situations, using the EM and SEM algorithms, direct and interactive simulation methods. The methods are illustrated on a data set involving alternative dosage regimens for the treatment of schizophrenia using haloperidol and on a regression example. Sensitivity to alternative assumptions about the missing-data mechanism is assessed, and the new methods are compared with complete-case analysis and maximum likelihood for a probit selection model.

Journal ArticleDOI
TL;DR: A method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline using a novel elastic matching procedure based on the expectation maximization algorithm.
Abstract: We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the expectation maximization algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages: 1) the system not only produces a classification of the digit but also a rich description of the instantiation parameters which can yield information such as the writing style; 2) the generative models can perform recognition driven segmentation; 3) the method involves a relatively small number of parameters and hence training is relatively easy and fast; and 4) unlike many other recognition schemes, it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.

Journal ArticleDOI
TL;DR: This paper considers new iterative multiuser receivers based on the expectation-maximization (EM) algorithm and related, more powerful "space-alternating" algorithms that alternately update individual parameter components or treat them as probabilistic missing data.
Abstract: Maximum-likelihood detection for the multiuser code-division multiple-access (CDMA) channel is prohibitively complex. This paper considers new iterative multiuser receivers based on the expectation-maximization (EM) algorithm and related, more powerful "space-alternating" algorithms. The latter algorithms include the SAGE algorithm and a new "missing parameter" space-alternating algorithm that alternately updates individual parameter components or treats them as probabilistic missing data. Application of these EM-based algorithms to the problem of discrete parameter estimation (i.e., data detection) in the Gaussian multiple-access channel leads to a variety of convergent receiver structures that incorporate soft-decision feedback for interference cancellation and/or sequential updating of iterative bit estimates. Convergence and performance analyses are based on well-known properties of the EM algorithm and on numerical simulation.

Journal ArticleDOI
TL;DR: An EM algorithm for maximum likelihood estimation in generalized linear models with overdispersion is presented, initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, giving a straightforward method for the fully non-parametric ML estimation of this distribution.
Abstract: This paper presents an EM algorithm for maximum likelihood estimation in generalized linear models with overdispersion. The algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it can be used for a completely unknown mixing distribution, giving a straightforward method for the fully non-parametric ML estimation of this distribution. This is of value because the ML estimates of the GLM parameters may be sensitive to the specification of a parametric form for the mixing distribution. A listing of a GLIM4 algorithm for fitting the overdispersed binomial logit model is given in an appendix.

Journal ArticleDOI
TL;DR: It is shown that, for some particular mixture situations, the SEM algorithm is almost always preferable to the EM and “simulated annealing” versions SAEM and MCEM.
Abstract: We compare three different stochastic versions of the EM algorithm: the Stochastic EM algorithm (SEM), the "Simulated Annealing" EM algorithm (SAEM) and the Monte Carlo EM algorithm (MCEM). We focus particularly on the mixture of distributions problem. In this context, we investigate the practical behaviour of these algorithms through intensive Monte Carlo numerical simulations and a real data study. We show that, for some particular mixture situations, the SEM algorithm is almost always preferable to the EM and "simulated annealing" versions SAEM and MCEM. For some severely overlapping mixtures, however, none of these algorithms can be confidently used. Then, SEM can be used as an efficient data exploratory tool for locating significant maxima of the likelihood function. In the real data case, we show that the SEM stationary distribution provides a contrasted view of the loglikelihood by emphasizing sensible maxima.
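To show what distinguishes the Stochastic EM algorithm from plain EM, here is a minimal SEM sketch for a two-component univariate Gaussian mixture: the deterministic E-step is replaced by simulating a hard label for each observation from its posterior, and the resulting Markov chain of parameter iterates is summarised by averaging its later half. This is a generic illustration (common variance, crude summarisation), not the paper's simulation setup; SAEM and MCEM are not sketched.

```python
import numpy as np

rng = np.random.default_rng(2)

def sem_two_gaussians(x, n_iter=200):
    """Stochastic EM (SEM) sketch for a two-component 1-D Gaussian mixture
    with a shared variance. The S-step draws a hard label for each point from
    its posterior instead of averaging as plain EM does. Assumes both
    components stay non-empty during iteration (fine for well-separated toy data).
    """
    pi, mu1, mu2, var = 0.5, np.quantile(x, 0.25), np.quantile(x, 0.75), x.var()
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of component 1
        d1 = pi * np.exp(-(x - mu1) ** 2 / (2 * var))
        d2 = (1 - pi) * np.exp(-(x - mu2) ** 2 / (2 * var))
        r = d1 / (d1 + d2)
        # S-step: simulate the missing labels
        z = rng.random(x.size) < r
        # M-step: complete-data maximum likelihood given the simulated labels
        pi = z.mean()
        mu1, mu2 = x[z].mean(), x[~z].mean()
        var = (np.sum((x[z] - mu1) ** 2) + np.sum((x[~z] - mu2) ** 2)) / x.size
        trace.append((pi, mu1, mu2, var))
    return np.mean(trace[n_iter // 2:], axis=0)   # average the second half of the chain
```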

Journal ArticleDOI
Tobias Rydén1
TL;DR: This paper presents an EM algorithm for computing maximum-likelihood estimates of the parameters of a Markov-modulated Poisson process, and compares it to the Nelder-Mead downhill simplex algorithm.

Journal ArticleDOI
TL;DR: Two ways of implementing the Monte Carlo Expectation-Maximization (EM) algorithm to fit a FIIF model are illustrated, using the Gibbs sampler to carry out the computation for the E steps; it is also shown how to use bridge sampling to simulate the likelihood ratios for monitoring the convergence of a Monte Carlo EM.
Abstract: Based on item response theory, Bock and Aitken introduced a method of item factor analysis, termed full-information item factor (FIIF) analysis by Bartholomew because it uses all distinct item response vectors as data. But a limitation of their fitting algorithm is its reliance on fixed-point Gauss-Hermite quadrature, which can produce appreciable numerical errors, especially in high-dimension problems. The first purpose of this article is to offer more reliable methods by using recent advances in statistical computation. Specifically, we illustrate two ways of implementing the Monte Carlo Expectation-Maximization (EM) algorithm to fit a FIIF model, using the Gibbs sampler to carry out the computation for the E steps. We also show how to use bridge sampling to simulate the likelihood ratios for monitoring the convergence of a Monte Carlo EM, a strategy that is useful in general. Simulations demonstrate substantial improvement over Bock and Aitken's algorithm in recovering known factor loadings in high-dimension problems.

Journal ArticleDOI
TL;DR: In this paper, a conditional model for the covariate distribution was proposed to reduce the number of nuisance parameters for the distribution of the covariates in the E-step of the EM algorithm.
Abstract: Incomplete covariate data arise in many data sets. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990). This method requires the estimation of many nuisance parameters for the distribution of the covariates. Unfortunately, in data sets when the percentage of missing data is high, and the missing covariate patterns are highly non-monotone, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modelling advantages for the E-step and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples. We present a clinical trials example with six covariates, five of which have some missing values.

Journal ArticleDOI
TL;DR: In this paper, three alternative estimation procedures based on the EM algorithm are considered: two of them make use of numerical integration techniques (Gauss-Hermite or Monte Carlo), and the third is an EM-type algorithm based on posterior modes.

Journal ArticleDOI
TL;DR: The class of mixture transition distribution (MTD) time series models is extended to general non-Gaussian time series and the stationarity and autocorrelation properties of the models are derived.
Abstract: The class of mixture transition distribution (MTD) time series models is extended to general non-Gaussian time series. In these models the conditional distribution of the current observation given the past is a mixture of conditional distributions given each one of the last p observations. They can capture non-Gaussian and nonlinear features such as flat stretches, bursts of activity, outliers, and changepoints in a single unified model class. They can also represent time series defined on arbitrary state spaces, univariate or multivariate, continuous, discrete or mixed, which need not even be Euclidean. They perform well in the usual case of Gaussian time series without obvious nonstandard behaviors. The models are simple, analytically tractable, easy to simulate, and readily estimated. The stationarity and autocorrelation properties of the models are derived. A simple EM algorithm is given and shown to work well for estimation. The models are applied to several real and simulated datasets with satisfactory results.

Posted Content
TL;DR: In this article, a two-stage maximum likelihood estimation technique is proposed for cointegrated vector autoregressive processes where Markovian shifts occur in the equilibrium mean and the drift of the system.
Abstract: This paper suggests a new methodological approach to the analysis of cointegrated linear systems subject to changes in regime. We consider cointegrated vector autoregressive processes where Markovian shifts occur in the equilibrium mean and the drift of the system. A two-stage maximum likelihood estimation technique is proposed. In the first stage, based on a finite order VAR approximation of the cointegrated VARMA representation, the Johansen cointegration analysis is invoked to determine the cointegration rank and to estimate the cointegration matrix. An EM algorithm delivers the maximum likelihood estimates of the remaining parameters. The methodology is illustrated with an investigation of international and global business cycles.

Journal ArticleDOI
TL;DR: In this article, a mixture model is used to estimate the parameters of a prospective logistic model in a case-control study with dichotomous response D that depends on a covariate X; for a portion of the sample, both the gold standard X and a surrogate covariate W are available.
Abstract: Methods are devised for estimating the parameters of a prospective logistic model in a case-control study with dichotomous response D that depends on a covariate X. For a portion of the sample, both the gold standard X and a surrogate covariate W are available; however, for the greater portion of the data, only the surrogate covariate W is available. By using a mixture model, the relationship between the true covariate and the response can be modeled appropriately for both types of data. The likelihood depends on the marginal distribution of X and the measurement error density (W|X, D). The latter is modeled parametrically based on the validation sample. The marginal distribution of the true covariate is modeled using a nonparametric mixture distribution. In this way we can improve the efficiency and reduce the bias of the parameter estimates. The results also apply when there is no validation data, provided the error distribution is known or estimated from an independent data source. Many of the ...

Journal ArticleDOI
TL;DR: In this paper, an EM algorithm for exact maximum likelihood estimation of the population parameters for nonlinear random effects models was introduced, which can account for both within- and between-individual sources of variability and serial correlation within individual observations when analyzing unbalanced repeated measures data.
Abstract: The pharmaceutical industry is currently interested in the population approach and population models, also known as mixed effects models and random effects models depending on the precise form. Population models are useful in that they can account for both within- and between-individual sources of variability and serial correlation within individual observations when analyzing unbalanced repeated measures data. The modelling of population pharmacodynamic or pharmacokinetic profiles typically involves nonlinear random effects models. Each individual's observations are modelled by identical (up to unknown parameter values) nonlinear regression models, with the distribution of the observations, or a transformation of the observations, about expected responses taken to be normal, with the degree of variability described by a variance model. Between-individual variability is modelled by a population distribution for the individual regression parameter values (random effects). In a parametric analysis the population distribution is taken to be normal, the parameters of which, along with the parameters of the variance model, are known as the population parameters. Maximum likelihood estimation of the population parameters for nonlinear random effects models was pioneered by Beal and Sheiner (1979), and since then a number of algorithms have appeared for approximate maximum likelihood, including Steimer et al. (1984), Lindstrom and Bates (1990), Beal and Sheiner (1992), and Mentre and Gomeni (1995). All of these algorithms are approximate in some way. For a summary see Beal and Sheiner (1992), Wolfinger (1993), Pinheiro and Bates (1994), and Davidian and Giltinan (1995). In this paper an EM algorithm for exact maximum likelihood estimation is introduced. An EM algorithm obtaining maximum likelihood estimates for linear random effects models was introduced by Dempster, Laird, and Rubin (1977). Laird and Ware (1982), Lindstrom and Bates (1988), Jennrich and Schluchter (1986), and Liu and Rubin (1994) all describe hybrid EM algorithms for the linear random effects model. A true EM algorithm for the linear model is described by Jamshidian and Jennrich (1993). Mentre and Gomeni (1995) describe an approximate EM algorithm for nonlinear random effects models and, from the algorithm given in this paper, it can be seen clearly how their approximations arise. The present algorithm uses Monte Carlo methods to perform the E step, a strategy previously adopted in an altogether different model by Guo and Thompson (1994). Guo and Thompson require a Gibbs sampler, that is, a Markov chain Monte Carlo method for their E step, but the present algorithm uses independent samples. In Section 2 of this paper the nonlinear random effects model is described. Section 3 gives the EM algorithm without random effect covariates, while Section 4 gives the modified algorithm in the ...

Journal ArticleDOI
TL;DR: A new feature selection procedure based on the Kullback J-divergence between two class conditional density functions approximated by a finite mixture of parameterized densities of a special type is presented, which simultaneously yields a pseudo-Bayes decision rule.
Abstract: A new feature selection procedure based on the Kullback J-divergence between two class conditional density functions approximated by a finite mixture of parameterized densities of a special type is presented. This procedure is suitable especially for multimodal data. Apart from finding a feature subset of any cardinality without involving any search procedure, it also simultaneously yields a pseudo-Bayes decision rule. Its performance is tested on real data.