
Showing papers on "Expectation–maximization algorithm published in 1993"


Journal ArticleDOI
TL;DR: In many cases, complete-data maximum likelihood estimation is relatively simple when carried out conditional on some function of the parameters being estimated, and convergence is stable, with each iteration increasing the likelihood.
Abstract: Two major reasons for the popularity of the EM algorithm are that its maximization (M) step involves only complete-data maximum likelihood estimation, which is often computationally simple, and that its convergence is stable, with each iteration increasing the likelihood. When the associated complete-data maximum likelihood estimation itself is complicated, EM is less attractive because the M-step is computationally unattractive. In many cases, however, complete-data maximum likelihood estimation is relatively simple when carried out conditional on some function of the parameters being estimated.

1,816 citations
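Both virtues named above, a simple complete-data M-step and monotone likelihood ascent, are easiest to see in a toy problem. Below is a minimal sketch (not taken from the article) of EM for a two-component univariate Gaussian mixture; the model choice and function names are illustrative assumptions. Note that the variance update conditions on the freshly updated means, a miniature of the conditional-maximization idea the abstract hints at.

```python
import numpy as np

def em_gaussian_mixture(y, n_iter=100, seed=0):
    """Minimal EM sketch for a two-component univariate Gaussian mixture."""
    rng = np.random.default_rng(seed)
    pi = 0.5
    mu = rng.choice(y, size=2, replace=False).astype(float)  # crude initialization
    var = np.array([y.var(), y.var()])

    for _ in range(n_iter):
        # E-step: posterior responsibility of component 1 for each observation
        d0 = np.exp(-0.5 * (y - mu[0]) ** 2 / var[0]) / np.sqrt(2 * np.pi * var[0])
        d1 = np.exp(-0.5 * (y - mu[1]) ** 2 / var[1]) / np.sqrt(2 * np.pi * var[1])
        r = pi * d1 / ((1 - pi) * d0 + pi * d1)

        # M-step: complete-data MLEs are simple weighted means and variances,
        # and each iteration cannot decrease the observed-data likelihood
        pi = r.mean()
        mu = np.array([np.sum((1 - r) * y) / np.sum(1 - r),
                       np.sum(r * y) / np.sum(r)])
        var = np.array([np.sum((1 - r) * (y - mu[0]) ** 2) / np.sum(1 - r),
                        np.sum(r * (y - mu[1]) ** 2) / np.sum(r)])
    return pi, mu, var

# Toy usage: data from an equal-weight mixture of N(0, 1) and N(4, 1)
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 1.0, 300)])
print(em_gaussian_mixture(y))
```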


Book ChapterDOI
01 Aug 1993
TL;DR: An expectation-maximization (EM) algorithm for adjusting the parameters of the tree-structured architecture for supervised learning is presented and an online learning algorithm in which the parameters are updated incrementally is developed.
Abstract: We present a tree-structured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an expectation-maximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an online learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain.

1,689 citations
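A single-level version conveys the flavor of the E- and M-steps described here. The sketch below is an illustrative assumption rather than the paper's hierarchical architecture: the experts are Gaussian linear regressions, the gate is a softmax in the inputs, and the gate update is a single gradient step (a generalized EM move) instead of the full IRLS fit a GLIM treatment would use. With a small enough gate step this remains a valid generalized EM iteration.

```python
import numpy as np

def moe_em_step(X, y, gate_W, betas, sigma2, lr=0.1):
    """One generalized EM step for a one-level mixture of K linear experts (sketch)."""
    n, d = X.shape
    K = len(betas)

    # Gate probabilities g_ik = softmax(X @ gate_W)_k
    logits = X @ gate_W
    logits -= logits.max(axis=1, keepdims=True)
    g = np.exp(logits)
    g /= g.sum(axis=1, keepdims=True)

    # Expert likelihoods N(y | X beta_k, sigma2_k)
    lik = np.column_stack([
        np.exp(-0.5 * (y - X @ betas[k]) ** 2 / sigma2[k]) / np.sqrt(2 * np.pi * sigma2[k])
        for k in range(K)
    ])

    # E-step: posterior responsibilities h_ik
    h = g * lik
    h /= h.sum(axis=1, keepdims=True)

    # M-step for the experts: responsibility-weighted least squares
    new_betas, new_sigma2 = [], []
    for k in range(K):
        beta_k = np.linalg.solve(X.T @ (h[:, [k]] * X), X.T @ (h[:, k] * y))
        resid = y - X @ beta_k
        new_betas.append(beta_k)
        new_sigma2.append(np.sum(h[:, k] * resid ** 2) / np.sum(h[:, k]))

    # Partial M-step for the gate: one gradient-ascent step on the
    # responsibility-weighted multinomial log-likelihood (a GEM move)
    new_gate_W = gate_W + lr * (X.T @ (h - g)) / n

    return new_gate_W, new_betas, np.array(new_sigma2)
```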


Journal ArticleDOI
TL;DR: In this paper, Scott et al. presented a method for estimating the parameters of a particular class of continuous-time equilibrium models of the term structure of interest rates, and developed a multifactor equilibrium model of the term structure.
Abstract: LOUIS SCOTT is associate professor of finance at the University of Georgia in Athens. A model of the term structure of interest rates is necessary for the valuation of bonds and interest rate options, and parameter estimates are necessary for the implementation of a specific model. This article presents a method for estimating the parameters of a particular class of continuous-time equilibrium models of the term structure. The theoretical framework for the analysis is the model of Cox, Ingersoll, and Ross [1985a, 1985b], where a general equilibrium model of asset pricing is used to examine the behavior of the term structure and related issues such as the valuation of interest rate-contingent claims. The Cox, Ingersoll, Ross model, hereafter the CIR model, is a single-factor equilibrium model of the term structure that is consistent with an asset pricing equilibrium, is free of arbitrage opportunities, and retains the feature that the interest rate must be non-negative. The one-factor model, however, has the undesirable property that all bond returns are perfectly correlated, and it may not be adequate to characterize the term structure of interest rates and its changing shape over time. The advantage of this one-factor model is the relatively simple closed-form solution for bond prices. As CIR show, the model can be extended to a multifactor setting, with closed-form solutions for bond prices. We follow suggestions in CIR and develop a multifactor equilibrium model of the term structure. The primary objective is to estimate the parameters of the processes that drive interest rate changes and determine the number of factors necessary to characterize the term structure adequately over time.

589 citations


Journal ArticleDOI
TL;DR: In this article, a likelihood-based method for analysing correlated binary responses based on a multivariate model is discussed, which is related to the pseudo-maximum likelihood approach suggested recently by Zhao & Prentice (1990).
Abstract: SUMMARY In this paper, we discuss a likelihood-based method for analysing correlated binary responses based on a multivariate model. It is related to the pseudo-maximum likelihood approach suggested recently by Zhao & Prentice (1990). Their parameterization results in a simple pairwise model, in which the association between responses is modelled in terms of correlations, while the present paper uses conditional log odds-ratios. With this approach, higher-order associations can be incorporated in a natural way. One important advantage of this parameterization is that the maximum likelihood estimates of the marginal mean parameters are robust to misspecification of the time dependence. We describe an iterative two-stage procedure for obtaining the maximum likelihood estimates. Two examples are presented to illustrate this methodology.

332 citations


Journal ArticleDOI
TL;DR: In this article, the properties of normal/independent distributions are reviewed and several new results are presented for adaptive, robust regression with non-normal error distributions, such as the t, slash, and contaminated normal families.
Abstract: Maximum likelihood estimation with nonnormal error distributions provides one method of robust regression. Certain families of normal/independent distributions are particularly attractive for adaptive, robust regression. This article reviews the properties of normal/independent distributions and presents several new results. A major virtue of these distributions is that they lend themselves to EM algorithms for maximum likelihood estimation. EM algorithms are discussed for least Lp regression and for adaptive, robust regression based on the t, slash, and contaminated normal families. Four concrete examples illustrate the performance of the different methods on real data.

321 citations
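For the t family in particular, the EM iteration reduces to iteratively reweighted least squares with weights that downweight large residuals. The following is a minimal sketch along those lines, with fixed degrees of freedom ν and illustrative names; it is not the article's code. The slash and contaminated normal families differ only in the form of the conditional weight.

```python
import numpy as np

def t_regression_em(X, y, nu=4.0, n_iter=50):
    """EM (equivalently IRLS) sketch for linear regression with t-distributed errors.

    The latent scale variables of the normal/independent representation are
    treated as missing data; their conditional expectations become the
    weights w_i = (nu + 1) / (nu + r_i^2 / sigma^2).
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from ordinary least squares
    sigma2 = np.mean((y - X @ beta) ** 2)

    for _ in range(n_iter):
        r = y - X @ beta
        # E-step: expected mixing weights given current parameters
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        # M-step: weighted least squares and weighted scale update
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        sigma2 = np.mean(w * (y - X @ beta) ** 2)
    return beta, sigma2
```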


Journal ArticleDOI
TL;DR: The online EM schemes have significantly reduced memory requirements and improved convergence, and they can estimate HMM parameters that vary slowly with time or undergo infrequent jump changes.
Abstract: Sequential or online hidden Markov model (HMM) signal processing schemes are derived, and their performance is illustrated by simulation. The online algorithms are sequential expectation maximization (EM) schemes and are derived by using stochastic approximations to maximize the Kullback-Leibler information measure. The schemes can be implemented either as filters or fixed-lag or sawtooth-lag smoothers. They yield estimates of the HMM parameters including transition probabilities, Markov state levels, and noise variance. In contrast to the offline EM algorithm (Baum-Welch scheme), which uses the fixed-interval forward-backward scheme, the online schemes have significantly reduced memory requirements and improved convergence, and they can estimate HMM parameters that vary slowly with time or undergo infrequent jump changes. Similar techniques are used to derive online schemes for extracting finite-state Markov chains imbedded in a mixture of white Gaussian noise (WGN) and deterministic signals of known functional form with unknown parameters.

289 citations
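One simple flavor of such a recursion, sketched below for a scalar Gaussian-noise HMM, updates running sufficient statistics from filtered (not smoothed) state probabilities with a decaying step size and re-estimates the parameters from them at each step. This is only an illustration of the general idea; the initialization, step-size schedule, and names are my assumptions, not the authors' schemes.

```python
import numpy as np

def online_hmm_em(y, n_states=2, gamma0=0.05):
    """Illustrative online (filtered) EM recursion for a Gaussian-noise HMM."""
    K = n_states
    A = np.full((K, K), 1.0 / K)                 # transition probabilities
    levels = np.linspace(y.min(), y.max(), K)    # Markov state levels
    var = y.var() + 1e-6                         # noise variance

    alpha = np.full(K, 1.0 / K)                  # filtered state probabilities
    trans_stat = np.full((K, K), 1.0 / K**2)     # running pairwise statistics
    s0 = np.full(K, 1.0 / K)                     # running occupation statistics
    s1 = s0 * levels                             # running sum-of-y statistics
    s2 = var * s0 + s0 * levels**2               # running sum-of-y^2 statistics

    for t, yt in enumerate(y, start=1):
        gamma = gamma0 / np.sqrt(t)              # decaying step size
        b = np.exp(-0.5 * (yt - levels) ** 2 / var) / np.sqrt(2 * np.pi * var)

        # Filtered pairwise posterior p(x_{t-1}=i, x_t=j | y_1..t)
        joint = alpha[:, None] * A * b[None, :]
        joint /= joint.sum()
        alpha = joint.sum(axis=0)                # new filtered marginal

        # Stochastic-approximation update of the running sufficient statistics
        trans_stat += gamma * (joint - trans_stat)
        s0 += gamma * (alpha - s0)
        s1 += gamma * (alpha * yt - s1)
        s2 += gamma * (alpha * yt**2 - s2)

        # "M-step": re-estimate parameters from the running statistics
        A = trans_stat / trans_stat.sum(axis=1, keepdims=True)
        levels = s1 / s0
        var = max(np.sum(s2 - s0 * levels**2) / s0.sum(), 1e-8)

    return A, levels, var
```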


Book
30 Mar 1993
TL;DR: Observed data techniques - normal approximation; observed data techniques; the EM algorithm; data augmentation; the Gibbs sampler.
Abstract: Observed data techniques - normal approximation; observed data techniques; the EM algorithm; data augmentation; the Gibbs sampler.

267 citations


Journal ArticleDOI
TL;DR: The generalized Pareto distribution (GPD) is a two-parameter family of distributions that can be used to model exceedances over a threshold; maximum likelihood estimators of its parameters are preferred since they are asymptotically normal and asymptotically efficient in many cases.
Abstract: The generalized Pareto distribution (GPD) is a two-parameter family of distributions that can be used to model exceedances over a threshold. Maximum likelihood estimators of the parameters are preferred, since they are asymptotically normal and asymptotically efficient in many cases. Numerical methods are required for maximizing the log-likelihood, however. This article investigates the properties of a reduction of the two-dimensional numerical search for the zeros of the log-likelihood gradient vector to a one-dimensional numerical search. An algorithm for computing the GPD maximum likelihood estimates based on this dimension reduction is given, together with its properties.

249 citations
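The dimension reduction exploits the fact that, after reparameterizing, one parameter has a closed-form maximizer given the other, leaving a single numerical search. The sketch below follows that general idea under the (σ, ξ) parameterization with τ = ξ/σ; it is an illustrative reconstruction, not the article's exact algorithm, and the search bounds are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gpd_mle_profile(x):
    """One-dimensional profile-likelihood search for a GPD fitted to exceedances x > 0.

    With tau = xi / sigma, the shape maximizing the likelihood for fixed tau is
    xi_hat(tau) = mean(log(1 + tau * x)), so only tau needs a numerical search.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)

    def neg_profile_loglik(tau):
        # tau near 0 is the exponential limit; excluded here for simplicity
        if abs(tau) < 1e-8 or 1.0 + tau * x.max() <= 0:
            return np.inf
        xi = np.mean(np.log1p(tau * x))
        return n * np.log(xi / tau) + n * (xi + 1.0)   # minus the profile log-likelihood

    lower = -1.0 / x.max() + 1e-6                      # keep 1 + tau*x positive
    res = minimize_scalar(neg_profile_loglik,
                          bounds=(lower, 10.0 / x.mean()), method="bounded")
    tau = res.x
    xi = np.mean(np.log1p(tau * x))
    return xi / tau, xi                                # (sigma, xi)
```

The article's parameterization and search differ in detail, but the reduction from a two-dimensional to a one-dimensional search is the same idea.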


Journal ArticleDOI
TL;DR: A nontraditional approach to the problem of estimating the parameters of a stochastic linear system is presented, and it is shown how the evolution of the dynamics as a function of the segment length can be modeled using alternative assumptions.
Abstract: A nontraditional approach to the problem of estimating the parameters of a stochastic linear system is presented. The method is based on the expectation-maximization algorithm and can be considered as the continuous analog of the Baum-Welch estimation algorithm for hidden Markov models. The algorithm is used for training the parameters of a dynamical system model that is proposed for better representing the spectral dynamics of speech for recognition. It is assumed that the observed feature vectors of a phone segment are the output of a stochastic linear dynamical system, and it is shown how the evolution of the dynamics as a function of the segment length can be modeled using alternative assumptions. A phoneme classification task using the TIMIT database demonstrates that the approach is the first effective use of an explicit model for statistical dependence between frames of speech.

238 citations


Journal ArticleDOI
TL;DR: In this article, the authors developed a method to evaluate the smoothed estimator of the disturbance vector in a state space model together with its mean squared error matrix, which leads to an efficient smoother for the state vector.
Abstract: SUMMARY This paper develops a method to evaluate the smoothed estimator of the disturbance vector in a state space model together with its mean squared error matrix. This disturbance smoother also leads to an efficient smoother for the state vector. Applications include a method to calculate auxiliary residuals for unobserved components time series models and an EM algorithm for estimating covariance parameters in a state space model.

219 citations



Journal ArticleDOI
Y. Vardi, D. Lee
TL;DR: In this paper, the problem of recovering an input signal from a blurred output, in an input-output system with linear distortion, is modeled as a mathematical inversion of a linear system with positive parameters, subject to positivity constraints on the solution.
Abstract: The problem of recovering an input signal from a blurred output, in an input-output system with linear distortion, is ubiquitous in science and technology. When the blurred output is not degraded by statistical noise the problem is entirely deterministic and amounts to a mathematical inversion of a linear system with positive parameters, subject to positivity constraints on the solution. We show that all such linear inverse problems with positivity restrictions (LININPOS problems for short) can be interpreted as statistical estimation problems from incomplete data based on infinitely large 'samples', and that maximum likelihood (ML) estimation and the EM algorithm provide a straightforward method of solution for such problems.
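For a discretized system b ≈ Ax with A, x ≥ 0, the ML/EM solution the authors describe is computed by the familiar multiplicative iteration. A minimal sketch under that discretized setting, with illustrative names and normalization:

```python
import numpy as np

def lininpos_em(A, b, n_iter=200):
    """Multiplicative EM iteration for a linear inverse problem with positivity.

    Illustrative sketch of the update x <- x * A^T(b / Ax) / A^T 1; each
    iteration preserves nonnegativity and does not decrease the likelihood.
    Assumes every column of A has a positive sum.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    col_sums = A.sum(axis=0)                              # A^T 1
    x = np.full(A.shape[1], b.sum() / col_sums.sum())     # flat positive start

    for _ in range(n_iter):
        Ax = A @ x
        ratio = np.divide(b, Ax, out=np.zeros_like(b), where=Ax > 0)
        x *= (A.T @ ratio) / col_sums
    return x
```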

01 Jan 1993
TL;DR: The work addresses Bayesian unsupervised satellite image segmentation, using contextual methods, and shows that the spatial or spectral context contribution is sensitive to image parameters such as homogeneity, means, variances, and spatial or spectral correlations of the noise.
Abstract: This work addresses Bayesian unsupervised satellite image segmentation. We propose, as an alternative to global methods like MAP or MPM, the use of contextual ones, which is partially justified by previous works. We show, via a simulation study, that spatial or spectral context contribution is sensitive to image parameters such as homogeneity, means, variances, and spatial or spectral correlations of the noise. From this one may choose the best context contribution according to the estimated values of the above parameters. The parameter estimation step is treated by the SEM, a mixture density estimator which is a stochastic variant of the EM algorithm. Another simulation study shows good robustness of the SEM algorithm with respect to different image parameters. Thus modification of the behavior of the contextual methods, when the SEM-based unsupervised approaches are considered, remains limited and the conclusions of the supervised simulation study stay valid. We propose an "adaptive unsupervised method" using more relevant contextual features. Furthermore, we apply different SEM-based unsupervised contextual segmentation methods to two real SPOT images and observe that the results obtained are consistently better than those obtained by a classical histogram-based method.
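The SEM estimator referred to here differs from plain EM only in that the missing class labels are drawn from their current posterior instead of being averaged over. A small sketch for a univariate Gaussian mixture, ignoring the spatial/contextual structure of the images and using illustrative names:

```python
import numpy as np

def sem_step(y, pi, mu, var, rng):
    """One SEM iteration for a K-component univariate Gaussian mixture (sketch)."""
    K = len(pi)
    dens = np.column_stack([
        pi[k] * np.exp(-0.5 * (y - mu[k]) ** 2 / var[k]) / np.sqrt(2 * np.pi * var[k])
        for k in range(K)
    ])
    post = dens / dens.sum(axis=1, keepdims=True)

    # Stochastic E-step: draw one label per observation from its posterior
    labels = np.array([rng.choice(K, p=p) for p in post])

    # M-step on the completed data (classes that become empty keep old mu, var)
    counts = np.bincount(labels, minlength=K)
    pi = counts / len(y)
    for k in range(K):
        if counts[k] > 1:
            mu[k] = y[labels == k].mean()
            var[k] = y[labels == k].var()
    return pi, mu, var, labels
```

Iterating this step produces a Markov chain of parameter values; averaging after a burn-in gives the SEM estimate used in the segmentation step.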

Journal ArticleDOI
Linda Kaufman
TL;DR: It is shown that the same scaled steepest descent algorithm can be applied to the least squares merit function, and that it can be accelerated using the conjugate gradient approach.
Abstract: The EM algorithm is the basic approach used to maximize the log likelihood objective function for the reconstruction problem in positron emission tomography (PET). The EM algorithm is a scaled steepest ascent algorithm that elegantly handles the nonnegativity constraints of the problem. It is shown that the same scaled steepest descent algorithm can be applied to the least squares merit function, and that it can be accelerated using the conjugate gradient approach. The experiments suggest that one can cut the computation by about a factor of 3 by using this technique. The results are applied to various penalized least squares functions which might be used to produce a smoother image.

Journal ArticleDOI
TL;DR: In this paper, a piecewise pseudo-maximum likelihood estimator is proposed for auction games, and the conditions for its consistency and its asymptotic distribution are established and analyzed.
Abstract: In applications of game theory to auctions, researchers assume that players choose strategies based upon a commonly known distribution of the latent characteristics. Rational behaviour, within an assumed class of distributions for the latent process, imposes testable restrictions upon the data generating process of the equilibrium strategies. Unfortunately, the support of the distribution of equilibrium strategies often depends upon all of the parameters of the distribution of the latent characteristics, making the standard application of maximum likelihood estimation procedures inappropriate. We present a piecewise pseudo-maximum likelihood estimator as well as the conditions for its consistency and its asymptotic distribution. In empirical applications of game theory to auctions, researchers assume that the distribution of latent (or unobserved) characteristics is common knowledge to the players of the game. For example, in the independent private values model of an auction, the distribution of valuations is known to all bidders. Moreover, each bidder knows that his opponents know the distribution of valuations, and his opponents know that he knows, etc. Based upon their knowledge of the distribution of latent characteristics, and given their realization from that valuation distribution, players are assumed to choose bids which maximize their expected pay-offs from winning the auction. Given this informational structure, the equilibrium of the game can be characterized by appealing to a particular concept of equilibrium (e.g., Bayesian-Nash).

Journal ArticleDOI
TL;DR: The key, as it is shown, is that the EM step can be viewed as a generalized gradient, making it natural to apply generalized conjugate gradient methods in an attempt to accelerate the EM algorithm.
Abstract: The EM algorithm is a very popular and widely applicable algorithm for the computation of maximum likelihood estimates. Although its implementation is generally simple, the EM algorithm often exhibits slow convergence and is costly in some areas of application. Past attempts to accelerate the EM algorithm have most commonly been based on some form of Aitken acceleration. Here we propose an alternative method based on conjugate gradients. The key, as we show, is that the EM step can be viewed (approximately at least) as a generalized gradient, making it natural to apply generalized conjugate gradient methods in an attempt to accelerate the EM algorithm. The proposed method is relatively simple to implement and can handle problems with a large number of parameters, an important feature of most EM algorithms. To demonstrate the effectiveness of the proposed acceleration method, we consider its application to several problems in each of the following areas: estimation of a covariance matrix from incomplete mu...
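In outline, the accelerated scheme replaces the plain EM update with a conjugate-gradient style search along a direction built from successive EM increments. The sketch below is a generic illustration of that idea, not the authors' exact method: the Polak-Ribière combination and the crude line search are my assumptions, and `em_step` and `loglik` stand for any model-specific EM update and observed-data log-likelihood.

```python
import numpy as np

def cg_accelerated_em(theta0, em_step, loglik, n_iter=50):
    """Generic conjugate-gradient style acceleration of EM (illustrative sketch)."""
    theta = np.asarray(theta0, dtype=float)
    g_old, d = None, None

    for _ in range(n_iter):
        g = em_step(theta) - theta                # EM increment as a generalized gradient
        if d is None:
            d = g
        else:
            beta = max(0.0, g @ (g - g_old) / (g_old @ g_old + 1e-12))  # Polak-Ribiere
            d = g + beta * d

        # Crude line search along d: start at the plain EM step length and
        # double while the observed-data log-likelihood keeps improving
        base = loglik(theta)
        alpha, best = 1.0, loglik(theta + d)
        while loglik(theta + 2.0 * alpha * d) > best:
            alpha *= 2.0
            best = loglik(theta + alpha * d)

        if best > base:
            theta = theta + alpha * d
        else:
            theta = theta + g                     # fall back to the plain (monotone) EM step
            d = g
        g_old = g
    return theta
```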

Journal ArticleDOI
TL;DR: It is shown that both the image space reconstruction algorithm (ISRA) and the expectation maximization algorithm may be obtained from a common mathematical framework, and this fact is used to extend ISRA to penalized likelihood estimates.
Abstract: The image space reconstruction algorithm (ISRA) was proposed as a modification of the expectation maximization (EM) algorithm, based on physical considerations, for application in volume emission computed tomography. As a consequence of this modification, ISRA searches for least squares solutions instead of maximizing Poisson likelihoods as the EM algorithm does. It is shown that both algorithms may be obtained from a common mathematical framework. This fact is used to extend ISRA for penalized likelihood estimates.
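In the common framework mentioned above, both iterations are multiplicative fixed points, but ISRA targets the least-squares criterion. A minimal sketch, with illustrative names and A, b assumed nonnegative:

```python
import numpy as np

def isra(A, b, n_iter=200):
    """Image space reconstruction algorithm (sketch): multiplicative update for
    nonnegative least squares, x <- x * (A^T b) / (A^T A x)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.full(A.shape[1], max(b.mean(), 1e-12))   # positive starting image
    At_b = A.T @ b
    for _ in range(n_iter):
        denom = A.T @ (A @ x)
        # leave components untouched where the denominator vanishes
        x *= np.divide(At_b, denom, out=np.ones_like(x), where=denom > 0)
    return x
```

Compare with the Poisson/ML-EM update x ← x · Aᵀ(b / Ax) / Aᵀ1 appearing elsewhere in this list; the shared structure is what lets the same penalization machinery be attached to either.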

Journal ArticleDOI
TL;DR: Several approaches to the identification problem are presented, including a new method based on the EM (expectation maximization) algorithm, whose gain in accuracy can be considerable if many data are missing.
Abstract: Parameter estimation when the measurement information may be incomplete is discussed. An ARX model is used as a basic system representation. The presentation covers both missing output and missing input. First, reconstruction of the missing values is discussed. The reconstruction is based on a state-space formulation of the system, and is performed using Kalman filtering or fixed-interval smoothing formulas. Several approaches to the identification problem are presented, including a new method based on the EM (expectation maximization) algorithm. The different approaches are tested and compared using Monte Carlo simulations. The choice of method is always a tradeoff between estimation accuracy and computational complexity. According to the simulations, the gain in accuracy using the EM method can be considerable if many data are missing.

Journal ArticleDOI
TL;DR: In this article, Monte-Carlo numerical experiments comparing both approaches, mixture and classification, under both assumptions, equal and unknown mixing proportions, are reported, and the differences between the finite sample and the asymptotic behaviour of both approaches are analyzed through additional simulations.
Abstract: Generally, the mixture and the classification approaches via maximum likelihood have been contrasted under different underlying assumptions. In the classification approach, the mixing proportions are assumed to be equal whereas, in the mixture approach, they are supposed to be unknown. In this paper, Monte-Carlo numerical experiments comparing both approaches, mixture and classification, under both assumptions, equal and unknown mixing proportions, are reported. These numerical experiments show that the assumption on the mixing proportions is a more sensitive factor than the choice of the clustering approach, especially in the small sample setting. Moreover, the differences between the finite sample and the asymptotic behaviour of both approaches are analyzed through additional simulations.
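Operationally, the two approaches differ in one line: the classification approach hard-assigns each observation to its most probable class before the M-step, while the mixture approach keeps the soft posterior weights; independently, the mixing proportions can be fixed equal or re-estimated. A sketch crossing both factors, as the simulations do, for a univariate Gaussian mixture (names and model are illustrative):

```python
import numpy as np

def mixture_iteration(y, mu, var, pi, hard=False, estimate_pi=True):
    """One iteration of the mixture (EM) or classification (CEM) approach (sketch)."""
    K = len(mu)
    dens = np.column_stack([
        pi[k] * np.exp(-0.5 * (y - mu[k]) ** 2 / var[k]) / np.sqrt(2 * np.pi * var[k])
        for k in range(K)
    ])
    post = dens / dens.sum(axis=1, keepdims=True)

    if hard:                                    # classification approach: hard assignment
        z = np.zeros_like(post)
        z[np.arange(len(y)), post.argmax(axis=1)] = 1.0
        post = z

    nk = np.maximum(post.sum(axis=0), 1e-12)    # guard against empty classes
    pi = nk / nk.sum() if estimate_pi else np.full(K, 1.0 / K)
    mu = post.T @ y / nk
    var = np.array([np.sum(post[:, k] * (y - mu[k]) ** 2) / nk[k] for k in range(K)])
    return mu, var, pi
```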

Proceedings ArticleDOI
05 Jan 1993
TL;DR: A variant of the expectation maximization algorithm known as the Viterbi algorithm is used to obtain the statistical model from the unaligned sequences, and a multiple alignment of the 400 sequences and 225 other globin sequences was obtained that agrees almost perfectly with a structural alignment by D Bashford et al. (1987).
Abstract: The authors apply hidden Markov models to the problem of statistical modeling and multiple sequence alignment of protein families. A variant of the expectation maximization algorithm known as the Viterbi algorithm is used to obtain the statistical model from the unaligned sequences. In a detailed series of experiments, they have taken 400 unaligned globin sequences, and produced a statistical model entirely automatically from the primary sequences. The authors used no prior knowledge of globin structure. Using this model, a multiple alignment of the 400 sequences and 225 other globin sequences was obtained that agrees almost perfectly with a structural alignment by D. Bashford et al. (1987). This model can also discriminate all these 625 globins from nonglobin protein sequences with greater than 99% accuracy, and can thus be used for database searches.

Journal ArticleDOI
TL;DR: In this article, the authors defined measures of effective local Gaussian resolution (ELGR) and effective global Gaussian resolution (EGGR) to compare different reconstruction algorithms and observed their behavior in FBP images and in ML images using two different measurement techniques.
Abstract: Study of maximum likelihood reconstruction by the EM algorithm (ML) with a reconstruction kernel equal to the intrinsic detector resolution and sieve regularization has demonstrated that any image improvements over filtered backprojection (FBP) are a function of image resolution. Comparing different reconstruction algorithms potentially requires measuring and matching the image resolution. Since there are no standard methods for describing the resolution of images from a nonlinear algorithm such as ML, the authors have defined measures of effective local Gaussian resolution (ELGR) and effective global Gaussian resolution (EGGR) and examined their behaviour in FBP images and in ML images using two different measurement techniques. For FBP these two resolution measures are equal and exhibit the standard convolution behaviour of linear systems.

Journal ArticleDOI
TL;DR: In this article, the authors studied the asymptotic properties of maximum likelihood estimators of parameters when observations are taken from a two-dimensional Gaussian random field with a multiplicative Ornstein-Uhlenbeck covariance function.
Abstract: We study in detail asymptotic properties of maximum likelihood estimators of parameters when observations are taken from a two-dimensional Gaussian random field with a multiplicative Ornstein-Uhlenbeck covariance function. Under the complete lattice sampling plan, it is shown that the maximum likelihood estimators are strongly consistent and asymptotically normal. The asymptotic normality here is normalized by the fourth root of the sample size and is obtained through higher order expansions of the likelihood score equations. Extensions of these results to higher-dimensional processes are also obtained, showing that the convergence rate becomes better as the dimension gets higher.
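For concreteness, the "multiplicative Ornstein-Uhlenbeck covariance function" on the plane is usually the separable exponential form below (my reading of the term; σ², α₁, α₂ are the parameters whose maximum likelihood estimators the paper studies):

```latex
% Separable ("multiplicative") Ornstein-Uhlenbeck covariance on R^2 (assumed form)
\[
  \operatorname{Cov}\!\bigl(Z(s_1, s_2),\, Z(t_1, t_2)\bigr)
    = \sigma^{2}\, e^{-\alpha_1 |s_1 - t_1|}\, e^{-\alpha_2 |s_2 - t_2|},
  \qquad \alpha_1, \alpha_2 > 0 .
\]
```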

Journal ArticleDOI
Neil Shephard
TL;DR: New strategies for the implementation of maximum likelihood estimation of nonlinear time series models are suggested and make use of recent work on the EM algorithm and iterative simulation techniques for fitting stochastic variance models to exchange rate data.
Abstract: New strategies for the implementation of maximum likelihood estimation of nonlinear time series models are suggested. They make use of recent work on the EM algorithm and iterative simulation techniques. The estimation procedures are applied to the problem of fitting stochastic variance models to exchange rate data.

Journal ArticleDOI
TL;DR: A weighted Euclidean distance model for analyzing three-way proximity data is proposed that incorporates a latent class approach and removes the rotational invariance of the classical multidimensional scaling model retaining psychologically meaningful dimensions, and drastically reduces the number of parameters in the traditional INDSCAL model.
Abstract: A weighted Euclidean distance model for analyzing three-way proximity data is proposed that incorporates a latent class approach. In this latent class weighted Euclidean model, the contribution to the distance function between two stimuli is per dimension weighted identically by all subjects in the same latent class. This model removes the rotational invariance of the classical multidimensional scaling model retaining psychologically meaningful dimensions, and drastically reduces the number of parameters in the traditional INDSCAL model. The probability density function for the data of a subject is posited to be a finite mixture of spherical multivariate normal densities. The maximum likelihood function is optimized by means of an EM algorithm; a modified Fisher scoring method is used to update the parameters in the M-step. A model selection strategy is proposed and illustrated on both real and artificial data.

Journal ArticleDOI
TL;DR: In this article, the expectation maximization (EM) method is used to calculate the energy distributions of molecular probes from their adsorption isotherms, and the results are compared to those obtained with the House and Jaycock algorithm HILDA.
Abstract: The expectation-maximization (EM) method of parameter estimation is used to calculate adsorption energy distributions of molecular probes from their adsorption isotherms. EM does not require prior knowledge of the distribution function or the isotherm, requires no smoothing of the isotherm data, and converges with high stability toward the maximum-likelihood estimate. The method is therefore robust and accurate at high iteration numbers. The EM algorithm is tested with simulated energy distributions corresponding to unimodal Gaussian, bimodal Gaussian, and Poisson distributions, and to the distributions resulting from Misra isotherms. Theoretical isotherms are generated from these distributions using the Langmuir model, and then chromatographic band profiles are computed using the ideal model of chromatography. Noise comparable to that observed experimentally is then introduced in the theoretical band profiles. The isotherm is then calculated using the elution-by-characteristic-points method. The energy distribution given by the EM method is compared to the original one. The results are contrasted to those obtained with the House and Jaycock algorithm HILDA and shown to be superior in terms of robustness, accuracy, and information theory.

Journal ArticleDOI
TL;DR: It is proven that the sequence of iterates that is generated by using the expectation maximization algorithm is monotonically increasing in posterior probability, with stable points of the iteration satisfying the necessary maximizer conditions of the maximum a posteriori solution.
Abstract: The three-dimensional image-reconstruction problem solved here for optical-sectioning microscopy is to estimate the fluorescence intensity λ(x), where x ∈ ℝ³, given a series of Poisson counting process measurements {M_j(dx)}, j = 1, …, J, each with intensity s_j(y) ∫_{ℝ³} p_j(y|x) λ(x) dx, with p_j(y|x) being the point spread of the optics focused to the jth plane and s_j(y) the detection probability for detector point y at focal depth j. A maximum a posteriori reconstruction is generated by inducing a prior distribution on the space of images via Good's three-dimensional rotationally invariant roughness penalty ∫_{ℝ³} |∇λ(x)|²/λ(x) dx. It is proven that the sequence of iterates generated by the expectation maximization algorithm is monotonically increasing in posterior probability, with stable points of the iteration satisfying the necessary maximizer conditions of the maximum a posteriori solution. The algorithms were implemented on the DECmpp-SX, a 64 × 64 parallel processor, running at under 2 s per 64³ 3-D iteration. Results are demonstrated from simulated as well as amoebae and volvox data. We study performance comparisons of the algorithms for the missing-data problems corresponding to fast data collection for rapid motion studies, in which every other focal plane is removed, and for imaging with limited detector areas and efficiency.

Journal ArticleDOI
TL;DR: By this approach the finite mixture model is embedded within the general framework of generalized linear models (GLMs) and the proposed EM algorithm can be readily done in statistical packages with facilities for GLMs.
Abstract: A generalized linear finite mixture model and an EM algorithm to fit the model to data are described. By this approach the finite mixture model is embedded within the general framework of generalized linear models (GLMs). Implementation of the proposed EM algorithm can be readily done in statistical packages with facilities for GLMs. A practical example is presented where a generalized linear finite mixture model of ten Weibull distributions is adopted. The example is concerned with the flow cytometric measurement of the DNA content of spermatids in a mutant mouse, which shows non-disjunction of specific chromosomes during meiosis.

Journal ArticleDOI
TL;DR: A model is developed that assumes a multiplicative relationship between death rates with and without tumour, uses a piecewise exponential model for the base-line transition rates, and can be fitted with information from a single sacrifice.
Abstract: SUMMARY A three-state illness-death model provides a useful way to represent data from rodent tumorigenicity experiments. Some of the earliest proposals use fully parametric models based on, for example, Weibull distributional assumptions. Recently, nonparametric versions of this model have been proposed, but these generally require large data sets with frequent interim sacrifices to yield stable estimates. As a compromise between these extremes, others have considered semiparametric models. In this paper, we develop a model that assumes a multiplicative relationship between death rates with and without tumour and a piecewise exponential model for the base-line transition rates. The model can be fitted with information from a single sacrifice. An EM algorithm provides a useful way to fit the model, since the likelihood corresponds to that from a standard piecewise exponential survival model when time to tumour onset is known. We discuss the relationship between the piecewise exponential model and other recent proposals and illustrate the method with data from two carcinogenicity studies.

Journal ArticleDOI
TL;DR: In this article, a conditional mixture, maximum likelihood method for latent class censored regression is proposed to simultaneously estimate separate regression functions and subject membership in K latent classes or groups given a censored dependent variable for a cross-section of subjects.
Abstract: The standard tobit or censored regression model is typically utilized for regression analysis when the dependent variable is censored. This model is generalized by developing a conditional mixture, maximum likelihood method for latent class censored regression. The proposed method simultaneously estimates separate regression functions and subject membership in K latent classes or groups given a censored dependent variable for a cross-section of subjects. Maximum likelihood estimates are obtained using an EM algorithm. The proposed method is illustrated via a consumer psychology application.

Journal ArticleDOI
TL;DR: In this paper, the possibilities of using sample moments to identify mixtures of multivariate normals are investigated and a particular system of moment equations is devised and then shown to be one that identifies the true mixing distribution, with some limitations.
Abstract: A longstanding difficulty in multivariate statistics is identifying and evaluating nonnormal data structures in high dimensions with high statistical efficiency and low search effort. Here the possibilities of using sample moments to identify mixtures of multivariate normals are investigated. A particular system of moment equations is devised and then shown to be one that identifies the true mixing distribution, with some limitations (indicated in the text), and thus provides consistent estimates. Moreover, the estimates are shown to be quickly calculated in any dimension and to be highly efficient in the sense of being close to the values of the parameters that maximize the likelihood function. This is shown by simulation and the application of the method to Fisher's iris data. While establishing these results, we discuss certain limitations associated with moment methods with regard to uniqueness and equivariance and explain how we addressed these problems.