
Showing papers on "Expectation–maximization algorithm published in 2014"




Journal ArticleDOI
TL;DR: This paper proposes an efficient algorithm, called vector field consensus, for establishing robust point correspondences between two sets of points, and suggests a two-stage strategy in which the nonparametric model is first used to reduce the size of the putative set and a parametric variant of the approach is then applied to estimate the geometric parameters.
Abstract: In this paper, we propose an efficient algorithm, called vector field consensus, for establishing robust point correspondences between two sets of points. Our algorithm starts by creating a set of putative correspondences which can contain a very large number of false correspondences, or outliers, in addition to a limited number of true correspondences (inliers). Next, we solve for correspondence by interpolating a vector field between the two point sets, which involves estimating a consensus of inlier points whose matching follows a nonparametric geometrical constraint. We formulate this as maximum a posteriori (MAP) estimation of a Bayesian model with hidden/latent variables indicating whether matches in the putative set are outliers or inliers. We impose nonparametric geometrical constraints on the correspondence, as a prior distribution, using Tikhonov regularizers in a reproducing kernel Hilbert space. MAP estimation is performed by the EM algorithm, which, by also estimating the variance of the prior model (initialized to a large value), is able to obtain good estimates very quickly (e.g., avoiding many of the local minima inherent in this formulation). We illustrate this method on data sets in 2D and 3D and demonstrate that it is robust to a very large number of outliers (even up to 90%). We also show that, in the special case where there is an underlying parametric geometrical model (e.g., the epipolar line constraint), we obtain better results than standard alternatives like RANSAC when a large number of outliers are present. This suggests a two-stage strategy, where we use our nonparametric model to reduce the size of the putative set and then apply a parametric variant of our approach to estimate the geometric parameters. Our algorithm is computationally efficient and we provide code for others to use it. In addition, our approach is general and can be applied to other problems, such as learning with a badly corrupted training data set.
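
Below is a minimal sketch of the inlier/outlier EM idea described above, assuming a Gaussian residual model for inliers, a uniform outlier component, and a Tikhonov-regularized (kernel ridge) estimate of the vector field; the function names, kernel, and parameter values are illustrative and not the authors' released code.

```python
import numpy as np

def rbf_kernel(X, Y, beta=0.1):
    # Gaussian kernel between two point sets
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-beta * d2)

def vfc_em_sketch(x, y, gamma=0.9, sigma2=None, lam=3.0, a=10.0, iters=50, beta=0.1):
    """Toy EM for inlier/outlier consensus on putative matches (x_i -> y_i).

    x, y  : (N, D) arrays of matched points; the 'field' maps x to y.
    gamma : initial inlier fraction; sigma2: residual variance (initialized large);
    lam   : Tikhonov regularization weight; a: volume term of the uniform outlier density.
    """
    N, D = x.shape
    K = rbf_kernel(x, x, beta)
    f = np.zeros_like(y)                           # current field evaluated at x
    if sigma2 is None:
        sigma2 = ((y - x) ** 2).sum() / (N * D)    # start with a large variance
    for _ in range(iters):
        # E-step: posterior probability that each match is an inlier
        r2 = ((y - f) ** 2).sum(axis=1)
        p_in = gamma * np.exp(-r2 / (2 * sigma2)) / (2 * np.pi * sigma2) ** (D / 2)
        p_out = (1 - gamma) / a
        P = p_in / (p_in + p_out)
        # M-step: weighted kernel ridge regression for the field coefficients C
        W = np.diag(P)
        C = np.linalg.solve(W @ K + lam * sigma2 * np.eye(N), W @ y)
        f = K @ C
        # M-step: update noise variance and inlier fraction
        sigma2 = (P * ((y - f) ** 2).sum(axis=1)).sum() / (D * P.sum() + 1e-12)
        gamma = P.mean()
    return P, f, gamma

# Usage: matches with posterior above 0.5 are kept as inliers.
# P, f, gamma = vfc_em_sketch(x_putative, y_putative); inliers = P > 0.5
```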

489 citations


01 Jan 2014
TL;DR: Maximum likelihood estimation is illustrated by replicating Daniel Treisman's (2016) paper, Russia's Billionaires, which connects the number of billionaires in a country to its economic characteristics and concludes that Russia has a higher number of billionaires than economic factors such as market size and tax rate predict.
Abstract: In a previous lecture, we estimated the relationship between dependent and explanatory variables using linear regression. But what if a linear relationship is not an appropriate assumption for our model? One widely used alternative is maximum likelihood estimation, which involves specifying a class of distributions, indexed by unknown parameters, and then using the data to pin down these parameter values. The benefit relative to linear regression is that it allows more flexibility in the probabilistic relationships between variables. Here we illustrate maximum likelihood by replicating Daniel Treisman’s (2016) paper, Russia’s Billionaires, which connects the number of billionaires in a country to its economic characteristics. The paper concludes that Russia has a higher number of billionaires than economic factors such as market size and tax rate predict.
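
The maximum likelihood mechanics described above can be illustrated with a short numerical example; the sketch below fits a Poisson regression (a plausible model for count outcomes such as billionaire counts) by minimizing the negative log-likelihood, using simulated placeholder data rather than Treisman's dataset.

```python
import numpy as np
from scipy.optimize import minimize

def poisson_negloglik(beta, X, y):
    """Negative log-likelihood of a Poisson regression with log link."""
    mu = np.exp(X @ beta)                    # conditional mean for each observation
    return -(y * np.log(mu) - mu).sum()      # constants (log y!) dropped

# Placeholder data: an intercept plus two economic covariates, and a count outcome.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([0.5, 0.8, -0.3])
y = rng.poisson(np.exp(X @ true_beta))

# Maximize the likelihood by minimizing its negative.
fit = minimize(poisson_negloglik, x0=np.zeros(3), args=(X, y), method="BFGS")
print("MLE:", fit.x)   # close to true_beta for this simulated sample
```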

464 citations


Journal ArticleDOI
TL;DR: The expectation maximization algorithm is modified in order to estimate the parameters of the dynamic factor model on a dataset with an arbitrary pattern of missing data and the model is extended to the case with a serially correlated idiosyncratic component.
Abstract: In this paper we modify the expectation maximization algorithm in order to estimate the parameters of the dynamic factor model on a dataset with an arbitrary pattern of missing data. We also extend the model to the case with a serially correlated idiosyncratic component. The framework allows us to handle efficiently and in an automatic manner sets of indicators characterized by different publication delays, frequencies and sample lengths. This can be relevant, for example, for young economies for which many indicators have been compiled only recently. We evaluate the methodology in a Monte Carlo experiment and we apply it to nowcasting of the euro area gross domestic product.

330 citations


Posted Content
TL;DR: In this article, a two-stage efficient algorithm for multi-class crowd labeling problems is proposed, where the first stage uses the spectral method to obtain an initial estimate of parameters, and the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm.
Abstract: Crowdsourcing is a popular paradigm for effectively collecting labels at low cost. The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.
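
A minimal sketch of the EM stage described above, assuming a soft majority-vote initialization in place of the paper's spectral initialization; the array layout and variable names are illustrative.

```python
import numpy as np

def dawid_skene_em(L, n_classes, iters=50):
    """EM for the Dawid-Skene model.

    L : (n_items, n_workers) integer label matrix, -1 where a worker gave no label.
    Returns posterior class probabilities q (n_items, n_classes) and
    per-worker confusion matrices pi (n_workers, n_classes, n_classes).
    """
    n_items, n_workers = L.shape
    # Initialization: soft majority vote (the paper uses a spectral method instead).
    q = np.zeros((n_items, n_classes))
    for k in range(n_classes):
        q[:, k] = (L == k).sum(axis=1)
    q = (q + 1e-6) / (q + 1e-6).sum(axis=1, keepdims=True)

    for _ in range(iters):
        # M-step: class priors and worker confusion matrices from current posteriors.
        rho = q.mean(axis=0)
        pi = np.full((n_workers, n_classes, n_classes), 1e-6)
        for j in range(n_workers):
            for k in range(n_classes):
                mask = L[:, j] == k
                pi[j, :, k] += q[mask].sum(axis=0)
        pi /= pi.sum(axis=2, keepdims=True)
        # E-step: posterior over the true label of each item.
        log_q = np.tile(np.log(rho), (n_items, 1))
        for j in range(n_workers):
            labeled = L[:, j] >= 0
            log_q[labeled] += np.log(pi[j, :, L[labeled, j]])
        q = np.exp(log_q - log_q.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q, pi
```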

272 citations


Journal ArticleDOI
TL;DR: The first model-based clustering algorithm for multivariate functional data is proposed; it is based on the assumption of normality of the principal component scores and is able to take into account the dependence among curves.

239 citations


Journal ArticleDOI
TL;DR: Comparisons are presented to illustrate the relative performance of the restricted and unrestricted models, and demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture, by some applications to three real datasets.
Abstract: Finite mixtures of multivariate skew t (MST) distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy tail behaviour. Recently, they have been exploited as an effective tool for modelling flow cytometric data. A number of algorithms for the computation of the maximum likelihood (ML) estimates for the model parameters of mixtures of MST distributions have been put forward in recent years. These implementations use various characterizations of the MST distribution, which are similar but not identical. While exact implementation of the expectation-maximization (EM) algorithm can be achieved for `restricted' characterizations of the component skew t-distributions, Monte Carlo (MC) methods have been used to fit the `unrestricted' models. In this paper, we review several recent fitting algorithms for finite mixtures of multivariate skew t-distributions, at the same time clarifying some of the connections between the various existing proposals. In particular, recent results have shown that the EM algorithm can be implemented exactly for faster computation of ML estimates for mixtures with unrestricted MST components. The gain in computational time is effected by noting that the semi-infinite integrals on the E-step of the EM algorithm can be put in the form of moments of the truncated multivariate non-central t-distribution, similar to the restricted case, which subsequently can be expressed in terms of the non-truncated form of the central t-distribution function for which fast algorithms are available. We present comparisons to illustrate the relative performance of the restricted and unrestricted models, and demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture, by some applications to three real datasets.

233 citations


Journal ArticleDOI
23 Jan 2014-Energies
TL;DR: In this paper, the authors proposed a novel RUL prediction method for lithium-ion batteries based on the Wiener process with measurement error (WPME), which used the truncated normal distribution (TND) based modeling approach for the estimated degradation state and obtained an exact and closed-form RUL distribution by simultaneously considering the measurement uncertainty and the distribution of the estimated drift parameter.
Abstract: Remaining useful life (RUL) prediction is central to the prognostics and health management (PHM) of lithium-ion batteries. This paper proposes a novel RUL prediction method for lithium-ion batteries based on the Wiener process with measurement error (WPME). First, we use the truncated normal distribution (TND) based modeling approach for the estimated degradation state and obtain an exact and closed-form RUL distribution by simultaneously considering the measurement uncertainty and the distribution of the estimated drift parameter. Then, the traditional maximum likelihood estimation (MLE) method for population-based parameter estimation is modified to improve the estimation efficiency. Additionally, we analyze the relationship between the classic MLE method and the combination of the Bayesian updating algorithm and the expectation maximization algorithm for real-time RUL prediction. Interestingly, it is found that the result of the combination algorithm is equal to that of the classic MLE method. Inspired by this observation, a heuristic algorithm for real-time parameter updating is presented. Finally, numerical examples and a case study of lithium-ion batteries are provided to substantiate the superiority of the proposed RUL prediction method.
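
For intuition about the Wiener-process ingredient, the sketch below shows a basic maximum likelihood fit of the drift and diffusion parameters of X(t) = mu*t + sigma*B(t) from equally spaced measurements of a single path; it deliberately omits the paper's measurement-error and truncated-normal refinements, and the data are simulated placeholders.

```python
import numpy as np

def wiener_mle(x, dt):
    """MLE of drift mu and diffusion sigma^2 for X(t) = mu*t + sigma*B(t),
    from one degradation path x sampled at a fixed interval dt
    (independent Gaussian increments with mean mu*dt and variance sigma^2*dt)."""
    dx = np.diff(x)
    mu_hat = dx.mean() / dt
    sigma2_hat = ((dx - mu_hat * dt) ** 2).mean() / dt
    return mu_hat, sigma2_hat

# Placeholder capacity-fade path for a single cell.
rng = np.random.default_rng(1)
dt, mu, sigma = 1.0, 0.02, 0.05
x = np.cumsum(mu * dt + sigma * np.sqrt(dt) * rng.normal(size=500))
print(wiener_mle(x, dt))   # approaches (0.02, 0.0025) as the path grows

# A crude first-passage RUL estimate to a failure threshold w, ignoring estimation
# and measurement uncertainty, is then (w - x[-1]) / mu_hat.
```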

202 citations


Journal ArticleDOI
TL;DR: This paper provides a primer on maximum likelihood and some important extensions which have proven useful in epidemiologic research, and which reveal connections between maximum likelihood and Bayesian methods.
Abstract: The method of maximum likelihood is widely used in epidemiology, yet many epidemiologists receive little or no education in the conceptual underpinnings of the approach. Here we provide a primer on maximum likelihood and some important extensions which have proven useful in epidemiologic research, and which reveal connections between maximum likelihood and Bayesian methods. For a given data set and probability model, maximum likelihood finds values of the model parameters that give the observed data the highest probability. As with all inferential statistical methods, maximum likelihood is based on an assumed model and cannot account for bias sources that are not controlled by the model or the study design. Maximum likelihood is nonetheless popular, because it is computationally straightforward and intuitive and because maximum likelihood estimators have desirable large-sample properties in the (largely fictitious) case in which the model has been correctly specified. Here, we work through an example to illustrate the mechanics of maximum likelihood estimation and indicate how improvements can be made easily with commercial software. We then describe recent extensions and generalizations which are better suited to observational health research and which should arguably replace standard maximum likelihood as the default method.
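
As a concrete illustration of choosing the parameter value that gives the observed data the highest probability, here is a minimal sketch comparing the closed-form binomial MLE with a numerical maximization of the same likelihood; the counts are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data: 35 cases observed among 200 exposed individuals.
k, n = 35, 200

def negloglik(p):
    # Binomial log-likelihood up to a constant: k*log(p) + (n-k)*log(1-p)
    return -(k * np.log(p) + (n - k) * np.log(1 - p))

numeric = minimize_scalar(negloglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print("closed form:", k / n)          # 0.175
print("numeric MLE:", numeric.x)      # ~0.175, the value giving the data highest probability
```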

154 citations


Journal ArticleDOI
TL;DR: This work marks an important step in the non-Gaussian model-based clustering and classification direction, and a variant of the EM algorithm is developed for parameter estimation by exploiting the relationship with the generalized inverse Gaussian distribution.
Abstract: A mixture of shifted asymmetric Laplace distributions is introduced and used for clustering and classification. A variant of the EM algorithm is developed for parameter estimation by exploiting the relationship with the generalized inverse Gaussian distribution. This approach is mathematically elegant and relatively computationally straightforward. Our novel mixture modelling approach is demonstrated on both simulated and real data to illustrate clustering and classification applications. In these analyses, our mixture of shifted asymmetric Laplace distributions performs favourably when compared to the popular Gaussian approach. This work, which marks an important step in the non-Gaussian model-based clustering and classification direction, concludes with discussion as well as suggestions for future work.

151 citations


Journal ArticleDOI
TL;DR: In this paper, the authors identify an unknown scale parameter ηf that is critical to the identification for consistency and propose a three-step quasi-maximum likelihood procedure with non-Gaussian likelihood functions.
Abstract: The non-Gaussian maximum likelihood estimator is frequently used in GARCH models with the intention of capturing heavy-tailed returns. However, unless the parametric likelihood family contains the true likelihood, the estimator is inconsistent due to density misspecification. To correct this bias, we identify an unknown scale parameter ηf that is critical to the identification for consistency and propose a three-step quasi-maximum likelihood procedure with non-Gaussian likelihood functions. This novel approach is consistent and asymptotically normal under weak moment conditions. Moreover, it achieves better efficiency than the Gaussian alternative, particularly when the innovation error has heavy tails. We also summarize and compare the values of the scale parameter and the asymptotic efficiency for estimators based on different choices of likelihood functions with an increasing level of heaviness in the innovation tails. Numerical studies confirm the advantages of the proposed approach.

Journal ArticleDOI
TL;DR: The logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Abstract: Clustering is an important means of data mining based on separating data categories by similar features. Unlike classification algorithms, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Logistic regression extends linear regression analysis to category-type dependent variables, using a linear combination of independent variables to predict the probability that an event occurs. However, classifying all data by means of logistic regression analysis alone cannot guarantee the accuracy of the results. In this paper, logistic regression analysis is applied to EM clusters and to K-means clusters for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
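
A minimal sketch of the pipeline described above: cluster the samples with an EM-fitted Gaussian mixture and with K-means, then fit a logistic regression within each cluster; the feature matrix here is a simulated placeholder rather than the red-wine dataset, and the evaluation rule is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder feature matrix X (stand-in for physico-chemical measurements)
# and binary quality label y; substitute the real red-wine dataset here.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

for name, clusterer in [("EM (Gaussian mixture)", GaussianMixture(n_components=3, random_state=0)),
                        ("K-means", KMeans(n_clusters=3, n_init=10, random_state=0))]:
    labels = clusterer.fit_predict(X)        # unsupervised cluster assignments
    accs = []
    for c in np.unique(labels):
        idx = labels == c
        if np.bincount(y[idx], minlength=2).min() < 3:   # need both classes for 3-fold CV
            continue
        # Fit a separate logistic regression inside each cluster.
        accs.append(cross_val_score(LogisticRegression(max_iter=1000),
                                    X[idx], y[idx], cv=3).mean())
    print(name, "mean within-cluster accuracy:", np.round(np.mean(accs), 3))
```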

Journal ArticleDOI
TL;DR: An overview of the theory of 4D image reconstruction for emission tomography is given, and maximum likelihood or maximum a posteriori (MAP) estimation of either linear or non-linear model parameters can be achieved in image space after carrying out a conventional expectation maximization update of the dynamic image series, using a Kullback-Leibler distance metric.
Abstract: An overview of the theory of 4D image reconstruction for emission tomography is given along with a review of the current state of the art, covering both positron emission tomography and single photon emission computed tomography (SPECT). By viewing 4D image reconstruction as a matter of either linear or non-linear parameter estimation for a set of spatiotemporal functions chosen to approximately represent the radiotracer distribution, the areas of so-called 'fully 4D' image reconstruction and 'direct kinetic parameter estimation' are unified within a common framework. Many choices of linear and non-linear parameterization of these functions are considered (including the important case where the parameters have direct biological meaning), along with a review of the algorithms which are able to estimate these often non-linear parameters from emission tomography data. The other crucial components to image reconstruction (the objective function, the system model and the raw data format) are also covered, but in less detail due to the relatively straightforward extension from their corresponding components in conventional 3D image reconstruction. The key unifying concept is that maximum likelihood or maximum a posteriori (MAP) estimation of either linear or non-linear model parameters can be achieved in image space after carrying out a conventional expectation maximization (EM) update of the dynamic image series, using a Kullback-Leibler distance metric (comparing the modeled image values with the EM image values), to optimize the desired parameters. For MAP, an image-space penalty for regularization purposes is required. The benefits of 4D and direct reconstruction reported in the literature are reviewed, and furthermore demonstrated with simple simulation examples. It is clear that the future of reconstructing dynamic or functional emission tomography images, which often exhibit high levels of spatially correlated noise, should ideally exploit these 4D approaches.

Journal ArticleDOI
TL;DR: Parsimonious skew-t and skew-normal analogues of the GPCM family that employ an eigenvalue decomposition of a scale matrix are introduced and are compared to existing models in both unsupervised and semi-supervised classification frameworks.

Journal ArticleDOI
TL;DR: A family of multivariate heavy-tailed distributions is proposed that allows variable marginal amounts of tailweight, can account for a variety of shapes, and has a simple tractable form with a closed-form probability density function whatever the dimension.
Abstract: We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The originality comes from introducing multidimensional instead of univariate scale variables for the mixture of scaled Gaussian family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular case of Pearson type VII and t tails. For these latter cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.

Journal ArticleDOI
TL;DR: This paper addresses the problem of in-network distributed estimation for sparse vectors and develops several distributed sparse recursive least-squares (RLS) algorithms based on the maximum likelihood framework, with the expectation-maximization algorithm used to numerically solve the sparse estimation problem.
Abstract: Distributed estimation over networks has received much attention in recent years due to its broad applicability. Many signals in nature present high level of sparsity, which contain only a few large coefficients among many negligible ones. In this paper, we address the problem of in-network distributed estimation for sparse vectors, and develop several distributed sparse recursive least-squares (RLS) algorithms. The proposed algorithms are based on the maximum likelihood framework, and the expectation-maximization algorithm, with the aid of thresholding operators, is used to numerically solve the sparse estimation problem. To improve the estimation performance, the thresholding operators related to l0- and l1-norms with real-time self-adjustable thresholds are derived. With these thresholding operators, we can exploit the underlying sparsity to implement the distributed estimation with low computational complexity and information exchange amount among neighbors. The sparsity-promoting intensity is also adaptively adjusted so that a good performance of the sparse solution can be achieved. Both theoretical analysis and numerical simulations are presented to show the effectiveness of the proposed algorithms.

Proceedings Article
08 Dec 2014
TL;DR: Experimental results demonstrate that the proposed algorithm for multi-class crowd labeling problems is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.
Abstract: The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.

Journal ArticleDOI
TL;DR: The first maximum likelihood solution to handle the cases where measurements from different participants may be conflicting is provided and is shown to outperform previous work used for corroborating observations, the state-of-the-art fact-finding baselines, as well as simple heuristics such as majority voting.
Abstract: This article addresses the challenge of truth discovery from noisy social sensing data. The work is motivated by the emergence of social sensing as a data collection paradigm of growing interest, where humans perform sensory data collection tasks. Unlike the case with well-calibrated and well-tested infrastructure sensors, humans are less reliable, and the likelihood that participants' measurements are correct is often unknown a priori. Given a set of human participants of unknown trustworthiness together with their sensory measurements, we pose the question of whether one can use this information alone to determine, in an analytically founded manner, the probability that a given measurement is true. In our previous conference paper, we offered the first maximum likelihood solution to the aforesaid truth discovery problem for corroborating observations only. In contrast, this article extends the conference paper and provides the first maximum likelihood solution to handle the cases where measurements from different participants may be conflicting. The article focuses on binary measurements. The approach is shown to outperform our previous work used for corroborating observations, the state-of-the-art fact-finding baselines, as well as simple heuristics such as majority voting.

Book ChapterDOI
06 Sep 2014
TL;DR: A probabilistic generative model and its associated algorithm are proposed to jointly register multiple point sets; all sets are treated as realizations of a Gaussian mixture and the registration is cast into a clustering problem.
Abstract: This paper describes a probabilistic generative model and its associated algorithm to jointly register multiple point sets. The vast majority of state-of-the-art registration techniques select one of the sets as the “model” and perform pairwise alignments between the other sets and this set. The main drawback of this mode of operation is that there is no guarantee that the model-set is free of noise and outliers, which contaminates the estimation of the registration parameters. Unlike previous work, the proposed method treats all the point sets on an equal footing: they are realizations of a Gaussian mixture (GMM) and the registration is cast into a clustering problem. We formally derive an EM algorithm that estimates both the GMM parameters and the rotations and translations that map each individual set onto the “central” model. The mixture means play the role of the registered set of points while the variances provide rich information about the quality of the registration. We thoroughly validate the proposed method with challenging datasets, we compare it with several state-of-the-art methods, and we show its potential for fusing real depth data.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the semiparametric inference of the simple Gamma-process model and a random effects variant, where the maximum likelihood estimates of the parameters were obtained through the EM algorithm and the bootstrap was used to construct confidence intervals.
Abstract: This article investigates the semiparametric inference of the simple Gamma-process model and a random-effects variant. Maximum likelihood estimates of the parameters are obtained through the EM algorithm. The bootstrap is used to construct confidence intervals. A simulation study reveals that an estimation based on the full likelihood method is more efficient than the pseudo likelihood method. In addition, a score test is developed to examine the existence of random effects under the semiparametric scenario. A comparison study using a fatigue-crack growth dataset shows that performance of a semiparametric estimation is comparable to the parametric counterpart. This article has supplementary material online.

Journal ArticleDOI
TL;DR: A one-to-one correspondence between the IRLS algorithms and a class of Expectation-Maximization algorithms for constrained maximum likelihood estimation under a Gaussian scale mixture (GSM) distribution is demonstrated.
Abstract: In this paper, we study the theoretical properties of iteratively re-weighted least squares (IRLS) algorithms and their utility in sparse signal recovery in the presence of noise. We demonstrate a one-to-one correspondence between the IRLS algorithms and a class of Expectation-Maximization (EM) algorithms for constrained maximum likelihood estimation under a Gaussian scale mixture (GSM) distribution. The EM formalism, as well as the connection to GSMs, allows us to establish that the IRLS algorithms minimize smooth versions of the lν 'norms', for 0 < ν ≤ 1. We leverage EM theory to show that the limit points of the sequence of IRLS iterates are stationary points of the smooth lν 'norm' minimization problem on the constraint set. We employ techniques from Compressive Sampling (CS) theory to show that the IRLS algorithm is stable, if the limit point of the iterates coincides with the global minimizer. We further characterize the convergence rate of the IRLS algorithm, which implies global linear convergence for ν = 1 and local super-linear convergence for ν < 1. We demonstrate our results via simulation experiments. The simplicity of IRLS, along with the theoretical guarantees provided in this contribution, make a compelling case for its adoption as a standard tool for sparse signal recovery.
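
A minimal sketch of the noiseless IRLS iteration whose limit points the paper characterizes via the EM/Gaussian-scale-mixture correspondence; the epsilon-smoothing schedule and parameter values are illustrative assumptions.

```python
import numpy as np

def irls_sparse(A, b, nu=1.0, iters=100, eps=1.0):
    """Iteratively re-weighted least squares for min ||x||_nu^nu  s.t.  A x = b.

    Each iteration solves a weighted least-squares problem whose weights come
    from the previous iterate; eps is a smoothing parameter that is shrunk over
    the iterations (this smoothing is what links IRLS to minimizing a smooth
    version of the l_nu 'norm')."""
    m, n = A.shape
    x = A.T @ np.linalg.solve(A @ A.T, b)          # least-norm initialization
    for _ in range(iters):
        w = (x ** 2 + eps) ** (1 - nu / 2)          # weights ~ |x_i|^(2 - nu)
        AW = A * w                                  # equals A @ diag(w)
        x = w * (A.T @ np.linalg.solve(AW @ A.T, b))
        eps = max(eps * 0.8, 1e-10)                 # anneal the smoothing
    return x

# Usage on a small compressive-sampling instance:
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.normal(size=5)
b = A @ x_true
x_hat = irls_sparse(A, b, nu=1.0)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```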

Book
02 Sep 2014
TL;DR: In this book, mixture and classification models and their likelihood estimators are used to measure the robustness of the MAP criterion with respect to the number of components and outliers.
Abstract: Introduction Mixture and classification models and their likelihood estimators General consistency and asymptotic normality Local likelihood estimates Maximum likelihood estimates Notes Mixture models and their likelihood estimators Latent distributions Finite mixture models Identifiable mixture models Asymptotic properties of local likelihood maxima Asymptotic properties of the MLE: constrained nonparametric mixture models Asymptotic properties of the MLE: constrained parametric mixture models Notes Classification models and their criteria Probabilistic criteria for general populations Admissibility and size constraints Steady partitions Elliptical models Normal models Geometric considerations Consistency of the MAP criterion Notes Robustification by trimming Outliers and measures of robustness Outliers The sensitivities Sensitivity of ML estimates of mixture models Breakdown points Trimming the mixture model Trimmed likelihood function of the mixture model Normal components Universal breakdown points of covariance matrices, mixing rates, and means Restricted breakdown point of mixing rates and means Notes Trimming the classification model - the TDC Trimmed MAP classification model Normal case - the Trimmed Determinant Criterion, TDC Breakdown robustness of the constrained TDC Universal breakdown point of covariance matrices and means Restricted breakdown point of the means Notes Algorithms EM algorithm for mixtures General mixtures Normal mixtures Mixtures of multivariate t-distributions Trimming - the EMT algorithm Order of Convergence Acceleration of the mixture EM Notes k-Parameters algorithms General and elliptically symmetric models Steady solutions and trimming Using combinatorial optimization Overall algorithms Notes Hierarchical methods for initial solutions Favorite solutions and cluster validation Scale balance and Pareto solutions Number of components of uncontaminated data Likelihood-ratio tests Using cluster criteria as test statistics Model selection criteria Ridgeline manifold Number of components and outliers Classification trimmed likelihood curves Trimmed BIC Adjusted BIC Cluster validation Separation indices Normality and related tests Visualization Measures of agreement of partitions Stability Notes Variable selection in clustering Irrelevance Definition and general properties The normal case Filters Univariate filters Multivariate filters Wrappers Using the likelihood ratio test Using Bayes factors and their BIC approximations Maximum likelihood subset selection Consistency of the MAP cluster criterion with variable selection Practical guidelines Notes Applications Miscellaneous data sets IRIS data SWISS BILLS STONE FLAKES Gene expression data Supervised and unsupervised methods Combining gene selection and profile clustering Application to the LEUKEMIA data Notes Appendix A: Geometry and linear algebra Appendix B: Topology Appendix C: Analysis Appendix D: Measures and probabilities Appendix E: Probability Appendix F: Statistics Appendix G: Optimization

Journal ArticleDOI
TL;DR: The proposed algorithm can potentially determine the model complexity and avoid the over-fitting problem associated with conventional approaches based on expectation maximization; an analytically tractable approximation to the predictive density of the Bayesian mixture model of vMF distributions is also derived.
Abstract: This paper addresses the Bayesian estimation of the von-Mises Fisher (vMF) mixture model with variational inference (VI). The learning task in VI consists of optimization of the variational posterior distribution. However, the exact solution by VI does not lead to an analytically tractable solution due to the evaluation of intractable moments involving functional forms of the Bessel function in their arguments. To derive a closed-form solution, we further lower bound the evidence lower bound where the bound is tight at one point in the parameter distribution. While having the value of the bound guaranteed to increase during maximization, we derive an analytically tractable approximation to the posterior distribution which has the same functional form as the assigned prior distribution. The proposed algorithm requires no iterative numerical calculation in the re-estimation procedure, and it can potentially determine the model complexity and avoid the over-fitting problem associated with conventional approaches based on the expectation maximization. Moreover, we derive an analytically tractable approximation to the predictive density of the Bayesian mixture model of vMF distributions. The performance of the proposed approach is verified by experiments with both synthetic and real data.

Journal ArticleDOI
TL;DR: A novel family of twelve mixture models with random covariates, nested in the linear t cluster-weighted model (CWM), is introduced for model-based clustering; it provides a unified framework that also includes the linear Gaussian CWM as a special case.

Posted Content
TL;DR: Experimental results over multiple images with different ranges of complexity validate the efficiency of the proposed technique with regard to segmentation accuracy, speed, and robustness, and demonstrate the better performance of the proposed algorithm compared with other well-known methods.
Abstract: This paper explores the use of the Artificial Bee Colony (ABC) algorithm to compute threshold selection for image segmentation. ABC is a heuristic algorithm motivated by the intelligent behavior of honey-bees which has been successfully employed to solve complex optimization problems. In this approach, the 1D histogram of an image is approximated through a Gaussian mixture model whose parameters are calculated by the ABC algorithm. For the approximation scheme, each Gaussian function represents a pixel class and therefore a threshold. Unlike the Expectation Maximization (EM) algorithm, the ABC based method shows fast convergence and low sensitivity to initial conditions. Remarkably, it also avoids the complex, time-consuming computations commonly required by gradient-based methods. Experimental results demonstrate the algorithm's ability to perform automatic multi-threshold selection while showing interesting advantages by comparison to other well-known algorithms.
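
For reference, the sketch below shows the conventional EM-based Gaussian-mixture thresholding that the ABC approach is positioned against: fit a 1D mixture to the pixel intensities and derive thresholds from the fitted components; the threshold rule (midpoints between sorted means) and the data are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def em_thresholds(pixels, n_classes=3):
    """Fit a 1D Gaussian mixture to pixel intensities with EM and place a
    threshold between neighbouring classes (approximated here by the midpoint
    between sorted component means)."""
    gmm = GaussianMixture(n_components=n_classes, random_state=0)
    gmm.fit(pixels.reshape(-1, 1))
    means = np.sort(gmm.means_.ravel())
    # Simple illustrative rule: thresholds halfway between neighbouring means.
    return (means[:-1] + means[1:]) / 2

# Placeholder "image": three intensity populations drawn from Gaussians.
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(60, 10, 4000),
                         rng.normal(130, 12, 4000),
                         rng.normal(200, 8, 4000)])
print(em_thresholds(pixels, n_classes=3))   # roughly [95, 165]
```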

Journal ArticleDOI
TL;DR: A robust estimation procedure for mixture linear regression models is proposed by assuming that the error terms follow a Laplace distribution; it is implemented by an EM algorithm which incorporates two types of missing information: the mixture class membership and the latent variable.

Journal ArticleDOI
TL;DR: A modified version of the proposed method, which fits the mixture regression based on the t-distribution to the data after adaptively trimming high leverage points, has high efficiency due to the adaptive choice of degrees of freedom.

Proceedings Article
21 Jun 2014
TL;DR: This paper provides a new initialization procedure for EM, based on finding the leading two eigenvectors of an appropriate matrix, and shows that a re-sampled version of the EM algorithm provably converges to the correct vectors, under natural assumptions on the sampling distribution, and with nearly optimal sample complexity.
Abstract: Mixed linear regression involves the recovery of two (or more) unknown vectors from unlabeled linear measurements; that is, where each sample comes from exactly one of the vectors, but we do not know which one. It is a classic problem, and the natural and empirically most popular approach to its solution has been the EM algorithm. As in other settings, this is prone to bad local minima; however, each iteration is very fast (alternating between guessing labels, and solving with those labels). In this paper we provide a new initialization procedure for EM, based on finding the leading two eigenvectors of an appropriate matrix. We then show that with this, a re-sampled version of the EM algorithm provably converges to the correct vectors, under natural assumptions on the sampling distribution, and with nearly optimal (unimprovable) sample complexity. This provides not only the first characterization of EM's performance, but also much lower sample complexity as compared to both standard (randomly initialized) EM, and other methods for this problem.
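
A minimal sketch of the EM iteration for two-component mixed linear regression, using a random initialization where the paper instead uses the leading two eigenvectors of an appropriate matrix; the variable names and the Gaussian noise model are illustrative.

```python
import numpy as np

def mixed_linreg_em(X, y, iters=100, seed=0):
    """EM for y_i = <x_i, beta_{z_i}> + noise, with z_i in {0, 1} unobserved."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    betas = rng.normal(size=(2, d))        # the paper initializes from the top two
    sigma2, pi = y.var(), 0.5              # eigenvectors of a moment matrix instead
    for _ in range(iters):
        # E-step: responsibility of component 1 for each sample.
        resid = y[:, None] - X @ betas.T                        # (n, 2)
        logp = -resid ** 2 / (2 * sigma2) + np.log([1 - pi, pi])
        w = np.exp(logp - logp.max(axis=1, keepdims=True))
        w = w[:, 1] / w.sum(axis=1)
        # M-step: weighted least squares for each regression vector.
        for k, wk in enumerate([1 - w, w]):
            Xw = X * wk[:, None]
            betas[k] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        pi = w.mean()
        resid = y[:, None] - X @ betas.T
        sigma2 = ((1 - w) * resid[:, 0] ** 2 + w * resid[:, 1] ** 2).mean()
    return betas, pi, sigma2

# Usage on simulated data with two ground-truth vectors:
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))
b0, b1 = rng.normal(size=10), rng.normal(size=10)
z = rng.integers(0, 2, 2000)
y = np.where(z == 0, X @ b0, X @ b1) + 0.1 * rng.normal(size=2000)
betas, pi, sigma2 = mixed_linreg_em(X, y)
```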

Journal ArticleDOI
TL;DR: A novel approach is developed based on both the Monte Carlo expectation-maximization algorithm and importance sampling to calculate the maximum likelihood estimator and asymptotic variance-covariance matrix of the Markov-switching GARCH model.

Journal ArticleDOI
21 Aug 2014-Test
TL;DR: A comprehensive overview of latent Markov (LM) models for the analysis of longitudinal categorical data is provided and methods for selecting the number of states and for path prediction are outlined.
Abstract: We provide a comprehensive overview of latent Markov (LM) models for the analysis of longitudinal categorical data. We illustrate the general version of the LM model which includes individual covariates, and several constrained versions. Constraints make the model more parsimonious and allow us to consider and test hypotheses of interest. These constraints may be put on the conditional distribution of the response variables given the latent process (measurement model) or on the distribution of the latent process (latent model). We also illustrate in detail maximum likelihood estimation through the Expectation–Maximization algorithm, which may be efficiently implemented by recursions taken from the hidden Markov literature. We outline methods for obtaining standard errors for the parameter estimates. We also illustrate methods for selecting the number of states and for path prediction. Finally, we mention issues related to Bayesian inference of LM models. Possibilities for further developments are given among the concluding remarks.
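
A minimal sketch of the forward-backward recursions that implement the E-step of the EM algorithm for a basic latent Markov (hidden Markov) chain with categorical responses; individual covariates and the constrained variants discussed above are omitted, and the transition/emission values are illustrative.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """E-step recursions for a latent Markov chain.

    obs: length-T sequence of observed categories
    pi : (k,) initial state probabilities
    A  : (k, k) transition matrix, A[i, j] = P(state j | state i)
    B  : (k, m) emission matrix, B[i, o] = P(observation o | state i)
    Returns the smoothed state posteriors gamma (T, k) and the log-likelihood."""
    T, k = len(obs), len(pi)
    alpha = np.zeros((T, k)); beta = np.zeros((T, k)); c = np.zeros(T)
    # Forward pass with scaling to avoid underflow.
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    # Backward pass reusing the same scaling constants.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, np.log(c).sum()

# Usage: 2 latent states, 3 response categories.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
gamma, loglik = forward_backward([0, 0, 2, 1, 2], pi, A, B)
```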