
Showing papers on "Expectation–maximization algorithm published in 2016"


Journal ArticleDOI
TL;DR: This updated version of mclust adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.
Abstract: Finite mixture models are being used increasingly to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of purposes of analysis. Recently, version 5 of the package has been made available on CRAN. This updated version adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.
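As a rough illustration of the workflow the abstract describes (fitting Gaussian mixtures under several covariance structures and selecting among them), here is a minimal Python sketch using scikit-learn; mclust itself is an R package with a much richer set of covariance models, and the data here are synthetic stand-ins:

```python
# Sketch: fit Gaussian mixtures over a grid of component counts and covariance
# structures and keep the BIC-best model. scikit-learn's four covariance_type
# options cover only a rough subset of mclust's 14 covariance models.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 0.5, (150, 2))])

best = None
for k in range(1, 7):
    for cov in ("full", "tied", "diag", "spherical"):
        gm = GaussianMixture(n_components=k, covariance_type=cov,
                             n_init=5, random_state=0).fit(X)
        bic = gm.bic(X)  # lower is better in scikit-learn's sign convention
        if best is None or bic < best[0]:
            best = (bic, k, cov)

print(f"selected: {best[1]} components, '{best[2]}' covariance, BIC={best[0]:.1f}")
```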

1,841 citations


Journal ArticleDOI
TL;DR: This article proposes a framework where deep neural networks are used to model the source spectra and combined with the classical multichannel Gaussian model to exploit the spatial information and presents its application to a speech enhancement problem.
Abstract: This article addresses the problem of multichannel audio source separation. We propose a framework where deep neural networks (DNNs) are used to model the source spectra and combined with the classical multichannel Gaussian model to exploit the spatial information. The parameters are estimated in an iterative expectation-maximization (EM) fashion and used to derive a multichannel Wiener filter. We present an extensive experimental study to show the impact of different design choices on the performance of the proposed technique. We consider different cost functions for the training of DNNs, namely the probabilistically motivated Itakura–Saito divergence, and also Kullback–Leibler, Cauchy, mean squared error, and phase-sensitive cost functions. We also study the number of EM iterations and the use of multiple DNNs, where each DNN aims to improve the spectra estimated by the preceding EM iteration. Finally, we present its application to a speech enhancement problem. The experimental results show the benefit of the proposed multichannel approach over a single-channel DNN-based approach and the conventional multichannel nonnegative matrix factorization-based iterative EM algorithm.
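The multichannel Wiener filtering step at the heart of this framework has a standard closed form under the local Gaussian model. Below is a minimal numpy sketch of that one step, assuming the DNNs have already produced source power spectra and the EM iterations have estimated spatial covariances; the array names and shapes are illustrative, not taken from the paper's code:

```python
# Minimal sketch of multichannel Wiener filtering under the local Gaussian
# model. v[j, f, n] are (assumed given) DNN-estimated source power spectra and
# R[j, f] are (assumed given) EM-estimated spatial covariance matrices.
import numpy as np

def wiener_separate(x, v, R):
    """x: (F, N, I) mixture STFT; v: (J, F, N) source PSDs; R: (J, F, I, I)."""
    J, F, N = v.shape
    I = x.shape[-1]
    s_hat = np.zeros((J, F, N, I), dtype=complex)
    for f in range(F):
        for n in range(N):
            # Mixture covariance is the sum of the source covariances v_j * R_j.
            Rx = sum(v[j, f, n] * R[j, f] for j in range(J))
            Rx_inv = np.linalg.pinv(Rx)
            for j in range(J):
                W = v[j, f, n] * R[j, f] @ Rx_inv  # multichannel Wiener filter
                s_hat[j, f, n] = W @ x[f, n]
    return s_hat
```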

304 citations


Journal ArticleDOI
TL;DR: A novel method for accurate classification of cardiac arrhythmias is proposed, in which morphological and statistical features of individual heartbeats are used to train a classifier, yielding a significant improvement in accuracy compared to other methods.

148 citations


Journal ArticleDOI
TL;DR: The Noisy CNN algorithm speeds training on average because the backpropagation algorithm is a special case of the generalized expectation-maximization (EM) algorithm and because such carefully chosen noise always speeds up the EM algorithm on average.

118 citations


Proceedings Article
01 Jan 2016
TL;DR: It is established that a first-order variant of EM will not converge to strict saddle points almost surely, indicating that the poor performance of the first-order method can be attributed to the existence of bad local maxima rather than bad saddle points.
Abstract: We provide two fundamental results on the population (infinite-sample) likelihood function of Gaussian mixture models with $M \geq 3$ components. Our first main result shows that the population likelihood function has bad local maxima even in the special case of equally-weighted mixtures of well-separated and spherical Gaussians. We prove that the log-likelihood value of these bad local maxima can be arbitrarily worse than that of any global optimum, thereby resolving an open question of Srebro (2007). Our second main result shows that the EM algorithm (or a first-order variant of it) with random initialization will converge to bad critical points with probability at least $1-e^{-\Omega(M)}$. We further establish that a first-order variant of EM will not converge to strict saddle points almost surely, indicating that the poor performance of the first-order method can be attributed to the existence of bad local maxima rather than bad saddle points. Overall, our results highlight the necessity of careful initialization when using the EM algorithm in practice, even when applied in highly favorable settings.
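The practical upshot, sensitivity of EM to initialization even with well-separated spherical components, is easy to observe numerically. A small illustration (not the paper's construction):

```python
# Illustration: with M >= 3 well-separated spherical components, randomly
# initialized EM frequently stalls at a local maximum whose log-likelihood
# is visibly below that of the best run.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(m, 1.0, (300, 2)) for m in means])

lls = []
for seed in range(20):
    gm = GaussianMixture(n_components=3, covariance_type="spherical",
                         init_params="random", n_init=1,
                         random_state=seed).fit(X)
    lls.append(gm.score(X))  # average log-likelihood per sample

print(f"best run: {max(lls):.3f}, worst run: {min(lls):.3f}")
```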

107 citations


Journal ArticleDOI
22 Mar 2016-PLOS ONE
TL;DR: This work introduces the Expectation-Maximization binary Clustering (EMbC), a general purpose, unsupervised approach to multivariate data clustering, and focuses on the suitability of the EMbC algorithm for behavioural annotation of movement data.
Abstract: The growing capacity to process and store animal tracks has spurred the development of new methods to segment animal trajectories into elementary units of movement. Key challenges for movement trajectory segmentation are to (i) minimize the need of supervision, (ii) reduce computational costs, (iii) minimize the need of prior assumptions (e.g. simple parametrizations), and (iv) capture biologically meaningful semantics, useful across a broad range of species. We introduce the Expectation-Maximization binary Clustering (EMbC), a general purpose, unsupervised approach to multivariate data clustering. The EMbC is a variant of the Expectation-Maximization Clustering (EMC), a clustering algorithm based on the maximum likelihood estimation of a Gaussian mixture model. This is an iterative algorithm with a closed form step solution and hence a reasonable computational cost. The method looks for a good compromise between statistical soundness and ease and generality of use (by minimizing prior assumptions and favouring the semantic interpretation of the final clustering). Here we focus on the suitability of the EMbC algorithm for behavioural annotation of movement data. We show and discuss the EMbC outputs in both simulated trajectories and empirical movement trajectories including different species and different tracking methodologies. We use synthetic trajectories to assess the performance of EMbC compared to classic EMC and Hidden Markov Models. Empirical trajectories allow us to explore the robustness of the EMbC to data loss and data inaccuracies, and assess the relationship between EMbC output and expert label assignments. Additionally, we suggest a smoothing procedure to account for temporal correlations among labels, and a proper visualization of the output for movement trajectories. Our algorithm is available as an R-package with a set of complementary functions to ease the analysis.
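The authors provide the real implementation as the R package EMbC. As a loose Python analogue of the binary-clustering idea (clusters tagged low/high in each variable), consider the sketch below; the features, data, and tagging rule are hypothetical and this is not the EMbC algorithm itself:

```python
# Rough illustrative analogue of binary clustering on movement features:
# fit a 4-component Gaussian mixture to (speed, turn) and tag each cluster
# low/high per variable. The real method is the authors' R package 'EMbC'.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Hypothetical per-step features: speed and absolute turning angle.
speed = np.concatenate([rng.gamma(2, 0.2, 500), rng.gamma(9, 0.5, 500)])
turn = np.concatenate([rng.uniform(0, 3.1, 500), rng.uniform(0, 0.6, 500)])
X = np.column_stack([speed, turn])

gm = GaussianMixture(n_components=4, random_state=0).fit(X)

# Binary semantics: tag each cluster low (L) or high (H) in each variable by
# comparing its mean against the across-cluster midpoint.
mid = gm.means_.min(0) + (gm.means_.max(0) - gm.means_.min(0)) / 2
for k, m in enumerate(gm.means_):
    tag = "".join("H" if m[d] > mid[d] else "L" for d in range(2))
    print(f"cluster {k}: speed/turn = {tag}  (mean {m.round(2)})")
```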

107 citations


Journal ArticleDOI
TL;DR: Lee and McLachlan as mentioned in this paper introduced a finite mixture of canonical fundamental skew $t$ (CFUST) distributions for a model-based approach to clustering where the clusters are asymmetric and possibly long-tailed.
Abstract: This paper introduces a finite mixture of canonical fundamental skew $t$ (CFUST) distributions for a model-based approach to clustering where the clusters are asymmetric and possibly long-tailed (in: Lee and McLachlan, arXiv:1401.8182 [stat.ME], 2014b). The family of CFUST distributions includes the restricted multivariate skew $t$ and unrestricted multivariate skew $t$ distributions as special cases. In recent years, a few versions of the multivariate skew $t$ (MST) mixture model have been put forward, together with various EM-type algorithms for parameter estimation. These formulations adopted either a restricted or unrestricted characterization for their MST densities. In this paper, we examine a natural generalization of these developments, employing the CFUST distribution as the parametric family for the component distributions, and point out that the restricted and unrestricted characterizations can be unified under this general formulation. We show that an exact implementation of the EM algorithm can be achieved for the CFUST distribution and mixtures of this distribution, and present some new analytical results for a conditional expectation involved in the E-step.

105 citations


Posted Content
TL;DR: In this paper, the authors discuss the usefulness of simulation techniques in inference procedures, like the maximum likelihood method, the generalized method of moments or pseudo maximum likelihood methods, from the point of view of consistency and asymptotic normality.
Abstract: In this paper we discuss the usefulness, for models with heterogeneity, of simulation techniques in inference procedures, like the maximum likelihood method, the generalized method of moments or pseudo maximum likelihood methods. These procedures are studied from the point of view of consistency, asymptotic normality, convergence rates and possible asymptotic bias. We carefully distinguish the case where the simulations are different for all the observations from the case where they are identical. [French title: "Inference fondee sur des simulations dans des modeles avec heterogeneite", i.e., simulation-based inference in models with heterogeneity.]

105 citations


Journal ArticleDOI
Sehyun Tak, Soomin Woo, Hwasoo Yeo
TL;DR: Comparison of the proposed method to others, such as nearest historical data and expectation maximization, across varying missing data types, missing ratios, traffic states, and day types shows that the proposed algorithm achieves better performance in almost all cases.
Abstract: Missing data imputation is a critical step in data processing for intelligent transportation systems. This paper proposes a data-driven imputation method for sections of road, based on their spatial and temporal correlation, using a modified $k$-nearest neighbor method. This computing-distributable imputation method differs from conventional algorithms in that it imputes the missing data of a section, covering multiple correlated sensors, at once. This greatly increases computational efficiency compared with other methods, which impute data sensor by sensor. In addition, the geometrical property of each section is conserved; in other words, the continuity of the traffic properties that each sensor captures is conserved, thereby increasing imputation accuracy. This paper shows results and analysis of a comparison of the proposed method to others, such as nearest historical data and expectation maximization, by varying missing data type, missing ratio, traffic state, and day type. The results show that the proposed algorithm achieves better performance in almost all of the missing types, missing ratios, day types, and traffic states. When the missing data type cannot be identified or various missing types are mixed, the proposed algorithm shows accurate and stable imputation performance.
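For readers who want a concrete baseline, here is a generic k-nearest-neighbor imputation sketch in Python; note that the paper's method is a modified, section-level kNN exploiting spatio-temporal correlation, which this plain scikit-learn baseline does not implement, and the data here are synthetic:

```python
# Generic kNN imputation baseline (scikit-learn's KNNImputer); the paper's
# section-based, spatio-temporal variant is not reproduced here.
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(3)
flow = rng.normal(1000, 150, (96, 8))   # 96 time slots x 8 detectors (synthetic)
mask = rng.random(flow.shape) < 0.2     # 20% missing completely at random
flow_missing = np.where(mask, np.nan, flow)

imputer = KNNImputer(n_neighbors=5, weights="distance")
flow_filled = imputer.fit_transform(flow_missing)

rmse = np.sqrt(np.mean((flow_filled[mask] - flow[mask]) ** 2))
print(f"RMSE on imputed entries: {rmse:.1f}")
```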

101 citations


Journal ArticleDOI
TL;DR: This work formulates the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models, and devises an EM-type algorithm that converges stably, even in the presence of time-dependent covariates.
Abstract: Interval censoring arises frequently in clinical, epidemiological, financial and sociological studies, where the event or failure of interest is known only to occur within an interval induced by periodic monitoring. We formulate the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models. We consider nonparametric maximum likelihood estimation for this class of models with an arbitrary number of monitoring times for each subject. We devise an EM-type algorithm that converges stably, even in the presence of time-dependent covariates, and show that the estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. Finally, we demonstrate the performance of our procedures through simulation studies and application to an HIV/AIDS study conducted in Thailand.

98 citations


Proceedings Article
01 Jan 2016
TL;DR: In this article, a global analysis of EM for specific models in which the observations comprise an i.i.d. sample from a mixture of two Gaussians is provided.
Abstract: Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models. However, EM, which is an iterative algorithm based on the maximum likelihood principle, is generally only guaranteed to find stationary points of the likelihood objective, and these points may be far from any maximizer. This article addresses this disconnect between the statistical principles behind EM and its algorithmic properties. Specifically, it provides a global analysis of EM for specific models in which the observations comprise an i.i.d. sample from a mixture of two Gaussians. This is achieved by (i) studying the sequence of parameters from idealized execution of EM in the infinite sample limit, and fully characterizing the limit points of the sequence in terms of the initial parameters; and then (ii) based on this convergence analysis, establishing statistical consistency (or lack thereof) for the actual sequence of parameters produced by EM.
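For concreteness, in the equally weighted symmetric case the EM iteration analyzed here specializes to a simple closed form; the display below is a standard formulation under assumptions we add for illustration (identity covariances, with the mean as the only unknown):

```latex
% Model: x_i drawn i.i.d. from 0.5*N(mu, I) + 0.5*N(-mu, I), unknown mean mu.
\[
  w_i(\mu) = \frac{e^{-\|x_i-\mu\|^2/2}}
                  {e^{-\|x_i-\mu\|^2/2} + e^{-\|x_i+\mu\|^2/2}},
  \qquad
  \mu^{(t+1)} = \frac{1}{n}\sum_{i=1}^{n}\bigl(2\,w_i(\mu^{(t)}) - 1\bigr)\,x_i .
\]
% The population EM operator replaces the empirical average with the expectation
% under the true mixture; the paper characterizes the limit points of that
% idealized sequence as a function of the initial parameters.
```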

01 Jan 2016
TL;DR: In this article, the authors studied distributions which are generated from exponential families by loss of information due to the fact that only some function of the exponential family variable is observable.
Abstract: In this paper we study such classes of distributions which are generated from exponential families by loss of information due to the fact that only some function of the exponential family variable is observable. Examples of such classes are mixtures and convolutions of exponential type distributions as well as grouped, censored and folded distributions. Their common structure is analysed. The existence is demonstrated of a $n^{1/2}$-consistent, asymptotically normally distributed and asymptotically efficient root of the likelihood equation which asymptotically maximizes the likelihood in every compact subset of the parameter space, imposing only the natural requirement that the information matrix is positive definite. It is further shown that even the weaker requirement of local parameter identifiability, which admits of application to non-regular cases, is sufficient for the existence of consistent maximum likelihood estimates. Finally the subject of large sample tests based on maximum likelihood estimates is touched upon.

Journal ArticleDOI
TL;DR: A new mixture model that associates a weight with each observed point is introduced, and two EM algorithms are derived: one considering a fixed weight for each observation, and one treating each weight as a random variable following a gamma distribution.
Abstract: Data clustering has received a lot of attention and numerous methods, algorithms and software packages are available. Among these techniques, parametric finite-mixture models play a central role due to their interesting mathematical properties and to the existence of maximum-likelihood estimators based on expectation-maximization (EM). In this paper we propose a new mixture model that associates a weight with each observed point. We introduce the weighted-data Gaussian mixture and we derive two EM algorithms. The first one considers a fixed weight for each observation. The second one treats each weight as a random variable following a gamma distribution. We propose a model selection method based on a minimum message length criterion, provide a weight initialization strategy, and validate the proposed algorithms by comparing them with several state of the art parametric and non-parametric clustering techniques. We also demonstrate the effectiveness and robustness of the proposed clustering technique in the presence of heterogeneous data, namely audio-visual scene analysis.

Journal ArticleDOI
Xiangyong Cao, Qian Zhao, Deyu Meng, Yang Chen, Zongben Xu
TL;DR: In this article, a new low rank matrix factorization (LRMF) model was proposed by assuming noise as a mixture of exponential power (MoEP) distributions; a penalized MoEP (PMoEP) model was then proposed by combining the penalized likelihood method with MoEP distributions.
Abstract: Many computer vision problems can be posed as learning a low-dimensional subspace from high-dimensional data. The low rank matrix factorization (LRMF) represents a commonly utilized subspace learning strategy. Most of the current LRMF techniques are constructed on optimization problems using $L_1$-norm and $L_2$-norm losses, which mainly deal with Laplace and Gaussian noise, respectively. To make LRMF capable of adapting to more complex noise, this paper proposes a new LRMF model by assuming noise to follow a mixture of exponential power (MoEP) distributions, and then proposes a penalized MoEP (PMoEP) model by combining the penalized likelihood method with MoEP distributions. This setting makes the learned LRMF model capable of automatically fitting the real noise through MoEP distributions. Each component in this mixture distribution is adapted from a series of preliminary super- or sub-Gaussian candidates. Moreover, to capture the local continuity of noise components, we embed a Markov random field into the PMoEP model and thereby propose the PMoEP-MRF model. A generalized expectation maximization (GEM) algorithm and a variational GEM algorithm are designed to infer all parameters involved in the proposed PMoEP and PMoEP-MRF models, respectively. The superiority of our methods is demonstrated by extensive experiments on synthetic data, face modeling, hyperspectral image denoising, and background subtraction.

Journal ArticleDOI
TL;DR: In this paper, a mixture of multivariate contaminated normal distributions is developed for model-based clustering, where each cluster has a parameter controlling the proportion of mild outliers and one specifying the degree of contamination.
Abstract: A mixture of multivariate contaminated normal distributions is developed for model-based clustering. In addition to the parameters of the classical normal mixture, our contaminated mixture has, for each cluster, a parameter controlling the proportion of mild outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding a flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all the members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. Using a large-scale simulation study, the behavior of the proposed approach is investigated and comparison with well-established finite mixtures is provided. The performance of this novel family of models is also illustrated on artificial and real data.
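For reference, the contaminated normal component density the abstract describes can be written explicitly; this is the standard form (notation ours), with both parameters estimated from the data rather than fixed a priori:

```latex
% Each cluster: "good" points with probability alpha, mild outliers (same mean,
% inflated covariance) with probability 1 - alpha.
\[
  f(\mathbf{x};\,\boldsymbol{\mu},\Sigma,\alpha,\eta)
  = \alpha\,\phi(\mathbf{x};\,\boldsymbol{\mu},\Sigma)
  + (1-\alpha)\,\phi(\mathbf{x};\,\boldsymbol{\mu},\eta\Sigma),
  \qquad \alpha\in(0.5,1),\ \eta>1,
\]
% where phi is the multivariate normal density, 1 - alpha plays the role of the
% proportion of mild outliers, and eta the degree of contamination.
```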

Journal ArticleDOI
TL;DR: A novel expectation-maximization (EM) algorithm is developed for finding the maximum likelihood estimates of the parameters in the proportional hazards model, which uses a monotone spline representation to approximate the unknown nondecreasing cumulative baseline hazard function.
Abstract: The proportional hazards model (PH) is currently the most popular regression model for analyzing time-to-event data. Despite its popularity, the analysis of interval-censored data under the PH model can be challenging using many available techniques. This article presents a new method for analyzing interval-censored data under the PH model. The proposed approach uses a monotone spline representation to approximate the unknown nondecreasing cumulative baseline hazard function. Formulating the PH model in this fashion results in a finite number of parameters to estimate while maintaining substantial modeling flexibility. A novel expectation-maximization (EM) algorithm is developed for finding the maximum likelihood estimates of the parameters. The derivation of the EM algorithm relies on a two-stage data augmentation involving latent Poisson random variables. The resulting algorithm is easy to implement, robust to initialization, enjoys quick convergence, and provides closed-form variance estimates. The performance of the proposed regression methodology is evaluated through a simulation study, and is further illustrated using data from a large population-based randomized trial designed and sponsored by the United States National Cancer Institute.

Journal ArticleDOI
TL;DR: The kernel method is extended to incorporate anatomical side information into the PET reconstruction model, resulting in reduced noise at a matched contrast level compared with the conventional ML expectation maximization algorithm.
Abstract: This paper extends the kernel method that was proposed previously for dynamic PET reconstruction, to incorporate anatomical side information into the PET reconstruction model. In contrast to existing methods that incorporate anatomical information using a penalized likelihood framework, the proposed method incorporates this information in the simpler maximum likelihood (ML) formulation and is amenable to ordered subsets. The new method also does not require any segmentation of the anatomical image to obtain edge information. We compare the kernel method with the Bowsher method for anatomically-aided PET image reconstruction through a simulated data set. Computer simulations demonstrate that the kernel method offers advantages over the Bowsher method in region of interest quantification. Additionally the kernel method is applied to a 3D patient data set. The kernel method results in reduced noise at a matched contrast level compared with the conventional ML expectation maximization algorithm.
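A minimal sketch of a kernelized ML-EM update consistent with this description: the image is represented as x = K·a and standard ML-EM is run on the coefficient vector. The system matrix A, sinogram y, and the construction of K from anatomical features are all assumed given, and the names are ours, not the paper's:

```python
# Sketch of a kernelized ML-EM update. A is the (assumed given) system matrix,
# y the measured sinogram, K a kernel matrix built from anatomical features.
import numpy as np

def kernel_mlem(y, A, K, n_iter=50, eps=1e-12):
    a = np.ones(K.shape[1])
    sens = K.T @ (A.T @ np.ones(len(y)))    # sensitivity in coefficient space
    for _ in range(n_iter):
        x = K @ a                           # current image estimate
        ratio = y / np.maximum(A @ x, eps)  # measured / forward-projected
        a *= (K.T @ (A.T @ ratio)) / np.maximum(sens, eps)
    return K @ a                            # final reconstructed image
```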

Journal ArticleDOI
TL;DR: An alternative approach is proposed for flexible modeling of heavy-tailed, skewed insurance loss data exhibiting multimodality, such as the well-known data set on Danish Fire losses, based on finite mixture models of univariate distributions where all K components of the mixture are assumed to be from the same parametric family.
Abstract: In this paper, we propose an alternative approach for flexible modeling of heavy-tailed, skewed insurance loss data exhibiting multimodality, such as the well-known data set on Danish Fire losses. Our approach is based on finite mixture models of univariate distributions where all K components of the mixture are assumed to be from the same parametric family. Six models are developed with components from parametric, non-Gaussian families of distributions previously used in actuarial modeling: Burr, Gamma, Inverse Burr, Inverse Gaussian, Log-normal, and Weibull. Some of these component distributions are by themselves suitable for modeling data with heavy tails, but do not cover the case of multimodality. Estimation of the models with a fixed number of components K is proposed based on the EM algorithm using three different initialization strategies: distance-based, k-means, and random initialization. Model selection is possible using information criteria, and the fitted models can be used to estimate risk measures for the data, such as VaR and TVaR. The results of the mixture models are compared to the composite Weibull models considered in recent literature as the best models for modeling Danish Fire insurance losses. The results of this paper provide new valuable tools in the area of insurance loss modeling and risk evaluation.

Journal Article
TL;DR: In this article, a two-stage efficient algorithm for multi-class crowd labeling problems is proposed, where the first stage uses the spectral method to obtain an initial estimate of parameters, and the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm.
Abstract: Crowdsourcing is a popular paradigm for effectively collecting labels at low cost. The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of parameters. Then the second stage refines the estimation by optimizing the objective function of the Dawid-Skene estimator via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. We conduct extensive experiments on synthetic and real datasets. Experimental results demonstrate that the proposed algorithm is comparable to the most accurate empirical approach, while outperforming several other recently proposed methods.
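To make the second stage concrete, here is a compact sketch of EM for the Dawid-Skene model; for brevity it initializes from per-item vote shares rather than the spectral estimate the paper prescribes, and all names are illustrative:

```python
# Sketch of the EM stage for the Dawid-Skene model. The paper initializes it
# with a spectral estimate; this sketch uses smoothed vote shares instead.
import numpy as np

def dawid_skene_em(labels, n_classes, n_iter=50):
    """labels: (n_items, n_workers) int array, -1 where a worker gave no label."""
    n_items, n_workers = labels.shape
    # Initialization: posterior over true labels from (smoothed) vote shares.
    q = np.ones((n_items, n_classes))
    for i in range(n_items):
        for l in labels[i][labels[i] >= 0]:
            q[i, l] += 1.0
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and each worker's confusion matrix,
        # conf[w, c, l] = P(worker w reports l | true class c).
        pi = q.mean(axis=0)
        conf = np.full((n_workers, n_classes, n_classes), 1e-6)
        for i in range(n_items):
            for w in np.where(labels[i] >= 0)[0]:
                conf[w, :, labels[i, w]] += q[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # E-step: recompute posteriors from the current prior and confusions.
        logq = np.tile(np.log(pi), (n_items, 1))
        for i in range(n_items):
            for w in np.where(labels[i] >= 0)[0]:
                logq[i] += np.log(conf[w, :, labels[i, w]])
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q, conf
```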

Journal ArticleDOI
TL;DR: This paper considers the competing cause scenario and, assuming the time-to-event to follow the Weibull distribution, derives the necessary steps of the expectation maximization algorithm for estimating the parameters of different cure rate survival models.
Abstract: Recently, a flexible cure rate survival model has been developed by assuming the number of competing causes of the event of interest to follow the Conway-Maxwell-Poisson distribution. This model includes some of the well-known cure rate models discussed in the literature as special cases. Data obtained from cancer clinical trials are often right censored and expectation maximization algorithm can be used in this case to efficiently estimate the model parameters based on right censored data. In this paper, we consider the competing cause scenario and assuming the time-to-event to follow the Weibull distribution, we derive the necessary steps of the expectation maximization algorithm for estimating the parameters of different cure rate survival models. The standard errors of the maximum likelihood estimates are obtained by inverting the observed information matrix. The method of inference developed here is examined by means of an extensive Monte Carlo simulation study. Finally, we illustrate the proposed methodology with a real data on cancer recurrence.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper introduces a method for constructing compact generative representations of PCD at multiple levels of detail, explicitly enforcing sparsity among points and mixtures, which leads to a highly parallel hierarchical Expectation Maximization (EM) algorithm well-suited for the GPU and real-time execution.
Abstract: Finding meaningful, structured representations of 3D point cloud data (PCD) has become a core task for spatial perception applications. In this paper we introduce a method for constructing compact generative representations of PCD at multiple levels of detail. As opposed to deterministic structures such as voxel grids or octrees, we propose probabilistic subdivisions of the data through local mixture modeling, and show how these subdivisions can provide a maximum likelihood segmentation of the data. The final representation is hierarchical, compact, parametric, and statistically derived, facilitating run-time occupancy calculations through stochastic sampling. Unlike traditional deterministic spatial subdivision methods, our technique enables dynamic creation of voxel grids according to the application's needs. In contrast to other generative models for PCD, we explicitly enforce sparsity among points and mixtures, a technique which we call expectation sparsification. This leads to a highly parallel hierarchical Expectation Maximization (EM) algorithm well-suited for the GPU and real-time execution. We explore the trade-offs between model fidelity and model size at various levels of detail, our tests showing favorable performance when compared to octree and NDT-based methods.
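A toy version of the recursive subdivision idea can be written in a few lines (CPU-only, via scikit-learn); the paper's expectation sparsification and GPU-parallel EM are not reproduced here, and the point cloud is a synthetic stand-in:

```python
# Toy recursive mixture subdivision of a point cloud: fit a small GMM, then
# recursively refit each cluster's points at the next level of detail.
import numpy as np
from sklearn.mixture import GaussianMixture

def hierarchical_gmm(points, depth, branch=8, min_points=100):
    """Return a list of (level, GaussianMixture) pairs fit to nested subsets."""
    if depth == 0 or len(points) < min_points:
        return []
    gm = GaussianMixture(n_components=branch, covariance_type="full",
                         random_state=0).fit(points)
    nodes = [(depth, gm)]
    labels = gm.predict(points)
    for k in range(branch):
        nodes += hierarchical_gmm(points[labels == k], depth - 1,
                                  branch, min_points)
    return nodes

pts = np.random.default_rng(4).normal(size=(5000, 3))  # stand-in point cloud
model = hierarchical_gmm(pts, depth=2)
print(f"{len(model)} mixture nodes across levels")
```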

Journal ArticleDOI
TL;DR: The virtual loop method is employed to improve the quality of video-based vehicle counting, improving both the vehicle segmentation result and vehicle occlusion detection.

Journal ArticleDOI
TL;DR: New multivariate generalized Birnbaum-Saunders regression models are proposed, with parameters estimated using the maximum likelihood method and the EM algorithm, and are illustrated with real-world multivariate fatigue data.
Abstract: Univariate Birnbaum-Saunders models have been widely applied to fatigue studies. Calculation of fatigue life is of great importance in determining the reliability of materials. We propose and derive new multivariate generalized Birnbaum-Saunders regression models. We use the maximum likelihood method and the EM algorithm to estimate their parameters. We carry out a simulation study to evaluate the performance of the corresponding maximum likelihood estimators. We illustrate the new models with real-world multivariate fatigue data.

Proceedings ArticleDOI
16 May 2016
TL;DR: The proposed method is based upon Gaussian Mixture Models (GMM) that are learned using an incremental Expectation Maximization clustering algorithm trained online using exemplars provided by a slow, conventional kinematic-based collision detection routine.
Abstract: This paper presents a new approach for fast collision detection in high dimensional configuration spaces for Rapidly-exploring Random Trees (RRT) motion planning. The proposed method is based upon Gaussian Mixture Models (GMM) that are learned using an incremental Expectation Maximization clustering algorithm trained online using exemplars provided by a slow, conventional kinematic-based collision detection routine. The number of collision checks needed can be drastically reduced using a biased random sampling from the learned GMM distribution, and the learned models are continually refined and improved as the RRT planning algorithm proceeds. Our proposed method is demonstrated on several example applications, and experimental results show marked improvement in computational efficiency over previous approaches.
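A rough sketch of the sampling-side idea: a GMM fit to in-collision exemplars steers samples away from likely collisions so the expensive exact check runs less often. The paper's incremental online EM is replaced here by a one-off batch fit, and the obstacle, dimensions, and threshold rule are all hypothetical:

```python
# Sketch: bias sampling away from regions the collision GMM marks as dense,
# then confirm the remaining samples with the exact (slow) checker.
import numpy as np
from sklearn.mixture import GaussianMixture

def slow_collision_check(q):               # hypothetical expensive routine
    return np.linalg.norm(q - 0.5) < 0.25  # toy obstacle in a 2-D unit square

rng = np.random.default_rng(5)
exemplars = rng.random((2000, 2))          # planar configurations
colliding = np.array([slow_collision_check(q) for q in exemplars])

gmm = GaussianMixture(n_components=3, random_state=0).fit(exemplars[colliding])
# Log-density threshold below which a sample is treated as "probably free".
threshold = np.quantile(gmm.score_samples(exemplars[colliding]), 0.05)

def sample_free(n):
    """Rejection-sample likely-free configurations, verified by the exact check."""
    out = []
    while len(out) < n:
        q = rng.random(2)
        if gmm.score_samples(q[None])[0] < threshold and not slow_collision_check(q):
            out.append(q)
    return np.array(out)

print(sample_free(5))
```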

Journal ArticleDOI
TL;DR: A noise resilient probabilistic model is presented for active learning of a Gaussian process classifier from crowds, i.e., a set of noisy labelers, that explicitly models both the overall label noise and the expertise level of each individual labeler with two levels of flip models.
Abstract: We present a noise resilient probabilistic model for active learning of a Gaussian process classifier from crowds, i.e., a set of noisy labelers. It explicitly models both the overall label noise and the expertise level of each individual labeler with two levels of flip models. Expectation propagation is adopted for efficient approximate Bayesian inference of our probabilistic model for classification, based on which, a generalized EM algorithm is derived to estimate both the global label noise and the expertise of each individual labeler. The probabilistic nature of our model immediately allows the adoption of the prediction entropy for active selection of data samples to be labeled, and active selection of high quality labelers based on their estimated expertise to label the data. We apply the proposed model for four visual recognition tasks, i.e., object category recognition, multi-modal activity recognition, gender recognition, and fine-grained classification, on four datasets with real crowd-sourced labels from the Amazon Mechanical Turk. The experiments clearly demonstrate the efficacy of the proposed model. In addition, we extend the proposed model with the Predictive Active Set Selection Method to speed up the active learning system, whose efficacy is verified by conducting experiments on the first three datasets. The results show our extended model can not only preserve a higher accuracy, but also achieve a higher efficiency.

Journal ArticleDOI
01 Jun 2016-Test
TL;DR: In this paper, the authors propose a mixture regression model under the class of scale mixtures of skew-normal distributions, which allows to model data with great flexibility, accommodating skewness and heavy tails, and develop a simple EM-type algorithm to perform maximum likelihood inference of the parameters of the proposed model.
Abstract: The traditional estimation of mixture regression models is based on the assumption of normality (symmetry) of component errors and thus is sensitive to outliers, heavy-tailed errors and/or asymmetric errors. In this work we present a proposal to deal with these issues simultaneously in the context of mixture regression by extending the classic normal model, assuming that the random errors follow a scale mixture of skew-normal distributions. This approach allows us to model data with great flexibility, accommodating skewness and heavy tails. The main virtue of considering the mixture regression models under the class of scale mixtures of skew-normal distributions is that they have a nice hierarchical representation which allows easy implementation of inference. We develop a simple EM-type algorithm to perform maximum likelihood inference of the parameters of the proposed model. In order to examine the robust aspect of this flexible model against outlying observations, some simulation studies are also presented. Finally, a real data set is analyzed, illustrating the usefulness of the proposed method.

Journal ArticleDOI
TL;DR: The “optimally tuned robust improper maximum likelihood estimator” (OTRIMLE) for robust clustering, based on the multivariate Gaussian model for clusters, is introduced, together with a comprehensive simulation study comparing the OTRIMLE to maximum likelihood in Gaussian mixtures with and without a noise component, mixtures of t-distributions, and the TCLUST approach for trimmed clustering.
Abstract: The two main topics of this article are the introduction of the “optimally tuned robust improper maximum likelihood estimator” (OTRIMLE) for robust clustering based on the multivariate Gaussian model for clusters, and a comprehensive simulation study comparing the OTRIMLE to maximum likelihood in Gaussian mixtures with and without noise component, mixtures of t-distributions, and the TCLUST approach for trimmed clustering. The OTRIMLE uses an improper constant density for modeling outliers and noise. This can be chosen optimally so that the nonnoise part of the data looks as close to a Gaussian mixture as possible. Some deviation from Gaussianity can be traded in for lowering the estimated noise proportion. Covariance matrix constraints and computation of the OTRIMLE are also treated. In the simulation study, all methods are confronted with setups in which their model assumptions are not exactly fulfilled, and to evaluate the experiments in a standardized way by misclassification rates, a new mode...

Journal ArticleDOI
TL;DR: A probabilistic form of the widely used Partial Least Squares (PLS) model is formulated for regression modeling and application in industrial processes; the Bayes rule is applied and an efficient Expectation-Maximization algorithm is designed.

Journal ArticleDOI
TL;DR: For structured types of correlations, such as exchangeable or first-order auto-regressive (AR-1) correlation, the EM algorithm outperforms the multiple imputation approach in terms of both estimation bias and efficiency.

Journal ArticleDOI
01 Oct 2016-Energy
TL;DR: A nested expectation maximization algorithm is introduced, developing a technique to estimate wind energy potential; the experimental results indicate that the proposed model outperforms single and mixture models.