
Showing papers on "Expectation–maximization algorithm published in 2005"


Journal ArticleDOI
TL;DR: A generative mixture-model approach to clustering directional data based on the von Mises-Fisher distribution, which arises naturally for data distributed on the unit hypersphere, and derives and analyzes two variants of the Expectation Maximization framework for estimating the mean and concentration parameters of this mixture.
Abstract: Several large scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multi-variate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentration parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants. Empirical results on clustering of high-dimensional text and gene-expression data based on a mixture of vMF distributions show that the ability to estimate the concentration parameter for each vMF component, which is not present in existing approaches, yields superior results, especially for difficult clustering tasks in high-dimensional spaces.

869 citations
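As a concrete illustration of the estimation step described above, the following Python sketch (not the authors' code) fits a single vMF component: the mean direction is the normalized resultant vector, and the concentration is recovered from the commonly used approximate inversion of the Bessel-function ratio, kappa ≈ (r̄d − r̄³)/(1 − r̄²), where r̄ is the mean resultant length. A full mixture interleaves this M-step with E-step responsibilities.

```python
import numpy as np

def fit_vmf(X):
    """Estimate mean direction and concentration for one von Mises-Fisher component.

    X : (n, d) array whose rows are unit vectors (L2-normalized data).
    """
    n, d = X.shape
    resultant = X.sum(axis=0)
    r = np.linalg.norm(resultant)
    mu = resultant / r                    # estimated mean direction
    r_bar = r / n                         # mean resultant length, in (0, 1)
    # Approximate inversion of the Bessel-function ratio A_d(kappa) = r_bar.
    kappa = (r_bar * d - r_bar**3) / (1.0 - r_bar**2)
    return mu, kappa

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: noisy directions around a fixed axis, re-projected onto the sphere.
    X = np.array([1.0, 0.0, 0.0]) + 0.2 * rng.standard_normal((500, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    mu, kappa = fit_vmf(X)
    print("mean direction:", np.round(mu, 3), " kappa:", round(kappa, 1))
```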


Journal ArticleDOI
Dar-Shyang Lee
TL;DR: An effective scheme to improve the convergence rate without compromising model stability is proposed by replacing the global, static retention factor with an adaptive learning rate calculated for each Gaussian at every frame.
Abstract: Adaptive Gaussian mixtures have been used for modeling nonstationary temporal distributions of pixels in video surveillance applications. However, a common problem for this approach is balancing between model convergence speed and stability. This paper proposes an effective scheme to improve the convergence rate without compromising model stability. This is achieved by replacing the global, static retention factor with an adaptive learning rate calculated for each Gaussian at every frame. Significant improvements are shown on both synthetic and real video data. Incorporating this algorithm into a statistical framework for background subtraction leads to an improved segmentation performance compared to a standard method.

867 citations
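To make the per-Gaussian adaptive learning rate concrete, here is a simplified Python sketch. It is not the paper's exact update rule: the match-count-based schedule used below (a large rate at first, decaying toward a small floor) is an assumed stand-in that captures the convergence-versus-stability trade-off for a single grayscale pixel model.

```python
import numpy as np

class AdaptiveGaussian:
    """One Gaussian of a per-pixel background model (1-D grayscale sketch).

    Rather than a single global, static retention factor, the component keeps a
    match count and uses a learning rate that starts large (fast convergence)
    and decays toward a small floor (long-run stability).
    """

    def __init__(self, mean, var=30.0**2, floor=0.005):
        self.mean, self.var = mean, var
        self.count = 1
        self.floor = floor                       # assumed long-run learning rate

    def update(self, x):
        eta = max(self.floor, 1.0 / self.count)  # per-Gaussian adaptive rate
        self.count += 1
        d = x - self.mean
        self.mean += eta * d
        self.var += eta * (d * d - self.var)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    g = AdaptiveGaussian(mean=100.0)             # poor initial estimate
    for _ in range(200):                         # true background is N(128, 5^2)
        g.update(128.0 + 5.0 * rng.standard_normal())
    print(f"mean={g.mean:.1f}  std={g.var ** 0.5:.1f}")
```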


Journal ArticleDOI
TL;DR: A methodology for model selection based on a penalized contrast is developed, and an adaptive choice of the penalty function for automatically estimating the dimension of the model, i.e., the number of change points, is proposed.

554 citations


Journal ArticleDOI
TL;DR: This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions and proves that when these MM algorithms converge, they must converge to a desirable point.
Abstract: Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function is often challenging because it may be nondifferentiable and/or nonconcave. This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions. These algorithms operate by perturbing the penalty function slightly to render it differentiable, then optimizing this differentiable function using a minorize-maximize (MM) algorithm. MM algorithms are useful extensions of the well-known class of EM algorithms, a fact that allows us to analyze the local and global convergence of the proposed algorithm using some of the techniques employed for EM algorithms. In particular, we prove that when our MM algorithms converge, they must converge to a desirable point; we also discuss conditions under which this convergence may be guaranteed. We exploit the Newton-Raphson-like aspect of these algorithms to propose a sandwich estimator for the standard errors of the estimators. Our method performs well in numerical tests.

488 citations
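The perturb-then-minorize idea is small enough to sketch. The Python code below applies a perturbed local quadratic surrogate to a lasso-penalized least-squares problem, so that every M-step reduces to a ridge-type linear solve; it illustrates only the mechanism, whereas the article treats a much broader class of penalties and supplies the convergence theory and standard errors.

```python
import numpy as np

def mm_lasso(X, y, lam, eps=1e-6, n_iter=200, tol=1e-8):
    """MM sketch for lasso-penalized least squares.

    Each |b_j| in the penalty is replaced by a quadratic surrogate built at the
    current iterate (weights lam / (eps + |b_j|)), so the update is a ridge-type
    solve. Illustrative only; not the article's general algorithm.
    """
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]          # start from OLS
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        D = np.diag(lam / (eps + np.abs(b)))          # surrogate penalty weights
        b_new = np.linalg.solve(XtX + n * D, Xty)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, p = 100, 8
    X = rng.standard_normal((n, p))
    true_b = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, -2.0])
    y = X @ true_b + rng.standard_normal(n)
    print(np.round(mm_lasso(X, y, lam=0.1), 2))       # small coefficients shrink toward zero
```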


Journal ArticleDOI
TL;DR: This work examines data that are missing at random and data with nonignorable missingness, and compares four common approaches for inference in generalized linear models with missing covariate data: maximum likelihood (ML), multiple imputation (MI), fully Bayesian (FB), and weighted estimating equations (WEEs).
Abstract: Missing data is a major issue in many applied problems, especially in the biomedical sciences. We review four common approaches for inference in generalized linear models (GLMs) with missing covariate data: maximum likelihood (ML), multiple imputation (MI), fully Bayesian (FB), and weighted estimating equations (WEEs). There is considerable interest in how these four methodologies are related, the properties of each approach, the advantages and disadvantages of each methodology, and computational implementation. We examine both data that are missing at random and data with nonignorable missingness. For ML, we focus on techniques using the EM algorithm, and in particular, discuss the EM by the method of weights and related procedures as discussed by Ibrahim. For MI, we examine the techniques developed by Rubin. For FB, we review approaches considered by Ibrahim et al. For WEE, we focus on the techniques developed by Robins et al. We use a real dataset and a detailed simulation study to compare the four methods.

478 citations


Journal ArticleDOI
TL;DR: A stochastic approximation version of EM for maximum likelihood estimation of a wide class of nonlinear mixed effects models is proposed; it provides an estimator close to the MLE in very few iterations.

452 citations
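A toy Python sketch of the stochastic approximation EM recursion may help fix ideas: the exact E-step is replaced by simulating the latent variables, and the expected sufficient statistics are updated with a decreasing step size before each M-step. The two-component Gaussian mixture below is an illustrative stand-in, not the nonlinear mixed-effects setting treated in the paper.

```python
import numpy as np

def saem_two_means(y, n_iter=300, burn_in=100, seed=0):
    """Toy SAEM: two-component Gaussian mixture with unit variances and equal weights.

    Simulation step: draw the latent labels from their current conditional law.
    Stochastic approximation step: s <- s + gamma_k * (S(y, z) - s), with gamma_k = 1
    during burn-in and 1/k afterwards. M-step: component means from the averaged statistics.
    """
    rng = np.random.default_rng(seed)
    mu = np.array([np.min(y), np.max(y)], dtype=float)       # crude initialization
    s_sum, s_cnt = np.zeros(2), np.zeros(2)
    for k in range(1, n_iter + 1):
        logp = -0.5 * (y[:, None] - mu[None, :]) ** 2
        p2 = 1.0 / (1.0 + np.exp(logp[:, 0] - logp[:, 1]))   # P(label = 2 | y)
        z = (rng.random(len(y)) < p2).astype(int)            # simulated labels
        stat_sum = np.array([y[z == 0].sum(), y[z == 1].sum()])
        stat_cnt = np.array([(z == 0).sum(), (z == 1).sum()], dtype=float)
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        s_sum += gamma * (stat_sum - s_sum)
        s_cnt += gamma * (stat_cnt - s_cnt)
        mu = s_sum / np.maximum(s_cnt, 1e-12)
    return np.sort(mu)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])
    print(np.round(saem_two_means(y), 2))                    # roughly [-2, 3]
```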


Journal ArticleDOI
TL;DR: A general latent variable approach to discrete-time survival analysis of nonrepeatable events such as onset of drug use is proposed, and it is shown how the survival analysis can be formulated as a generalized latent class analysis of event history indicators.
Abstract: This article proposes a general latent variable approach to discrete-time survival analysis of nonrepeatable events such as onset of drug use. It is shown how the survival analysis can be formulated as a generalized latent class analysis of event history indicators. The latent class analysis can use covariates and can be combined with the joint modeling of other outcomes such as repeated measures for a related process. It is shown that conventional discrete-time survival analysis corresponds to a single-class latent class analysis. Multiple-class extensions are proposed, including the special cases of a class of long-term survivors and classes defined by outcomes related to survival. The estimation uses a general latent variable framework, including both categorical and continuous latent variables, and is incorporated in the Mplus program. Estimation is carried out using maximum likelihood via the EM algorithm. Two examples serve as illustrations. The first example concerns recidivism after incarceration in a randomized field experiment. The second example concerns school removal related to the development of aggressive behavior in the classroom.

242 citations


Journal ArticleDOI
TL;DR: The proposed genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data is elitist, which maintains the monotonic convergence property of the EM algorithm.
Abstract: We propose a genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data. This algorithm is capable of selecting the number of components of the model using the minimum description length (MDL) criterion. Our approach benefits from the properties of genetic algorithms (GA) and the EM algorithm by combining both into a single procedure. The population-based stochastic search of the GA explores the search space more thoroughly than the EM method. Therefore, our algorithm is able to escape from locally optimal solutions, since it is less sensitive to its initialization. The GA-EM algorithm is elitist, which maintains the monotonic convergence property of the EM algorithm. The experiments on simulated and real data show that the GA-EM outperforms the EM method since: (1) we obtain a better MDL score while using exactly the same termination condition for both algorithms; and (2) our approach identifies the number of components used to generate the underlying data more often than the EM algorithm.

239 citations
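The MDL criterion at the heart of GA-EM is easy to illustrate on its own. The sketch below replaces the paper's genetic search with a plain sweep over K using ordinary EM fits from scikit-learn, and scores each candidate by -log L + (p/2) log n (i.e. BIC/2), keeping the minimizer.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mdl_select(X, max_k=6, seed=0):
    """Select the number of Gaussian components by a minimum-description-length score.

    Illustrative only: a plain sweep over K with restarted EM fits stands in for
    the genetic search; the MDL score used here equals BIC / 2.
    """
    scores = {}
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(X)
        scores[k] = gm.bic(X) / 2.0          # -log L + (p/2) log n
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    # Three well-separated 2-D clusters.
    X = np.vstack([rng.normal(c, 0.5, size=(200, 2)) for c in (-3.0, 0.0, 3.0)])
    best, scores = mdl_select(X)
    print("selected K =", best)
```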


Journal ArticleDOI
TL;DR: A new methodology for the M-step of the EM algorithm that is based on a novel constrained optimization formulation that shows superior performance in terms of the attained maximum value of the objective function and segmentation accuracy compared to previous implementations of this approach.
Abstract: Gaussian mixture models (GMMs) constitute a well-known type of probabilistic neural networks. One of their many successful applications is in image segmentation, where spatially constrained mixture models have been trained using the expectation-maximization (EM) framework. In this letter, we elaborate on this method and propose a new methodology for the M-step of the EM algorithm that is based on a novel constrained optimization formulation. Numerical experiments using simulated images illustrate the superior performance of our method in terms of the attained maximum value of the objective function and segmentation accuracy compared to previous implementations of this approach.

219 citations


Journal ArticleDOI
TL;DR: This paper advocates the use of graph-based probability models and their associated inference and learning algorithms, and describes how each technique can be applied in a vision model of multiple, occluding objects and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.
Abstract: Research into methods for reasoning under uncertainty is currently one of the most exciting areas of artificial intelligence, largely because it has recently become possible to record, store, and process large amounts of data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition, face detection, speaker identification, and prediction of gene function, it is even more exciting that researchers are on the verge of introducing systems that can perform large-scale combinatorial analyses of data, decomposing the data into interacting components. For example, computational methods for automatic scene analysis are now emerging in the computer vision community. These methods decompose an input image into its constituent objects, lighting conditions, motion patterns, etc. Two of the main challenges are finding effective representations and models in specific applications and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based probability models and their associated inference and learning algorithms. We review exact techniques and various approximate, computationally efficient techniques, including iterated conditional modes, the expectation maximization (EM) algorithm, Gibbs sampling, the mean field method, variational techniques, structured variational techniques and the sum-product algorithm ("loopy" belief propagation). We describe how each technique can be applied in a vision model of multiple, occluding objects and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.

205 citations


Journal ArticleDOI
TL;DR: In this paper, a didactic discussion of covariance structure modeling in longitudinal studies with missing data is presented, and use of the full-information maximum likelihood method is considered for model fitting, parameter estimation, and hypothesis testing purposes, particularly when interest centers on patterns of temporal change and their covariates and predictors.
Abstract: A didactic discussion of covariance structure modeling in longitudinal studies with missing data is presented. Use of the full-information maximum likelihood method is considered for model fitting, parameter estimation, and hypothesis testing purposes, particularly when interest centers on patterns of temporal change as well as their covariates and predictors. The approach is illustrated with an application of the popular level-and-shape model to data from a cognitive intervention study of elderly adults.

Journal ArticleDOI
TL;DR: An algorithm is described that capitalizes on using tools from item response theory for scale linking, item fit, and parameter estimation that applies to polytomous response variables as well as to skills with two or more proficiency levels.
Abstract: Probabilistic models with more than one latent variable are designed to report profiles of skills or cognitive attributes. Testing programs want to offer additional information beyond what a single test score can provide using these skill profiles. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods like Markov chain Monte Carlo (MCMC), since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a class of general diagnostic models (GDMs) that can be estimated with customary ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The model and the algorithm for estimating model parameters handle missing responses directly, without the need to collapse categories or recode the data. Within the class of GDMs, compensatory as well as noncompensatory models may be specified. This report uses one member of this class of diagnostic models, a compensatory diagnostic model that is parameterized similarly to the generalized partial credit model (GPCM). Many well-known models, such as uni- and multivariate versions of the Rasch model and the two-parameter logistic item response theory (2PL-IRT) model, the GPCM, and the FACETS model, as well as a variety of skill profile models, are special cases of this member of the class of GDMs. This paper describes an algorithm that capitalizes on using tools from item response theory for scale linking, item fit, and parameter estimation. In addition to an introduction to the class of GDMs and to the partial credit instance of this class for dichotomous and polytomous skill profiles, this paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL® Internet-based testing (iBT).

Journal ArticleDOI
TL;DR: Adaptive G-H quadrature, combined with mean and covariance adjustments at each iteration of an EM algorithm, produces an accurate, fast-converging solution with as few as two points per dimension.
Abstract: Although the Bock–Aitkin likelihood-based estimation method for factor analysis of dichotomous item response data has important advantages over classical analysis of item tetrachoric correlations, a serious limitation of the method is its reliance on fixed-point Gauss-Hermite (G-H) quadrature in the solution of the likelihood equations and likelihood-ratio tests. When the number of latent dimensions is large, computational considerations require that the number of quadrature points per dimension be few. But with large numbers of items, the dispersion of the likelihood, given the response pattern, becomes so small that the likelihood cannot be accurately evaluated with the sparse fixed points in the latent space. In this paper, we demonstrate that substantial improvement in accuracy can be obtained by adapting the quadrature points to the location and dispersion of the likelihood surfaces corresponding to each distinct pattern in the data. In particular, we show that adaptive G-H quadrature, combined with mean and covariance adjustments at each iteration of an EM algorithm, produces an accurate, fast-converging solution with as few as two points per dimension. Evaluations of this method with simulated data are shown to yield accurate recovery of the generating factor loadings for models of up to eight dimensions. Unlike an earlier application of adaptive Gibbs sampling to this problem by Meng and Schilling, the simulations also confirm the validity of the present method in calculating likelihood-ratio chi-square statistics for determining the number of factors required in the model. Finally, we apply the method to a sample of real data from a test of teacher qualifications.
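The adaptive quadrature idea is worth a small numerical sketch. The Python code below is a one-dimensional stand-in for the paper's multidimensional EM setting: Gauss-Hermite nodes are recentered at the mode of the integrand for one response pattern and rescaled by its curvature, after which even two points give a usable approximation of that pattern's marginal likelihood.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def log_integrand(theta, y, b):
    """log of [ N(theta; 0, 1) prior * product of Rasch-type Bernoulli likelihoods ]."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi) + loglik

def adaptive_gh(y, b, n_points=2):
    """Adaptive Gauss-Hermite approximation of one pattern's marginal likelihood."""
    # Locate the mode and curvature of the integrand (crude grid + finite differences).
    grid = np.linspace(-6, 6, 2001)
    mode = grid[np.argmax([log_integrand(t, y, b) for t in grid])]
    h = 1e-3
    curv = (log_integrand(mode + h, y, b) - 2 * log_integrand(mode, y, b)
            + log_integrand(mode - h, y, b)) / h**2
    sigma = 1.0 / np.sqrt(-curv)
    # Recenter and rescale the standard Gauss-Hermite rule (weight exp(-x^2)).
    x, w = hermgauss(n_points)
    theta = mode + np.sqrt(2.0) * sigma * x
    g = np.exp([log_integrand(t, y, b) for t in theta])
    return np.sqrt(2.0) * sigma * np.sum(w * np.exp(x**2) * g)

if __name__ == "__main__":
    y = np.array([1, 1, 0, 1, 0, 1, 1, 1])        # one response pattern
    b = np.linspace(-1.5, 1.5, 8)                 # item difficulties
    print("2-point adaptive:", adaptive_gh(y, b, 2))
    print("50-point reference:", adaptive_gh(y, b, 50))
```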

Journal Article
TL;DR: This work studies EM variants in which the E-step is not performed exactly, either to obtain improved rates of convergence, or due to approximations needed to compute statistics under a model family over which E-steps cannot be realized.
Abstract: The EM algorithm is widely used to develop iterative parameter estimation procedures for statistical models. In cases where these procedures strictly follow the EM formulation, the convergence properties of the estimation procedures are well understood. In some instances there are practical reasons to develop procedures that do not strictly fall within the EM framework. We study EM variants in which the E-step is not performed exactly, either to obtain improved rates of convergence, or due to approximations needed to compute statistics under a model family over which E-steps cannot be realized. Since these variants are not EM procedures, the standard (G)EM convergence results do not apply to them. We present an information geometric framework for describing such algorithms and analyzing their convergence properties. We apply this framework to analyze the convergence properties of incremental EM and variational EM. For incremental EM, we discuss conditions under which these algorithms converge in likelihood. For variational EM, we show how the E-step approximation prevents convergence to local maxima in likelihood.
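For intuition about the incremental variant analyzed here, the following Python sketch runs incremental EM on a two-component Gaussian mixture with unit variances: the E-step visits one block of data at a time, the block's old contribution to the expected sufficient statistics is swapped for the new one, and the M-step is applied immediately. An illustrative toy, not the paper's information-geometric analysis.

```python
import numpy as np

def incremental_em(y, n_blocks=10, n_sweeps=20, seed=0):
    """Incremental EM for a two-component 1-D Gaussian mixture (unit variances)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    mu = np.array([np.quantile(y, 0.25), np.quantile(y, 0.75)])
    pi = np.array([0.5, 0.5])
    blocks = np.array_split(rng.permutation(n), n_blocks)
    R = np.full((n_blocks, 2), np.nan)      # per-block sums of responsibilities
    S = np.full((n_blocks, 2), np.nan)      # per-block responsibility-weighted sums of y
    tot_R, tot_S = np.zeros(2), np.zeros(2)
    for _ in range(n_sweeps):
        for b, idx in enumerate(blocks):
            yb = y[idx]
            # Partial E-step for this block only.
            logp = np.log(pi) - 0.5 * (yb[:, None] - mu[None, :]) ** 2
            r = np.exp(logp - logp.max(axis=1, keepdims=True))
            r /= r.sum(axis=1, keepdims=True)
            new_R, new_S = r.sum(axis=0), (r * yb[:, None]).sum(axis=0)
            # Swap the block's old statistics for the new ones.
            if not np.isnan(R[b, 0]):
                tot_R -= R[b]
                tot_S -= S[b]
            R[b], S[b] = new_R, new_S
            tot_R += new_R
            tot_S += new_S
            # Immediate M-step from the running totals.
            mu = tot_S / tot_R
            pi = tot_R / tot_R.sum()
    return mu, pi

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    y = np.concatenate([rng.normal(-2, 1, 400), rng.normal(2, 1, 200)])
    mu, pi = incremental_em(y)
    print("means:", np.round(mu, 2), " weights:", np.round(pi, 2))
```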

Journal ArticleDOI
TL;DR: In this article, a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest is presented and examined, and convergence properties of this iterative (fixed-point) algorithm are derived for estimators obtained using only a finite number of iterations.
Abstract: This article presents and examines a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest. The method circumvents the need to compute second-order derivatives of the full likelihood function. It exploits the structure of certain models that yield a natural decomposition of a very complicated likelihood function. In this decomposition, the first part is a log-likelihood from a simply analyzed model, and the second part is used to update estimates from the first part. Convergence properties of this iterative (fixed-point) algorithm are examined, and asymptotics are derived for estimators obtained using only a finite number of iterations. Illustrative examples considered in the article include multivariate Gaussian copula models, nonnormal random-effects models, generalized linear mixed models, and state-space models. Properties of the algorithm and of estimators are evaluated in simulation studies on a bivariate copula model and a nonnormal...

Journal ArticleDOI
TL;DR: In this article, various statistical properties and reliability aspects of a two-parameter distribution with decreasing and increasing failure rate are explored; the model includes the exponential-geometric distribution, and expressions for the asymptotic variances and covariances of the estimators are derived.

Journal ArticleDOI
TL;DR: A general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals is described, and an application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer.
Abstract: The associations between haplotypes and disease phenotypes offer valuable clues about the genetic determinants of complex diseases. It is highly challenging to make statistical inferences about these associations because of the unknown gametic phase in genotype data. We describe a general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals. We consider all possible phenotypes (including disease indicator, quantitative trait, and potentially censored age at onset of disease) and all commonly used study designs (including cross-sectional, case-control, cohort, nested case-control, and case-cohort). The effects of haplotypes on phenotype are characterized by appropriate regression models, which allow various genetic mechanisms and gene-environment interactions. We present the likelihood functions for all study designs and disease phenotypes under Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We provide simple and efficient numerical algorithms to calculate the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. Extensive simulation studies demonstrate that the proposed methods perform well in realistic situations. An application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer.

Journal ArticleDOI
TL;DR: In this article, the authors presented algebraic algorithms for computing all critical points of the likelihood function with the aim of identifying the local maxima in the probability simplex, and the maximum likelihood degree of a generic complete intersection.
Abstract: Given a model in algebraic statistics and data, the likelihood function is a rational function on a projective variety. Algebraic algorithms are presented for computing all critical points of this function, with the aim of identifying the local maxima in the probability simplex. Applications include models specified by rank conditions on matrices and the Jukes–Cantor models of phylogenetics. The maximum likelihood degree of a generic complete intersection is also determined.

01 Jan 2005
TL;DR: It is shown that under general, simple, verifiable conditions, any EM sequence is convergent if the maximizer at the M-step is unique; this condition is almost always satisfied in practice.
Abstract: It is well known that the likelihood sequence of the EM algorithm is non-decreasing and convergent (Dempster, Laird and Rubin (1977)), and that the limit points of the EM algorithm are stationary points of the likelihood (Wu (1983)), but the issue of the convergence of the EM sequence itself has not been completely settled. In this paper we close this gap and show that under general, simple, verifiable conditions, any EM sequence is convergent. In pathological cases we show that the sequence is cycling in the limit among a finite number of stationary points with equal likelihood. The results apply equally to the optimization transfer class of algorithms (MM algorithm) of Lange, Hunter, and Yang (2000). Two different EM algorithms constructed on the same dataset illustrate the convergence and the cyclic behavior. This paper contains new results concerning the convergence of the EM algorithm. The EM algorithm was brought into the limelight by Dempster, Laird and Rubin (1977) as a general iterative method of computing the maximum likelihood estimator by maximizing a simpler likelihood on an augmented data space. However, the problem of the convergence of the algorithm has not been satisfactorily resolved. Wu (1983), the main theoretical contribution in this area, showed that the limit points of the EM algorithm are stationary points of the likelihood, and that when the likelihood is unimodal, any EM sequence is convergent. Boyles (1983) has a number of results along similar lines. These results still allow the possibility of a non-convergent EM sequence when the likelihood is not unimodal. More importantly, the EM algorithm is useful when the likelihood is hard to obtain directly; for these cases, the unimodality of the likelihood is very difficult to verify. Here we give simple, general, verifiable conditions for convergence: our main result (Theorem 3) is that any EM sequence is convergent if the maximizer at the M-step is unique. This condition is almost always satisfied in practice (otherwise the particular EM data augmentation scheme would

Journal ArticleDOI
TL;DR: This work presents the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension, and formalizes the more general problem of max-likelihood fit of a Gaussian mixture to unstructured data.
Abstract: Mixtures of Gaussian (or normal) distributions arise in a variety of application areas. Many heuristics have been proposed for the task of finding the component Gaussians given samples from the mixture, such as the EM algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy. Statist. Soc. Ser. B 39 (1977) 1–38]. These do not provably run in polynomial time. We present the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension. The Gaussians may have arbitrary shape, but they must satisfy a “separation condition” which places a lower bound on the distance between the centers of any two component Gaussians. The mathematical results at the heart of our proof are “distance concentration” results—proved using isoperimetric inequalities—which establish bounds on the probability distribution of the distance between a pair of points generated according to the mixture. We also formalize the more general problem of max-likelihood fit of a Gaussian mixture to unstructured data.

Journal Article
TL;DR: This work builds on the information bottleneck framework of Tishby et al. (1999) and constructs a learning algorithm that combines an information-theoretic smoothing term with a continuation procedure that bypasses local maxima and achieves superior solutions.
Abstract: A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in sub-optimal local maxima. Learning the model structure is even more challenging. The structural EM algorithm can adapt the structure in the presence of hidden variables, but usually performs poorly without prior knowledge about the cardinality and location of the hidden variables. In this work, we present a general approach for learning Bayesian networks with hidden variables that overcomes these problems. The approach builds on the information bottleneck framework of Tishby et al. (1999). We start by proving formal correspondence between the information bottleneck objective and the standard parametric EM functional. We then use this correspondence to construct a learning algorithm that combines an information-theoretic smoothing term with a continuation procedure. Intuitively, the algorithm bypasses local maxima and achieves superior solutions by following a continuous path from a solution of an easy and smooth target function to a solution of the desired likelihood function. As we show, our algorithmic framework allows learning of the parameters as well as the structure of a network. In addition, it also allows us to introduce new hidden variables during model selection and learn their cardinality. We demonstrate the performance of our procedure on several challenging real-life data sets.

Journal ArticleDOI
TL;DR: In this article, an R package called bivpois is presented for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models, and an Expectation-Maximization (EM) algorithm is implemented.
Abstract: In this paper we present an R package called bivpois for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models. An Expectation-Maximization (EM) algorithm is implemented. Inflated models allow for modelling both over-dispersion (or under-dispersion) and negative correlation and thus they are appropriate for a wide range of applications. Extensions of the algorithms for several other models are also discussed. Detailed guidance and an implementation on simulated and real data sets using the bivpois package are provided.
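The EM scheme for the plain (non-inflated, covariate-free) bivariate Poisson model is compact enough to sketch in Python. In the construction X = A + C, Y = B + C with independent Poisson components, the E-step uses the identity E[C | x, y] = lambda3 * f(x-1, y-1) / f(x, y). This is a hedged illustration of that recursion only; the bivpois package itself additionally handles regression covariates and diagonal inflation.

```python
import numpy as np
from math import exp, lgamma, log

def bp_pmf(x, y, l1, l2, l3):
    """Bivariate Poisson pmf for X = A + C, Y = B + C with independent Poisson A, B, C."""
    if x < 0 or y < 0:
        return 0.0
    total = 0.0
    for k in range(min(x, y) + 1):
        total += exp((x - k) * log(l1) - lgamma(x - k + 1)
                     + (y - k) * log(l2) - lgamma(y - k + 1)
                     + k * log(l3) - lgamma(k + 1))
    return exp(-(l1 + l2 + l3)) * total

def em_bivariate_poisson(x, y, n_iter=100):
    """EM for the plain bivariate Poisson model (no covariates, no inflation)."""
    l3 = max(np.cov(x, y)[0, 1], 0.01)                 # moment-based start: cov(X, Y) = l3
    l1 = max(np.mean(x) - l3, 0.01)
    l2 = max(np.mean(y) - l3, 0.01)
    for _ in range(n_iter):
        # E-step: conditional expectation of the common component C.
        s = np.array([l3 * bp_pmf(xi - 1, yi - 1, l1, l2, l3) / bp_pmf(xi, yi, l1, l2, l3)
                      for xi, yi in zip(x, y)])
        # M-step: closed-form updates.
        l3 = s.mean()
        l1 = max(np.mean(x) - l3, 1e-6)
        l2 = max(np.mean(y) - l3, 1e-6)
    return l1, l2, l3

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n = 1000
    a, b, c = rng.poisson(1.0, n), rng.poisson(2.0, n), rng.poisson(0.5, n)
    print(np.round(em_bivariate_poisson(a + c, b + c), 2))   # roughly (1.0, 2.0, 0.5)
```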

Journal ArticleDOI
TL;DR: This paper considers the block clustering problem under the maximum likelihood approach and proposes a generalized EM algorithm to estimate the parameters of the block mixture model and studies the case of binary data by using a Bernoulli block mixture.
Abstract: Although many clustering procedures aim to construct an optimal partition of objects or, sometimes, of variables, there are other methods, called block clustering methods, which consider simultaneously the two sets and organize the data into homogeneous blocks. Recently, we have proposed a new mixture model called block mixture model which takes into account this situation. This model allows one to embed simultaneous clustering of objects and variables in a mixture approach. We have studied this probabilistic model under the classification likelihood approach and developed a new algorithm for simultaneous partitioning based on the classification EM algorithm. In this paper, we consider the block clustering problem under the maximum likelihood approach and the goal of our contribution is to estimate the parameters of this model. Unfortunately, the application of the EM algorithm for the block mixture model cannot be made directly; difficulties arise due to the dependence structure in the model and approximations are required. Using a variational approximation, we propose a generalized EM algorithm to estimate the parameters of the block mixture model and, to illustrate our approach, we study the case of binary data by using a Bernoulli block mixture.
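A mean-field sketch shows where the variational approximation enters: the row-label and column-label posteriors cannot be computed jointly, so they are updated in turn. The Python code below implements this alternating scheme for a Bernoulli latent block model on a small planted example; it is an illustrative sketch, not the authors' algorithm as published.

```python
import numpy as np

def lbm_variational_em(X, K, L, n_iter=50, seed=0):
    """Variational (mean-field) EM sketch for a Bernoulli latent block model.

    r[i, k] and c[j, l] are variational posteriors over row and column classes;
    alpha[k, l] is the Bernoulli parameter of block (k, l).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    r = rng.dirichlet(np.ones(K), size=n)
    c = rng.dirichlet(np.ones(L), size=m)
    for _ in range(n_iter):
        # M-step: block parameters and class proportions from current posteriors.
        weights = np.outer(r.sum(axis=0), c.sum(axis=0)) + 1e-12
        alpha = np.clip((r.T @ X @ c) / weights, 1e-6, 1 - 1e-6)
        pi = np.clip(r.mean(axis=0), 1e-12, None)
        rho = np.clip(c.mean(axis=0), 1e-12, None)
        A, B = np.log(alpha), np.log(1 - alpha)
        # Variational E-step for rows given the column posteriors.
        log_r = np.log(pi)[None, :] + X @ c @ A.T + (1 - X) @ c @ B.T
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # Variational E-step for columns given the row posteriors.
        log_c = np.log(rho)[None, :] + X.T @ r @ A + (1 - X).T @ r @ B
        c = np.exp(log_c - log_c.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
    return r.argmax(axis=1), c.argmax(axis=1), alpha

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Planted 2 x 2 block structure in a 60 x 40 binary matrix.
    probs = np.array([[0.9, 0.1], [0.2, 0.8]])
    rows, cols = np.repeat([0, 1], 30), np.repeat([0, 1], 20)
    X = rng.binomial(1, probs[np.ix_(rows, cols)]).astype(float)
    zr, zc, alpha = lbm_variational_em(X, K=2, L=2)
    print("row labels:", zr)
    print("column labels:", zc)
    print("block parameters:\n", np.round(alpha, 2))
```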

Journal ArticleDOI
TL;DR: Stochastic versions of RPCL and its type A variant are proposed, in which the difficult problem of selecting the delearning rate is circumvented in a novel way.
Abstract: The expectation-maximization (EM) algorithm (A.P. Dempster et al., 1977) has been extensively used in density mixture clustering problems, but it is unable to perform model selection automatically. This paper, therefore, proposes to learn the model parameters via maximizing a weighted likelihood. Under a specific weight design, we derive a rival penalized expectation-maximization (RPEM) algorithm, which makes the components in a density mixture compete with each other at each time step. Not only are the associated parameters of the winner updated to adapt to an input, but also all rivals' parameters are penalized with a strength proportional to the corresponding posterior density probabilities. Compared to the EM algorithm (A.P. Dempster et al., 1977), the RPEM is able to fade out the redundant densities from a density mixture during the learning process. Hence, it can automatically select an appropriate number of densities in density mixture clustering. We experimentally demonstrate its outstanding performance on Gaussian mixtures and the color image segmentation problem. Moreover, a simplified version of RPEM generalizes our recently proposed RPCCL algorithm (Y.M. Cheung, 2002) so that it is applicable to elliptical clusters as well, with any input proportion. Compared to the existing heuristic RPCL (L. Xu et al., 1993) and its variants, this generalized RPCCL (G-RPCCL) circumvents the difficult preselection of the so-called delearning rate. Additionally, a special setting of the G-RPCCL not only degenerates to RPCL and its Type A variant, but also gives guidance on choosing an appropriate delearning rate for them. Subsequently, we propose stochastic versions of RPCL and its type A variant, in which the difficult problem of selecting the delearning rate is circumvented in a novel way. The experiments show promising results for this stochastic implementation.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: Two algorithms, a Monte Carlo algorithm and a temporal difference algorithm, can learn models for systems without requiring the reset action needed by the previously available general PSR-model learning algorithm.
Abstract: Predictive state representations (PSRs) are a recently-developed way to model discrete-time, controlled dynamical systems. We present and describe two algorithms for learning a PSR model: a Monte Carlo algorithm and a temporal difference (TD) algorithm. Both of these algorithms can learn models for systems without requiring a reset action as was needed by the previously available general PSR-model learning algorithm. We present empirical results that compare our two algorithms and also compare their performance with that of existing algorithms, including an EM algorithm for learning POMDP models.

Journal ArticleDOI
TL;DR: In this article, an item response model that incorporates response time was proposed and a parameter estimation procedure using the EM algorithm was developed, evaluated with both real and simulated test data and the results suggest that the estimation procedure works well in estimating model parameters.
Abstract: This article proposes an item response model that incorporates response time. A parameter estimation procedure using the EM algorithm is developed. The procedure is evaluated with both real and simulated test data. The results suggest that the estimation procedure works well in estimating model parameters. By using response time data, estimation of person ability parameters can be improved. Potential applications of this model are discussed. Directions for further study are suggested.

Proceedings ArticleDOI
04 Sep 2005
TL;DR: An unsupervised dynamic language model (LM) adaptation framework using long-distance latent topic mixtures and the LDA model is combined with the trigram language model using linear interpolation to reduce the perplexity and character error rate.
Abstract: We propose an unsupervised dynamic language model (LM) adaptation framework using long-distance latent topic mixtures. The framework employs the Latent Dirichlet Allocation model (LDA) which models the latent topics of a document collection in an unsupervised and Bayesian fashion. In the LDA model, each word is modeled as a mixture of latent topics. Varying topics within a context can be modeled by re-sampling the mixture weights of the latent topics from a prior Dirichlet distribution. The model can be trained using the variational Bayes Expectation Maximization algorithm. During decoding, mixture weights of the latent topics are adapted dynamically using the hypotheses of previously decoded utterances. In our work, the LDA model is combined with the trigram language model using linear interpolation. We evaluated the approach on the CCTV episode of the RT04 Mandarin Broadcast News test set. Results show that the proposed approach reduces the perplexity by up to 15.4% relative and the character error rate by 4.9% relative depending on the size and setup of the training set.
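The combination step is plain linear interpolation of word probabilities, P(w | h) = lambda * P_trigram(w | h) + (1 - lambda) * P_topic(w). The toy Python sketch below uses made-up probability dictionaries as stand-ins for the trigram predictions and the dynamically adapted LDA topic-mixture probabilities.

```python
def interpolate_lm(p_ngram, p_topic, lam=0.7):
    """Linearly interpolate a background n-gram LM with a topic-adapted LM.

    p_ngram and p_topic map candidate words to probabilities for the current
    history; lam weights the static n-gram model against the adapted topic model.
    """
    vocab = set(p_ngram) | set(p_topic)
    return {w: lam * p_ngram.get(w, 0.0) + (1 - lam) * p_topic.get(w, 0.0) for w in vocab}

if __name__ == "__main__":
    # Hypothetical next-word probabilities for one decoding context.
    p_trigram = {"economy": 0.02, "weather": 0.05, "market": 0.01}
    p_lda = {"economy": 0.10, "market": 0.08, "weather": 0.01}
    print({w: round(v, 3) for w, v in sorted(interpolate_lm(p_trigram, p_lda, 0.6).items())})
```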

Journal ArticleDOI
TL;DR: An expectation-maximization (EM) algorithm is presented that yields topology-preserving maps of data based on probabilistic mixture models and allows principled handling of missing data and learning of mixtures of SOMs.

Journal ArticleDOI
TL;DR: A new gridding algorithm is proposed for determining the individual spots and their borders, and a Gaussian mixture model (GMM) approach is presented for analyzing the individual spot images; the main advantages of the proposed methodology are modeling flexibility and adaptability to the data, which are well-known strengths of GMM.
Abstract: In this paper, we propose a new methodology for analysis of microarray images. First, a new gridding algorithm is proposed for determining the individual spots and their borders. Then, a Gaussian mixture model (GMM) approach is presented for the analysis of the individual spot images. The main advantages of the proposed methodology are modeling flexibility and adaptability to the data, which are well-known strengths of GMM. The maximum likelihood and maximum a posteriori approaches are used to estimate the GMM parameters via the expectation maximization algorithm. The proposed approach has the ability to detect and compensate for artifacts that might occur in microarray images. This is accomplished by a model-based criterion that selects the number of the mixture components. We present numerical experiments with artificial and real data where we compare the proposed approach with previous ones and existing software tools for microarray image analysis and demonstrate its advantages.

01 Jan 2005
TL;DR: This dissertation describes a way to stably calibrate GH distributions for a wider range of parameters than has previously been reported and develops a version of the EM algorithm for calibrating GH distributions, which enables for the first time certain GH distributions to be used in modeling contexts when previously they have been numerically intractable.
Abstract: The distributions of many financial quantities are well-known to have heavy tails, exhibit skewness, and have other non-Gaussian characteristics. In this dissertation we study an especially promising family: the multivariate generalized hyperbolic distributions (GH). This family includes and generalizes the familiar Gaussian and Student t distributions, and the so-called skewed t distributions, among many others. The primary obstacle to the applications of such distributions is the numerical difficulty of calibrating the distributional parameters to the data. In this dissertation we describe a way to stably calibrate GH distributions for a wider range of parameters than has previously been reported. In particular, we develop a version of the EM algorithm for calibrating GH distributions. This is a modification of methods proposed in McNeil, Frey, and Embrechts (2005), and generalizes the algorithm of Protassov (2004). Our algorithm extends the stability of the calibration procedure to a wide range of parameters, now including parameter values that maximize log-likelihood for our real market data sets. This allows for the first time certain GH distributions to be used in modeling contexts when previously they have been numerically intractable. Our algorithm enables us to make new uses of GH distributions in three financial applications. First, we forecast univariate Value-at-Risk (VaR) for stock index returns, and we show in out-of-sample backtesting that the GH distributions outperform the Gaussian distribution. Second, we calculate an efficient frontier for equity portfolio optimization under the skewed-t distribution and using Expected Shortfall as the risk measure. Here, we show that the Gaussian efficient frontier is actually unreachable if returns are skewed t distributed. Third, we build an intensity-based model to price Basket Credit Default Swaps by calibrating the skewed t distribution directly, without the need to separately calibrate the skewed t copula. To our knowledge this is the first use of the skewed t distribution in portfolio optimization and in portfolio credit risk.