
Showing papers on "Expectation–maximization algorithm published in 2005"


Journal ArticleDOI
TL;DR: A generative mixture-model approach to clustering directional data based on the von Mises-Fisher distribution, which arises naturally for data distributed on the unit hypersphere, and derives and analyzes two variants of the Expectation Maximization framework for estimating the mean and concentration parameters of this mixture.
Abstract: Several large scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multi-variate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentration parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants. Empirical results on clustering of high-dimensional text and gene-expression data based on a mixture of vMF distributions show that the ability to estimate the concentration parameter for each vMF component, which is not present in existing approaches, yields superior results, especially for difficult clustering tasks in high-dimensional spaces.

869 citations
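As a concrete illustration of the estimation step described above, the following Python sketch (not the authors' code) fits a single vMF component: the mean direction is the normalized resultant vector, and the concentration is recovered from the commonly used approximate inversion of the Bessel-function ratio, kappa ≈ (r̄d − r̄³)/(1 − r̄²), where r̄ is the mean resultant length. A full mixture interleaves this M-step with E-step responsibilities.

```python
import numpy as np

def fit_vmf(X):
    """Estimate mean direction and concentration for one von Mises-Fisher component.

    X : (n, d) array whose rows are unit vectors (L2-normalized data).
    """
    n, d = X.shape
    resultant = X.sum(axis=0)
    r = np.linalg.norm(resultant)
    mu = resultant / r                    # estimated mean direction
    r_bar = r / n                         # mean resultant length, in (0, 1)
    # Approximate inversion of the Bessel-function ratio A_d(kappa) = r_bar.
    kappa = (r_bar * d - r_bar**3) / (1.0 - r_bar**2)
    return mu, kappa

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: noisy directions around a fixed axis, re-projected onto the sphere.
    X = np.array([1.0, 0.0, 0.0]) + 0.2 * rng.standard_normal((500, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    mu, kappa = fit_vmf(X)
    print("mean direction:", np.round(mu, 3), " kappa:", round(kappa, 1))
```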


Journal ArticleDOI
Dar-Shyang Lee
TL;DR: An effective scheme to improve the convergence rate without compromising model stability is proposed by replacing the global, static retention factor with an adaptive learning rate calculated for each Gaussian at every frame.
Abstract: Adaptive Gaussian mixtures have been used for modeling nonstationary temporal distributions of pixels in video surveillance applications. However, a common problem for this approach is balancing between model convergence speed and stability. This paper proposes an effective scheme to improve the convergence rate without compromising model stability. This is achieved by replacing the global, static retention factor with an adaptive learning rate calculated for each Gaussian at every frame. Significant improvements are shown on both synthetic and real video data. Incorporating this algorithm into a statistical framework for background subtraction leads to an improved segmentation performance compared to a standard method.

867 citations
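To make the per-Gaussian adaptive learning rate concrete, here is a simplified Python sketch. It is not the paper's exact update rule: the match-count-based schedule used below (a large rate at first, decaying toward a small floor) is an assumed stand-in that captures the convergence-versus-stability trade-off for a single grayscale pixel model.

```python
import numpy as np

class AdaptiveGaussian:
    """One Gaussian of a per-pixel background model (1-D grayscale sketch).

    Rather than a single global, static retention factor, the component keeps a
    match count and uses a learning rate that starts large (fast convergence)
    and decays toward a small floor (long-run stability).
    """

    def __init__(self, mean, var=30.0**2, floor=0.005):
        self.mean, self.var = mean, var
        self.count = 1
        self.floor = floor                       # assumed long-run learning rate

    def update(self, x):
        eta = max(self.floor, 1.0 / self.count)  # per-Gaussian adaptive rate
        self.count += 1
        d = x - self.mean
        self.mean += eta * d
        self.var += eta * (d * d - self.var)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    g = AdaptiveGaussian(mean=100.0)             # poor initial estimate
    for _ in range(200):                         # true background is N(128, 5^2)
        g.update(128.0 + 5.0 * rng.standard_normal())
    print(f"mean={g.mean:.1f}  std={g.var ** 0.5:.1f}")
```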


Journal ArticleDOI
TL;DR: A methodology for model selection based on a penalized contrast is developed, and an adaptive choice of the penalty function for automatically estimating the dimension of the model, i.e., the number of change points, is proposed.

554 citations


Journal ArticleDOI
TL;DR: This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions and proves that when these MM algorithms converge, they must converge to a desirable point.
Abstract: Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function is often challenging because it may be nondifferentiable and/or nonconcave. This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions. These algorithms operate by perturbing the penalty function slightly to render it differentiable, then optimizing this differentiable function using a minorize-maximize (MM) algorithm. MM algorithms are useful extensions of the well-known class of EM algorithms, a fact that allows us to analyze the local and global convergence of the proposed algorithm using some of the techniques employed for EM algorithms. In particular, we prove that when our MM algorithms converge, they must converge to a desirable point; we also discuss conditions under which this convergence may be guaranteed. We exploit the Newton-Raphson-like aspect of these algorithms to propose a sandwich estimator for the standard errors of the estimators. Our method performs well in numerical tests.

488 citations
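The perturb-then-minorize idea is small enough to sketch. The Python code below applies a perturbed local quadratic surrogate to a lasso-penalized least-squares problem, so that every M-step reduces to a ridge-type linear solve; it illustrates only the mechanism, whereas the article treats a much broader class of penalties and supplies the convergence theory and standard errors.

```python
import numpy as np

def mm_lasso(X, y, lam, eps=1e-6, n_iter=200, tol=1e-8):
    """MM sketch for lasso-penalized least squares.

    Each |b_j| in the penalty is replaced by a quadratic surrogate built at the
    current iterate (weights lam / (eps + |b_j|)), so the update is a ridge-type
    solve. Illustrative only; not the article's general algorithm.
    """
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]          # start from OLS
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        D = np.diag(lam / (eps + np.abs(b)))          # surrogate penalty weights
        b_new = np.linalg.solve(XtX + n * D, Xty)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, p = 100, 8
    X = rng.standard_normal((n, p))
    true_b = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, -2.0])
    y = X @ true_b + rng.standard_normal(n)
    print(np.round(mm_lasso(X, y, lam=0.1), 2))       # small coefficients shrink toward zero
```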


Journal ArticleDOI
TL;DR: This work examines data that are missing at random and data with nonignorable missingness, and compares four common approaches for inference in generalized linear models with missing covariate data: maximum likelihood (ML), multiple imputation (MI), fully Bayesian (FB), and weighted estimating equations (WEEs).
Abstract: Missing data is a major issue in many applied problems, especially in the biomedical sciences. We review four common approaches for inference in generalized linear models (GLMs) with missing covariate data: maximum likelihood (ML), multiple imputation (MI), fully Bayesian (FB), and weighted estimating equations (WEEs). There is considerable interest in how these four methodologies are related, the properties of each approach, the advantages and disadvantages of each methodology, and computational implementation. We examine both data that are missing at random and data with nonignorable missingness. For ML, we focus on techniques using the EM algorithm, and in particular, discuss the EM by the method of weights and related procedures as discussed by Ibrahim. For MI, we examine the techniques developed by Rubin. For FB, we review approaches considered by Ibrahim et al. For WEE, we focus on the techniques developed by Robins et al. We use a real dataset and a detailed simulation study to compare the four methods.

478 citations


Journal ArticleDOI
TL;DR: A stochastic approximation version of EM for maximum likelihood estimation of a wide class of nonlinear mixed effects models is proposed; it provides an estimator close to the MLE in very few iterations.

452 citations
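A toy Python sketch of the stochastic approximation EM recursion may help fix ideas: the exact E-step is replaced by simulating the latent variables, and the expected sufficient statistics are updated with a decreasing step size before each M-step. The two-component Gaussian mixture below is an illustrative stand-in, not the nonlinear mixed-effects setting treated in the paper.

```python
import numpy as np

def saem_two_means(y, n_iter=300, burn_in=100, seed=0):
    """Toy SAEM: two-component Gaussian mixture with unit variances and equal weights.

    Simulation step: draw the latent labels from their current conditional law.
    Stochastic approximation step: s <- s + gamma_k * (S(y, z) - s), with gamma_k = 1
    during burn-in and 1/k afterwards. M-step: component means from the averaged statistics.
    """
    rng = np.random.default_rng(seed)
    mu = np.array([np.min(y), np.max(y)], dtype=float)       # crude initialization
    s_sum, s_cnt = np.zeros(2), np.zeros(2)
    for k in range(1, n_iter + 1):
        logp = -0.5 * (y[:, None] - mu[None, :]) ** 2
        p2 = 1.0 / (1.0 + np.exp(logp[:, 0] - logp[:, 1]))   # P(label = 2 | y)
        z = (rng.random(len(y)) < p2).astype(int)            # simulated labels
        stat_sum = np.array([y[z == 0].sum(), y[z == 1].sum()])
        stat_cnt = np.array([(z == 0).sum(), (z == 1).sum()], dtype=float)
        gamma = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        s_sum += gamma * (stat_sum - s_sum)
        s_cnt += gamma * (stat_cnt - s_cnt)
        mu = s_sum / np.maximum(s_cnt, 1e-12)
    return np.sort(mu)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    y = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])
    print(np.round(saem_two_means(y), 2))                    # roughly [-2, 3]
```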


Journal ArticleDOI
TL;DR: A general latent variable approach to discrete-time survival analysis of nonrepeatable events such as onset of drug use is proposed, and it is shown how the survival analysis can be formulated as a generalized latent class analysis of event history indicators.
Abstract: This article proposes a general latent variable approach to discrete-time survival analysis of nonrepeatable events such as onset of drug use. It is shown how the survival analysis can be formulated as a generalized latent class analysis of event history indicators. The latent class analysis can use covariates and can be combined with the joint modeling of other outcomes such as repeated measures for a related process. It is shown that conventional discrete-time survival analysis corresponds to a single-class latent class analysis. Multiple-class extensions are proposed, including the special cases of a class of long-term survivors and classes defined by outcomes related to survival. The estimation uses a general latent variable framework, including both categorical and continuous latent variables, and is incorporated in the Mplus program. Estimation is carried out using maximum likelihood via the EM algorithm. Two examples serve as illustrations. The first example concerns recidivism after incarceration in a randomized field experiment. The second example concerns school removal related to the development of aggressive behavior in the classroom.

242 citations


Journal ArticleDOI
TL;DR: The proposed genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data is elitist, which maintains the monotonic convergence property of the EM algorithm.
Abstract: We propose a genetic-based expectation-maximization (GA-EM) algorithm for learning Gaussian mixture models from multivariate data. This algorithm is capable of selecting the number of components of the model using the minimum description length (MDL) criterion. Our approach benefits from the properties of genetic algorithms (GA) and the EM algorithm by combining both into a single procedure. The population-based stochastic search of the GA explores the search space more thoroughly than the EM method. Therefore, our algorithm is able to escape from locally optimal solutions, since it is less sensitive to its initialization. The GA-EM algorithm is elitist, which maintains the monotonic convergence property of the EM algorithm. The experiments on simulated and real data show that the GA-EM outperforms the EM method since: (1) we obtain a better MDL score while using exactly the same termination condition for both algorithms; and (2) our approach identifies the number of components used to generate the underlying data more often than the EM algorithm.

239 citations
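The MDL criterion at the heart of GA-EM is easy to illustrate on its own. The sketch below replaces the paper's genetic search with a plain sweep over K using ordinary EM fits from scikit-learn, and scores each candidate by -log L + (p/2) log n (i.e. BIC/2), keeping the minimizer.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mdl_select(X, max_k=6, seed=0):
    """Select the number of Gaussian components by a minimum-description-length score.

    Illustrative only: a plain sweep over K with restarted EM fits stands in for
    the genetic search; the MDL score used here equals BIC / 2.
    """
    scores = {}
    for k in range(1, max_k + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(X)
        scores[k] = gm.bic(X) / 2.0          # -log L + (p/2) log n
    best = min(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    # Three well-separated 2-D clusters.
    X = np.vstack([rng.normal(c, 0.5, size=(200, 2)) for c in (-3.0, 0.0, 3.0)])
    best, scores = mdl_select(X)
    print("selected K =", best)
```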


Journal ArticleDOI
TL;DR: A new methodology for the M-step of the EM algorithm that is based on a novel constrained optimization formulation that shows superior performance in terms of the attained maximum value of the objective function and segmentation accuracy compared to previous implementations of this approach.
Abstract: Gaussian mixture models (GMMs) constitute a well-known type of probabilistic neural networks. One of their many successful applications is in image segmentation, where spatially constrained mixture models have been trained using the expectation-maximization (EM) framework. In this letter, we elaborate on this method and propose a new methodology for the M-step of the EM algorithm that is based on a novel constrained optimization formulation. Numerical experiments using simulated images illustrate the superior performance of our method in terms of the attained maximum value of the objective function and segmentation accuracy compared to previous implementations of this approach.

219 citations


Journal ArticleDOI
TL;DR: This paper advocates the use of graph-based probability models and their associated inference and learning algorithms, and describes how each technique can be applied in a vision model of multiple, occluding objects and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.
Abstract: Research into methods for reasoning under uncertainty is currently one of the most exciting areas of artificial intelligence, largely because it has recently become possible to record, store, and process large amounts of data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition, face detection, speaker identification, and prediction of gene function, it is even more exciting that researchers are on the verge of introducing systems that can perform large-scale combinatorial analyses of data, decomposing the data into interacting components. For example, computational methods for automatic scene analysis are now emerging in the computer vision community. These methods decompose an input image into its constituent objects, lighting conditions, motion patterns, etc. Two of the main challenges are finding effective representations and models in specific applications and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based probability models and their associated inference and learning algorithms. We review exact techniques and various approximate, computationally efficient techniques, including iterated conditional modes, the expectation maximization (EM) algorithm, Gibbs sampling, the mean field method, variational techniques, structured variational techniques and the sum-product algorithm ("loopy" belief propagation). We describe how each technique can be applied in a vision model of multiple, occluding objects and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.

205 citations


Journal ArticleDOI
TL;DR: In this paper, a didactic discussion of covariance structure modeling in longitudinal studies with missing data is presented, and use of the full-information maximum likelihood method is considered for model fitting, parameter estimation, and hypothesis testing purposes, particularly when interest centers on patterns of temporal change and their covariates and predictors.
Abstract: A didactic discussion of covariance structure modeling in longitudinal studies with missing data is presented. Use of the full-information maximum likelihood method is considered for model fitting, parameter estimation, and hypothesis testing purposes, particularly when interest centers on patterns of temporal change as well as their covariates and predictors. The approach is illustrated with an application of the popular level-and-shape model to data from a cognitive intervention study of elderly adults.

Journal ArticleDOI
TL;DR: An algorithm is described that capitalizes on using tools from item response theory for scale linking, item fit, and parameter estimation that applies to polytomous response variables as well as to skills with two or more proficiency levels.
Abstract: Probabilistic models with more than one latent variable are designed to report profiles of skills or cognitive attributes. Testing programs want to offer additional information beyond what a single test score can provide using these skill profiles. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods like Markov chain Monte Carlo (MCMC), since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a class of general diagnostic models (GDMs) that can be estimated with customary ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The model and the algorithm for estimating model parameters handle missing responses directly, without the need to collapse categories or recode the data. Within the class of GDMs, compensatory as well as noncompensatory models may be specified. This report uses one member of this class of diagnostic models, a compensatory diagnostic model that is parameterized similarly to the generalized partial credit model (GPCM). Many well-known models, such as uni- and multivariate versions of the Rasch model and the two-parameter logistic item response theory (2PL-IRT) model, the GPCM, and the FACETS model, as well as a variety of skill profile models, are special cases of this member of the class of GDMs. This paper describes an algorithm that capitalizes on using tools from item response theory for scale linking, item fit, and parameter estimation. In addition to an introduction to the class of GDMs and to the partial credit instance of this class for dichotomous and polytomous skill profiles, this paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL® Internet-based testing (iBT).

Journal ArticleDOI
TL;DR: Adaptive G-H quadrature, combined with mean and covariance adjustments at each iteration of an EM algorithm, produces an accurate, fast-converging solution with as few as two points per dimension.
Abstract: Although the Bock–Aitkin likelihood-based estimation method for factor analysis of dichotomous item response data has important advantages over classical analysis of item tetrachoric correlations, a serious limitation of the method is its reliance on fixed-point Gauss-Hermite (G-H) quadrature in the solution of the likelihood equations and likelihood-ratio tests. When the number of latent dimensions is large, computational considerations require that the number of quadrature points per dimension be few. But with large numbers of items, the dispersion of the likelihood, given the response pattern, becomes so small that the likelihood cannot be accurately evaluated with the sparse fixed points in the latent space. In this paper, we demonstrate that substantial improvement in accuracy can be obtained by adapting the quadrature points to the location and dispersion of the likelihood surfaces corresponding to each distinct pattern in the data. In particular, we show that adaptive G-H quadrature, combined with mean and covariance adjustments at each iteration of an EM algorithm, produces an accurate, fast-converging solution with as few as two points per dimension. Evaluations of this method with simulated data are shown to yield accurate recovery of the generating factor loadings for models of up to eight dimensions. Unlike an earlier application of adaptive Gibbs sampling to this problem by Meng and Schilling, the simulations also confirm the validity of the present method in calculating likelihood-ratio chi-square statistics for determining the number of factors required in the model. Finally, we apply the method to a sample of real data from a test of teacher qualifications.
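The adaptive quadrature idea is worth a small numerical sketch. The Python code below is a one-dimensional stand-in for the paper's multidimensional EM setting: Gauss-Hermite nodes are recentered at the mode of the integrand for one response pattern and rescaled by its curvature, after which even two points give a usable approximation of that pattern's marginal likelihood.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def log_integrand(theta, y, b):
    """log of [ N(theta; 0, 1) prior * product of Rasch-type Bernoulli likelihoods ]."""
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return -0.5 * theta**2 - 0.5 * np.log(2 * np.pi) + loglik

def adaptive_gh(y, b, n_points=2):
    """Adaptive Gauss-Hermite approximation of one pattern's marginal likelihood."""
    # Locate the mode and curvature of the integrand (crude grid + finite differences).
    grid = np.linspace(-6, 6, 2001)
    mode = grid[np.argmax([log_integrand(t, y, b) for t in grid])]
    h = 1e-3
    curv = (log_integrand(mode + h, y, b) - 2 * log_integrand(mode, y, b)
            + log_integrand(mode - h, y, b)) / h**2
    sigma = 1.0 / np.sqrt(-curv)
    # Recenter and rescale the standard Gauss-Hermite rule (weight exp(-x^2)).
    x, w = hermgauss(n_points)
    theta = mode + np.sqrt(2.0) * sigma * x
    g = np.exp([log_integrand(t, y, b) for t in theta])
    return np.sqrt(2.0) * sigma * np.sum(w * np.exp(x**2) * g)

if __name__ == "__main__":
    y = np.array([1, 1, 0, 1, 0, 1, 1, 1])        # one response pattern
    b = np.linspace(-1.5, 1.5, 8)                 # item difficulties
    print("2-point adaptive:", adaptive_gh(y, b, 2))
    print("50-point reference:", adaptive_gh(y, b, 50))
```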

Journal Article
TL;DR: This work studies EM variants in which the E-step is not performed exactly, either to obtain improved rates of convergence, or due to approximations needed to compute statistics under a model family over which E-steps cannot be realized.
Abstract: The EM algorithm is widely used to develop iterative parameter estimation procedures for statistical models. In cases where these procedures strictly follow the EM formulation, the convergence properties of the estimation procedures are well understood. In some instances there are practical reasons to develop procedures that do not strictly fall within the EM framework. We study EM variants in which the E-step is not performed exactly, either to obtain improved rates of convergence, or due to approximations needed to compute statistics under a model family over which E-steps cannot be realized. Since these variants are not EM procedures, the standard (G)EM convergence results do not apply to them. We present an information geometric framework for describing such algorithms and analyzing their convergence properties. We apply this framework to analyze the convergence properties of incremental EM and variational EM. For incremental EM, we discuss conditions under which these algorithms converge in likelihood. For variational EM, we show how the E-step approximation prevents convergence to local maxima in likelihood.
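For intuition about the incremental variant analyzed here, the following Python sketch runs incremental EM on a two-component Gaussian mixture with unit variances: the E-step visits one block of data at a time, the block's old contribution to the expected sufficient statistics is swapped for the new one, and the M-step is applied immediately. An illustrative toy, not the paper's information-geometric analysis.

```python
import numpy as np

def incremental_em(y, n_blocks=10, n_sweeps=20, seed=0):
    """Incremental EM for a two-component 1-D Gaussian mixture (unit variances)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    mu = np.array([np.quantile(y, 0.25), np.quantile(y, 0.75)])
    pi = np.array([0.5, 0.5])
    blocks = np.array_split(rng.permutation(n), n_blocks)
    R = np.full((n_blocks, 2), np.nan)      # per-block sums of responsibilities
    S = np.full((n_blocks, 2), np.nan)      # per-block responsibility-weighted sums of y
    tot_R, tot_S = np.zeros(2), np.zeros(2)
    for _ in range(n_sweeps):
        for b, idx in enumerate(blocks):
            yb = y[idx]
            # Partial E-step for this block only.
            logp = np.log(pi) - 0.5 * (yb[:, None] - mu[None, :]) ** 2
            r = np.exp(logp - logp.max(axis=1, keepdims=True))
            r /= r.sum(axis=1, keepdims=True)
            new_R, new_S = r.sum(axis=0), (r * yb[:, None]).sum(axis=0)
            # Swap the block's old statistics for the new ones.
            if not np.isnan(R[b, 0]):
                tot_R -= R[b]
                tot_S -= S[b]
            R[b], S[b] = new_R, new_S
            tot_R += new_R
            tot_S += new_S
            # Immediate M-step from the running totals.
            mu = tot_S / tot_R
            pi = tot_R / tot_R.sum()
    return mu, pi

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    y = np.concatenate([rng.normal(-2, 1, 400), rng.normal(2, 1, 200)])
    mu, pi = incremental_em(y)
    print("means:", np.round(mu, 2), " weights:", np.round(pi, 2))
```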

Journal ArticleDOI
TL;DR: In this article, a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest is presented and examined, and convergence properties of this iterative (fixed-point) algorithm are derived for estimators obtained using only a finite number of iterations.
Abstract: This article presents and examines a new algorithm for solving a score equation for the maximum likelihood estimate in certain problems of practical interest. The method circumvents the need to compute second-order derivatives of the full likelihood function. It exploits the structure of certain models that yield a natural decomposition of a very complicated likelihood function. In this decomposition, the first part is a log-likelihood from a simply analyzed model, and the second part is used to update estimates from the first part. Convergence properties of this iterative (fixed-point) algorithm are examined, and asymptotics are derived for estimators obtained using only a finite number of iterations. Illustrative examples considered in the article include multivariate Gaussian copula models, nonnormal random-effects models, generalized linear mixed models, and state-space models. Properties of the algorithm and of estimators are evaluated in simulation studies on a bivariate copula model and a nonnormal...

Journal ArticleDOI
TL;DR: In this article, various statistical properties and reliability aspects of a two-parameter distribution with decreasing and increasing failure rate are explored; the model includes the exponential-geometric distribution, and expressions for the asymptotic variances and covariances of the estimators are derived.

Journal ArticleDOI
TL;DR: A general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals is described, and an application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer.
Abstract: The associations between haplotypes and disease phenotypes offer valuable clues about the genetic determinants of complex diseases. It is highly challenging to make statistical inferences about these associations because of the unknown gametic phase in genotype data. We describe a general likelihood-based approach to inferring haplotype-disease associations in studies of unrelated individuals. We consider all possible phenotypes (including disease indicator, quantitative trait, and potentially censored age at onset of disease) and all commonly used study designs (including cross-sectional, case-control, cohort, nested case-control, and case-cohort). The effects of haplotypes on phenotype are characterized by appropriate regression models, which allow various genetic mechanisms and gene-environment interactions. We present the likelihood functions for all study designs and disease phenotypes under Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We provide simple and efficient numerical algorithms to calculate the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. Extensive simulation studies demonstrate that the proposed methods perform well in realistic situations. An application to the Carolina Breast Cancer Study reveals significant haplotype effects and haplotype-smoking interactions in the development of breast cancer.

Journal ArticleDOI
TL;DR: In this article, the authors presented algebraic algorithms for computing all critical points of the likelihood function with the aim of identifying the local maxima in the probability simplex, and the maximum likelihood degree of a generic complete intersection.
Abstract: Given a model in algebraic statistics and data, the likelihood function is a rational function on a projective variety. Algebraic algorithms are presented for computing all critical points of this function, with the aim of identifying the local maxima in the probability simplex. Applications include models specified by rank conditions on matrices and the Jukes–Cantor models of phylogenetics. The maximum likelihood degree of a generic complete intersection is also determined.

01 Jan 2005
TL;DR: It is shown that under general, simple, verifiable conditions, any EM sequence is convergent if the maximizer at the M-step is unique; this condition is almost always satisfied in practice.
Abstract: It is well known that the likelihood sequence of the EM algorithm is non-decreasing and convergent (Dempster, Laird and Rubin (1977)), and that the limit points of the EM algorithm are stationary points of the likelihood (Wu (1983)), but the issue of the convergence of the EM sequence itself has not been completely settled. In this paper we close this gap and show that under general, simple, verifiable conditions, any EM sequence is convergent. In pathological cases we show that the sequence is cycling in the limit among a finite number of stationary points with equal likelihood. The results apply equally to the optimization transfer class of algorithms (MM algorithm) of Lange, Hunter, and Yang (2000). Two different EM algorithms constructed on the same dataset illustrate the convergence and the cyclic behavior. This paper contains new results concerning the convergence of the EM algorithm. The EM algorithm was brought into the limelight by Dempster, Laird and Rubin (1977) as a general iterative method of computing the maximum likelihood estimator by maximizing a simpler likelihood on an augmented data space. However, the problem of the convergence of the algorithm has not been satisfactorily resolved. Wu (1983), the main theoretical contribution in this area, showed that the limit points of the EM algorithm are stationary points of the likelihood, and that when the likelihood is unimodal, any EM sequence is convergent. Boyles (1983) has a number of results along similar lines. These results still allow the possibility of a non-convergent EM sequence when the likelihood is not unimodal. More importantly, the EM algorithm is useful when the likelihood is hard to obtain directly; for these cases, the unimodality of the likelihood is very difficult to verify. Here we give simple, general, verifiable conditions for convergence: our main result (Theorem 3) is that any EM sequence is convergent if the maximizer at the M-step is unique. This condition is almost always satisfied in practice (otherwise the particular EM data augmentation scheme would

Journal ArticleDOI
TL;DR: This work presents the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension, and formalizes the more general problem of max-likelihood fit of a Gaussian mixture to unstructured data.
Abstract: Mixtures of Gaussian (or normal) distributions arise in a variety of application areas. Many heuristics have been proposed for the task of finding the component Gaussians given samples from the mixture, such as the EM algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy. Statist. Soc. Ser. B 39 (1977) 1–38]. These do not provably run in polynomial time. We present the first algorithm that provably learns the component Gaussians in time that is polynomial in the dimension. The Gaussians may have arbitrary shape, but they must satisfy a “separation condition” which places a lower bound on the distance between the centers of any two component Gaussians. The mathematical results at the heart of our proof are “distance concentration” results—proved using isoperimetric inequalities—which establish bounds on the probability distribution of the distance between a pair of points generated according to the mixture. We also formalize the more general problem of max-likelihood fit of a Gaussian mixture to unstructured data.

Journal Article
TL;DR: This work builds on the information bottleneck framework of Tishby et al. (1999) and constructs a learning algorithm that combines an information-theoretic smoothing term with a continuation procedure that bypasses local maxima and achieves superior solutions.
Abstract: A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in sub-optimal local maxima. Learning the model structure is even more challenging. The structural EM algorithm can adapt the structure in the presence of hidden variables, but usually performs poorly without prior knowledge about the cardinality and location of the hidden variables. In this work, we present a general approach for learning Bayesian networks with hidden variables that overcomes these problems. The approach builds on the information bottleneck framework of Tishby et al. (1999). We start by proving formal correspondence between the information bottleneck objective and the standard parametric EM functional. We then use this correspondence to construct a learning algorithm that combines an information-theoretic smoothing term with a continuation procedure. Intuitively, the algorithm bypasses local maxima and achieves superior solutions by following a continuous path from a solution of an easy and smooth target function to a solution of the desired likelihood function. As we show, our algorithmic framework allows learning of the parameters as well as the structure of a network. In addition, it also allows us to introduce new hidden variables during model selection and learn their cardinality. We demonstrate the performance of our procedure on several challenging real-life data sets.

Journal ArticleDOI
TL;DR: In this article, an R package called bivpois is presented for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models, and an Expectation-Maximization (EM) algorithm is implemented.
Abstract: In this paper we present an R package called bivpois for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models. An Expectation-Maximization (EM) algorithm is implemented. Inflated models allow for modelling both over-dispersion (or under-dispersion) and negative correlation and thus they are appropriate for a wide range of applications. Extensions of the algorithms for several other models are also discussed. Detailed guidance and an implementation on simulated and real data sets using the bivpois package are provided.
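The EM scheme for the plain (non-inflated, covariate-free) bivariate Poisson model is compact enough to sketch in Python. In the construction X = A + C, Y = B + C with independent Poisson components, the E-step uses the identity E[C | x, y] = lambda3 * f(x-1, y-1) / f(x, y). This is a hedged illustration of that recursion only; the bivpois package itself additionally handles regression covariates and diagonal inflation.

```python
import numpy as np
from math import exp, lgamma, log

def bp_pmf(x, y, l1, l2, l3):
    """Bivariate Poisson pmf for X = A + C, Y = B + C with independent Poisson A, B, C."""
    if x < 0 or y < 0:
        return 0.0
    total = 0.0
    for k in range(min(x, y) + 1):
        total += exp((x - k) * log(l1) - lgamma(x - k + 1)
                     + (y - k) * log(l2) - lgamma(y - k + 1)
                     + k * log(l3) - lgamma(k + 1))
    return exp(-(l1 + l2 + l3)) * total

def em_bivariate_poisson(x, y, n_iter=100):
    """EM for the plain bivariate Poisson model (no covariates, no inflation)."""
    l3 = max(np.cov(x, y)[0, 1], 0.01)                 # moment-based start: cov(X, Y) = l3
    l1 = max(np.mean(x) - l3, 0.01)
    l2 = max(np.mean(y) - l3, 0.01)
    for _ in range(n_iter):
        # E-step: conditional expectation of the common component C.
        s = np.array([l3 * bp_pmf(xi - 1, yi - 1, l1, l2, l3) / bp_pmf(xi, yi, l1, l2, l3)
                      for xi, yi in zip(x, y)])
        # M-step: closed-form updates.
        l3 = s.mean()
        l1 = max(np.mean(x) - l3, 1e-6)
        l2 = max(np.mean(y) - l3, 1e-6)
    return l1, l2, l3

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n = 1000
    a, b, c = rng.poisson(1.0, n), rng.poisson(2.0, n), rng.poisson(0.5, n)
    print(np.round(em_bivariate_poisson(a + c, b + c), 2))   # roughly (1.0, 2.0, 0.5)
```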

Journal ArticleDOI
TL;DR: This paper considers the block clustering problem under the maximum likelihood approach and proposes a generalized EM algorithm to estimate the parameters of the block mixture model and studies the case of binary data by using a Bernoulli block mixture.
Abstract: Although many clustering procedures aim to construct an optimal partition of objects or, sometimes, of variables, there are other methods, called block clustering methods, which consider simultaneously the two sets and organize the data into homogeneous blocks. Recently, we have proposed a new mixture model called block mixture model which takes into account this situation. This model allows one to embed simultaneous clustering of objects and variables in a mixture approach. We have studied this probabilistic model under the classification likelihood approach and developed a new algorithm for simultaneous partitioning based on the classification EM algorithm. In this paper, we consider the block clustering problem under the maximum likelihood approach and the goal of our contribution is to estimate the parameters of this model. Unfortunately, the application of the EM algorithm for the block mixture model cannot be made directly; difficulties arise due to the dependence structure in the model and approximations are required. Using a variational approximation, we propose a generalized EM algorithm to estimate the parameters of the block mixture model and, to illustrate our approach, we study the case of binary data by using a Bernoulli block mixture.
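A mean-field sketch shows where the variational approximation enters: the row-label and column-label posteriors cannot be computed jointly, so they are updated in turn. The Python code below implements this alternating scheme for a Bernoulli latent block model on a small planted example; it is an illustrative sketch, not the authors' algorithm as published.

```python
import numpy as np

def lbm_variational_em(X, K, L, n_iter=50, seed=0):
    """Variational (mean-field) EM sketch for a Bernoulli latent block model.

    r[i, k] and c[j, l] are variational posteriors over row and column classes;
    alpha[k, l] is the Bernoulli parameter of block (k, l).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    r = rng.dirichlet(np.ones(K), size=n)
    c = rng.dirichlet(np.ones(L), size=m)
    for _ in range(n_iter):
        # M-step: block parameters and class proportions from current posteriors.
        weights = np.outer(r.sum(axis=0), c.sum(axis=0)) + 1e-12
        alpha = np.clip((r.T @ X @ c) / weights, 1e-6, 1 - 1e-6)
        pi = np.clip(r.mean(axis=0), 1e-12, None)
        rho = np.clip(c.mean(axis=0), 1e-12, None)
        A, B = np.log(alpha), np.log(1 - alpha)
        # Variational E-step for rows given the column posteriors.
        log_r = np.log(pi)[None, :] + X @ c @ A.T + (1 - X) @ c @ B.T
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # Variational E-step for columns given the row posteriors.
        log_c = np.log(rho)[None, :] + X.T @ r @ A + (1 - X).T @ r @ B
        c = np.exp(log_c - log_c.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
    return r.argmax(axis=1), c.argmax(axis=1), alpha

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Planted 2 x 2 block structure in a 60 x 40 binary matrix.
    probs = np.array([[0.9, 0.1], [0.2, 0.8]])
    rows, cols = np.repeat([0, 1], 30), np.repeat([0, 1], 20)
    X = rng.binomial(1, probs[np.ix_(rows, cols)]).astype(float)
    zr, zc, alpha = lbm_variational_em(X, K=2, L=2)
    print("row labels:", zr)
    print("column labels:", zc)
    print("block parameters:\n", np.round(alpha, 2))
```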

Journal ArticleDOI
TL;DR: Stochastic versions of RPCL and its type A variant are proposed, in which the difficult problem of selecting the delearning rate is circumvented in a novel way.
Abstract: The expectation-maximization (EM) algorithm (A.P. Dempster et al., 1977) has been extensively used in density mixture clustering problems, but it is unable to perform model selection automatically. This paper, therefore, proposes to learn the model parameters via maximizing a weighted likelihood. Under a specific weight design, we derive a rival penalized expectation-maximization (RPEM) algorithm, which makes the components in a density mixture compete with each other at each time step. Not only are the associated parameters of the winner updated to adapt to an input, but also all rivals' parameters are penalized with a strength proportional to the corresponding posterior density probabilities. Compared to the EM algorithm (A.P. Dempster et al., 1977), the RPEM is able to fade out the redundant densities from a density mixture during the learning process. Hence, it can automatically select an appropriate number of densities in density mixture clustering. We experimentally demonstrate its outstanding performance on Gaussian mixtures and the color image segmentation problem. Moreover, a simplified version of RPEM generalizes our recently proposed RPCCL algorithm (Y.M. Cheung, 2002) so that it is applicable to elliptical clusters as well, with any input proportion. Compared to the existing heuristic RPCL (L. Xu et al., 1993) and its variants, this generalized RPCCL (G-RPCCL) circumvents the difficult preselection of the so-called delearning rate. Additionally, a special setting of the G-RPCCL not only degenerates to RPCL and its Type A variant, but also gives guidance on choosing an appropriate delearning rate for them. Subsequently, we propose stochastic versions of RPCL and its type A variant, in which the difficult problem of selecting the delearning rate is circumvented in a novel way. The experiments show promising results for this stochastic implementation.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: Two algorithms, a Monte Carlo algorithm and a temporal difference algorithm, can learn models for systems without requiring the reset action needed by the previously available general PSR-model learning algorithm.
Abstract: Predictive state representations (PSRs) are a recently-developed way to model discrete-time, controlled dynamical systems. We present and describe two algorithms for learning a PSR model: a Monte Carlo algorithm and a temporal difference (TD) algorithm. Both of these algorithms can learn models for systems without requiring a reset action as was needed by the previously available general PSR-model learning algorithm. We present empirical results that compare our two algorithms and also compare their performance with that of existing algorithms, including an EM algorithm for learning POMDP models.

Journal ArticleDOI
TL;DR: In this article, an item response model that incorporates response time was proposed and a parameter estimation procedure using the EM algorithm was developed, evaluated with both real and simulated test data and the results suggest that the estimation procedure works well in estimating model parameters.
Abstract: This article proposes an item response model that incorporates response time. A parameter estimation procedure using the EM algorithm is developed. The procedure is evaluated with both real and simulated test data. The results suggest that the estimation procedure works well in estimating model parameters. By using response time data, estimation of person ability parameters can be improved. Potential applications of this model are discussed. Directions for further study are suggested.

Proceedings ArticleDOI
04 Sep 2005
TL;DR: An unsupervised dynamic language model (LM) adaptation framework using long-distance latent topic mixtures and the LDA model is combined with the trigram language model using linear interpolation to reduce the perplexity and character error rate.
Abstract: We propose an unsupervised dynamic language model (LM) adaptation framework using long-distance latent topic mixtures. The framework employs the Latent Dirichlet Allocation model (LDA) which models the latent topics of a document collection in an unsupervised and Bayesian fashion. In the LDA model, each word is modeled as a mixture of latent topics. Varying topics within a context can be modeled by re-sampling the mixture weights of the latent topics from a prior Dirichlet distribution. The model can be trained using the variational Bayes Expectation Maximization algorithm. During decoding, mixture weights of the latent topics are adapted dynamically using the hypotheses of previously decoded utterances. In our work, the LDA model is combined with the trigram language model using linear interpolation. We evaluated the approach on the CCTV episode of the RT04 Mandarin Broadcast News test set. Results show that the proposed approach reduces the perplexity by up to 15.4% relative and the character error rate by 4.9% relative depending on the size and setup of the training set.
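The combination step is plain linear interpolation of word probabilities, P(w | h) = lambda * P_trigram(w | h) + (1 - lambda) * P_topic(w). The toy Python sketch below uses made-up probability dictionaries as stand-ins for the trigram predictions and the dynamically adapted LDA topic-mixture probabilities.

```python
def interpolate_lm(p_ngram, p_topic, lam=0.7):
    """Linearly interpolate a background n-gram LM with a topic-adapted LM.

    p_ngram and p_topic map candidate words to probabilities for the current
    history; lam weights the static n-gram model against the adapted topic model.
    """
    vocab = set(p_ngram) | set(p_topic)
    return {w: lam * p_ngram.get(w, 0.0) + (1 - lam) * p_topic.get(w, 0.0) for w in vocab}

if __name__ == "__main__":
    # Hypothetical next-word probabilities for one decoding context.
    p_trigram = {"economy": 0.02, "weather": 0.05, "market": 0.01}
    p_lda = {"economy": 0.10, "market": 0.08, "weather": 0.01}
    print({w: round(v, 3) for w, v in sorted(interpolate_lm(p_trigram, p_lda, 0.6).items())})
```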

Journal ArticleDOI
TL;DR: An expectation-maximization (EM) algorithm is presented that yields topology-preserving maps of data based on probabilistic mixture models and allows principled handling of missing data and learning of mixtures of SOMs.

Journal ArticleDOI
TL;DR: A new gridding algorithm is proposed for determining the individual spots and their borders, and a Gaussian mixture model (GMM) approach is presented for analyzing the individual spot images; the main advantages of the proposed methodology are modeling flexibility and adaptability to the data, which are well-known strengths of GMM.
Abstract: In this paper, we propose a new methodology for analysis of microarray images. First, a new gridding algorithm is proposed for determining the individual spots and their borders. Then, a Gaussian mixture model (GMM) approach is presented for the analysis of the individual spot images. The main advantages of the proposed methodology are modeling flexibility and adaptability to the data, which are well-known strengths of GMM. The maximum likelihood and maximum a posteriori approaches are used to estimate the GMM parameters via the expectation maximization algorithm. The proposed approach has the ability to detect and compensate for artifacts that might occur in microarray images. This is accomplished by a model-based criterion that selects the number of the mixture components. We present numerical experiments with artificial and real data where we compare the proposed approach with previous ones and existing software tools for microarray image analysis and demonstrate its advantages.

01 Jan 2005
TL;DR: This dissertation describes a way to stably calibrate GH distributions for a wider range of parameters than has previously been reported and develops a version of the EM algorithm for calibrating GH distributions, which enables for the first time certain GH distributions to be used in modeling contexts when previously they have been numerically intractable.
Abstract: The distributions of many financial quantities are well-known to have heavy tails, exhibit skewness, and have other non-Gaussian characteristics. In this dissertation we study an especially promising family: the multivariate generalized hyperbolic distributions (GH). This family includes and generalizes the familiar Gaussian and Student t distributions, and the so-called skewed t distributions, among many others. The primary obstacle to the applications of such distributions is the numerical difficulty of calibrating the distributional parameters to the data. In this dissertation we describe a way to stably calibrate GH distributions for a wider range of parameters than has previously been reported. In particular, we develop a version of the EM algorithm for calibrating GH distributions. This is a modification of methods proposed in McNeil, Frey, and Embrechts (2005), and generalizes the algorithm of Protassov (2004). Our algorithm extends the stability of the calibration procedure to a wide range of parameters, now including parameter values that maximize log-likelihood for our real market data sets. This allows for the first time certain GH distributions to be used in modeling contexts when previously they have been numerically intractable. Our algorithm enables us to make new uses of GH distributions in three financial applications. First, we forecast univariate Value-at-Risk (VaR) for stock index returns, and we show in out-of-sample backtesting that the GH distributions outperform the Gaussian distribution. Second, we calculate an efficient frontier for equity portfolio optimization under the skewed-t distribution and using Expected Shortfall as the risk measure. Here, we show that the Gaussian efficient frontier is actually unreachable if returns are skewed t distributed. Third, we build an intensity-based model to price Basket Credit Default Swaps by calibrating the skewed t distribution directly, without the need to separately calibrate the skewed t copula. To our knowledge this is the first use of the skewed t distribution in portfolio optimization and in portfolio credit risk.