scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Computation in 2011"


Posted Content
TL;DR: The No-U-Turn Sampler (NUTS) as discussed by the authors is an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps.
Abstract: Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size {\epsilon} and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS perform at least as efficiently as and sometimes more efficiently than a well tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter {\epsilon} on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all. NUTS is also suitable for applications such as BUGS-style automatic inference engines that require efficient "turnkey" sampling algorithms.

1,050 citations


Posted Content
TL;DR: The SMC2 algorithm is proposed, a sequential Monte Carlo algorithm, defined in the θ‐dimension, which propagates and resamples many particle filters in the x‐ dimension, which explores the applicability of the algorithm in both sequential and non‐sequential applications and considers various degrees of freedom.
Abstract: We consider the generic problem of performing sequential Bayesian inference in a state-space model with observation process y, state process x and fixed parameter theta. An idealized approach would be to apply the iterated batch importance sampling (IBIS) algorithm of Chopin (2002). This is a sequential Monte Carlo algorithm in the theta-dimension, that samples values of theta, reweights iteratively these values using the likelihood increments p(y_t|y_1:t-1, theta), and rejuvenates the theta-particles through a resampling step and a MCMC update step. In state-space models these likelihood increments are intractable in most cases, but they may be unbiasedly estimated by a particle filter in the x-dimension, for any fixed theta. This motivates the SMC^2 algorithm proposed in this article: a sequential Monte Carlo algorithm, defined in the theta-dimension, which propagates and resamples many particle filters in the x-dimension. The filters in the x-dimension are an example of the random weight particle filter as in Fearnhead et al. (2010). On the other hand, the particle Markov chain Monte Carlo (PMCMC) framework developed in Andrieu et al. (2010) allows us to design appropriate MCMC rejuvenation steps. Thus, the theta-particles target the correct posterior distribution at each iteration t, despite the intractability of the likelihood increments. We explore the applicability of our algorithm in both sequential and non-sequential applications and consider various degrees of freedom, as for example increasing dynamically the number of x-particles. We contrast our approach to various competing methods, both conceptually and empirically through a detailed simulation study, included here and in a supplement, and based on particularly challenging examples.

258 citations


Posted Content
TL;DR: This paper focuses on enhancements to Subset Simulation (SS), proposed by Au and Beck, which provides an efficient algorithm based on MCMC (Markov chain Monte Carlo) simulation for computing small failure probabilities for general high-dimensional reliability problems.
Abstract: Estimation of small failure probabilities is one of the most important and challenging computational problems in reliability engineering. The failure probability is usually given by an integral over a high-dimensional uncertain parameter space that is difficult to evaluate numerically. This paper focuses on enhancements to Subset Simulation (SS), proposed by Au and Beck, which provides an efficient algorithm based on MCMC (Markov chain Monte Carlo) simulation for computing small failure probabilities for general high-dimensional reliability problems. First, we analyze the Modified Metropolis algorithm (MMA), an MCMC technique, which is used in SS for sampling from high-dimensional conditional distributions. We present some observations on the optimal scaling of MMA, and develop an optimal scaling strategy for this algorithm when it is employed within SS. Next, we provide a theoretical basis for the optimal value of the conditional failure probability $p_0$, an important parameter one has to choose when using SS. Finally, a Bayesian post-processor SS+ for the original SS method is developed where the uncertain failure probability that one is estimating is modeled as a stochastic variable whose possible values belong to the unit interval. Simulated samples from SS are viewed as informative data relevant to the system's reliability. Instead of a single real number as an estimate, SS+ produces the posterior PDF of the failure probability, which takes into account both prior information and the information in the sampled data. This PDF quantifies the uncertainty in the value of the failure probability and it may be further used in risk analyses to incorporate this uncertainty. The relationship between the original SS and SS+ is also discussed

150 citations


Posted Content
TL;DR: In this article, the authors investigated the stability of a sequential Monte Carlo (SMC) method applied to the problem of sampling from a target distribution on arbitrary dimensions, and showed that in high dimensions, resampling leads to a reduction in the Monte Carlo error and an increase in the effective sample size.
Abstract: We investigate the stability of a Sequential Monte Carlo (SMC) method applied to the problem of sampling from a target distribution on $\mathbb{R}^d$ for large $d$. It is well known that using a single importance sampling step one produces an approximation for the target that deteriorates as the dimension $d$ increases, unless the number of Monte Carlo samples $N$ increases at an exponential rate in $d$. We show that this degeneracy can be avoided by introducing a sequence of artificial targets, starting from a `simple' density and moving to the one of interest, using an SMC method to sample from the sequence. Using this class of SMC methods with a fixed number of samples, one can produce an approximation for which the effective sample size (ESS) converges to a random variable $\varepsilon_N$ as $d\rightarrow\infty$ with $1<\varepsilon_{N}

138 citations


Journal ArticleDOI
TL;DR: An off-line handwritten alphabetical character recognition system using multilayer feed forward neural network that will be suitable for converting handwritten documents into structural text form and recognizing handwritten names is described in the paper.
Abstract: An off-line handwritten alphabetical character recognition system using multilayer feed forward neural network is described in the paper. A new method, called, diagonal based feature extraction is introduced for extracting the features of the handwritten alphabets. Fifty data sets, each containing 26 alphabets written by various people, are used for training the neural network and 570 different handwritten alphabetical characters are used for testing. The proposed recognition system performs quite well yielding higher levels of recognition accuracy compared to the systems employing the conventional horizontal and vertical methods of feature extraction. This system will be suitable for converting handwritten documents into structural text form and recognizing handwritten names.

135 citations


Posted Content
TL;DR: The EP-ABC algorithm is introduced, which is an adaptation to the likelihood-free context of the variational approximation algorithm known as expectation propagation that is faster by a few orders of magnitude than standard algorithms, while producing an overall approximation error that is typically negligible.
Abstract: Many models of interest in the natural and social sciences have no closed-form likelihood function, which means that they cannot be treated using the usual techniques of statistical inference. In the case where such models can be efficiently simulated, Bayesian inference is still possible thanks to the Approximate Bayesian Computation (ABC) algorithm. Although many refinements have been suggested, ABC inference is still far from routine. ABC is often excruciatingly slow due to very low acceptance rates. In addition, ABC requires introducing a vector of "summary statistics", the choice of which is relatively arbitrary, and often require some trial and error, making the whole process quite laborious for the user. We introduce in this work the EP-ABC algorithm, which is an adaptation to the likelihood-free context of the variational approximation algorithm known as Expectation Propagation (Minka, 2001). The main advantage of EP-ABC is that it is faster by a few orders of magnitude than standard algorithms, while producing an overall approximation error which is typically negligible. A second advantage of EP-ABC is that it replaces the usual global ABC constraint on the vector of summary statistics computed on the whole dataset, by n local constraints of the form that apply separately to each data-point. As a consequence, it is often possible to do away with summary statistics entirely. In that case, EP-ABC approximates directly the evidence (marginal likelihood) of the model. Comparisons are performed in three real-world applications which are typical of likelihood-free inference, including one application in neuroscience which is novel, and possibly too challenging for standard ABC techniques.

72 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that the asymptotic variance associated with a particle filter approximation of the prediction filter is bounded uniformly in time, and the nonasymptotic, relative variance of a particle approximation of a normalizing constant is bounded linearly in time.
Abstract: Under multiplicative drift and other regularity conditions, it is established that the asymptotic variance associated with a particle filter approximation of the prediction filter is bounded uniformly in time, and the nonasymptotic, relative variance associated with a particle approximation of the normalizing constant is bounded linearly in time. The conditions are demonstrated to hold for some hidden Markov models on noncompact state spaces. The particle stability results are obtained by proving $v$-norm multiplicative stability and exponential moment results for the underlying Feynman-Kac formulas.

65 citations


Posted Content
TL;DR: In this article, the authors show how the Hamiltonian Monte Carlo algorithm can sometimes be speeded up by "splitting" the Hamiltonians in a way that allows much of the movement around the state space to be done at low computational cost.
Abstract: We show how the Hamiltonian Monte Carlo algorithm can sometimes be speeded up by "splitting" the Hamiltonian in a way that allows much of the movement around the state space to be done at low computational cost. One context where this is possible is when the log density of the distribution of interest (the potential energy function) can be written as the log of a Gaussian density, which is a quadratic function, plus a slowly varying function. Hamiltonian dynamics for quadratic energy functions can be analytically solved. With the splitting technique, only the slowly-varying part of the energy needs to be handled numerically, and this can be done with a larger stepsize (and hence fewer steps) than would be necessary with a direct simulation of the dynamics. Another context where splitting helps is when the most important terms of the potential energy function and its gradient can be evaluated quickly, with only a slowly-varying part requiring costly computations. With splitting, the quick portion can be handled with a small stepsize, while the costly portion uses a larger stepsize. We show that both of these splitting approaches can reduce the computational cost of sampling from the posterior distribution for a logistic regression model, using either a Gaussian approximation centered on the posterior mode, or a Hamiltonian split into a term that depends on only a small number of critical cases, and another term that involves the larger number of cases whose influence on the posterior distribution is small. Supplemental materials for this paper are available online.

59 citations


Posted Content
TL;DR: This paper proposes consistent and asymptotically Gaussian estimators for the parameters $$\lambda, \sigma $$ and $$H$$ of the discretely observed fractional Ornstein–Uhlenbeck process solution of the stochastic differential equation.
Abstract: This paper proposes consistent and asymptotically Gaussian estimators for the drift, the diffusion coefficient and the Hurst exponent of the discretely observed fractional Ornstein-Uhlenbeck process. For the estimation of the drift, the results are obtained only in the case when 1/2 < H < 3/4. This paper also provides ready-to-use software for the R statistical environment based on the YUIMA package.

54 citations


Posted Content
TL;DR: In this paper, the likelihood of a log-Gaussian Cox process is approximated directly by making use of a continuously specified Gaussian random field prior distribution, and the approximation can converge with arbitrarily high order, while an approximation based on a counting process on a partition of the domain only achieves first order convergence.
Abstract: This paper introduces a new method for performing computational inference on log-Gaussian Cox processes. The likelihood is approximated directly by making novel use of a continuously specified Gaussian random field. We show that for sufficiently smooth Gaussian random field prior distributions, the approximation can converge with arbitrarily high order, while an approximation based on a counting process on a partition of the domain only achieves first-order convergence. The given results improve on the general theory of convergence of the stochastic partial differential equation models, introduced by Lindgren et al. (2011). The new method is demonstrated on a standard point pattern data set and two interesting extensions to the classical log-Gaussian Cox process framework are discussed. The first extension considers variable sampling effort throughout the observation window and implements the method of Chakraborty et al. (2011). The second extension constructs a log-Gaussian Cox process on the world's oceans. The analysis is performed using integrated nested Laplace approximation for fast approximate inference.

45 citations


Posted Content
TL;DR: It is shown that ensemble MCMC for Gaussian process regression models can indeed substantially improve sampling performance and its relationship to the "multiple-try Metropolis" method of Liu, Liang, and Wong and the "multiset sampler" of Leman, Chen, and Lavine is discussed.
Abstract: I introduce a Markov chain Monte Carlo (MCMC) scheme in which sampling from a distribution with density pi(x) is done using updates operating on an "ensemble" of states. The current state x is first stochastically mapped to an ensemble, x^{(1)},...,x^{(K)}. This ensemble is then updated using MCMC updates that leave invariant a suitable ensemble density, rho(x^{(1)},...,x^{(K)}), defined in terms of pi(x^{(i)}) for i=1,...,K. Finally a single state is stochastically selected from the ensemble after these updates. Such ensemble MCMC updates can be useful when characteristics of pi and the ensemble permit pi(x^{(i)}) for all i in {1,...,K}, to be computed in less than K times the amount of computation time needed to compute pi(x) for a single x. One common situation of this type is when changes to some "fast" variables allow for quick re-computation of the density, whereas changes to other "slow" variables do not. Gaussian process regression models are an example of this sort of problem, with an overall scaling factor for covariances and the noise variance being fast variables. I show that ensemble MCMC for Gaussian process regression models can indeed substantially improve sampling performance. Finally, I discuss other possible applications of ensemble MCMC, and its relationship to the "multiple-try Metropolis" method of Liu, Liang, and Wong and the "multiset sampler" of Leman, Chen, and Lavine.

Posted Content
TL;DR: This work develops a fully Bayesian approach to fitting single-index models in conditional quantile regression with a Gaussian process prior for the unknown nonparametric link function and a Laplace distribution on the index vector, with the latter motivated by the recent popularity of the Bayesian lasso idea.
Abstract: Using an asymmetric Laplace distribution, which provides a mechanism for Bayesian inference of quantile regression models, we develop a fully Bayesian approach to fitting single-index models in conditional quantile regression. In this work, we use a Gaussian process prior for the unknown nonparametric link function and a Laplace distribution on the index vector, with the latter motivated by the recent popularity of the Bayesian lasso idea. We design a Markov chain Monte Carlo algorithm for posterior inference. Careful consideration of the singularity of the kernel matrix, and tractability of some of the full conditional distributions leads to a partially collapsed approach where the nonparametric link function is integrated out in some of the sampling steps. Our simulations demonstrate the superior performance of the Bayesian method versus the frequentist approach. The method is further illustrated by an application to the hurricane data.

Posted Content
TL;DR: This work utilises matrix functions, Krylov subspaces, and probing vectors to construct an iterative numerical method for computing the log-likelihood of high dimensional Gaussian models.
Abstract: In order to compute the log-likelihood for high dimensional spatial Gaussian models, it is necessary to compute the determinant of the large, sparse, symmetric positive definite precision matrix, Q. Traditional methods for evaluating the log-likelihood for very large models may fail due to the massive memory requirements. We present a novel approach for evaluating such likelihoods when the matrix-vector product, Qv, is fast to compute. In this approach we utilise matrix functions, Krylov subspaces, and probing vectors to construct an iterative method for computing the log-likelihood.

Posted Content
TL;DR: It is demonstrated that approximate Bayesian computing can provide estimates with a lower mean square error than the composite likelihood approach, though at an appreciably higher computational cost when estimating the spatial dependence of extremes.
Abstract: Statistical analysis of max-stable processes used to model spatial extremes has been limited by the difficulty in calculating the joint likelihood function. This precludes all standard likelihood-based approaches, including Bayesian approaches. In this paper we present a Bayesian approach through the use of approximate Bayesian computing. This circumvents the need for a joint likelihood function by instead relying on simulations from the (unavailable) likelihood. This method is compared with an alternative approach based on the composite likelihood. We demonstrate that approximate Bayesian computing can result in a lower mean square error than the composite likelihood approach when estimating the spatial dependence of extremes, though at an appreciably higher computational cost. We also illustrate the performance of the method with an application to US temperature data to estimate the risk of crop loss due to an unlikely freeze event.

Book ChapterDOI
TL;DR: A tutorial chapter on the Online EM algorithm to appear in the volume 'Mixtures' edited by Kerrie Mengersen, Mike Titterington and Christian P. Robert.
Abstract: Tutorial chapter on the Online EM algorithm to appear in the volume 'Mixtures' edited by Kerrie Mengersen, Mike Titterington and Christian P. Robert.

Posted Content
TL;DR: In this paper, finite sample stability properties of sequential Monte Carlo methods for approximating sequences of probability distributions were investigated and it was shown that the effect of the initial distribution decays exponentially fast in the number of intermediate steps and the corresponding stochastic error is stable in \mathbb{L}_{p} norm.
Abstract: This paper addresses finite sample stability properties of sequential Monte Carlo methods for approximating sequences of probability distributions. The results presented herein are applicable in the scenario where the start and end distributions in the sequence are fixed and the number of intermediate steps is a parameter of the algorithm. Under assumptions which hold on non-compact spaces, it is shown that the effect of the initial distribution decays exponentially fast in the number of intermediate steps and the corresponding stochastic error is stable in \mathbb{L}_{p} norm.

Journal ArticleDOI
TL;DR: In this article, a transformation-based Markov chain Monte Carlo (TMCMC) method is proposed to generate an appropriate Markov Chain that is guaranteed to converge to the high-dimensional target distribution.
Abstract: In this article we propose a novel MCMC method based on deterministic transformations T: X x D --> X where X is the state-space and D is some set which may or may not be a subset of X. We refer to our new methodology as Transformation-based Markov chain Monte Carlo (TMCMC). One of the remarkable advantages of our proposal is that even if the underlying target distribution is very high-dimensional, deterministic transformation of a one-dimensional random variable is sufficient to generate an appropriate Markov chain that is guaranteed to converge to the high-dimensional target distribution. Apart from clearly leading to massive computational savings, this idea of deterministically transforming a single random variable very generally leads to excellent acceptance rates, even though all the random variables associated with the high-dimensional target distribution are updated in a single block. Since it is well-known that joint updating of many random variables using Metropolis-Hastings (MH) algorithm generally leads to poor acceptance rates, TMCMC, in this regard, seems to provide a significant advance. We validate our proposal theoretically, establishing the convergence properties. Furthermore, we show that TMCMC can be very effectively adopted for simulating from doubly intractable distributions. TMCMC is compared with MH using the well-known Challenger data, demonstrating the effectiveness of of the former in the case of highly correlated variables. Moreover, we apply our methodology to a challenging posterior simulation problem associated with the geostatistical model of Diggle et al. (1998), updating 160 unknown parameters jointly, using a deterministic transformation of a one-dimensional random variable. Remarkable computational savings as well as good convergence properties and acceptance rates are the results.

Journal ArticleDOI
TL;DR: A new path-following algorithm for quadratic programming that replaces hard constraints by what are called exact penalties is proposed, which can be framed entirely in terms of the sweep operator of regression analysis.
Abstract: Many least squares problems involve affine equality and inequality constraints. Although there are variety of methods for solving such problems, most statisticians find constrained estimation challenging. The current paper proposes a new path following algorithm for quadratic programming based on exact penalization. Similar penalties arise in $l_1$ regularization in model selection. Classical penalty methods solve a sequence of unconstrained problems that put greater and greater stress on meeting the constraints. In the limit as the penalty constant tends to $\infty$, one recovers the constrained solution. In the exact penalty method, squared penalties are replaced by absolute value penalties, and the solution is recovered for a finite value of the penalty constant. The exact path following method starts at the unconstrained solution and follows the solution path as the penalty constant increases. In the process, the solution path hits, slides along, and exits from the various constraints. Path following in lasso penalized regression, in contrast, starts with a large value of the penalty constant and works its way downward. In both settings, inspection of the entire solution path is revealing. Just as with the lasso and generalized lasso, it is possible to plot the effective degrees of freedom along the solution path. For a strictly convex quadratic program, the exact penalty algorithm can be framed entirely in terms of the sweep operator of regression analysis. A few well chosen examples illustrate the mechanics and potential of path following.

Journal ArticleDOI
TL;DR: In this article, the authors employ an information-theoretical framework that can be used to construct (approximately) sufficient statistics by combining different statistics until the loss of information is minimized, such sufficient sets of statistics are constructed for both parameter estimation and model selection problems.
Abstract: For nearly any challenging scientific problem evaluation of the likelihood is problematic if not impossible. Approximate Bayesian computation (ABC) allows us to employ the whole Bayesian formalism to problems where we can use simulations from a model, but cannot evaluate the likelihood directly. When summary statistics of real and simulated data are compared --- rather than the data directly --- information is lost, unless the summary statistics are sufficient. Here we employ an information-theoretical framework that can be used to construct (approximately) sufficient statistics by combining different statistics until the loss of information is minimized. Such sufficient sets of statistics are constructed for both parameter estimation and model selection problems. We apply our approach to a range of illustrative and real-world model selection problems.

Posted Content
TL;DR: This work proposes novel approaches to model selection based on posterior predictive distributions and approximations of the deviance that can settle some contradictions between the computation of model probabilities and posterior predictive checks using ABC posterior distributions.
Abstract: Approximate Bayesian computation (ABC) is a class of algorithmic methods in Bayesian inference using statistical summaries and computer simulations. ABC has become popular in evolutionary genetics and in other branches of biology. However model selection under ABC algorithms has been a subject of intense debate during the recent years. Here we propose novel approaches to model selection based on posterior predictive distributions and approximations of the deviance. We argue that this framework can settle some contradictions between the computation of model probabilities and posterior predictive checks using ABC posterior distributions. A simulation study and an analysis of a resequencing data set of human DNA show that the deviance criteria lead to sensible results in a number of model choice problems of interest to population geneticists.

Posted Content
TL;DR: An explicit bound on the Monte-Carlo error is deduced for estimates derived using the SMC sampler and the exact asymptotic relative $\mathbb{L}_2$-error of the estimate of the normalizing constant is established.
Abstract: In a recent paper Beskos et al (2011), the Sequential Monte Carlo (SMC) sampler introduced in Del Moral et al (2006), Neal (2001) has been shown to be asymptotically stable in the dimension of the state space d at a cost that is only polynomial in d, when N the number of Monte Carlo samples, is fixed. More precisely, it has been established that the effective sample size (ESS) of the ensuing (approximate) sample and the Monte Carlo error of fixed dimensional marginals will converge as $d$ grows, with a computational cost of $\mathcal{O}(Nd^2)$. In the present work, further results on SMC methods in high dimensions are provided as $d\to\infty$ and with $N$ fixed. We deduce an explicit bound on the Monte-Carlo error for estimates derived using the SMC sampler and the exact asymptotic relative $\mathbb{L}_2$-error of the estimate of the normalizing constant. We also establish marginal propagation of chaos properties of the algorithm. The accuracy in high-dimensions of some approximate SMC-based filtering schemes is also discussed.

Posted Content
TL;DR: This talk will outline the Integrated Nested Laplace Approximation method and its related R package, and focus on using INLA for survival and point process models and demonstrate some of the new features.
Abstract: Latent Gaussian models are an extremely popular, exible class of models. Bayesian inference for these models is, however, tricky and time consuming. Recently, Rue, Martino and Chopin introduced the Integrated Nested Laplace Approximation (INLA) method for deterministic fast approximate inference. In this talk we will outline the INLA approximation and its related R package. We will focus on using INLA for survival and point process models and demonstrate some of the new features. Finally we will discuss possible extensions for INLA.

Posted Content
TL;DR: In this article, a self-learning bitmap (S-bitmap) is proposed for cardinality estimation in a data stream setting, where each incoming data can be seen only once and cannot be stored for long periods of time.
Abstract: Counting the number of distinct elements (cardinality) in a dataset is a fundamental problem in database management. In recent years, due to many of its modern applications, there has been significant interest to address the distinct counting problem in a data stream setting, where each incoming data can be seen only once and cannot be stored for long periods of time. Many probabilistic approaches based on either sampling or sketching have been proposed in the computer science literature, that only require limited computing and memory resources. However, the performances of these methods are not scale-invariant, in the sense that their relative root mean square estimation errors (RRMSE) depend on the unknown cardinalities. This is not desirable in many applications where cardinalities can be very dynamic or inhomogeneous and many cardinalities need to be estimated. In this paper, we develop a novel approach, called self-learning bitmap (S-bitmap) that is scale-invariant for cardinalities in a specified range. S-bitmap uses a binary vector whose entries are updated from 0 to 1 by an adaptive sampling process for inferring the unknown cardinality, where the sampling rates are reduced sequentially as more and more entries change from 0 to 1. We prove rigorously that the S-bitmap estimate is not only unbiased but scale-invariant. We demonstrate that to achieve a small RRMSE value of $\epsilon$ or less, our approach requires significantly less memory and consumes similar or less operations than state-of-the-art methods for many common practice cardinality scales. Both simulation and experimental studies are reported.

Posted Content
TL;DR: The theory behind EM as well as a number of EM variants are reviewed, suggesting that beyond the current state of the art is an even much wider territory still to be discovered.
Abstract: The expectation-maximization (EM) algorithm introduced by Dempster et al [12] in 1977 is avery general method to solve maximum likelihood estimation problems. In this informal report,we review the theory behind EM as well as a number of EM variants, suggesting that beyondthe current state of the art is an even much wider territory still to be discovered.

Posted Content
TL;DR: Recently, a sparse Bayesian learning algorithm, namely T-SBL, and its variants, are developed, which adaptively learn the correlation structure and exploit such correlation information to significantly improve reconstruction performance.
Abstract: A trend in compressed sensing (CS) is to exploit structure for improved reconstruction performance. In the basic CS model, exploiting the clustering structure among nonzero elements in the solution vector has drawn much attention, and many algorithms have been proposed. However, few algorithms explicitly consider correlation within a cluster. Meanwhile, in the multiple measurement vector (MMV) model correlation among multiple solution vectors is largely ignored. Although several recently developed algorithms consider the exploitation of the correlation, these algorithms need to know a priori the correlation structure, thus limiting their effectiveness in practical problems. Recently, we developed a sparse Bayesian learning (SBL) algorithm, namely T-SBL, and its variants, which adaptively learn the correlation structure and exploit such correlation information to significantly improve reconstruction performance. Here we establish their connections to other popular algorithms, such as the group Lasso, iterative reweighted $\ell_1$ and $\ell_2$ algorithms, and algorithms for time-varying sparsity. We also provide strategies to improve these existing algorithms.

Journal ArticleDOI
TL;DR: In this paper, a prior distribution that is uniform in the space of functional shapes of the underlying nonlinear function and then back-transform to obtain a prior for the original model parameters is proposed.
Abstract: This paper considers the topic of finding prior distributions when a major component of the statistical model depends on a nonlinear function. Using results on how to construct uniform distributions in general metric spaces, we propose a prior distribution that is uniform in the space of functional shapes of the underlying nonlinear function and then back-transform to obtain a prior distribution for the original model parameters. The primary application considered in this article is nonlinear regression, but the idea might be of interest beyond this case. For nonlinear regression the so constructed priors have the advantage that they are parametrization invariant and do not violate the likelihood principle, as opposed to uniform distributions on the parameters or the Jeffrey's prior, respectively. The utility of the proposed priors is demonstrated in the context of nonlinear regression modelling in clinical dose-finding trials, through a real data example and simulation. In addition the proposed priors are used for calculation of an optimal Bayesian design.

Journal ArticleDOI
TL;DR: In this article, the principal components and the squared loadings are obtained from a Singular Value Decomposition, and the loadings of the quantitative variables and the principal coordinates of the categories of the qualitative variables are also obtained directly.
Abstract: Kiers (1991) considered the orthogonal rotation in PCAMIX, a principal component method for a mixture of qualitative and quantitative variables. PCAMIX includes the ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. In this paper, we give a new presentation of PCAMIX where the principal components and the squared loadings are obtained from a Singular Value Decomposition. The loadings of the quantitative variables and the principal coordinates of the categories of the qualitative variables are also obtained directly. In this context, we propose a computationaly efficient procedure for varimax rotation in PCAMIX and a direct solution for the optimal angle of rotation. A simulation study shows the good computational behavior of the proposed algorithm. An application on a real data set illustrates the interest of using rotation in MCA. All source codes are available in the R package "PCAmixdata".

Posted Content
TL;DR: This framework, termed Particle Markov chain Monte Carlo (PMCMC), was shown to provide powerful methods for joint Bayesian state and parameter inference in nonlinear/non-Gaussian state-space models, but the mixing of the resulting MCMC kernels can be quite sensitive.
Abstract: Recently, Andrieu, Doucet and Holenstein [1] introduced a general framework for using particle lters (PFs) to construct proposal kernels for Markov chain Monte Carlo (MCMC) methods. This framework, termed Particle Markov chain Monte Carlo (PMCMC), was shown to provide powerful methods for joint Bayesian state and parameter inference in nonlinear/non-Gaussian state-space models. However, the mixing of the resulting MCMC kernels can be quite sensitive, both to the number of particles used in the underlying PF and to the number of observations in the data. In the discussion following [1], Whiteley [2] suggested a modied

Posted Content
TL;DR: This small study suggests that GP models are fertile ground for further implementation on CPU+GPU systems, and demonstrates that the computational cost of implementing GP models can be significantly reduced by using a CPU+ GPU heterogeneous computing system over an analogous implementation on a traditional computing system with no GPU acceleration.
Abstract: The graphics processing unit (GPU) has emerged as a powerful and cost effective processor for general performance computing. GPUs are capable of an order of magnitude more floating-point operations per second as compared to modern central processing units (CPUs), and thus provide a great deal of promise for computationally intensive statistical applications. Fitting complex statistical models with a large number of parameters and/or for large datasets is often very computationally expensive. In this study, we focus on Gaussian process (GP) models -- statistical models commonly used for emulating expensive computer simulators. We demonstrate that the computational cost of implementing GP models can be significantly reduced by using a CPU+GPU heterogeneous computing system over an analogous implementation on a traditional computing system with no GPU acceleration. Our small study suggests that GP models are fertile ground for further implementation on CPU+GPU systems.

Posted Content
TL;DR: The alternating linearization method for proximal decomposition is adapted to structured regularization problems, in particular, to the generalized lasso problems, and numerical results for several synthetic and real-world examples are presented.
Abstract: We adapt the alternating linearization method for proximal decomposition to structured regularization problems, in particular, to the generalized lasso problems. The method is related to two well-known operator splitting methods, the Douglas--Rachford and the Peaceman--Rachford method, but it has descent properties with respect to the objective function. This is achieved by employing a special update test, which decides whether it is beneficial to make a Peaceman--Rachford step, any of the two possible Douglas--Rachford steps, or none. The convergence mechanism of the method is related to that of bundle methods of nonsmooth optimization. We also discuss implementation for very large problems, with the use of specialized algorithms and sparse data structures. Finally, we present numerical results for several synthetic and real-world examples, including a three-dimensional fused lasso problem, which illustrate the scalability, efficacy, and accuracy of the method.