Showing papers in "arXiv: Computation in 2012"


BookDOI
TL;DR: In this paper, the authors discuss theoretical and practical aspects of Hamiltonian Monte Carlo, and present some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.
Abstract: Hamiltonian dynamics can be used to produce distant proposals for the Metropolis algorithm, thereby avoiding the slow exploration of the state space that results from the diffusive behaviour of simple random-walk proposals. Though originating in physics, Hamiltonian dynamics can be applied to most problems with continuous state spaces by simply introducing fictitious "momentum" variables. A key to its usefulness is that Hamiltonian dynamics preserves volume, and its trajectories can thus be used to define complex mappings without the need to account for a hard-to-compute Jacobian factor - a property that can be exactly maintained even when the dynamics is approximated by discretizing time. In this review, I discuss theoretical and practical aspects of Hamiltonian Monte Carlo, and present some of its variations, including using windows of states for deciding on acceptance or rejection, computing trajectories using fast approximations, tempering during the course of a trajectory to handle isolated modes, and short-cut methods that prevent useless trajectories from taking much computation time.

2,501 citations
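
The abstract above notes that the volume-preservation property can be kept exactly when the dynamics is discretized. A minimal sketch of that idea follows, using the leapfrog integrator inside a Metropolis accept/reject step. It assumes a unit mass matrix and a user-supplied potential `U` (the negative log target density) with gradient `grad_U`; the function names `leapfrog` and `hmc_step` are illustrative and not taken from the paper.

```python
import numpy as np

def leapfrog(q, p, grad_U, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog scheme (unit mass assumed)."""
    q = np.array(q, dtype=float)
    p = np.array(p, dtype=float)
    p = p - 0.5 * step_size * grad_U(q)       # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                 # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_U(q)     # full step for momentum
    p = p - 0.5 * step_size * grad_U(q)       # final half step for momentum
    return q, p

def hmc_step(q, U, grad_U, step_size, n_steps, rng):
    """One HMC transition: fresh momentum, leapfrog trajectory, Metropolis test."""
    p0 = rng.standard_normal(q.shape)
    q_new, p_new = leapfrog(q, p0, grad_U, step_size, n_steps)
    h_old = U(q) + 0.5 * p0 @ p0
    h_new = U(q_new) + 0.5 * p_new @ p_new
    # Accept with probability min(1, exp(h_old - h_new)); no Jacobian term is needed.
    if np.log(rng.uniform()) < h_old - h_new:
        return q_new, True
    return q, False

# Usage on a standard bivariate normal target (an assumed toy example).
U = lambda q: 0.5 * float(q @ q)
grad_U = lambda q: q
rng = np.random.default_rng(0)
q = np.zeros(2)
samples = []
for _ in range(2000):
    q, _ = hmc_step(q, U, grad_U, step_size=0.2, n_steps=10, rng=rng)
    samples.append(q)
samples = np.array(samples)
```

Running the trajectory forward, negating the momentum, and running it again returns to the starting point, which is the reversibility the accept/reject step relies on.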


Posted Content
TL;DR: This paper provides a generalization of Moeller et al. (2004) and a new MCMC algorithm, which obtains better acceptance probabilities for the same amount of exact sampling, and removes the need to estimate model parameters before sampling begins.
Abstract: Markov Chain Monte Carlo (MCMC) algorithms are routinely used to draw samples from distributions with intractable normalization constants. However, standard MCMC algorithms do not apply to doubly-intractable distributions in which there are additional parameter-dependent normalization terms; for example, the posterior over parameters of an undirected graphical model. An ingenious auxiliary-variable scheme (Moeller et al., 2004) offers a solution: exact sampling (Propp and Wilson, 1996) is used to sample from a Metropolis-Hastings proposal for which the acceptance probability is tractable. Unfortunately the acceptance probability of these expensive updates can be low. This paper provides a generalization of Moeller et al. (2004) and a new MCMC algorithm, which obtains better acceptance probabilities for the same amount of exact sampling, and removes the need to estimate model parameters before sampling begins.

312 citations


Posted Content
TL;DR: In this paper, a Hamiltonian Monte Carlo (HMC) algorithm is proposed to sample from multivariate Gaussian distributions in which the target space is constrained by linear and quadratic inequalities or products thereof.
Abstract: We present a Hamiltonian Monte Carlo algorithm to sample from multivariate Gaussian distributions in which the target space is constrained by linear and quadratic inequalities or products thereof. The Hamiltonian equations of motion can be integrated exactly and there are no parameters to tune. The algorithm mixes faster and is more efficient than Gibbs sampling. The runtime depends on the number and shape of the constraints but the algorithm is highly parallelizable. In many cases, we can exploit special structure in the covariance matrices of the untruncated Gaussian to further speed up the runtime. A simple extension of the algorithm permits sampling from distributions whose log-density is piecewise quadratic, as in the "Bayesian Lasso" model.

138 citations


Posted Content
TL;DR: This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate, which is shown to be closely related to the condition number of the optimal point.
Abstract: The L1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing L1-regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an O(log ε) iteration complexity to reach a tolerance of ε. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned.

94 citations
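
To make the proximal gradient idea in the abstract above concrete, here is a rough sketch for the objective -log det(Θ) + tr(SΘ) + ρ‖Θ‖₁, where S is a sample covariance matrix. It is not the authors' G-ISTA implementation: the plain backtracking rule, the identity starting point, and the names `soft_threshold` and `prox_grad_sparse_precision` are assumptions made for illustration, and the penalty is applied to every entry, including the diagonal.

```python
import numpy as np

def soft_threshold(a, tau):
    """Elementwise proximal operator of tau * |.|."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

def smooth_part(theta, s):
    """Smooth part of the objective: -log det(theta) + tr(s @ theta)."""
    _, logdet = np.linalg.slogdet(theta)
    return -logdet + np.trace(s @ theta)

def is_positive_definite(m):
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False

def prox_grad_sparse_precision(s, rho, n_iters=300, t0=1.0):
    """Proximal gradient iterations with simple backtracking (assumed rule)."""
    theta = np.eye(s.shape[0])
    for _ in range(n_iters):
        grad = s - np.linalg.inv(theta)       # gradient of the smooth part
        t = t0
        while True:
            cand = soft_threshold(theta - t * grad, t * rho)
            diff = cand - theta
            bound = (smooth_part(theta, s) + np.sum(grad * diff)
                     + np.sum(diff ** 2) / (2.0 * t))
            # Require a positive definite iterate and sufficient decrease.
            if is_positive_definite(cand) and smooth_part(cand, s) <= bound:
                break
            t *= 0.5
        theta = cand
    return theta

# Usage on a small assumed covariance matrix.
S = np.array([[1.0, 0.3], [0.3, 2.0]])
theta_hat = prox_grad_sparse_precision(S, rho=0.5)
```

In this toy case the penalty exceeds the off-diagonal covariance, so the estimate is diagonal with entries 1/(S_ii + ρ).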


Posted Content
TL;DR: In this article, the authors use a suitable parameterization of the beta law in terms of its mean and a precision parameter, and allow both parameters to be modeled through regression structures that may involve fixed and random effects.
Abstract: This paper builds on recent research that focuses on regression modeling of continuous bounded data, such as proportions measured on a continuous scale. Specifically, it deals with beta regression models with mixed effects from a Bayesian approach. We use a suitable parameterization of the beta law in terms of its mean and a precision parameter, and allow both parameters to be modeled through regression structures that may involve fixed and random effects. Specification of prior distributions is discussed, computational implementation via Gibbs sampling is provided, and illustrative examples are presented.

88 citations
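
The mean and precision parameterization mentioned in the abstract above can be illustrated with a short sketch. It assumes the common convention in which a beta law with mean μ and precision φ has shape parameters a = μφ and b = (1 − μ)φ, so that the variance is μ(1 − μ)/(1 + φ); the abstract does not spell out the exact form, and the logit link and the function names are likewise assumptions.

```python
import math

def beta_shapes(mu, phi):
    """Map (mean, precision) to the usual beta shape parameters (a, b)."""
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi

def beta_logpdf(y, mu, phi):
    """Log-density of a beta law written in terms of mean and precision."""
    a, b = beta_shapes(mu, phi)
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))

def inv_logit(eta):
    """Assumed link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Usage: a linear predictor (fixed plus random effect, hypothetical values)
# sets the mean, and the precision is held at 8.
eta = 0.4 + (-0.1)
mu = inv_logit(eta)
log_density = beta_logpdf(0.6, mu, 8.0)
```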


Posted Content
TL;DR: The Bergm package provides a comprehensive framework for Bayesian analysis of exponential random graph models, with tools for parameter estimation, model selection and goodness-of-fit diagnostics; its capabilities are illustrated through a tutorial analysis of two well-known network datasets.
Abstract: In this paper we describe the main features of the Bergm package for the open-source R software, which provides a comprehensive framework for Bayesian analysis for exponential random graph models: tools for parameter estimation, model selection and goodness-of-fit diagnostics. We illustrate the capabilities of this package, describing the algorithms, through a tutorial analysis of two well-known network datasets.

80 citations


Posted Content
TL;DR: The results question the notion that the latter technique is both significantly faster and more robust than MCMC in this setting; 100,000 iterations of the MALA algorithm running in 20 min on a desktop PC delivered greater predictive accuracy than the default INLA strategy and gave comparative performance to the full Laplace approximation which ran in 39 min.
Abstract: We investigate two options for performing Bayesian inference on spatial log-Gaussian Cox processes assuming a spatially continuous latent field: Markov chain Monte Carlo (MCMC) and the integrated nested Laplace approximation (INLA). We first describe the device of approximating a spatially continuous Gaussian field by a Gaussian Markov random field on a discrete lattice, and present a simulation study showing that, with careful choice of parameter values, small neighbourhood sizes can give excellent approximations. We then introduce the spatial log-Gaussian Cox process and describe MCMC and INLA methods for spatial prediction within this model class. We report the results of a simulation study in which we compare MALA and the technique of approximating the continuous latent field by a discrete one, followed by approximate Bayesian inference via INLA over a selection of 18 simulated scenarios. The results question the notion that the latter technique is both significantly faster and more robust than MCMC in this setting; 100,000 iterations of the MALA algorithm running in 20 minutes on a desktop PC delivered greater predictive accuracy than the default INLA strategy, which ran in 4 minutes and gave comparative performance to the full Laplace approximation which ran in 39 minutes.

70 citations


Posted Content
TL;DR: In this article, Markov random fields (MRFs) are partitioned into non-overlapping trees so that the posterior distribution of one tree can be computed exactly by conditioning on the remaining tree, which yields efficient blocked and Rao-Blackwellised MCMC algorithms.
Abstract: We present new MCMC algorithms for computing the posterior distributions and expectations of the unknown variables in undirected graphical models with regular structure. For demonstration purposes, we focus on Markov Random Fields (MRFs). By partitioning the MRFs into non-overlapping trees, it is possible to compute the posterior distribution of a particular tree exactly by conditioning on the remaining tree. These exact solutions allow us to construct efficient blocked and Rao-Blackwellised MCMC algorithms. We show empirically that tree sampling is considerably more efficient than other partitioned sampling schemes and the naive Gibbs sampler, even in cases where loopy belief propagation fails to converge. We prove that tree sampling exhibits lower variance than the naive Gibbs sampler and other naive partitioning schemes using the theoretical measure of maximal correlation. We also construct new information theory tools for comparing different MCMC schemes and show that, under these, tree sampling is more efficient.

65 citations


Posted Content
TL;DR: In this article, the authors show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases, and they focus on optimization and massive parallelization of cyclic coordinate descent approaches.
Abstract: Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety.

59 citations


Journal ArticleDOI
TL;DR: The application of Riemann manifold Markov chain Monte Carlo methods using an approximation to the likelihood of the MJP that is valid when the system modelled is near its thermodynamic limit is described.
Abstract: Bayesian analysis for Markov jump processes is a non-trivial and challenging problem. Although exact inference is theoretically possible, it is computationally demanding, thus its applicability is limited to a small class of problems. In this paper we describe the application of Riemann manifold MCMC methods using an approximation to the likelihood of the Markov jump process which is valid when the system modelled is near its thermodynamic limit. The proposed approach is both statistically and computationally efficient while the convergence rate and mixing of the chains allows for fast MCMC inference. The methodology is evaluated using numerical simulations on two problems from chemical kinetics and one from systems biology.

49 citations


Posted Content
TL;DR: A class of MCMC algorithms characterized by a pseudo-marginal transition probability kernel is given and it is shown that on average O(m) of the samples realized by a simulation approximating a randomized chain of length n are exactly the same as those of a coupled (exact) randomized chain.
Abstract: We consider Metropolis Hastings MCMC with target π(θ) in cases where the log of the ratio of target distributions D = log(π(θ′)/π(θ)) is replaced by an estimator D̂(W). The estimator is based on m samples W = (W1, W2, ..., Wm) from an independent online Monte Carlo simulation. Under some conditions on the distribution of D̂(W) the process resembles Metropolis Hastings MCMC with a randomized transition kernel. When this is the case there is a correction to the estimated acceptance probability which ensures that the target distribution remains the equilibrium distribution. The simplest versions of the penalty method of Ceperley and Dewing 1999 (6), the universal algorithm of Ball et al. 2003 (3) and the single variable exchange algorithm of Murray et al. 2006 (15) are special cases. In many applications of interest the correction terms cannot be computed. We consider approximate versions of the algorithms. We show that on average O(m) of the samples realized by a simulation approximating a randomized chain of length n are exactly the same as those of a coupled (exact) randomized chain. Approximation biases Monte Carlo estimates with terms O(1/m) or smaller. This should be compared to the Monte Carlo error which is O(1/√n). Monte Carlo simulation offers a direct route to statistical inference for many otherwise awkward fitting problems. The class of problems which may be treated using Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo has grown a great deal since the core algorithms were proposed (12, 7). One of the most important recent advances (11, 4, 2, 1) has given us pseudo-marginal MCMC algorithms which are useful for some doubly intractable distributions. Such distributions are hard to simulate as we cannot readily compute the ratio of densities at two values of the target variable.
Algorithms with a pseudo-marginal target distribution put an estimate for the target distribution in the Monte Carlo state along with the target variable. In this paper we give a class of MCMC algorithms characterized by a pseudo-marginal transition probability kernel. In these algorithms the Monte Carlo state

Posted Content
TL;DR: The contribution of the present paper is to consider regression density estimation techniques to approximate the likelihood in the ABC setting, which builds on recently developed marginal adaptation density estimators by extending them for conditional density estimation.
Abstract: Approximate Bayesian computation (ABC) methods, which are applicable when the likelihood is difficult or impossible to calculate, are an active topic of current research. Most current ABC algorithms directly approximate the posterior distribution, but an alternative, less common strategy is to approximate the likelihood function. This has several advantages. First, in some problems, it is easier to approximate the likelihood than to approximate the posterior. Second, an approximation to the likelihood allows reference analyses to be constructed based solely on the likelihood. Third, it is straightforward to perform sensitivity analyses for several different choices of prior once an approximation to the likelihood is constructed, which needs to be done only once. The contribution of the present paper is to consider regression density estimation techniques to approximate the likelihood in the ABC setting. Our likelihood approximations build on recently developed marginal adaptation density estimators by extending them for conditional density estimation. Our approach facilitates reference Bayesian inference, as well as frequentist inference. The method is demonstrated via a challenging problem of inference for stereological extremes, where we perform both frequentist and Bayesian inference.

Journal ArticleDOI
TL;DR: In this article, a Markov chain Monte Carlo (MCMC) method is proposed to estimate the Potts parameter β jointly with the unknown parameters of a Bayesian model within an MCMC algorithm.
Abstract: This paper addresses the problem of estimating the Potts parameter β jointly with the unknown parameters of a Bayesian model within a Markov chain Monte Carlo (MCMC) algorithm. Standard MCMC methods cannot be applied to this problem because performing inference on β requires computing the intractable normalizing constant of the Potts model. In the proposed MCMC method the estimation of β is conducted using a likelihood-free Metropolis-Hastings algorithm. Experimental results obtained for synthetic data show that estimating β jointly with the other unknown parameters leads to estimation results that are as good as those obtained with the actual value of β. On the other hand, assuming that the value of β is known can degrade estimation performance significantly if this value is incorrect. To illustrate the interest of this method, the proposed algorithm is successfully applied to real bidimensional SAR and tridimensional ultrasound images.

Journal ArticleDOI
TL;DR: In this article, the authors show and remark upon the flexibility of the design of MTM-type methods, fulfilling the detailed balance condition, and discuss several possibilities and show different numerical results.
Abstract: The Multiple Try Metropolis (MTM) method is a generalization of the classical Metropolis-Hastings algorithm in which the next state of the chain is chosen among a set of samples, according to normalized weights. In the literature, several extensions have been proposed. In this work, we show and remark upon the flexibility of the design of MTM-type methods, fulfilling the detailed balance condition. We discuss several possibilities and show different numerical results.
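
As a rough illustration of the mechanism described above, the sketch below implements one common MTM variant: a symmetric Gaussian random-walk proposal with candidate weights equal to the target density. This is one design among the many the paper discusses, not necessarily any specific scheme from it; `mtm_step`, `k`, and `scale` are names chosen here for illustration.

```python
import numpy as np

def mtm_step(x, log_target, k, scale, rng):
    """One Multiple Try Metropolis step for a scalar state.

    Assumes a symmetric proposal, so the weight of a point is its target density.
    """
    # Draw k candidates around the current state and weight them.
    ys = x + scale * rng.standard_normal(k)
    log_w_y = np.array([log_target(y) for y in ys])
    probs = np.exp(log_w_y - log_w_y.max())
    probs /= probs.sum()
    # Select one candidate according to the normalized weights.
    y = ys[rng.choice(k, p=probs)]
    # Draw k - 1 reference points around the selected candidate, plus the current state.
    refs = y + scale * rng.standard_normal(k - 1)
    log_w_x = np.array([log_target(r) for r in refs] + [log_target(x)])
    # Generalized acceptance ratio: sum of candidate weights over sum of reference weights.
    log_ratio = np.logaddexp.reduce(log_w_y) - np.logaddexp.reduce(log_w_x)
    if np.log(rng.uniform()) < log_ratio:
        return y
    return x

# Usage on a standard normal target (an assumed toy example).
log_target = lambda x: -0.5 * x * x
rng = np.random.default_rng(1)
x = 0.0
chain = np.empty(5000)
for i in range(chain.size):
    x = mtm_step(x, log_target, k=5, scale=2.0, rng=rng)
    chain[i] = x
```

With `k=1` the step reduces to the classical random-walk Metropolis update, which is a convenient sanity check on the construction.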

Journal ArticleDOI
TL;DR: A more flexible modeling framework is introduced, the variational-approximation estimation algorithm is improved, standard error estimation is discussed and implemented via a parametric bootstrap approach, and the usefulness of the model-based clustering framework is demonstrated by applying it to a discrete-valued network.
Abstract: We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger data sets than those seen elsewhere in the literature. The more flexible framework is achieved through introducing novel parameterizations of the model, giving varying degrees of parsimony, using exponential family models whose structure may be exploited in various theoretical and algorithmic ways. The algorithms are based on variational generalized EM algorithms, where the E-steps are augmented by a minorization-maximization (MM) idea. The bootstrapped standard error estimates are based on an efficient Monte Carlo network simulation idea. Last, we demonstrate the usefulness of the model-based clustering framework by applying it to a discrete-valued network with more than 131,000 nodes and 17 billion edge variables.

Posted Content
TL;DR: A parallelizable Markov chain Monte Carlo algorithm for efficiently sampling from continuous probability distributions that can take advantage of hundreds of cores and shares information between parallel Markov chains to build a scale-location mixture of Gaussians approximation to the density function of the target distribution.
Abstract: Probabilistic models are conceptually powerful tools for finding structure in data, but their practical effectiveness is often limited by our ability to perform inference in them. Exact inference is frequently intractable, so approximate inference is often performed using Markov chain Monte Carlo (MCMC). To achieve the best possible results from MCMC, we want to efficiently simulate many steps of a rapidly mixing Markov chain which leaves the target distribution invariant. Of particular interest in this regard is how to take advantage of multi-core computing to speed up MCMC-based inference, both to improve mixing and to distribute the computational load. In this paper, we present a parallelizable Markov chain Monte Carlo algorithm for efficiently sampling from continuous probability distributions that can take advantage of hundreds of cores. This method shares information between parallel Markov chains to build a scale-mixture of Gaussians approximation to the density function of the target distribution. We combine this approximation with a recent method known as elliptical slice sampling to create a Markov chain with no step-size parameters that can mix rapidly without requiring gradient or curvature computations.

Posted Content
TL;DR: In this article, a particle Gibbs with ancestor sampling (PG-AS) method was proposed to improve the mixing of the particle MCMC kernel in a single forward sweep instead of using separate forward and backward sweeps.
Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models.

Journal ArticleDOI
TL;DR: An R package for specifying and estimating linear latent variable models is presented; separating the model specification from the actual data leads to a dynamic and easy way of modeling complex hierarchical structures.
Abstract: An R package for specifying and estimating linear latent variable models is presented. The philosophy of the implementation is to separate the model specification from the actual data, which leads to a dynamic and easy way of modeling complex hierarchical structures. Several advanced features are implemented including robust standard errors for clustered correlated data, multigroup analyses, non-linear parameter constraints, inference with incomplete data, maximum likelihood estimation with censored and binary observations, and instrumental variable estimators. In addition an extensive simulation interface covering a broad range of non-linear generalized structural equation models is described. The model and software are demonstrated in data of measurements of the serotonin transporter in the human brain.

Posted Content
TL;DR: A new type of generalization of the EM procedure introduced in [Chretien and Hero (1998)] and called Kullback-proximal algorithms is studied, with a detailed analysis of the case where some cluster points lie on the boundary of the parameter space.
Abstract: In this paper, we analyze the celebrated EM algorithm from the point of view of proximal point algorithms. More precisely, we study a new type of generalization of the EM procedure introduced in [Chretien and Hero (1998)] and called Kullback-proximal algorithms. The proximal framework allows us to prove new results concerning the cluster points. An essential contribution is a detailed analysis of the case where some cluster points lie on the boundary of the parameter space.

Posted Content
TL;DR: In this paper, a new lifetime distribution which is obtained by compounding Lindley and geometric distributions, named Lindley-geometric (LG) distribution, is introduced and several properties of the new distribution such as density, failure rate, mean lifetime, moments, and order statistics are derived.
Abstract: In this paper a new lifetime distribution which is obtained by compounding Lindley and geometric distributions, named Lindley-geometric (LG) distribution, is introduced. Several properties of the new distribution such as density, failure rate, mean lifetime, moments, and order statistics are derived. Furthermore, estimation by maximum likelihood and inference for large samples are discussed. The paper is motivated by two applications to real data sets, and we hope that this model will attract wider applicability in survival and reliability.

Posted Content
TL;DR: It is demonstrated that a Metropolis resampler can be faster where the variance in importance weights is modest, and so is worth considering in a performance-critical context, such as particle Markov chain Monte Carlo and real-time applications.
Abstract: We consider deployment of the particle filter on modern massively parallel hardware architectures, such as Graphics Processing Units (GPUs), with a focus on the resampling stage. While standard multinomial and stratified resamplers require a sum of importance weights computed collectively between threads, a Metropolis resampler favourably requires only pair-wise ratios between weights, computed independently by threads, and can be further tuned for performance by adjusting its number of iterations. While achieving respectable results for the stratified and multinomial resamplers, we demonstrate that a Metropolis resampler can be faster where the variance in importance weights is modest, and so is worth considering in a performance-critical context, such as particle Markov chain Monte Carlo and real-time applications.
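
The pair-wise structure described above can be sketched in a few lines. The version below is a vectorized NumPy stand-in for what would be one thread per particle on a GPU; `metropolis_resample` and `n_iters` are illustrative names, and the code is a sketch of the general Metropolis resampling idea rather than the authors' implementation.

```python
import numpy as np

def metropolis_resample(weights, n_iters, rng):
    """Return ancestor indices using only pair-wise weight comparisons.

    Each particle runs its own short Metropolis chain over indices, so no
    collective sum of the weights is ever computed.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size
    ancestors = np.arange(n)                  # chain i starts at index i
    for _ in range(n_iters):
        proposals = rng.integers(0, n, size=n)
        u = rng.uniform(size=n)
        # Accept when u < w[proposal] / w[current], written without a division.
        accept = u * w[ancestors] < w[proposals]
        ancestors = np.where(accept, proposals, ancestors)
    return ancestors

# Usage: half the particles carry weight 1 and half carry weight 3 (assumed toy weights).
rng = np.random.default_rng(3)
w = np.tile([1.0, 3.0], 2000)
anc = metropolis_resample(w, n_iters=50, rng=rng)
heavy_fraction = np.mean(w[anc] == 3.0)
```

The number of iterations trades bias for speed: too few iterations leave the ancestors close to their starting indices, which is why the abstract describes it as a tuning parameter.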

Posted Content
TL;DR: This work proves the convergence of the AMIS, at a cost of a slight modification in the learning process, in the asymptotic regime where the number of iterations is going to infinity while the number of drawings per iteration is a fixed, but growing sequence of integers.
Abstract: Among Monte Carlo techniques, the importance sampling requires fine tuning of a proposal distribution, which is now fluently resolved through iterative schemes. The Adaptive Multiple Importance Sampling (AMIS) of Cornuet et al. (2012) provides a significant improvement in stability and effective sample size due to the introduction of a recycling procedure. However, the consistency of the AMIS estimator remains largely open. In this work we prove the convergence of the AMIS, at a cost of a slight modification in the learning process. Contrary to Douc et al. (2007a), results are obtained here in the asymptotic regime where the number of iterations is going to infinity while the number of drawings per iteration is a fixed, but growing sequence of integers. Hence some of the results shed new light on adaptive population Monte Carlo algorithms in that last regime.

Posted Content
TL;DR: In this article, an approximate Bayesian inference for LGP density estimation in a grid using Laplace's method to integrate over the non-Gaussian posterior distribution of latent function values and to determine the covariance function parameters with type-II maximum a posteriori (MAP) estimation is presented.
Abstract: Logistic Gaussian process (LGP) priors provide a flexible alternative for modelling unknown densities. The smoothness properties of the density estimates can be controlled through the prior covariance structure of the LGP, but the challenge is the analytically intractable inference. In this paper, we present approximate Bayesian inference for LGP density estimation in a grid using Laplace's method to integrate over the non-Gaussian posterior distribution of latent function values and to determine the covariance function parameters with type-II maximum a posteriori (MAP) estimation. We demonstrate that Laplace's method with MAP is sufficiently fast for practical interactive visualisation of 1D and 2D densities. Our experiments with simulated and real 1D data sets show that the estimation accuracy is close to a Markov chain Monte Carlo approximation and state-of-the-art hierarchical infinite Gaussian mixture models. We also construct a reduced-rank approximation to speed up the computations for dense 2D grids, and demonstrate density regression with the proposed Laplace approach.

Posted Content
TL;DR: This paper presents two homotopy-based algorithms that efficiently solve reweighted L1 problems and proposes an algorithm that solves a weighted L1 problem by adaptively selecting the weights while estimating the signal, and compares the performance of both algorithms against state-of-the-art solvers.
Abstract: To recover a sparse signal from an underdetermined system, we often solve a constrained L1-norm minimization problem. In many cases, the signal sparsity and the recovery performance can be further improved by replacing the L1 norm with a "weighted" L1 norm. Without any prior information about nonzero elements of the signal, the procedure for selecting weights is iterative in nature. Common approaches update the weights at every iteration using the solution of a weighted L1 problem from the previous iteration. In this paper, we present two homotopy-based algorithms that efficiently solve reweighted L1 problems. First, we present an algorithm that quickly updates the solution of a weighted L1 problem as the weights change. Since the solution changes only slightly with small changes in the weights, we develop a homotopy algorithm that replaces the old weights with the new ones in a small number of computationally inexpensive steps. Second, we propose an algorithm that solves a weighted L1 problem by adaptively selecting the weights while estimating the signal. This algorithm integrates the reweighting into every step along the homotopy path by changing the weights according to the changes in the solution and its support, allowing us to achieve a high quality signal reconstruction by solving a single homotopy problem. We compare the performance of both algorithms, in terms of reconstruction accuracy and computational complexity, against state-of-the-art solvers and show that our methods have smaller computational cost. In addition, we will show that the adaptive selection of the weights inside the homotopy often yields reconstructions of higher quality.

Journal ArticleDOI
TL;DR: In this paper, the authors describe algorithms for stably and efficiently fitting group lasso, group SCAD, and group MCP models, which extend the lasso and the nonconvex SCAD and MCP penalties to the group selection problem.
Abstract: Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods.

Journal ArticleDOI
TL;DR: This paper examines the use of partially noncentered parametrizations in VB for generalized linear mixed models (GLMMs), shows how to implement an algorithm called nonconjugate variational message passing for GLMMs, and shows that partial noncentering can accelerate convergence and produce more accurate posterior approximations than centering or noncentering.
Abstract: The effects of different parametrizations on the convergence of Bayesian computational algorithms for hierarchical models are well explored. Techniques such as centering, noncentering and partial noncentering can be used to accelerate convergence in MCMC and EM algorithms but are still not well studied for variational Bayes (VB) methods. As a fast deterministic approach to posterior approximation, VB is attracting increasing interest due to its suitability for large high-dimensional data. Use of different parametrizations for VB has not only computational but also statistical implications, as different parametrizations are associated with different factorized posterior approximations. We examine the use of partially noncentered parametrizations in VB for generalized linear mixed models (GLMMs). Our paper makes four contributions. First, we show how to implement an algorithm called nonconjugate variational message passing for GLMMs. Second, we show that the partially noncentered parametrization can adapt to the quantity of information in the data and determine a parametrization close to optimal. Third, we show that partial noncentering can accelerate convergence and produce more accurate posterior approximations than centering or noncentering. Finally, we demonstrate how the variational lower bound, produced as part of the computation, can be useful for model selection.

Journal ArticleDOI
TL;DR: In this paper, gradient-based stochastic optimization methods are developed for Bayesian optimal experimental design in nonlinear systems, with the goal of choosing experiments that are optimal for parameter inference.
Abstract: Optimal experimental design (OED) seeks experiments expected to yield the most useful data for some purpose. In practical circumstances where experiments are time-consuming or resource-intensive, OED can yield enormous savings. We pursue OED for nonlinear systems from a Bayesian perspective, with the goal of choosing experiments that are optimal for parameter inference. Our objective in this context is the expected information gain in model parameters, which in general can only be estimated using Monte Carlo methods. Maximizing this objective thus becomes a stochastic optimization problem. This paper develops gradient-based stochastic optimization methods for the design of experiments on a continuous parameter space. Given a Monte Carlo estimator of expected information gain, we use infinitesimal perturbation analysis to derive gradients of this estimator. We are then able to formulate two gradient-based stochastic optimization approaches: (i) Robbins-Monro stochastic approximation, and (ii) sample average approximation combined with a deterministic quasi-Newton method. A polynomial chaos approximation of the forward model accelerates objective and gradient evaluations in both cases. We discuss the implementation of these optimization methods, then conduct an empirical comparison of their performance. To demonstrate design in a nonlinear setting with partial differential equation forward models, we use the problem of sensor placement for source inversion. Numerical results yield useful guidelines on the choice of algorithm and sample sizes, assess the impact of estimator bias, and quantify tradeoffs of computational cost versus solution quality and robustness.
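
Robbins-Monro stochastic approximation, the first of the two approaches named in the abstract, can be sketched on a toy problem. Everything specific here is an assumption for illustration: the objective U(d) = -(d - 2)^2 stands in for the expected information gain, the gradient noise is unit Gaussian, and the step sizes are a/(k + 1). No Monte Carlo estimator, infinitesimal perturbation analysis, or polynomial chaos surrogate is involved.

```python
import random


def robbins_monro(noisy_grad, d0, n_iter=5000, a=0.5, seed=0):
    """Stochastic gradient ascent with decaying steps a/(k + 1).

    noisy_grad(d, rng) must return an unbiased estimate of dU/dd.
    """
    rng = random.Random(seed)
    d = d0
    for k in range(n_iter):
        step = a / (k + 1)  # steps sum to infinity, squared steps are summable
        d += step * noisy_grad(d, rng)
    return d


def toy_noisy_grad(d, rng):
    """Gradient of the toy objective U(d) = -(d - 2)**2 plus Gaussian noise."""
    return -2.0 * (d - 2.0) + rng.gauss(0.0, 1.0)


d_star = robbins_monro(toy_noisy_grad, d0=-5.0)  # should land near d = 2
```

The decaying step size averages out the gradient noise over iterations, which is why the iterate settles near the maximizer even though each individual gradient evaluation is noisy.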

Journal ArticleDOI
TL;DR: In this paper, the authors investigate sampling laws for particle algorithms and the influence of these laws on the efficiency of particle approximations of marginal likelihoods in hidden Markov models, and characterize the essentially unique family of particle system transition kernels which is optimal with respect to an asymptotic-in-time variance growth rate criterion.
Abstract: We investigate sampling laws for particle algorithms and the influence of these laws on the efficiency of particle approximations of marginal likelihoods in hidden Markov models. Among a broad class of candidates we characterize the essentially unique family of particle system transition kernels which is optimal with respect to an asymptotic-in-time variance growth rate criterion. The sampling structure of the algorithm defined by these optimal transitions turns out to be only subtly different from standard algorithms and yet the fluctuation properties of the estimates it provides can be dramatically different. The structure of the optimal transition suggests a new class of algorithms, which we term "twisted" particle filters and which we validate with asymptotic analysis of a more traditional nature, in the regime where the number of particles tends to infinity.
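
For context, the "standard algorithms" that the abstract compares against can be illustrated by a bootstrap particle filter estimating the marginal likelihood of a hidden Markov model. The sketch below is that standard filter, not the twisted particle filter proposed in the paper. The scalar linear-Gaussian model (x_0 ~ N(0, 1), x_t = a*x_{t-1} + N(0, q), y_t = x_t + N(0, r)) and all parameter names are assumptions chosen for illustration.

```python
import math
import random


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bootstrap_loglik(ys, n_particles=1000, a=0.9, q=1.0, r=1.0, seed=0):
    """Bootstrap particle filter estimate of log p(y_0, ..., y_T)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # x_0 prior
    loglik = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate each particle through the state transition.
            particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # Weight by the observation density, using log-sum-exp for stability.
        logw = [normal_logpdf(y, x, r) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


estimate = bootstrap_loglik([0.5, -0.2, 1.1], n_particles=2000)
```

With a single observation the exact answer is available (y_0 ~ N(0, 1 + r)), which gives a quick sanity check on the estimator.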

Posted Content
TL;DR: In this article, an explicit expression for the truncated mean and variance for the multivariate normal distribution with arbitrary rectangular double truncation was derived, and a formula for the bivariate marginal density of truncated multinormal variates was given.
Abstract: In the present article we derive an explicit expression for the truncated mean and variance for the multivariate normal distribution with arbitrary rectangular double truncation. We use the moment generating approach of Tallis (1961) and extend it to general μ, Σ and all combinations of truncation. As part of the solution we also give a formula for the bivariate marginal density of truncated multinormal variates. We also prove an invariance property of some elements of the inverse covariance after truncation. Computer algorithms for computing the truncated mean, variance and the bivariate marginal probabilities for doubly truncated multivariate normal variates have been written in R and are presented along with three examples.
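
The univariate special case gives a feel for the quantities involved. The sketch below computes the mean and variance of a normal distribution truncated to [a, b] using the well-known closed-form expressions. It is not the multivariate result derived in the article, and it is written in Python rather than the R code the authors describe. The function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_moments(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) restricted to a <= x <= b."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    z = Phi(beta) - Phi(alpha)  # probability mass inside [a, b]
    ratio = (phi(alpha) - phi(beta)) / z
    mean = mu + sigma * ratio
    var = sigma ** 2 * (
        1.0 + (alpha * phi(alpha) - beta * phi(beta)) / z - ratio ** 2
    )
    return mean, var


mean, var = truncated_normal_moments(0.0, 1.0, -1.0, 1.0)
```

For symmetric truncation around the mean, the truncated mean is unchanged and the variance shrinks below sigma^2, as expected.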

Journal ArticleDOI
TL;DR: This work investigates nonlinear state-space models without a closed-form transition density and proposes reformulating such models over their latent noise variables rather than their latent state variables, finding that the tractable noise density emerges in place of the intractable transition density.
Abstract: We investigate nonlinear state-space models without a closed-form transition density, and propose reformulating such models over their latent noise variables rather than their latent state variables. In doing so the tractable noise density emerges in place of the intractable transition density. For importance sampling methods such as the auxiliary particle filter, this enables importance weights to be computed where they could not be otherwise. As case studies we take two multivariate marine biogeochemical models and perform state and parameter estimation using the particle marginal Metropolis-Hastings sampler. For the particle filter within this sampler, we compare several proposal strategies over noise variables, all based on lookaheads with the unscented Kalman filter. These strategies are compared using conventional means for assessing Metropolis-Hastings efficiency, as well as with a novel metric called the conditional acceptance rate for assessing the consequences of using an estimated, and not exact, likelihood. Results indicate the utility of reformulating the model over noise variables, particularly for fast-mixing process models.