Journal ArticleDOI

Accelerated Gibbs sampling of normal distributions using matrix splittings and polynomials

02 Nov 2017-Bernoulli (Bernoulli Society for Mathematical Statistics and Probability)-Vol. 23, Iss: 4B, pp 3711-3743
TL;DR: Standard Gibbs sampling applied to a multivariate normal distribution with a specified precision matrix is equivalent in fundamental ways to the Gauss-Seidel iterative solution of linear equations in the precision matrix, giving easy access to mature results in numerical linear algebra.
Abstract: Standard Gibbs sampling applied to a multivariate normal distribution with a specified precision matrix is equivalent in fundamental ways to the Gauss–Seidel iterative solution of linear equations in the precision matrix. Specifically, the iteration operators, the conditions under which convergence occurs, and geometric convergence factors (and rates) are identical. These results hold for arbitrary matrix splittings from classical iterative methods in numerical linear algebra giving easy access to mature results in that field, including existing convergence results for antithetic-variable Gibbs sampling, REGS sampling, and generalizations. Hence, efficient deterministic stationary relaxation schemes lead to efficient generalizations of Gibbs sampling. The technique of polynomial acceleration that significantly improves the convergence rate of an iterative solver derived from a symmetric matrix splitting may be applied to accelerate the equivalent generalized Gibbs sampler. Identicality of error polynomials guarantees convergence of the inhomogeneous Markov chain, while equality of convergence factors ensures that the optimal solver leads to the optimal sampler. Numerical examples are presented, including a Chebyshev accelerated SSOR Gibbs sampler applied to a stylized demonstration of low-level Bayesian image reconstruction in a large 3-dimensional linear inverse problem.
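To make the sampler/solver correspondence concrete, here is a minimal sketch (assuming NumPy; the function name and test matrix are illustrative, not code from the paper) of the matrix-splitting sampler for N(0, A^{-1}) with the Gauss-Seidel splitting, which the abstract identifies with component-sweep Gibbs sampling; Chebyshev/SSOR acceleration and burn-in handling are omitted.

import numpy as np

def gauss_seidel_gibbs(A, n_iter=200, rng=None):
    """Matrix-splitting sampler for N(0, inv(A)), A symmetric positive definite.

    With the Gauss-Seidel splitting A = M - N, M = D + L (lower triangle of A),
    the injected noise has covariance M^T + N = D, and one iteration
    y <- M^{-1}(N y + c) is one sweep of the component-wise Gibbs sampler."""
    rng = np.random.default_rng(rng)
    d = A.shape[0]
    M = np.tril(A)                    # D + L
    N = M - A                         # equals -L^T for symmetric A
    noise_std = np.sqrt(np.diag(A))   # since M^T + N = D for this splitting
    y = np.zeros(d)
    for _ in range(n_iter):
        c = noise_std * rng.standard_normal(d)
        y = np.linalg.solve(M, N @ y + c)   # a forward substitution
    return y

# sanity check: the empirical covariance over many chains approaches inv(A)
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
samples = np.array([gauss_seidel_gibbs(A, rng=k) for k in range(2000)])
print(np.round(np.cov(samples.T), 2))
print(np.round(np.linalg.inv(A), 2))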

Citations
Journal ArticleDOI
TL;DR: In this article, a tensor-train approximation to the target probability density function is constructed using a small number of function evaluations, building on low-rank surrogates in the tensor-train format, a methodology that has been exploited for many years for scalable, high-dimensional density function approximation in quantum physics and chemistry.
Abstract: General multivariate distributions are notoriously expensive to sample from, particularly the high-dimensional posterior distributions in PDE-constrained inverse problems. This paper develops a sampler for arbitrary continuous multivariate distributions that is based on low-rank surrogates in the tensor train format, a methodology that has been exploited for many years for scalable, high-dimensional density function approximation in quantum physics and chemistry. We build upon recent developments of the cross approximation algorithms in linear algebra to construct a tensor train approximation to the target probability density function using a small number of function evaluations. For sufficiently smooth distributions, the storage required for accurate tensor train approximations is moderate, scaling linearly with dimension. In turn, the structure of the tensor train surrogate allows sampling by an efficient conditional distribution method since marginal distributions are computable with linear complexity in dimension. Expected values of non-smooth quantities of interest, with respect to the surrogate distribution, can be estimated using transformed independent uniformly-random seeds that provide Monte Carlo quadrature or transformed points from a quasi-Monte Carlo lattice to give more efficient quasi-Monte Carlo quadrature. Unbiased estimates may be calculated by correcting the transformed random seeds using a Metropolis–Hastings accept/reject step, while the quasi-Monte Carlo quadrature may be corrected either by a control-variate strategy or by importance weighting. We show that the error in the tensor train approximation propagates linearly into the Metropolis–Hastings rejection rate and the integrated autocorrelation time of the resulting Markov chain; thus, the integrated autocorrelation time may be made arbitrarily close to 1, implying that, asymptotic in sample size, the cost per effectively independent sample is one target density evaluation plus the cheap tensor train surrogate proposal that has linear cost with dimension. These methods are demonstrated in three computed examples: fitting failure time of shock absorbers; a PDE-constrained inverse diffusion problem; and sampling from the Rosenbrock distribution. The delayed rejection adaptive Metropolis (DRAM) algorithm is used as a benchmark. In all computed examples, the importance weight-corrected quasi-Monte Carlo quadrature performs best and is more efficient than DRAM by orders of magnitude across a wide range of approximation accuracies and sample sizes. Indeed, all the methods developed here significantly outperform DRAM in all computed examples.
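The Metropolis-Hastings correction described here does not depend on how the surrogate is built; the sketch below (assuming NumPy; a generic surrogate density stands in for the tensor-train approximation, and all names are illustrative) runs an independence MH chain whose proposals come from the surrogate, so that a better surrogate gives a lower rejection rate and an integrated autocorrelation time closer to 1.

import numpy as np

def independence_mh(log_target, log_surrogate, draw_surrogate, n, rng=None):
    """Independence Metropolis-Hastings: propose from the surrogate, then
    accept/reject so that the chain targets the exact density."""
    rng = np.random.default_rng(rng)
    x = draw_surrogate(rng)
    lw = log_target(x) - log_surrogate(x)        # current importance log-weight
    chain, accepted = [x], 0
    for _ in range(n - 1):
        y = draw_surrogate(rng)
        lw_y = log_target(y) - log_surrogate(y)
        if np.log(rng.uniform()) < lw_y - lw:    # accept prob = min(1, w_y / w_x)
            x, lw = y, lw_y
            accepted += 1
        chain.append(x)
    return np.array(chain), accepted / (n - 1)

# toy example: exact target N(0, 1), deliberately crude surrogate N(0, 1.5^2)
chain, acc_rate = independence_mh(
    log_target=lambda x: -0.5 * x**2,
    log_surrogate=lambda x: -0.5 * (x / 1.5)**2,
    draw_surrogate=lambda rng: 1.5 * rng.normal(),
    n=5000)
print(acc_rate, chain.mean(), chain.var())       # variance should be close to 1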

38 citations

Posted Content
TL;DR: A sampler for arbitrary continuous multivariate distributions that is based on low-rank surrogates in the tensor train format, a methodology that has been exploited for many years for scalable, high-dimensional density function approximation in quantum physics and chemistry.
Abstract: General multivariate distributions are notoriously expensive to sample from, particularly the high-dimensional posterior distributions in PDE-constrained inverse problems. This paper develops a sampler for arbitrary continuous multivariate distributions that is based on low-rank surrogates in the tensor-train format. We construct a tensor-train approximation to the target probability density function using the cross interpolation, which requires a small number of function evaluations. For sufficiently smooth distributions the storage required for the TT approximation is moderate, scaling linearly with dimension. The structure of the tensor-train surrogate allows efficient sampling by the conditional distribution method. Unbiased estimates may be calculated by correcting the transformed random seeds using a Metropolis--Hastings accept/reject step. Moreover, one can use a more efficient quasi-Monte Carlo quadrature that may be corrected either by a control-variate strategy, or by importance weighting. We show that the error in the tensor-train approximation propagates linearly into the Metropolis--Hastings rejection rate and the integrated autocorrelation time of the resulting Markov chain. These methods are demonstrated in three computed examples: fitting failure time of shock absorbers; a PDE-constrained inverse diffusion problem; and sampling from the Rosenbrock distribution. The delayed rejection adaptive Metropolis (DRAM) algorithm is used as a benchmark. We find that the importance-weight corrected quasi-Monte Carlo quadrature performs best in all computed examples, and is orders-of-magnitude more efficient than DRAM across a wide range of approximation accuracies and sample sizes. Indeed, all the methods developed here significantly outperform DRAM in all computed examples.

34 citations

Journal ArticleDOI
TL;DR: Morzfeld et al. use covariance localization, a technique from numerical weather prediction, to sample high-dimensional posterior distributions arising in Bayesian inverse problems, and show that the convergence rate is independent of dimension for localized linear problems.

27 citations

Posted Content
TL;DR: This paper proposes a unifying Gaussian simulation framework by deriving a stochastic counterpart of the celebrated proximal point algorithm in optimization and offers a novel and unifying revisit of most of the existing MCMC approaches while extending them.
Abstract: Efficient sampling from a high-dimensional Gaussian distribution is an old but high-stake issue. Vanilla Cholesky samplers imply a computational cost and memory requirements which can rapidly become prohibitive in high dimension. To tackle these issues, multiple methods have been proposed from different communities ranging from iterative numerical linear algebra to Markov chain Monte Carlo (MCMC) approaches. Surprisingly, no complete review and comparison of these methods have been conducted. This paper aims at reviewing all these approaches by pointing out their differences, close relations, benefits and limitations. In addition to this state of the art, this paper proposes a unifying Gaussian simulation framework by deriving a stochastic counterpart of the celebrated proximal point algorithm in optimization. This framework offers a novel and unifying revisit of most of the existing MCMC approaches while extending them. Guidelines to choose the appropriate Gaussian simulation method for a given sampling problem in high dimension are proposed and illustrated with numerical examples.
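For comparison, the "vanilla" Cholesky sampler mentioned at the start of the abstract fits in a few lines; a minimal sketch assuming NumPy/SciPy (not code from the paper), whose one-off O(d^3) factorisation and O(d^2) storage are exactly the costs that motivate the iterative and MCMC alternatives reviewed here.

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def cholesky_precision_sampler(Q, mu, n_samples, rng=None):
    """Draw samples from N(mu, inv(Q)) given the precision matrix Q.

    Factor Q = L L^T once; each sample then costs one triangular solve,
    since solving L^T x = z with z ~ N(0, I) gives Cov(x) = inv(Q)."""
    rng = np.random.default_rng(rng)
    L = cholesky(Q, lower=True)                        # O(d^3) factorisation
    z = rng.standard_normal((Q.shape[0], n_samples))
    x = solve_triangular(L, z, trans='T', lower=True)  # solves L^T x = z
    return mu[:, None] + x

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
X = cholesky_precision_sampler(Q, np.zeros(2), 5000, rng=0)
print(np.round(np.cov(X), 3))            # close to np.linalg.inv(Q)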

17 citations


Cites background from "Accelerated Gibbs sampling of norma..."

  • ...As pointed out in [28], the simpler M, the denser M + N and the more difficult the sampling of z̃....

  • ...Since the matrix Q is strictly diagonally dominant, that is |Qii| > ∑j≠i |Qij| for all i ∈ [d], the convergence of the MCMC sampler based on Jacobi splitting is ensured [28,37]....

  • ..., based on Chebyshev polynomials [28]....

  • ...This radius is particularly important since it is directly related to the convergence factor of the corresponding MCMC sampler [28]....

  • ...Related instances of this precision matrix have also been considered in [28,45,75] in order to show the benefits of both direct and MCMC samplers....

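The diagonal-dominance and spectral-radius conditions quoted above are easy to check numerically; a minimal sketch assuming NumPy (the matrix Q below is an illustrative stand-in, not taken from the cited papers):

import numpy as np

def jacobi_convergence_check(Q):
    """For the Jacobi splitting Q = M - N with M = D (the diagonal of Q),
    return whether Q is strictly diagonally dominant and the spectral radius
    of the iteration matrix M^{-1} N, which is the convergence factor of the
    corresponding Gibbs-type sampler; dominance implies the radius is < 1."""
    diag = np.abs(np.diag(Q))
    off_diag = np.sum(np.abs(Q), axis=1) - diag
    strictly_dominant = bool(np.all(diag > off_diag))
    D = np.diag(np.diag(Q))
    rho = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(D, D - Q))))
    return strictly_dominant, rho

Q = np.array([[ 3.0, -1.0,  0.5],
              [-1.0,  4.0, -1.0],
              [ 0.5, -1.0,  3.0]])
print(jacobi_convergence_check(Q))       # (True, rho < 1)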

Posted Content
TL;DR: A-priori rank bounds for approximations in the functional tensor-train representation are developed for the case of a Gaussian (normally distributed) model, and it is shown that, under suitable conditions on the precision matrix, the Gaussian density can be represented to high accuracy without suffering from an exponential growth of complexity as the dimension increases.
Abstract: Low-rank tensor approximations have shown great potential for uncertainty quantification in high dimensions, for example, to build surrogate models that can be used to speed up large-scale inference problems (Eigel et al., Inverse Problems 34, 2018; Dolgov et al., Statistics & Computing 30, 2020). The feasibility and efficiency of such approaches depends critically on the rank that is necessary to represent or approximate the underlying distribution. In this paper, a-priori rank bounds for approximations in the functional tensor-train representation for the case of Gaussian models are developed. It is shown that under suitable conditions on the precision matrix, the Gaussian density can be approximated to high accuracy without suffering from an exponential growth of complexity as the dimension increases. These results provide a rigorous justification of the suitability and the limitations of low-rank tensor methods in a simple but important model case. Numerical experiments confirm that the rank bounds capture the qualitative behavior of the rank structure when varying the parameters of the precision matrix and the accuracy of the approximation. Finally, the practical relevance of the theoretical results is demonstrated in the context of a Bayesian filtering problem.

11 citations


Cites background from "Accelerated Gibbs sampling of norma..."

  • ...In the linear Gaussian case, direct samplers based on factorizations of the posterior covariance matrix are available [1, 3, 21, 44] and polynomial-accelerated Markov chains can be used [22]....

References
Book
01 Jan 1985
TL;DR: This book presents results of both classic and recent matrix analysis, using canonical forms as a unifying theme, and demonstrates the importance of linear algebra and matrix theory in a variety of applications.
Abstract: Linear algebra and matrix theory are fundamental tools in mathematical and physical science, as well as fertile fields for research. This new edition of the acclaimed text presents results of both classic and recent matrix analysis using canonical forms as a unifying theme, and demonstrates their importance in a variety of applications. The authors have thoroughly revised, updated, and expanded on the first edition. The book opens with an extended summary of useful concepts and facts and includes numerous new topics and features, such as:
- New sections on the singular value and CS decompositions
- New applications of the Jordan canonical form
- A new section on the Weyr canonical form
- Expanded treatments of inverse problems and of block matrices
- A central role for the von Neumann trace theorem
- A new appendix with a modern list of canonical forms for a pair of Hermitian matrices and for a symmetric-skew-symmetric pair
- An expanded index with more than 3,500 entries for easy reference
- More than 1,100 problems and exercises, many with hints, to reinforce understanding and develop auxiliary themes such as finite-dimensional quantum systems, the compound and adjugate matrices, and the Loewner ellipsoid
- A new appendix with a collection of problem-solving hints

23,986 citations

Journal ArticleDOI
TL;DR: An analogy is made between images and statistical mechanics systems; gradual temperature reduction ("annealing") applied under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations, resulting in a highly parallel "relaxation" algorithm for MAP estimation.
Abstract: We make an analogy between images and statistical mechanics systems. Pixel gray levels and the presence and orientation of edges are viewed as states of atoms or molecules in a lattice-like physical system. The assignment of an energy function in the physical system determines its Gibbs distribution. Because of the equivalence between the Gibbs distribution and Markov random fields (MRFs), this assignment also determines an MRF image model. The energy function is a more convenient and natural mechanism for embodying picture attributes than are the local characteristics of the MRF. For a range of degradation mechanisms, including blurring, nonlinear deformations, and multiplicative or additive noise, the posterior distribution is an MRF with a structure akin to the image model. By the analogy, the posterior distribution defines another (imaginary) physical system. Gradual temperature reduction in the physical system isolates low energy states ("annealing"), or what is the same thing, the most probable states under the Gibbs distribution. The analogous operation under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations. The result is a highly parallel "relaxation" algorithm for MAP estimation. We establish convergence properties of the algorithm and we experiment with some simple pictures, for which good restorations are obtained at low signal-to-noise ratios.
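As a toy illustration of the annealed "relaxation" algorithm described above (a sketch under assumed settings: an Ising-type prior, Gaussian noise, and a geometric cooling schedule for speed, none of which reproduce the authors' exact experiments), the following restores a binary image by Gibbs sampling while the temperature is lowered between sweeps.

import numpy as np

def annealed_gibbs_restore(noisy, beta=1.5, sigma=0.8, sweeps=30, rng=None):
    """MAP-style restoration of a +/-1 image: Gibbs sweeps over the posterior
    raised to the power 1/T, with T lowered gradually (simulated annealing).
    Prior: Ising coupling beta on the 4-neighbourhood; likelihood: N(x, sigma^2)."""
    rng = np.random.default_rng(rng)
    x = np.where(noisy > 0, 1, -1)
    n, m = x.shape
    for sweep in range(sweeps):
        T = max(0.1, 2.0 * 0.9 ** sweep)     # geometric cooling schedule
        for i in range(n):
            for j in range(m):
                nb = sum(x[a, b]
                         for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < n and 0 <= b < m)
                # energy difference E(x_ij = +1) - E(x_ij = -1)
                dE = (-2.0 * beta * nb
                      + ((1 - noisy[i, j])**2 - (-1 - noisy[i, j])**2) / (2 * sigma**2))
                p_plus = 1.0 / (1.0 + np.exp(min(dE / T, 50.0)))   # clipped to avoid overflow
                x[i, j] = 1 if rng.uniform() < p_plus else -1
    return x

truth = -np.ones((32, 32)); truth[8:24, 8:24] = 1.0
noisy = truth + 0.8 * np.random.default_rng(0).normal(size=truth.shape)
restored = annealed_gibbs_restore(noisy, rng=1)
print((restored == truth).mean())        # fraction of pixels restored correctly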

18,761 citations


"Accelerated Gibbs sampling of norma..." refers background or methods in this paper

  • ...…around 1990 (Robert and Casella, 2011), though prior to that the Gibbs sampler provided a coherent approach to investigating distributions with Markov random field structure (Turčin, 1971; Grenander, 1983; Geman and Geman, 1984; Gelfand and Smith, 1990; Besag and Green, 1993; Sokal, 1993)....

  • ...One of the simplest iterative sampling methods is the component-sweep Gibbs sampler (Geman and Geman, 1984; Gelman et al., 1995; Gilks et al., 1996; Rue and Held, 2005)....

  • ...The Metropolis–Hastings algorithm for MCMC was introduced to mainstream statistics around 1990 (Robert and Casella [48]), though prior to that the Gibbs sampler provided a coherent approach to investigating distributions with Markov random field structure (Turčin [60], Grenander [32], Geman and Geman [25], Gelfand and Smith [23], Besag and Green [11], Sokal [58])....

  • ...The Metropolis-Hastings algorithm for MCMC was introduced to mainstream statistics around 1990 (Robert and Casella, 2011), though prior to that the Gibbs sampler provided a coherent approach to investigating distributions with Markov random field structure (Turčin, 1971; Grenander, 1983; Geman and Geman, 1984; Gelfand and Smith, 1990; Besag and Green, 1993; Sokal, 1993)....

  • ...One of the simplest iterative sampling methods is the component-sweep Gibbs sampler (Geman and Geman [25], Gelman et al....

Book
01 Jan 1995
TL;DR: Detailed coverage of Bayesian computation, the basics of Markov chain simulation, regression models, and asymptotic theorems is provided.
Abstract: FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear Models; Generalized Linear Models; Models for Robust Inference; Models for Missing Data. NONLINEAR AND NONPARAMETRIC MODELS: Parametric Nonlinear Models; Basis Function Models; Gaussian Process Models; Finite Mixture Models; Dirichlet Process Models. APPENDICES: A: Standard Probability Distributions; B: Outline of Proofs of Asymptotic Theorems; C: Computation in R and Stan. Bibliographic notes and exercises appear at the end of each chapter.

16,079 citations


"Accelerated Gibbs sampling of norma..." refers methods in this paper

  • ...One of the simplest iterative sampling methods is the component-sweep Gibbs sampler (Geman and Geman, 1984; Gelman et al., 1995; Gilks et al., 1996; Rue and Held, 2005)....

Journal ArticleDOI
TL;DR: In this paper, three sampling-based approaches, namely stochastic substitution, the Gibbs sampler, and the sampling-importance-resampling algorithm, are compared and contrasted in relation to various joint probability structures frequently encountered in applications.
Abstract: Stochastic substitution, the Gibbs sampler, and the sampling-importance-resampling algorithm can be viewed as three alternative sampling- (or Monte Carlo-) based approaches to the calculation of numerical estimates of marginal probability distributions. The three approaches will be reviewed, compared, and contrasted in relation to various joint probability structures frequently encountered in applications. In particular, the relevance of the approaches to calculating Bayesian posterior densities for a variety of structured models will be discussed and illustrated.
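A minimal instance of this substitution/Gibbs idea, sketched here under assumed conjugate choices (a beta-binomial pair; assuming NumPy, not code from the paper): alternate draws from the two full conditionals and estimate the marginal of x from the output.

import numpy as np

def beta_binomial_gibbs(n_trials=16, a=2.0, b=4.0, n_iter=20000, rng=None):
    """Two-block Gibbs sampler for (x, theta) with x | theta ~ Binomial(n_trials, theta)
    and theta | x ~ Beta(a + x, b + n_trials - x); the draws of x estimate its
    beta-binomial marginal distribution."""
    rng = np.random.default_rng(rng)
    theta = rng.beta(a, b)
    xs = np.empty(n_iter, dtype=int)
    for k in range(n_iter):
        x = rng.binomial(n_trials, theta)            # draw x given theta
        theta = rng.beta(a + x, b + n_trials - x)    # draw theta given x
        xs[k] = x
    return xs

xs = beta_binomial_gibbs(rng=0)
marginal_estimate = np.bincount(xs, minlength=17) / xs.size   # estimate of p(x) for x = 0..16
print(np.round(marginal_estimate, 3))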

6,294 citations