What are the contributions in "Convergence in variance of chebyshev accelerated" ?

Q: What are the contributions in "Convergence in variance of chebyshev accelerated" ?

In particular, the authors show that the error polynomials are identical and hence so are the convergence rates. The authors give an algorithm for the stochastic version of the second-order Chebyshev accelerated SSOR ( symmetric successive overrelaxation ) iteration and provide numerical examples of sampling from multivariate Gaussian distributions to confirm that the desired convergence properties are achieved in finite precision.

(Open Access) Convergence in Variance of Chebyshev Accelerated Gibbs Samplers (2014) | Colin Fox

SIAM J. SCI. COMPUT.



2014 Society for Industrial and Applied Mathematics

Vol. 36, No. 1, pp. A124–A147

CONVERGENCE IN VARIANCE OF CHEBYSHEV ACCELERATED

GIBBS SAMPLERS

∗

COLIN FOX

†

AND ALBERT PARKER

‡

Abstract. A stoc hastic version of a stationary linear iterativ e solver may be designed to converge

in distribution to a probability distribution with a speciﬁed mean μ and covariance matrix A

−1

A common example is Gibbs sampling applied to a multivariate Gaussian distribution which is a

stochastic version of the Gauss–Seidel linear solver. The iteration operator that acts on the error in

mean and covariance in the stochastic iteration is the same iteration operator that acts on the solution

error in the linear solver, and thus both the stationary sampler and the stationary solv er have the

same error polynomial and geometric convergence rate. The polynomial acceleration techniques that

are well known in numerical analysis for accelerating the linear solver may also be used to accelerate

the stochastic iteration. We derive ﬁrst-order and second-order Chebyshev polynomial acceleration

for the stochastic iteration to accelerate convergence in the mean and covariance by mimicking the

derivation for the linear solver. In particular, we show that the error polynomials are identical and

hence so are the con vergence rates. Thus, optimality of the Chebyshev accelerated solver implies

optimality of the Chebyshev accelerated sampler. We give an algorithm for the stochastic version of

the second-order Cheb yshev accelerated SSOR (symmetric successive overrelaxation) iteration and

provide numerical examples of sampling from multivariate Gaussian distributions to conﬁrm that

the desired convergence properties are achieved in ﬁnite precision.

Key words. Chebyshev polynomial acceleration, Gauss–Seidel, Gibbs sampling, geometric

convergence, linear solver, stochastic iteration, SSOR

AMS subject classiﬁcations. 65F10, 65B99, 62E17, 60G15, 60G60

DOI. 10.1137/120900940

1. Introduction. Iterations of the form

(1.1) x

l+1

= Gx

+ g, l =1, 2,...,

where G is a ﬁxed iteration operator and g is a ﬁxed vector, are commonplace in

numerical computation. For example, they occur in the stationary linear iterative

methods used to solve systems of linear equations [1, 10, 17, 23]. We often refer to the

associated algorithm as a solver. We consider these iterations, and also the related

stochastic iteration

(1.2) y

l+1

= Gy

+ g

,l=1, 2,...,

where now g

is a “noise” vector given by an independent draw from some ﬁxed

probability distribution with ﬁnite variance. Just as the deterministic iteration (1.1)

can be designed to converge to the solution of a linear system that is too large or

complex to solve directly, the stochastic iteration (1.2) may be designed to converge

in distribution to a target distribution that is too high dimensional, or complex, to

sample from directly. Since the stochastic iteration may be used to generate samples

∗

Submitted to the journal’s Methods and Algorithms for Scientiﬁc Computing section Decem-

ber 3, 2012; accepted for publication (in revised form) November 14, 2013; published electronically

February 4, 2014. This work was supported by the New Zealand Institute for Mathematics and its

Applications thematic programme on PDEs and Marsden contract UOO1015.

http://www.siam.org/journals/sisc/36-1/90094.html

†

Department of Physics, University of Otago, Dunedin, New Zealand (fox@physics.otago.ac.nz).

‡

Cent er for Bioﬁlm Engineering and Department of Mathematical Sciences, Montana State Uni-

versit y, Bozeman, MT 59715 (parker@math.montana.edu).

A124

Downloaded 03/07/14 to 153.90.118.169. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

CHEBYSHEV SAMPLING A125

from a desired target distribution, we often refer to the associated algorithm as a

sampler. An example is the conventional Gibbs sampling algorithm [21] applied to

sampling from a high-dimensional Gaussian distribution. In that case the iteration

operator G is identical to the iteration operator in the Gauss–Seidel iterative method

[5, 7].

Novel Gibbs samplers may be designed by considering matrix splittings other than

the Gauss–Seidel splitting [5]. Matrix splittings are considered further in section 2.

Interestingly, the deterministic and stochastic iterations converge under exactly the

same conditions, with a necessary and suﬃcient condition being that the spectral

radius of G be strictly less than 1, that is, ρ (G) < 1 [4, 26]. Convergence in both cases

is geometric, with the asymptotic average reduction factor given by ρ (G) (though this

is called the “convergence rate” in the statistics literature [19]).

A standard method of reducing the asymptotic average reduction factor is by

polynomial acceleration, particularly using Chebyshev polynomials [1, 6, 10, 23]. The

original formulation used a modiﬁed ﬁrst-order iteration, as above, though the result-

ing algorithm is impractical due to numerical diﬃculties [1]. Practical implementa-

tions use a nonstationary second-order iteration that can give optimal reduction of

error at each iteration.

In this paper, we develop polynomial acceleration for the stochastic iteration.

In particular, we develop nonstationary ﬁrst- and second-order iterations that give

optimal convergence in mean and variance to a desired target distribution. Since

convergence in mean is achieved by using exactly the linear iteration for solving a linear

system, polynomial acceleration of the mean is exactly as in the existing treatments.

Hence we focus on optimal convergence in variance that requires modiﬁcation to the

noise term. Correspondingly, we focus throughout the development on sampling from

a target distribution that has zero mean and some ﬁnite covariance matrix, and hence

the noise distribution always has zero mean. Extension to target distributions with

nonzero mean is achieved simply by adding the deterministic iteration or, equivalently,

adding a ﬁxed vector to the noise term.

We develop the sampling algorithms and demonstrate the equivalence to linear

solvers by investigating a sequence of linear iterative solvers, essentially following the

historical development in sophistication and speed, and show that exactly the same

ideas used to establish properties of the solver can be used to establish the equivalent

properties for a sampler. In particular, convergence of the solver implies convergence

of the sampler, and the convergence factors are identical, because they are given by

the same expression.

We follow the development and derivations of convergence, given in Axelsson [1],

for stationary and nonstationary (Chebyshev) ﬁrst-order and second-order methods,

set out in sect. 5.2 (Stationary Iterative Methods) and sect. 5.3 (The Chebyshev It-

erative Method). We could have equally followed the excellent presentations of the

same methods in Golub and Van Loan [10] or Saad [23]. Our own work and compu-

tational implementation actually take a route that switches between the formalism

used in these three texts. By following here the route of a single exposition, we hope

to show how establishing convergence of the stochastic versions can be made very

straightforward.

The most straightforward application of the methods we develop is to sample from

a high-dimensional Gaussian distribution, deﬁned by the mean vector μ and covariance

matrix A

−1

. We present an example which shows the convergence of the Chebyshev

sampler in ﬁnite precision applied to a Gaussian Markov random ﬁeld (GMRF) with

a known sparse precision matrix corresponding to a Mat´ern-class covariance function

Downloaded 03/07/14 to 153.90.118.169. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

A126 COLIN FOX AND ALBERT PARKER

[12, 15]. This example allows eﬃcient numerical calculation since operation by A has

reduced numerical cost.

Although we focus on the Gaussian in our numerical example, the accelerated

algorithms we give are more generally applicable to any distribution where the focus

is on the mean as a “best” estimate and the covariance as a measure of uncertainties,

with higher moments not of primary concern. This is typical in inferential methods

applied to solving inverse problems or in the growing ﬁeld of uncertainty quantiﬁca-

tion, where the mean and variance of the distribution over parameters or predicted

quantities are the primary summary statistics of interest.

1.1. Some links between sampling from distributions and solving sys-

tems of equations. Consider a probability distribution with probability density

function π(x) and the two tasks of drawing x ∼ π (x distributed as π)andcom-

puting x =argmaxπ (or solving −∇ log π = 0). We use the notation x

−i

,...,x

i−1

i+1

,...,x

)todenotealln − 1componentsofx other than x

and π (x

−i

) to denote the univariate conditional distribution over x

conditioned

on the (ﬁxed) value of all other components.

The classical Gibbs sampler or “stochastic relaxation” (also known as Glauber

dynamics and the local heat bath algorithm) for generating a sample from π is an

iterative algorithm in which one sweep consists of updating each component in se-

quence by drawing from the conditional distribution for the component with all other

components ﬁxed at the most recent value, as in Algorithm 1. Repeating this sweep

indeﬁnitely produces distributions over iterates that are guaranteed to converge (ge-

ometrically) to π under mild conditions [11, ref. 84], [19], though distributions with

nonconnected support for which Algorithm 1 fails are easy to ﬁnd [19].

Algorithm 1: One sweep of the componentwise Gibbs sampler targeting π(x)

for i =1,...,n do

sample z ∼ π (x

−i

);

= z;

end

It is not hard to see a connection between the Gibbs sampler in Algorithm 1 and

the traditional Gauss–Seidel algorithm for maximizing π which consists of repeatedly

applying the sweep over componentwise solvers with all other components ﬁxed at the

most recent value, as in Algorithm 2: Whereas the Gibbs sampler performs a compo-

nentwise conditional sampling, Gauss–Seidel performs componentwise optimization.

Algorithm 2: One sweep of Gauss–Seidel relaxation for maximizing π(x)

for i =1,...,n do

set z =argmax

π (x

−i

);

= z;

end

In statistical physics, distributions often arise with the form

π(x)=k(β)e

−βH(x)

Downloaded 03/07/14 to 153.90.118.169. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

CHEBYSHEV SAMPLING A127

where H(x) is an energy function (the Hamiltonian), β is inversely proportional to

temperature, and k is a normalizing constant. It is often noted that a sampling

algorithm may be used to minimize H(x) in the zero temperature limit, i.e., by taking

the limit β →∞. Then sampling degenerates to optimization since the distribution

is localized at the mode. In particular, Algorithm 1 reduces to Algorithm 2.

In this paper, we exploit an equivalence that operates at ﬁnite β to show how the

minimizer (or solver) may be adapted to become a sampling algorithm. For example,

in the simplest case that β =1andH is quadratic, i.e.,

H(x)=

Ax − b

for some symmetric positive deﬁnite (precision matrix) A, π is Gaussian and the

Gauss–Seidel minimizer of H becomes the Gibbs sampler for π when coordinatewise

minimization is replaced by coordinatewise conditional sampling. One sweep of the

Gibbs sampler may be written in the matrix form (1.2) with

G = M

−1

N and g

= M

−1

, where c

iid

∼ N(0,D).

Here M = L+ D and N = −L

is a splitting of the (symmetric) precision matrix A in

which L is the strictly lower triangular part of A and D is the diagonal of A [11]. This

is the same splitting used to write the Gauss–Seidel algorithm for solving Ax = b in

matrix form (1.1), with g = M

−1

b. What makes this correspondence important is that

the convergence properties of the solver are inherited by the sampler (and vice versa),

which means that acceleration techniques developed for the solver may be applied

to the sampler. The main purpose of this paper is to establish the equivalence of

convergence in mean and covariance in the case of Chebyshev polynomial acceleration,

without the assumption of the target distribution being Gaussian.

2. Matrix splitting and iteration operators. Consider the splitting

(2.1) A = M − N,

where A is a symmetric positive deﬁnite (SPD) matrix and M is invertible. For

example, for the Gauss–Seidel iteration, M is set to the lower triangular part of

A (including the diagonal). We will often consider the case where the splitting is

symmetric, which means that M is symmetric, and hence so is N. We will utilize the

family of iteration operators

(2.2) G



I − τM

−1



parameterized by the relaxation parameter τ = 0. The natural iteration operator

induced by the splitting (2.1) is the case τ = 1, which we denote by G. The nonsta-

tionary iterative methods that we consider use a sequence of iteration operators with

parameters τ

, l =0, 1, 2,..., where l denotes iteration number. We will abbreviate

by G

, where possible, to avoid subscripts on subscripts.

The iteration operator G

may also be thought of as being induced by the splitting

A = M

− N

,(2.3)

where

M and N

= N +



1 − τ



Downloaded 03/07/14 to 153.90.118.169. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

A128 COLIN FOX AND ALBERT PARKER

In the remainder of this section we list some lemmas about iteration operators

that we will use. Throughout the rest of the paper, proofs to lemmas and some

theorems have been deferred to the appendix.

Lemma 2.1. The itera tion operators G

and G

commute, that is, G

= G

for all τ, κ.

Lemma 2.2. For a symmetric splitting, G

−1

is symmetric.

The following lemma determines the variance of noise terms in sampling algo-

rithms.

Lemma 2.3. A

−1

− G

−1

= M

−1



+ N



−T

3. First-order iterative methods. We ﬁrst consider iterative solvers of the

equation

∗

= b,

where A is a given SPD matrix, b is a given vector, and the solution we seek is denoted

by x

∗

3.1. First-order stationary iterative solver. The ﬁrst-order stationary iter-

ative solver uses the iteration

l+1

= x

− τM

−1

= G

+ g

,(3.1)

where r

= Ax

− b for l =1, 2,..., the iteration operator G

is given by (2.2)

and g

= τM

−1

b. In the remainder of this section, we derive the ﬁxed point, error

polynomial, and average reduction factor for this iteration.

Lemma 3.1. The iteration in (3.1) has x

∗

as its unique ﬁxed point, i.e.,

(3.2) x

∗

= G

∗

+ g

⇔ Ax

∗

= b.

Deﬁne the error at the lth iteration by

(3.3) e

= x

− x

∗

Subtract (3.2) from (3.1) to get the iteration for error

l+1

= x

l+1

− x

∗

= G

+ g

− G

∗

− g

= G



− x

∗



= G

By recursion we prove the following theorem.

Theorem 3.2.

= G



I − τM

−1



= P



−1



where P

is the (simple) mth-order polynomial P

(λ)=(1− τλ)

Note that P

(0) = 1 and P

(1/τ) = 0. The convergence and convergence rate

of the stationary iterative solver follow from Theorem 3.2.

Axelsson [1, p. 176] gives the optimal relaxation parameter

opt

+ λ

where λ

<λ

are the extreme (positive) eigenvalues of M

−1

A, giving the average

reduction factor

(3.4) ρ

1 − λ

/λ

1+λ

/λ

Note that this implies that the iterative solver (3.1) converges for some value of τ.To

be more precise, as long as M

−1

A has all positive eigenvalues, then λ

/λ

∈ (0, 1),

which means that ρ

∈ (0, 1) and the iteration is guaranteed to converge.

Downloaded 03/07/14 to 153.90.118.169. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Convergence in Variance of Chebyshev Accelerated Gibbs Samplers

Citations

Optimal low-rank approximations of Bayesian linear inverse problems

Analyzing Hogwild Parallel Gaussian Gibbs Sampling

Accelerated Gibbs sampling of normal distributions using matrix splittings and polynomials

Accelerated Gibbs sampling of normal distributions using matrix splittings and

High-dimensional Gaussian sampling: a review and a unifying approach based on a stochastic proximal point algorithm

References

Multigrid Monte Carlo method. Conceptual foundations.

Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis Hastings algorithm

The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations

Sampling Gaussian distributions in Krylov spaces with conjugate gradients

On the Whittle-Matérn correlation family

Related Papers (5)

Sampling Gaussian distributions in Krylov spaces with conjugate gradients

Iterative Solution Methods

Updating Schemes, Correlation Structure, Blocking and Parameterization for the Gibbs Sampler

An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach

Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions

Frequently Asked Questions (1)

Q1. What are the contributions in "Convergence in variance of chebyshev accelerated" ?