Journal ArticleDOI

Convergence in Variance of Chebyshev Accelerated Gibbs Samplers

04 Feb 2014-SIAM Journal on Scientific Computing (Society for Industrial and Applied Mathematics)-Vol. 36, Iss: 1
TL;DR: An algorithm for the stochastic version of the second-order Chebyshev accelerated SSOR (symmetric successive overrelaxation) iteration is given and numerical examples of sampling from multivariate Gaussian distributions are provided to confirm that the desired convergence properties are achieved in finite precision.
Abstract: A stochastic version of a stationary linear iterative solver may be designed to converge in distribution to a probability distribution with a specified mean μ and covariance matrix A−1. A common example is Gibbs sampling applied to a multivariate Gaussian distribution which is a stochastic version of the Gauss-Seidel linear solver. The iteration operator that acts on the error in mean and covariance in the stochastic iteration is the same iteration operator that acts on the solution error in the linear solver, and thus both the stationary sampler and the stationary solver have the same error polynomial and geometric convergence rate. The polynomial acceleration techniques that are well known in numerical analysis for accelerating the linear solver may also be used to accelerate the stochastic iteration. We derive first-order and second-order Chebyshev polynomial acceleration for the stochastic iteration to accelerate convergence in the mean and covariance by mimicking the derivation for the linear solver. In particular, we show that the error polynomials are identical and hence so are the convergence rates. Thus, optimality of the Chebyshev accelerated solver implies optimality of the Chebyshev accelerated sampler. We give an algorithm for the stochastic version of the second-order Chebyshev accelerated SSOR (symmetric successive overrelaxation) iteration and provide numerical examples of sampling from multivariate Gaussian distributions to confirm that the desired convergence properties are achieved in finite precision.

Summary (1 min read)

1. Introduction. Iterations of the form (1.1)

  • What makes this correspondence important is that the convergence properties of the solver are inherited by the sampler (and vice versa), which means that acceleration techniques developed for the solver may be applied to the sampler.
  • The main purpose of this paper is to establish the equivalence of convergence in mean and covariance in the case of Chebyshev polynomial acceleration, without the assumption of the target distribution being Gaussian.

(Figure/axis description fragment: along the horizontal axis are values of the ratio of the extreme eigenvalues of M^{-1}A.)

  • By recursion the authors prove the following statement.
  • Axelsson points out [1, Rem. 5.11] two deficiencies of the first-order Chebyshev iterative method as a solver:
  • First, the number of steps p needs to be selected in advance, with the method not being optimal for any other number of steps.
  • Second, the resulting first-order algorithm is impractical in finite precision due to numerical difficulties (cf. section 1).
  • The solution for iterative solvers, and hence for iterative samplers, is to develop the second-order methods, which have neither of these deficiencies.

5. Numerical examples sampling from Gaussians at different resolutions.

  • The following examples both use a cubic-element discretization of the cubic domain [0, 1]^3, with trilinear interpolation from nodal values within each element.
  • The examples also both use R = 1/4, though they differ in the number of nodes (or elements) in each coordinate direction.

6. Discussion.

  • While performing this research the authors have recognized the debt they owe to (the late) Gene Golub, who pioneered first- and second-order Chebyshev acceleration for linear solvers [9], which they have built upon.
  • The authors are pleased to demonstrate the connection between Gene's work and the sampling algorithms from statistics by publishing in this journal that Gene had wanted to remain titled the Journal on Scientific and Statistical Computing [22] .


SIAM J. SCI. COMPUT. Vol. 36, No. 1, pp. A124–A147
© 2014 Society for Industrial and Applied Mathematics
CONVERGENCE IN VARIANCE OF CHEBYSHEV ACCELERATED GIBBS SAMPLERS
COLIN FOX AND ALBERT PARKER
Abstract. A stochastic version of a stationary linear iterative solver may be designed to converge in distribution to a probability distribution with a specified mean μ and covariance matrix A^{-1}. A common example is Gibbs sampling applied to a multivariate Gaussian distribution, which is a stochastic version of the Gauss–Seidel linear solver. The iteration operator that acts on the error in mean and covariance in the stochastic iteration is the same iteration operator that acts on the solution error in the linear solver, and thus both the stationary sampler and the stationary solver have the same error polynomial and geometric convergence rate. The polynomial acceleration techniques that are well known in numerical analysis for accelerating the linear solver may also be used to accelerate the stochastic iteration. We derive first-order and second-order Chebyshev polynomial acceleration for the stochastic iteration to accelerate convergence in the mean and covariance by mimicking the derivation for the linear solver. In particular, we show that the error polynomials are identical and hence so are the convergence rates. Thus, optimality of the Chebyshev accelerated solver implies optimality of the Chebyshev accelerated sampler. We give an algorithm for the stochastic version of the second-order Chebyshev accelerated SSOR (symmetric successive overrelaxation) iteration and provide numerical examples of sampling from multivariate Gaussian distributions to confirm that the desired convergence properties are achieved in finite precision.
Key words. Chebyshev polynomial acceleration, Gauss–Seidel, Gibbs sampling, geometric
convergence, linear solver, stochastic iteration, SSOR
AMS subject classifications. 65F10, 65B99, 62E17, 60G15, 60G60
DOI. 10.1137/120900940
1. Introduction. Iterations of the form
(1.1)    x^{l+1} = G x^l + g,    l = 1, 2, ...,
where G is a fixed iteration operator and g is a fixed vector, are commonplace in
numerical computation. For example, they occur in the stationary linear iterative
methods used to solve systems of linear equations [1, 10, 17, 23]. We often refer to the
associated algorithm as a solver. We consider these iterations, and also the related
stochastic iteration
(1.2)    y^{l+1} = G y^l + g^l,    l = 1, 2, ...,
where now g^l is a “noise” vector given by an independent draw from some fixed
probability distribution with finite variance. Just as the deterministic iteration (1.1)
can be designed to converge to the solution of a linear system that is too large or
complex to solve directly, the stochastic iteration (1.2) may be designed to converge
in distribution to a target distribution that is too high dimensional, or complex, to
sample from directly. Since the stochastic iteration may be used to generate samples
(Author footnote: Submitted to the journal’s Methods and Algorithms for Scientific Computing section December 3, 2012; accepted for publication (in revised form) November 14, 2013; published electronically February 4, 2014. This work was supported by the New Zealand Institute for Mathematics and its Applications thematic programme on PDEs and Marsden contract UOO1015. http://www.siam.org/journals/sisc/36-1/90094.html. Author affiliations: Department of Physics, University of Otago, Dunedin, New Zealand (fox@physics.otago.ac.nz); Center for Biofilm Engineering and Department of Mathematical Sciences, Montana State University, Bozeman, MT 59715 (parker@math.montana.edu).)
from a desired target distribution, we often refer to the associated algorithm as a
sampler. An example is the conventional Gibbs sampling algorithm [21] applied to
sampling from a high-dimensional Gaussian distribution. In that case the iteration
operator G is identical to the iteration operator in the Gauss–Seidel iterative method
[5, 7].
Novel Gibbs samplers may be designed by considering matrix splittings other than
the Gauss–Seidel splitting [5]. Matrix splittings are considered further in section 2.
Interestingly, the deterministic and stochastic iterations converge under exactly the
same conditions, with a necessary and sufficient condition being that the spectral
radius of G be strictly less than 1, that is, ρ (G) < 1 [4, 26]. Convergence in both cases
is geometric, with the asymptotic average reduction factor given by ρ (G) (though this
is called the “convergence rate” in the statistics literature [19]).
A standard method of reducing the asymptotic average reduction factor is by
polynomial acceleration, particularly using Chebyshev polynomials [1, 6, 10, 23]. The
original formulation used a modified first-order iteration, as above, though the result-
ing algorithm is impractical due to numerical difficulties [1]. Practical implementa-
tions use a nonstationary second-order iteration that can give optimal reduction of
error at each iteration.
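For readers who want a concrete reference point, the following is a minimal sketch of the classical nonstationary second-order (three-term) Chebyshev iteration for solving Ax = b, written for the unpreconditioned splitting M = I and assuming known bounds lam_min and lam_max on the spectrum of A; it follows the standard textbook construction (cf. [1, 10, 23]) rather than the sampler derived in this paper, and the function and variable names are ours.

```python
import numpy as np

def chebyshev_solve(A, b, lam_min, lam_max, x0=None, iters=50):
    """Second-order Chebyshev iteration for A x = b with SPD A and M = I.

    Assumes 0 < lam_min < lam_max bound the spectrum of A. The error after k
    steps is p_k(A) e^0 with p_k(lam) = T_k((d - lam)/c) / T_k(d/c), the scaled
    Chebyshev polynomial that is optimal among degree-k polynomials with
    p_k(0) = 1 on [lam_min, lam_max].
    """
    d = 0.5 * (lam_max + lam_min)        # center of the spectral interval
    c = 0.5 * (lam_max - lam_min)        # half-width of the spectral interval
    x_prev = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float).copy()
    # First step: the stationary first-order step with tau = 1/d.
    x = x_prev + (b - A @ x_prev) / d
    sig_prev, sig = 1.0, d / c           # sigma_k = T_k(d/c), via the Chebyshev recurrence
    for _ in range(1, iters):
        sig_next = 2.0 * (d / c) * sig - sig_prev
        r = b - A @ x                    # current residual
        x_next = (2.0 * sig * d / (sig_next * c)) * x \
                 - (sig_prev / sig_next) * x_prev \
                 + (2.0 * sig / (sig_next * c)) * r
        x_prev, x = x, x_next
        sig_prev, sig = sig, sig_next
    return x
```

Note that the coefficients change at every step yet depend only on the fixed bounds lam_min and lam_max; this nonstationary second-order structure is what the paper carries over to the sampler.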
In this paper, we develop polynomial acceleration for the stochastic iteration.
In particular, we develop nonstationary first- and second-order iterations that give
optimal convergence in mean and variance to a desired target distribution. Since
convergence in mean is achieved by using exactly the linear iteration for solving a linear
system, polynomial acceleration of the mean is exactly as in the existing treatments.
Hence we focus on optimal convergence in variance that requires modification to the
noise term. Correspondingly, we focus throughout the development on sampling from
a target distribution that has zero mean and some finite covariance matrix, and hence
the noise distribution always has zero mean. Extension to target distributions with
nonzero mean is achieved simply by adding the deterministic iteration or, equivalently,
adding a fixed vector to the noise term.
We develop the sampling algorithms and demonstrate the equivalence to linear
solvers by investigating a sequence of linear iterative solvers, essentially following the
historical development in sophistication and speed, and show that exactly the same
ideas used to establish properties of the solver can be used to establish the equivalent
properties for a sampler. In particular, convergence of the solver implies convergence
of the sampler, and the convergence factors are identical, because they are given by
the same expression.
We follow the development and derivations of convergence, given in Axelsson [1],
for stationary and nonstationary (Chebyshev) first-order and second-order methods,
set out in sect. 5.2 (Stationary Iterative Methods) and sect. 5.3 (The Chebyshev It-
erative Method). We could have equally followed the excellent presentations of the
same methods in Golub and Van Loan [10] or Saad [23]. Our own work and compu-
tational implementation actually take a route that switches between the formalism
used in these three texts. By following here the route of a single exposition, we hope
to show how establishing convergence of the stochastic versions can be made very
straightforward.
The most straightforward application of the methods we develop is to sample from
a high-dimensional Gaussian distribution, defined by the mean vector μ and covariance
matrix A^{-1}. We present an example which shows the convergence of the Chebyshev sampler in finite precision applied to a Gaussian Markov random field (GMRF) with a known sparse precision matrix corresponding to a Matérn-class covariance function
[12, 15]. This example allows efficient numerical calculation since operation by A has
reduced numerical cost.
Although we focus on the Gaussian in our numerical example, the accelerated
algorithms we give are more generally applicable to any distribution where the focus
is on the mean as a “best” estimate and the covariance as a measure of uncertainties,
with higher moments not of primary concern. This is typical in inferential methods
applied to solving inverse problems or in the growing field of uncertainty quantifica-
tion, where the mean and variance of the distribution over parameters or predicted
quantities are the primary summary statistics of interest.
1.1. Some links between sampling from distributions and solving systems of equations. Consider a probability distribution with probability density function π(x) and the two tasks of drawing x ∼ π (x distributed as π) and computing x = arg max π (or solving −∇ log π = 0). We use the notation x_{−i} = (x_1, x_2, ..., x_{i−1}, x_{i+1}, ..., x_n) to denote all n − 1 components of x other than x_i, and π(x_i | x_{−i}) to denote the univariate conditional distribution over x_i conditioned on the (fixed) value of all other components.
The classical Gibbs sampler or “stochastic relaxation” (also known as Glauber
dynamics and the local heat bath algorithm) for generating a sample from π is an
iterative algorithm in which one sweep consists of updating each component in se-
quence by drawing from the conditional distribution for the component with all other
components fixed at the most recent value, as in Algorithm 1. Repeating this sweep
indefinitely produces distributions over iterates that are guaranteed to converge (ge-
ometrically) to π under mild conditions [11, ref. 84], [19], though distributions with
nonconnected support for which Algorithm 1 fails are easy to find [19].
Algorithm 1: One sweep of the componentwise Gibbs sampler targeting π(x)
for i = 1, ..., n do
    sample z ∼ π(x_i | x_{−i});
    x_i = z;
end
It is not hard to see a connection between the Gibbs sampler in Algorithm 1 and
the traditional Gauss–Seidel algorithm for maximizing π which consists of repeatedly
applying the sweep over componentwise solvers with all other components fixed at the
most recent value, as in Algorithm 2: Whereas the Gibbs sampler performs a compo-
nentwise conditional sampling, Gauss–Seidel performs componentwise optimization.
Algorithm 2: One sweep of Gauss–Seidel relaxation for maximizing π(x)
for i = 1, ..., n do
    set z = arg max_{x_i} π(x_i | x_{−i});
    x_i = z;
end
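Side by side, the two sweeps differ only in what is done with the univariate conditional; a minimal sketch (our notation, with sample_conditional and argmax_conditional standing in for user-supplied routines that are not specified in the paper):

```python
def gibbs_sweep(x, sample_conditional):
    """Algorithm 1: one sweep of componentwise conditional sampling.
    sample_conditional(i, x) draws z ~ pi(x_i | x_{-i}) given the current state x."""
    for i in range(len(x)):
        x[i] = sample_conditional(i, x)
    return x

def gauss_seidel_sweep(x, argmax_conditional):
    """Algorithm 2: one sweep of componentwise maximization.
    argmax_conditional(i, x) returns the maximizer over x_i of pi(x_i | x_{-i})."""
    for i in range(len(x)):
        x[i] = argmax_conditional(i, x)
    return x
```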
In statistical physics, distributions often arise with the form
π(x) = k(β) e^{−β H(x)},
where H(x) is an energy function (the Hamiltonian), β is inversely proportional to
temperature, and k is a normalizing constant. It is often noted that a sampling
algorithm may be used to minimize H(x) in the zero temperature limit, i.e., by taking
the limit β →∞. Then sampling degenerates to optimization since the distribution
is localized at the mode. In particular, Algorithm 1 reduces to Algorithm 2.
In this paper, we exploit an equivalence that operates at finite β to show how the
minimizer (or solver) may be adapted to become a sampling algorithm. For example,
in the simplest case that β = 1 and H is quadratic, i.e.,
H(x) = (1/2) x^T A x − b^T x
for some symmetric positive definite (precision matrix) A, π is Gaussian and the
Gauss–Seidel minimizer of H becomes the Gibbs sampler for π when coordinatewise
minimization is replaced by coordinatewise conditional sampling. One sweep of the
Gibbs sampler may be written in the matrix form (1.2) with
G = M^{-1} N and g^l = M^{-1} c^l, where the c^l are i.i.d. draws from N(0, D).
Here M = L + D and N = −L^T is a splitting of the (symmetric) precision matrix A in which L is the strictly lower triangular part of A and D is the diagonal of A [11]. This is the same splitting used to write the Gauss–Seidel algorithm for solving Ax = b in matrix form (1.1), with g = M^{-1} b. What makes this correspondence important is that
the convergence properties of the solver are inherited by the sampler (and vice versa),
which means that acceleration techniques developed for the solver may be applied
to the sampler. The main purpose of this paper is to establish the equivalence of
convergence in mean and covariance in the case of Chebyshev polynomial acceleration,
without the assumption of the target distribution being Gaussian.
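Before moving on, here is a small numerical sketch (our code, assuming zero mean, i.e., b = 0) of the correspondence just described for the Gaussian case: one Gibbs sweep written componentwise, and the same sweep in the matrix form (1.2) with the Gauss–Seidel splitting. The two functions agree in distribution, not sample path by sample path, since they draw their noise differently.

```python
import numpy as np

def gibbs_sweep_componentwise(x, A, rng):
    """One componentwise Gibbs sweep for N(0, A^{-1}) (the Gaussian case of
    Algorithm 1): pi(x_i | x_{-i}) is N(-sum_{j != i} A_ij x_j / A_ii, 1/A_ii)."""
    for i in range(len(x)):
        cond_mean = -(A[i] @ x - A[i, i] * x[i]) / A[i, i]
        x[i] = cond_mean + rng.standard_normal() / np.sqrt(A[i, i])
    return x

def gibbs_sweep_splitting(x, A, rng):
    """The same sweep in matrix form (1.2): y <- M^{-1}(N y + c) with the
    Gauss-Seidel splitting M = L + D, N = -L^T and noise c ~ N(0, D)."""
    L = np.tril(A, k=-1)                 # strictly lower triangular part of A
    M = L + np.diag(np.diag(A))          # M = L + D
    N = -L.T                             # N = -L^T, so that A = M - N
    c = np.sqrt(np.diag(A)) * rng.standard_normal(len(x))   # c ~ N(0, D)
    return np.linalg.solve(M, N @ x + c)
```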
2. Matrix splitting and iteration operators. Consider the splitting
(2.1)    A = M − N,
where A is a symmetric positive definite (SPD) matrix and M is invertible. For example, for the Gauss–Seidel iteration, M is set to the lower triangular part of A (including the diagonal). We will often consider the case where the splitting is symmetric, which means that M is symmetric, and hence so is N. We will utilize the family of iteration operators
(2.2)    G_τ = I − τ M^{-1} A,
parameterized by the relaxation parameter τ ≠ 0. The natural iteration operator induced by the splitting (2.1) is the case τ = 1, which we denote by G. The nonstationary iterative methods that we consider use a sequence of iteration operators with parameters τ_l, l = 0, 1, 2, ..., where l denotes iteration number. We will abbreviate G_{τ_l} by G_l, where possible, to avoid subscripts on subscripts.
The iteration operator G_τ may also be thought of as being induced by the splitting
(2.3)    A = M_τ − N_τ,
where
M_τ = (1/τ) M   and   N_τ = N + ((1 − τ)/τ) M.
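As a quick check (immediate from N = M − A, though not spelled out in the text), the splitting (2.3) induces the same operator as (2.2):
M_τ^{-1} N_τ = τ M^{-1} (N + ((1 − τ)/τ) M) = τ M^{-1} (M − A) + (1 − τ) I = I − τ M^{-1} A = G_τ.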
In the remainder of this section we list some lemmas about iteration operators
that we will use. Throughout the rest of the paper, proofs to lemmas and some
theorems have been deferred to the appendix.
Lemma 2.1. The iteration operators G_τ and G_κ commute, that is, G_τ G_κ = G_κ G_τ for all τ, κ.
Lemma 2.2. For a symmetric splitting, G_τ A^{-1} is symmetric.
The following lemma determines the variance of noise terms in sampling algo-
rithms.
Lemma 2.3. A^{-1} − G_τ A^{-1} G_τ^T = M_τ^{-1} (M_τ^T + N_τ) M_τ^{-T}.
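These lemmas are easy to sanity-check numerically. The sketch below (our code, using a random SPD matrix and the symmetric splitting M = D as a simple stand-in for the splittings considered later) verifies Lemmas 2.1–2.3 for particular values of τ and κ:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)              # a random SPD "precision" matrix
M = np.diag(np.diag(A))                  # symmetric splitting: M = D, N = M - A
N = M - A

def G(tau):                              # iteration operator (2.2)
    return np.eye(n) - tau * np.linalg.solve(M, A)

def M_tau(tau):                          # splitting (2.3)
    return M / tau

def N_tau(tau):
    return N + (1.0 - tau) / tau * M

tau, kappa = 0.7, 1.3
Ainv = np.linalg.inv(A)

# Lemma 2.1: G_tau and G_kappa commute.
assert np.allclose(G(tau) @ G(kappa), G(kappa) @ G(tau))

# Lemma 2.2: for a symmetric splitting, G_tau A^{-1} is symmetric.
S = G(tau) @ Ainv
assert np.allclose(S, S.T)

# Lemma 2.3: A^{-1} - G_tau A^{-1} G_tau^T = M_tau^{-1} (M_tau^T + N_tau) M_tau^{-T}.
Mt, Nt = M_tau(tau), N_tau(tau)
lhs = Ainv - G(tau) @ Ainv @ G(tau).T
rhs = np.linalg.inv(Mt) @ (Mt.T + Nt) @ np.linalg.inv(Mt).T
assert np.allclose(lhs, rhs)
```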
3. First-order iterative methods. We first consider iterative solvers of the equation
A x^∗ = b,
where A is a given SPD matrix, b is a given vector, and the solution we seek is denoted by x^∗.
3.1. First-order stationary iterative solver. The first-order stationary iterative solver uses the iteration
(3.1)    x^{l+1} = x^l − τ M^{-1} r^l = G_τ x^l + g_τ,
where r^l = A x^l − b for l = 1, 2, ..., the iteration operator G_τ is given by (2.2), and g_τ = τ M^{-1} b. In the remainder of this section, we derive the fixed point, error polynomial, and average reduction factor for this iteration.
Lemma 3.1. The iteration in (3.1) has x^∗ as its unique fixed point, i.e.,
(3.2)    x^∗ = G_τ x^∗ + g_τ  ⟺  A x^∗ = b.
Define the error at the lth iteration by
(3.3)    e^l = x^l − x^∗.
Subtract (3.2) from (3.1) to get the iteration for the error
e^{l+1} = x^{l+1} − x^∗ = G_τ x^l + g_τ − (G_τ x^∗ + g_τ) = G_τ (x^l − x^∗) = G_τ e^l.
By recursion we prove the following theorem.
Theorem 3.2.
e^m = G_τ^m e^0 = (I − τ M^{-1} A)^m e^0 = P_m(M^{-1} A) e^0,
where P_m is the (simple) mth-order polynomial P_m(λ) = (1 − τλ)^m.
Note that P_m(0) = 1 and P_m(1/τ) = 0. The convergence and convergence rate of the stationary iterative solver follow from Theorem 3.2.
Axelsson [1, p. 176] gives the optimal relaxation parameter
τ_opt = 2 / (λ_1 + λ_n),
where λ_1 and λ_n are the extreme (positive) eigenvalues of M^{-1} A, giving the average reduction factor
(3.4)    ρ_0 = (1 − λ_1/λ_n) / (1 + λ_1/λ_n).
Note that this implies that the iterative solver (3.1) converges for some value of τ. To be more precise, as long as M^{-1} A has all positive eigenvalues, then λ_1/λ_n ∈ (0, 1), which means that ρ_0 ∈ (0, 1) and the iteration is guaranteed to converge.
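A minimal sketch of the stationary iteration (3.1) with the optimal relaxation parameter (our code, taking M = I so that λ_1 and λ_n are simply the extreme eigenvalues of A), which also checks the predicted reduction factor (3.4) empirically:

```python
import numpy as np

def stationary_solve(A, b, tau, iters, x0=None):
    """First-order stationary iteration (3.1) with M = I:
    x^{l+1} = x^l - tau * r^l, where r^l = A x^l - b."""
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x = x - tau * (A @ x - b)
    return x

rng = np.random.default_rng(2)
n = 20
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)              # SPD test matrix
b = rng.standard_normal(n)
lam = np.linalg.eigvalsh(A)
lam1, lamn = lam[0], lam[-1]
tau_opt = 2.0 / (lam1 + lamn)            # Axelsson's optimal relaxation parameter
rho0 = (1 - lam1 / lamn) / (1 + lam1 / lamn)   # predicted reduction factor (3.4)

k = 50
x_exact = np.linalg.solve(A, b)
x = stationary_solve(A, b, tau_opt, iters=k)
rel_err = np.linalg.norm(x - x_exact) / np.linalg.norm(x_exact)
print(rel_err, rho0 ** k)                # observed reduction is at most about rho0^k
```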

Citations
Journal ArticleDOI
TL;DR: In this paper, a low-rank update of the prior covariance matrix is proposed to characterize and approximate the posterior distribution of the parameters in inverse problems, based on the leading eigendirections of the matrix pencil defined by the Hessian of the negative log-likelihood and the prior precision.
Abstract: In the Bayesian approach to inverse problems, data are often informative, relative to the prior, only on a low-dimensional subspace of the parameter space. Significant computational savings can be achieved by using this subspace to characterize and approximate the posterior distribution of the parameters. We first investigate approximation of the posterior covariance matrix as a low-rank update of the prior covariance matrix. We prove optimality of a particular update, based on the leading eigendirections of the matrix pencil defined by the Hessian of the negative log-likelihood and the prior precision, for a broad class of loss functions. This class includes the Forstner metric for symmetric positive definite matrices, as well as the Kullback--Leibler divergence and the Hellinger distance between the associated distributions. We also propose two fast approximations of the posterior mean and prove their optimality with respect to a weighted Bayes risk under squared-error loss. These approximations are dep...

124 citations

Proceedings Article
05 Dec 2013
TL;DR: It is shown that if the Gaussian precision matrix is generalized diagonally dominant, then any Hogwild Gibbs sampler, with any update schedule or allocation of variables to processors, yields a stable sampling process with the correct sample mean.
Abstract: Sampling inference methods are computationally difficult to scale for many models in part because global dependencies can reduce opportunities for parallel computation. Without strict conditional independence structure among variables, standard Gibbs sampling theory requires sample updates to be performed sequentially, even if dependence between most variables is not strong. Empirical work has shown that some models can be sampled effectively by going "Hogwild" and simply running Gibbs updates in parallel with only periodic global communication, but the successes and limitations of such a strategy are not well understood. As a step towards such an understanding, we study the Hogwild Gibbs sampling strategy in the context of Gaussian distributions. We develop a framework which provides convergence conditions and error bounds along with simple proofs and connections to methods in numerical linear algebra. In particular, we show that if the Gaussian precision matrix is generalized diagonally dominant, then any Hogwild Gibbs sampler, with any update schedule or allocation of variables to processors, yields a stable sampling process with the correct sample mean.

63 citations

Journal ArticleDOI
TL;DR: Standard Gibbs sampling applied to a multivariate normal distribution with a specified precision matrix is equivalent in fundamental ways to the Gauss-Seidel iterative solution of linear equations in the precision matrix, giving easy access to mature results in numerical linear algebra.
Abstract: Standard Gibbs sampling applied to a multivariate normal distribution with a specified precision matrix is equivalent in fundamental ways to the Gauss–Seidel iterative solution of linear equations in the precision matrix. Specifically, the iteration operators, the conditions under which convergence occurs, and geometric convergence factors (and rates) are identical. These results hold for arbitrary matrix splittings from classical iterative methods in numerical linear algebra giving easy access to mature results in that field, including existing convergence results for antithetic-variable Gibbs sampling, REGS sampling, and generalizations. Hence, efficient deterministic stationary relaxation schemes lead to efficient generalizations of Gibbs sampling. The technique of polynomial acceleration that significantly improves the convergence rate of an iterative solver derived from a symmetric matrix splitting may be applied to accelerate the equivalent generalized Gibbs sampler. Identicality of error polynomials guarantees convergence of the inhomogeneous Markov chain, while equality of convergence factors ensures that the optimal solver leads to the optimal sampler. Numerical examples are presented, including a Chebyshev accelerated SSOR Gibbs sampler applied to a stylized demonstration of low-level Bayesian image reconstruction in a large 3-dimensional linear inverse problem.

21 citations


Cites background or methods from "Convergence in Variance of Chebyshe..."

  • ...The Chebyshev accelerated SSOR solver and corresponding Chebyshev accelerated SSOR sampler (Fox and Parker, 2014) are depicted in panels C and D of Figure 1....

    [...]

  • ...But even sooner, after k∗∗ = k∗/2 iterations, the Chebyshev error reduction for the variance is predicted to be smaller than ε (Fox and Parker [19])....

    [...]

  • ...Using Theorem 5, we derived the Chebyshev accelerated SSOR sampler (Fox and Parker, 2014) by iteratively updating parameters via (13) and then generating a sampler via (17)....

    [...]

  • ...But even sooner, after k∗∗ = k∗/2 iterations, the Chebyshev error reduction for the variance is predicted to be smaller than ε (Fox and Parker, 2014).

    [...]

  • ...Fox and Parker (2014) considered point-wise convergence of the mean and variance of a Gibbs SSOR sampler accelerated by Chebyshev polynomials....

    [...]

Posted Content
TL;DR: This paper proposes a unifying Gaussian simulation framework by deriving a stochastic counterpart of the celebrated proximal point algorithm in optimization and offers a novel and unifying revisit of most of the existing MCMC approaches while extending them.
Abstract: Efficient sampling from a high-dimensional Gaussian distribution is an old but high-stake issue. Vanilla Cholesky samplers imply a computational cost and memory requirements which can rapidly become prohibitive in high dimension. To tackle these issues, multiple methods have been proposed from different communities ranging from iterative numerical linear algebra to Markov chain Monte Carlo (MCMC) approaches. Surprisingly, no complete review and comparison of these methods have been conducted. This paper aims at reviewing all these approaches by pointing out their differences, close relations, benefits and limitations. In addition to this state of the art, this paper proposes a unifying Gaussian simulation framework by deriving a stochastic counterpart of the celebrated proximal point algorithm in optimization. This framework offers a novel and unifying revisit of most of the existing MCMC approaches while extending them. Guidelines to choose the appropriate Gaussian simulation method for a given sampling problem in high dimension are proposed and illustrated with numerical examples.

17 citations

References
Book
01 Jan 1971

2,527 citations


"Convergence in Variance of Chebyshe..." refers background in this paper

  • ...Interestingly, the deterministic and stochastic iterations converge under exactly the same conditions, with a necessary and sufficient condition being that the spectral radius of G be strictly less than 1, that is, ρ (G) < 1 [4, 26]....

    [...]

Journal ArticleDOI
TL;DR: An adaptive Metropolis (AM) algorithm, where the Gaussian proposal distribution is updated along the process using the full information cumulated so far, which establishes here that it has the correct ergodic properties.
Abstract: A proper choice of a proposal distribution for Markov chain Monte Carlo methods, for example for the Metropolis-Hastings algorithm, is well known to be a crucial factor for the convergence of the algorithm. In this paper we introduce an adaptive Metropolis (AM) algorithm, where the Gaussian proposal distribution is updated along the process using the full information cumulated so far. Due to the adaptive nature of the process, the AM algorithm is non-Markovian, but we establish here that it has the correct ergodic properties. We also include the results of our numerical tests, which indicate that the AM algorithm competes well with traditional Metropolis-Hastings algorithms, and demonstrate that the AM algorithm is easy to use in practical computation.

2,511 citations


"Convergence in Variance of Chebyshe..." refers methods in this paper

  • ...The recent advent of adaptive Monte Carlo methods [13, 20] does offer the possibility of adapting to the mean and covariance within the iteration, as in the adaptive Metropolis (AM) algorithm....

    [...]

Journal ArticleDOI
TL;DR: It is shown that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, some Gaussian fields in the Matérn class can provide an explicit link, for any triangulation of R^d, between GFs and GMRFs, formulated as a basis function representation.
Abstract: Continuously indexed Gaussian fields (GFs) are the most important ingredient in spatial statistical modelling and geostatistics. The specification through the covariance function gives an intuitive interpretation of the field properties. On the computational side, GFs are hampered with the big n problem, since the cost of factorizing dense matrices is cubic in the dimension. Although computational power today is at an all time high, this fact seems still to be a computational bottleneck in many applications. Along with GFs, there is the class of Gaussian Markov random fields (GMRFs) which are discretely indexed. The Markov property makes the precision matrix involved sparse, which enables the use of numerical algorithms for sparse matrices, that for fields in R-2 only use the square root of the time required by general algorithms. The specification of a GMRF is through its full conditional distributions but its marginal properties are not transparent in such a parameterization. We show that, using an approximate stochastic weak solution to (linear) stochastic partial differential equations, we can, for some GFs in the Matern class, provide an explicit link, for any triangulation of R-d, between GFs and GMRFs, formulated as a basis function representation. The consequence is that we can take the best from the two worlds and do the modelling by using GFs but do the computations by using GMRFs. Perhaps more importantly, our approach generalizes to other covariance functions generated by SPDEs, including oscillating and non-stationary GFs, as well as GFs on manifolds. We illustrate our approach by analysing global temperature data with a non-stationary model defined on a sphere. (Less)

2,212 citations


"Convergence in Variance of Chebyshe..." refers methods in this paper

  • ...Our example uses the relationship between stationary GMRFs and stochastic PDEs that was noted by Whittle [25] for the Matérn (or Whittle–Matérn; see [12]) class of covariance functions and that was also exploited in [3, 15]....

    [...]

Book
05 Aug 2012
TL;DR: This paper presents a meta-analysis of matrix eigenvalues and condition numbers for preconditioned matrices using the framework of the Perron-Frobenius theory for nonnegative matrices and some simple iterative methods.
Abstract: Preface Acknowledgements 1. Direct solution methods 2. Theory of matrix eigenvalues 3. Positive definite matrices, Schur complements, and generalized eigenvalue problems 4. Reducible and irreducible matrices and the Perron-Frobenius theory for nonnegative matrices 5. Basic iterative methods and their rates of convergence 6. M-matrices, convergent splittings, and the SOR method 7. Incomplete factorization preconditioning methods 8. Approximate matrix inverses and corresponding preconditioning methods 9. Block diagonal and Schur complement preconditionings 10. Estimates of eigenvalues and condition numbers for preconditional matrices 11. Conjugate gradient and Lanczos-type methods 12. Generalized conjugate gradient methods 13. The rate of convergence of the conjugate gradient method Appendices.

2,043 citations


"Convergence in Variance of Chebyshe..." refers methods in this paper

  • ...A standard method of reducing the asymptotic average reduction factor is by polynomial acceleration, particularly using Chebyshev polynomials [1, 6, 10, 23]....

    [...]

  • ...The original formulation used a modified first-order iteration, as above, though the resulting algorithm is impractical due to numerical difficulties [1]....

    [...]

  • ...We follow Axelsson by considering the splitting with M = I, from which the general case follows by considering the (preconditioned) equations with M−1A in place of A....

    [...]

  • ...Axelsson’s result in (4.4) that specifies the required number of iterations to achieve a desired error reduction in the solver suggests that, for any ε > 0, after (4.13) p∗ = ⌈ln(ε/2) / ln(σ²)⌉ iterations, the variance error reduction is smaller than ε. 4.3....

    [...]

  • ...Axelsson gives this result [1, p. 183], and it is interesting to note that a little good fortune happens; the three equations can be satisfied with just two coefficients because the recursion for the Chebyshev polynomials turns one equation into a second....

    [...]

Book
01 Jan 1962
TL;DR: This inexpensive paperback edition of a groundbreaking classic is unique in its emphasis on the frequency approach and its use in the solution of problems.
Abstract: From the Publisher: For this inexpensive paperback edition of a groundbreaking classic, the author has extensively rearranged, rewritten and enlarged the material. Book is unique in its emphasis on the frequency approach and its use in the solution of problems. Contents include: Fundamentals and Algorithms; Polynomial Approximation— Classical Theory; Fourier Approximation—Modern Therory; Exponential Approximation.

1,612 citations

Frequently Asked Questions (1)
Q1. What are the contributions in "Convergence in variance of chebyshev accelerated" ?

In particular, the authors show that the error polynomials are identical and hence so are the convergence rates. The authors give an algorithm for the stochastic version of the second-order Chebyshev accelerated SSOR ( symmetric successive overrelaxation ) iteration and provide numerical examples of sampling from multivariate Gaussian distributions to confirm that the desired convergence properties are achieved in finite precision.