Journal ArticleDOI

On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators

TL;DR: This paper shows, by means of an operator called a splitting operator, that the Douglas-Rachford splitting method for finding a zero of the sum of two monotone operators is a special case of the proximal point algorithm, which allows the unification and generalization of a variety of convex programming algorithms.
Abstract: This paper shows, by means of an operator called a splitting operator, that the Douglas-Rachford splitting method for finding a zero of the sum of two monotone operators is a special case of the proximal point algorithm. Therefore, applications of Douglas-Rachford splitting, such as the alternating direction method of multipliers for convex programming decomposition, are also special cases of the proximal point algorithm. This observation allows the unification and generalization of a variety of convex programming algorithms. By introducing a modified version of the proximal point algorithm, we derive a new, generalized alternating direction method of multipliers for convex programming. Advances of this sort illustrate the power and generality gained by adopting monotone operator theory as a conceptual framework.

Summary (2 min read)

1. Introduction

  • The theory of maximal set-valued monotone operators (see, for example, [4]) provides a powerful general framework for the study of convex programming and variational inequalities.
  • Some preliminary versions of their results have also appeared in [10].
  • This paper is organized as follows: Section 2 introduces the basic theory of monotone operators in Hilbert space, while Section 3 proves the convergence of a generalized form of the proximal point algorithm.
  • Section 4 discusses Douglas-Rachford splitting, showing it to be a special case of the proximal point algorithm by means of a specially-constructed splitting operator.

2. Monotone operators

  • The following lemma summarizes some well-known properties of firmly nonexpansive operators.
  • The authors now give a critical theorem. The "only if" part of the following theorem has been well known for some time [48], but the "if" part, just as easily obtained, appears to have been obscure.
  • The purpose here is to stress the complete symmetry that exists between monotone operators and (full-domained) firmly nonexpansive operators over any Hilbert space. This is closely related to a result of Minty [37], but is not identical (Minty did not use the concept of firm nonexpansiveness; but see also [28]).
  • In the case that T is the subdifferential map ∂f of a convex function f, zer(T) is the set of all global minima of f.
  • The zeroes of a monotone operator precisely coincide with the fixed points of its resolvents, as the sketch below illustrates.
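As a minimal illustration of this correspondence (the quadratic function and the stepsizes below are choices made for this sketch, not taken from the paper), the following Python snippet checks that the unique zero of T = ∇f, with f(x) = (x − 3)², is also the fixed point of the resolvent (I + cT)⁻¹ for several values of c > 0.

```python
# Illustrative sketch: zeros of a monotone operator vs. fixed points of its
# resolvents.  T = f' with f(x) = (x - 3)**2, so T is monotone, its unique
# zero is x* = 3, and J_cT has a simple closed form.  All constants are
# arbitrary choices for this example.
def T(x):                  # derivative of the convex quadratic f
    return 2.0 * (x - 3.0)

def resolvent(z, c):       # J_cT(z): solve x + c*T(x) = z for x
    return (z + 6.0 * c) / (1.0 + 2.0 * c)

for c in (0.1, 1.0, 10.0):
    x_star = 3.0
    assert abs(T(x_star)) < 1e-12                      # 0 in T(x*)
    assert abs(resolvent(x_star, c) - x_star) < 1e-12  # x* fixed by J_cT
print("zer(T) = {3.0} coincides with the fixed points of J_cT for every c > 0")
```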

3. A generalized proximal point algorithm

  • Gol'shtein and Tret'yakov also allow resolvents to be evaluated approximately, but, unlike Rockafellar, do not allow the stepsize c to vary with k, restrict ℋ to be finite-dimensional, and do not consider the case in which zer(T) = ∅.
  • The following theorem effectively combines the results of Rockafellar and Gol'shtein-Tret'yakov; a sketch of the resulting iteration appears after this list.
  • The notation "⇀" denotes convergence in the weak topology on ℋ, while "→" denotes convergence in the strong topology induced by the usual norm ⟨x, x⟩^{1/2}.
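The combined result concerns a relaxed, approximate form of the proximal point iteration, roughly z^{k+1} = (1 − ρ_k) z^k + ρ_k J_{c_k T}(z^k) plus a summable error, with relaxation factors ρ_k in (0, 2). The Python sketch below applies such an iteration to an affine monotone operator; the operator, stepsize, and relaxation factor are illustrative assumptions, not data from the paper.

```python
# Rough sketch of a relaxed proximal point iteration
#     z^{k+1} = (1 - rho_k) z^k + rho_k * J_{c_k T}(z^k)
# for the affine monotone operator T(x) = A x - b (A symmetric positive
# definite).  The matrix, vector, rho_k and c_k are illustrative choices.
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 0.5]])
b = np.array([2.0, 1.0])                  # zero of T is x* = A^{-1} b

def resolvent(z, c):
    # J_{cT}(z) solves x + c*(A x - b) = z, i.e. (I + c A) x = z + c b
    return np.linalg.solve(np.eye(2) + c * A, z + c * b)

z = np.zeros(2)
for k in range(200):
    rho_k, c_k = 1.5, 1.0                 # over-relaxation, constant stepsize
    z = (1.0 - rho_k) * z + rho_k * resolvent(z, c_k)

print(z, np.linalg.solve(A, b))           # iterates approach the zero of T
```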


  • In at least one real example [11, Section 7.2.3], the generalized Douglas-Rachford splitting method with relaxation factors ρ_k other than 1 has been shown to converge faster than regular Douglas-Rachford splitting; a sketch of the relaxed iteration is given after this list.
  • This example involved a highly parallel algorithm for linear programming which will be described in a later paper.
  • Thus, the inclusion of over-relaxation factors is of some practical significance.
  • In addition, the convergence of Douglas-Rachford splitting with approximate calculation of resolvents had not previously been established.
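One common way to write a Douglas-Rachford iteration with an over-relaxation factor ρ is sketched below in Python for a one-dimensional problem 0 ∈ A(x) + B(x); the specific operators, the stepsize c, and ρ are assumptions made for this example, and the paper's own statement (its operator order and notation) may differ.

```python
# Hedged sketch of relaxed Douglas-Rachford splitting for 0 in A(x) + B(x):
#     x^k     = J_cB(z^k)
#     z^{k+1} = (1 - rho) z^k + rho * ( J_cA(2 x^k - z^k) + z^k - x^k )
# Here A = d|.| (resolvent = soft thresholding) and B(x) = x - 2, so the
# unique zero of A + B is x* = 1.  c and rho are illustrative choices.
def J_cA(z, c):                       # resolvent of c*d|.|: soft thresholding
    return max(abs(z) - c, 0.0) * (1.0 if z >= 0 else -1.0)

def J_cB(z, c):                       # resolvent of c*B with B(x) = x - 2
    return (z + 2.0 * c) / (1.0 + c)

z, c, rho = 5.0, 1.0, 1.6             # over-relaxation factor in (0, 2)
for k in range(100):
    x = J_cB(z, c)
    z = (1.0 - rho) * z + rho * (J_cA(2.0 * x - z, c) + z - x)

print(J_cB(z, c))                     # approaches x* = 1
```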

5. Some interesting special cases

  • The authors can now state a new variation of the alternating direction method of multipliers for (P); a rough sketch of an over-relaxed iteration of this kind is given after this list.
  • Then if (P) has a Kuhn-Tucker pair, {x^k} converges to a solution of (P) and {p^k} converges to a solution of the dual problem (D).
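For orientation only, here is a hedged Python sketch of an alternating direction method of multipliers with an over-relaxation factor α, written in the scaled form common in later literature; the paper's own statement, problem format (P), and notation differ, and the functions, penalty ρ, and α below are assumptions of this example.

```python
# Hedged sketch of an over-relaxed ADMM iteration (scaled form) for
#     minimize f(x) + g(z)  subject to  x = z,
# with f(x) = |x| and g(z) = 0.5*(z - 2)**2, whose solution is x* = z* = 1.
# rho (penalty) and alpha (relaxation factor in (0, 2)) are illustrative.
def prox_f(v, t):                 # prox of t*|.|: soft thresholding
    return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

def prox_g(v, t):                 # prox of t*0.5*(.-2)**2
    return (v + 2.0 * t) / (1.0 + t)

rho, alpha = 1.0, 1.7
x = z = u = 0.0
for k in range(200):
    x = prox_f(z - u, 1.0 / rho)
    x_hat = alpha * x + (1.0 - alpha) * z      # over-relaxation step
    z = prox_g(x_hat + u, 1.0 / rho)
    u = u + x_hat - z

print(x, z)                       # both approach 1.0
```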

6. Concerning finite termination

  • All operators of this form are staircase (in fact, for any y ∈ V, δ(y) may be taken arbitrarily large).
  • Define the following linear subspaces of E2.


Mathematical Programming 55 (1992) 293-318
North-Holland
On the Douglas-Rachford splitting method
and the proximal point algorithm for
maximal monotone operators
Jonathan Eckstein
Mathematical Sciences Research Group, Thinking Machines Corporation, Cambridge, MA 02142, USA
Dimitri P. Bertsekas
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology,
Cambridge, MA 02139, USA
Received 20 November 1989
Revised manuscript received 9 July 1990
This paper shows, by means of an operator called a splitting operator, that the Douglas-Rachford splitting method for finding a zero of the sum of two monotone operators is a special case of the proximal point algorithm. Therefore, applications of Douglas-Rachford splitting, such as the alternating direction method of multipliers for convex programming decomposition, are also special cases of the proximal point algorithm. This observation allows the unification and generalization of a variety of convex programming algorithms. By introducing a modified version of the proximal point algorithm, we derive a new, generalized alternating direction method of multipliers for convex programming. Advances of this sort illustrate the power and generality gained by adopting monotone operator theory as a conceptual framework.
Key words: Monotone operators, proximal point algorithm, decomposition.
1. Introduction
The theory of maximal set-valued monotone operators (see, for example, [4])
provides a powerful general framework for the study of convex programming and
variational inequalities. A fundamental algorithm for finding a root of a monotone
operator is the proximal point algorithm [48]. The well-known method of multipliers
[23, 41] for constrained convex programming is known to be a special case of the
proximal point algorithm [49]. This paper will reemphasize the power and generality
of the monotone operator framework in the analysis and derivation of convex
optimization algorithms, with an emphasis on decomposition algorithms.
This paper is drawn largely from the dissertation research of the first author. The dissertation was performed at M.I.T. under the supervision of the second author, and was supported in part by the Army Research Office under grant number DAAL03-86-K-01710 and by the National Science Foundation under grant number ECS-8519058.

The proximal point algorithm requires evaluation of resolvent operators of the form (I + λT)⁻¹, where T is monotone and set-valued, λ is a positive scalar, and I denotes the identity mapping. The main difficulty with the method is that I + λT may be hard to invert, depending on the nature of T. One alternative is to find maximal monotone operators A and B such that A + B = T, but I + λA and I + λB are easier to invert than I + λT. One can then devise an algorithm that uses only operators of the form (I + λA)⁻¹ and (I + λB)⁻¹, rather than (I + λ(A + B))⁻¹ = (I + λT)⁻¹. Such an approach is called a splitting method, and is inspired by well-established techniques from numerical linear algebra (for example, see [33]).
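To make the resolvent operation concrete, here is a small illustrative Python example (not taken from the paper): two monotone operators whose resolvents have cheap closed forms, even though the resolvent of their sum would in general require solving a harder subproblem. The particular operators, the scalar λ, and the interval are assumptions of this sketch.

```python
# Illustrative sketch: resolvents of two simple monotone operators.
# A = subdifferential of |x|  -> (I + lam*A)^{-1} is soft thresholding.
# B = normal cone of [l, u]   -> (I + lam*B)^{-1} is projection onto [l, u].
# The resolvent of A + B would not, in general, be this simple.
def J_lam_A(z, lam):
    return max(abs(z) - lam, 0.0) * (1.0 if z >= 0 else -1.0)

def J_lam_B(z, l=-1.0, u=3.0):
    return min(max(z, l), u)        # independent of lam for a normal cone

print(J_lam_A(2.5, 1.0))   # 1.5
print(J_lam_B(5.0))        # 3.0
```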
A number of authors, mainly in the French mathematical community, have extensively studied monotone operator splitting methods, which fall into four principal classes: forward-backward [40, 13, 56], double-backward [30, 40], Peaceman-Rachford [31], and Douglas-Rachford [31]. For a survey, readers may wish to refer to [11, Chapter 3]. We will focus on the "Douglas-Rachford" class, which appears to have the most general convergence properties. Gabay [13] has shown that the alternating direction method of multipliers, a variation on the method of multipliers designed to be more conducive to decomposition, is a special case of Douglas-Rachford splitting. The alternating direction method of multipliers was first introduced in [16] and [14]; additional contributions appear in [12]. An interesting presentation can be found in [15], and [3] provides a relatively accessible exposition. Despite Gabay's result, most developments of the alternating direction method of multipliers rely on a lengthy analysis from first principles. Here, we seek to demonstrate the benefit of using the operator-theoretic approach.
This paper hinges on a demonstration that Douglas-Rachford splitting is an
application of the proximal point algorithm. As a consequence, much of the theory
of the proximal point and related algorithms may be carried over to the context of
Douglas-Rachford splitting and its special cases, including the alternating direction
method of multipliers. As one example of this carryover, we present a generalized form of the proximal point algorithm -- created by synthesizing the work of Rockafellar [48] with that of Gol'shtein and Tret'yakov [22] -- and show how it gives rise to a new method, generalized Douglas-Rachford splitting. This in turn allows the derivation of a new augmented Lagrangian method for convex programming, the generalized alternating direction method of multipliers. This result illustrates the benefits of adopting the monotone operator analytic approach. Because
it allows over-relaxation factors, which are often found to accelerate proximal
point-based methods in practice, the generalized alternating direction method of
multipliers may prove to be faster than the alternating direction method of multipliers
in some applications. Because it permits approximate computation, it may also be
more widely applicable.
While the current paper was under review, [28] was brought to our attention.
There, Lawrence and Spingarn briefly draw the connection between the proximal
point algorithm and Douglas-Rachford splitting in a somewhat different -- and
very elegant -- manner. However, the implications for extensions to the Douglas-
Rachford splitting methodology and for convex programming decomposition theory
were not pursued.

Most of the results presented here are refinements of those in the recent thesis
by Eckstein [11], which contains more detailed development, and also relates the
theory to the work of Gol'shtein [17, 18, 19, 20, 21, 22]. Some preliminary versions
of our results have also appeared in [10]. Subsequent papers will introduce applica-
tions of the development given here to parallel optimization algorithms, again
capitalizing on the underpinnings provided by monotone operator theory.
This paper is organized as follows: Section 2 introduces the basic theory of
monotone operators in Hilbert space, while Section 3 proves the convergence of a
generalized form of the proximal point algorithm. Section 4 discusses Douglas-
Rachford splitting, showing it to be a special case of the proximal point algorithm
by means of a specially-constructed splitting operator. This notion is combined with the result of Section 3 to yield generalized Douglas-Rachford splitting. Section 5 applies this theory, generalizing the alternating direction method of multipliers. It also discusses Spingarn's [52, 54] method of partial inverses, with a minor extension.
Section 6 briefly presents a negative result concerning finite termination of Douglas-
Rachford splitting methods.
2. Monotone operators
An operator T on a Hilbert space ℋ is a (possibly null-valued) point-to-set map T: ℋ → 2^ℋ. We will make no distinction between an operator T and its graph, that is, the set {(x, y) | y ∈ T(x)}. Thus, we may simply say that an operator is any subset T of ℋ × ℋ, and define T(x) = Tx = {y | (x, y) ∈ T}.
If T is single-valued, that is, the cardinality of Tx is at most 1 for all x ∈ ℋ, we will by slight abuse of notation allow Tx and T(x) to stand for the unique y ∈ ℋ such that (x, y) ∈ T, rather than the singleton set {y}. The intended meaning should be clear from the context.
The domain of a mapping T is its "projection" onto the first coordinate,

dom T = {x ∈ ℋ | ∃y ∈ ℋ: (x, y) ∈ T} = {x ∈ ℋ | Tx ≠ ∅}.

We say that T has full domain if dom T = ℋ. The range or image of T is similarly defined as its projection onto the second coordinate,

im T = {y ∈ ℋ | ∃x ∈ ℋ: (x, y) ∈ T}.

The inverse T⁻¹ of T is {(y, x) | (x, y) ∈ T}.
For any real number c and operator T, we let cT be the operator {(x, cy) | (x, y) ∈ T}, and if A and B are any operators, we let

A + B = {(x, y + z) | (x, y) ∈ A, (x, z) ∈ B}.

We will use the symbol I to denote the identity operator {(x, x) | x ∈ ℋ}. Let ⟨·, ·⟩ denote the inner product on ℋ. Then an operator T is monotone if

⟨x′ − x, y′ − y⟩ ≥ 0 ∀(x, y), (x′, y′) ∈ T.
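Because operators are identified with their graphs, the calculus just defined translates directly into code. The following Python sketch (illustrative only; small finite sets of pairs stand in for operators on ℋ, and the sample operator is an assumption) implements inversion, scaling, addition, and a monotonicity check exactly as above.

```python
# Operators as explicit finite sets of (x, y) pairs, mirroring the graph-based
# definitions in the text.  The sample operator is a few points of the
# subdifferential of |x| (multi-valued at 0); it is only an illustration.
def inverse(T):
    return {(y, x) for (x, y) in T}

def scale(c, T):
    return {(x, c * y) for (x, y) in T}

def add(A, B):
    return {(x, ya + yb) for (x, ya) in A for (xb, yb) in B if xb == x}

def is_monotone(T):
    return all((x2 - x1) * (y2 - y1) >= 0 for (x1, y1) in T for (x2, y2) in T)

T = {(-1.0, -1.0), (0.0, -0.5), (0.0, 0.5), (1.0, 1.0)}
print(is_monotone(T))                        # True
print(is_monotone(inverse(T)))               # True: T monotone iff T^{-1} is
print(is_monotone(add(T, scale(2.0, T))))    # sums of monotone ops stay monotone
```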

A monotone operator is maximal if (considered as a graph) it is not strictly contained in any other monotone operator on ℋ. Note that an operator is (maximal) monotone if and only if its inverse is (maximal) monotone. The best-known example of a maximal monotone operator is the subgradient mapping ∂f of a closed proper convex function f: ℋ → ℝ ∪ {+∞} [42, 44, 45]. The following theorem, originally due to Minty [36, 37], provides a crucial characterization of maximal monotone operators:
Theorem 1. A monotone operator T on ℋ is maximal if and only if im(I + T) = ℋ. □
For alternative proofs of Theorem 1, or stronger related theorems, see [45, 4, 6,
or 24]. All proofs of the theorem require Zorn's lemma, or, equivalently, the axiom
of choice.
Given any operator A, let J_A denote the operator (I + A)⁻¹. Given any positive scalar c and operator T, J_cT = (I + cT)⁻¹ is called a resolvent of T. An operator C on ℋ is said to be nonexpansive if

‖y′ − y‖ ≤ ‖x′ − x‖ ∀(x, y), (x′, y′) ∈ C.

Note that nonexpansive operators are necessarily single-valued and Lipschitz continuous. An operator J on ℋ is said to be firmly nonexpansive if

‖y′ − y‖² ≤ ⟨x′ − x, y′ − y⟩ ∀(x, y), (x′, y′) ∈ J.
The following lemma summarizes some well-known properties of firmly nonexpan-
sive operators. The proof is straightforward and is omitted (or see, for example,
[48] or [11, Section 3.2.4]). Figure 1 illustrates the lemma.
Lemma 1. (i) All firmly nonexpansive operators are nonexpansive. (ii) An operator J
is firmly nonexpansive if and only if 2J- I is nonexpansive. (iii) An operator is firmly
nonexpansive if and only if it is of the form ½(C+I), where C is nonexpansive. (iv)
An operator J is firmly nonexpansive if and only if I − J is firmly nonexpansive. □
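The following numerical sketch (not from the paper; the contraction C, its coefficient, and the random test points are assumptions) spot-checks parts (iii) and (iv) of Lemma 1: starting from a nonexpansive map C, the operator J = ½(C + I) satisfies the firm nonexpansiveness inequality, and so does I − J.

```python
# Numerical spot-check of Lemma 1(iii)-(iv).  C is 0.9 times a plane rotation,
# hence nonexpansive; J = (C + I)/2 should then be firmly nonexpansive, and
# I - J as well.  All constants are illustrative.
import numpy as np

theta = 0.7
C = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])

def J(x):
    return 0.5 * (C @ x + x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, xp = rng.normal(size=2), rng.normal(size=2)
    d = xp - x
    dJ = J(xp) - J(x)
    assert np.dot(dJ, dJ) <= np.dot(d, dJ) + 1e-9      # J firmly nonexpansive
    dI = d - dJ                                         # (I-J)(x') - (I-J)(x)
    assert np.dot(dI, dI) <= np.dot(d, dI) + 1e-9      # I - J firmly nonexp.
print("Lemma 1 (iii) and (iv) verified on random pairs")
```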
We now give a critical theorem. The "only if" part of the following theorem has been well known for some time [48], but the "if" part, just as easily obtained, appears to have been obscure. The purpose here is to stress the complete symmetry that exists between (maximal) monotone operators and (full-domained) firmly nonexpansive operators over any Hilbert space.
Theorem 2. Let c be any positive scalar. An operator T on ℋ is monotone if and only if its resolvent J_cT = (I + cT)⁻¹ is firmly nonexpansive. Furthermore, T is maximal monotone if and only if J_cT is firmly nonexpansive and dom(J_cT) = ℋ.
Proof. By the definition of the scaling, addition, and inversion operations,

(x, y) ∈ T ⟺ (x + cy, x) ∈ (I + cT)⁻¹.

Fig. 1. Illustration of the action of firmly nonexpansive operators in Hilbert space. If J is nonexpansive, then J(x′) − J(x) must lie in the larger sphere, which has radius ‖x′ − x‖ and is centered at 0. If J is firmly nonexpansive, then J(x′) − J(x) must lie in the smaller sphere, which has radius ½‖x′ − x‖ and is centered at ½(x′ − x). This characterization follows directly from J being of the form ½(I + C), where C is nonexpansive. Note that if J(x′) − J(x) lies in the smaller sphere, so must (I − J)(x′) − (I − J)(x), illustrating Lemma 1(iv).
Therefore,

T monotone ⟺ ⟨x′ − x, y′ − y⟩ ≥ 0 ∀(x, y), (x′, y′) ∈ T,
           ⟺ ⟨x′ − x, cy′ − cy⟩ ≥ 0 ∀(x, y), (x′, y′) ∈ T,
           ⟺ ⟨x′ − x + cy′ − cy, x′ − x⟩ ≥ ‖x′ − x‖² ∀(x, y), (x′, y′) ∈ T,
           ⟺ (I + cT)⁻¹ firmly nonexpansive.

The first claim is established. Clearly, T is maximal if and only if cT is maximal. So, by Theorem 1, T is maximal if and only if im(I + cT) = ℋ. This is in turn true if and only if (I + cT)⁻¹ has domain ℋ, establishing the second statement. □
Corollary 2.1. An operator K is firmly nonexpansive if and only if K⁻¹ − I is monotone. K is firmly nonexpansive with full domain if and only if K⁻¹ − I is maximal monotone. □
Corollary 2.2. For any c > 0, the resolvent J_cT of a monotone operator T is single-valued. If T is also maximal, then J_cT has full domain. □
Corollary 2.3 (The Representation Lemma). Let c > 0 and let T be monotone on ℋ. Then every element z of ℋ can be written in at most one way as x + cy, where y ∈ Tx.
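As a concrete numerical companion to Theorem 2 and Corollary 2.3 (not from the paper; the operator T = ∂|·| on the real line, the scalar c, and the test points are assumptions of this sketch), the following Python snippet decomposes sample points z as z = x + cy with y ∈ Tx via the resolvent, and spot-checks that the resolvent is firmly nonexpansive.

```python
# Illustrative check of Theorem 2 / Corollary 2.3 for T = d|.| on the line.
# Its resolvent J_cT = (I + cT)^{-1} is soft thresholding; every z splits as
# z = x + c*y with x = J_cT(z), y = (z - x)/c, and y lies in T(x).
import numpy as np

def J(z, c):                                  # resolvent of c*d|.|
    return np.sign(z) * np.maximum(np.abs(z) - c, 0.0)

def in_subdiff_abs(x, y, tol=1e-12):          # is y an element of d|.|(x)?
    return abs(y - np.sign(x)) < tol if x != 0 else abs(y) <= 1.0 + tol

c = 0.5
for z in (-3.0, -0.2, 0.0, 0.4, 2.5):
    x = J(z, c)
    y = (z - x) / c
    assert in_subdiff_abs(x, y)               # z = x + c*y with y in T(x)

rng = np.random.default_rng(1)
z1, z2 = rng.normal(size=100), rng.normal(size=100)
d, dJ = z2 - z1, J(z2, c) - J(z1, c)
assert np.all(dJ * dJ <= d * dJ + 1e-9)       # J_cT is firmly nonexpansive
print("Representation lemma and firm nonexpansiveness verified numerically")
```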

Citations
Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

17,433 citations



Journal ArticleDOI
TL;DR: A first-order primal-dual algorithm for non-smooth convex optimization problems with known saddle-point structure can achieve O(1/N²) convergence on problems where the primal or the dual objective is uniformly convex, and it can show linear convergence, i.e. O(ω^N) for some ω ∈ (0,1), on smooth problems.
Abstract: In this paper we study a first-order primal-dual algorithm for non-smooth convex optimization problems with known saddle-point structure. We prove convergence to a saddle-point with rate O(1/N) in finite dimensions for the complete class of problems. We further show accelerations of the proposed algorithm to yield improved rates on problems with some degree of smoothness. In particular we show that we can achieve O(1/N²) convergence on problems where the primal or the dual objective is uniformly convex, and we can show linear convergence, i.e. O(ω^N) for some ω ∈ (0,1), on smooth problems. The wide applicability of the proposed algorithm is demonstrated on several imaging problems such as image denoising, image deconvolution, image inpainting, motion estimation and multi-label image segmentation.

4,487 citations



Journal ArticleDOI
TL;DR: It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: When the data is clean, LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, it is proved that under certain conditions LRR can exactly recover the row space of the original data.
Abstract: In this paper, we address the subspace clustering problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to cluster the samples into their respective subspaces and remove possible outliers as well. To this end, we propose a novel objective function named Low-Rank Representation (LRR), which seeks the lowest rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: When the data is clean, we prove that LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outlier as well; for data corrupted by arbitrary sparse errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these further imply that LRR can perform robust subspace clustering and error correction in an efficient and effective way.

3,085 citations



Posted Content
Abstract: The proximity operator of a convex function is a natural extension of the notion of a projection operator onto a convex set. This tool, which plays a central role in the analysis and the numerical solution of convex optimization problems, has recently been introduced in the arena of signal processing, where it has become increasingly important. In this paper, we review the basic properties of proximity operators which are relevant to signal processing and present optimization methods based on these operators. These proximal splitting methods are shown to capture and extend several well-known algorithms in a unifying framework. Applications of proximal methods in signal recovery and synthesis are discussed.

2,095 citations

References
Book
01 Jan 1989
TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
Abstract: …engineering, computer science, operations research, and applied mathematics. It is essentially a self-contained work, with the development of the material occurring in the main body of the text and excellent appendices on linear algebra and analysis, graph theory, duality theory, and probability theory and Markov chains supporting it. The introduction discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later. After the introduction, the text is organized in two parts: synchronous algorithms and asynchronous algorithms. The discussion of synchronous algorithms comprises four chapters, with Chapter 2 presenting both direct methods (converging to the exact solution within a finite number of steps) and iterative methods for linear

5,597 citations


"On the Douglas-Rachford splitting m..." refers background in this paper

  • ...An interesting presentation can be found in [15], and [3] provides a relative accessible exposition....

    [...]

Book
01 Jan 1973
TL;DR: In this article, Operateurs Maximaux Monotones: Et Semi-Groupes De Contractions Dans Les Espaces De Hilbert are described and discussed. But the focus is not on the performance of the operators.
Abstract: Front Cover; Operateurs Maximaux Monotones: Et Semi-Groupes De Contractions Dans Les Espaces De Hilbert; Copyright Page; Table des Matieres; Introduction; CHAPTER I. QUELQUES RESULTATS PRELIMINAIRES; CHAPTER II. OPERATEURS MAXIMAUX MONOTONES; CHAPTER III. EQUATIONS D'EVOLUTION ASSOCIEES AUX OPERATEURS MONOTONES; CHAPTER IV. PROPRIETES DES SEMI-GROUPES DE CONTRACTIONS NON LINEAIRES; APPENDICE: FONCTIONS VECTORIELLES D'UNE VARIABLE REELLE; REFERENCES BIBLIOGRAPHIQUES, COMPLEMENTS ET PROBLEMES OUVERTS; BIBLIOGRAPHIE.

3,447 citations

Journal ArticleDOI
TL;DR: In this paper, the proximal point algorithm in exact form is investigated in a more general form where the requirement for exact minimization at each iteration is weakened, and the subdifferential $\partial f$ is replaced by an arbitrary maximal monotone operator T.
Abstract: For the problem of minimizing a lower semicontinuous proper convex function f on a Hilbert space, the proximal point algorithm in exact form generates a sequence $\{ z^k \} $ by taking $z^{k + 1} $ to be the minimizes of $f(z) + ({1 / {2c_k }})\| {z - z^k } \|^2 $, where $c_k > 0$. This algorithm is of interest for several reasons, but especially because of its role in certain computational methods based on duality, such as the Hestenes-Powell method of multipliers in nonlinear programming. It is investigated here in a more general form where the requirement for exact minimization at each iteration is weakened, and the subdifferential $\partial f$ is replaced by an arbitrary maximal monotone operator T. Convergence is established under several criteria amenable to implementation. The rate of convergence is shown to be “typically” linear with an arbitrarily good modulus if $c_k $ stays large enough, in fact superlinear if $c_k \to \infty $. The case of $T = \partial f$ is treated in extra detail. Applicati...

3,238 citations