Mathematical Programming 55 (1992) 293-318
North-Holland
On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators
Jonathan Eckstein
Mathematical Sciences Research Group, Thinking Machines Corporation, Cambridge, MA 02142, USA
Dimitri P. Bertsekas
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology,
Cambridge, MA 02139, USA
Received 20 November 1989
Revised manuscript received 9 July 1990
This paper shows, by means of an operator called a splitting operator, that the Douglas-Rachford splitting method for finding a zero of the sum of two monotone operators is a special case of the proximal point algorithm. Therefore, applications of Douglas-Rachford splitting, such as the alternating direction method of multipliers for convex programming decomposition, are also special cases of the proximal point algorithm. This observation allows the unification and generalization of a variety of convex programming algorithms. By introducing a modified version of the proximal point algorithm, we derive a new, generalized alternating direction method of multipliers for convex programming. Advances of this sort illustrate the power and generality gained by adopting monotone operator theory as a conceptual framework.

Key words: Monotone operators, proximal point algorithm, decomposition.
1. Introduction
The theory of maximal set-valued monotone operators (see, for example, [4])
provides a powerful general framework for the study of convex programming and
variational inequalities. A fundamental algorithm for finding a root of a monotone
operator is the proximal point algorithm [48]. The well-known method of multipliers
[23, 41] for constrained convex programming is known to be a special case of the
proximal point algorithm [49]. This paper will reemphasize the power and generality
of the monotone operator framework in the analysis and derivation of convex
optimization algorithms, with an emphasis on decomposition algorithms.
(This paper is drawn largely from the dissertation research of the first author. The dissertation was performed at M.I.T. under the supervision of the second author, and was supported in part by the Army Research Office under grant number DAAL03-86-K-01710 and by the National Science Foundation under grant number ECS-8519058.)

The proximal point algorithm requires evaluation of resolvent operators of the form $(I+\lambda T)^{-1}$, where $T$ is monotone and set-valued, $\lambda$ is a positive scalar, and $I$ denotes the identity mapping. The main difficulty with the method is that $I+\lambda T$ may be hard to invert, depending on the nature of $T$. One alternative is to find maximal monotone operators $A$ and $B$ such that $A+B=T$, but such that $I+\lambda A$ and $I+\lambda B$ are easier to invert than $I+\lambda T$. One can then devise an algorithm that uses only operators of the form $(I+\lambda A)^{-1}$ and $(I+\lambda B)^{-1}$, rather than $(I+\lambda(A+B))^{-1} = (I+\lambda T)^{-1}$. Such an approach is called a splitting method, and is inspired by well-established techniques from numerical linear algebra (for example, see [33]).
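To make the appeal of splitting concrete, consider a case of our own choosing (not the paper's): $T = A + B$ with $A = \nabla f$ for a convex quadratic $f$ and $B = \partial g$ for $g = \|\cdot\|_1$. Inverting $I + \lambda(A+B)$ directly couples the two terms, but each resolvent alone is cheap: $(I+\lambda A)^{-1}$ is a single linear solve, and $(I+\lambda B)^{-1}$ is componentwise soft-thresholding. A minimal Python sketch:

```python
import numpy as np

def resolvent_A(z, lam, Q, q):
    """Resolvent (I + lam*A)^{-1} for A = grad f, f(x) = 0.5*x'Qx + q'x:
    evaluating it amounts to one linear solve with I + lam*Q."""
    n = len(z)
    return np.linalg.solve(np.eye(n) + lam * Q, z - lam * q)

def resolvent_B(z, lam):
    """Resolvent (I + lam*B)^{-1} for B = subdifferential of the l1 norm:
    the componentwise soft-thresholding map."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
```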
A number of authors, mainly in the French mathematical community, have
extensively studied monotone operator splitting methods, which fall into four
principal classes: forward-backward [40, 13, 56], double-backward [30, 40], Peaceman-Rachford [31], and Douglas-Rachford [31]. For a survey, readers may wish to refer to [11, Chapter 3]. We will focus on the "Douglas-Rachford" class, which
appears to have the most general convergence properties. Gabay [13] has shown
that the alternating direction method of multipliers, a variation on the method of multipliers designed to be more conducive to decomposition, is a special case of Douglas-Rachford splitting. The alternating direction method of multipliers was first introduced in [16] and [14]; additional contributions appear in [12]. An interesting presentation can be found in [15], and [3] provides a relatively accessible exposition. Despite Gabay's result, most developments of the alternating direction method of multipliers rely on a lengthy analysis from first principles. Here, we seek to demonstrate the benefit of using the operator-theoretic approach.
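Gabay's reduction is developed formally in Section 4; as a preview, the following sketch (our notation, with the resolvent convention $J_{\lambda A} = (I+\lambda A)^{-1}$, not a statement from the paper itself) shows one common form of the Douglas-Rachford recursion for finding a zero of $A + B$, which touches $A$ and $B$ only through their resolvents:

```python
def douglas_rachford(z, J_A, J_B, iters=500):
    """One common statement of Douglas-Rachford splitting: iterate
    z <- z + J_B(2*J_A(z) - z) - J_A(z). At a fixed point z*, the
    point x* = J_A(z*) satisfies 0 in A(x*) + B(x*)."""
    for _ in range(iters):
        x = J_A(z)
        z = z + J_B(2 * x - z) - x
    return J_A(z)
```

A fixed point $z^*$ gives $z^* - x^* \in \lambda A(x^*)$ and $x^* - z^* \in \lambda B(x^*)$ for $x^* = J_{\lambda A}(z^*)$, so the two inclusions sum to $0 \in A(x^*) + B(x^*)$; this is the sense in which the method "splits" the problem.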
This paper hinges on a demonstration that Douglas-Rachford splitting is an
application of the proximal point algorithm. As a consequence, much of the theory
of the proximal point and related algorithms may be carried over to the context of
Douglas-Rachford splitting and its special cases, including the alternating direction
method of multipliers. As one example of this carryover, we present a generalized form of the proximal point algorithm, created by synthesizing the work of Rockafellar [48] with that of Gol'shtein and Tret'yakov [22], and show how it gives rise to a new method, generalized Douglas-Rachford splitting. This in turn allows the derivation of a new augmented Lagrangian method for convex programming, the generalized alternating direction method of multipliers. This result illustrates the benefits of adopting the monotone operator analytic approach. Because
it allows over-relaxation factors, which are often found to accelerate proximal
point-based methods in practice, the generalized alternating direction method of
multipliers may prove to be faster than the alternating direction method of multipliers
in some applications. Because it permits approximate computation, it may also be
more widely applicable.
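As a concrete picture of what over-relaxation buys (Section 5 gives the general statement), here is a sketch of an over-relaxed ADMM step for the simple consensus problem $\min f(x) + g(z)$ subject to $x = z$, written in the scaled-dual form in which such methods are commonly presented today. The prox callables, penalty $c$, and relaxation factor $\rho \in (0, 2)$ are our illustrative assumptions, not the paper's general formulation:

```python
def relaxed_admm(prox_f, prox_g, x, z, u, rho=1.5, c=1.0, iters=200):
    """Over-relaxed ADMM for min f(x) + g(z) s.t. x = z, with scaled dual u.
    Here prox_f(v, c) = argmin_x f(x) + (c/2)*||x - v||^2."""
    for _ in range(iters):
        x = prox_f(z - u, c)
        x_hat = rho * x + (1.0 - rho) * z   # over-relaxation, rho in (0, 2)
        z = prox_g(x_hat + u, c)
        u = u + x_hat - z
    return x, z, u
```

With $\rho = 1$ this reduces to the ordinary alternating direction method of multipliers; values $\rho > 1$ implement the over-relaxation discussed above.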
While the current paper was under review, [28] was brought to our attention.
There, Lawrence and Spingarn briefly draw the connection between the proximal
point algorithm and Douglas-Rachford splitting in a somewhat different -- and
very elegant -- manner. However, the implications for extensions to the Douglas-
Rachford splitting methodology and for convex programming decomposition theory
were not pursued.

Most of the results presented here are refinements of those in the recent thesis
by Eckstein [11], which contains more detailed development, and also relates the
theory to the work of Gol'shtein [17, 18, 19, 20, 21, 22]. Some preliminary versions
of our results have also appeared in [10]. Subsequent papers will introduce applications of the development given here to parallel optimization algorithms, again capitalizing on the underpinnings provided by monotone operator theory.
This paper is organized as follows: Section 2 introduces the basic theory of
monotone operators in Hilbert space, while Section 3 proves the convergence of a
generalized form of the proximal point algorithm. Section 4 discusses Douglas-Rachford splitting, showing it to be a special case of the proximal point algorithm by means of a specially-constructed splitting operator. This notion is combined with the result of Section 3 to yield generalized Douglas-Rachford splitting. Section 5 applies this theory, generalizing the alternating direction method of multipliers. It also discusses Spingarn's [52, 54] method of partial inverses, with a minor extension. Section 6 briefly presents a negative result concerning finite termination of Douglas-Rachford splitting methods.
2. Monotone operators
An operator $T$ on a Hilbert space $\mathcal{H}$ is a (possibly null-valued) point-to-set map $T: \mathcal{H} \to 2^{\mathcal{H}}$. We will make no distinction between an operator $T$ and its graph, that is, the set $\{(x, y) \mid y \in T(x)\}$. Thus, we may simply say that an operator is any subset $T$ of $\mathcal{H} \times \mathcal{H}$, and define $T(x) = Tx = \{y \mid (x, y) \in T\}$.
If $T$ is single-valued, that is, the cardinality of $Tx$ is at most 1 for all $x \in \mathcal{H}$, we will by slight abuse of notation allow $Tx$ and $T(x)$ to stand for the unique $y \in \mathcal{H}$ such that $(x, y) \in T$, rather than the singleton set $\{y\}$. The intended meaning should be clear from the context.
The domain of a mapping $T$ is its "projection" onto the first coordinate,
$$\operatorname{dom} T = \{x \in \mathcal{H} \mid \exists y \in \mathcal{H}: (x, y) \in T\} = \{x \in \mathcal{H} \mid Tx \neq \emptyset\}.$$
We say that $T$ has full domain if $\operatorname{dom} T = \mathcal{H}$. The range or image of $T$ is similarly defined as its projection onto the second coordinate,
$$\operatorname{im} T = \{y \in \mathcal{H} \mid \exists x \in \mathcal{H}: (x, y) \in T\}.$$
The inverse $T^{-1}$ of $T$ is $\{(y, x) \mid (x, y) \in T\}$.
For any real number $c$ and operator $T$, we let $cT$ be the operator $\{(x, cy) \mid (x, y) \in T\}$, and if $A$ and $B$ are any operators, we let
$$A + B = \{(x, y + z) \mid (x, y) \in A,\ (x, z) \in B\}.$$
We will use the symbol $I$ to denote the identity operator $\{(x, x) \mid x \in \mathcal{H}\}$. Let $\langle \cdot, \cdot \rangle$ denote the inner product on $\mathcal{H}$. Then an operator $T$ is monotone if
$$\langle x' - x,\ y' - y \rangle \geq 0 \quad \forall (x, y), (x', y') \in T.$$
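Because an operator is identified with its graph, monotonicity can be tested mechanically on any finite graph; on the real line the inner product reduces to ordinary multiplication. A small sketch of our own (not from the paper) of that check:

```python
def is_monotone(T):
    """T is a set of pairs (x, y) of floats, viewed as a finite operator
    graph on the real line; monotone means (x'-x)*(y'-y) >= 0 for all
    pairs of elements of the graph."""
    return all((x2 - x1) * (y2 - y1) >= 0
               for (x1, y1) in T for (x2, y2) in T)

T = {(0.0, -1.0), (0.0, 1.0), (1.0, 1.0)}   # set-valued at x = 0
print(is_monotone(T))                        # True
```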

A monotone operator is maximal if (considered as a graph) it is not strictly contained in any other monotone operator on $\mathcal{H}$. Note that an operator is (maximal) monotone if and only if its inverse is (maximal) monotone. The best-known example of a maximal monotone operator is the subgradient mapping $\partial f$ of a closed proper convex function $f: \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ [42, 44, 45]. The following theorem, originally due to Minty [36, 37], provides a crucial characterization of maximal monotone operators:
Theorem 1. A monotone operator $T$ on $\mathcal{H}$ is maximal if and only if $\operatorname{im}(I + T) = \mathcal{H}$. $\square$
For alternative proofs of Theorem 1, or stronger related theorems, see [45, 4, 6,
or 24]. All proofs of the theorem require Zorn's lemma, or, equivalently, the axiom
of choice.
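Theorem 1 can be seen concretely on the real line (our illustration, not the paper's): $T = \partial|\cdot|$, with $T(0) = [-1, 1]$, is maximal monotone, so $z \in x + T(x)$ is solvable for every $z$, whereas deleting the values at $0$ leaves a monotone but non-maximal operator whose image under $I + T$ has a gap:

```python
def solve_minty(z):
    """Solve z in x + T(x) for the maximal monotone T = subdiff of |x|:
    the unique solution is the soft-threshold of z at level 1."""
    return max(abs(z) - 1.0, 0.0) * (1.0 if z > 0 else -1.0)

# For the monotone but non-maximal restriction T' (sign(x) for x != 0 only),
# im(I + T') = (-inf, -1) U (1, inf), so e.g. z = 0.5 has no solution there;
# the maximal extension supplies one: 0.5 in 0 + T(0) = [-1, 1].
print(solve_minty(0.5))   # 0.0
```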
Given any operator $A$, let $J_A$ denote the operator $(I + A)^{-1}$. Given any positive scalar $c$ and operator $T$, $J_{cT} = (I + cT)^{-1}$ is called a resolvent of $T$. An operator $C$ on $\mathcal{H}$ is said to be nonexpansive if
$$\|y' - y\| \leq \|x' - x\| \quad \forall (x, y), (x', y') \in C.$$
Note that nonexpansive operators are necessarily single-valued and Lipschitz continuous. An operator $J$ on $\mathcal{H}$ is said to be firmly nonexpansive if
$$\|y' - y\|^2 \leq \langle x' - x,\ y' - y \rangle \quad \forall (x, y), (x', y') \in J.$$
The following lemma summarizes some well-known properties of firmly nonexpansive operators. The proof is straightforward and is omitted (or see, for example, [48] or [11, Section 3.2.4]). Figure 1 illustrates the lemma.
Lemma 1. (i) All firmly nonexpansive operators are nonexpansive. (ii) An operator $J$ is firmly nonexpansive if and only if $2J - I$ is nonexpansive. (iii) An operator is firmly nonexpansive if and only if it is of the form $\frac{1}{2}(C + I)$, where $C$ is nonexpansive. (iv) An operator $J$ is firmly nonexpansive if and only if $I - J$ is firmly nonexpansive. $\square$
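Lemma 1 (and, anticipating Theorem 2 below, the firm nonexpansiveness of resolvents) is easy to probe numerically. Taking $J$ to be the soft-thresholding map, the resolvent of $\partial|\cdot|$, the following sketch of ours spot-checks the defining inequality together with parts (ii) and (iv) on random pairs:

```python
import random

def J(x):
    """Soft-thresholding at level 1: the resolvent (I + T)^{-1} of
    T = subdifferential of |x|."""
    return max(abs(x) - 1.0, 0.0) * (1.0 if x > 0 else -1.0)

for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    dx, dJ = x - y, J(x) - J(y)
    assert dJ ** 2 <= dJ * dx + 1e-12                          # firmly nonexpansive
    assert abs((2*J(x) - x) - (2*J(y) - y)) <= abs(dx) + 1e-9  # Lemma 1(ii)
    assert (dx - dJ) ** 2 <= (dx - dJ) * dx + 1e-12            # Lemma 1(iv)
print("all checks passed")
```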
We now give a critical theorem. The "only if" part of the following theorem has
been well known for some time [48], but the "if" part, just as easily obtained,
appears to have been obscure. The purpose here is to stress the complete symmetry
that exists between (maximal) monotone operators and (full-domained) firmly
nonexpansive operators over any Hilbert space.
Theorem 2. Let $c$ be any positive scalar. An operator $T$ on $\mathcal{H}$ is monotone if and only if its resolvent $J_{cT} = (I + cT)^{-1}$ is firmly nonexpansive. Furthermore, $T$ is maximal monotone if and only if $J_{cT}$ is firmly nonexpansive and $\operatorname{dom}(J_{cT}) = \mathcal{H}$.

Proof. By the definition of the scaling, addition, and inversion operations,
$$(x, y) \in T \iff (x + cy,\ x) \in (I + cT)^{-1}.$$

[Figure 1. Illustration of the action of firmly nonexpansive operators in Hilbert space. If $J$ is nonexpansive, then $J(x') - J(x)$ must lie in the larger sphere, which has radius $\|x' - x\|$ and is centered at $0$. If $J$ is firmly nonexpansive, then $J(x') - J(x)$ must lie in the smaller sphere, which has radius $\frac{1}{2}\|x' - x\|$ and is centered at $\frac{1}{2}(x' - x)$. This characterization follows directly from $J$ being of the form $\frac{1}{2}(I + C)$, where $C$ is nonexpansive. Note that if $J(x') - J(x)$ lies in the smaller sphere, so must $(I - J)(x') - (I - J)(x)$, illustrating Lemma 1(iv).]
Therefore,
$$T \text{ monotone} \iff \langle x' - x,\ y' - y \rangle \geq 0 \quad \forall (x, y), (x', y') \in T,$$
$$\iff \langle x' - x,\ cy' - cy \rangle \geq 0 \quad \forall (x, y), (x', y') \in T,$$
$$\iff \langle x' - x + cy' - cy,\ x' - x \rangle \geq \|x' - x\|^2 \quad \forall (x, y), (x', y') \in T,$$
$$\iff (I + cT)^{-1} \text{ firmly nonexpansive}.$$
The first claim is established. Clearly, $T$ is maximal if and only if $cT$ is maximal. So, by Theorem 1, $T$ is maximal if and only if $\operatorname{im}(I + cT) = \mathcal{H}$. This is in turn true if and only if $(I + cT)^{-1}$ has domain $\mathcal{H}$, establishing the second statement. $\square$
Corollary 2.1. An operator $K$ is firmly nonexpansive if and only if $K^{-1} - I$ is monotone. $K$ is firmly nonexpansive with full domain if and only if $K^{-1} - I$ is maximal monotone. $\square$
Corollary 2.2. For any $c > 0$, the resolvent $J_{cT}$ of a monotone operator $T$ is single-valued. If $T$ is also maximal, then $J_{cT}$ has full domain. $\square$
Corollary 2.3 (The Representation Lemma). Let $c > 0$ and let $T$ be monotone on $\mathcal{H}$. Then every element $z$ of $\mathcal{H}$ can be written in at most one way as $x + cy$, where $y \in Tx$. $\square$
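Corollary 2.3 is constructive whenever the resolvent is available: given $z$, take $x = J_{cT}(z)$ and $y = (z - x)/c$; then $y \in Tx$ and $z = x + cy$, and the corollary guarantees that no other such decomposition exists. A sketch of ours with $T = \partial|\cdot|$ and $c = 2$:

```python
def represent(z, c=2.0):
    """Write z = x + c*y with y in T(x), for T = subdiff of |x|:
    x = J_{cT}(z), the soft-threshold of z at level c, and y = (z - x)/c."""
    x = max(abs(z) - c, 0.0) * (1.0 if z > 0 else -1.0)
    return x, (z - x) / c

x, y = represent(5.0)      # x = 3.0, y = 1.0; indeed 1.0 in T(3.0) = {1}
assert x + 2.0 * y == 5.0
```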

References
Bertsekas, D.P. and Tsitsiklis, J.N., Parallel and Distributed Computation: Numerical Methods (Prentice-Hall, Englewood Cliffs, NJ, 1989).
Brezis, H., Opérateurs Maximaux Monotones et Semi-Groupes de Contractions dans les Espaces de Hilbert (North-Holland, Amsterdam, 1973).
Rockafellar, R.T., "Monotone operators and the proximal point algorithm," SIAM Journal on Control and Optimization 14 (1976) 877-898.