
Mathematical Programming 60 (1993) 1-19
North-Holland
On the convergence of the exponential
multiplier method for convex programming
Paul Tseng
Department of Mathematics, University of Washington, Seattle, WA, USA
Dimitri P. Bertsekas
Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA, USA
Received 8 October 1990
Revised manuscript received 21 January 1992
In this paper, we analyze the exponential method of multipliers for convex constrained minimization
problems, which operates like the usual Augmented Lagrangian method, except that it uses an exponential
penalty function in place of the usual quadratic. We also analyze a dual counterpart, the entropy
minimization algorithm, which operates like the proximal minimization algorithm, except that it uses a
logarithmic/entropy "proximal" term in place of a quadratic. We strengthen substantially the available
convergence results for these methods, and we derive the convergence rate of these methods when applied
to linear programs.
Key words: Convex programming, linear programming, multiplier method, exponential penalty, Augmented Lagrangian.
1. Introduction
Let f : R^n → (−∞, ∞] and g_j : R^n → (−∞, ∞], j = 1, ..., m, be closed, proper, convex functions in R^n, the n-dimensional Euclidean space. Consider the following convex program associated with f and the g_j's:

(P)   minimize   f(x)
      subject to g_j(x) ≤ 0,  j = 1, ..., m.   (1.1)
We make the following standing assumption about (P):

Assumption A. (a) The optimal solution set for (P) is nonempty and bounded.
(b) The effective domain of f, that is, the set {x | f(x) < ∞}, is contained in the effective domain {x | g_j(x) < ∞} of each g_j. Furthermore, the relative interior of the effective domain of f is contained in the relative interior of the effective domain of each g_j.
Correspondence to: Prof. Dimitri P. Bertsekas, Laboratory for Information and Decision Systems, M.I.T., Cambridge, MA 02139, USA.
Research supported by the National Science Foundation under Grant DDM-8903385, and the Army Research Office under Grant DAAL03-86-K-0171.

(c) There exists a vector x̄ in the relative interior of the effective domain of f, which satisfies g_j(x̄) < 0 for all non-affine g_j.
The boundedness assumption in part (a) of Assumption A will be needed to ensure that our method is well-defined. Part (b) of Assumption A is satisfied in particular if all the constraint functions are real-valued. Parts (b) and (c) of Assumption A are constraint qualification conditions, which are needed to guarantee the existence of a Kuhn-Tucker vector for the problem (see [25, p. 277]).
We now describe the exponential multiplier method proposed by Kort and Bertsekas [15] for solving problem (P) (see also [5, Section 5.1.2]). Let ψ : R → R be the exponential penalty function given by

ψ(t) = e^t − 1.   (1.2)
We associate a multiplier μ_j > 0 with the jth constraint. The method performs a sequence of unconstrained minimizations, and iterates on the multipliers at the end of each minimization. At the kth iteration (k ≥ 0) we are given positive μ_j^k, j = 1, ..., m (with the initial μ_j^0, j = 1, ..., m, chosen arbitrarily); we compute x^k as

x^k ∈ argmin_{x ∈ R^n} { f(x) + Σ_{j=1}^m (μ_j^k / c_j^k) ψ(c_j^k g_j(x)) },   (1.3)

where each c_j^k is a positive penalty parameter, and then we update the multipliers according to

μ_j^{k+1} = μ_j^k exp(c_j^k g_j(x^k)),  j = 1, ..., m.   (1.4)
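To make the iteration concrete, here is a minimal sketch of (1.3)-(1.4) in Python, assuming a generic smooth solver (SciPy's BFGS) for the inner minimization and the simplest choice of a common fixed penalty parameter c_j^k = c; the helper name and the toy problem below are our own illustration, not from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def exp_multiplier_method(f, gs, x0, mu0, c=1.0, iters=30):
    """Sketch of the exponential method of multipliers (1.3)-(1.4),
    with all penalty parameters c_j^k fixed at a common value c."""
    x, mu = np.asarray(x0, float), np.asarray(mu0, float)
    for _ in range(iters):
        def aug(x):
            # f(x) + sum_j (mu_j / c) * psi(c * g_j(x)),  psi(t) = e^t - 1
            return f(x) + sum(m / c * (np.exp(c * g(x)) - 1.0)
                              for m, g in zip(mu, gs))
        x = minimize(aug, x, method="BFGS").x                 # inner step (1.3)
        mu = mu * np.exp(c * np.array([g(x) for g in gs]))    # update (1.4)
    return x, mu

# Toy problem: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0.
# Optimum: x* = (0.5, 0.5) with multiplier mu* = 1.
x, mu = exp_multiplier_method(lambda x: x[0]**2 + x[1]**2,
                              [lambda x: 1.0 - x[0] - x[1]],
                              x0=[0.0, 0.0], mu0=[0.5])
```

On this example the iterates approach x* = (0.5, 0.5) and the multiplier sequence approaches μ* = 1.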
Notice that for a fixed μ_j > 0, as c_j → ∞, the "penalty" term (μ_j/c_j) ψ(c_j g_j(x)) tends to ∞ for all infeasible x (g_j(x) > 0) and to zero for all feasible x (g_j(x) ≤ 0). On the other hand, for a fixed c_j, as μ_j → 0 (which is expected to occur if the jth constraint is inactive at the optimum), the penalty term goes to zero for all x, feasible or infeasible. This is contrary to what happens in usual exterior penalty methods [11, 17], and for this reason, much of the standard analysis for exterior penalty and multiplier methods cannot be applied to the exponential method of multipliers.
It can be shown that the minimum in (1.3) is attained for all k (see [5, p. 337]). For a brief justification, note that if this minimum were not attained, then f and the functions g_j would share a direction of recession, in which case the optimal solution set of (P) would be unbounded (see [25, Section 8]), thus contradicting Assumption A.
We will consider two rules for choosing the penalty parameters c_j^k. In the first rule, which is common in multiplier methods, the c_j^k's are independent of j and are bounded from below, that is,

c_j^k = ω^k  ∀j, k,   (1.5a)

where {ω^k} is some sequence of positive scalars satisfying

ω^k ≥ ω̄  ∀k,   (1.5b)

with ω̄ a fixed positive scalar. Note that with this rule, we can still provide for
different penalization of different constraints, by multiplying the constraints with
different scaling constants at the start of the computation.
In the second rule, the penalty parameters depend on the current values of the multipliers, becoming larger as these multipliers become smaller; for inactive constraints for which the associated multipliers tend to zero, the corresponding penalty parameters tend to infinity. In particular, each c_j^k is set inversely proportional to μ_j^k, that is,

c_j^k = c/μ_j^k  ∀j,   (1.6)
where c is a fixed positive constant. The second rule is interesting because for linear
programs, it leads to a superlinear rate of convergence, even though the penalty
parameters corresponding to active constraints with positive multipliers remain
bounded.
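As an illustration of the second rule, the sketch below (our own toy setup, not from the paper) recomputes c_j^k = c/μ_j^k from the current multipliers before each inner minimization:

```python
import numpy as np
from scipy.optimize import minimize

def exp_multiplier_method_rule2(f, gs, x0, mu0, c=1.0, iters=20):
    """Sketch of (1.3)-(1.4) under the second penalty rule (1.6):
    c_j^k = c / mu_j^k, so penalties grow as multipliers vanish."""
    x, mu = np.asarray(x0, float), np.asarray(mu0, float)
    for _ in range(iters):
        cj = c / mu                                            # rule (1.6)
        def aug(x):
            return f(x) + sum(m / ck * (np.exp(ck * g(x)) - 1.0)
                              for m, ck, g in zip(mu, cj, gs))
        x = minimize(aug, x, method="BFGS").x                  # inner step (1.3)
        mu = mu * np.exp(cj * np.array([g(x) for g in gs]))    # update (1.4)
    return x, mu

# Toy problem: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0;
# optimum x* = (0.5, 0.5) with multiplier mu* = 1.
x, mu = exp_multiplier_method_rule2(lambda x: x[0]**2 + x[1]**2,
                                    [lambda x: 1.0 - x[0] - x[1]],
                                    x0=[0.0, 0.0], mu0=[0.5])
```

Note that for constraints whose multipliers stay bounded away from zero (as here), the penalty parameters under rule (1.6) remain bounded.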
The principal motivation for the exponential method of multipliers is that, in contrast with the usual quadratic Augmented Lagrangian function for inequality constraints [26], the minimized function in (1.3) is twice differentiable if the functions f and g_j are. As a result, Newton-type methods can be used for the corresponding unconstrained minimization more effectively, and with guaranteed superlinear convergence. This is not just a theoretical advantage; in the experience of the second author, serious difficulties arise with Newton's method when the usual quadratic Augmented Lagrangian method is used to solve linear programs [4]. By contrast, the exponential multiplier method has been used to solve, quickly and consistently, very large linear programs arising in production scheduling of power systems [1, 16]; simplex methods as well as the more recent interior point methods are unsuitable for the solution of these problems.
Some aspects of the convergence analysis of the exponential multiplier method have proved surprisingly difficult, even though the method has been known to be reliable in practice [1]. For nonconvex problems under second order sufficiency conditions, convergence can be analyzed using fairly standard techniques; see [22]. However, for convex problems, the sharpest result available so far, due to Kort and Bertsekas, and given in [5, p. 336], assumes (in addition to Assumption A) a mild but fairly complicated and hard-to-verify assumption, and asserts that when the penalty parameters c_j^k are selected according to the first rule (1.5), all cluster points of {μ^k} are optimal solutions of an associated dual problem. One of the contributions of the present paper is to show, using an unusual proof technique, that the entire sequence {μ^k} converges to an optimal solution of the dual problem, without assuming the complex assumption of [5]. The corresponding sequence {x^k} is shown to approach optimality in an ergodic sense. As an indication of the difficulty of the analysis, we note that we have been unable to show a corresponding result when c_j^k is selected according to the second rule (1.6), even though the method in practice seems equally reliable with the rule (1.5) or (1.6).

A second contribution of the present paper is the analysis of the convergence
rate of the exponential method of multipliers as applied to linear programs. The
usual quadratic Augmented Lagrangian method converges in a finite number of
iterations for linear programs, as shown independently by Poljak and Tretjakov
[23], and Bertsekas [3] (see also [5, Section 5.4]). This is not true for the exponential
method of multipliers, but we show that the rate of convergence is linear for the
penalty parameter selection rule (1.5a), and quadratic for the rule (1.6).
It has been shown by Rockafellar [27] that when the quadratic Augmented Lagrangian method is dualized using the Fenchel duality theorem, one obtains the proximal minimization algorithm of Martinet [21], which is a special case of the proximal point algorithm of Rockafellar [28]. By similarly dualizing the exponential method of multipliers one obtains a method, called the entropy minimization algorithm, which involves a logarithmic/entropy "proximal" term; see Section 2. The entropy minimization algorithm is mathematically equivalent to the exponential method of multipliers, so it is covered by our convergence results. This equivalence is also used in a substantial way in our analysis, as in several past works that have studied nonquadratic versions of Augmented Lagrangian, proximal minimization, and proximal point algorithms [5, 12, 15, 18, 19].
Several recent works have also drawn attention to nonquadratic proximal point algorithms, and to the entropy minimization algorithm in particular. Censor and Zenios [7] have proposed a broad class of algorithms generalizing the proximal minimization algorithm by using Bregman functions. Eckstein [10] has generalized the proximal point algorithm in an analogous manner; see also [13].
None of these works provides a convergence or rate of convergence result for the exponential method of multipliers or its equivalent entropy minimization algorithm, although some of the analysis of [7] and [10] was helpful to us (see Section 3).¹

¹ While this paper was under review, convergence results for the dual sequence {μ^k}, which are similar to ours, have been obtained by Censor and Zenios in a revision of their paper [7], and by Chen and Teboulle [8], using different methods of analysis. These works have not considered rate of convergence issues or the convergence of the primal sequence {x^k}.

Regarding notation, all our vectors are column vectors, and superscript "T" denotes transposition. For a function h : R^n → R, we denote by ∇h(x) and ∂h(x) the gradient and the subdifferential of h at the vector x, respectively. For any set S and any positive integer m, we denote by S^m the m-fold Cartesian product of S with itself.

2. The entropy minimization algorithm

In this section we focus on the dual interpretation of the exponential multiplier method (1.3)-(1.4), as worked out in [5, pp. 315-327]. Let d : [0, ∞)^m → [−∞, ∞) be

the dual functional associated with (P) given by

d(μ) = min_{x ∈ R^n} { f(x) + Σ_{j=1}^m μ_j g_j(x) }.   (2.1)
The function d is closed, proper, and concave under Assumption A, and is the cost function of the dual problem of (P), given by

(D)   maximize   d(μ)
      subject to μ ≥ 0.
The weak duality theorem asserts that the value d(μ) of any dual feasible vector μ is less than or equal to the cost f(x) of any primal feasible vector x. Assumption A implies that there is no duality gap, that is, the optimal value of (D) is equal to f*, the optimal cost of (P); furthermore, there exists a dual optimal solution (see [25, Theorem 28.2]).
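To make the no-duality-gap statement concrete, the snippet below (our own toy problem and helper names, not from the paper) evaluates the dual functional (2.1) numerically and maximizes it over μ ≥ 0:

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Toy problem: minimize f(x) = x1^2 + x2^2 subject to g(x) = 1 - x1 - x2 <= 0.
# Primal optimum: x* = (0.5, 0.5), f* = 0.5.

def d(mu):
    # Dual functional (2.1): d(mu) = min_x { f(x) + mu * g(x) }.
    obj = lambda x: x[0]**2 + x[1]**2 + mu * (1.0 - x[0] - x[1])
    return minimize(obj, [0.0, 0.0]).fun

# Solve (D): maximize d(mu) over mu >= 0.  For this problem d(mu) = mu - mu^2/2
# in closed form, so the dual optimum is mu* = 1 with d(mu*) = 0.5 = f*.
res = minimize_scalar(lambda mu: -d(mu), bounds=(0.0, 5.0), method="bounded")
dual_mu, dual_val = res.x, -res.fun
```

The computed dual optimal value matches the primal optimal cost, confirming that there is no duality gap on this example.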
The exponential method of multipliers (1.3) and (1.4) may be viewed alternatively as the following algorithm for solving the dual problem (D):

μ^{k+1} = argmax_{μ ≥ 0} { d(μ) − Σ_{j=1}^m (μ_j^k / c_j^k) ψ*(μ_j / μ_j^k) },   (2.2)

where ψ* denotes the conjugate function of ψ, which is the entropy function

ψ*(s) = s ln(s) − s + 1.   (2.3)

It can be shown that the maximum in (2.2) is uniquely attained, by using the strict convexity and differentiability of ψ*, and the fact that lim_{s↓0} ∇ψ*(s) = −∞.
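The entropy form (2.3) can be checked directly from the definition of the convex conjugate, using ψ(t) = e^t − 1 from (1.2); the short derivation below is ours, written out for the reader:

```latex
\psi^*(s) = \sup_{t \in \mathbb{R}} \{\, st - (e^t - 1) \,\}.
```

For s > 0 the supremum is attained where s − e^t = 0, that is, at t = ln(s), which gives ψ*(s) = s ln(s) − (s − 1) = s ln(s) − s + 1; moreover ∇ψ*(s) = ln(s), so ∇ψ*(s) → −∞ as s ↓ 0, which is the limit fact invoked for the unique attainment of the maximum in (2.2).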
One way to show the equivalence of the two methods is to use the Fenchel duality theorem. For a direct derivation, notice that, by definition, x^k satisfies the Kuhn-Tucker optimality conditions for the minimization in (1.3), so

0 ∈ ∂f(x^k) + Σ_{j=1}^m μ_j^k ∇ψ(c_j^k g_j(x^k)) ∂g_j(x^k).

(This equation can be justified by using Assumption A; see the subgradient calculus developed in [25, Section 23].) Then, from the multiplier update formula (1.4), we obtain

0 ∈ ∂f(x^k) + Σ_{j=1}^m μ_j^{k+1} ∂g_j(x^k),

implying that x^k attains the minimum in the dual function definition (2.1), with μ set to μ^{k+1}. Hence,

d(μ^{k+1}) = f(x^k) + Σ_{j=1}^m μ_j^{k+1} g_j(x^k).   (2.4)
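Equation (2.4) is easy to check numerically; the snippet below (our own toy problem, not from the paper) runs one exponential-multiplier step and compares both sides of (2.4):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: f(x) = x1^2 + x2^2, one constraint g(x) = 1 - x1 - x2 <= 0.
f = lambda x: x[0]**2 + x[1]**2
g = lambda x: 1.0 - x[0] - x[1]
mu, c = 0.5, 1.0

# One step of (1.3)-(1.4).
aug = lambda x: f(x) + mu / c * (np.exp(c * g(x)) - 1.0)
xk = minimize(aug, [0.0, 0.0], method="BFGS").x
mu_next = mu * np.exp(c * g(xk))

# d(mu_next) = min_x { f(x) + mu_next * g(x) }, which by (2.4) should equal
# f(xk) + mu_next * g(xk), i.e. xk attains the minimum in (2.1).
lhs = minimize(lambda x: f(x) + mu_next * g(x), [0.0, 0.0]).fun
rhs = f(xk) + mu_next * g(xk)
```

Up to solver tolerance, the two sides agree, reflecting the fact that x^k attains the minimum defining d(μ^{k+1}).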
