
Mathematical Programming 60 (1993) 1-19
North-Holland
On the convergence of the exponential
multiplier method for convex programming
Paul Tseng
Department of Mathematics, University of Washington, Seattle, WA, USA
Dimitri P. Bertsekas
Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, MA, USA
Received 8 October 1990
Revised manuscript received 21 January 1992
In this paper, we analyze the exponential method of multipliers for convex constrained minimization
problems, which operates like the usual Augmented Lagrangian method, except that it uses an exponential
penalty function in place of the usual quadratic. We also analyze a dual counterpart, the entropy
minimization algorithm, which operates like the proximal minimization algorithm, except that it uses a
logarithmic/entropy "proximal" term in place of a quadratic. We strengthen substantially the available
convergence results for these methods, and we derive the convergence rate of these methods when applied
to linear programs.
Key words: Convex programming, linear programming, multiplier method, exponential penalty, Augmented Lagrangian.
1. Introduction
Let f : R^n → (−∞, ∞] and g_j : R^n → (−∞, ∞], j = 1, ..., m, be closed, proper, convex functions in R^n, the n-dimensional Euclidean space. Consider the following convex program associated with f and the g_j's:

(P)   minimize   f(x)
      subject to g_j(x) ≤ 0,  j = 1, ..., m.   (1.1)
We make the following standing assumption about (P):

Assumption A. (a) The optimal solution set for (P) is nonempty and bounded.
(b) The effective domain of f, that is, the set {x | f(x) < ∞}, is contained in the effective domain {x | g_j(x) < ∞} of each g_j. Furthermore, the relative interior of the effective domain of f is contained in the relative interior of the effective domain of each g_j.
Correspondence to: Prof. Dimitri P. Bertsekas, Laboratory for Information and Decision Systems, M.I.T., Cambridge, MA 02139, USA.
Research supported by the National Science Foundation under Grant DDM-8903385, and the Army Research Office under Grant DAAL03-86-K-0171.

(c) There exists a vector x̄ in the relative interior of the effective domain of f, which satisfies g_j(x̄) < 0 for all non-affine g_j.
The boundedness assumption in part (a) of Assumption A will be needed to ensure that our method is well-defined. Part (b) of Assumption A is satisfied in particular if all the constraint functions are real-valued. Parts (b) and (c) of Assumption A are constraint qualification conditions, which are needed to guarantee the existence of a Kuhn-Tucker vector for the problem (see [25, p. 277]).
We now describe the exponential multiplier method proposed by Kort and Bertsekas [15] for solving problem (P) (see also [5, Section 5.1.2]). Let ψ : R → R be the exponential penalty function given by

ψ(t) = e^t − 1.   (1.2)
We associate a multiplier μ_j > 0 with the jth constraint. The method performs a sequence of unconstrained minimizations, and iterates on the multipliers at the end of each minimization. At the kth iteration (k ≥ 0) we are given positive μ_j^k, j = 1, ..., m (with the initial μ_j^0, j = 1, ..., m, chosen arbitrarily); we compute x^k as

x^k ∈ argmin_{x ∈ R^n} { f(x) + Σ_{j=1}^m (μ_j^k / c_j^k) ψ(c_j^k g_j(x)) },   (1.3)

where each c_j^k is a positive penalty parameter, and then we update the multipliers according to

μ_j^{k+1} = μ_j^k exp(c_j^k g_j(x^k)),  j = 1, ..., m.   (1.4)
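To make the iteration concrete, here is a minimal sketch of (1.3)-(1.4) in Python, assuming a generic smooth solver (SciPy's BFGS) for the inner minimization and the simplest choice of a common fixed penalty parameter c_j^k = c; the helper name and the toy problem below are our own illustration, not from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def exp_multiplier_method(f, gs, x0, mu0, c=1.0, iters=30):
    """Sketch of the exponential method of multipliers (1.3)-(1.4),
    with all penalty parameters c_j^k fixed at a common value c."""
    x, mu = np.asarray(x0, float), np.asarray(mu0, float)
    for _ in range(iters):
        def aug(x):
            # f(x) + sum_j (mu_j / c) * psi(c * g_j(x)),  psi(t) = e^t - 1
            return f(x) + sum(m / c * (np.exp(c * g(x)) - 1.0)
                              for m, g in zip(mu, gs))
        x = minimize(aug, x, method="BFGS").x                 # inner step (1.3)
        mu = mu * np.exp(c * np.array([g(x) for g in gs]))    # update (1.4)
    return x, mu

# Toy problem: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0.
# Optimum: x* = (0.5, 0.5) with multiplier mu* = 1.
x, mu = exp_multiplier_method(lambda x: x[0]**2 + x[1]**2,
                              [lambda x: 1.0 - x[0] - x[1]],
                              x0=[0.0, 0.0], mu0=[0.5])
```

On this example the iterates approach x* = (0.5, 0.5) and the multiplier sequence approaches μ* = 1.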
Notice that for a fixed μ_j > 0, as c_j → ∞, the "penalty" term (μ_j/c_j) ψ(c_j g_j(x)) tends to ∞ for all infeasible x (g_j(x) > 0) and to zero for all feasible x (g_j(x) ≤ 0). On the other hand, for a fixed c_j, as μ_j → 0 (which is expected to occur if the jth constraint is inactive at the optimum), the penalty term goes to zero for all x, feasible or infeasible. This is contrary to what happens in usual exterior penalty methods [11, 17], and for this reason, much of the standard analysis for exterior penalty and multiplier methods cannot be applied to the exponential method of multipliers.
It can be shown that the minimum in (1.3) is attained for all k (see [5, p. 337]). For a brief justification, note that if this minimum were not attained, then f and the functions g_j would share a direction of recession, in which case the optimal solution set of (P) would be unbounded (see [25, Section 8]), thus contradicting Assumption A.
We will consider two rules for choosing the penalty parameters c_j^k. In the first rule, which is common in multiplier methods, the c_j^k's are independent of j and are bounded from below, that is,

c_j^k = ω^k  ∀j, k,   (1.5a)

where {ω^k} is some sequence of positive scalars satisfying

ω^k ≥ ω̄  ∀k,   (1.5b)

with ω̄ a fixed positive scalar. Note that with this rule, we can still provide for
different penalization of different constraints, by multiplying the constraints with
different scaling constants at the start of the computation.
In the second rule, the penalty parameters depend on the current values of the multipliers, becoming larger as these multipliers become smaller; for inactive constraints for which the associated multipliers tend to zero, the corresponding penalty parameters tend to infinity. In particular, each c_j^k is set inversely proportional to μ_j^k, that is,

c_j^k = c/μ_j^k  ∀j,   (1.6)
where c is a fixed positive constant. The second rule is interesting because for linear
programs, it leads to a superlinear rate of convergence, even though the penalty
parameters corresponding to active constraints with positive multipliers remain
bounded.
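As an illustration of the second rule, the sketch below (our own toy setup, not from the paper) recomputes c_j^k = c/μ_j^k from the current multipliers before each inner minimization:

```python
import numpy as np
from scipy.optimize import minimize

def exp_multiplier_method_rule2(f, gs, x0, mu0, c=1.0, iters=20):
    """Sketch of (1.3)-(1.4) under the second penalty rule (1.6):
    c_j^k = c / mu_j^k, so penalties grow as multipliers vanish."""
    x, mu = np.asarray(x0, float), np.asarray(mu0, float)
    for _ in range(iters):
        cj = c / mu                                            # rule (1.6)
        def aug(x):
            return f(x) + sum(m / ck * (np.exp(ck * g(x)) - 1.0)
                              for m, ck, g in zip(mu, cj, gs))
        x = minimize(aug, x, method="BFGS").x                  # inner step (1.3)
        mu = mu * np.exp(cj * np.array([g(x) for g in gs]))    # update (1.4)
    return x, mu

# Toy problem: minimize x1^2 + x2^2 subject to 1 - x1 - x2 <= 0;
# optimum x* = (0.5, 0.5) with multiplier mu* = 1.
x, mu = exp_multiplier_method_rule2(lambda x: x[0]**2 + x[1]**2,
                                    [lambda x: 1.0 - x[0] - x[1]],
                                    x0=[0.0, 0.0], mu0=[0.5])
```

Note that for constraints whose multipliers stay bounded away from zero (as here), the penalty parameters under rule (1.6) remain bounded.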
The principal motivation for the exponential method of multipliers is that, in contrast with the usual quadratic Augmented Lagrangian function for inequality constraints [26], the minimized function in (1.3) is twice differentiable if the functions f and g_j are. As a result, Newton-type methods can be used for the corresponding unconstrained minimization more effectively, and with guaranteed superlinear convergence. This is not just a theoretical advantage; in the experience of the second author, serious difficulties arise with Newton's method when the usual quadratic Augmented Lagrangian method is used to solve linear programs [4]. By contrast, the exponential multiplier method has been used to solve, quickly and consistently, very large linear programs arising in production scheduling of power systems [1, 16]; simplex methods as well as the more recent interior point methods are unsuitable for the solution of these problems.
Some aspects of the convergence analysis of the exponential multiplier method have proved surprisingly difficult, even though the method has been known to be reliable in practice [1]. For nonconvex problems under second order sufficiency conditions, convergence can be analyzed using fairly standard techniques; see [22]. However, for convex problems, the sharpest result available so far, due to Kort and Bertsekas, and given in [5, p. 336], assumes (in addition to Assumption A) a mild but fairly complicated and hard-to-verify assumption, and asserts that when the penalty parameters c_j^k are selected according to the first rule (1.5), all cluster points of {μ^k} are optimal solutions of an associated dual problem. One of the contributions of the present paper is to show, using an unusual proof technique, that the entire sequence {μ^k} converges to an optimal solution of the dual problem, without assuming the complex assumption of [5]. The corresponding sequence {x^k} is shown to approach optimality in an ergodic sense. As an indication of the difficulty of the analysis, we note that we have been unable to show a corresponding result when c_j^k is selected according to the second rule (1.6), even though the method in practice seems equally reliable with the rule (1.5) or (1.6).

A second contribution of the present paper is the analysis of the convergence
rate of the exponential method of multipliers as applied to linear programs. The
usual quadratic Augmented Lagrangian method converges in a finite number of
iterations for linear programs, as shown independently by Poljak and Tretjakov
[23], and Bertsekas [3] (see also [5, Section 5.4]). This is not true for the exponential
method of multipliers, but we show that the rate of convergence is linear for the
penalty parameter selection rule (1.5a), and quadratic for the rule (1.6).
It has been shown by Rockafellar [27] that when the quadratic Augmented Lagrangian method is dualized using the Fenchel duality theorem, one obtains the proximal minimization algorithm of Martinet [21], which is a special case of the proximal point algorithm of Rockafellar [28]. By similarly dualizing the exponential method of multipliers one obtains a method, called the entropy minimization algorithm, which involves a logarithmic/entropy "proximal" term; see Section 2. The entropy minimization algorithm is mathematically equivalent to the exponential method of multipliers, so it is covered by our convergence results. This equivalence is also used in a substantial way in our analysis, as in several past works that have studied nonquadratic versions of Augmented Lagrangian, proximal minimization, and proximal point algorithms [5, 12, 15, 18, 19].
Several recent works have also drawn attention to nonquadratic proximal point algorithms, and to the entropy minimization algorithm in particular. Censor and Zenios [7] have proposed a broad class of algorithms generalizing the proximal minimization algorithm by using Bregman functions. Eckstein [10] has generalized the proximal point algorithm in an analogous manner; see also [13].
None of these works provides a convergence or rate of convergence result for the exponential method of multipliers or its equivalent entropy minimization algorithm, although some of the analysis of [7] and [10] was helpful to us (see Section 3).¹

¹ While this paper was under review, convergence results for the dual sequence {μ^k}, which are similar to ours, have been obtained by Censor and Zenios in a revision of their paper [7], and by Chen and Teboulle [8], using different methods of analysis. These works have not considered rate of convergence issues or the convergence of the primal sequence {x^k}.

Regarding notation, all our vectors are column vectors, and superscript "T" denotes transposition. For a function h : R^n → R, we denote by ∇h(x) and ∂h(x) the gradient and the subdifferential of h at the vector x, respectively. For any set S and any positive integer m, we denote by S^m the m-fold Cartesian product of S with itself.

2. The entropy minimization algorithm

In this section we focus on the dual interpretation of the exponential multiplier method (1.3)-(1.4), as worked out in [5, pp. 315-327]. Let d : [0, ∞)^m → [−∞, ∞) be

the dual functional associated with (P) given by

d(μ) = min_{x ∈ R^n} { f(x) + Σ_{j=1}^m μ_j g_j(x) }.   (2.1)
The function d is closed, proper, and concave under Assumption A, and is the cost function of the dual problem of (P), given by

(D)   maximize   d(μ)
      subject to μ ≥ 0.
The weak duality theorem asserts that the value d(μ) of any dual feasible vector μ is less than or equal to the cost f(x) of any primal feasible vector x. Assumption A implies that there is no duality gap, that is, the optimal value of (D) is equal to f*, the optimal cost of (P); furthermore, there exists a dual optimal solution (see [25, Theorem 28.2]).
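To make the no-duality-gap statement concrete, the snippet below (our own toy problem and helper names, not from the paper) evaluates the dual functional (2.1) numerically and maximizes it over μ ≥ 0:

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

# Toy problem: minimize f(x) = x1^2 + x2^2 subject to g(x) = 1 - x1 - x2 <= 0.
# Primal optimum: x* = (0.5, 0.5), f* = 0.5.

def d(mu):
    # Dual functional (2.1): d(mu) = min_x { f(x) + mu * g(x) }.
    obj = lambda x: x[0]**2 + x[1]**2 + mu * (1.0 - x[0] - x[1])
    return minimize(obj, [0.0, 0.0]).fun

# Solve (D): maximize d(mu) over mu >= 0.  For this problem d(mu) = mu - mu^2/2
# in closed form, so the dual optimum is mu* = 1 with d(mu*) = 0.5 = f*.
res = minimize_scalar(lambda mu: -d(mu), bounds=(0.0, 5.0), method="bounded")
dual_mu, dual_val = res.x, -res.fun
```

The computed dual optimal value matches the primal optimal cost, confirming that there is no duality gap on this example.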
The exponential method of multipliers (1.3) and (1.4) may be viewed alternatively as the following algorithm for solving the dual problem (D):

μ^{k+1} = argmax_{μ ≥ 0} { d(μ) − Σ_{j=1}^m (μ_j^k / c_j^k) ψ*(μ_j / μ_j^k) },   (2.2)

where ψ* denotes the conjugate function of ψ, which is the entropy function

ψ*(s) = s ln(s) − s + 1.   (2.3)

It can be shown that the maximum in (2.2) is uniquely attained, by using the strict convexity and differentiability of ψ*, and the fact that lim_{s↓0} ∇ψ*(s) = −∞.
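The entropy form (2.3) can be checked directly from the definition of the convex conjugate, using ψ(t) = e^t − 1 from (1.2); the short derivation below is ours, written out for the reader:

```latex
\psi^*(s) = \sup_{t \in \mathbb{R}} \{\, st - (e^t - 1) \,\}.
```

For s > 0 the supremum is attained where s − e^t = 0, that is, at t = ln(s), which gives ψ*(s) = s ln(s) − (s − 1) = s ln(s) − s + 1; moreover ∇ψ*(s) = ln(s), so ∇ψ*(s) → −∞ as s ↓ 0, which is the limit fact invoked for the unique attainment of the maximum in (2.2).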
One way to show the equivalence of the two methods is to use the Fenchel duality theorem. For a direct derivation, notice that, by definition, x^k satisfies the Kuhn-Tucker optimality conditions for the minimization in (1.3), so

0 ∈ ∂f(x^k) + Σ_{j=1}^m μ_j^k ∇ψ(c_j^k g_j(x^k)) ∂g_j(x^k).

(This equation can be justified by using Assumption A; see the subgradient calculus developed in [25, Section 23].) Then, from the multiplier update formula (1.4), we obtain

0 ∈ ∂f(x^k) + Σ_{j=1}^m μ_j^{k+1} ∂g_j(x^k),

implying that x^k attains the minimum in the dual function definition (2.1), with μ set to μ^{k+1}. Hence,

d(μ^{k+1}) = f(x^k) + Σ_{j=1}^m μ_j^{k+1} g_j(x^k).   (2.4)
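Equation (2.4) is easy to check numerically; the snippet below (our own toy problem, not from the paper) runs one exponential-multiplier step and compares both sides of (2.4):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: f(x) = x1^2 + x2^2, one constraint g(x) = 1 - x1 - x2 <= 0.
f = lambda x: x[0]**2 + x[1]**2
g = lambda x: 1.0 - x[0] - x[1]
mu, c = 0.5, 1.0

# One step of (1.3)-(1.4).
aug = lambda x: f(x) + mu / c * (np.exp(c * g(x)) - 1.0)
xk = minimize(aug, [0.0, 0.0], method="BFGS").x
mu_next = mu * np.exp(c * g(xk))

# d(mu_next) = min_x { f(x) + mu_next * g(x) }, which by (2.4) should equal
# f(xk) + mu_next * g(xk), i.e. xk attains the minimum in (2.1).
lhs = minimize(lambda x: f(x) + mu_next * g(x), [0.0, 0.0]).fun
rhs = f(xk) + mu_next * g(xk)
```

Up to solver tolerance, the two sides agree, reflecting the fact that x^k attains the minimum defining d(μ^{k+1}).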
