Optimal Scaling of a Gradient Method for Distributed Resource Allocation
Lin Xiao
Stephen Boyd
Revised February 15, 2005
Abstract

We consider a class of weighted gradient methods for distributed resource allocation over a network. Each node of the network is associated with a local variable and a convex cost function; the sum of the variables (resources) across the network is fixed. Starting with a feasible allocation, each node updates its local variable in proportion to the differences between the marginal costs of itself and its neighbors. We focus on how to choose the proportional weights on the edges (scaling factors for the gradient method) to make this distributed algorithm converge, and how to make the convergence as fast as possible.

We give sufficient conditions on the edge weights for the algorithm to converge monotonically to the optimal solution; these conditions have the form of a linear matrix inequality. We give some simple, explicit methods to choose the weights that satisfy these sufficient conditions. We also derive a guaranteed convergence rate for the algorithm, and find the weights that minimize this rate by solving a semidefinite program. Finally, we extend the main results to problems with general equality constraints, and problems with block separable objective functions.
Key words: distributed optimization, resource allocation, weighted gradient method,
convergence rate, semidefinite programming.
To appear in Journal of Optimization Theory and Applications, vol. 129, no. 3, 2006.
Center for the Mathematics of Information, Mail Code 136-93, California Institute of Technology,
Pasadena, CA 91125-9300. Email: lxiao@caltech.edu.
Department of Electrical Engineering, Stanford University, Stanford, CA 94305-9510. Email:
boyd@stanford.edu.

1 Introduction
We consider an optimal resource allocation problem over a network of autonomous agents. The network is modeled as a directed graph (V, E) with node set V = {1, . . . , n} and edge set E ⊆ V × V. Each edge (i, j) is an ordered pair of distinct nodes. We define N_i, the set of (oriented) neighbors of node i, as N_i = {j | (i, j) ∈ E} (in other words, j ∈ N_i if there is an edge from node i to node j).
With node i we associate a variable x_i ∈ R and a corresponding convex cost function f_i : R → R. We consider the following optimization problem:

    minimize    Σ_{i=1}^n f_i(x_i)
    subject to  Σ_{i=1}^n x_i = c,        (1)
where c ∈ R is a given constant. We can think of x_i as the amount of some resource located at node i, and interpret −f_i as the local (concave) utility function. The problem (1) is then to find an allocation of the resource that maximizes the total utility −Σ_{i=1}^n f_i(x_i). In this paper, we are interested in distributed algorithms for solving this problem, where each node is only allowed to communicate with its neighbors and conduct local computation. Thus the local information structure imposed by the graph should be considered as part of the problem formulation. This simple model for distributed resource allocation and its variations have many applications in economic systems, e.g., [AH60, Hea69], and distributed computer systems [KS89].
We assume that the functions f_i are convex and twice continuously differentiable, with second derivatives that are bounded below and above:

    l_i ≤ f_i''(x_i) ≤ u_i,   ∀ x_i ∈ R,   i = 1, . . . , n,        (2)
where l_i > 0 and u_i are known (so the functions are strictly convex). Let x = (x_1, . . . , x_n) ∈ R^n denote the vector of variables and f(x) = Σ_{i=1}^n f_i(x_i) denote the objective function. We use f* to denote the optimal value of this problem, i.e., f* = inf{f(x) | 1^T x = c}, where 1 denotes the vector with all components one. Under the above assumption, the convex optimization problem (1) has a unique optimal solution x*. Let ∇f(x) = (f_1'(x_1), . . . , f_n'(x_n)) denote the gradient of f at x. The optimality conditions for this problem are

    1^T x* = c,   ∇f(x*) = p* 1,        (3)

where p* is the (unique) optimal Lagrange multiplier.
In a centralized setup (i.e., all functions and their derivatives can be evaluated at a central agent), many methods can be used to solve the problem (1) or, equivalently, the optimality conditions (3). If the functions f_i are all quadratic, the optimality conditions (3) are a set of linear equations in x* and p*, and can be solved directly. In the more general case, the problem can be solved by iterative methods, e.g., the projected gradient method, Newton's method, quasi-Newton methods (e.g., the BFGS method), and many others. Detailed accounts of these algorithms (and others) can be found in, e.g., [Ber99] and [BV03].
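As a concrete illustration of the quadratic case (our example, not from the paper): if f_i(x_i) = (a_i/2)(x_i − b_i)^2 with a_i > 0, then f_i'(x_i) = a_i(x_i − b_i), and the conditions (3) read a_i(x_i* − b_i) = p* for all i, together with 1^T x* = c. Solving these linear equations gives

    p* = (c − Σ_{i=1}^n b_i) / (Σ_{i=1}^n 1/a_i),   x_i* = b_i + p*/a_i.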

The design of decentralized mechanisms for resource allocation has a long history in
economics [Hur73], and there are two main classes of mechanisms: price-directed [AH60]
and resource-directed [Hea69]. However, most of these mechanisms are not fully distributed, because they require either a central price coordinator or a central resource dispatcher, and so cannot be applied to the problem we consider. In this paper we focus on a class of center-free algorithms first proposed in [HSS80].
1.1 The center-free algorithm for resource allocation
Assume that we have an initial allocation of the resource x(0) that satisfies 1^T x(0) = c. The center-free algorithm for solving problem (1) has the following iterative form:

    x_i(t+1) = x_i(t) − W_ii f_i'(x_i(t)) − Σ_{j∈N_i} W_ij f_j'(x_j(t)),   i = 1, . . . , n,        (4)

for t = 0, 1, . . .. In other words, at each iteration, each node computes the derivative of its local function, queries the derivative values from its neighbors, and then updates its local variable by a weighted sum of these derivative values. Here W_ii is the self-weight at node i, and W_ij (j ∈ N_i) is the weight associated with the edge (i, j) ∈ E. Setting W_ij = 0 for j ∉ N_i, this algorithm can be written in vector form as

    x(t+1) = x(t) − W ∇f(x(t)),        (5)

where W ∈ R^{n×n} is the weight matrix. Thus the center-free algorithm can be thought of as a weighted gradient descent method, in which the weight matrix W has a sparsity constraint given by the graph:

    W ∈ S = {Z ∈ R^{n×n} | Z_ij = 0 if i ≠ j and (i, j) ∉ E}.        (6)
Throughout this paper we focus on the following question: How should we choose the weight matrix W?
We first consider two basic requirements on W. First, we require that all iterates x(t) of the algorithm be feasible, i.e., satisfy 1^T x(t) = c for all t. With the assumption that x(0) is feasible, this requirement is met provided the weight matrix satisfies

    1^T W = 0,        (7)

since we then have

    1^T x(t+1) = 1^T x(t) − 1^T W ∇f(x(t)) = 1^T x(t).

We will also require, naturally, that the optimal point x* be a fixed point of the algorithm (5), i.e.,

    x* = x* − W ∇f(x*) = x* − p* W 1.

This will hold in the general case (with p* ≠ 0) provided

    W 1 = 0.        (8)

The requirements (7) and (8) show that the vector 1 must be both a left and a right eigenvector of W, associated with the eigenvalue zero. One special case of interest is when the weight matrix W is symmetric. In this case, of course, the requirements (7) and (8) are the same, and simply state that 1 is in the nullspace of W.
Assuming the weights satisfy (8), we have W_ii = −Σ_{j∈N_i} W_ij, which can be substituted into equation (4) to get

    x_i(t+1) = x_i(t) − Σ_{j∈N_i} W_ij (f_j'(x_j(t)) − f_i'(x_i(t))),   i = 1, . . . , n.        (9)

Thus the change in each local variable at each step is a weighted sum of the differences between its own derivative value and those of its neighbors. Equation (9) has a simple interpretation: at each iteration, each connected pair of nodes shifts resources from the node with the higher marginal cost to the one with the lower marginal cost, in proportion to the difference in marginal costs. The weight W_ij gives the proportionality constant on the edge (i, j) ∈ E. (This interpretation suggests that the weights on the edges should be negative, but we will see examples where a few positive edge weights actually enhance the convergence rate.)
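To make the update concrete, here is a minimal NumPy sketch of iteration (5)/(9). The quadratic costs, the ring graph, and the constant edge weight are our illustrative assumptions, not choices made in the paper.

    import numpy as np

    # Illustrative costs f_i(x) = 0.5*a_i*(x - b_i)**2, so f_i'(x) = a_i*(x - b_i)
    # and the curvature bounds (2) hold with l_i = u_i = a_i.
    n, c = 5, 10.0
    rng = np.random.default_rng(0)
    a = rng.uniform(1.0, 2.0, n)
    b = rng.uniform(-1.0, 1.0, n)
    grad = lambda x: a * (x - b)

    # Ring graph with a constant (negative) weight on every edge; the diagonal
    # entries make W @ 1 = 0, and W is symmetric, so 1^T W = 0 as well.
    alpha = -0.2
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = alpha
        W[i, i] = -2 * alpha

    x = np.full(n, c / n)              # feasible start: 1^T x(0) = c
    for t in range(200):
        x = x - W @ grad(x)            # center-free update (5); keeps 1^T x = c

    print(x.sum())                     # still equals c
    print(grad(x))                     # marginal costs nearly equal, cf. (3)

With this choice, Σ_{j∈N_i}|W_ij| = 0.4 < 1/2 ≤ 1/u_max, so the sufficient conditions (10) of §1.2 below hold and the iterates converge to x*.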
1.2 Previous work
Distributed resource allocation algorithms of the form (9) were first proposed and studied by Ho, Servi and Suri in [HSS80]. They considered an undirected graph with symmetric weights on the edges, and called algorithms of this form center-free algorithms. ('Center-free' refers to the absence of a central coordinating entity.) In the notation of this paper, they assumed W = W^T and W1 = 0, and derived the following additional conditions on W that are sufficient for the algorithm (9) to converge to the optimal solution x*:
    (a) W is irreducible;
    (b) W_ij ≤ 0 for all (i, j) ∈ E;
    (c) Σ_{j∈N_i} |W_ij| < 1/u_max,   i = 1, . . . , n,        (10)

where u_max is an upper bound on the second derivatives of the functions f_i, i.e., u_max ≥ max_i u_i. The first condition, that W is irreducible, is equivalent to the statement that the subgraph consisting of all the nodes and the edges with nonzero weights is connected. We will show that these conditions are implied by those established in this paper.
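As a quick sanity check (our sketch, assuming SciPy is available; this is not code from the paper), the three conditions in (10) can be verified numerically for a given symmetric W:

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def satisfies_hss_conditions(W, u_max, tol=1e-12):
        """Check conditions (10) for a symmetric W with W @ 1 = 0."""
        off = W.copy()
        np.fill_diagonal(off, 0.0)
        # (a) irreducible: the nonzero-weight edges form a connected subgraph
        ncomp, _ = connected_components(csr_matrix(np.abs(off) > tol),
                                        directed=False)
        # (b) all off-diagonal (edge) weights nonpositive
        # (c) neighbor weight sums strictly below 1/u_max
        return (ncomp == 1 and np.all(off <= tol)
                and np.all(np.abs(off).sum(axis=1) < 1.0 / u_max))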
It should be noted that the problem considered in [HSS80] has nonnegativity constraints on the variables: x_i ≥ 0, i = 1, . . . , n (with c > 0). They gave a separate initialization procedure, which identifies and eliminates some nodes that will have zero value at optimality (not necessarily all such nodes). As a result of this initialization procedure and some additional conditions on the initial point x(0), all subsequent iterates of the center-free algorithm (9) automatically satisfy the nonnegativity constraints. In [KS89], second derivatives of the functions f_i are used to modify the algorithm (9) (with a constant weight on all edges) to obtain faster convergence. An interesting analogy between various iterative algorithms for solving problem (1) and the dynamics of several electrical networks can be found in [Ser80].
Many interesting similarities exist between the resource allocation problem and network
flow problems with convex separable cost (see, e.g., [Roc84, BT89, Ber98] and references
therein). In particular, by ignoring the local information structure, problem (1) can be
formulated as a simple network flow problem with two nodes and n links connecting them.
Thus many distributed algorithms for network flow problems such as those in [TB86, Baz96]
can be used; see also [LT94] for a convergence rate analysis of such an algorithm. However,
with the imposed local information structure on a graph, the resource allocation problem
has a distinct nature, and the above mentioned algorithms cannot be applied directly. The
center-free algorithm considered in this paper belongs to a more general class of gradient-like
algorithms studied in [TBA86].
In this paper, we give sufficient conditions weaker than (10) for the center-free algorithm
to converge, and we optimize the edge weights to obtain fast convergence. Our method is closely
related to the approach in [BDX03], where the problem of finding the fastest mixing Markov
chain on a graph is considered. In [XB03], the same approach was used to find fast linear
iterations for a distributed average consensus problem.
1.3 Outline
In §2, we give sufficient conditions on the weight matrix W under which the algorithm (5)
converges to the optimal solution monotonically. These conditions involve a linear matrix
inequality (LMI) in the weight matrix. Moreover, we quantify the convergence by deriving a
guaranteed convergence rate for the algorithm. In §3, we give some simple, explicit choices
for the weight matrix W that satisfy the convergence conditions. In §4, we propose to
minimize the guaranteed convergence rate obtained in §2 in order to get fast convergence
of the algorithm (5). We observe that the optimal weights (in the sense of minimizing the
guaranteed convergence rate) can be found by solving a semidefinite program (SDP). In §5,
we show some numerical examples that demonstrate the effectiveness of the proposed weight
selection methods. Finally, in §6, we extend the main results to problems with general
equality constraints, and problems with block separable objective functions. We give our
conclusions and some final remarks in §7.
2 Convergence conditions
In this section, we state and prove the main theorem. We use the following notation: L and U denote the diagonal matrices in R^{n×n} whose diagonal entries are the lower bounds l_i and the upper bounds u_i given in (2). Note that L and U are positive definite. For a symmetric matrix Z, we list its eigenvalues (all real) in nonincreasing order, as

    λ_1(Z) ≥ λ_2(Z) ≥ · · · ≥ λ_n(Z),

where λ_i(Z) denotes the ith largest eigenvalue of Z.
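A practical note for reproducing the eigenvalue computations (our aside, not from the paper): NumPy returns the eigenvalues of a symmetric matrix in ascending order, so the paper's nonincreasing convention requires a reversal:

    import numpy as np

    Z = np.array([[2.0, -1.0], [-1.0, 2.0]])
    w = np.linalg.eigvalsh(Z)          # ascending order
    lam = w[::-1]                      # lam[i-1] == lambda_i(Z) in the paper's notation
    print(lam)                         # [3. 1.]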

References

[MRR+53] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21(6):1087-1092, 1953.

[BV03] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[HJ85] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[Ber99] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.
Frequently Asked Questions (17)
Q1. What have the authors contributed in "Optimal scaling of a gradient method for distributed resource allocation"?

The authors consider a class of weighted gradient methods for distributed resource allocation over a network. 

The center-free algorithm considered in this paper belongs to a more general class of gradient-like algorithms studied in [TBA86]. 

The simplest and most commonly used method is to put a constant weight α on all the edges of the graph, and obtain the self-weights W_ii from the equality constraint W1 = 0:

    W_ij = { α        if (i, j) ∈ E
           { −d_i α   if i = j
           { 0        otherwise,

where d_i = |N_i| is the degree of node i.
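A minimal sketch of this construction (the adjacency-matrix encoding and the value of α are our illustrative assumptions):

    import numpy as np

    def constant_weight_matrix(A, alpha):
        """W with weight alpha on every edge and W_ii = -d_i * alpha, so W @ 1 = 0.
        A is a symmetric 0/1 adjacency matrix."""
        d = A.sum(axis=1)              # node degrees d_i = |N_i|
        return alpha * A - alpha * np.diag(d)

    # Example: a path graph on 4 nodes with a small negative edge weight.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    W = constant_weight_matrix(A, alpha=-0.1)
    print(W @ np.ones(4))              # zero vector, as required by W1 = 0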

When the lower and upper bounds L and U are the only information available, it is reasonable to choose the weight matrix to minimize the guaranteed convergence rate η established in theorem 1. 

For symmetric weight matrices, each edge of the graph is bidirectional and has the same weight in both directions, so each can be considered as an undirected edge with a single weight. 

In the special case of choosing the constant edge weight to minimize the guaranteed rate, the authors show that with appropriate scaling of the objective functions, the solution can be directly given in terms of the eigenvalues of the Laplacian matrix of the graph. 

The authors observe that the optimal weights (in the sense of minimizing the guaranteed convergence rate) can be found by solving a semidefinite program (SDP). 

They considered an undirected graph with symmetric weights on the edges, and called algorithms of this form center-free algorithms. 

When the weight matrix W is symmetric, the convergence conditions reduce to

    W = W^T,  W1 = 0,        (20)
    2W + (1/n)11^T ≻ 0,        (21)
    2U^{−1} − W ≻ 0.        (22)

To see this, the authors first rewrite the LMI (14) for symmetric W:

    [ 2W + (1/n)11^T    W      ]
    [ W                 U^{−1} ]  ≻ 0.

While these weights evidently yield faster convergence of the method (compared to, say, a maximum-degree or Metropolis choice of weights), finding them requires real computation, i.e., the solution of an SDP.

Unless all the nodes have the same value of d_i u_i, the authors can always set

    W_ij = −min{ 1/(d_i u_i), 1/(d_j u_j) },  (i, j) ∈ E.        (26)

The authors call these weights the Metropolis weights, because the main idea of this method relates to the Metropolis algorithm for choosing transition probabilities on a graph to make the associated Markov chain mix rapidly ([MRR+53]; see also, e.g., [DSC98, BDX03]).
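A sketch of this rule (our encoding of the graph and bounds, not code from the paper):

    import numpy as np

    def metropolis_weights(A, u):
        """Metropolis weights (26): W_ij = -min(1/(d_i u_i), 1/(d_j u_j)) on edges,
        with W_ii = -sum_{j != i} W_ij so that W @ 1 = 0. A is a symmetric 0/1
        adjacency matrix; u holds the per-node curvature upper bounds u_i."""
        n = A.shape[0]
        d = A.sum(axis=1)
        W = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if A[i, j]:
                    W[i, j] = -min(1.0 / (d[i] * u[i]), 1.0 / (d[j] * u[j]))
            W[i, i] = -W[i].sum()      # diagonal from W1 = 0
        return W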

The derivation of equation (19) holds without assuming the convergence condition (11), so long as the authors interpret λ_{n−1}(V) as the smallest eigenvalue of V excluding the zero eigenvalue associated with the eigenvector 1.

(The nonsymmetric weight matrix W found by solving the SDP (31) has two positive off-diagonal entries; see the comment in the paragraph after equation (9).) 

This is equivalent to maximizing the second smallest eigenvalue of the matrix V, i.e.,

    maximize    λ_{n−1}( L^{1/2} (W + W^T − W^T U W) L^{1/2} )
    subject to  W ∈ S,  1^T W = 0,  W1 = 0,        (27)

where the optimization variable is W.

From the condition (23), it is straightforward to obtain the following ranges for the edge weights that guarantee the convergence of the algorithm:

    −min{ 1/(d_i u_i), 1/(d_j u_j) } < W_ij < 0,  (i, j) ∈ E.

The authors discuss how to exploit sparsity in interior-point methods and a simple subgradient method for solving a similar class of SDPs in [XB03]. 

Using Schur complements, the quadratic matrix inequality (28) is equivalent to the LMI

    [ W + W^T − s (L^{−1} − (1/(1^T L^{−1} 1)) L^{−1}11^T L^{−1})   W^T    ]
    [ W                                                             U^{−1} ]  ⪰ 0.        (29)

Therefore the eigenvalue optimization problem (27) is equivalent to the SDP

    maximize    s
    subject to  W ∈ S,  1^T W = 0,  W1 = 0,
                [ W + W^T − s (L^{−1} − (1/(1^T L^{−1} 1)) L^{−1}11^T L^{−1})   W^T    ]
                [ W                                                             U^{−1} ]  ⪰ 0,        (30)

with optimization variables s and W.
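For illustration only, here is how the SDP (30) might be set up with CVXPY; the graph, the bounds l_i and u_i, and the solver choice are our assumptions, and this is a sketch rather than the authors' code:

    import cvxpy as cp
    import numpy as np

    # Illustrative data: a 4-node path graph with curvature bounds l_i, u_i.
    n = 4
    edges = [(0, 1), (1, 2), (2, 3)]
    l = np.array([1.0, 1.5, 1.0, 2.0])
    u = np.array([2.0, 2.5, 2.0, 3.0])
    Linv, Uinv = np.diag(1.0 / l), np.diag(1.0 / u)
    one = np.ones((n, 1))
    # The matrix multiplying s in (30): L^{-1} - L^{-1} 1 1^T L^{-1} / (1^T L^{-1} 1).
    M = Linv - (Linv @ one @ one.T @ Linv) / (one.T @ Linv @ one).item()

    W = cp.Variable((n, n))
    s = cp.Variable()
    allowed = set(edges) | {(j, i) for i, j in edges} | {(i, i) for i in range(n)}
    cons = [W[i, j] == 0 for i in range(n) for j in range(n)
            if (i, j) not in allowed]              # sparsity constraint W in S
    cons += [cp.sum(W, axis=0) == 0,               # 1^T W = 0
             cp.sum(W, axis=1) == 0]               # W 1 = 0
    Z = cp.bmat([[W + W.T - s * M, W.T],
                 [W, Uinv]])
    cons += [(Z + Z.T) / 2 >> 0]                   # the LMI in (30); Z is symmetric
    prob = cp.Problem(cp.Maximize(s), cons)
    prob.solve()                                   # needs an SDP-capable solver, e.g. SCS
    print(s.value)
    print(np.round(W.value, 4))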