Optimal Scaling of a Gradient Method for Distributed Resource Allocation
Lin Xiao
Stephen Boyd
Revised February 15, 2005
Abstract

We consider a class of weighted gradient methods for distributed resource allocation over a network. Each node of the network is associated with a local variable and a convex cost function; the sum of the variables (resources) across the network is fixed. Starting with a feasible allocation, each node updates its local variable in proportion to the differences between the marginal costs of itself and its neighbors. We focus on how to choose the proportional weights on the edges (scaling factors for the gradient method) to make this distributed algorithm converge, and how to make the convergence as fast as possible.

We give sufficient conditions on the edge weights for the algorithm to converge monotonically to the optimal solution; these conditions have the form of a linear matrix inequality. We give some simple, explicit methods to choose the weights that satisfy these sufficient conditions. We also derive a guaranteed convergence rate for the algorithm, and find the weights that minimize this rate by solving a semidefinite program. Finally, we extend the main results to problems with general equality constraints, and problems with block separable objective functions.
Key words: distributed optimization, resource allocation, weighted gradient method,
convergence rate, semidefinite programming.
To appear in Journal of Optimization Theory and Applications, vol. 129, no. 3, 2006.
Center for the Mathematics of Information, Mail Code 136-93, California Institute of Technology,
Pasadena, CA 91125-9300. Email: lxiao@caltech.edu.
Department of Electrical Engineering, Stanford University, Stanford, CA 94305-9510. Email:
boyd@stanford.edu.

1 Introduction
We consider an optimal resource allocation problem over a network of autonomous agents. The network is modeled as a directed graph (V, E) with node set V = {1, . . . , n} and edge set E ⊆ V × V. Each edge (i, j) is an ordered pair of distinct nodes. We define N_i, the set of (oriented) neighbors of node i, as N_i = {j | (i, j) ∈ E} (in other words, j ∈ N_i if there is an edge from node i to node j).
With node i we associate a variable x_i ∈ R and a corresponding convex cost function f_i : R → R. We consider the following optimization problem:

    minimize    Σ_{i=1}^n f_i(x_i)
    subject to  Σ_{i=1}^n x_i = c,        (1)
where c ∈ R is a given constant. We can think of x_i as the amount of some resource located at node i, and interpret −f_i as the local (concave) utility function. The problem (1) is then to find an allocation of the resource that maximizes the total utility −Σ_{i=1}^n f_i(x_i). In this paper, we are interested in distributed algorithms for solving this problem, where each node is only allowed to communicate with its neighbors and conduct local computation. Thus the local information structure imposed by the graph should be considered as part of the problem formulation. This simple model for distributed resource allocation and its variations have many applications in economic systems, e.g., [AH60, Hea69], and distributed computer systems [KS89].
We assume that the functions f_i are convex and twice continuously differentiable, with second derivatives that are bounded below and above:

    l_i ≤ f_i''(x_i) ≤ u_i,   ∀ x_i ∈ R,   i = 1, . . . , n,        (2)
where l_i > 0 and u_i are known (so the functions are strictly convex). Let x = (x_1, . . . , x_n) ∈ R^n denote the vector of variables and f(x) = Σ_{i=1}^n f_i(x_i) denote the objective function. We use f* to denote the optimal value of this problem, i.e., f* = inf{f(x) | 1^T x = c}, where 1 denotes the vector with all components one. Under the above assumption, the convex optimization problem (1) has a unique optimal solution x*. Let ∇f(x) = (f_1'(x_1), . . . , f_n'(x_n)) denote the gradient of f at x. The optimality conditions for this problem are

    1^T x* = c,   ∇f(x*) = p* 1,        (3)

where p* is the (unique) optimal Lagrange multiplier.
In a centralized setup (i.e., all functions and their derivatives can be evaluated at a central agent), many methods can be used to solve the problem (1) or, equivalently, the optimality conditions (3). If the functions f_i are all quadratic, the optimality conditions (3) are a set of linear equations in x* and p*, and can be solved directly. In the more general case, the problem can be solved by iterative methods, e.g., the projected gradient method, Newton's method, quasi-Newton methods (e.g., the BFGS method), and many others. Detailed accounts of these algorithms (and others) can be found in, e.g., [Ber99] and [BV03].
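As a concrete illustration of the quadratic case (our example, not from the paper): if f_i(x_i) = (a_i/2)(x_i − b_i)^2 with a_i > 0, then f_i'(x_i) = a_i(x_i − b_i), and the conditions (3) read a_i(x_i* − b_i) = p* for all i, together with 1^T x* = c. Solving these linear equations gives

    p* = (c − Σ_{i=1}^n b_i) / (Σ_{i=1}^n 1/a_i),   x_i* = b_i + p*/a_i.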

The design of decentralized mechanisms for resource allocation has a long history in
economics [Hur73], and there are two main classes of mechanisms: price-directed [AH60]
and resource-directed [Hea69]. However, most of these mechanisms are not fully distributed, because they require either a central price coordinator or a central resource dispatcher, and so cannot be applied to the problem we consider. In this paper we focus on a class of center-free algorithms first proposed in [HSS80].
1.1 The center-free algorithm for resource allocation
Assume that we have an initial allocation of the resource x(0) that satisfies 1^T x(0) = c. The center-free algorithm for solving problem (1) has the following iterative form:

    x_i(t+1) = x_i(t) − W_ii f_i'(x_i(t)) − Σ_{j∈N_i} W_ij f_j'(x_j(t)),   i = 1, . . . , n,        (4)

for t = 0, 1, . . .. In other words, at each iteration, each node computes the derivative of its local function, queries the derivative values from its neighbors, and then updates its local variable by a weighted sum of these derivative values. Here W_ii is the self-weight at node i, and W_ij (j ∈ N_i) is the weight associated with the edge (i, j) ∈ E. Setting W_ij = 0 for j ∉ N_i, this algorithm can be written in vector form as

    x(t+1) = x(t) − W ∇f(x(t)),        (5)

where W ∈ R^{n×n} is the weight matrix. Thus the center-free algorithm can be thought of as a weighted gradient descent method, in which the weight matrix W has a sparsity constraint given by the graph:

    W ∈ S = {Z ∈ R^{n×n} | Z_ij = 0 if i ≠ j and (i, j) ∉ E}.        (6)
Throughout this paper we focus on the following question: How should we choose the weight matrix W?
We first consider two basic requirements on W. First, we require that all iterates x(t) of the algorithm be feasible, i.e., satisfy 1^T x(t) = c for all t. With the assumption that x(0) is feasible, this requirement is met provided the weight matrix satisfies

    1^T W = 0,        (7)

since we then have

    1^T x(t+1) = 1^T x(t) − 1^T W ∇f(x(t)) = 1^T x(t).

We will also require, naturally, that the optimal point x* be a fixed point of the algorithm (5), i.e.,

    x* = x* − W ∇f(x*) = x* − p* W 1.

This will hold in the general case (with p* ≠ 0) provided

    W 1 = 0.        (8)

The requirements (7) and (8) show that the vector 1 must be both a left and a right eigenvector of W, associated with the eigenvalue zero. One special case of interest is when the weight matrix W is symmetric. In this case, of course, the requirements (7) and (8) are the same, and simply state that 1 is in the nullspace of W.
Assuming the weights satisfy (8), we have W_ii = −Σ_{j∈N_i} W_ij, which can be substituted into equation (4) to get

    x_i(t+1) = x_i(t) − Σ_{j∈N_i} W_ij (f_j'(x_j(t)) − f_i'(x_i(t))),   i = 1, . . . , n.        (9)

Thus the change in each local variable at each step is a weighted sum of the differences between its own derivative value and those of its neighbors. Equation (9) has a simple interpretation: at each iteration, each connected pair of nodes shifts resources from the node with the higher marginal cost to the one with the lower marginal cost, in proportion to the difference in marginal costs. The weight W_ij gives the proportionality constant on the edge (i, j) ∈ E. (This interpretation suggests that the weights on the edges should be negative, but we will see examples where a few positive edge weights actually enhance the convergence rate.)
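To make the update concrete, here is a minimal NumPy sketch of iteration (5)/(9). The quadratic costs, the ring graph, and the constant edge weight are our illustrative assumptions, not choices made in the paper.

    import numpy as np

    # Illustrative costs f_i(x) = 0.5*a_i*(x - b_i)**2, so f_i'(x) = a_i*(x - b_i)
    # and the curvature bounds (2) hold with l_i = u_i = a_i.
    n, c = 5, 10.0
    rng = np.random.default_rng(0)
    a = rng.uniform(1.0, 2.0, n)
    b = rng.uniform(-1.0, 1.0, n)
    grad = lambda x: a * (x - b)

    # Ring graph with a constant (negative) weight on every edge; the diagonal
    # entries make W @ 1 = 0, and W is symmetric, so 1^T W = 0 as well.
    alpha = -0.2
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = alpha
        W[i, i] = -2 * alpha

    x = np.full(n, c / n)              # feasible start: 1^T x(0) = c
    for t in range(200):
        x = x - W @ grad(x)            # center-free update (5); keeps 1^T x = c

    print(x.sum())                     # still equals c
    print(grad(x))                     # marginal costs nearly equal, cf. (3)

With this choice, Σ_{j∈N_i}|W_ij| = 0.4 < 1/2 ≤ 1/u_max, so the sufficient conditions (10) of §1.2 below hold and the iterates converge to x*.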
1.2 Previous work
Distributed resource allocation algorithms of the form (9) were first proposed and studied by Ho, Servi and Suri in [HSS80]. They considered an undirected graph with symmetric weights on the edges, and called algorithms of this form center-free algorithms. ('Center-free' refers to the absence of a central coordinating entity.) In the notation of this paper, they assumed W = W^T and W1 = 0, and derived the following additional conditions on W that are sufficient for the algorithm (9) to converge to the optimal solution x*:
    (a) W is irreducible;
    (b) W_ij ≤ 0 for all (i, j) ∈ E;
    (c) Σ_{j∈N_i} |W_ij| < 1/u_max,   i = 1, . . . , n,        (10)

where u_max is an upper bound on the second derivatives of the functions f_i, i.e., u_max ≥ max_i u_i. The first condition, that W is irreducible, is equivalent to the statement that the subgraph consisting of all the nodes and the edges with nonzero weights is connected. We will show that these conditions are implied by those established in this paper.
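As a quick sanity check (our sketch, assuming SciPy is available; this is not code from the paper), the three conditions in (10) can be verified numerically for a given symmetric W:

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def satisfies_hss_conditions(W, u_max, tol=1e-12):
        """Check conditions (10) for a symmetric W with W @ 1 = 0."""
        off = W.copy()
        np.fill_diagonal(off, 0.0)
        # (a) irreducible: the nonzero-weight edges form a connected subgraph
        ncomp, _ = connected_components(csr_matrix(np.abs(off) > tol),
                                        directed=False)
        # (b) all off-diagonal (edge) weights nonpositive
        # (c) neighbor weight sums strictly below 1/u_max
        return (ncomp == 1 and np.all(off <= tol)
                and np.all(np.abs(off).sum(axis=1) < 1.0 / u_max))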
It should be noted that the problem considered in [HSS80] has nonnegativity constraints on the variables: x_i ≥ 0, i = 1, . . . , n (with c > 0). They gave a separate initialization procedure, which identifies and eliminates some nodes that will have zero value at optimality (not necessarily all such nodes). As a result of this initialization procedure and some additional conditions on the initial point x(0), all subsequent iterates of the center-free algorithm (9) automatically satisfy the nonnegativity constraints. In [KS89], second derivatives of the functions f_i are used to modify the algorithm (9) (with a constant weight on all edges) to obtain faster convergence. An interesting analogy between various iterative algorithms for solving problem (1) and the dynamics of several electrical networks can be found in [Ser80].
Many interesting similarities exist between the resource allocation problem and network
flow problems with convex separable cost (see, e.g., [Roc84, BT89, Ber98] and references
therein). In particular, by ignoring the local information structure, problem (1) can be
formulated as a simple network flow problem with two nodes and n links connecting them.
Thus many distributed algorithms for network flow problems such as those in [TB86, Baz96]
can be used; see also [LT94] for a convergence rate analysis of such an algorithm. However,
with the imposed local information structure on a graph, the resource allocation problem
has a distinct nature, and the above mentioned algorithms cannot be applied directly. The
center-free algorithm considered in this paper belongs to a more general class of gradient-like
algorithms studied in [TBA86].
In this paper, we give sufficient conditions weaker than (10) for the center-free algorithm
to converge, and we optimize the edge weights to obtain fast convergence. Our method is closely
related to the approach in [BDX03], where the problem of finding the fastest mixing Markov
chain on a graph is considered. In [XB03], the same approach was used to find fast linear
iterations for a distributed average consensus problem.
1.3 Outline
In §2, we give sufficient conditions on the weight matrix W under which the algorithm (5)
converges to the optimal solution monotonically. These conditions involve a linear matrix
inequality (LMI) in the weight matrix. Moreover, we quantify the convergence by deriving a
guaranteed convergence rate for the algorithm. In §3, we give some simple, explicit choices
for the weight matrix W that satisfy the convergence conditions. In §4, we propose to
minimize the guaranteed convergence rate obtained in §2 in order to get fast convergence
of the algorithm (5). We observe that the optimal weights (in the sense of minimizing the
guaranteed convergence rate) can be found by solving a semidefinite program (SDP). In §5,
we show some numerical examples that demonstrate the effectiveness of the proposed weight
selection methods. Finally, in §6, we extend the main results to problems with general
equality constraints, and problems with block separable objective functions. We give our
conclusions and some final remarks in §7.
2 Convergence conditions
In this section, we state and prove the main theorem. We use the following notation: L and U denote the diagonal matrices in R^{n×n} whose diagonal entries are the lower bounds l_i and the upper bounds u_i given in (2). Note that L and U are positive definite. For a symmetric matrix Z, we list its eigenvalues (all real) in nonincreasing order, as

    λ_1(Z) ≥ λ_2(Z) ≥ · · · ≥ λ_n(Z),

where λ_i(Z) denotes the ith largest eigenvalue of Z.
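A practical note for reproducing the eigenvalue computations (our aside, not from the paper): NumPy returns the eigenvalues of a symmetric matrix in ascending order, so the paper's nonincreasing convention requires a reversal:

    import numpy as np

    Z = np.array([[2.0, -1.0], [-1.0, 2.0]])
    w = np.linalg.eigvalsh(Z)          # ascending order
    lam = w[::-1]                      # lam[i-1] == lambda_i(Z) in the paper's notation
    print(lam)                         # [3. 1.]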

References

[MRR+53] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21(6):1087-1092, 1953.

[BV03] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[HJ85] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[Ber99] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.
Frequently Asked Questions (17)
Q1. What have the authors contributed in "Optimal scaling of a gradient method for distributed resource allocation"?

The authors consider a class of weighted gradient methods for distributed resource allocation over a network. 

The center-free algorithm considered in this paper belongs to a more general class of gradient-like algorithms studied in [TBA86]. 

The simplest and most commonly used method is to put a constant weight α on all the edges of the graph, and obtain the self-weights W_ii from the equality constraint W1 = 0:

    W_ij = { α        if (i, j) ∈ E
           { −d_i α   if i = j
           { 0        otherwise,

where d_i = |N_i| is the degree of node i.
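A minimal sketch of this construction (the adjacency-matrix encoding and the value of α are our illustrative assumptions):

    import numpy as np

    def constant_weight_matrix(A, alpha):
        """W with weight alpha on every edge and W_ii = -d_i * alpha, so W @ 1 = 0.
        A is a symmetric 0/1 adjacency matrix."""
        d = A.sum(axis=1)              # node degrees d_i = |N_i|
        return alpha * A - alpha * np.diag(d)

    # Example: a path graph on 4 nodes with a small negative edge weight.
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    W = constant_weight_matrix(A, alpha=-0.1)
    print(W @ np.ones(4))              # zero vector, as required by W1 = 0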

When the lower and upper bounds L and U are the only information available, it is reasonable to choose the weight matrix to minimize the guaranteed convergence rate η established in theorem 1. 

For symmetric weight matrices, each edge of the graph is bidirectional and has the same weight in both directions, so each can be considered as an undirected edge with a single weight. 

In the special case of choosing the constant edge weight to minimize the guaranteed rate, the authors show that with appropriate scaling of the objective functions, the solution can be directly given in terms of the eigenvalues of the Laplacian matrix of the graph. 

The authors observe that the optimal weights (in the sense of minimizing the guaranteed convergence rate) can be found by solving a semidefinite program (SDP). 

They considered an undirected graph with symmetric weights on the edges, and called algorithms of this form center-free algorithms. 

When the weight matrix W is symmetric, the convergence conditions reduce to

    W = W^T,  W1 = 0,        (20)
    2W + (1/n)11^T ≻ 0,        (21)
    2U^{−1} − W ≻ 0.        (22)

To see this, the authors first rewrite the LMI (14) for symmetric W:

    [ 2W + (1/n)11^T    W      ]
    [ W                 U^{−1} ]  ≻ 0.

While these weights evidently yield faster convergence of the method (compared to, say, a maximum-degree or Metropolis choice of weights), finding them requires real computation, i.e., the solution of an SDP.

Unless all the nodes have the same value of d_i u_i, the authors can always set

    W_ij = −min{ 1/(d_i u_i), 1/(d_j u_j) },  (i, j) ∈ E.        (26)

The authors call these weights the Metropolis weights, because the main idea of this method relates to the Metropolis algorithm for choosing transition probabilities on a graph to make the associated Markov chain mix rapidly ([MRR+53]; see also, e.g., [DSC98, BDX03]).
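A sketch of this rule (our encoding of the graph and bounds, not code from the paper):

    import numpy as np

    def metropolis_weights(A, u):
        """Metropolis weights (26): W_ij = -min(1/(d_i u_i), 1/(d_j u_j)) on edges,
        with W_ii = -sum_{j != i} W_ij so that W @ 1 = 0. A is a symmetric 0/1
        adjacency matrix; u holds the per-node curvature upper bounds u_i."""
        n = A.shape[0]
        d = A.sum(axis=1)
        W = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                if A[i, j]:
                    W[i, j] = -min(1.0 / (d[i] * u[i]), 1.0 / (d[j] * u[j]))
            W[i, i] = -W[i].sum()      # diagonal from W1 = 0
        return W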

The derivation of equation (19) holds without assuming the convergence condition (11), so long as the authors interpret λ_{n−1}(V) as the smallest eigenvalue of V excluding the zero eigenvalue associated with the eigenvector 1.

(The nonsymmetric weight matrix W found by solving the SDP (31) has two positive off-diagonal entries; see the comment in the paragraph after equation (9).) 

This is equivalent to maximizing the second smallest eigenvalue of the matrix V, i.e.,

    maximize    λ_{n−1}( L^{1/2} (W + W^T − W^T U W) L^{1/2} )
    subject to  W ∈ S,  1^T W = 0,  W1 = 0,        (27)

where the optimization variable is W.

From the condition (23), it is straightforward to obtain the following ranges for the edge weights that guarantee the convergence of the algorithm:

    −min{ 1/(d_i u_i), 1/(d_j u_j) } < W_ij < 0,  (i, j) ∈ E.

The authors discuss how to exploit sparsity in interior-point methods and a simple subgradient method for solving a similar class of SDPs in [XB03]. 

Using Schur complements, the quadratic matrix inequality (28) is equivalent to the LMI

    [ W + W^T − s (L^{−1} − (1/(1^T L^{−1} 1)) L^{−1}11^T L^{−1})   W^T    ]
    [ W                                                             U^{−1} ]  ⪰ 0.        (29)

Therefore the eigenvalue optimization problem (27) is equivalent to the SDP

    maximize    s
    subject to  W ∈ S,  1^T W = 0,  W1 = 0,
                [ W + W^T − s (L^{−1} − (1/(1^T L^{−1} 1)) L^{−1}11^T L^{−1})   W^T    ]
                [ W                                                             U^{−1} ]  ⪰ 0,        (30)

with optimization variables s and W.
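For illustration only, here is how the SDP (30) might be set up with CVXPY; the graph, the bounds l_i and u_i, and the solver choice are our assumptions, and this is a sketch rather than the authors' code:

    import cvxpy as cp
    import numpy as np

    # Illustrative data: a 4-node path graph with curvature bounds l_i, u_i.
    n = 4
    edges = [(0, 1), (1, 2), (2, 3)]
    l = np.array([1.0, 1.5, 1.0, 2.0])
    u = np.array([2.0, 2.5, 2.0, 3.0])
    Linv, Uinv = np.diag(1.0 / l), np.diag(1.0 / u)
    one = np.ones((n, 1))
    # The matrix multiplying s in (30): L^{-1} - L^{-1} 1 1^T L^{-1} / (1^T L^{-1} 1).
    M = Linv - (Linv @ one @ one.T @ Linv) / (one.T @ Linv @ one).item()

    W = cp.Variable((n, n))
    s = cp.Variable()
    allowed = set(edges) | {(j, i) for i, j in edges} | {(i, i) for i in range(n)}
    cons = [W[i, j] == 0 for i in range(n) for j in range(n)
            if (i, j) not in allowed]              # sparsity constraint W in S
    cons += [cp.sum(W, axis=0) == 0,               # 1^T W = 0
             cp.sum(W, axis=1) == 0]               # W 1 = 0
    Z = cp.bmat([[W + W.T - s * M, W.T],
                 [W, Uinv]])
    cons += [(Z + Z.T) / 2 >> 0]                   # the LMI in (30); Z is symmetric
    prob = cp.Problem(cp.Maximize(s), cons)
    prob.solve()                                   # needs an SDP-capable solver, e.g. SCS
    print(s.value)
    print(np.round(W.value, 4))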