Proceedings ArticleDOI

Distributed constrained convex optimization and consensus via dual decomposition and proximal minimization

TL;DR: This work considers a general class of convex optimization problems over time-varying, multi-agent networks, that naturally arise in many application domains like energy systems and wireless networks and proposes a novel distributed algorithm based on a combination of dual decomposition and proximal minimization.
Abstract: We consider a general class of convex optimization problems over time-varying, multi-agent networks, that naturally arise in many application domains like energy systems and wireless networks. In particular, we focus on programs with separable objective functions, local (possibly different) constraint sets and a coupling inequality constraint expressed as the non-positivity of the sum of convex functions, each corresponding to one agent. We propose a novel distributed algorithm to deal with such problems based on a combination of dual decomposition and proximal minimization. Our approach is based on an iterative scheme that enables agents to reach consensus with respect to the dual variables, while preserving information privacy. Specifically, agents are not required to disclose information about their local objective and constraint functions, nor to assume knowledge of the coupling constraint. Our analysis can be thought of as a generalization of dual gradient/subgradient algorithms to a distributed set-up. We show convergence of the proposed algorithm to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions. A numerical example demonstrating the efficacy of the proposed algorithm is also provided.

Summary

Introduction

  • Optimization in multi-agent networks has attracted significant interest from both the control and the operations research community, and has already found numerous applications in different domains, like power systems [1], [2], wireless networks [3], [4], robotics [5], etc.
  • They then exchange the outcome of this computation (but not their private information) with neighboring agents, and the process is then repeated on the basis of the information received.
  • Applying the methodologies of the aforementioned references to this problem, though possible, would unnecessarily increase the computational and communication effort, since it would require each agent to maintain an estimate of the decision vectors of all other agents when solving its local optimization program, and to communicate it to its neighboring agents.
  • In particular, the contributions of the paper can be summarized as follows: 1) the authors extend dual decomposition based algorithms to a distributed setting, accounting for possibly time-varying network connectivity; 2) they respect agents' information privacy, since agents exchange only tentative estimates of the dual variables; 3) they provide a proximal minimization perspective to gradient/subgradient algorithms, bypassing differentiability assumptions on the primal objective functions.

A. Problem statement and proposed solution

  • Moreover, agent i may not be willing to share information about fi(·), Xi, and gi(·), with other agents, due to privacy issues.
  • To account for information privacy and facilitate the development of a computationally tractable solution, the authors seek a distributed strategy.
  • Although, in principle, (6) fits the framework of algorithms like [7], [10], the concave function ϕi(·) is implicitly defined through an optimization problem parametric in λ.
  • In particular, the update of the local primal vector xi(k + 1) (step 7) is the same as in dual decomposition, whereas, in contrast to dual decomposition, the update of the dual vector (step 8) involves also a proximal term, which facilitates consensus among the agents.

B. Structural and communication assumptions

  • The authors then impose the following connectivity and communication assumption.
  • The graph (V,E∞) is strongly connected, i.e., for any two nodes there exists a path of directed edges that connects them.

C. Statement of the main results

  • Under Assumptions 1-6, Algorithm 1 converges and agents reach consensus on a common vector of Lagrange multipliers.
  • In particular, their local estimates λi(k) converge to some optimal dual solution, while the vector x̂(k) = [x̂1(k)^⊤ · · · x̂m(k)^⊤]^⊤ converges to the set of optimal primal solutions.
  • This is formally stated in the following theorems.
  • Theorem 1. [Dual Optimality] Consider Assumptions 1-6; then, for some λ* ∈ Λ*, lim_{k→∞} ‖λi(k) − λ*‖ = 0 for all i = 1, . . . , m.

D. Sketch of the proof

  • The proofs of Theorems 1 and 2 are quite technical and require the derivation of several intermediate results, therefore they are omitted in the interest of space.
  • In the following the authors provide a sketch of the main idea behind their proofs, while for more details the interested reader is referred to [23].
  • This is established by showing that the sequence {λi(k)}k≥0 achieves the optimal value of the dual function across a subsequence, and relies on Proposition 4 of [7].
  • Finally, the proof of Theorem 2 follows from [24], extending its derivations to deal with the considered distributed context.

III. NUMERICAL EXAMPLE

  • For the sake of simplicity the authors assumed that the network does not change across iterations.
  • Since problem (13) has a unique coupling constraint, there is just one Lagrange multiplier λ ∈ R+.
  • The authors ran Algorithm 1 for 1000 iterations.
  • By inspection of Figure 2, the average converges quite fast to the optimal Lagrange multiplier of (13) (red triangles), whereas all agents gradually reach consensus on that value.
  • Figure 3 shows the evolution of the primal objective value Σ_{i=1}^m fi(xi) (upper plot), and constraint violation in terms of ‖Σ_{i=1}^m gi(xi)‖_∞ (lower plot), where xi is replaced by two different sequences: xi(k) (blue solid lines), and x̂i(k) (orange dashed lines), where the latter is given by (7).

IV. CONCLUDING REMARKS

  • A novel distributed algorithm to deal with a class of convex optimization programs that exhibit a separable structure was developed.
  • The authors considered an iterative scheme based on a combination of dual decomposition and proximal minimization, and they showed that this scheme converges to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions.
  • Current work concentrates on a convergence rate analysis and further comparison with gradient/subgradient methods.
  • Moreover, the authors aim at relaxing the convexity assumption by extending the results of [29] to a distributed set-up, quantifying the duality gap incurred in case of mixed-integer programs.
  • From an application point of view, the main focus is on applying the proposed algorithm to the problem of energy efficient control of a building network [30].


Distributed constrained convex optimization and consensus
via dual decomposition and proximal minimization
Alessandro Falsone, Kostas Margellos, Simone Garatti, Maria Prandini
Abstract— We consider a general class of convex optimization problems over time-varying, multi-agent networks, that naturally arise in many application domains like energy systems and wireless networks. In particular, we focus on programs with separable objective functions, local (possibly different) constraint sets and a coupling inequality constraint expressed as the non-positivity of the sum of convex functions, each corresponding to one agent. We propose a novel distributed algorithm to deal with such problems based on a combination of dual decomposition and proximal minimization. Our approach is based on an iterative scheme that enables agents to reach consensus with respect to the dual variables, while preserving information privacy. Specifically, agents are not required to disclose information about their local objective and constraint functions, nor to assume knowledge of the coupling constraint. Our analysis can be thought of as a generalization of dual gradient/subgradient algorithms to a distributed set-up. We show convergence of the proposed algorithm to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions. A numerical example demonstrating the efficacy of the proposed algorithm is also provided.
I. INTRODUCTION
Optimization in multi-agent networks has attracted sig-
nificant interest from both the control and the operations
research community, and has already found numerous ap-
plications in different domains, like power systems [1],
[2], wireless networks [3], [4], robotics [5], etc. Typically,
agents cooperate to reach agreement/consensus on a common
decision, while optimizing a given performance criterion.
From a centralized perspective, this task can be represented
as an optimization problem defined over the entire network,
but the resulting mathematical program is often of large size,
making numerical computations prohibitive for large scale
systems, and/or requires the presence of a central entity
to have access to agent specific information, e.g., agents’
utility/objective and constraint functions.
Distributed optimization offers the means to bypass these
limitations, that are inherent in centralized approaches, al-
lowing agents to keep information about their objective and
constraint functions private, while distributing computation,
Research was supported by the European Commission under the project
UnCoVerCPS, grant number 643921.
Alessandro Falsone, Simone Garatti and Maria Prandini are with
the Dipartimento di Elettronica Informazione e Bioingegneria, Po-
litecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano,
Italy, e-mail: {alessandro.falsone, simone.garatti,
maria.prandini}@polimi.it
Kostas Margellos is with the Department of Engineering Science, Uni-
versity of Oxford, Parks Road, Oxford, OX1 3PJ, United Kingdom, e-mail:
kostas.margellos@eng.ox.ac.uk
thus leading to computational savings compared to central-
ized paradigms. Typical implementations involve applying an
iterative procedure, where at each iteration agents perform
some local computation. They then exchange the outcome
of this computation (but not their private information) with
neighboring agents, and the process is then repeated on the
basis of the information received.
A notable research activity oriented to the development of
distributed optimization algorithms over time-varying multi-
agent networks for general classes of convex problems has
flourished in recent years. In particular, in [6], [7], [8], [9] a
gradient/subgradient based consensus approach is followed
to address problems where agents with their own objec-
tive functions and constraints are coupled via a common
decision vector. In [10] the authors revisit this problem
from a proximal minimization perspective, addressing also
the case where the agents’ constraints may be affected by
uncertainty. Another class of problems, which is the one
considered in this paper and which has attracted consid-
erable interest, involves programs with separable objective
functions, each agent having its own decision vector, local
(possibly different) constraint sets, and a coupling inequality
constraint expressed as the non-negativity of the sum of
convex functions, each corresponding to one agent. Applying
the methodologies of the aforementioned references to this
problem, though possible, would unnecessarily increase the
computational and communication effort, since it would
require each agent to maintain an estimate of the decision
vectors of all other agents when solving its local optimization
program, and to communicate it to its neighboring agents.
To exploit the particular problem structure and allevi-
ate these difficulties, dual decomposition techniques (see
[11], and references therein), or approaches based on the
alternating direction method of multipliers [12], are often
employed, relying on the separable structure of the problem
after dualizing the coupling constraint. These methods are
based on time-invariant, connected networks, and require
a central update step for the dual variables, that should
be then communicated to all agents that are coupled via
the constraints. The latter, however, may not be possible
in time-varying connectivity set-ups. Standard incremental
gradient/subgradient algorithms [13], [14], [15] constitute
an alternative to dual decomposition, however, they require
agents to perform updates sequentially, in a cyclic or ran-
domized order, and hence do not allow for parallelizable
computations. Recently these techniques have been extended
to allow for distributed computation under the assumption
that the underlying network is time-invariant and the agents

have memory capabilities [16]. Other extensions of such
incremental algorithms are provided in [17], [18], [19],
[20], though addressing the problem under study using the
approaches proposed in the aforementioned references would
require all agents to store and exchange copies of their local
decision variables with their neighbors. This would result
in an unnecessary increase of the amount of communication
and, moreover, it requires an exchange of private information.
Another research direction involves primal-dual subgradient
based consensus algorithms [21], whereas in [22] a perturba-
tion variant with superior performance is adopted. However,
in the former the coupling constraint is assumed to be known
to all agents, whereas in the latter each agent's objective
function is required to be differentiable.
In this paper we propose a novel distributed algorithm
to deal with optimization problems that exhibit the afore-
mentioned structure, based on a combination of dual de-
composition and proximal minimization. In particular, the
contributions of our paper can be summarized as follows:
1) We extend dual decomposition based algorithms to a
distributed setting, accounting for possibly time-varying net-
work connectivity. 2) We respect agents’ information privacy,
with agents not being required to share information about
their local objective function and constraint set, nor about
the constraint function that encodes their contribution to the
coupling constraint. In particular, agents are not required
to share their tentative estimates for the primal decision
variables, but only for the dual ones. 3) We provide a
proximal minimization perspective to gradient/subgradient
algorithms, that allows us to bypass the differentiability
assumptions on the primal objective functions, which is at
the basis of such algorithms, and/or the requirement for
gradient/subgradient computation.
The remainder of the paper unfolds as follows: Section II
provides a statement of the problem under study, introduces
the proposed algorithm, states the main results of the paper,
and provides a sketch of their proofs. In Section III we
demonstrate the efficacy of the proposed algorithm on a
numerical example. Finally, Section IV concludes the paper
and provides some directions for future work. Complete
proofs of the main statements, as well as some intermediate
results, are omitted in the interest of space; they are, however,
available in [23].
II. DISTRIBUTED CONSTRAINED OPTIMIZATION
A. Problem statement and proposed solution
Consider a time-varying network of m agents that communicate to solve the following optimization program

P :  min_{ {x_i ∈ X_i}_{i=1}^m }  Σ_{i=1}^m f_i(x_i)
     subject to:  Σ_{i=1}^m g_i(x_i) ≤ 0,        (1)

where for each i = 1, . . . , m, x_i ∈ R^{n_i} is the vector of the n_i decision variables of agent i, f_i(·) : R^{n_i} → R is its objective function, X_i ⊆ R^{n_i} its local constraint set, and g_i(·) : R^{n_i} → R^p is a function which represents the contribution of agent i to the coupling constraint¹ Σ_{i=1}^m g_i(x_i) ≤ 0.
Solving P in a centralized fashion would likely result in a computationally intensive program, especially in the case where the number of interacting agents is high. Moreover, agent i may not be willing to share information about f_i(·), X_i, and g_i(·) with other agents, due to privacy issues. To account for information privacy and facilitate the development of a computationally tractable solution, we seek a distributed strategy. Let x = [x_1^⊤ · · · x_m^⊤]^⊤ ∈ R^n, with n = Σ_{i=1}^m n_i, and X = X_1 × · · · × X_m. Motivated by the separable structure of P, consider its dual problem, which is given by
D :  max_{λ ≥ 0}  min_{x ∈ X}  L(x, λ),        (2)

where λ is the vector of Lagrange multipliers, and λ ≥ 0 stands for λ ∈ R_+^p, where R_+^p denotes the p-dimensional non-negative orthant. The Lagrangian function L(x, λ) : R^n × R_+^p → R is given by

L(x, λ) = Σ_{i=1}^m L_i(x_i, λ) = Σ_{i=1}^m ( f_i(x_i) + λ^⊤ g_i(x_i) ).        (3)

The dual function can then be defined as

ϕ(λ) = min_{x ∈ X} L(x, λ).        (4)
Since it is the point-wise minimum of affine functions, ϕ(·) is a concave function. Notice that, due to the separable structure of the objective and the constraint functions in P,

ϕ(λ) = Σ_{i=1}^m ϕ_i(λ) = Σ_{i=1}^m min_{x_i ∈ X_i} L_i(x_i, λ).        (5)

Therefore, we can equivalently write D as

D :  max_{λ ≥ 0}  Σ_{i=1}^m ϕ_i(λ),        (6)

in which each agent i has its own dual function ϕ_i(λ), and the coupling between agents arises due to the fact that they should all agree on the same vector λ.
Although, in principle, (6) fits the framework of algorithms like [7], [10], the concave function ϕ_i(·) is implicitly defined through an optimization problem parametric in λ. This would require each agent to handle a max-min optimization program which, apart from specific cases, is in general difficult to solve.
We develop a distributed algorithm that is specifically tailored to the resolution of D. Its basic steps are summarized in Algorithm 1. At the initialization step, each agent i, i = 1, . . . , m, considers an estimate of its local decision vector such that x̂_i(0) ∈ X_i, and an estimate of what the solution of D is believed to be, i.e., λ_i(0) ∈ R_+^p (step 3 and step 4 of Algorithm 1, respectively). A sensible choice is to set x̂_i(0) ∈ argmin_{x_i ∈ X_i} f_i(x_i), and to take λ_i(0) = 0, i = 1, . . . , m.

¹ Inequality here is meant component-wise. It is perhaps worth noticing that this setup comprises also equality coupling constraints like Σ_{i=1}^m g̃_i(x_i) = 0. To this purpose it is enough to define g_i = [g̃_i^⊤ −g̃_i^⊤]^⊤.

Algorithm 1 Distributed algorithm
1: Initialization
2: k = 0.
3: Consider x̂_i(0) ∈ X_i, for all i = 1, . . . , m.
4: Consider λ_i(0) ∈ R_+^p, for all i = 1, . . . , m.
5: For i = 1, . . . , m repeat until convergence
6:   ℓ_i(k) = Σ_{j=1}^m a^i_j(k) λ_j(k).
7:   x_i(k+1) ∈ argmin_{x_i ∈ X_i} f_i(x_i) + ℓ_i(k)^⊤ g_i(x_i).
8:   λ_i(k+1) = argmax_{λ_i ≥ 0} g_i(x_i(k+1))^⊤ λ_i − (1/(2c(k))) ‖λ_i − ℓ_i(k)‖².
9:   x̂_i(k+1) = x̂_i(k) + (c(k) / Σ_{r=0}^k c(r)) (x_i(k+1) − x̂_i(k)).
10:  k ← k + 1.

At every iteration k, each agent constructs a weighted average ℓ_i(k) of the solutions λ_j(k), j = 1, . . . , m, of the other agents and its local one (step 6). Coefficient a^i_j(k) is the weight that agent i attributes to the solution of agent j at iteration k; a^i_j(k) = 0 means that agent j does not send any information to agent i at iteration k.
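To see how steps 6-9 of Algorithm 1 interact, the sketch below runs them on a toy scalar instance. This is our own minimal rendition, not the authors' implementation: the data (slopes ξ_i, budget b), the fixed ring network with uniform doubly stochastic weights, and the closed-form solutions of steps 7 and 8 for this particular f_i and g_i are all assumptions made for illustration:

```python
# Minimal sketch of Algorithm 1 on a toy scalar instance (our own rendition,
# not the authors' code). Hypothetical data: m = 4 agents, x_i in [-5, 5],
# f_i(x) = xi[i] * x, coupling constraint sum_i (x_i^2 - b/m) <= 0,
# fixed ring network with uniform doubly stochastic weights.

m = 4
xi = [1.0, -2.0, 3.0, -1.0]      # hypothetical local objective slopes
b = 10.0

def g(x):                        # agent contribution to the coupling constraint
    return x * x - b / m

# Doubly stochastic weights on a fixed ring: self + two neighbors, 1/3 each.
A = [[0.0] * m for _ in range(m)]
for i in range(m):
    A[i][i] = 1.0 / 3.0
    A[i][(i - 1) % m] = 1.0 / 3.0
    A[i][(i + 1) % m] = 1.0 / 3.0

lam = [0.0] * m                  # dual estimates lambda_i(0)
xh = [0.0] * m                   # auxiliary primal iterates xhat_i(0) in X_i
csum = 0.0
for k in range(2000):
    c = 1.0 / (k + 1)            # step size satisfying Assumption 4
    ell = [sum(A[i][j] * lam[j] for j in range(m)) for i in range(m)]   # step 6
    x = []
    for i in range(m):           # step 7: argmin over [-5,5] of xi*x + ell*g(x)
        if ell[i] > 0:
            x.append(max(-5.0, min(5.0, -xi[i] / (2.0 * ell[i]))))
        else:                    # ell = 0: purely linear objective, box corner
            x.append(-5.0 if xi[i] > 0 else 5.0)
    lam = [max(0.0, ell[i] + c * g(x[i])) for i in range(m)]            # step 8
    csum += c
    xh = [xh[i] + (c / csum) * (x[i] - xh[i]) for i in range(m)]        # step 9

v = sum(lam) / m
print("dual estimates:", [round(l, 3) for l in lam])
print("consensus spread:", max(abs(l - v) for l in lam))
```

With this diminishing step size the local multipliers stay non-negative by construction and their spread around the network average shrinks quickly; exact convergence to the optimal dual solution is much slower, in line with the harmonic choice of c(k).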
The optimization program in (6) exhibits the same structure as the problem addressed in [10], but involves dual, as opposed to primal, variables. This would motivate the use of a proximal maximization, as opposed to minimization, step for agent i to update its local estimate λ_i(k+1), i.e., λ_i(k+1) = argmax_{λ_i ≥ 0} min_{x_i ∈ X_i} L_i(x_i, λ_i) − (1/(2c(k))) ‖λ_i − ℓ_i(k)‖², ‖·‖ being the standard Euclidean norm. However, this problem is in general not easy to solve because it requires the solution of a max-min program. Therefore, we alternate between a primal and a dual update step. In particular, the update of the local primal vector x_i(k+1) (step 7) is the same as in dual decomposition, whereas, in contrast to dual decomposition, the update of the dual vector (step 8) involves also a proximal term, which facilitates consensus among the agents. Step 9 of Algorithm 1 returns an update x̂_i(k+1) for the auxiliary primal iterates. It can be easily shown that x̂_i(k+1) can be equivalently written as

x̂_i(k+1) = ( Σ_{r=0}^k c(r) x_i(r+1) ) / ( Σ_{r=0}^k c(r) ),        (7)

i.e., it is a weighted running average of {x_i(r+1)}_{r=0}^k. Such an auxiliary sequence is referred to as a primal recovery procedure and it is often used in dual decomposition methods, since it has better convergence properties compared to {x_i(k)}_{k≥0} [24], [22], [21].
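The equivalence between the recursion in step 9 and the explicit weighted average (7) is easy to verify numerically; the sequences c(r) and x_i(r+1) below are arbitrary made-up data for the check:

```python
# Numerical check that the recursion in step 9 reproduces formula (7),
# the weighted running average of the primal iterates. The sequences
# c(r) and x(r+1) below are arbitrary made-up data.

c_seq = [1.0 / (r + 1) for r in range(50)]                    # c(0..49)
x_seq = [((-1) ** r) * (r % 7) * 0.3 for r in range(1, 51)]   # x(1..50)

xh = 0.0     # xhat(0); overwritten by the first update, so its value is irrelevant
csum = 0.0
for k in range(50):
    csum += c_seq[k]
    xh = xh + (c_seq[k] / csum) * (x_seq[k] - xh)             # step 9 recursion

direct = sum(c * x for c, x in zip(c_seq, x_seq)) / sum(c_seq)  # formula (7)
print(xh, direct)
```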
Note that, since the maximization program in step 8 of Algorithm 1 is quadratic with respect to λ_i, an explicit resolution is possible, and step 8 can be equivalently written as λ_i(k+1) = [ℓ_i(k) + c(k) g_i(x_i(k+1))]_+, where [·]_+ denotes the projection of its argument onto R_+^p. The aforementioned representation resembles the structure of a projected subgradient step, where g_i(x_i(k+1)) constitutes a subgradient of ϕ_i(·) evaluated at ℓ_i(k), and c(k) plays the role of the gradient step. Throughout the manuscript we will use both representations according to convenience; however, the proximal perspective in step 8 of Algorithm 1, and some of its related properties that will be shown in the sequel, are crucial in the convergence analysis of the proposed algorithm, enabling us to extend the approach of [24] to the distributed case, and overcome the requirement imposed in [21] for the coupling constraint to be known to all agents (notice that in our set-up agent i needs to know only its contribution g_i(·) to the coupling constraint).
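The closed-form resolution of step 8 can be sanity-checked numerically in the scalar case (p = 1): maximizing the concave quadratic over a fine grid of λ ≥ 0 returns, up to grid resolution, the projection [ℓ_i(k) + c(k) g_i(x_i(k+1))]_+. The test triples (g, ℓ, c) below are arbitrary:

```python
# Step 8 maximizes the concave quadratic q(l) = g*l - (l - ell)^2 / (2c) over
# l >= 0; its closed form is the projection [ell + c*g]_+ = max(0, ell + c*g).
# Brute-force check on a fine grid, for arbitrary triples (g, ell, c).

def step8_grid(g, ell, c, hi=10.0, steps=200001):
    """Argmax of q over an equispaced grid of [0, hi]."""
    best_l, best_q = 0.0, float("-inf")
    for t in range(steps):
        l = hi * t / (steps - 1)
        q = g * l - (l - ell) ** 2 / (2.0 * c)
        if q > best_q:
            best_l, best_q = l, q
    return best_l

for g, ell, c in [(3.0, 1.0, 0.5), (-4.0, 0.2, 0.1), (2.5, 0.0, 1.0)]:
    closed = max(0.0, ell + c * g)
    assert abs(step8_grid(g, ell, c) - closed) < 1e-3
print("step 8 closed form confirmed on the test triples")
```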
B. Structural and communication assumptions
We impose the following assumptions.
Assumption 1. [Convexity] For each i = 1, . . . , m, function f_i(·) : R^{n_i} → R and the components of g_i(·) : R^{n_i} → R^p are convex; moreover, the set X_i ⊆ R^{n_i} is convex too.

Assumption 2. [Compactness] For each i = 1, . . . , m, the set X_i ⊆ R^{n_i} is compact.
Note that, under Assumptions 1 and 2, ‖g_i(x_i)‖ is finite for any x_i ∈ X_i. Therefore we have that ‖g_i(x_i)‖ ≤ G, where G = max_{i=1,...,m} max_{x_i ∈ X_i} ‖g_i(x_i)‖.
Assumption 3. [Constraint qualification] Problem P satisfies Slater's condition, i.e., there exist a vector x̃ = [x̃_1^⊤ · · · x̃_m^⊤]^⊤ ∈ X and ρ ∈ R_+ with ρ ≠ 0, such that {x ∈ R^n : ‖x − x̃‖ ≤ ρ} ⊆ X and Σ_{i=1}^m g_i(x̃_i) < 0. An equality in the latter condition is admitted only for those components that are linear.
Assumptions 1-3 are sufficient conditions for strong duality to hold, and for an optimal primal-dual pair (x*, λ*) to exist, where x* = [x_1*^⊤ · · · x_m*^⊤]^⊤. Moreover, for any optimal pair (x*, λ*) the Saddle-Point Theorem [25] holds, i.e.,

L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*),        (8)

for any λ ∈ R_+^p and any x ∈ X.
Denote by X* × Λ* = X_1* × · · · × X_m* × Λ* the set of all optimal primal-dual pairs. Under Assumptions 1-3, it was shown in [24] that the set of optimal dual solutions Λ* is bounded, and hence sup_{λ ∈ Λ*} ‖λ‖ is finite.
We impose the following assumption on the time-varying coefficient c(k).

Assumption 4. [Coefficient c(k)] Assume that for all k ≥ 0, c(k) ∈ R_+ \ {0} and {c(k)}_{k≥0} is a non-increasing sequence, i.e., c(k) ≤ c(r) for all k ≥ r, with r ≥ 0. Moreover,
1) Σ_{k=0}^∞ c(k) = ∞,
2) Σ_{k=0}^∞ c(k)² < ∞.

One possible choice for {c(k)}_{k≥0} that satisfies Assumption 4 is to take c(k) = β/(k+1) for some β ∈ R_+ \ {0}. This assumption is analogous to the one imposed by the authors of [10], [7], [21].
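The two conditions of Assumption 4 can be inspected numerically for the suggested choice c(k) = β/(k+1), here with β = 1 and an arbitrary horizon: the partial sums of c(k) keep growing like log N, while the partial sums of c(k)² stay below the limit π²/6:

```python
# Partial sums for the suggested step size c(k) = 1/(k+1) (beta = 1):
# sum_k c(k) diverges (harmonic series), sum_k c(k)^2 converges to pi^2/6.

import math

N = 100000                                           # arbitrary horizon
c = [1.0 / (k + 1) for k in range(N)]
assert all(c[k] >= c[k + 1] for k in range(N - 1))   # non-increasing
print("partial sum of c(k):  ", sum(c))              # ~ log(N) + 0.577
print("partial sum of c(k)^2:", sum(ck * ck for ck in c), "limit:", math.pi ** 2 / 6)
```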
In line with [26], [27], [28] we impose the following
assumptions on the information exchange between agents
and on the connectivity of the network.

Assumption 5. [Weight coefficients] There exists η ∈ (0, 1) such that for all i, j ∈ {1, . . . , m} and all k ≥ 0, a^i_j(k) ∈ [0, 1), a^i_i(k) ≥ η, and a^i_j(k) > 0 implies that a^i_j(k) ≥ η. Moreover, for all k ≥ 0,
1) Σ_{j=1}^m a^i_j(k) = 1 for all i = 1, . . . , m,
2) Σ_{i=1}^m a^i_j(k) = 1 for all j = 1, . . . , m.
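Assumption 5 asks for doubly stochastic weights with a uniform positive lower bound. One standard construction that satisfies it on a symmetric communication graph is the Metropolis rule; the paper does not prescribe this particular choice, and the 8-node edge list below is a hypothetical stand-in (the exact edge set of Figure 1 is not recoverable from the text):

```python
# Metropolis weights for an undirected graph: a_ij = 1/(1 + max(d_i, d_j)) for
# each edge, a_ii = 1 - sum_{j != i} a_ij. Symmetry makes the matrix doubly
# stochastic, and the self-weights stay bounded away from zero.

def metropolis_weights(m, edges):
    deg = [0] * m
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    A = [[0.0] * m for _ in range(m)]
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        A[i][j] = A[j][i] = w
    for i in range(m):
        A[i][i] = 1.0 - sum(A[i])   # off-diagonal entries only, at this point
    return A

# Hypothetical 8-agent communication graph (made-up edge set).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 6), (6, 7), (7, 4)]
A = metropolis_weights(8, edges)
print([round(A[i][i], 3) for i in range(8)])
```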
For each k ≥ 0 the information exchange between the m agents can be represented by a directed graph (V, E_k), where the agents are the nodes V = {1, . . . , m}, and the set E_k of directed edges is defined as

E_k = { (j, i) : a^i_j(k) > 0 },        (9)

i.e., at time k the link (j, i) is present if agent j sends information to agent i, and agent i weights this information with a^i_j(k). If the communication link is not present we set a^i_j(k) = 0; otherwise, if a^i_j(k) > 0, we say that j is a neighbor of agent i at time k. In Algorithm 1, at each iteration, each agent exchanges information with its neighbors only, thus accounting for a fully distributed setup.
Let E_∞ = { (j, i) : (j, i) ∈ E_k for infinitely many k } denote the set of edges (j, i) that represent agent pairs that communicate directly infinitely often. We then impose the following connectivity and communication assumption.

Assumption 6. [Connectivity and communication] The graph (V, E_∞) is strongly connected, i.e., for any two nodes there exists a path of directed edges that connects them. Moreover, there exists T ≥ 1 such that for every (j, i) ∈ E_∞, agent i receives information from the neighboring agent j at least once every T consecutive iterations.
For details about the interpretation of Assumptions 5 and
6, the reader is referred to [6], [10], [7].
C. Statement of the main results
Under Assumptions 1-6, Algorithm 1 converges and agents reach consensus on a common vector of Lagrange multipliers. In particular, their local estimates λ_i(k) converge to some optimal dual solution, while the vector x̂(k) = [x̂_1(k)^⊤ · · · x̂_m(k)^⊤]^⊤ converges to the set of optimal primal solutions X*.
This is formally stated in the following theorems.
Theorem 1. [Dual Optimality] Consider Assumptions 1-6. We have that, for some λ* ∈ Λ*,

lim_{k→∞} ‖λ_i(k) − λ*‖ = 0, for all i = 1, . . . , m.        (10)
Theorem 2. [Primal Optimality] Consider Assumptions 1-6. We have that

lim_{k→∞} dist(x̂(k), X*) = 0,        (11)

where dist(y, Z) = min_{z ∈ Z} ‖y − z‖ denotes the distance between y and the set Z.
D. Sketch of the proof
The proofs of Theorems 1 and 2 are quite technical
and require the derivation of several intermediate results,
therefore they are omitted in the interest of space. In the
following we provide a sketch of the main idea behind their
proofs, while for more details the interested reader is referred
to [23].
Let v(k) = (1/m) Σ_{i=1}^m λ_i(k) be the average of all tentative Lagrange multipliers at iteration k, and e_i(k+1) = λ_i(k+1) − ℓ_i(k+1) the consensus error. Assumptions 4-6 can be exploited to prove certain relations among the quantities ‖λ_i(k+1) − v(k+1)‖, ‖e_i(k+1)‖, and c(k). Assumptions 1-3 allow us to embed the obtained results in an inequality that relates consecutive terms of the sequence {Σ_{i=1}^m ‖λ_i(k) − λ*‖²}_{k≥0}. Specifically, it can be shown that
Σ_{i=1}^m ‖λ_i(k+1) − λ*‖² ≤ Σ_{i=1}^m ‖λ_i(k) − λ*‖² − γ_1 Σ_{i=1}^m ‖e_i(k+1)‖² + γ_2 c(k)² + γ_3 c(k) Σ_{i=1}^m ‖λ_i(k+1) − v(k+1)‖,        (12)
where γ_1, γ_2, and γ_3 are appropriate positive constants, together with the fact that Σ_{k=0}^∞ c(k) Σ_{i=1}^m ‖λ_i(k+1) − v(k+1)‖ < ∞. Such a relationship can be exploited to show that the consensus error vanishes, i.e. lim_{k→∞} ‖e_i(k)‖ = 0, and that consensus is achieved, i.e. lim_{k→∞} ‖λ_i(k) − v(k)‖ = 0, for all i = 1, . . . , m. Furthermore, (12) can also be exploited to show convergence of the sequence {Σ_{i=1}^m ‖λ_i(k) − λ*‖²}_{k≥0} due to the deterministic version of the supermartingale convergence theorem [14] (Proposition 8.2.10, p. 489) and the fact that the last two terms on the right-hand side of (12) are summable.
Once convergence has been proven, it is sufficient to show that there exists a subsequence along which the quantity Σ_{i=1}^m ‖λ_i(k) − λ*‖² goes to 0 to prove Theorem 1. This is established by showing that the sequence {λ_i(k)}_{k≥0} achieves the optimal value of the dual function across a subsequence, and relies on Proposition 4 of [7].
Finally, the proof of Theorem 2 follows from [24], ex-
tending its derivations to deal with the considered distributed
context.
III. NUMERICAL EXAMPLE
In this section we present a numerical example to show the validity of the proposed approach. Consider a network of m = 8 agents connected as in Figure 1. For the sake of simplicity we assumed that the network does not change across iterations. Each agent i, i = 1, . . . , m, has n_i = 2 optimization variables, a local objective function defined as f_i(x_i) = ξ_i^⊤ x_i, and its own constraint set X_i = [−5, 5] × [−5, 5]. The coefficients ξ_1, . . . , ξ_m are independently extracted at random from a Gaussian probability distribution with zero mean and covariance matrix equal to 25 I_2, I_2 being the identity matrix of order 2. Consider now a coupling constraint given by Σ_{i=1}^m ‖x_i‖² ≤ b, where b = 25m. The resulting optimization program is given by

min_{ {x_i ∈ X_i}_{i=1}^m }  Σ_{i=1}^m f_i(x_i)
subject to:  Σ_{i=1}^m ‖x_i‖² ≤ b.        (13)

Fig. 1. Network of m = 8 agents.
To put (13) in the form of P it suffices to set

g_i(x_i) = ‖x_i‖² − b/m,        (14)

for all i = 1, . . . , m, which is quadratic and convex. Since problem (13) has a unique coupling constraint, there is just one Lagrange multiplier λ ∈ R_+.
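For the specific example (13)-(14), step 7 of Algorithm 1 admits a closed form whenever ℓ_i(k) > 0: minimizing ξ_i^⊤ x_i + ℓ (‖x_i‖² − b/m) over the box decouples per coordinate, giving x_ij = clip(−ξ_ij/(2ℓ), −5, 5). A quick check against brute-force grid minimization, with made-up ξ_i and ℓ:

```python
# Step 7 for example (13)-(14): for ell > 0, the minimization of
#   xi . x + ell * (||x||^2 - b/m)   over   [-5, 5]^2
# separates per coordinate (the b/m term is constant in x), so the
# minimizer is a clipped scaling of -xi. Data below are made up.

def local_step(xi, ell):
    """Closed-form step 7 for the example: per-coordinate clipping."""
    assert ell > 0
    return [max(-5.0, min(5.0, -xij / (2.0 * ell))) for xij in xi]

def local_step_grid(xi, ell, steps=501):
    """Brute-force minimization over a grid of the box, for comparison."""
    best, best_val = None, float("inf")
    for s in range(steps):
        for t in range(steps):
            x = [-5.0 + 10.0 * s / (steps - 1), -5.0 + 10.0 * t / (steps - 1)]
            val = xi[0] * x[0] + xi[1] * x[1] + ell * (x[0] ** 2 + x[1] ** 2)
            if val < best_val:
                best, best_val = x, val
    return best

xi, ell = [3.0, -1.5], 0.4      # made-up data
print(local_step(xi, ell), local_step_grid(xi, ell))
```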
We ran Algorithm 1 for 1000 iterations. Figure 2 shows the evolution of the agents' estimates λ_i(k), i = 1, . . . , m, of the optimal value of λ (blue dotted lines) and their average v(k) (orange solid line), as the algorithm progresses. By inspection of Figure 2, the average converges quite fast to the optimal Lagrange multiplier of (13) (red triangles), whereas all agents gradually reach consensus on that value.

Fig. 2. Evolution of the agents' estimates λ_i(k), i = 1, . . . , m, of the vector λ (blue dotted lines), and their arithmetic average v(k) (orange solid line). Red triangles represent the optimal dual solution.

Figure 3 shows the evolution of the primal objective value Σ_{i=1}^m f_i(x_i) (upper plot), and constraint violation in terms of ‖Σ_{i=1}^m g_i(x_i)‖_∞ (lower plot), where x_i is replaced by two different sequences: x_i(k) (blue solid lines), and x̂_i(k) (orange dashed lines), where the latter is given by (7).

Fig. 3. Evolution of primal objective Σ_{i=1}^m f_i(x_i) (upper plot) and constraint violation ‖Σ_{i=1}^m g_i(x_i)‖_∞ (lower plot) as a function of x_i(k) (blue solid lines) and x̂_i(k) (orange dashed lines).

For the sake of completeness we also show in Figure 4 the evolution of the sequences {x_i(k)}_{k≥0} (solid lines), and {x̂_i(k)}_{k≥0} (dashed lines) for the first agent (i = 1); the evolution for other agents is similar.
IV. CONCLUDING REMARKS

In this paper, a novel distributed algorithm to deal with a class of convex optimization programs that exhibit a separable structure was developed. We considered an iterative scheme based on a combination of dual decomposition and proximal minimization, and showed that it converges to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions.

Fig. 4. Sequences {x̂_1(k)}_{k≥0} (dashed lines) and {x_1(k)}_{k≥0} (solid lines). Red triangles represent the optimal primal solution for agent 1.

Citations
More filters
Journal ArticleDOI
TL;DR: This work proposes a novel distributed algorithm to minimize the sum of the agents’ objective functions subject to both local and coupling constraints, where dual decomposition and proximal minimization are combined in an iterative scheme.

126 citations


Cites background from "Distributed constrained convex opti..."

  • ...A preliminary version of this work is given in [10]....

    [...]

Posted Content
TL;DR: In this paper, a distributed algorithm is proposed to minimize the sum of the agents' objective functions subject to both local and coupling constraints, where dual decomposition and proximal minimization are combined in an iterative scheme.
Abstract: We study distributed optimization in a cooperative multi-agent setting, where agents have to agree on the usage of shared resources and can communicate via a time-varying network to this purpose. Each agent has its own decision variables that should be set so as to minimize its individual objective function subject to local constraints. Resource sharing is modeled via coupling constraints that involve the non-positivity of the sum of agents' individual functions, each one depending on the decision variables of one single agent. We propose a novel distributed algorithm to minimize the sum of the agents' objective functions subject to both local and coupling constraints, where dual decomposition and proximal minimization are combined in an iterative scheme. Notably, privacy of information is guaranteed since only the dual optimization variables associated with the coupling constraints are exchanged by the agents. Under convexity assumptions, jointly with suitable connectivity properties of the communication network, we are able to prove that agents reach consensus to some optimal solution of the centralized dual problem counterpart, while primal variables converge to the set of optimizers of the centralized primal problem. The efficacy of the proposed approach is demonstrated on a plug-in electric vehicles charging problem.

40 citations

Journal ArticleDOI
TL;DR: A novel distributed control strategy for the optimal charging of a fleet of Electric Vehicles in case of limited overall capacity of the electrical distribution network based on duality, proximity, and consensus theory is proposed.

33 citations

Journal ArticleDOI
11 Oct 2017
TL;DR: An online algorithm is devised and shown to achieve a logarithmic bi-criteria competitive ratio; experiments demonstrate the effectiveness of the algorithm.
Abstract: This paper studies the problem of utilizing energy storage systems to perform demand-response in microgrids. The objective is to minimize the operational cost while balancing the supply-and-demand mismatch. The design space is to select and schedule a subset of heterogeneous storage devices that arrive online with different availabilities. Designing a performance-optimized solution is challenging due to the existence of mixed packing and covering constraints in a combinatorial problem, and the essential need for online design. We devise an online algorithm and show that it achieves logarithmic bi-criteria competitive ratio. Experimental results demonstrate the effectiveness of our algorithm.

5 citations

Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper forms a general multi-user waterfilling-structured optimization problem including coupling constraints, and defines a low-complexity iterative distributed algorithm based on duality, consensus and fixed point mapping theory that shows its effectiveness.
Abstract: In this paper we present a distributed control approach for the multi-user multi-constrained waterfilling. This a specific category of distributed optimization for Networked Control Systems (NCSs), where agents aim at optimizing a non-separable global objective function while satisfying both local constraints and coupling constraints. Differently from the existing literature, in the considered setting we adopt a fully distributed mechanism where communication is allowed between neighbors only. First, we formulate a general multi-user waterfilling-structured optimization problem including coupling constraints, which may represent many engineering distributed control problems. Successively, we define a low-complexity iterative distributed algorithm based on duality, consensus and fixed point mapping theory. Finally, applying the technique to a simulated case referring to the electric vehicles optimal charging problem, we show its effectiveness.

5 citations

References
More filters
Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

17,433 citations


"Distributed constrained convex opti..." refers methods in this paper

  • ...To exploit the particular problem structure and alleviate these difficulties, dual decomposition techniques (see [11], and references therein), or approaches based on the alternating direction method of multipliers [12], are often employed, relying on the separable structure of the problem after dualizing the coupling constraint....


Journal ArticleDOI
TL;DR: The authors' convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.
Abstract: We study a distributed computation model for optimizing a sum of convex objective functions corresponding to multiple agents. For solving this (not necessarily smooth) optimization problem, we consider a subgradient method that is distributed among the agents. The method involves every agent minimizing his/her own objective function while exchanging information locally with other agents in the network over a time-varying topology. We provide convergence results and convergence rate estimates for the subgradient method. Our convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.

3,238 citations


"Distributed constrained convex opti..." refers background in this paper

  • ...In particular, in [6], [7], [8], [9] a gradient/subgradient based consensus approach is followed to address problems where agents with their own objective functions and constraints are coupled via a common decision vector....


  • ...For details about the interpretation of Assumptions 5 and 6, the reader is referred to [6], [10], [7]....


Journal ArticleDOI
TL;DR: In this article, the authors present a distributed algorithm that can be used by multiple agents to align their estimates with a particular value over a network with time-varying connectivity.
Abstract: We present distributed algorithms that can be used by multiple agents to align their estimates with a particular value over a network with time-varying connectivity. Our framework is general in that this value can represent a consensus value among multiple agents or an optimal solution of an optimization problem, where the global objective function is a combination of local agent objective functions. Our main focus is on constrained problems where the estimates of each agent are restricted to lie in different convex sets. To highlight the effects of constraints, we first consider a constrained consensus problem and present a distributed "projected consensus algorithm" in which agents combine their local averaging operation with projection on their individual constraint sets. This algorithm can be viewed as a version of an alternating projection method with weights that are varying over time and across agents. We establish convergence and convergence rate results for the projected consensus algorithm. We next study a constrained optimization problem for optimizing the sum of local objective functions of the agents subject to the intersection of their local constraint sets. We present a distributed "projected subgradient algorithm" which involves each agent performing a local averaging operation, taking a subgradient step to minimize its own objective function, and projecting on its constraint set. We show that, with an appropriately selected stepsize rule, the agent estimates generated by this algorithm converge to the same optimal solution for the cases when the weights are constant and equal, and when the weights are time-varying but all agents have the same constraint set.

1,773 citations

Journal ArticleDOI
TL;DR: A model for asynchronous distributed computation is presented and it is shown that natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms retain the desirable convergence properties of their centralized counterparts.
Abstract: We present a model for asynchronous distributed computation and then proceed to analyze the convergence of natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms. We show that such algorithms retain the desirable convergence properties of their centralized counterparts, provided that the time between consecutive interprocessor communications and the communication delays are not too large.

1,761 citations

Frequently Asked Questions (7)
Q1. What contributions have the authors mentioned in the paper "Distributed constrained convex optimization and consensus via dual decomposition and proximal minimization" ?

The authors consider a general class of convex optimization problems over time-varying, multi-agent networks, that naturally arise in many application domains like energy systems and wireless networks. The authors propose a novel distributed algorithm to deal with such problems based on a combination of dual decomposition and proximal minimization. Their approach is based on an iterative scheme that enables agents to reach consensus with respect to the dual variables, while preserving information privacy. The authors show convergence of the proposed algorithm to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions. A numerical example demonstrating the efficacy of the proposed algorithm is also provided. 

Coefficient a_ij(k) is the weight that agent i attributes to the solution of agent j at iteration k; a_ij(k) = 0 means that agent j does not send any information to agent i at iteration k.
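The role of these weights can be seen in a minimal consensus iteration. The weight matrix below is a hypothetical fixed, doubly stochastic choice (in the time-varying setting the matrix may change with k); repeated averaging drives the agents' estimates to a common value.

```python
import numpy as np

# Doubly stochastic weights: a_ij > 0 iff agent j sends information to agent i.
# (Illustrative 3-agent chain topology; not taken from the paper.)
A = np.array([[0.6, 0.4, 0.0],
              [0.4, 0.2, 0.4],
              [0.0, 0.4, 0.6]])
lam = np.array([0.0, 3.0, 6.0])    # initial dual estimates of the three agents

for _ in range(100):
    lam = A @ lam                   # each agent averages its neighbours' values

print(lam)  # all entries approach the initial average, 3.0
```

Double stochasticity is what preserves the average across iterations, so consensus is reached on the mean of the initial estimates.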

Consider a time-varying network of m agents that communicate to solve the following optimization program:

P:  min_{ {x_i ∈ X_i}_{i=1}^m }  ∑_{i=1}^m f_i(x_i)
    subject to:  ∑_{i=1}^m g_i(x_i) ≤ 0,        (1)

where for each i = 1, …, m, x_i ∈ R^{n_i} is the vector of n_i decision variables of agent i, and f_i(·) : R^{n_i} → R is its local objective function.
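A concrete instance of this program (a hypothetical two-agent example, not taken from the paper) is f_1(x_1) = (x_1 − 2)^2, f_2(x_2) = (x_2 − 3)^2, X_i = [0, 2], with coupling constraint x_1 + x_2 − 3 ≤ 0. Dualizing the coupling constraint makes the problem separable, and a standard centralized dual subgradient iteration recovers the optimum; the step size is an illustrative choice.

```python
import numpy as np

c = np.array([2.0, 3.0])   # local objective parameters (assumed data)
lam, alpha = 0.0, 0.1      # single multiplier for the coupling constraint

for _ in range(300):
    # With lambda fixed, the Lagrangian splits: each agent solves
    # min_{x_i in [0,2]} (x_i - c_i)^2 + lam * x_i  =>  x_i = clip(c_i - lam/2, 0, 2)
    x = np.clip(c - lam / 2.0, 0.0, 2.0)
    # Projected subgradient ascent on the dual, using the coupling residual
    lam = max(0.0, lam + alpha * (x.sum() - 3.0))

print(lam, x)  # lam -> 2, x -> (1, 2): the constrained optimum
```

At the optimum the coupling constraint is active (x_1 + x_2 = 3), which is why the multiplier settles at a strictly positive value.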

Notice that, due to the separable structure of the objective and the constraint functions in P,

φ(λ) = ∑_{i=1}^m φ_i(λ) = ∑_{i=1}^m min_{x_i ∈ X_i} L_i(x_i, λ).
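This separability can be checked numerically on a small hypothetical instance: minimizing the joint Lagrangian over the product of the local sets gives the same value as summing the per-agent minima. The instance below (quadratic objectives, a linear split of the coupling constraint, a grid search over each X_i) is purely illustrative.

```python
import numpy as np

# Hypothetical two-agent instance: f_i(x_i) = (x_i - c_i)^2, X_i = [0, 2],
# coupling terms g_i(x_i) = x_i - 1.5 (so the coupling is x_1 + x_2 - 3 <= 0).
c, lam = np.array([2.0, 3.0]), 1.0
grid = np.linspace(0.0, 2.0, 401)           # grid over each local set X_i

def L_i(x, i):
    """Per-agent Lagrangian term: f_i(x) + lam * g_i(x)."""
    return (x - c[i]) ** 2 + lam * (x - 1.5)

# phi_i(lam): each agent minimizes its own term independently...
phi_sep = sum(L_i(grid, i).min() for i in range(2))
# ...which equals minimizing the joint Lagrangian over X_1 x X_2.
X1, X2 = np.meshgrid(grid, grid)
phi_joint = (L_i(X1, 0) + L_i(X2, 1)).min()
assert abs(phi_sep - phi_joint) < 1e-9      # separability of the dual function
```

This is what lets each agent evaluate its own contribution φ_i(λ) without knowing the other agents' objectives or constraints.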

It is perhaps worth noticing that this setup also comprises equality coupling constraints like ∑_{i=1}^m g̃_i(x_i) = 0. To this purpose it is enough to define g_i = [g̃_i^T  −g̃_i^T]^T.

Such an auxiliary sequence is referred to as a primal recovery procedure and is often used in dual decomposition methods, since it has better convergence properties than {x_i(k)}_{k≥0} [24], [22], [21].
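A common primal recovery choice is the running (ergodic) average of the primal iterates, which can be updated recursively. The sketch below uses a hypothetical oscillating primal sequence of the kind dual methods can produce when the primal iterates jump between extreme points of X_i; the sequence itself is invented for illustration.

```python
# Primal iterates from a dual method may oscillate between vertices of X_i;
# their running average can still converge to a primal optimizer.
x_iter = [2.0 if k % 2 == 0 else 0.0 for k in range(1000)]  # hypothetical oscillation

xhat = 0.0
for k, xk in enumerate(x_iter, start=1):
    xhat += (xk - xhat) / k   # recursive running average: xhat_k = (1/k) * sum_{t<=k} x_t

print(xhat)  # -> 1.0: the average of the oscillating sequence
```

The recursive form avoids storing the whole history, which matters when each agent maintains its own recovered primal sequence.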

To account for information privacy and facilitate the development of a computationally tractable solution, the authors seek a distributed strategy.