Proceedings ArticleDOI

Distributed constrained convex optimization and consensus via dual decomposition and proximal minimization

TL;DR: This work considers a general class of convex optimization problems over time-varying, multi-agent networks, that naturally arise in many application domains like energy systems and wireless networks and proposes a novel distributed algorithm based on a combination of dual decomposition and proximal minimization.
Abstract: We consider a general class of convex optimization problems over time-varying, multi-agent networks, that naturally arise in many application domains like energy systems and wireless networks. In particular, we focus on programs with separable objective functions, local (possibly different) constraint sets and a coupling inequality constraint expressed as the non-positivity of the sum of convex functions, each corresponding to one agent. We propose a novel distributed algorithm to deal with such problems based on a combination of dual decomposition and proximal minimization. Our approach is based on an iterative scheme that enables agents to reach consensus with respect to the dual variables, while preserving information privacy. Specifically, agents are not required to disclose information about their local objective and constraint functions, nor to assume knowledge of the coupling constraint. Our analysis can be thought of as a generalization of dual gradient/subgradient algorithms to a distributed set-up. We show convergence of the proposed algorithm to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions. A numerical example demonstrating the efficacy of the proposed algorithm is also provided.

Summary

Introduction

  • Optimization in multi-agent networks has attracted significant interest from both the control and the operations research community, and has already found numerous applications in different domains, like power systems [1], [2], wireless networks [3], [4], robotics [5], etc.
  • They then exchange the outcome of this computation (but not their private information) with neighboring agents, and the process is then repeated on the basis of the information received.
  • Applying the methodologies of the aforementioned references to this problem, though possible, would unnecessarily increase the computational and communication effort, since it would require each agent to maintain an estimate of the decision vectors of all other agents when solving its local optimization program, and to communicate it to its neighboring agents.
  • In particular, the contributions of the paper can be summarized as follows: 1) the authors extend dual decomposition based algorithms to a distributed setting, accounting for possibly time-varying network connectivity; 2) they respect agents' information privacy, since agents exchange only tentative estimates of the dual variables; 3) they provide a proximal minimization perspective to gradient/subgradient algorithms, bypassing differentiability assumptions on the primal objective functions.

A. Problem statement and proposed solution

  • Moreover, agent i may not be willing to share information about fi(·), Xi, and gi(·), with other agents, due to privacy issues.
  • To account for information privacy and facilitate the development of a computationally tractable solution, the authors seek a distributed strategy.
  • Although, in principle, (6) fits the framework of algorithms like [7], [10], the concave function ϕi(·) is implicitly defined through an optimization problem parametric in λ.
  • In particular, the update of the local primal vector xi(k + 1) (step 7) is the same as in dual decomposition, whereas, in contrast to dual decomposition, the update of the dual vector (step 8) involves also a proximal term, which facilitates consensus among the agents.

B. Structural and communication assumptions

  • The authors then impose the following connectivity and communication assumption.
  • The graph (V,E∞) is strongly connected, i.e., for any two nodes there exists a path of directed edges that connects them.

C. Statement of the main results

  • Under Assumptions 1-6, Algorithm 1 converges and agents reach consensus on a common vector of Lagrange multipliers.
  • In particular, their local estimates λi(k) converge to some optimal dual solution, while the vector x̂(k) = [x̂1(k)^⊤ · · · x̂m(k)^⊤]^⊤ converges to the set of optimal primal solutions.
  • This is formally stated in the following theorems.
  • Theorem 1. [Dual Optimality] Consider Assumptions 1-6; then, for some λ* ∈ Λ*, lim_{k→∞} ‖λi(k) − λ*‖ = 0 for all i = 1, . . . , m.

D. Sketch of the proof

  • The proofs of Theorems 1 and 2 are quite technical and require the derivation of several intermediate results, therefore they are omitted in the interest of space.
  • In the following the authors provide a sketch of the main idea behind their proofs, while for more details the interested reader is referred to [23].
  • This is established by showing that the sequence {λi(k)}k≥0 achieves the optimal value of the dual function across a subsequence, and relies on Proposition 4 of [7].
  • Finally, the proof of Theorem 2 follows from [24], extending its derivations to deal with the considered distributed context.

III. NUMERICAL EXAMPLE

  • For the sake of simplicity the authors assumed that the network does not change across iterations.
  • Since problem (13) has a unique coupling constraint, there is just one Lagrange multiplier λ ∈ R+.
  • The authors ran Algorithm 1 for 1000 iterations.
  • By inspection of Figure 2, the average converges quite fast to the optimal Lagrange multiplier of (13) (red triangles), whereas all agents gradually reach consensus on that value.
  • Figure 3 shows the evolution of the primal objective value Σ_{i=1}^m fi(xi) (upper plot), and constraint violation in terms of ‖Σ_{i=1}^m gi(xi)‖_∞ (lower plot), where xi is replaced by two different sequences: xi(k) (blue solid lines), and x̂i(k) (orange dashed lines), where the latter is given by (7).

IV. CONCLUDING REMARKS

  • A novel distributed algorithm to deal with a class of convex optimization programs that exhibit a separable structure was developed.
  • The authors considered an iterative scheme based on a combination of dual decomposition and proximal minimization, and they showed that this scheme converges to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions.
  • Current work concentrates on a convergence rate analysis and further comparison with gradient/subgradient methods.
  • Moreover, the authors aim at relaxing the convexity assumption by extending the results of [29] to a distributed set-up, quantifying the duality gap incurred in case of mixed-integer programs.
  • From an application point of view, the main focus is on applying the proposed algorithm to the problem of energy efficient control of a building network [30].


Distributed constrained convex optimization and consensus
via dual decomposition and proximal minimization
Alessandro Falsone, Kostas Margellos, Simone Garatti, Maria Prandini
Abstract— We consider a general class of convex optimization problems over time-varying, multi-agent networks, that naturally arise in many application domains like energy systems and wireless networks. In particular, we focus on programs with separable objective functions, local (possibly different) constraint sets and a coupling inequality constraint expressed as the non-positivity of the sum of convex functions, each corresponding to one agent. We propose a novel distributed algorithm to deal with such problems based on a combination of dual decomposition and proximal minimization. Our approach is based on an iterative scheme that enables agents to reach consensus with respect to the dual variables, while preserving information privacy. Specifically, agents are not required to disclose information about their local objective and constraint functions, nor to assume knowledge of the coupling constraint. Our analysis can be thought of as a generalization of dual gradient/subgradient algorithms to a distributed set-up. We show convergence of the proposed algorithm to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions. A numerical example demonstrating the efficacy of the proposed algorithm is also provided.
I. INTRODUCTION
Optimization in multi-agent networks has attracted sig-
nificant interest from both the control and the operations
research community, and has already found numerous ap-
plications in different domains, like power systems [1],
[2], wireless networks [3], [4], robotics [5], etc. Typically,
agents cooperate to reach agreement/consensus on a common
decision, while optimizing a given performance criterion.
From a centralized perspective, this task can be represented
as an optimization problem defined over the entire network,
but the resulting mathematical program is often of large size,
making numerical computations prohibitive for large scale
systems, and/or requires the presence of a central entity
to have access to agent specific information, e.g., agents’
utility/objective and constraint functions.
Distributed optimization offers the means to bypass these
limitations, that are inherent in centralized approaches, al-
lowing agents to keep information about their objective and
constraint functions private, while distributing computation,
Research was supported by the European Commission under the project
UnCoVerCPS, grant number 643921.
Alessandro Falsone, Simone Garatti and Maria Prandini are with
the Dipartimento di Elettronica Informazione e Bioingegneria, Po-
litecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano,
Italy, e-mail: {alessandro.falsone, simone.garatti,
maria.prandini}@polimi.it
Kostas Margellos is with the Department of Engineering Science, Uni-
versity of Oxford, Parks Road, Oxford, OX1 3PJ, United Kingdom, e-mail:
kostas.margellos@eng.ox.ac.uk
thus leading to computational savings compared to central-
ized paradigms. Typical implementations involve applying an
iterative procedure, where at each iteration agents perform
some local computation. They then exchange the outcome
of this computation (but not their private information) with
neighboring agents, and the process is then repeated on the
basis of the information received.
A notable research activity oriented to the development of
distributed optimization algorithms over time-varying multi-
agent networks for general classes of convex problems has
flourished in recent years. In particular, in [6], [7], [8], [9] a
gradient/subgradient based consensus approach is followed
to address problems where agents with their own objec-
tive functions and constraints are coupled via a common
decision vector. In [10] the authors revisit this problem
from a proximal minimization perspective, addressing also
the case where the agents’ constraints may be affected by
uncertainty. Another class of problems, which is the one
considered in this paper and which has attracted consid-
erable interest, involves programs with separable objective
functions, each agent having its own decision vector, local
(possibly different) constraint sets, and a coupling inequality
constraint expressed as the non-negativity of the sum of
convex functions, each corresponding to one agent. Applying
the methodologies of the aforementioned references to this
problem, though possible, would unnecessarily increase the
computational and communication effort, since it would
require each agent to maintain an estimate of the decision
vectors of all other agents when solving its local optimization
program, and to communicate it to its neighboring agents.
To exploit the particular problem structure and allevi-
ate these difficulties, dual decomposition techniques (see
[11], and references therein), or approaches based on the
alternating direction method of multipliers [12], are often
employed, relying on the separable structure of the problem
after dualizing the coupling constraint. These methods are
based on time-invariant, connected networks, and require
a central update step for the dual variables, that should
be then communicated to all agents that are coupled via
the constraints. The latter, however, may not be possible
in time-varying connectivity set-ups. Standard incremental
gradient/subgradient algorithms [13], [14], [15] constitute
an alternative to dual decomposition, however, they require
agents to perform updates sequentially, in a cyclic or ran-
domized order, and hence do not allow for parallelizable
computations. Recently these techniques have been extended
to allow for distributed computation under the assumption
that the underlying network is time-invariant and the agents

have memory capabilities [16]. Other extensions of such
incremental algorithms are provided in [17], [18], [19],
[20], though addressing the problem under study using the
approaches proposed in the aforementioned references would
require all agents to store and exchange copies of their local
decision variables with their neighbors. This would result
in an unnecessary increase of the amount of communication
and, moreover, it requires an exchange of private information.
Another research direction involves primal-dual subgradient
based consensus algorithms [21], whereas in [22] a perturba-
tion variant with superior performance is adopted. However,
in the former the coupling constraint is assumed to be known
to all agents, whereas in the latter each agent's objective
function is required to be differentiable.
In this paper we propose a novel distributed algorithm
to deal with optimization problems that exhibit the afore-
mentioned structure, based on a combination of dual de-
composition and proximal minimization. In particular, the
contributions of our paper can be summarized as follows:
1) We extend dual decomposition based algorithms to a
distributed setting, accounting for possibly time-varying net-
work connectivity. 2) We respect agents’ information privacy,
with agents not being required to share information about
their local objective function and constraint set, nor about
the constraint function that encodes their contribution to the
coupling constraint. In particular, agents are not required
to share their tentative estimates for the primal decision
variables, but only for the dual ones. 3) We provide a
proximal minimization perspective to gradient/subgradient
algorithms, that allows us to bypass the differentiability
assumptions on the primal objective functions, which is at
the basis of such algorithms, and/or the requirement for
gradient/subgradient computation.
The remainder of the paper unfolds as follows: Section II
provides a statement of the problem under study, introduces
the proposed algorithm, states the main results of the paper,
and provides a sketch of their proofs. In Section III we
demonstrate the efficacy of the proposed algorithm on a
numerical example. Finally, Section IV concludes the paper
and provides some directions for future work. Complete
proofs of the main statements, as well as some intermediate
results, are omitted in the interest of space; they are, however,
available in [23].
II. DISTRIBUTED CONSTRAINED OPTIMIZATION
A. Problem statement and proposed solution
Consider a time-varying network of m agents that communicate to solve the following optimization program

P :  min_{ {x_i ∈ X_i}_{i=1}^m }  Σ_{i=1}^m f_i(x_i)
     subject to:  Σ_{i=1}^m g_i(x_i) ≤ 0,        (1)

where for each i = 1, . . . , m, x_i ∈ R^{n_i} is the vector of the n_i decision variables of agent i, f_i(·) : R^{n_i} → R is its objective function, X_i ⊆ R^{n_i} its local constraint set, and g_i(·) : R^{n_i} → R^p is a function which represents the contribution of agent i to the coupling constraint¹ Σ_{i=1}^m g_i(x_i) ≤ 0.
Solving P in a centralized fashion would likely result in a computationally intensive program, especially in the case where the number of interacting agents is high. Moreover, agent i may not be willing to share information about f_i(·), X_i, and g_i(·) with other agents, due to privacy issues. To account for information privacy and facilitate the development of a computationally tractable solution, we seek a distributed strategy. Let x = [x_1^⊤ · · · x_m^⊤]^⊤ ∈ R^n, with n = Σ_{i=1}^m n_i, and X = X_1 × · · · × X_m. Motivated by the separable structure of P, consider its dual problem, which is given by
D :  max_{λ ≥ 0}  min_{x ∈ X}  L(x, λ),        (2)

where λ is the vector of Lagrange multipliers, and λ ≥ 0 stands for λ ∈ R_+^p, where R_+^p denotes the p-dimensional non-negative orthant. The Lagrangian function L(x, λ) : R^n × R_+^p → R is given by

L(x, λ) = Σ_{i=1}^m L_i(x_i, λ) = Σ_{i=1}^m ( f_i(x_i) + λ^⊤ g_i(x_i) ).        (3)

The dual function can then be defined as

ϕ(λ) = min_{x ∈ X} L(x, λ).        (4)
Since it is the point-wise minimum of affine functions, ϕ(·) is a concave function. Notice that, due to the separable structure of the objective and the constraint functions in P,

ϕ(λ) = Σ_{i=1}^m ϕ_i(λ) = Σ_{i=1}^m min_{x_i ∈ X_i} L_i(x_i, λ).        (5)

Therefore, we can equivalently write D as

D :  max_{λ ≥ 0}  Σ_{i=1}^m ϕ_i(λ),        (6)

in which each agent i has its own dual function ϕ_i(λ), and the coupling between agents arises due to the fact that they should all agree on the same vector λ.
Although, in principle, (6) fits the framework of algorithms like [7], [10], the concave function ϕ_i(·) is implicitly defined through an optimization problem parametric in λ. This would require each agent to handle a max-min optimization program which, apart from specific cases, is in general difficult to solve.
We develop a distributed algorithm that is specifically tailored to the resolution of D. Its basic steps are summarized in Algorithm 1. At the initialization step, each agent i, i = 1, . . . , m, considers an estimate of its local decision vector such that x̂_i(0) ∈ X_i, and an estimate of what the solution of D is believed to be, i.e., λ_i(0) ∈ R_+^p (step 3 and step 4 of Algorithm 1, respectively). A sensible choice is to set x̂_i(0) ∈ argmin_{x_i ∈ X_i} f_i(x_i), and to take λ_i(0) = 0, i = 1, . . . , m.

¹ Inequality here is meant component-wise. It is perhaps worth noticing that this setup comprises also equality coupling constraints like Σ_{i=1}^m g̃_i(x_i) = 0. To this purpose it is enough to define g_i = [g̃_i^⊤ −g̃_i^⊤]^⊤.

Algorithm 1 Distributed algorithm
1: Initialization
2: k = 0.
3: Consider x̂_i(0) ∈ X_i, for all i = 1, . . . , m.
4: Consider λ_i(0) ∈ R_+^p, for all i = 1, . . . , m.
5: For i = 1, . . . , m repeat until convergence
6:   ℓ_i(k) = Σ_{j=1}^m a^i_j(k) λ_j(k).
7:   x_i(k+1) ∈ argmin_{x_i ∈ X_i} f_i(x_i) + ℓ_i(k)^⊤ g_i(x_i).
8:   λ_i(k+1) = argmax_{λ_i ≥ 0} g_i(x_i(k+1))^⊤ λ_i − (1/(2c(k))) ‖λ_i − ℓ_i(k)‖².
9:   x̂_i(k+1) = x̂_i(k) + (c(k) / Σ_{r=0}^k c(r)) (x_i(k+1) − x̂_i(k)).
10:  k ← k + 1.

At every iteration k, each agent constructs a weighted average ℓ_i(k) of the solutions λ_j(k), j = 1, . . . , m, of the other agents and its local one (step 6). Coefficient a^i_j(k) is the weight that agent i attributes to the solution of agent j at iteration k; a^i_j(k) = 0 means that agent j does not send any information to agent i at iteration k.
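To see how steps 6-9 of Algorithm 1 interact, the sketch below runs them on a toy scalar instance. This is our own minimal rendition, not the authors' implementation: the data (slopes ξ_i, budget b), the fixed ring network with uniform doubly stochastic weights, and the closed-form solutions of steps 7 and 8 for this particular f_i and g_i are all assumptions made for illustration:

```python
# Minimal sketch of Algorithm 1 on a toy scalar instance (our own rendition,
# not the authors' code). Hypothetical data: m = 4 agents, x_i in [-5, 5],
# f_i(x) = xi[i] * x, coupling constraint sum_i (x_i^2 - b/m) <= 0,
# fixed ring network with uniform doubly stochastic weights.

m = 4
xi = [1.0, -2.0, 3.0, -1.0]      # hypothetical local objective slopes
b = 10.0

def g(x):                        # agent contribution to the coupling constraint
    return x * x - b / m

# Doubly stochastic weights on a fixed ring: self + two neighbors, 1/3 each.
A = [[0.0] * m for _ in range(m)]
for i in range(m):
    A[i][i] = 1.0 / 3.0
    A[i][(i - 1) % m] = 1.0 / 3.0
    A[i][(i + 1) % m] = 1.0 / 3.0

lam = [0.0] * m                  # dual estimates lambda_i(0)
xh = [0.0] * m                   # auxiliary primal iterates xhat_i(0) in X_i
csum = 0.0
for k in range(2000):
    c = 1.0 / (k + 1)            # step size satisfying Assumption 4
    ell = [sum(A[i][j] * lam[j] for j in range(m)) for i in range(m)]   # step 6
    x = []
    for i in range(m):           # step 7: argmin over [-5,5] of xi*x + ell*g(x)
        if ell[i] > 0:
            x.append(max(-5.0, min(5.0, -xi[i] / (2.0 * ell[i]))))
        else:                    # ell = 0: purely linear objective, box corner
            x.append(-5.0 if xi[i] > 0 else 5.0)
    lam = [max(0.0, ell[i] + c * g(x[i])) for i in range(m)]            # step 8
    csum += c
    xh = [xh[i] + (c / csum) * (x[i] - xh[i]) for i in range(m)]        # step 9

v = sum(lam) / m
print("dual estimates:", [round(l, 3) for l in lam])
print("consensus spread:", max(abs(l - v) for l in lam))
```

With this diminishing step size the local multipliers stay non-negative by construction and their spread around the network average shrinks quickly; exact convergence to the optimal dual solution is much slower, in line with the harmonic choice of c(k).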
The optimization program in (6) exhibits the same structure as the problem addressed in [10], but involves dual, as opposed to primal, variables. This would motivate the use of a proximal maximization, as opposed to minimization, step for agent i to update its local estimate λ_i(k+1), i.e., λ_i(k+1) = argmax_{λ_i ≥ 0} min_{x_i ∈ X_i} L_i(x_i, λ_i) − (1/(2c(k))) ‖λ_i − ℓ_i(k)‖², ‖·‖ being the standard Euclidean norm. However, this problem is in general not easy to solve because it requires the solution of a max-min program. Therefore, we alternate between a primal and a dual update step. In particular, the update of the local primal vector x_i(k+1) (step 7) is the same as in dual decomposition, whereas, in contrast to dual decomposition, the update of the dual vector (step 8) involves also a proximal term, which facilitates consensus among the agents. Step 9 of Algorithm 1 returns an update x̂_i(k+1) for the auxiliary primal iterates. It can be easily shown that x̂_i(k+1) can be equivalently written as

x̂_i(k+1) = ( Σ_{r=0}^k c(r) x_i(r+1) ) / ( Σ_{r=0}^k c(r) ),        (7)

i.e., it is a weighted running average of {x_i(r+1)}_{r=0}^k. Such an auxiliary sequence is referred to as a primal recovery procedure and it is often used in dual decomposition methods, since it has better convergence properties compared to {x_i(k)}_{k≥0} [24], [22], [21].
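The equivalence between the recursion in step 9 and the explicit weighted average (7) is easy to verify numerically; the sequences c(r) and x_i(r+1) below are arbitrary made-up data for the check:

```python
# Numerical check that the recursion in step 9 reproduces formula (7),
# the weighted running average of the primal iterates. The sequences
# c(r) and x(r+1) below are arbitrary made-up data.

c_seq = [1.0 / (r + 1) for r in range(50)]                    # c(0..49)
x_seq = [((-1) ** r) * (r % 7) * 0.3 for r in range(1, 51)]   # x(1..50)

xh = 0.0     # xhat(0); overwritten by the first update, so its value is irrelevant
csum = 0.0
for k in range(50):
    csum += c_seq[k]
    xh = xh + (c_seq[k] / csum) * (x_seq[k] - xh)             # step 9 recursion

direct = sum(c * x for c, x in zip(c_seq, x_seq)) / sum(c_seq)  # formula (7)
print(xh, direct)
```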
Note that, since the maximization program in step 8 of Algorithm 1 is quadratic with respect to λ_i, an explicit resolution is possible, and step 8 can be equivalently written as λ_i(k+1) = [ℓ_i(k) + c(k) g_i(x_i(k+1))]_+, where [·]_+ denotes the projection of its argument onto R_+^p. The aforementioned representation resembles the structure of a projected subgradient step, where g_i(x_i(k+1)) constitutes a subgradient of ϕ_i(·) evaluated at ℓ_i(k), and c(k) plays the role of the gradient step. Throughout the manuscript we will use both representations according to convenience; however, the proximal perspective in step 8 of Algorithm 1, and some of its related properties that will be shown in the sequel, are crucial in the convergence analysis of the proposed algorithm, enabling us to extend the approach of [24] to the distributed case, and overcome the requirement imposed in [21] for the coupling constraint to be known to all agents (notice that in our set-up agent i needs to know only its contribution g_i(·) to the coupling constraint).
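The closed-form resolution of step 8 can be sanity-checked numerically in the scalar case (p = 1): maximizing the concave quadratic over a fine grid of λ ≥ 0 returns, up to grid resolution, the projection [ℓ_i(k) + c(k) g_i(x_i(k+1))]_+. The test triples (g, ℓ, c) below are arbitrary:

```python
# Step 8 maximizes the concave quadratic q(l) = g*l - (l - ell)^2 / (2c) over
# l >= 0; its closed form is the projection [ell + c*g]_+ = max(0, ell + c*g).
# Brute-force check on a fine grid, for arbitrary triples (g, ell, c).

def step8_grid(g, ell, c, hi=10.0, steps=200001):
    """Argmax of q over an equispaced grid of [0, hi]."""
    best_l, best_q = 0.0, float("-inf")
    for t in range(steps):
        l = hi * t / (steps - 1)
        q = g * l - (l - ell) ** 2 / (2.0 * c)
        if q > best_q:
            best_l, best_q = l, q
    return best_l

for g, ell, c in [(3.0, 1.0, 0.5), (-4.0, 0.2, 0.1), (2.5, 0.0, 1.0)]:
    closed = max(0.0, ell + c * g)
    assert abs(step8_grid(g, ell, c) - closed) < 1e-3
print("step 8 closed form confirmed on the test triples")
```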
B. Structural and communication assumptions
We impose the following assumptions.
Assumption 1. [Convexity] For each i = 1, . . . , m, function f_i(·) : R^{n_i} → R and the components of g_i(·) : R^{n_i} → R^p are convex; moreover, the set X_i ⊆ R^{n_i} is convex too.

Assumption 2. [Compactness] For each i = 1, . . . , m, the set X_i ⊆ R^{n_i} is compact.
Note that, under Assumptions 1 and 2, ‖g_i(x_i)‖ is finite for any x_i ∈ X_i. Therefore we have that ‖g_i(x_i)‖ ≤ G, where G = max_{i=1,...,m} max_{x_i ∈ X_i} ‖g_i(x_i)‖.
Assumption 3. [Constraint qualification] Problem P satisfies Slater's condition, i.e., there exist a vector x̃ = [x̃_1^⊤ · · · x̃_m^⊤]^⊤ ∈ X and ρ ∈ R_+ with ρ ≠ 0, such that {x ∈ R^n : ‖x − x̃‖ ≤ ρ} ⊆ X and Σ_{i=1}^m g_i(x̃_i) < 0. An equality in the latter condition is admitted only for those components that are linear.
Assumptions 1-3 are sufficient conditions for strong duality to hold, and for an optimal primal-dual pair (x*, λ*) to exist, where x* = [x_1*^⊤ · · · x_m*^⊤]^⊤. Moreover, for any optimal pair (x*, λ*) the Saddle-Point Theorem [25] holds, i.e.,

L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*),        (8)

for any λ ∈ R_+^p and any x ∈ X.
Denote by X* × Λ* = X_1* × · · · × X_m* × Λ* the set of all optimal primal-dual pairs. Under Assumptions 1-3, it was shown in [24] that the set of optimal dual solutions Λ* is bounded, and hence sup_{λ ∈ Λ*} ‖λ‖ is finite.
We impose the following assumption on the time-varying coefficient c(k).

Assumption 4. [Coefficient c(k)] Assume that for all k ≥ 0, c(k) ∈ R_+ \ {0} and {c(k)}_{k≥0} is a non-increasing sequence, i.e., c(k) ≤ c(r) for all k ≥ r, with r ≥ 0. Moreover,
1) Σ_{k=0}^∞ c(k) = ∞,
2) Σ_{k=0}^∞ c(k)² < ∞.

One possible choice for {c(k)}_{k≥0} that satisfies Assumption 4 is to take c(k) = β/(k+1) for some β ∈ R_+ \ {0}. This assumption is analogous to the one imposed by the authors of [10], [7], [21].
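The two conditions of Assumption 4 can be inspected numerically for the suggested choice c(k) = β/(k+1), here with β = 1 and an arbitrary horizon: the partial sums of c(k) keep growing like log N, while the partial sums of c(k)² stay below the limit π²/6:

```python
# Partial sums for the suggested step size c(k) = 1/(k+1) (beta = 1):
# sum_k c(k) diverges (harmonic series), sum_k c(k)^2 converges to pi^2/6.

import math

N = 100000                                           # arbitrary horizon
c = [1.0 / (k + 1) for k in range(N)]
assert all(c[k] >= c[k + 1] for k in range(N - 1))   # non-increasing
print("partial sum of c(k):  ", sum(c))              # ~ log(N) + 0.577
print("partial sum of c(k)^2:", sum(ck * ck for ck in c), "limit:", math.pi ** 2 / 6)
```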
In line with [26], [27], [28] we impose the following
assumptions on the information exchange between agents
and on the connectivity of the network.

Assumption 5. [Weight coefficients] There exists η ∈ (0, 1) such that for all i, j ∈ {1, . . . , m} and all k ≥ 0, a^i_j(k) ∈ [0, 1), a^i_i(k) ≥ η, and a^i_j(k) > 0 implies that a^i_j(k) ≥ η. Moreover, for all k ≥ 0,
1) Σ_{j=1}^m a^i_j(k) = 1 for all i = 1, . . . , m,
2) Σ_{i=1}^m a^i_j(k) = 1 for all j = 1, . . . , m.
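Assumption 5 asks for doubly stochastic weights with a uniform positive lower bound. One standard construction that satisfies it on a symmetric communication graph is the Metropolis rule; the paper does not prescribe this particular choice, and the 8-node edge list below is a hypothetical stand-in (the exact edge set of Figure 1 is not recoverable from the text):

```python
# Metropolis weights for an undirected graph: a_ij = 1/(1 + max(d_i, d_j)) for
# each edge, a_ii = 1 - sum_{j != i} a_ij. Symmetry makes the matrix doubly
# stochastic, and the self-weights stay bounded away from zero.

def metropolis_weights(m, edges):
    deg = [0] * m
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    A = [[0.0] * m for _ in range(m)]
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        A[i][j] = A[j][i] = w
    for i in range(m):
        A[i][i] = 1.0 - sum(A[i])   # off-diagonal entries only, at this point
    return A

# Hypothetical 8-agent communication graph (made-up edge set).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (5, 6), (6, 7), (7, 4)]
A = metropolis_weights(8, edges)
print([round(A[i][i], 3) for i in range(8)])
```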
For each k ≥ 0 the information exchange between the m agents can be represented by a directed graph (V, E_k), where the agents are the nodes V = {1, . . . , m}, and the set E_k of directed edges is defined as

E_k = { (j, i) : a^i_j(k) > 0 },        (9)

i.e., at time k the link (j, i) is present if agent j sends information to agent i, and agent i weights this information with a^i_j(k). If the communication link is not present we set a^i_j(k) = 0; otherwise, if a^i_j(k) > 0, we say that j is a neighbor of agent i at time k. In Algorithm 1, at each iteration, each agent exchanges information with its neighbors only, thus accounting for a fully distributed setup.
Let E_∞ = { (j, i) : (j, i) ∈ E_k for infinitely many k } denote the set of edges (j, i) that represent agent pairs that communicate directly infinitely often. We then impose the following connectivity and communication assumption.

Assumption 6. [Connectivity and communication] The graph (V, E_∞) is strongly connected, i.e., for any two nodes there exists a path of directed edges that connects them. Moreover, there exists T ≥ 1 such that for every (j, i) ∈ E_∞, agent i receives information from the neighboring agent j at least once every T consecutive iterations.
For details about the interpretation of Assumptions 5 and
6, the reader is referred to [6], [10], [7].
C. Statement of the main results
Under Assumptions 1-6, Algorithm 1 converges and agents reach consensus on a common vector of Lagrange multipliers. In particular, their local estimates λ_i(k) converge to some optimal dual solution, while the vector x̂(k) = [x̂_1(k)^⊤ · · · x̂_m(k)^⊤]^⊤ converges to the set of optimal primal solutions X*.
This is formally stated in the following theorems.
Theorem 1. [Dual Optimality] Consider Assumptions 1-6. We have that, for some λ* ∈ Λ*,

lim_{k→∞} ‖λ_i(k) − λ*‖ = 0, for all i = 1, . . . , m.        (10)
Theorem 2. [Primal Optimality] Consider Assumptions 1-6. We have that

lim_{k→∞} dist(x̂(k), X*) = 0,        (11)

where dist(y, Z) = min_{z ∈ Z} ‖y − z‖ denotes the distance between y and the set Z.
D. Sketch of the proof
The proofs of Theorems 1 and 2 are quite technical
and require the derivation of several intermediate results,
therefore they are omitted in the interest of space. In the
following we provide a sketch of the main idea behind their
proofs, while for more details the interested reader is referred
to [23].
Let v(k) = (1/m) Σ_{i=1}^m λ_i(k) be the average of all tentative Lagrange multipliers at iteration k, and e_i(k+1) = λ_i(k+1) − ℓ_i(k+1) the consensus error. Assumptions 4-6 can be exploited to prove certain relations among the quantities ‖λ_i(k+1) − v(k+1)‖, ‖e_i(k+1)‖, and c(k). Assumptions 1-3 allow us to embed the obtained results in an inequality that relates consecutive terms of the sequence {Σ_{i=1}^m ‖λ_i(k) − λ*‖²}_{k≥0}. Specifically, it can be shown that
Σ_{i=1}^m ‖λ_i(k+1) − λ*‖² ≤ Σ_{i=1}^m ‖λ_i(k) − λ*‖² − γ_1 Σ_{i=1}^m ‖e_i(k+1)‖² + γ_2 c(k)² + γ_3 c(k) Σ_{i=1}^m ‖λ_i(k+1) − v(k+1)‖,        (12)
where γ_1, γ_2, and γ_3 are appropriate positive constants, together with the fact that Σ_{k=0}^∞ c(k) Σ_{i=1}^m ‖λ_i(k+1) − v(k+1)‖ < ∞. Such a relationship can be exploited to show that the consensus error vanishes, i.e. lim_{k→∞} ‖e_i(k)‖ = 0, and that consensus is achieved, i.e. lim_{k→∞} ‖λ_i(k) − v(k)‖ = 0, for all i = 1, . . . , m. Furthermore, (12) can also be exploited to show convergence of the sequence {Σ_{i=1}^m ‖λ_i(k) − λ*‖²}_{k≥0} due to the deterministic version of the supermartingale convergence theorem [14] (Proposition 8.2.10, p. 489) and the fact that the last two terms on the right-hand side of (12) are summable.
Once convergence has been proven, it is sufficient to show that there exists a subsequence along which the quantity Σ_{i=1}^m ‖λ_i(k) − λ*‖² goes to 0 to prove Theorem 1. This is established by showing that the sequence {λ_i(k)}_{k≥0} achieves the optimal value of the dual function across a subsequence, and relies on Proposition 4 of [7].
Finally, the proof of Theorem 2 follows from [24], ex-
tending its derivations to deal with the considered distributed
context.
III. NUMERICAL EXAMPLE
In this section we present a numerical example to show the validity of the proposed approach. Consider a network of m = 8 agents connected as in Figure 1. For the sake of simplicity we assumed that the network does not change across iterations. Each agent i, i = 1, . . . , m, has n_i = 2 optimization variables, a local objective function defined as f_i(x_i) = ξ_i^⊤ x_i, and its own constraint set X_i = [−5, 5] × [−5, 5]. The coefficients ξ_1, . . . , ξ_m are independently extracted at random from a Gaussian probability distribution with zero mean and covariance matrix equal to 25 I_2, I_2 being the identity matrix of order 2. Consider now a coupling constraint given by Σ_{i=1}^m ‖x_i‖² ≤ b, where b = 25m. The resulting optimization program is given by

min_{ {x_i ∈ X_i}_{i=1}^m }  Σ_{i=1}^m f_i(x_i)
subject to:  Σ_{i=1}^m ‖x_i‖² ≤ b.        (13)

Fig. 1. Network of m = 8 agents.
To put (13) in the form of P it suffices to set

g_i(x_i) = ‖x_i‖² − b/m,        (14)

for all i = 1, . . . , m, which is quadratic and convex. Since problem (13) has a unique coupling constraint, there is just one Lagrange multiplier λ ∈ R_+.
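For the specific example (13)-(14), step 7 of Algorithm 1 admits a closed form whenever ℓ_i(k) > 0: minimizing ξ_i^⊤ x_i + ℓ (‖x_i‖² − b/m) over the box decouples per coordinate, giving x_ij = clip(−ξ_ij/(2ℓ), −5, 5). A quick check against brute-force grid minimization, with made-up ξ_i and ℓ:

```python
# Step 7 for example (13)-(14): for ell > 0, the minimization of
#   xi . x + ell * (||x||^2 - b/m)   over   [-5, 5]^2
# separates per coordinate (the b/m term is constant in x), so the
# minimizer is a clipped scaling of -xi. Data below are made up.

def local_step(xi, ell):
    """Closed-form step 7 for the example: per-coordinate clipping."""
    assert ell > 0
    return [max(-5.0, min(5.0, -xij / (2.0 * ell))) for xij in xi]

def local_step_grid(xi, ell, steps=501):
    """Brute-force minimization over a grid of the box, for comparison."""
    best, best_val = None, float("inf")
    for s in range(steps):
        for t in range(steps):
            x = [-5.0 + 10.0 * s / (steps - 1), -5.0 + 10.0 * t / (steps - 1)]
            val = xi[0] * x[0] + xi[1] * x[1] + ell * (x[0] ** 2 + x[1] ** 2)
            if val < best_val:
                best, best_val = x, val
    return best

xi, ell = [3.0, -1.5], 0.4      # made-up data
print(local_step(xi, ell), local_step_grid(xi, ell))
```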
We ran Algorithm 1 for 1000 iterations. Figure 2 shows the evolution of the agents' estimates λ_i(k), i = 1, . . . , m, of the optimal value of λ (blue dotted lines) and their average v(k) (orange solid line), as the algorithm progresses. By inspection of Figure 2, the average converges quite fast to the optimal Lagrange multiplier of (13) (red triangles), whereas all agents gradually reach consensus on that value.

Fig. 2. Evolution of the agents' estimates λ_i(k), i = 1, . . . , m, of the vector λ (blue dotted lines), and their arithmetic average v(k) (orange solid line). Red triangles represent the optimal dual solution.

Figure 3 shows the evolution of the primal objective value Σ_{i=1}^m f_i(x_i) (upper plot), and constraint violation in terms of ‖Σ_{i=1}^m g_i(x_i)‖_∞ (lower plot), where x_i is replaced by two different sequences: x_i(k) (blue solid lines), and x̂_i(k) (orange dashed lines), where the latter is given by (7).

Fig. 3. Evolution of primal objective Σ_{i=1}^m f_i(x_i) (upper plot) and constraint violation ‖Σ_{i=1}^m g_i(x_i)‖_∞ (lower plot) as a function of x_i(k) (blue solid lines) and x̂_i(k) (orange dashed lines).

For the sake of completeness we also show in Figure 4 the evolution of the sequences {x_i(k)}_{k≥0} (solid lines), and {x̂_i(k)}_{k≥0} (dashed lines) for the first agent (i = 1); the evolution for other agents is similar.
IV. CONCLUDING REMARKS

In this paper, a novel distributed algorithm to deal with a class of convex optimization programs that exhibit a separable structure was developed. We considered an iterative scheme based on a combination of dual decomposition and proximal minimization, and showed that it converges to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions.

Fig. 4. Sequences {x̂_1(k)}_{k≥0} (dashed lines) and {x_1(k)}_{k≥0} (solid lines). Red triangles represent the optimal primal solution for agent 1.

Citations
More filters
Journal ArticleDOI
TL;DR: This work proposes a novel distributed algorithm to minimize the sum of the agents’ objective functions subject to both local and coupling constraints, where dual decomposition and proximal minimization are combined in an iterative scheme.

126 citations


Cites background from "Distributed constrained convex opti..."

  • ...A preliminary version of this work is given in [10]....

    [...]

Posted Content
TL;DR: In this paper, a distributed algorithm is proposed to minimize the sum of the agents' objective functions subject to both local and coupling constraints, where dual decomposition and proximal minimization are combined in an iterative scheme.
Abstract: We study distributed optimization in a cooperative multi-agent setting, where agents have to agree on the usage of shared resources and can communicate via a time-varying network to this purpose. Each agent has its own decision variables that should be set so as to minimize its individual objective function subject to local constraints. Resource sharing is modeled via coupling constraints that involve the non-positivity of the sum of agents' individual functions, each one depending on the decision variables of one single agent. We propose a novel distributed algorithm to minimize the sum of the agents' objective functions subject to both local and coupling constraints, where dual decomposition and proximal minimization are combined in an iterative scheme. Notably, privacy of information is guaranteed since only the dual optimization variables associated with the coupling constraints are exchanged by the agents. Under convexity assumptions, jointly with suitable connectivity properties of the communication network, we are able to prove that agents reach consensus to some optimal solution of the centralized dual problem counterpart, while primal variables converge to the set of optimizers of the centralized primal problem. The efficacy of the proposed approach is demonstrated on a plug-in electric vehicles charging problem.

40 citations

Journal ArticleDOI
TL;DR: A novel distributed control strategy for the optimal charging of a fleet of Electric Vehicles in case of limited overall capacity of the electrical distribution network based on duality, proximity, and consensus theory is proposed.

33 citations

Journal ArticleDOI
11 Oct 2017
TL;DR: An online algorithm is devised and shown to achieve a logarithmic bi-criteria competitive ratio; experiments demonstrate the effectiveness of the algorithm.
Abstract: This paper studies the problem of utilizing energy storage systems to perform demand-response in microgrids. The objective is to minimize the operational cost while balancing the supply-and-demand mismatch. The design space is to select and schedule a subset of heterogeneous storage devices that arrive online with different availabilities. Designing a performance-optimized solution is challenging due to the existence of mixed packing and covering constraints in a combinatorial problem, and the essential need for online design. We devise an online algorithm and show that it achieves logarithmic bi-criteria competitive ratio. Experimental results demonstrate the effectiveness of our algorithm.

5 citations

Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper forms a general multi-user waterfilling-structured optimization problem including coupling constraints, and defines a low-complexity iterative distributed algorithm based on duality, consensus and fixed point mapping theory that shows its effectiveness.
Abstract: In this paper we present a distributed control approach for the multi-user multi-constrained waterfilling. This a specific category of distributed optimization for Networked Control Systems (NCSs), where agents aim at optimizing a non-separable global objective function while satisfying both local constraints and coupling constraints. Differently from the existing literature, in the considered setting we adopt a fully distributed mechanism where communication is allowed between neighbors only. First, we formulate a general multi-user waterfilling-structured optimization problem including coupling constraints, which may represent many engineering distributed control problems. Successively, we define a low-complexity iterative distributed algorithm based on duality, consensus and fixed point mapping theory. Finally, applying the technique to a simulated case referring to the electric vehicles optimal charging problem, we show its effectiveness.

5 citations

References
More filters
Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

17,433 citations


"Distributed constrained convex opti..." refers methods in this paper

  • ...To exploit the particular problem structure and alleviate these difficulties, dual decomposition techniques (see [11], and references therein), or approaches based on the alternating direction method of multipliers [12], are often employed, relying on the separable structure of the problem after dualizing the coupling constraint....


Journal ArticleDOI
TL;DR: The authors' convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.
Abstract: We study a distributed computation model for optimizing a sum of convex objective functions corresponding to multiple agents. For solving this (not necessarily smooth) optimization problem, we consider a subgradient method that is distributed among the agents. The method involves every agent minimizing his/her own objective function while exchanging information locally with other agents in the network over a time-varying topology. We provide convergence results and convergence rate estimates for the subgradient method. Our convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.

3,238 citations


"Distributed constrained convex opti..." refers background in this paper

  • ...In particular, in [6], [7], [8], [9] a gradient/subgradient based consensus approach is followed to address problems where agents with their own objective functions and constraints are coupled via a common decision vector....


  • ...For details about the interpretation of Assumptions 5 and 6, the reader is referred to [6], [10], [7]....


Journal ArticleDOI
TL;DR: In this article, the authors present a distributed algorithm that can be used by multiple agents to align their estimates with a particular value over a network with time-varying connectivity.
Abstract: We present distributed algorithms that can be used by multiple agents to align their estimates with a particular value over a network with time-varying connectivity. Our framework is general in that this value can represent a consensus value among multiple agents or an optimal solution of an optimization problem, where the global objective function is a combination of local agent objective functions. Our main focus is on constrained problems where the estimates of each agent are restricted to lie in different convex sets. To highlight the effects of constraints, we first consider a constrained consensus problem and present a distributed "projected consensus algorithm" in which agents combine their local averaging operation with projection on their individual constraint sets. This algorithm can be viewed as a version of an alternating projection method with weights that are varying over time and across agents. We establish convergence and convergence rate results for the projected consensus algorithm. We next study a constrained optimization problem for optimizing the sum of local objective functions of the agents subject to the intersection of their local constraint sets. We present a distributed "projected subgradient algorithm" which involves each agent performing a local averaging operation, taking a subgradient step to minimize its own objective function, and projecting on its constraint set. We show that, with an appropriately selected stepsize rule, the agent estimates generated by this algorithm converge to the same optimal solution for the cases when the weights are constant and equal, and when the weights are time-varying but all agents have the same constraint set.

1,773 citations

Journal ArticleDOI
TL;DR: A model for asynchronous distributed computation is presented and it is shown that natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms retain the desirable convergence properties of their centralized counterparts.
Abstract: We present a model for asynchronous distributed computation and then proceed to analyze the convergence of natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms. We show that such algorithms retain the desirable convergence properties of their centralized counterparts, provided that the time between consecutive interprocessor communications and the communication delays are not too large.

1,761 citations

Frequently Asked Questions (7)
Q1. What contributions have the authors mentioned in the paper "Distributed constrained convex optimization and consensus via dual decomposition and proximal minimization" ?

The authors consider a general class of convex optimization problems over time-varying, multi-agent networks, that naturally arise in many application domains like energy systems and wireless networks. The authors propose a novel distributed algorithm to deal with such problems based on a combination of dual decomposition and proximal minimization. Their approach is based on an iterative scheme that enables agents to reach consensus with respect to the dual variables, while preserving information privacy. The authors show convergence of the proposed algorithm to some optimal dual solution of the centralized problem counterpart, while the primal iterates generated by the algorithm converge to the set of optimal primal solutions. A numerical example demonstrating the efficacy of the proposed algorithm is also provided. 

Coefficient a_ij(k) is the weight that agent i attributes to the solution of agent j at iteration k; a_ij(k) = 0 means that agent j does not send any information to agent i at iteration k.
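The role of these weights can be seen in a minimal consensus iteration. The weight matrix below is a hypothetical fixed, doubly stochastic choice (in the time-varying setting the matrix may change with k); repeated averaging drives the agents' estimates to a common value.

```python
import numpy as np

# Doubly stochastic weights: a_ij > 0 iff agent j sends information to agent i.
# (Illustrative 3-agent chain topology; not taken from the paper.)
A = np.array([[0.6, 0.4, 0.0],
              [0.4, 0.2, 0.4],
              [0.0, 0.4, 0.6]])
lam = np.array([0.0, 3.0, 6.0])    # initial dual estimates of the three agents

for _ in range(100):
    lam = A @ lam                   # each agent averages its neighbours' values

print(lam)  # all entries approach the initial average, 3.0
```

Double stochasticity is what preserves the average across iterations, so consensus is reached on the mean of the initial estimates.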

Consider a time-varying network of m agents that communicate to solve the following optimization program:

P:  min_{ {x_i ∈ X_i}_{i=1}^m }  ∑_{i=1}^m f_i(x_i)
    subject to:  ∑_{i=1}^m g_i(x_i) ≤ 0,        (1)

where for each i = 1, …, m, x_i ∈ R^{n_i} is the vector of n_i decision variables of agent i, and f_i(·) : R^{n_i} → R is its local objective function.
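A concrete instance of this program (a hypothetical two-agent example, not taken from the paper) is f_1(x_1) = (x_1 − 2)^2, f_2(x_2) = (x_2 − 3)^2, X_i = [0, 2], with coupling constraint x_1 + x_2 − 3 ≤ 0. Dualizing the coupling constraint makes the problem separable, and a standard centralized dual subgradient iteration recovers the optimum; the step size is an illustrative choice.

```python
import numpy as np

c = np.array([2.0, 3.0])   # local objective parameters (assumed data)
lam, alpha = 0.0, 0.1      # single multiplier for the coupling constraint

for _ in range(300):
    # With lambda fixed, the Lagrangian splits: each agent solves
    # min_{x_i in [0,2]} (x_i - c_i)^2 + lam * x_i  =>  x_i = clip(c_i - lam/2, 0, 2)
    x = np.clip(c - lam / 2.0, 0.0, 2.0)
    # Projected subgradient ascent on the dual, using the coupling residual
    lam = max(0.0, lam + alpha * (x.sum() - 3.0))

print(lam, x)  # lam -> 2, x -> (1, 2): the constrained optimum
```

At the optimum the coupling constraint is active (x_1 + x_2 = 3), which is why the multiplier settles at a strictly positive value.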

Notice that, due to the separable structure of the objective and the constraint functions in P,

φ(λ) = ∑_{i=1}^m φ_i(λ) = ∑_{i=1}^m min_{x_i ∈ X_i} L_i(x_i, λ).
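This separability can be checked numerically on a small hypothetical instance: minimizing the joint Lagrangian over the product of the local sets gives the same value as summing the per-agent minima. The instance below (quadratic objectives, a linear split of the coupling constraint, a grid search over each X_i) is purely illustrative.

```python
import numpy as np

# Hypothetical two-agent instance: f_i(x_i) = (x_i - c_i)^2, X_i = [0, 2],
# coupling terms g_i(x_i) = x_i - 1.5 (so the coupling is x_1 + x_2 - 3 <= 0).
c, lam = np.array([2.0, 3.0]), 1.0
grid = np.linspace(0.0, 2.0, 401)           # grid over each local set X_i

def L_i(x, i):
    """Per-agent Lagrangian term: f_i(x) + lam * g_i(x)."""
    return (x - c[i]) ** 2 + lam * (x - 1.5)

# phi_i(lam): each agent minimizes its own term independently...
phi_sep = sum(L_i(grid, i).min() for i in range(2))
# ...which equals minimizing the joint Lagrangian over X_1 x X_2.
X1, X2 = np.meshgrid(grid, grid)
phi_joint = (L_i(X1, 0) + L_i(X2, 1)).min()
assert abs(phi_sep - phi_joint) < 1e-9      # separability of the dual function
```

This is what lets each agent evaluate its own contribution φ_i(λ) without knowing the other agents' objectives or constraints.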

It is perhaps worth noticing that this setup also comprises equality coupling constraints like ∑_{i=1}^m g̃_i(x_i) = 0. To this purpose it is enough to define g_i = [g̃_i^T  −g̃_i^T]^T.

Such an auxiliary sequence is referred to as a primal recovery procedure and is often used in dual decomposition methods, since it has better convergence properties than {x_i(k)}_{k≥0} [24], [22], [21].
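A common primal recovery choice is the running (ergodic) average of the primal iterates, which can be updated recursively. The sketch below uses a hypothetical oscillating primal sequence of the kind dual methods can produce when the primal iterates jump between extreme points of X_i; the sequence itself is invented for illustration.

```python
# Primal iterates from a dual method may oscillate between vertices of X_i;
# their running average can still converge to a primal optimizer.
x_iter = [2.0 if k % 2 == 0 else 0.0 for k in range(1000)]  # hypothetical oscillation

xhat = 0.0
for k, xk in enumerate(x_iter, start=1):
    xhat += (xk - xhat) / k   # recursive running average: xhat_k = (1/k) * sum_{t<=k} x_t

print(xhat)  # -> 1.0: the average of the oscillating sequence
```

The recursive form avoids storing the whole history, which matters when each agent maintains its own recovered primal sequence.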

To account for information privacy and facilitate the development of a computationally tractable solution, the authors seek a distributed strategy.