Impact of Communication Delays on the Convergence Rate of Distributed Optimization Algorithms

Home
/
Papers
/
Impact of Communication Delays on the Convergence Rate of Distributed Optimization Algorithms

Posted Content•

Impact of Communication Delays on the Convergence Rate of Distributed Optimization Algorithms

Thinh T. Doan, Carolyn L. Beck, R. Srikant

10 Aug 2017-

TL;DR: This paper provides convergence results and convergence rate estimates of the gradient-based consensus algorithm in the presence of uniform, but possibly arbitrarily large, communication delays between the nodes.

read less

Abstract: In this paper, we study distributed optimization problems over a network of nodes, where the goal is to optimize a global objective composed of a sum of local functions. For solving such optimization problems, we are interested in a popular distributed gradient-based consensus algorithm, which only requires local computation and communication. A significant challenge in this area is to analyze the convergence rate of such algorithms in the presence of communication delays that are inevitable in distributed systems. We provide convergence results and convergence rate estimates of the gradient-based consensus algorithm in the presence of uniform, but possibly arbitrarily large, communication delays between the nodes. Our results explicitly characterize the rate of convergence of the algorithm as a function of the network size, topology, and the inter-node communication delays.

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Achieving Linear Convergence in Distributed Asynchronous Multiagent Optimization

[...]

Ye Tian¹, Ying Sun¹, Gesualdo Scutari¹•Institutions (1)

Purdue University¹

03 Mar 2020-IEEE Transactions on Automatic Control

TL;DR: This article proposes a general distributed asynchronous algorithmic framework whereby agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination, and proves that this is the first distributed algorithm with provable geometric convergence rate in such a general asynchronous setting.

...read moreread less

Abstract: This article studies multiagent (convex and nonconvex ) optimization over static digraphs. We propose a general distributed asynchronous algorithmic framework whereby 1) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and 2) they can perform their local computations using (possibly) delayed, out-of-sync information from the other agents. Delays need not be known to the agent or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the average of agents’ gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved, when nonconvex problems and/or diminishing, uncoordinated step-sizes are considered. To the best of our knowledge, this is the first distributed algorithm with provable geometric convergence rate in such a general asynchronous setting. Preliminary numerical results demonstrate the efficacy of the proposed algorithm and validate our theoretical findings.

...read moreread less

68 citations

Proceedings Article•DOI•

The Role of Network Topology for Distributed Machine Learning

[...]

Giovanni Neglia¹, Gianmarco Calbi¹, Don Towsley², Gayane Vardoyan²•Institutions (2)

French Institute for Research in Computer Science and Automation¹, University of Massachusetts Amherst²

29 Apr 2019

TL;DR: It is shown that, even when communication overhead may be neglected, the clique is not necessarily the most effective topology, as commonly assumed in previous works.

...read moreread less

Abstract: Many learning problems are formulated as minimization of some loss function on a training set of examples. Distributed gradient methods on a cluster are often used for this purpose. In this paper, we study how the variability of task execution times at cluster nodes affects the system throughput. In particular, a simple but accurate model allows us to quantity how the time to solve the minimization problem depends on the network of information exchanges among the nodes. Interestingly, we show that, even when communication overhead may be neglected, the clique is not necessarily the most effective topology, as commonly assumed in previous works.

...read moreread less

43 citations

Cites background from "Impact of Communication Delays on t..."

...Recently, [22] has studied the effect of a communication delay on the convergence rate of distributed gradient methods...
[...]

Proceedings Article•DOI•

ASY-SONATA: Achieving Linear Convergence in Distributed Asynchronous Multiagent Optimization

[...]

Ye Tian¹, Ying Sun¹, Gesualdo Scutari¹•Institutions (1)

Purdue University¹

01 Oct 2018

TL;DR: This paper proves that this is the first distributed algorithm with provable geometric convergence rate in such a general asynchonous setting, when applied to strongly convex functions.

...read moreread less

Abstract: This papers studies multi-agent (convex and nonconvex) optimization over static digraphs. We propose a general distributed asynchronous algorithmic framework whereby i) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and ii) they can perform their local computations using (possibly) delayed, out-of-sync information from their neighbors. Delays need not be known to the agents or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the sum of agents’ gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved, when nonconvex problems and/or diminishing, uncoordinated step-sizes are employed. To the best of our knowledge, this is the first distributed algorithm with provable geometric convergence rate in such a general asynchonous setting.

...read moreread less

27 citations

Cites background or methods from "Impact of Communication Delays on t..."

...They study the impact of delayed gradient information [22], [23] or communication delays (fixed [24], uniform [23], [27] or time-varying [25], [26]) on the convergence rate of distributed gradient (proximal [22], [23] or projection-based [26], [27]) algorithms or dual-averaging distributed-based schemes [24], [25]....
[...]
...(b) Synchronous activations and delays [22]–[27]: These schemes considered distributed constrained convex optimization over undirected graphs....
[...]
...Distributed methods exploring (some form of) asynchrony over networks with no centralized node have been studied in [2]–[8], [17]–[27]....
[...]

Posted Content•

ASY-SONATA: Achieving Geometric Convergence for Distributed Asynchronous Optimization.

[...]

Ye Tian, Ying Sun, Bin Du, Gesualdo Scutari

28 Mar 2018

TL;DR: A sublinear convergence rate is proved, when nonconvex problems and/or diminishing, uncoordinated step-sizes are considered, and it is proved that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small.

...read moreread less

Abstract: Can one obtain a geometrically convergent algorithm for distributed asynchronous multi-agent optimization? This paper provides a positive answer to this open question. The proposed algorithm solves multi-agent (convex and nonconvex) optimization over static digraphs and it is asynchronous, in the following sense: i) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and ii) they can perform their local computations using (possibly) delayed, out-of-sync information from the other agents. Delays need not obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the average of agents' gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved, when nonconvex problems and/or diminishing, uncoordinated step-sizes are considered. Preliminary numerical results demonstrate the efficacy of the proposed algorithm and validate our theoretical findings.

...read moreread less

19 citations

Cites background from "Impact of Communication Delays on t..."

...(b) Synchronous activations and delays [17]–[22]: These schemes consider synchronous agents’ activations/updates, subject to fixed computation delays (agents can use their outdated gradient information) [17], [18] or communication delays–fixed [19], [22] or time-varying [20], [21]....
[...]

Posted Content•

Achieving Linear Convergence in Distributed Asynchronous Multi-agent Optimization

[...]

Ye Tian, Ying Sun, Gesualdo Scutari

28 Mar 2018-arXiv: Optimization and Control

TL;DR: In this article, the authors proposed a general distributed algorithm with provable geometric convergence rate in such a general asynchronous setting, where agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination.

...read moreread less

Abstract: This papers studies multi-agent (convex and \emph{nonconvex}) optimization over static digraphs. We propose a general distributed \emph{asynchronous} algorithmic framework whereby i) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and ii) they can perform their local computations using (possibly) delayed, out-of-sync information from the other agents. Delays need not be known to the agent or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the average of agents' gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is {sufficiently small}. A sublinear convergence rate is proved, when nonconvex problems and/or diminishing, {\it uncoordinated} step-sizes are considered. To the best of our knowledge, this is the first distributed algorithm with provable geometric convergence rate in such a general asynchronous setting. Preliminary numerical results demonstrate the efficacy of the proposed algorithm and validate our theoretical findings.

...read moreread less

17 citations

References

PDF

Open Access

More filters

Book•

Matrix Analysis

[...]

Roger A. Horn¹, Charles R. Johnson²•Institutions (2)

Johns Hopkins University¹, Clemson University²

01 Jan 1985

TL;DR: In this article, the authors present results of both classic and recent matrix analyses using canonical forms as a unifying theme, and demonstrate their importance in a variety of applications, such as linear algebra and matrix theory.

...read moreread less

Abstract: Linear algebra and matrix theory are fundamental tools in mathematical and physical science, as well as fertile fields for research. This new edition of the acclaimed text presents results of both classic and recent matrix analyses using canonical forms as a unifying theme, and demonstrates their importance in a variety of applications. The authors have thoroughly revised, updated, and expanded on the first edition. The book opens with an extended summary of useful concepts and facts and includes numerous new topics and features, such as: - New sections on the singular value and CS decompositions - New applications of the Jordan canonical form - A new section on the Weyr canonical form - Expanded treatments of inverse problems and of block matrices - A central role for the Von Neumann trace theorem - A new appendix with a modern list of canonical forms for a pair of Hermitian matrices and for a symmetric-skew symmetric pair - Expanded index with more than 3,500 entries for easy reference - More than 1,100 problems and exercises, many with hints, to reinforce understanding and develop auxiliary themes such as finite-dimensional quantum systems, the compound and adjugate matrices, and the Loewner ellipsoid - A new appendix provides a collection of problem-solving hints.

...read moreread less

23,986 citations

Book•

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

[...]

Stephen Boyd¹, Neal Parikh¹, Eric Chu¹, Borja Peleato¹, Jonathan Eckstein² - Show less +1 more•Institutions (2)

Stanford University¹, Rutgers University²

23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

...read moreread less

Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.

...read moreread less

17,433 citations

"Impact of Communication Delays on t..." refers background in this paper

..., an exponential convergence rate under assumptions of strong convexity and smoothness of objective functions; see for example the work in [3, 15, 16, 27, 34]....
[...]

Journal Article•DOI•

Algebraic connectivity of graphs

[...]

Miroslav Fiedler

01 Jan 1973-Czechoslovak Mathematical Journal

3,888 citations

Book•

Understanding Machine Learning: From Theory To Algorithms

[...]

Shai Shalev-Shwartz¹, Shai Ben-David²•Institutions (2)

Hebrew University of Jerusalem¹, University of Waterloo²

01 Jan 2015

TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.

...read moreread less

Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.

...read moreread less

3,857 citations

"Impact of Communication Delays on t..." refers methods in this paper

...5 SIMULATIONS In this section, we apply the distributed gradient algorithm to study the well-known linear regression problem in statistical machine learning, which is the most popular technique for data fiing [12, 26]....
[...]

Book•

Introductory Lectures on Convex Optimization: A Basic Course

[...]

I︠u︡. E. Nesterov

14 Jan 2014

TL;DR: A polynomial-time interior-point method for linear optimization was proposed in this paper, where the complexity bound was not only in its complexity, but also in the theoretical pre- diction of its high efficiency was supported by excellent computational results.

...read moreread less

Abstract: It was in the middle of the 1980s, when the seminal paper by Kar- markar opened a new epoch in nonlinear optimization The importance of this paper, containing a new polynomial-time algorithm for linear op- timization problems, was not only in its complexity bound At that time, the most surprising feature of this algorithm was that the theoretical pre- diction of its high efficiency was supported by excellent computational results This unusual fact dramatically changed the style and direc- tions of the research in nonlinear optimization Thereafter it became more and more common that the new methods were provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments In a new rapidly develop- ing field, which got the name "polynomial-time interior-point methods", such a justification was obligatory Afteralmost fifteen years of intensive research, the main results of this development started to appear in monographs [12, 14, 16, 17, 18, 19] Approximately at that time the author was asked to prepare a new course on nonlinear optimization for graduate students The idea was to create a course which would reflect the new developments in the field Actually, this was a major challenge At the time only the theory of interior-point methods for linear optimization was polished enough to be explained to students The general theory of self-concordant functions had appeared in print only once in the form of research monograph [12]

...read moreread less

3,372 citations