Distributed delayed stochastic optimization
Alekh Agarwal, John C. Duchi
pp. 5451-5452
TLDR
In this paper, the authors analyze the convergence of gradient-based distributed optimization algorithms that base their updates on delayed stochastic gradient information, and show that the delay is asymptotically negligible.
Abstract
We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to gradient-based distributed optimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. We take motivation from statistical problems where the size of the data is so large that it cannot fit on one computer; with the advent of huge datasets in biology, astronomy, and the internet, such problems are now common. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible and we can achieve order-optimal convergence results. We exhibit n-node architectures whose optimization error in stochastic problems—in spite of asynchronous delays—scales asymptotically as O(1/√nT) after T iterations. This rate is known to be optimal for a distributed system with n nodes even in the absence of delays. We additionally complement our theoretical results with numerical experiments on a logistic regression task.
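The master-worker delayed-update scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm or experimental setup: the quadratic toy objective, step-size schedule, and fixed delay of 10 steps are all assumptions made for the example.

```python
import numpy as np

def delayed_sgd(grad, x0, T, tau, step):
    """Gradient descent where the update at time t uses a gradient
    evaluated at the iterate from time t - tau, mimicking a master
    node applying delayed gradients received from worker nodes."""
    history = [np.asarray(x0, dtype=float)]
    x = history[0].copy()
    for t in range(T):
        stale = history[max(0, t - tau)]  # parameters the worker actually saw
        g = grad(stale)                   # delayed stochastic gradient
        x = x - step(t) * g
        history.append(x.copy())
    return x

# Toy smooth problem: f(x) = 0.5 * ||x||^2, so grad f(x) = x, minimized at 0.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.01 * rng.standard_normal(x.shape)

x_final = delayed_sgd(noisy_grad, x0=[5.0, -3.0], T=2000, tau=10,
                      step=lambda t: 1.0 / (t + 50))
```

With a diminishing step size the iterates approach the minimizer even though every gradient is ten steps stale, illustrating the paper's point that the delay penalty vanishes asymptotically.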
Citations
Book
Adaptation, Learning, and Optimization Over Networks
TL;DR: The limits of performance of distributed solutions are examined, and procedures that help realize their potential more fully are discussed; a statistical framework is adopted to derive performance results that elucidate the mean-square stability, convergence, and steady-state behavior of the learning networks.
Proceedings Article
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin-Kyu Kim, Phillip B. Gibbons, Garth A. Gibson, Greg Ganger, Eric P. Xing
TL;DR: A parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees.
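The bounded-staleness rule at the heart of the SSP model can be sketched with a simple vector clock. The class and method names below are hypothetical, invented for illustration; they are not the actual parameter-server API.

```python
class SSPClock:
    """Stale Synchronous Parallel sketch: a worker may begin a new clock
    only while it is at most `staleness` clocks ahead of the slowest
    worker, so fast workers keep computing instead of waiting at a
    barrier, yet no read is ever unboundedly stale."""
    def __init__(self, n_workers, staleness):
        self.clocks = [0] * n_workers
        self.staleness = staleness

    def tick(self, worker):
        # Advance this worker's clock, unless it would exceed the bound.
        if self.clocks[worker] - min(self.clocks) > self.staleness:
            return False  # must block and wait for stragglers
        self.clocks[worker] += 1
        return True

clock = SSPClock(n_workers=3, staleness=2)
for _ in range(3):
    clock.tick(0)            # a fast worker races ahead to clock 3
blocked = not clock.tick(0)  # a fourth tick would violate the bound
```

The design point SSP makes is that this bounded asynchrony maximizes useful work while still permitting convergence guarantees, in contrast to fully synchronous (slow) or fully asynchronous (no guarantee) execution.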
Journal ArticleDOI
Federated Learning With Differential Privacy: Algorithms and Performance Analysis
Kang Wei, Jun Li, Ming Ding, Chuan Ma, Howard H. Yang, Farhad Farokhi, Shi Jin, Tony Q. S. Quek, H. Vincent Poor
TL;DR: A novel framework based on the concept of differential privacy is proposed, in which artificial noise is added to parameters at the clients' side before aggregation, namely, noising before model aggregation FL (NbAFL).
Proceedings Article
Communication Efficient Distributed Machine Learning with the Parameter Server
TL;DR: An in-depth analysis of two large-scale machine learning problems, ranging from l1-regularized logistic regression on CPUs to reconstruction ICA on GPUs, using 636 TB of real data with hundreds of billions of samples and dimensions, is presented.
Journal ArticleDOI
Petuum: A New Platform for Distributed Machine Learning on Big Data
Eric P. Xing, Qirong Ho, Wei Dai, Jin-Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu
TL;DR: This work proposes a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.
References
Journal ArticleDOI
A Stochastic Approximation Method
Herbert Robbins, Sutton Monro
TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.
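The Robbins-Monro scheme sets x_{n+1} = x_n - a_n * Y_n, where Y_n is a noisy observation of a function M at x_n and the step sizes satisfy sum(a_n) = infinity and sum(a_n^2) < infinity. A minimal sketch, assuming the standard steps a_n = 1/n and an illustrative target M(x) = x - 2 that is not from the paper:

```python
import random

def robbins_monro(noisy_obs, x0, n_iter):
    """Robbins-Monro iteration x_{n+1} = x_n - a_n * Y_n with a_n = 1/n,
    which satisfies sum a_n = inf and sum a_n^2 < inf, so x_n tends in
    probability to the root theta of the underlying function M."""
    x = x0
    for n in range(1, n_iter + 1):
        x -= (1.0 / n) * noisy_obs(x)
    return x

random.seed(0)
# M(x) = x - 2 observed through additive Gaussian noise; the root is theta = 2.
obs = lambda x: (x - 2.0) + random.gauss(0.0, 0.1)
theta_hat = robbins_monro(obs, x0=10.0, n_iter=5000)
```

This iteration is the ancestor of stochastic gradient descent: taking Y_n to be a stochastic gradient recovers the update analyzed (with delays) in the main paper above.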
Book
Parallel and Distributed Computation: Numerical Methods
TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
Journal ArticleDOI
Distributed Subgradient Methods for Multi-Agent Optimization
Angelia Nedic, Asuman Ozdaglar
TL;DR: The authors' convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.
Journal ArticleDOI
RCV1: A New Benchmark Collection for Text Categorization Research
TL;DR: This work describes the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data.
Book
Problem complexity and method efficiency in optimization
TL;DR: A theory of the intrinsic complexity of optimization problems is developed, establishing lower bounds on the number of oracle queries any method must make and exhibiting methods whose efficiency matches these bounds.