Open Access Proceedings Article (DOI)

Distributed delayed stochastic optimization

TLDR
In this paper, the authors analyzed the convergence of gradient-based distributed optimization algorithms that base their updates on delayed stochastic gradient information and showed that the delay is asymptotically negligible.
Abstract
We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to gradient-based distributed optimization algorithms where a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. We take motivation from statistical problems where the size of the data is so large that it cannot fit on one computer; with the advent of huge datasets in biology, astronomy, and the internet, such problems are now common. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible and we can achieve order-optimal convergence results. We show n-node architectures whose optimization error in stochastic problems—in spite of asynchronous delays—scales asymptotically as O(1/√nT) after T iterations. This rate is known to be optimal for a distributed system with n nodes even in the absence of delays. We additionally complement our theoretical results with numerical experiments on a logistic regression task.
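
For intuition, the master-worker scheme described in the abstract can be simulated in a few lines. The sketch below is illustrative only and not the authors' implementation: it assumes a toy noisy quadratic objective, a fixed delay, and an O(1/sqrt(t)) step size, and the names delayed_sgd, grad_fn, and step0 are hypothetical.

import numpy as np

def delayed_sgd(grad_fn, x0, T, delay, step0=0.1):
    # The master keeps the current iterate; workers are modeled as reading a
    # parameter snapshot that is `delay` updates old and returning a
    # stochastic gradient computed at that stale point.
    x = np.array(x0, dtype=float)
    history = [x.copy()]
    for t in range(1, T + 1):
        stale = history[max(0, len(history) - 1 - delay)]  # delayed iterate
        g = grad_fn(stale)                 # stochastic gradient at the stale point
        eta = step0 / np.sqrt(t)           # O(1/sqrt(t)) step size
        x = x - eta * g
        history.append(x.copy())
    return x

# Toy problem: noisy gradient of f(x) = 0.5 * ||x||^2 (minimizer at the origin).
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(delayed_sgd(noisy_grad, x0=np.ones(5), T=5000, delay=10))

Varying delay in this sketch gives a rough sense of the effect the paper analyzes; the formal result is that, for smooth stochastic problems, the delay contributes only asymptotically negligible terms to the optimization error.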


Citations
Book

Adaptation, Learning, and Optimization Over Networks

TL;DR: The limits of performance of distributed solutions are examined, and procedures that help bring forth their potential more fully are discussed; a useful statistical framework is adopted, and performance results are derived that elucidate the mean-square stability, convergence, and steady-state behavior of the learning networks.
Proceedings Article

More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server

TL;DR: A parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on ML algorithms, while still providing correctness guarantees.
Journal Article (DOI)

Federated Learning With Differential Privacy: Algorithms and Performance Analysis

TL;DR: Wang et al. propose a novel framework based on the concept of differential privacy, in which artificial noise is added to parameters at the clients' side before aggregation, namely noising before model aggregation FL (NbAFL).
Proceedings Article

Communication Efficient Distributed Machine Learning with the Parameter Server

TL;DR: An in-depth analysis of two large-scale machine learning problems, ranging from l1-regularized logistic regression on CPUs to reconstruction ICA on GPUs, using 636 TB of real data with hundreds of billions of samples and dimensions, is presented.
Journal Article (DOI)

Petuum: A New Platform for Distributed Machine Learning on Big Data

TL;DR: This work proposes a general-purpose framework, Petuum, that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions.
References
Journal Article (DOI)

A Stochastic Approximation Method

TL;DR: In this article, a method is presented for making successive experiments at levels x_1, x_2, ... in such a way that x_n will tend to θ in probability.
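
As a concrete illustration (not drawn from the paper itself), a Robbins-Monro style procedure can be sketched as the iteration x_{n+1} = x_n + a_n (alpha - y_n), where y_n is a noisy observation taken at level x_n and a_n is a decreasing step size such as c/n. The toy one-dimensional sketch below assumes this form; the names robbins_monro, observe, and alpha are illustrative.

import numpy as np

def robbins_monro(observe, alpha, x0, n_steps, c=1.0):
    # Iterate x_{n+1} = x_n + a_n * (alpha - y_n) with a_n = c / n,
    # where y_n is a noisy observation of the regression function at level x_n.
    x = float(x0)
    for n in range(1, n_steps + 1):
        y = observe(x)
        x = x + (c / n) * (alpha - y)
    return x

# Toy example: the regression function is M(x) = x observed with noise,
# so the level theta satisfying M(theta) = 0.5 should be recovered as 0.5.
rng = np.random.default_rng(1)
noisy_M = lambda x: x + 0.05 * rng.standard_normal()
print(robbins_monro(noisy_M, alpha=0.5, x0=0.0, n_steps=10000))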
Book

Parallel and Distributed Computation: Numerical Methods

TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
Journal Article (DOI)

Distributed Subgradient Methods for Multi-Agent Optimization

TL;DR: The authors' convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.
Journal Article (DOI)

RCV1: A New Benchmark Collection for Text Categorization Research

TL;DR: This work describes the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data.
Book

Problem Complexity and Method Efficiency in Optimization

TL;DR: This book analyzes the intrinsic complexity of optimization problems and the efficiency of optimization methods, relating the accuracy a method can guarantee to the information about the problem available to it.