Open Access Proceedings Article

Improving the Sample and Communication Complexity for Decentralized Non-Convex Optimization: Joint Gradient Estimation and Tracking

Haoran Sun, Songtao Lu, Mingyi Hong
Vol. 1, pp. 9217-9228
TLDR
This work proposes an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates).
Abstract
Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are m nodes in the system, and each node has a large number of samples (denoted as n). Unlike the majority of existing decentralized learning algorithms for either stochastic or finite-sum problems, our focus is on simultaneously reducing the total communication rounds among the nodes and the number of local data samples accessed. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve a certain ε-stationary solution of the deterministic finite-sum problem, the proposed algorithm achieves an O(mn^{1/2} ε^{-1}) sample complexity and an O(ε^{-1}) communication complexity. These bounds significantly improve upon the best existing bounds of O(mn ε^{-1}) and O(ε^{-1}), respectively. Similarly, for online problems, the proposed method achieves an O(m ε^{-3/2}) sample complexity and an O(ε^{-1}) communication complexity.

Department of ECE, University of Minnesota Twin Cities, Minneapolis, MN, USA; IBM Research AI, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA. Correspondence to: Haoran Sun <sun00111@umn.edu>, Songtao Lu <songtao@ibm.com>, Mingyi Hong <mhong@umn.edu>. Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).
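To make the two ingredients in the abstract concrete, below is a minimal numerical sketch of joint gradient estimation and tracking on a toy decentralized least-squares problem. The mixing matrix W, the SARAH-style recursive estimator, the step size alpha, the restart period q, and the minibatch size are all illustrative assumptions; this is a sketch of the idea, not the authors' reference implementation of D-GET.

```python
# Minimal sketch, in the spirit of D-GET, on a toy decentralized least-squares
# problem: each node mixes with its neighbors, estimates its local gradient
# from a subset of local samples, and tracks the global gradient.
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 4, 50, 10                      # nodes, samples per node, dimension
A = rng.normal(size=(m, n, d))           # local features
b = rng.normal(size=(m, n))              # local targets

# Doubly stochastic mixing matrix for a ring of m nodes (illustrative choice).
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

def local_grad(i, x, idx):
    """Gradient of (1/|idx|) * sum_{s in idx} 0.5 * (A[i,s] @ x - b[i,s])^2."""
    r = A[i, idx] @ x - b[i, idx]
    return A[i, idx].T @ r / len(idx)

alpha, q, batch = 0.05, 10, 5            # step size, restart period, minibatch
x = np.zeros((m, d))                     # local iterates
v = np.array([local_grad(i, x[i], np.arange(n)) for i in range(m)])  # local estimates
y = v.copy()                             # tracked global gradient estimates

for t in range(1, 201):
    x_new = W @ x - alpha * y            # mix with neighbors, descend along y
    v_new = np.empty_like(v)
    for i in range(m):
        if t % q == 0:                   # periodic full local gradient
            v_new[i] = local_grad(i, x_new[i], np.arange(n))
        else:                            # recursive (SARAH-style) estimate from a minibatch
            idx = rng.choice(n, size=batch, replace=False)
            v_new[i] = v[i] + local_grad(i, x_new[i], idx) - local_grad(i, x[i], idx)
    y = W @ y + v_new - v                # gradient tracking update
    x, v = x_new, v_new

print("consensus error:", np.linalg.norm(x - x.mean(0)))
print("avg gradient norm:", np.linalg.norm(np.mean(
    [local_grad(i, x.mean(0), np.arange(n)) for i in range(m)], axis=0)))
```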


Citations
Posted Content

BRIDGE: Byzantine-resilient Decentralized Gradient Descent

TL;DR: A Byzantine-resilient decentralized gradient descent (BRIDGE) method for decentralized learning that is more efficient and scalable in high-dimensional settings and deployable in networks with topologies beyond the star topology.
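As a rough illustration of the screening idea behind Byzantine-resilient decentralized methods of this kind, the sketch below aggregates neighbor models with a coordinate-wise trimmed mean before a local gradient step. The trimming rule, the parameter f, and the helper names are assumptions chosen for illustration, not the paper's exact BRIDGE variant.

```python
# Screening sketch: discard the f largest and f smallest values per coordinate
# among the received neighbor models, then take a local gradient step.
import numpy as np

def trimmed_mean(models, f):
    """Coordinate-wise trimmed mean over a (num_models, d) array."""
    s = np.sort(models, axis=0)
    return s[f:len(models) - f].mean(axis=0)

def screened_step(x_i, neighbor_models, local_grad, f, step):
    """One screened update at node i; local_grad maps a model to its gradient."""
    screened = trimmed_mean(np.vstack([neighbor_models, x_i[None]]), f)
    return screened - step * local_grad(x_i)

# Toy usage: 5 honest neighbors near the origin plus 1 outlier model.
rng = np.random.default_rng(1)
neighbors = np.vstack([rng.normal(scale=0.1, size=(5, 3)), [[10.0, 10.0, 10.0]]])
x_new = screened_step(np.zeros(3), neighbors, lambda x: x, f=1, step=0.1)
print(x_new)   # the outlier is trimmed away before averaging
```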
Journal ArticleDOI

A General Framework for Decentralized Optimization With First-Order Methods

TL;DR: In this paper, the authors provide a general framework of decentralized first-order gradient methods that is applicable to directed and undirected communication networks and show that much of the existing work on optimization and consensus can be related explicitly to this framework.
Journal ArticleDOI

Distributed Learning Systems with First-Order Methods

TL;DR: A brief introduction of some distributed learning techniques that have recently been developed, namely lossy communication compression (e.g., quantization and sparsification), asynchronous communication, and decentralized communication are provided.
Posted Content

A fast randomized incremental gradient method for decentralized non-convex optimization

TL;DR: This work shows almost sure and mean-squared convergence to a first-order stationary point and describes regimes of practical significance where GT-SAGA achieves a network-independent convergence rate and outperforms existing approaches.
References
Book

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
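For context, here is a compact sketch of global-variable consensus ADMM, the kind of distributed formulation this TL;DR refers to: local copies x_i are driven to agree with a shared variable z. The quadratic local losses, the penalty rho, and the variable names are illustrative choices, not an implementation from the book.

```python
# Consensus ADMM sketch: minimize sum_i 0.5*||A_i x - b_i||^2 subject to x_i = z.
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 4
A = [rng.normal(size=(8, d)) for _ in range(m)]
b = [rng.normal(size=8) for _ in range(m)]

rho = 1.0
x = np.zeros((m, d))
u = np.zeros((m, d))                       # scaled dual variables
z = np.zeros(d)                            # shared consensus variable

for _ in range(100):
    for i in range(m):                     # x-update: closed-form local solve
        x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(d),
                               A[i].T @ b[i] + rho * (z - u[i]))
    z = (x + u).mean(axis=0)               # z-update: averaging across nodes
    u += x - z                             # dual update

print("consensus residual:", np.linalg.norm(x - z))
```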
Book

Parallel and Distributed Computation: Numerical Methods

TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as reference algorithms for many of the computational approaches addressed later.
Journal ArticleDOI

Distributed Subgradient Methods for Multi-Agent Optimization

TL;DR: The authors' convergence rate results explicitly characterize the tradeoff between a desired accuracy of the generated approximate optimal solutions and the number of iterations needed to achieve the accuracy.
Posted Content

Federated Learning: Strategies for Improving Communication Efficiency

TL;DR: Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables (e.g., low-rank or a random mask); and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
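A small sketch of the "sketched updates" idea is shown below, under the assumption of a random mask plus a simple sign quantizer with a single scale; the paper's actual pipeline (structured low-rank updates, random rotations, probabilistic quantization) is richer, and the function and parameter names here are hypothetical.

```python
# Lossy update compression sketch: subsample with a random mask, quantize kept
# entries to {-s, +s}, and rescale on decompression to offset the subsampling.
import numpy as np

rng = np.random.default_rng(0)

def compress_update(delta, keep_prob=0.1):
    """Random-mask subsample, then quantize kept entries to a single scale."""
    mask = rng.random(delta.shape) < keep_prob
    kept = delta[mask]
    s = np.abs(kept).mean() if kept.size else 0.0   # one scale per update
    return mask, np.sign(kept), s, keep_prob

def decompress_update(mask, signs, s, keep_prob):
    out = np.zeros(mask.shape)
    out[mask] = signs * s / keep_prob               # rescale for the dropped entries
    return out

delta = rng.normal(size=10_000)                     # a client's model update
rec = decompress_update(*compress_update(delta))
print("relative error:", np.linalg.norm(rec - delta) / np.linalg.norm(delta))  # lossy by design
```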
Journal ArticleDOI

Fast linear iterations for distributed averaging

TL;DR: This work considers the problem of finding a linear iteration that yields distributed averaging consensus over a network, i.e., that asymptotically computes the average of some initial values given at the nodes, and gives several extensions and variations on the basic problem.
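The linear iteration in question is x(k+1) = W x(k). The sketch below uses Metropolis weights on a small connected graph as one simple valid choice of W (the paper studies how to choose W optimally), so each node's value converges to the average of the initial values; the graph and weights are illustrative.

```python
# Distributed averaging sketch: repeated multiplication by a doubly stochastic
# weight matrix W drives every node's value to the network-wide average.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]   # a small connected graph
n = 5
deg = np.zeros(n, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

W = np.zeros((n, n))
for i, j in edges:                          # Metropolis weights on each edge
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
W += np.diag(1.0 - W.sum(axis=1))           # self-weights make each row sum to 1

x0 = np.array([3.0, -1.0, 4.0, 0.0, 9.0])   # initial node values
x = x0.copy()
for _ in range(100):
    x = W @ x                               # each node averages with its neighbors
print(x, "target:", x0.mean())
```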