Journal ArticleDOI

Federated Optimization Algorithms with Random Reshuffling and Gradient Compression

TLDR
This work develops Q-RR, a distributed variant of Random Reshuffling with gradient compression; shows how to reduce the variance coming from gradient quantization through the use of control iterates; and proposes Q-NASTYA, a variant of Q-RR that better fits Federated Learning applications.
Abstract
Gradient compression is a popular technique for improving the communication complexity of stochastic first-order methods in distributed training of machine learning models. However, existing works consider only with-replacement sampling of stochastic gradients. In contrast, it is well known in practice and was recently confirmed in theory that stochastic methods based on without-replacement sampling, e.g., the Random Reshuffling (RR) method, perform better than methods that sample the gradients with replacement. In this work, we close this gap in the literature and provide the first analysis of methods with gradient compression and without-replacement sampling. We first develop a distributed variant of random reshuffling with gradient compression (Q-RR), and show how to reduce the variance coming from gradient quantization through the use of control iterates. Next, to better fit Federated Learning applications, we incorporate local computation and propose a variant of Q-RR called Q-NASTYA. Q-NASTYA uses local gradient steps and different local and global stepsizes. We then show how to reduce the compression variance in this setting as well. Finally, we prove convergence results for the proposed methods and outline several settings in which they improve upon existing algorithms.
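To make the method description above concrete, here is a minimal single-process simulation sketch of one Q-RR-style epoch: every node reshuffles its local data, compresses the stochastic gradient of its current sample, and the server averages the compressed messages and takes a step. The compressor (rand_k), the data layout, and the stepsize are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def rand_k(v, k, rng):
    """Random-k sparsification: keep k random coordinates and rescale so the
    compressor is unbiased (one common compressor; the paper's may differ)."""
    mask = np.zeros(v.size)
    idx = rng.choice(v.size, size=k, replace=False)
    mask[idx] = v.size / k
    return v * mask

def q_rr_epoch(x, local_grads, gamma, k, rng):
    """One epoch of a Q-RR-style update (illustrative sketch).

    x           : current model, shape (d,)
    local_grads : local_grads[m][i] is a callable returning the gradient of the
                  i-th local function on node m at a given point
    gamma       : stepsize (a placeholder; the paper analyzes specific choices)
    """
    n = len(local_grads[0])                              # local dataset size per node
    perms = [rng.permutation(n) for _ in local_grads]    # each node reshuffles without replacement
    for i in range(n):
        # each node compresses the gradient of its i-th permuted sample ...
        msgs = [rand_k(grads[perm[i]](x), k, rng)
                for grads, perm in zip(local_grads, perms)]
        # ... and the server averages the compressed gradients and takes a step
        x = x - gamma * np.mean(msgs, axis=0)
    return x
```

The variance-reduced variant mentioned in the abstract would, roughly speaking, compress differences between gradients and control iterates rather than the raw gradients; that extra bookkeeping, and the local steps and two stepsizes of Q-NASTYA, are omitted from this sketch.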

Citations
Journal ArticleDOI

Provably Doubly Accelerated Federated Learning: The First Theoretically Successful Combination of Local Training and Compressed Communication

TL;DR: This paper proposes the first algorithm for distributed optimization and federated learning that harnesses local training and compressed communication jointly and converges linearly to an exact solution, with a doubly accelerated rate.
Journal ArticleDOI

FedVQCS: Federated Learning via Vector Quantized Compressed Sensing

TL;DR: Simulation results on the MNIST and CIFAR-10 datasets demonstrate that the proposed framework provides more than a 2.5% increase in classification accuracy compared to state-of-the-art FL frameworks when the communication overhead of the local model update transmission is less than 0.1 bit per local model entry.
Journal ArticleDOI

Federated Learning with Regularized Client Participation

TL;DR: In this article, the authors propose a regularized client participation scheme in which each client joins the learning process once every $R$ communication rounds (a period referred to as a meta epoch), which leads to a reduction in the variance caused by client sampling.
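As a rough illustration of the participation pattern described above (not the paper's exact scheme), the sketch below builds a schedule in which each of the clients participates exactly once per meta epoch of R = num_clients / cohort_size rounds; the cohort size and reshuffling rule are assumptions.

```python
import numpy as np

def regularized_participation(num_clients, cohort_size, num_meta_epochs, seed=0):
    """Yield (round_index, cohort): every client appears exactly once per meta
    epoch of R = num_clients // cohort_size rounds (illustrative sketch)."""
    assert num_clients % cohort_size == 0
    rounds_per_meta_epoch = num_clients // cohort_size
    rng = np.random.default_rng(seed)
    t = 0
    for _ in range(num_meta_epochs):
        order = rng.permutation(num_clients)     # reshuffle the clients each meta epoch
        for r in range(rounds_per_meta_epoch):
            cohort = order[r * cohort_size:(r + 1) * cohort_size].tolist()
            yield t, cohort
            t += 1

# Example: 8 clients, cohorts of 2, so a meta epoch spans R = 4 rounds.
for t, cohort in regularized_participation(8, 2, num_meta_epochs=1):
    print(t, cohort)
```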

CD-GraB: Coordinating Distributed Example Orders for Provably Accelerated Training

TL;DR: Coordinated Distributed Gradient Balancing (CD-GraB) uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings.
Journal ArticleDOI

Improving Accelerated Federated Learning with Compression and Importance Sampling

TL;DR: In this paper, the authors present a complete method for federated learning that incorporates all necessary ingredients, namely Local Training, Compression, and Partial Participation, and obtain state-of-the-art convergence guarantees in the considered setting.
References
Posted Content

Deep Residual Learning for Image Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
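For reference, a minimal residual block in PyTorch showing the core idea of the summary above: the block learns a residual F(x) and adds it to an identity shortcut. This is a simplified sketch (identity shortcuts only), not the full architecture from the paper.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x); the full ResNet also
    uses strided and projection shortcuts when shapes change."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # shortcut: the block learns the residual H(x) - x
```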
Journal ArticleDOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
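As a small usage illustration of the kind of problem LIBSVM addresses, the snippet below trains an RBF-kernel SVM through scikit-learn's SVC, which wraps LIBSVM internally; it is not an example of LIBSVM's own command-line tools, and the dataset and hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC   # scikit-learn's SVC is built on LIBSVM

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF-kernel SVM; C and gamma are the parameters whose selection the paper discusses.
clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("class probabilities for one test point:", clf.predict_proba(X_test[:1]))
```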
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
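A minimal NumPy sketch of the batch normalization transform at training time: activations are normalized by the mini-batch mean and variance and then scaled and shifted by learnable parameters. Running statistics for inference and the backward pass are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Returns the normalized, scaled, and shifted activations."""
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalization step
    return gamma * x_hat + beta             # learnable scale and shift

x = np.random.randn(32, 4)
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))   # ~0 mean, ~1 std per feature
```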
Posted Content

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
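A small NumPy sketch of the two ideas in this summary: the PReLU activation with a learnable negative slope, and the rectifier-aware ("He") initialization with standard deviation sqrt(2 / fan_in). The shapes and the initial slope below are illustrative.

```python
import numpy as np

def prelu(x, a):
    """Parametric ReLU: identity for x > 0 and a learnable slope a for x <= 0."""
    return np.where(x > 0, x, a * x)

def he_init(fan_in, fan_out, rng=None):
    """He (Kaiming) initialization: zero-mean Gaussian with std sqrt(2 / fan_in),
    derived to preserve activation variance under ReLU-like rectifiers."""
    rng = rng or np.random.default_rng(0)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W = he_init(256, 128)
h = prelu(np.random.randn(10, 256) @ W, a=0.25)   # 0.25 is a common initial slope
print(h.shape)
```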
Posted Content

Communication-Efficient Learning of Deep Networks from Decentralized Data

TL;DR: This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
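A minimal sketch of the iterative model averaging at the core of this method (widely known as FedAvg): in each round, selected clients run a few local SGD steps from the current global model, and the server averages the returned models weighted by local dataset sizes. The client-selection rule, step counts, and learning rate here are placeholders.

```python
import numpy as np

def local_sgd(x, grad_fn, num_steps, lr):
    """Client-side update: a few SGD steps starting from the global model (illustrative)."""
    for _ in range(num_steps):
        x = x - lr * grad_fn(x)
    return x

def fedavg_round(x_global, clients, num_steps=10, lr=0.1):
    """One communication round of iterative model averaging.
    clients: list of (grad_fn, num_local_examples) pairs for the sampled clients."""
    updates, weights = [], []
    for grad_fn, n_k in clients:
        updates.append(local_sgd(x_global.copy(), grad_fn, num_steps, lr))
        weights.append(float(n_k))
    # data-size-weighted average of the client models
    return np.average(updates, axis=0, weights=weights)
```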
Trending Questions (1)
What are the top 3 federated learning SOTA algorithms?

The paper does not explicitly mention the top three federated learning SOTA algorithms.