Open Access Proceedings Article

From Local SGD to Local Fixed Point Methods for Federated Learning

TLDR
This work considers the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting, and investigates two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
Abstract
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computations done locally on a mobile device. We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations. In both cases, the goal is to limit communication of the locally-computed variables, which is often the bottleneck in distributed frameworks. We perform convergence analysis of both methods and conduct a number of experiments highlighting the benefits of our approach.
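To make the setup concrete, here is a minimal, hypothetical sketch (not the paper's exact algorithm) using synthetic affine contractions T_i(x) = A_i x + b_i in NumPy. It illustrates both communication-limiting strategies described in the abstract: averaging after a fixed number H of local steps, and averaging at each step only with probability p. As the abstract notes, when several local steps are taken between communications, the limit point is in general only an approximation of the fixed point of the average operator, so the printed residual is small but nonzero.

```python
import numpy as np

# Hypothetical illustration: each "device" i holds an affine contraction
# T_i(x) = A_i @ x + b_i, and we seek a(n approximate) fixed point of the
# average operator (1/M) * sum_i T_i while limiting communication.

rng = np.random.default_rng(0)
M, d = 5, 10                                  # number of devices, dimension

ops = []
for _ in range(M):
    A = rng.standard_normal((d, d))
    A *= 0.5 / np.linalg.norm(A, 2)           # rescale so each T_i is a contraction
    b = rng.standard_normal(d)
    ops.append((A, b))

def run(total_steps=200, H=None, p=None):
    """Local fixed-point iterations with either periodic (every H steps)
    or randomized (probability p) averaging of the local variables."""
    z = [np.zeros(d) for _ in range(M)]       # one local iterate per device
    for t in range(1, total_steps + 1):
        z = [A @ zi + b for (A, b), zi in zip(ops, z)]   # local step z_i <- T_i(z_i)
        communicate = (H is not None and t % H == 0) or \
                      (p is not None and rng.random() < p)
        if communicate:                       # one communication round: average and sync
            avg = np.mean(z, axis=0)
            z = [avg.copy() for _ in range(M)]
    return np.mean(z, axis=0)

for label, kwargs in [("every H=4 steps", dict(H=4)), ("with prob. p=0.25", dict(p=0.25))]:
    x = run(**kwargs)
    residual = np.linalg.norm(x - np.mean([A @ x + b for A, b in ops], axis=0))
    print(f"averaging {label}: fixed-point residual = {residual:.3e}")
```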


Citations
Proceedings Article

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

TL;DR: ProxSkip is a surprisingly simple and provably effective method for minimizing the sum of a smooth function and an expensive nonsmooth proximable function, and it offers an effective acceleration of communication complexity.
Proceedings Article

Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

TL;DR: A data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation and develops customized label sampling and class-level ensemble to derive maximum utilization of knowledge.
Proceedings Article

Towards Understanding Biased Client Selection in Federated Learning

TL;DR: This work presents the convergence analysis of federated learning with biased client selection and quantifies how the bias affects convergence speed, and proposes Power-of-Choice, a communication- and computation-efficient client selection framework that spans the trade-off between convergence speed and solution bias.
Posted Content

On the Outsized Importance of Learning Rates in Local Update Methods

TL;DR: This work proves that, for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which it exactly characterizes, and uses this theory to derive novel convergence rates for federated averaging that showcase the trade-off between the condition number of the surrogate loss and its alignment with the true loss function.