Open Access Proceedings Article

From Local SGD to Local Fixed Point Methods for Federated Learning

TLDR
This work considers the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting, and investigates two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations.
Abstract
Most algorithms for solving optimization problems or finding saddle points of convex-concave functions are fixed-point algorithms. In this work we consider the generic problem of finding a fixed point of an average of operators, or an approximation thereof, in a distributed setting. Our work is motivated by the needs of federated learning. In this context, each local operator models the computations done locally on a mobile device. We investigate two strategies to achieve such a consensus: one based on a fixed number of local steps, and the other based on randomized computations. In both cases, the goal is to limit communication of the locally-computed variables, which is often the bottleneck in distributed frameworks. We perform convergence analysis of both methods and conduct a number of experiments highlighting the benefits of our approach.
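To make the setup concrete, here is a minimal, hypothetical sketch (not the paper's exact algorithm) using synthetic affine contractions T_i(x) = A_i x + b_i in NumPy. It illustrates both communication-limiting strategies described in the abstract: averaging after a fixed number H of local steps, and averaging at each step only with probability p. As the abstract notes, when several local steps are taken between communications, the limit point is in general only an approximation of the fixed point of the average operator, so the printed residual is small but nonzero.

```python
import numpy as np

# Hypothetical illustration: each "device" i holds an affine contraction
# T_i(x) = A_i @ x + b_i, and we seek a(n approximate) fixed point of the
# average operator (1/M) * sum_i T_i while limiting communication.

rng = np.random.default_rng(0)
M, d = 5, 10                                  # number of devices, dimension

ops = []
for _ in range(M):
    A = rng.standard_normal((d, d))
    A *= 0.5 / np.linalg.norm(A, 2)           # rescale so each T_i is a contraction
    b = rng.standard_normal(d)
    ops.append((A, b))

def run(total_steps=200, H=None, p=None):
    """Local fixed-point iterations with either periodic (every H steps)
    or randomized (probability p) averaging of the local variables."""
    z = [np.zeros(d) for _ in range(M)]       # one local iterate per device
    for t in range(1, total_steps + 1):
        z = [A @ zi + b for (A, b), zi in zip(ops, z)]   # local step z_i <- T_i(z_i)
        communicate = (H is not None and t % H == 0) or \
                      (p is not None and rng.random() < p)
        if communicate:                       # one communication round: average and sync
            avg = np.mean(z, axis=0)
            z = [avg.copy() for _ in range(M)]
    return np.mean(z, axis=0)

for label, kwargs in [("every H=4 steps", dict(H=4)), ("with prob. p=0.25", dict(p=0.25))]:
    x = run(**kwargs)
    residual = np.linalg.norm(x - np.mean([A @ x + b for A, b in ops], axis=0))
    print(f"averaging {label}: fixed-point residual = {residual:.3e}")
```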


Citations
Proceedings Article

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

TL;DR: ProxSkip is a surprisingly simple and provably effective method for minimizing the sum of a smooth function and an expensive nonsmooth proximable function, and it offers an effective acceleration of communication complexity.
Proceedings Article

Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

TL;DR: A data-free knowledge distillation method to fine-tune the global model in the server (FedFTG), which relieves the issue of direct model aggregation and develops customized label sampling and class-level ensemble to derive maximum utilization of knowledge.
Proceedings Article

Towards Understanding Biased Client Selection in Federated Learning

TL;DR: This work presents the convergence analysis of federated learning with biased client selection and quantifies how the bias affects convergence speed, and proposes Power-of-Choice, a communication- and computation-efficient client selection framework that spans the trade-off between convergence speed and solution bias.
Posted Content

On the Outsized Importance of Learning Rates in Local Update Methods

TL;DR: This work proves that, for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which it exactly characterizes, and uses this theory to derive novel convergence rates for federated averaging that showcase the trade-off between the condition number of the surrogate loss and its alignment with the true loss function.