Open Access · Posted Content
FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning
TL;DR: In this paper, the authors propose a new federated learning algorithm, FedPAGE, which further reduces the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg.
Abstract:
Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new federated learning algorithm, FedPAGE, which further reduces the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE requires far fewer communication rounds than previous local methods for both federated convex and nonconvex optimization. Concretely, 1) in the convex setting, the number of communication rounds of FedPAGE is $O(\frac{N^{3/4}}{S\epsilon})$, improving the best-known result $O(\frac{N}{S\epsilon})$ of SCAFFOLD (Karimireddy et al., 2020) by a factor of $N^{1/4}$, where $N$ is the total number of clients (usually very large in federated learning), $S$ is the size of the sampled subset of clients in each communication round, and $\epsilon$ is the target error; 2) in the nonconvex setting, the number of communication rounds of FedPAGE is $O(\frac{\sqrt{N}+S}{S\epsilon^2})$, improving the best-known result $O(\frac{N^{2/3}}{S^{2/3}\epsilon^2})$ of SCAFFOLD (Karimireddy et al., 2020) by a factor of $N^{1/6}S^{1/3}$, provided the number of sampled clients satisfies $S\leq \sqrt{N}$. Note that in both settings, the communication cost per round is the same for FedPAGE and SCAFFOLD. As a result, FedPAGE achieves new state-of-the-art results in terms of communication complexity for both federated convex and nonconvex optimization.
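To make the structure described above concrete, here is a minimal NumPy sketch of a FedAvg-style loop in which each sampled client runs local steps with a PAGE-style gradient estimator instead of plain SGD. The toy quadratic losses, step sizes, minibatch size, and the probability of refreshing the full local gradient are illustrative assumptions; the precise FedPAGE algorithm, its parameter choices, and its analysis are given in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: client i holds a quadratic loss f_i(x) = 0.5 * ||A_i x - b_i||^2 / m.
N, d, m, S, local_steps, rounds, lr = 20, 10, 30, 5, 10, 50, 0.05
A = [rng.standard_normal((m, d)) for _ in range(N)]
b = [rng.standard_normal(m) for _ in range(N)]

def grad(i, x, idx=None):
    """(Mini-batch) gradient of client i's loss at x."""
    Ai, bi = (A[i], b[i]) if idx is None else (A[i][idx], b[i][idx])
    return Ai.T @ (Ai @ x - bi) / len(bi)

def local_page_steps(i, x, p=0.5, batch=8):
    """Local steps with a PAGE-style gradient estimator in place of plain SGD."""
    g = grad(i, x)                                  # start from a full local gradient
    for _ in range(local_steps):
        x_new = x - lr * g
        idx = rng.choice(m, batch, replace=False)
        if rng.random() < p:                        # with prob. p, refresh the full gradient
            g = grad(i, x_new)
        else:                                       # otherwise reuse g plus a cheap correction
            g = g + grad(i, x_new, idx) - grad(i, x, idx)
        x = x_new
    return x

x_server = np.zeros(d)
for _ in range(rounds):
    clients = rng.choice(N, S, replace=False)       # sample S of the N clients
    updates = [local_page_steps(i, x_server) - x_server for i in clients]
    x_server = x_server + np.mean(updates, axis=0)  # server averages the client updates

loss = np.mean([0.5 * np.linalg.norm(A[i] @ x_server - b[i]) ** 2 / m for i in range(N)])
print(f"final average loss: {loss:.4f}")
```

The point of the PAGE-style estimator is that, most of the time, a client only pays for a small correction minibatch and reuses its previous gradient estimate, rather than recomputing a fresh stochastic gradient at every local step.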
Citations
Posted Content
ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation
TL;DR: ZeroSARAH is a variant of the variance-reduced method SARAH (Nguyen et al., 2017) for minimizing the average of a large number of nonconvex functions without ever computing a full gradient.
Posted Content
EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback
TL;DR: This article proposes six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum, and bidirectional compression.
References
Journal Article (DOI)
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Proceedings Article
Communication-Efficient Learning of Deep Networks from Decentralized Data
TL;DR: In this paper, the authors present a decentralized approach for federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation considering five different model architectures and four datasets.
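As a rough illustration of the iterative model averaging step mentioned in this summary, the sketch below averages client parameters weighted by local dataset size. The function name and weighting scheme follow the common description of FedAvg and are assumptions, not code from the paper.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of client model parameters, proportional to local data size.

    client_weights: list of 1-D NumPy arrays (flattened model parameters).
    client_sizes:   number of local examples held by each client.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    w = sizes / sizes.sum()                 # n_k / n weighting, as in the usual FedAvg description
    return sum(wk * theta for wk, theta in zip(w, client_weights))

# Example: three clients with different amounts of local data.
models = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.0, 1.0])]
print(federated_average(models, client_sizes=[100, 50, 50]))  # -> [1.25 1.25]
```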
Posted Content
Federated Learning: Strategies for Improving Communication Efficiency
Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, Dave Bacon, et al.
TL;DR: Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
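The following sketch illustrates the "sketched updates" idea on a toy update vector, combining random subsampling with coarse uniform quantization. The function names, compression ratio, and quantization levels are illustrative assumptions; the paper additionally considers random rotations and structured (low-rank or masked) updates.

```python
import numpy as np

rng = np.random.default_rng(0)

def sketch_update(update, keep_frac=0.1, levels=16):
    """Compress a model update by random subsampling plus uniform quantization.

    Returns the kept indices, quantized values, and the scale needed to decode.
    Rough illustration only: the actual schemes also use random rotations.
    """
    k = max(1, int(keep_frac * update.size))
    idx = rng.choice(update.size, size=k, replace=False)          # random subsampling mask
    vals = update[idx]
    scale = np.abs(vals).max() or 1.0
    q = np.round((vals / scale) * (levels // 2)).astype(np.int8)  # coarse quantization
    return idx, q, scale

def unsketch_update(idx, q, scale, size, keep_frac=0.1, levels=16):
    """Decode on the server: rescale, and divide by keep_frac to correct for subsampling."""
    out = np.zeros(size)
    out[idx] = (q.astype(float) / (levels // 2)) * scale / keep_frac
    return out

u = rng.standard_normal(1000)
idx, q, scale = sketch_update(u)
u_hat = unsketch_update(idx, q, scale, u.size)
print("compression ratio ~", u.size / q.size,
      "relative error:", np.linalg.norm(u - u_hat) / np.linalg.norm(u))
```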
Posted Content
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He, et al.
TL;DR: This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization, enabling the training of visual recognition models on internet-scale data with high efficiency.
Proceedings Article
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
Rie Johnson, Tong Zhang
TL;DR: It is proved that this method enjoys the same fast convergence rate as stochastic dual coordinate ascent (SDCA) and stochastic average gradient (SAG), but the analysis is significantly simpler and more intuitive.
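For context, the sketch below shows the variance-reduced (SVRG-style) gradient update that this reference introduces, on a toy least-squares problem. The objective, step size, and snapshot schedule are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
n, d = 200, 10
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_i(x, i):
    """Stochastic gradient of the i-th component function."""
    return (A[i] @ x - b[i]) * A[i]

x_snapshot, lr, epochs = np.zeros(d), 0.01, 10
for _ in range(epochs):
    mu = A.T @ (A @ x_snapshot - b) / n        # full gradient at the snapshot point
    x = x_snapshot.copy()
    for _ in range(n):                         # one inner pass of n stochastic steps
        i = rng.integers(n)
        # Variance-reduced gradient: stochastic gradient corrected by the snapshot.
        v = grad_i(x, i) - grad_i(x_snapshot, i) + mu
        x -= lr * v
    x_snapshot = x                             # use the last iterate as the new snapshot

print("final loss:", 0.5 * np.mean((A @ x_snapshot - b) ** 2))
```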