Open Access · Posted Content

A Theorem of the Alternative for Personalized Federated Learning.

TLDR
This paper shows how the excess risks of personalized federated learning with a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, and reveals a surprising theorem of the alternative for personalized federated learning.
Abstract
A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often come from different but not entirely unrelated distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning with a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view. Our analysis reveals a surprising theorem of the alternative for personalized federated learning: there exists a threshold such that (a) if a certain measure of data heterogeneity is below this threshold, the FedAvg algorithm [McMahan et al., 2017] is minimax optimal; (b) when the measure of heterogeneity is above this threshold, then doing pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication) is minimax optimal. As an implication, our results show that the presumably difficult (infinite-dimensional) problem of adapting to client-wise heterogeneity can be reduced to a simple binary decision problem of choosing between the two baseline algorithms. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.
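The dichotomy above reduces the (infinite-dimensional) personalization problem to a single threshold comparison. The minimal sketch below illustrates that decision rule on synthetic smooth, strongly convex (least-squares) local objectives; the heterogeneity proxy, the threshold value, and all helper names are illustrative assumptions, not the paper's exact quantities.

# Illustrative sketch of the "theorem of the alternative" decision rule.
# The heterogeneity proxy, threshold, and training loops are assumptions
# made for demonstration, not the paper's exact definitions.
import numpy as np

rng = np.random.default_rng(0)
num_clients, dim, n_local = 10, 5, 50

# Synthetic smooth, strongly convex local problems: least squares per client.
client_optima = rng.normal(scale=0.5, size=(num_clients, dim))
X = rng.normal(size=(num_clients, n_local, dim))
y = np.einsum('knd,kd->kn', X, client_optima) + 0.1 * rng.normal(size=(num_clients, n_local))

def local_erm(k):
    # Pure local training: per-client least-squares fit, no communication.
    return np.linalg.lstsq(X[k], y[k], rcond=None)[0]

def fedavg(rounds=100, local_steps=5, lr=0.05):
    # Plain FedAvg: a few local gradient steps, then uniform averaging.
    w = np.zeros(dim)
    for _ in range(rounds):
        updates = []
        for k in range(num_clients):
            wk = w.copy()
            for _ in range(local_steps):
                wk -= lr * X[k].T @ (X[k] @ wk - y[k]) / n_local
            updates.append(wk)
        w = np.mean(updates, axis=0)
    return w

# Crude heterogeneity proxy: spread of the purely local solutions.
local_solutions = np.stack([local_erm(k) for k in range(num_clients)])
heterogeneity = np.mean(np.linalg.norm(local_solutions - local_solutions.mean(axis=0), axis=1))

THRESHOLD = 0.5  # stand-in for the paper's problem-dependent threshold
if heterogeneity <= THRESHOLD:
    models = [fedavg()] * num_clients      # low heterogeneity: one shared model
else:
    models = list(local_solutions)         # high heterogeneity: pure local training
print(f"heterogeneity={heterogeneity:.3f} -> "
      f"{'FedAvg' if heterogeneity <= THRESHOLD else 'pure local training'}")

In the sketch, a small spread among the purely local solutions triggers a single FedAvg model for everyone, while a large spread leaves each client with its own empirical risk minimizer, mirroring the two arms of the alternative.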


Citations
Proceedings ArticleDOI

FedAvg with Fine Tuning: Local Updates Lead to Representation Learning

TL;DR: The reason behind the generalizability of FedAvg's output is its power in learning the common data representation among the clients' tasks by leveraging the diversity among client data distributions via local updates, in the multi-task linear representation setting.
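In the multi-task linear representation setting referenced in this summary, a standard formulation (notation assumed here, not necessarily the paper's exact symbols) is

$$ y_{i,j} = \langle w_i, B^\top x_{i,j} \rangle + \varepsilon_{i,j}, \qquad B \in \mathbb{R}^{d \times r},\quad w_i \in \mathbb{R}^{r},\quad r \ll d, $$

so FedAvg's local updates implicitly learn the representation $B$ shared by all clients, and fine tuning then only needs to adapt the low-dimensional head $w_i$ to each client's data.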
Journal Article

Adaptive and Robust Multi-task Learning

Yaqiong Duan, +1 more
10 Feb 2022
TL;DR: In this article, a family of adaptive methods is proposed that automatically utilizes possible similarities among the tasks while carefully handling their differences, and the robustness of these methods against outlier tasks is also addressed.
Proceedings ArticleDOI

Privacy-Preserving Federated Multi-Task Linear Regression: A One-Shot Linear Mixing Approach Inspired By Graph Regularization

TL;DR: This work focuses on the federated multi-task linear regression setting, where each machine possesses its own data for individual tasks and sharing the full local data between machines is prohibited, and proposes a novel fusion framework that only requires a one-shot communication of local estimates.
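A minimal sketch of how such a one-shot, graph-regularization-inspired fusion can work is given below; the task graph, regularization weight, and data-generating process are illustrative assumptions, not the paper's construction.

# Sketch of one-shot fusion of local estimates via graph regularization.
# Graph, weights, and data below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
K, d, n = 4, 3, 40                          # machines, dimension, samples per machine

# Related but distinct tasks: perturbations of a common coefficient vector.
base = rng.normal(size=d)
thetas = base + 0.1 * rng.normal(size=(K, d))
X = rng.normal(size=(K, n, d))
y = np.einsum('knd,kd->kn', X, thetas) + 0.1 * rng.normal(size=(K, n))

# Step 1 (local): each machine fits its own least-squares estimate.
local_est = np.stack([np.linalg.lstsq(X[k], y[k], rcond=None)[0] for k in range(K)])

# Step 2 (one-shot): estimates are communicated once and mixed linearly.
# The mixing solves  min sum_k ||theta_k - local_k||^2 + lam * sum_{(j,k) in E} ||theta_j - theta_k||^2,
# whose closed form is Theta = (I + lam * L)^{-1} Local, with L the graph Laplacian.
adjacency = np.ones((K, K)) - np.eye(K)     # assume a fully connected task graph
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
lam = 0.5
mixing = np.linalg.solve(np.eye(K) + lam * laplacian, np.eye(K))
fused_est = mixing @ local_est

print("avg error, local:", np.mean(np.linalg.norm(local_est - thetas, axis=1)))
print("avg error, fused:", np.mean(np.linalg.norm(fused_est - thetas, axis=1)))

Because the graph-regularized least-squares problem has a closed-form solution, a single exchange of local estimates suffices; no raw data and no further communication rounds are needed.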
Posted Content

Personalized Federated Learning with Gaussian Processes

TL;DR: In this paper, a solution to PFL based on Gaussian processes (GPs) with deep kernel learning is presented: a kernel function shared across all clients and parameterized by a neural network, combined with a personal GP classifier for each client.
Journal ArticleDOI

Personalized Federated Learning with Multiple Known Clusters

TL;DR: This work develops an algorithm that allows each cluster to communicate independently, derives the corresponding convergence results, and studies a hierarchical linear model to theoretically demonstrate that this approach outperforms both agents learning independently and agents learning a single shared weight.
References
Proceedings Article

Minibatch vs Local SGD for Heterogeneous Distributed Learning

TL;DR: In this paper, the authors analyzed Local SGD and Minibatch SGD in the heterogeneous distributed setting, where each machine has access to stochastic gradient estimates for a different, machine-specific, convex objective, and machines can only communicate intermittently.
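As a concrete illustration of the two protocols being compared, the sketch below runs both on simple heterogeneous quadratic objectives; the objectives, noise model, step sizes, and round counts are assumptions chosen for readability, not the paper's setup.

# Sketch contrasting Minibatch SGD and Local SGD on heterogeneous quadratics.
# Objectives, step sizes, and noise model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
M, d = 8, 4                                     # machines, dimension
centers = rng.normal(scale=1.0, size=(M, d))    # machine-specific optima (heterogeneity)

def stoch_grad(x, m, noise=0.1):
    # Stochastic gradient of f_m(x) = 0.5 * ||x - centers[m]||^2.
    return (x - centers[m]) + noise * rng.normal(size=d)

def minibatch_sgd(rounds=50, lr=0.5):
    # One communication per round: all machines evaluate gradients at the same point.
    x = np.zeros(d)
    for _ in range(rounds):
        g = np.mean([stoch_grad(x, m) for m in range(M)], axis=0)
        x = x - lr * g
    return x

def local_sgd(rounds=50, local_steps=5, lr=0.1):
    # Machines take several local steps between intermittent averaging rounds.
    x = np.zeros(d)
    for _ in range(rounds):
        iterates = []
        for m in range(M):
            xm = x.copy()
            for _ in range(local_steps):
                xm = xm - lr * stoch_grad(xm, m)
            iterates.append(xm)
        x = np.mean(iterates, axis=0)
    return x

x_star = centers.mean(axis=0)                   # minimizer of the average objective
for name, x in [("minibatch", minibatch_sgd()), ("local", local_sgd())]:
    print(name, "distance to optimum:", np.linalg.norm(x - x_star))

Minibatch SGD evaluates every stochastic gradient at the single shared iterate, whereas Local SGD lets each machine drift toward its own objective between the intermittent averaging steps, which is exactly where heterogeneity starts to matter.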
Proceedings Article

Backpropagation Convergence Via Deterministic Nonmonotone Perturbed Minimization

TL;DR: The fundamental backpropagation algorithm for training artificial neural networks is cast as a deterministic nonmonotone perturbed gradient method, and the results presented cover serial and parallel online BP, modified BP with a momentum term, and BP with weight decay.
Posted Content

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization.

TL;DR: This paper provides a single convergence analysis for all methods that satisfy the proposed unified assumption on the second moment of the stochastic gradient, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on a dedicated analysis of each variant.
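One common way to write such a unified second-moment bound (this particular form is a standard example from the related nonconvex SGD literature and may differ in details from the paper's exact statement) is

$$ \mathbb{E}\big[\|g(x)\|^2\big] \;\le\; 2A\,\big(f(x) - f^{\inf}\big) + B\,\|\nabla f(x)\|^2 + C, $$

for constants $A, B, C \ge 0$, where $g(x)$ is the stochastic gradient and $f^{\inf}$ is a lower bound on $f$; each SGD variant then corresponds to a particular choice of these constants, and a single convergence proof covers them all.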
Posted Content

A No-Free-Lunch Theorem for MultiTask Learning.

TL;DR: This work considers a seemingly favorable classification scenario in which all tasks share a common optimal classifier, a scenario that can be shown to admit a broad range of regimes with improved oracle rates, and shows that nevertheless no adaptive algorithm exists that attains these improved rates across the regimes.
Posted Content

Distributed Stochastic Multi-Task Learning with Graph Regularization.

TL;DR: It is shown how simply skewing the averaging weights or controlling the stepsize allows learning different, but related, tasks on the different machines.
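A minimal sketch of the skewed-averaging idea is given below; the tasks, the row-stochastic weight matrix, and the step size are illustrative assumptions.

# Sketch of multi-task local SGD where skewed averaging weights keep each
# machine biased toward its own (related but distinct) task.
# Tasks, weight matrix, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
M, d = 4, 3
task_optima = rng.normal(size=(M, d))       # different but related task minimizers

def grad(x, m, noise=0.05):
    # Stochastic gradient of f_m(x) = 0.5 * ||x - task_optima[m]||^2.
    return (x - task_optima[m]) + noise * rng.normal(size=d)

# Row-stochastic averaging weights: self-weight alpha, remainder spread uniformly.
alpha = 0.7
W = (1 - alpha) / (M - 1) * (np.ones((M, M)) - np.eye(M)) + alpha * np.eye(M)

X = np.zeros((M, d))                        # one iterate per machine
for _ in range(200):
    X = X - 0.1 * np.stack([grad(X[m], m) for m in range(M)])   # local SGD steps
    X = W @ X                               # skewed averaging across machines

print("per-task error:", np.linalg.norm(X - task_optima, axis=1).round(3))

With uniform weights every machine would converge toward the average task, while the larger self-weight keeps each iterate biased toward its own minimizer, which is the effect the summary describes.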