Open Access · Posted Content
A Theorem of the Alternative for Personalized Federated Learning.
TLDR
This paper shows how the excess risks of personalized federated learning with a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, and reveals a surprising theorem of the alternative for personalized federated learning.
Abstract:
A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often come from different but not entirely unrelated distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning with a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view. Our analysis reveals a surprising theorem of the alternative for personalized federated learning: there exists a threshold such that (a) if a certain measure of data heterogeneity is below this threshold, the FedAvg algorithm [McMahan et al., 2017] is minimax optimal; (b) when the measure of heterogeneity is above this threshold, then doing pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication) is minimax optimal. As an implication, our results show that the presumably difficult (infinite-dimensional) problem of adapting to client-wise heterogeneity can be reduced to a simple binary decision problem of choosing between the two baseline algorithms. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.
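The dichotomy in the abstract can be seen in a minimal numerical sketch. This is not the paper's construction; it is an illustration under assumed settings (1-D mean estimation with squared loss, a smooth strongly convex case, with `simulate`, `n_clients`, and the uniform spread `R` as hypothetical choices) of why a shared FedAvg-style model wins at low heterogeneity and pure local training wins at high heterogeneity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch (not the paper's procedure): each client estimates its own
# mean theta_i under squared loss. The spread R plays the role of the data
# heterogeneity measure; we compare a FedAvg-style shared estimate (the grand
# average) against pure local empirical risk minimization (per-client averages).
def simulate(R, n_clients=20, n_samples=10, noise=1.0):
    thetas = rng.uniform(-R, R, n_clients)          # client-specific targets
    data = thetas[:, None] + noise * rng.standard_normal((n_clients, n_samples))
    local = data.mean(axis=1)                       # pure local training
    global_avg = data.mean()                        # shared model for everyone
    err_local = float(np.mean((local - thetas) ** 2))
    err_global = float(np.mean((global_avg - thetas) ** 2))
    return err_local, err_global

# Below some heterogeneity threshold the shared model has lower risk;
# above it, local training does.
for R in (0.05, 5.0):
    e_loc, e_glob = simulate(R)
    print(f"R={R}: local={e_loc:.3f}, global={e_glob:.3f}")
```

The shared estimate averages away noise but inherits a bias of order `R`, while local estimates are unbiased but noisy; the crossover between the two regimes is the binary decision the abstract describes.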
Citations
Proceedings ArticleDOI
FedAvg with Fine Tuning: Local Updates Lead to Representation Learning
TL;DR: In the multi-task linear representation setting, the generalizability of FedAvg's output stems from its power to learn the common data representation among the clients' tasks by leveraging the diversity among client data distributions via local updates.
Journal Article
Adaptive and Robust Multi-task Learning
Yaqiong Duan, Kaizheng Wang +1 more
TL;DR: In this article, a family of adaptive methods is proposed that automatically utilizes possible similarities among tasks while carefully handling their differences, and their robustness against outlier tasks is examined.
Proceedings ArticleDOI
Privacy-Preserving Federated Multi-Task Linear Regression: A One-Shot Linear Mixing Approach Inspired By Graph Regularization
TL;DR: This work focuses on the federated multi-task linear regression setting, where each machine possesses its own data for an individual task and sharing full local data between machines is prohibited, and proposes a novel fusion framework that requires only a one-shot communication of local estimates.
Posted Content
Personalized Federated Learning with Gaussian Processes
TL;DR: In this paper, a solution to PFL based on Gaussian processes (GPs) with deep kernel learning is presented, in which a shared kernel function across all clients, parameterized by a neural network, is combined with a personal GP classifier for each client.
Journal ArticleDOI
Personalized Federated Learning with Multiple Known Clusters
TL;DR: This work develops an algorithm that allows each cluster to communicate independently, derives convergence results, and studies a hierarchical linear model to theoretically demonstrate that this approach outperforms both agents learning independently and agents learning a single shared weight.
References
Proceedings Article
Minibatch vs Local SGD for Heterogeneous Distributed Learning
TL;DR: In this paper, the authors analyzed Local SGD and Minibatch SGD in the heterogeneous distributed setting, where each machine has access to stochastic gradient estimates for a different, machine-specific, convex objective, and machines can only communicate intermittently.
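The trade-off this reference analyzes can be sketched with a toy comparison. This is an illustration under assumed settings, not the paper's analysis: two machines with quadratic objectives and exact (noise-free) gradients, which isolates the effect of heterogeneity; `local_sgd` and `minibatch_gd` are hypothetical names for the two communication patterns:

```python
import numpy as np

# Toy setting: machine m minimizes f_m(x) = 0.5 * a[m] * (x - b[m])**2.
# The machines have heterogeneous curvatures and optima.
a = np.array([1.0, 2.0])           # machine-specific curvatures
b = np.array([0.0, 1.0])           # machine-specific optima
x_star = (a * b).sum() / a.sum()   # minimizer of the average objective

def local_sgd(rounds=50, K=10, lr=0.1):
    """Each machine takes K local gradient steps, then iterates are averaged."""
    x = 0.0
    for _ in range(rounds):
        finals = []
        for am, bm in zip(a, b):
            xm = x
            for _ in range(K):
                xm -= lr * am * (xm - bm)
            finals.append(xm)
        x = float(np.mean(finals))  # intermittent communication
    return x

def minibatch_gd(rounds=50, K=10, lr=0.1):
    """Average gradients across machines at every step (same total budget)."""
    x = 0.0
    for _ in range(rounds * K):
        x -= lr * float(np.mean(a * (x - b)))
    return x
```

With heterogeneous objectives, the local method's fixed point is biased away from `x_star` (client drift), while the minibatch method converges to `x_star`; this is the kind of gap the heterogeneous-setting analysis quantifies.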
Proceedings Article
Backpropagation Convergence Via Deterministic Nonmonotone Perturbed Minimization
TL;DR: The fundamental backpropagation algorithm for training artificial neural networks is cast as a deterministic nonmonotone perturbed gradient method, and the results presented cover serial and parallel online BP, modified BP with a momentum term, and BP with weight decay.
Posted Content
A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization.
Zhize Li, Peter Richtárik +1 more
TL;DR: This paper provides a single convergence analysis for all methods that satisfy the proposed unified assumption of the second moment of the stochastic gradient, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.
Posted Content
A No-Free-Lunch Theorem for MultiTask Learning.
Steve Hanneke, Samory Kpotufe +1 more
TL;DR: This work considers a seemingly favorable classification scenario in which all tasks share a common optimal classifier and which can be shown to admit a broad range of regimes with improved oracle rates, yet shows that no adaptive algorithm attaining these rates exists.
Posted Content
Distributed Stochastic Multi-Task Learning with Graph Regularization.
TL;DR: It is shown how simply skewing the averaging weights or controlling the stepsize allows learning different, but related, tasks on the different machines.
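The idea of skewing averaging weights can be made concrete with a small sketch. This is a hedged illustration, not that paper's algorithm; the tasks, noise level, and the `skewed_average` weighting scheme are all assumed for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Four machines hold noisy estimates of related-but-different task parameters.
# Skewing the averaging weight toward each machine's own estimate trades
# variance reduction (borrowing strength from others) against bias (the tasks
# genuinely differ).
thetas = np.array([0.0, 0.5, 1.0, 1.5])              # related but distinct tasks
est = thetas + 0.5 * rng.standard_normal((2000, 4))  # repeated noisy local estimates

def skewed_average(estimates, self_weight):
    """Machine m keeps self_weight on itself and spreads the rest uniformly."""
    M = estimates.shape[-1]
    W = np.full((M, M), (1.0 - self_weight) / (M - 1))
    np.fill_diagonal(W, self_weight)
    return estimates @ W.T

def mse(self_weight):
    return float(np.mean((skewed_average(est, self_weight) - thetas) ** 2))

# self_weight = 0.25 is the uniform average, 1.0 is purely local training;
# an intermediate skew can beat both extremes on related tasks.
for w in (0.25, 0.7, 1.0):
    print(f"self_weight={w}: MSE={mse(w):.3f}")
```

The intermediate weight outperforming both the uniform average and purely local estimation is the effect the TL;DR alludes to: skewed weights interpolate between a single shared model and fully separate per-machine models.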