Proceedings ArticleDOI
FedAvg with Fine Tuning: Local Updates Lead to Representation Learning
Liam Collins, Hamed Hassani, Aryan Mokhtari, Sanjay Shakkottai +3 more
arXiv: abs/2205.13692
TL;DR
The reason behind the generalizability of FedAvg's output is its power in learning the common data representation among the clients' tasks, by leveraging the diversity among client data distributions via local updates, in the multi-task linear representation setting.
Abstract
The Federated Averaging (FedAvg) algorithm, which alternates between a few local stochastic gradient updates at client nodes and a model-averaging update at the server, is perhaps the most commonly used method in Federated Learning. Notwithstanding its simplicity, several empirical studies have shown that the output model of FedAvg, after a few fine-tuning steps, generalizes well to new unseen tasks. This surprising performance of such a simple method, however, is not fully understood from a theoretical point of view. In this paper, we formally investigate this phenomenon in the multi-task linear representation setting. We show that the reason behind the generalizability of FedAvg's output is its power in learning the common data representation among the clients' tasks, by leveraging the diversity among client data distributions via local updates. We formally establish the iteration complexity required by the clients to prove such a result in the setting where the underlying shared representation is a linear map. To the best of our knowledge, this is the first such result for any setting. We also provide empirical evidence demonstrating FedAvg's representation learning ability in federated image classification with heterogeneous data.
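To make the alternation concrete, here is a minimal sketch of one FedAvg round on a toy multi-task linear problem (NumPy; the function and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
    """One FedAvg round: each client runs a few local gradient steps on
    a least-squares loss, then the server averages the local models."""
    local_models = []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5*||Xw - y||^2 / n
            w -= lr * grad
        local_models.append(w)
    return np.mean(local_models, axis=0)  # server-side model averaging

# Toy setting: clients share one ground-truth linear map, noiseless labels.
rng = np.random.default_rng(0)
d = 5
w_star = rng.normal(size=d)
clients = []
for _ in range(4):
    X = rng.normal(size=(50, d))
    clients.append((X, X @ w_star))

w = np.zeros(d)
for _ in range(100):
    w = fedavg_round(w, clients)
```

Because all clients here share the same ground truth, the averaged model converges to it; the paper's point is the subtler representation-learning behavior when clients share only a common low-dimensional representation.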
Citations
Journal ArticleDOI
Learning to Generate Image Embeddings with User-level Differential Privacy
Zheng Xu, Maxwell D. Collins, Yuxiao Wang, Liviu Panait, Se-Heum Oh, Sean Augenstein, Ting Liu, Florian Schroff, H. Brendan McMahan +8 more
TL;DR: DP-FedEmb is a variant of federated learning algorithms with per-user sensitivity control and noise addition, used to train from user-partitioned data centralized in the datacenter.
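The per-user sensitivity control and noise addition mentioned above can be sketched generically as a clip-and-noise aggregator (a hedged illustration in the DP-SGD style; `dp_average` and its parameters are illustrative, not the paper's API):

```python
import numpy as np

def dp_average(user_updates, clip_norm=1.0, noise_mult=0.5, rng=None):
    """Average per-user updates with differential privacy: clip each
    user's update to bound its sensitivity, then add Gaussian noise
    to the sum before dividing by the number of users."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for u in user_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(scale=noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(user_updates)
```

Clipping caps any single user's influence at `clip_norm`, so the added Gaussian noise yields a user-level privacy guarantee for the released average.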
Journal ArticleDOI
Partial Variance Reduction Improves Non-Convex Federated Learning on Heterogeneous Data
TL;DR: This article proposes correcting model drift via variance reduction applied only to the final layers, which significantly outperforms existing benchmarks at similar or lower communication cost, and provides a proof of the convergence rate.
Journal ArticleDOI
Personalised Federated Learning On Heterogeneous Feature Spaces
TL;DR: In this article, the authors propose FLIC, a federated learning approach that maps each client's data onto a common feature space via local embedding functions; the common space is learned in a federated manner using Wasserstein barycenters, while the local embeddings are trained on each client via distribution alignment.
Journal ArticleDOI
GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity
TL;DR: In this paper, the authors study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing clients to perform multiple local gradient-type training steps prior to communication. They show that their modified method converges linearly under the same assumptions and retains the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number.
Journal ArticleDOI
Quantifying the Impact of Label Noise on Federated Learning
Shuqi Ke, Chao Huang, Xin Liu +2 more
TL;DR: In this article, an upper bound on the generalization error that is linear in the clients' label noise level is derived; the empirical results show that the global model accuracy decreases linearly as the noise level increases, consistent with the theoretical analysis.
References
Posted Content
A Simple Framework for Contrastive Learning of Visual Representations
TL;DR: It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
Proceedings Article
Model-agnostic meta-learning for fast adaptation of deep networks
TL;DR: An algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning is proposed.
Posted Content
Communication-Efficient Learning of Deep Networks from Decentralized Data
TL;DR: This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
Proceedings ArticleDOI
Dimensionality Reduction by Learning an Invariant Mapping
TL;DR: This work presents a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent nonlinear function that maps the data evenly to the output manifold.
Federated Optimization in Heterogeneous Networks
TL;DR: This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.
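FedProx's tolerance for variable local work rests on adding a proximal term mu/2 * ||w - w_t||^2 to each client's local objective, which bounds drift from the server model. A minimal sketch of the local update for a least-squares loss (a hedged illustration; `fedprox_local_update` and its parameters are not the paper's API):

```python
import numpy as np

def fedprox_local_update(global_w, X, y, mu=1.0, lr=0.05, local_steps=20):
    """Local FedProx steps on a least-squares loss: the proximal term
    mu/2 * ||w - global_w||^2 penalizes drift from the server model,
    so partial or uneven local work stays anchored to global_w."""
    w = global_w.copy()
    for _ in range(local_steps):
        grad = X.T @ (X @ w - y) / len(y) + mu * (w - global_w)
        w -= lr * grad
    return w
```

Larger mu keeps local iterates closer to the global model, trading local progress for stability under statistical and systems heterogeneity.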