Proceedings ArticleDOI

FedAvg with Fine Tuning: Local Updates Lead to Representation Learning

TLDR
In the multi-task linear representation setting, the generalizability of FedAvg's output stems from its ability to learn the data representation common to the clients' tasks by leveraging the diversity among client data distributions through local updates.
Abstract
The Federated Averaging (FedAvg) algorithm, which consists of alternating between a few local stochastic gradient updates at client nodes and a model averaging update at the server, is perhaps the most commonly used method in Federated Learning. Notwithstanding its simplicity, several empirical studies have illustrated that the output model of FedAvg, after a few fine-tuning steps, generalizes well to new unseen tasks. This surprising performance of such a simple method, however, is not fully understood from a theoretical point of view. In this paper, we formally investigate this phenomenon in the multi-task linear representation setting. We show that the reason behind the generalizability of FedAvg's output is its power in learning the common data representation among the clients' tasks, by leveraging the diversity among client data distributions via local updates. We formally establish the iteration complexity required by the clients to prove such a result in the setting where the underlying shared representation is a linear map. To the best of our knowledge, this is the first such result for any setting. We also provide empirical evidence demonstrating FedAvg's representation learning ability in federated image classification with heterogeneous data.
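To make the setting concrete, here is a minimal NumPy sketch (not the authors' code; dimensions, step sizes, and noise levels are illustrative assumptions) of FedAvg in a multi-task linear representation model: each client observes y = x^T B* w_i* + noise with a shared representation B* and a client-specific head w_i*, clients run a few local gradient steps per round, the server averages, and a new client fine-tunes only its head on the learned representation.

```python
# Hedged sketch: FedAvg with fine-tuning in a multi-task linear representation setting.
import numpy as np

rng = np.random.default_rng(0)
d, k, n_clients, n = 20, 3, 10, 100

B_star, _ = np.linalg.qr(rng.standard_normal((d, k)))      # ground-truth shared representation
W_star = rng.standard_normal((n_clients, k))                # ground-truth client-specific heads

def make_client(i):
    X = rng.standard_normal((n, d))
    y = X @ B_star @ W_star[i] + 0.01 * rng.standard_normal(n)
    return X, y

clients = [make_client(i) for i in range(n_clients)]

B = rng.standard_normal((d, k)) * 0.1                       # server model: shared representation
w = rng.standard_normal(k) * 0.1                            # server model: head (averaged as well)
lr, local_steps, rounds = 0.05, 5, 200

for _ in range(rounds):
    B_locals, w_locals = [], []
    for X, y in clients:
        Bi, wi = B.copy(), w.copy()
        for _ in range(local_steps):                        # a few local gradient updates
            r = X @ Bi @ wi - y
            Bi -= lr * np.outer(X.T @ r, wi) / n
            wi -= lr * Bi.T @ X.T @ r / n
        B_locals.append(Bi)
        w_locals.append(wi)
    B, w = np.mean(B_locals, axis=0), np.mean(w_locals, axis=0)   # server averaging step

# Fine-tuning on a new, unseen task: keep the learned representation B fixed and
# adapt only the low-dimensional head with a few gradient steps.
X_new = rng.standard_normal((n, d))
y_new = X_new @ B_star @ rng.standard_normal(k) + 0.01 * rng.standard_normal(n)
w_ft = w.copy()
for _ in range(50):
    w_ft -= lr * B.T @ X_new.T @ (X_new @ B @ w_ft - y_new) / n
print("fine-tuned MSE on the new task:", np.mean((X_new @ B @ w_ft - y_new) ** 2))
```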



Citations
Journal ArticleDOI

Learning to Generate Image Embeddings with User-level Differential Privacy

TL;DR: DP-FedEmb, a variant of federated learning algorithms with per-user sensitivity control and noise addition, is proposed to train on user-partitioned data centralized in the datacenter.
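The TL;DR above refers to the generic recipe of per-user sensitivity control and noise addition; the following hedged sketch (not the DP-FedEmb implementation; the clip norm and noise multiplier are assumptions) shows how per-user clipping and Gaussian noise are typically combined with server averaging.

```python
# Hedged sketch of a differentially private federated round with per-user clipping.
import numpy as np

def dp_federated_round(global_model, user_updates, clip_norm=1.0, noise_multiplier=1.0,
                       rng=np.random.default_rng(0)):
    clipped = []
    for delta in user_updates:
        norm = np.linalg.norm(delta)
        clipped.append(delta * min(1.0, clip_norm / (norm + 1e-12)))  # bound per-user sensitivity
    mean_update = np.mean(clipped, axis=0)
    # Gaussian noise scaled to the per-user sensitivity of the averaged update
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(user_updates),
                       size=mean_update.shape)
    return global_model + mean_update + noise

# Toy usage: three users each propose an update to a 5-dimensional model.
model = np.zeros(5)
updates = [np.random.default_rng(i).standard_normal(5) for i in range(3)]
print(dp_federated_round(model, updates))
```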
Journal ArticleDOI

Partial Variance Reduction improves Non-Convex Federated learning on heterogeneous data

TL;DR: This article proposes correcting model drift by applying variance reduction only to the final layers, which significantly outperforms existing benchmarks at a similar or lower communication cost, and provides a proof of the convergence rate.
Journal ArticleDOI

Personalised Federated Learning On Heterogeneous Feature Spaces

TL;DR: In this article, the authors propose a federated learning approach (FLIC) that maps clients' data onto a common feature space via local embedding functions; the common space is learned in a federated manner using Wasserstein barycenters, while the local embeddings are trained on each client via distribution alignment.
Journal ArticleDOI

GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity

TL;DR: In this paper, the authors study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing clients to perform multiple local gradient-type training steps prior to communication; they show that their modified method converges linearly under the same assumptions and has the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number.
Journal ArticleDOI

Quantifying the Impact of Label Noise on Federated Learning

Shuqi Ke et al., 15 Nov 2022
TL;DR: In this article, an upper bound on the generalization error that is linear in the clients' label noise level is derived; the empirical results show that global model accuracy decreases linearly as the noise level increases, consistent with the theoretical analysis.
References
Posted Content

A Simple Framework for Contrastive Learning of Visual Representations

TL;DR: It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
Proceedings Article

Model-agnostic meta-learning for fast adaptation of deep networks

TL;DR: An algorithm for meta-learning is proposed that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of learning problems, including classification, regression, and reinforcement learning.
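As a rough illustration of the inner/outer structure the TL;DR refers to, here is a minimal first-order sketch on toy linear-regression tasks (not the original implementation; the task distribution, single inner step, and step sizes are assumptions).

```python
# Hedged first-order MAML-style sketch: adapt per task in an inner loop,
# update the shared initialization in an outer loop.
import numpy as np

rng = np.random.default_rng(0)
d, inner_lr, outer_lr = 5, 0.1, 0.05

w_bar = rng.standard_normal(d)            # tasks share a common solution plus task-specific noise

def sample_task():
    w_true = w_bar + 0.1 * rng.standard_normal(d)
    X = rng.standard_normal((20, d))
    return X, X @ w_true

def grad(w, X, y):                        # least-squares gradient for a linear model
    return X.T @ (X @ w - y) / len(y)

theta = np.zeros(d)                       # meta-initialization shared across tasks
for _ in range(500):
    meta_grad = np.zeros(d)
    for _ in range(4):                    # small batch of tasks per meta-iteration
        X, y = sample_task()
        w = theta - inner_lr * grad(theta, X, y)   # inner loop: one adaptation step
        meta_grad += grad(w, X, y)                 # first-order outer gradient
    theta -= outer_lr * meta_grad / 4              # outer loop: update the initialization
print("distance from meta-init to shared task component:", np.linalg.norm(theta - w_bar))
```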
Posted Content

Communication-Efficient Learning of Deep Networks from Decentralized Data

TL;DR: This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
Proceedings ArticleDOI

Dimensionality Reduction by Learning an Invariant Mapping

TL;DR: This work presents a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent nonlinear function that maps the data evenly to the output manifold.

Federated Optimization in Heterogeneous Networks

TL;DR: This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity) while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.
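The variable amount of local work mentioned above is enabled by FedProx's proximal term on the local objective; the following toy sketch (quadratic local losses and hyperparameters are assumptions, not the paper's setup) shows local steps on F_k(w) + (mu/2)||w - w_global||^2 followed by server averaging.

```python
# Hedged FedProx-style sketch: local loss plus a proximal term anchoring clients
# to the server model, with a variable number of local steps per client.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, mu, lr = 5, 4, 0.1, 0.05

client_data = [(rng.standard_normal((30, d)), rng.standard_normal(30)) for _ in range(n_clients)]

def local_update(w_global, X, y, num_steps):
    w = w_global.copy()
    for _ in range(num_steps):                              # variable amount of local work
        grad_loss = X.T @ (X @ w - y) / len(y)              # local least-squares gradient
        grad_prox = mu * (w - w_global)                      # proximal term pulls toward server model
        w -= lr * (grad_loss + grad_prox)
    return w

w_global = np.zeros(d)
for _ in range(50):
    # each client may perform a different number of local steps (systems heterogeneity)
    locals_ = [local_update(w_global, X, y, num_steps=int(rng.integers(1, 10)))
               for X, y in client_data]
    w_global = np.mean(locals_, axis=0)
print("global model after FedProx-style rounds:", w_global)
```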