Open access · Posted Content

A Theorem of the Alternative for Personalized Federated Learning.

Abstract: A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often come from different but not entirely unrelated distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning with a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view. Our analysis reveals a surprising theorem of the alternative for personalized federated learning: there exists a threshold such that (a) if a certain measure of data heterogeneity is below this threshold, the FedAvg algorithm [McMahan et al., 2017] is minimax optimal; (b) when the measure of heterogeneity is above this threshold, then doing pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication) is minimax optimal. As an implication, our results show that the presumably difficult (infinite-dimensional) problem of adapting to client-wise heterogeneity can be reduced to a simple binary decision problem of choosing between the two baseline algorithms. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.
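
A minimal sketch of the resulting decision rule, in Python. The heterogeneity proxy and the threshold value below are illustrative placeholders, not the measure or threshold defined in the paper; the point is only that the choice reduces to a single comparison.

```python
import numpy as np

def heterogeneity_proxy(local_solutions):
    """Crude plug-in proxy for client heterogeneity (an assumption of this
    sketch, not the paper's measure): average distance of each client's
    local ERM solution from their common mean."""
    local_solutions = np.asarray(local_solutions, dtype=float)
    center = local_solutions.mean(axis=0)
    return float(np.mean(np.linalg.norm(local_solutions - center, axis=1)))

def choose_algorithm(local_solutions, threshold):
    """The one-bit decision implied by the theorem of the alternative:
    FedAvg below the threshold, pure local training above it."""
    return "FedAvg" if heterogeneity_proxy(local_solutions) <= threshold else "local training"

# Toy example: three clients whose local optima nearly coincide.
print(choose_algorithm([[1.0, 0.0], [1.1, 0.1], [0.9, -0.1]], threshold=0.5))  # FedAvg
```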


Topics: Stability (learning theory) (57.99%), Empirical risk minimization (56%), Study heterogeneity (54%)
Citations

5 results found


Open access · Posted Content
Idan Achituve, Aviv Shamsian, Aviv Navon, Gal Chechik, et al. (1 institution)
29 Jun 2021 · arXiv: Learning
Abstract: Federated learning aims to learn a global model that performs well on client devices with limited cross-client communication. Personalized federated learning (PFL) further extends this setup to handle data heterogeneity between clients by learning personalized models. A key challenge in this setting is to learn effectively across clients even though each client has unique data that is often limited in size. Here we present pFedGP, a solution to PFL that is based on Gaussian processes (GPs) with deep kernel learning. GPs are highly expressive models that work well in the low-data regime due to their Bayesian nature. However, applying GPs to PFL raises multiple challenges. Mainly, the performance of GPs depends heavily on access to a good kernel function, and learning a kernel requires a large training set. Therefore, we propose learning a shared kernel function across all clients, parameterized by a neural network, with a personal GP classifier for each client. We further extend pFedGP to include inducing points using two novel methods: the first improves generalization in the low-data regime, and the second reduces the computational cost. We derive a PAC-Bayes generalization bound on novel clients and empirically show that it gives non-vacuous guarantees. Extensive experiments on standard PFL benchmarks with CIFAR-10, CIFAR-100, and CINIC-10, and on a new setup of learning under input noise, show that pFedGP achieves well-calibrated predictions while significantly outperforming baseline methods, reaching accuracy gains of up to 21%.
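
A rough sketch of the "shared kernel, personal GP classifier" structure described above, using scikit-learn. The fixed random projection below stands in for the learned deep kernel network (pFedGP trains that network jointly across clients via the GP marginal likelihood), and the inducing-point extensions and PAC-Bayes bound are omitted entirely.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Stand-in for the shared, learned neural feature map: a fixed random projection.
W = rng.normal(size=(10, 4))
shared_features = lambda X: np.tanh(X @ W)

# Each client fits its own GP classifier on top of the shared representation.
client_models = []
for _ in range(3):
    X = rng.normal(size=(30, 10))                          # small local dataset
    y = (X[:, 0] + 0.1 * rng.normal(size=30) > 0).astype(int)
    gp = GaussianProcessClassifier(kernel=RBF(length_scale=1.0))
    gp.fit(shared_features(X), y)
    client_models.append(gp)

# Personalized probabilistic predictions for one client.
X_new = rng.normal(size=(5, 10))
print(client_models[0].predict_proba(shared_features(X_new)))
```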


1 Citation


Open access · Posted Content
22 Nov 2021 · arXiv: Learning
Abstract: Federated Learning (FL) is an increasingly popular machine learning paradigm in which multiple nodes try to collaboratively learn under privacy, communication and multiple heterogeneity constraints. A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate in handling several major constraints specific to federated learning, such as communication adaptivity and personalization control. We identify several key desiderata in frameworks for federated learning and introduce a new framework, FLIX, that takes into account the unique challenges brought by federated learning. FLIX has a standard finite-sum form, which enables practitioners to tap into the immense wealth of existing (potentially non-local) methods for distributed optimization. Through a smart initialization that does not require any communication, FLIX does not require the use of local steps but is still provably capable of performing dissimilarity regularization on par with local methods. We give several algorithms for solving the FLIX formulation efficiently under communication constraints. Finally, we corroborate our theoretical results with extensive experimentation.
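
A toy sketch of how such a finite-sum formulation can look, assuming (this is a reading of the abstract, not a statement of the paper's exact objective) that each client evaluates its loss at a mixture of the shared model and its communication-free local minimizer, so that any standard gradient method applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy smooth, strongly convex local losses f_i(w) = 0.5 * ||A_i w - b_i||^2.
n, d = 4, 3
A = [rng.normal(size=(8, d)) for _ in range(n)]
b = [rng.normal(size=8) for _ in range(n)]
grad_f = lambda i, w: A[i].T @ (A[i] @ w - b[i])

# "Smart initialization" without communication: each client's local minimizer.
x_star = [np.linalg.lstsq(A[i], b[i], rcond=None)[0] for i in range(n)]

# Hypothetical per-client personalization weights (illustrative values only).
alpha = np.array([0.7, 0.5, 0.9, 0.3])

# Assumed finite-sum objective:  F(x) = (1/n) * sum_i f_i(alpha_i*x + (1-alpha_i)*x_star_i).
# Plain gradient descent on F; any distributed-optimization method could be used instead.
x = np.zeros(d)
for _ in range(300):
    g = sum(alpha[i] * grad_f(i, alpha[i] * x + (1 - alpha[i]) * x_star[i]) for i in range(n)) / n
    x -= 0.05 * g

personalized_models = [alpha[i] * x + (1 - alpha[i]) * x_star[i] for i in range(n)]
print(personalized_models[0])
```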


Topics: Supervised learning (53%)

Open access · Posted Content
Gary Cheng, Karan N. Chadha, John C. Duchi (1 institution)
01 Nov 2021 · arXiv: Learning
Abstract: We study the performance of federated learning algorithms and their variants in an asymptotic framework. Our starting point is the formulation of federated learning as a multi-criterion objective, where the goal is to minimize each client's loss using information from all of the clients. We analyze a linear regression model, where, for a given client, we theoretically compare the performance of various algorithms in the high-dimensional asymptotic limit. This asymptotic multi-criterion approach naturally models the high-dimensional, many-device nature of federated learning and suggests that personalization is central to federated learning. In this paper, we investigate how some sophisticated personalization algorithms fare against simple fine-tuning baselines. In particular, our theory suggests that Federated Averaging with client fine-tuning is competitive with more intricate meta-learning and proximal-regularized approaches. In addition to being conceptually simpler, our fine-tuning-based methods are computationally more efficient than their competitors. We corroborate our theoretical claims with extensive experiments on federated versions of the EMNIST, CIFAR-100, Shakespeare, and Stack Overflow datasets.


Topics: Personalization (50%)

Open access · Posted Content
16 Aug 2021 · arXiv: Learning
Abstract: We study the performance of federated learning algorithms and their variants in an asymptotic framework. Our starting point is the formulation of federated learning as a multi-criterion objective, where the goal is to minimize each client's loss using information from all of the clients. We propose a linear regression model, where, for a given client, we theoretically compare the performance of various algorithms in the high-dimensional asymptotic limit. This asymptotic multi-criterion approach naturally models the high-dimensional, many-device nature of federated learning and suggests that personalization is central to federated learning. Our theory suggests that Fine-tuned Federated Averaging (FTFA), i.e., Federated Averaging followed by local training, and the ridge-regularized variant Ridge-tuned Federated Averaging (RTFA) are competitive with more sophisticated meta-learning and proximal-regularized approaches. In addition to being conceptually simpler, FTFA and RTFA are computationally more efficient than their competitors. We corroborate our theoretical claims with extensive experiments on federated versions of the EMNIST, CIFAR-100, Shakespeare, and Stack Overflow datasets.
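
A small sketch of the two baselines named above, on a toy least-squares problem. FTFA runs FedAvg and then fine-tunes locally; RTFA is assumed here to add a ridge penalty anchored at the FedAvg solution (the exact regularizer is as defined in the paper; the penalty weight below is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy heterogeneous clients: each holds data from its own linear model.
n_clients, d = 5, 10
data = []
for _ in range(n_clients):
    X = rng.normal(size=(40, d))
    w_true = rng.normal(size=d)                     # client-specific ground truth
    data.append((X, X @ w_true + 0.1 * rng.normal(size=40)))

def local_gd(w, X, y, steps, lr=0.05, ridge=0.0, anchor=None):
    """Gradient steps on the local squared error, optionally with a ridge
    penalty (ridge/2)*||w - anchor||^2 pulling toward an anchor model."""
    for _ in range(steps):
        g = X.T @ (X @ w - y) / len(y)
        if anchor is not None:
            g = g + ridge * (w - anchor)
        w = w - lr * g
    return w

# FedAvg: rounds of local training followed by parameter averaging.
w_avg = np.zeros(d)
for _ in range(20):
    w_avg = np.mean([local_gd(w_avg.copy(), X, y, steps=5) for X, y in data], axis=0)

# FTFA: local fine-tuning of the FedAvg model.
# RTFA: the same, ridge-regularized toward the FedAvg model.
ftfa = [local_gd(w_avg.copy(), X, y, steps=50) for X, y in data]
rtfa = [local_gd(w_avg.copy(), X, y, steps=50, ridge=1.0, anchor=w_avg) for X, y in data]
print(np.linalg.norm(ftfa[0] - w_avg), np.linalg.norm(rtfa[0] - w_avg))  # RTFA stays closer
```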



Open access · Posted Content
Shuxiao Chen, Bo Zhang, Ting Ye (2 institutions)
22 Sep 2021 · arXiv: Methodology
Abstract: Randomized controlled trials (RCTs) are the gold standard for evaluating the causal effect of a treatment; however, they often have limited sample sizes and sometimes poor generalizability. On the other hand, non-randomized, observational data derived from large administrative databases have massive sample sizes and better generalizability, but they are prone to unmeasured confounding bias. It is thus of considerable interest to reconcile effect estimates obtained from randomized controlled trials and observational studies investigating the same intervention, potentially harvesting the best from both realms. In this paper, we theoretically characterize the potential efficiency gain of integrating observational data into the RCT-based analysis from a minimax point of view. For estimation, we derive the minimax rate of convergence for the mean squared error, and propose a fully adaptive anchored thresholding estimator that attains the optimal rate up to poly-log factors. For inference, we characterize the minimax rate for the length of confidence intervals and show that adaptation (to unknown confounding bias) is in general impossible. A curious phenomenon thus emerges: for estimation, the efficiency gain from data integration can be achieved without prior knowledge on the magnitude of the confounding bias; for inference, the same task becomes information-theoretically impossible in general. We corroborate our theoretical findings using simulations and a real data example from the RCT DUPLICATE initiative [Franklin et al., 2021b].
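
An illustrative thresholding rule in the spirit of the estimation result described above; it is not the paper's anchored thresholding estimator, and the constant c is an arbitrary choice. The RCT estimate serves as the unbiased anchor, and the observational estimate is pooled in only when the observed discrepancy is small relative to its noise level.

```python
import numpy as np

def anchored_threshold_estimate(tau_rct, se_rct, tau_obs, se_obs, c=1.0):
    """Pool the RCT and observational effect estimates by inverse variance
    when their discrepancy looks like noise; otherwise keep the RCT anchor."""
    discrepancy = tau_obs - tau_rct
    noise = np.hypot(se_rct, se_obs)               # s.e. of the discrepancy (independent samples)
    if abs(discrepancy) <= c * noise:              # confounding bias appears negligible
        w_rct = se_rct**-2 / (se_rct**-2 + se_obs**-2)
        return w_rct * tau_rct + (1 - w_rct) * tau_obs
    return tau_rct                                 # fall back on the unbiased RCT estimate

print(anchored_threshold_estimate(tau_rct=1.8, se_rct=0.6, tau_obs=2.0, se_obs=0.1))  # pooled
print(anchored_threshold_estimate(tau_rct=1.8, se_rct=0.6, tau_obs=4.5, se_obs=0.1))  # RCT only
```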


Topics: Minimax (53%), Observational study (52%), Sample size determination (52%)

References

84 results found


Journal Article · DOI: 10.1109/TKDE.2009.191
Sinno Jialin Pan, Qiang Yang (1 institution)
Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning, and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.


Topics: Semi-supervised learning (69%), Inductive transfer (68%), Multi-task learning (67%)

13,267 Citations


Open access · Proceedings Article
Chelsea Finn, Pieter Abbeel, Sergey Levine (1 institution)
06 Aug 2017
Abstract: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
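
A compact sketch of the idea above on a toy regression family, using the first-order variant of the update for brevity (full MAML also backpropagates through the inner gradient step); all task and learning-rate choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, X, y):
    """Squared-error loss of a linear model and its gradient."""
    r = X @ w - y
    return 0.5 * np.mean(r**2), X.T @ r / len(y)

def sample_task():
    """Related regression tasks: weights scattered around a common center."""
    w_task = np.array([1.0, -1.0]) + 0.3 * rng.normal(size=2)
    X_s, X_q = rng.normal(size=(10, 2)), rng.normal(size=(10, 2))
    return (X_s, X_s @ w_task), (X_q, X_q @ w_task)

meta_w, inner_lr, meta_lr = np.zeros(2), 0.1, 0.05
for _ in range(2000):
    (X_s, y_s), (X_q, y_q) = sample_task()
    _, g_support = loss_and_grad(meta_w, X_s, y_s)
    adapted = meta_w - inner_lr * g_support          # one inner gradient step on the new task
    _, g_query = loss_and_grad(adapted, X_q, y_q)    # evaluate the adapted model
    meta_w = meta_w - meta_lr * g_query              # first-order meta-update

print(meta_w)  # ends up near the task-family center [1, -1], i.e., easy to fine-tune
```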


4,069 Citations


Open access · Journal Article · DOI: 10.1023/A:1007379606734
Rich Caruana (1 institution)
01 Jul 1997
Abstract: Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better. This paper reviews prior work on MTL, presents new evidence that MTL in backprop nets discovers task relatedness without the need for supervisory signals, and presents new results for MTL with k-nearest neighbor and kernel regression. In this paper we demonstrate multitask learning in three domains. We explain how multitask learning works, and show that there are many opportunities for multitask learning in real domains. We present an algorithm and results for multitask learning with case-based methods like k-nearest neighbor and kernel regression, and sketch an algorithm for multitask learning in decision trees. Because multitask learning works, can be applied to many different kinds of domains, and can be used with different learning algorithms, we conjecture there will be many opportunities for its use on real-world problems.
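
A bare-bones sketch of the shared-representation ("hard parameter sharing") idea for a backprop net, assuming a single shared hidden layer with one linear head per task; the toy tasks are related linear functions of a common signal.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared hidden representation, one output head per task.
d_in, d_hidden, n_tasks, n = 5, 8, 3, 200
W_shared = 0.1 * rng.normal(size=(d_in, d_hidden))
heads = 0.1 * rng.normal(size=(d_hidden, n_tasks))

# Related toy tasks: each target is a different scaling of a common signal.
X = rng.normal(size=(n, d_in))
signal = X @ rng.normal(size=d_in)
Y = np.stack([c * signal + 0.1 * rng.normal(size=n) for c in (1.0, -0.5, 2.0)], axis=1)

lr = 0.05
for _ in range(1000):
    H = np.tanh(X @ W_shared)                                 # shared representation
    E = H @ heads - Y                                         # residuals, one column per task
    grad_heads = H.T @ E / n
    grad_shared = X.T @ ((E @ heads.T) * (1 - H**2)) / n      # all tasks shape the shared layer
    heads -= lr * grad_heads
    W_shared -= lr * grad_shared

print(np.mean((np.tanh(X @ W_shared) @ heads - Y) ** 2, axis=0))  # per-task training error
```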


Topics: Multi-task learning (69%), Inductive bias (57.99%), Inductive transfer (57.99%)

3,632 Citations


Open access · Book
Shai Shalev-Shwartz, Shai Ben-David (2 institutions)
01 Jan 2015
Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.


2,986 Citations


Open access · Proceedings Article
H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, et al. (1 institution)
10 Apr 2017
Abstract: Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent.
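
A minimal sketch of the iterative model averaging loop described above, on a toy least-squares problem with full client participation (the actual paper trains deep networks and samples a fraction of clients each round); all sizes and rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-IID clients: each holds data generated from its own linear model.
d, clients = 5, []
for _ in range(8):
    X = rng.normal(size=(50, d))
    w_i = np.ones(d) + 0.5 * rng.normal(size=d)       # client-specific optimum
    clients.append((X, X @ w_i + 0.1 * rng.normal(size=50)))

def local_sgd(w, X, y, epochs=2, lr=0.05, batch=10):
    """A few epochs of mini-batch SGD on the client's squared-error loss."""
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(len(y)), len(y) // batch):
            w = w - lr * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
    return w

# Federated Averaging: broadcast the model, train locally, average the updates.
w_global = np.zeros(d)
for _ in range(20):
    local_models = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local_models, axis=0)

print(w_global)  # close to the average of the clients' optima
```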


2,970 Citations


Performance Metrics
No. of citations received by the paper in previous years
Year | Citations
2021 | 5
Network Information
Related Papers (5)
Second-order quantile methods for experts and combinatorial games · 01 Jun 2015

Wouter M. Koolen, Tim van Erven

70% related
A Decision Theoretic Approach to A/B Testing · 10 Oct 2017, arXiv: Statistics Theory

David E. Goldberg, James E. Johndrow

69% related
Learning From People · 01 Jan 2017

Nihar Bhadresh Shah

68% related
Learning Diverse Bayesian Networks · 17 Jul 2019

Cong Chen, Changhe Yuan

68% related