Proceedings Article
SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression
Zhize Li, Haoyu Zhao, Boyue Li, Yuejie Chi +3 more
- Vol. abs/2206.09888
TLDR
A framework called SoteriaFL is proposed. It accommodates a general family of local gradient estimators, including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme, and is shown to achieve better communication complexity than other private federated learning algorithms without communication compression, without sacrificing privacy or utility.
Abstract:
To enable large-scale machine learning in bandwidth-hungry environments such as wireless networks, significant progress has been made recently in designing communication-efficient federated learning algorithms with the aid of communication compression. On the other hand, privacy preservation, especially at the client level, is another important desideratum that has not yet been addressed simultaneously with advanced communication compression techniques. In this paper, we propose a unified framework that enhances the communication efficiency of private federated learning with communication compression. Exploiting both general compression operators and local differential privacy, we first examine a simple algorithm that applies compression directly to differentially-private stochastic gradient descent, and identify its limitations. We then propose a unified framework, SoteriaFL, for private federated learning, which accommodates a general family of local gradient estimators including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme. We provide a comprehensive characterization of its performance trade-offs in terms of privacy, utility, and communication complexity, where SoteriaFL is shown to achieve better communication complexity than other private federated learning algorithms without communication compression, without sacrificing privacy or utility.
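The client-side step the abstract describes (privatize a local gradient, then compress it relative to a shift) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function names, the choice of random-k sparsification as the compressor, the Gaussian perturbation after clipping, and the parameters `max_norm`, `sigma`, and `step` are all hypothetical, and no privacy accounting is performed.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def privatize(grad, max_norm, sigma, rng):
    """Clip the gradient, then add Gaussian noise to every coordinate.

    Assumption: a plain Gaussian perturbation; choosing sigma to meet a
    target privacy budget is outside the scope of this sketch.
    """
    return [g + rng.gauss(0.0, sigma) for g in clip(grad, max_norm)]


def rand_k(vec, k, rng):
    """Unbiased random-k sparsification: keep k coordinates, rescale by d/k."""
    d = len(vec)
    keep = set(rng.sample(range(d), k))
    return [v * d / k if i in keep else 0.0 for i, v in enumerate(vec)]


def client_message(grad, shift, k, max_norm, sigma, step, rng):
    """Compress the gap between the privatized gradient and a local shift.

    Returns the sparse message sent to the server and the updated shift.
    A server would reconstruct its estimate as shift + message.
    """
    noisy = privatize(grad, max_norm, sigma, rng)
    msg = rand_k([n - s for n, s in zip(noisy, shift)], k, rng)
    new_shift = [s + step * m for s, m in zip(shift, msg)]
    return msg, new_shift


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [3.0, 4.0, 0.0, 0.0]
    shift = [0.0] * len(grad)
    msg, shift = client_message(grad, shift, k=2, max_norm=1.0,
                                sigma=0.1, step=0.5, rng=rng)
    print("message:", msg)
    print("updated shift:", shift)
```

Setting `shift` to zero and never updating it gives the simpler variant in which compression is applied directly to the privatized gradient. Compressing the difference instead is meant to shrink the vector being compressed as the shift tracks the local gradient, which is the usual motivation for shifted compression.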
Citations
Proceedings Article
BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression
TL;DR: This paper proposes BEER, which adopts communication compression with gradient tracking, and shows it converges at a faster rate of O(1/T) than the state-of-the-art rate, matching the rate without compression even under arbitrary data heterogeneity.
Journal Article
Simple and Optimal Stochastic Gradient Methods for Nonsmooth Nonconvex Optimization
Zhize Li, Jian-Bing Li +1 more
TL;DR: In this paper, a simple proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+, was proposed for finding stationary points or local minima in nonconvex finite-sum and online optimization problems, possibly with a nonsmooth regularizer.
Journal Article
Bayesian Federated Learning: A Survey
TL;DR: Bayesian federated learning (BFL) has emerged as a promising approach to address the issues of limited and dynamic data and conditions, complexities including heterogeneities and uncertainties, and analytical explainability, as discussed by the authors.
Journal Article
DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning
Tomoya Murata, Taiji Suzuki +1 more
TL;DR: In this paper, the authors proposed a differential private optimization via gradient differences (DIFF2) framework, which constructs a global gradient estimator with possibly quite small variance based on communicated gradient differences rather than gradients themselves.
Journal Article
Distributed Learning Meets 6G: A Communication and Computing Perspective
TL;DR: In this article, the authors provide an outline of how DL in general and FL-based strategies specifically can contribute toward realizing part of the 6G vision and strike a balance between communication and computing constraints.