
Showing papers by "Anit Kumar Sahu published in 2020"


Journal ArticleDOI
TL;DR: In this paper, the authors discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities.
Abstract: Federated learning involves training statistical models over remote devices or siloed data centers, such as mobile phones or hospitals, while keeping data localized. Training in heterogeneous and potentially massive networks introduces novel challenges that require a fundamental departure from standard approaches for large-scale machine learning, distributed optimization, and privacy-preserving data analysis. In this article, we discuss the unique characteristics and challenges of federated learning, provide a broad overview of current approaches, and outline several directions of future work that are relevant to a wide range of research communities.
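
To make the training pattern described in this abstract concrete (local computation on each device, aggregation of model updates, raw data never leaving the device), here is a minimal federated-averaging-style sketch. The linear model, squared loss, client data, and weighting scheme are illustrative placeholders, not details taken from the article.

```python
# Minimal sketch of one federated-averaging-style round (illustrative only).
# The linear model, squared loss, and synthetic client data are placeholders.
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one client's local data."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    """One round: broadcast, local training, size-weighted averaging."""
    updates, sizes = [], []
    for X, y in clients:                     # raw data stays on the client
        updates.append(local_sgd(w_global, X, y))
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)
    return sum(wk * uk for wk, uk in zip(weights, updates))

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, clients)
```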

2,163 citations


15 Mar 2020
TL;DR: This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.
Abstract: Federated Learning is a distributed learning paradigm with two key challenges that differentiate it from traditional distributed optimization: (1) significant variability in terms of the systems characteristics on each device in the network (systems heterogeneity), and (2) non-identically distributed data across the network (statistical heterogeneity). In this work, we introduce a framework, FedProx, to tackle heterogeneity in federated networks. FedProx can be viewed as a generalization and re-parameterization of FedAvg, the current state-of-the-art method for federated learning. While this re-parameterization makes only minor modifications to the method itself, these modifications have important ramifications both in theory and in practice. Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity). Practically, we demonstrate that FedProx allows for more robust convergence than FedAvg across a suite of realistic federated datasets. In particular, in highly heterogeneous settings, FedProx demonstrates significantly more stable and accurate convergence behavior relative to FedAvg, improving absolute test accuracy by 22% on average.
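
The sketch below illustrates the kind of proximal local step the abstract refers to: each device minimizes its local loss plus a term that keeps its iterate close to the current global model. The value of mu, the loss, and the step counts are illustrative assumptions, not the paper's settings.

```python
# Sketch of a FedProx-style local update: local loss plus a proximal term
# (mu/2) * ||w - w_global||^2 that anchors the local iterate to the global
# model. Loss and data are placeholders.
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.05, steps=20):
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / len(y)   # local loss gradient
        grad_prox = mu * (w - w_global)          # proximal term gradient
        w -= lr * (grad_loss + grad_prox)
    return w
```

Because the proximal term tolerates inexact local solutions, each device can run a different number of local steps per round, which is how the framework accommodates the systems heterogeneity described above.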

1,490 citations


Posted Content
TL;DR: This work proposes FedDANE, an optimization method that is adapted from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning, and provides convergence guarantees for this method when learning over both convex and non-convex functions.
Abstract: Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions. Despite encouraging theoretical results, we find that the method has underwhelming performance empirically. In particular, through empirical simulations on both synthetic and real-world datasets, FedDANE consistently underperforms baselines of FedAvg and FedProx in realistic federated settings. We identify low device participation and statistical device heterogeneity as two underlying causes of this underwhelming performance, and conclude by suggesting several directions of future work.
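
The abstract does not spell out the update rule, so the following is only a rough, assumption-laden illustration of a DANE-style local subproblem of the kind FedDANE adapts: a local loss with a gradient-correction linear term and a proximal penalty, solved approximately by gradient descent. The loss, constants, and solver are placeholders.

```python
# Rough sketch of a DANE-style local subproblem (illustrative assumptions only):
# minimize the local loss, corrected by the gap between the local and global
# gradients at the current model, plus a proximal penalty.
import numpy as np

def dane_style_local_update(w_global, grad_global, X, y, mu=1.0, lr=0.05, steps=50):
    w = w_global.copy()
    grad_local_at_global = X.T @ (X @ w_global - y) / len(y)
    correction = grad_local_at_global - grad_global    # aligns local and global descent
    for _ in range(steps):
        grad_local = X.T @ (X @ w - y) / len(y)
        grad = grad_local - correction + mu * (w - w_global)
        w -= lr * grad
    return w
```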

63 citations


Posted Content
13 Jul 2020
TL;DR: This work proposes a simple and efficient Bayesian Optimization (BO) based approach for developing black-box adversarial attacks, which consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries compared to the current state-of-the-art black-box adversarial attacks.
Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models solely based on information limited to the output label (hard label) for a queried data input. We use Bayesian optimization (BO) to develop efficient adversarial attacks, specifically catering to scenarios involving low query budgets. Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace. Our proposed approach achieves better performance than state-of-the-art black-box adversarial attacks that require orders of magnitude more queries than ours.
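
As a minimal sketch of the idea described above, the code below runs Bayesian optimization over a low-dimensional perturbation that is upsampled to the input size before querying the model. The nearest-neighbour upsampling, the expected-improvement acquisition, the 0/1 success objective, and the stand-in `query_hard_label` function are all illustrative assumptions and are not claimed to match the paper's implementation.

```python
# Illustrative sketch of a hard-label black-box attack driven by Bayesian
# optimization in a structured low-dimensional subspace. All details are
# placeholders, not the paper's method.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

D_LOW, SIDE, EPS = 16, 28, 0.3           # latent dim, image side, l_inf budget

def upsample(z, side=SIDE):
    """Nearest-neighbour upsample a flat low-dim perturbation to side x side."""
    grid = int(np.sqrt(len(z)))
    small = z.reshape(grid, grid)
    reps = int(np.ceil(side / grid))
    return small.repeat(reps, 0).repeat(reps, 1)[:side, :side]

def attack_objective(z, x, true_label, query_hard_label):
    """0/1 stand-in objective: 1 if the perturbed input is misclassified."""
    x_adv = np.clip(x + EPS * np.sign(upsample(z)), 0.0, 1.0)
    return 1.0 if query_hard_label(x_adv) != true_label else 0.0

def expected_improvement(gp, Z_cand, best):
    mu, sigma = gp.predict(Z_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    gamma = (mu - best) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

def bo_attack(x, true_label, query_hard_label, budget=50, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-1, 1, size=(5, D_LOW))                       # initial design
    f = np.array([attack_objective(z, x, true_label, query_hard_label) for z in Z])
    gp = GaussianProcessRegressor()
    for _ in range(budget - len(Z)):
        gp.fit(Z, f)
        cand = rng.uniform(-1, 1, size=(256, D_LOW))              # candidate pool
        z_next = cand[np.argmax(expected_improvement(gp, cand, f.max()))]
        f_next = attack_objective(z_next, x, true_label, query_hard_label)
        Z, f = np.vstack([Z, z_next]), np.append(f, f_next)
        if f_next > 0:                                            # success: stop early
            return np.clip(x + EPS * np.sign(upsample(z_next)), 0.0, 1.0)
    return None
```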

14 citations


Journal ArticleDOI
18 Aug 2020
TL;DR: An overview of recent work in the area of distributed zeroth-order optimization, focusing on constrained optimization settings and algorithms built around the Frank–Wolfe framework is presented.
Abstract: Zeroth-order optimization algorithms are an attractive alternative for stochastic optimization problems, when gradient computations are expensive or when closed-form loss functions are not available. Recently, there has been a surge of activity in utilizing zeroth-order optimization algorithms in myriad applications including black-box adversarial attacks on machine learning frameworks, reinforcement learning, and simulation-based optimization, to name a few. In addition to exploiting the simplicity of a typical zeroth-order optimization scheme, distributed implementations of zeroth-order methods that exploit data parallelizability have recently been receiving significant attention. This article presents an overview of recent work in the area of distributed zeroth-order optimization, focusing on constrained optimization settings and algorithms built around the Frank–Wolfe framework. In particular, we review different types of architectures, from master–worker-based decentralized to fully distributed, and describe appropriate zeroth-order projection-free schemes for solving constrained stochastic optimization problems catered to these architectures. We discuss performance issues including convergence rates and dimension dependence. In addition, we focus on more refined extensions, such as those employing variance reduction, and describe and quantify convergence rates for a variance-reduced decentralized zeroth-order optimization method inspired by martingale difference sequences. We discuss limitations of zeroth-order optimization frameworks in terms of dimension dependence. Finally, we illustrate the use of distributed zeroth-order algorithms in the context of adversarial attacks on deep learning models.
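
The abstract does not give the update rules, so the sketch below is a generic two-point zeroth-order gradient estimator combined with a projection-free Frank–Wolfe step over an ℓ1-ball; the constraint set, smoothing radius, number of directions, and step-size schedule are illustrative assumptions rather than the article's specific schemes.

```python
# Sketch of one zeroth-order Frank-Wolfe step over an l1-ball constraint:
# the gradient is estimated from function-value queries via random smoothing,
# then a linear-minimization step is taken instead of a projection.
import numpy as np

def zo_gradient(f, x, num_dirs=20, smooth=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate with Gaussian directions."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=len(x))
        g += (f(x + smooth * u) - f(x - smooth * u)) / (2 * smooth) * u
    return g / num_dirs

def frank_wolfe_step(x, grad, radius=1.0, step=0.1):
    """Linear minimization over the l1-ball of the given radius (projection-free)."""
    s = np.zeros_like(x)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])   # vertex of the l1-ball minimizing <grad, s>
    return x + step * (s - x)

# Toy usage: minimize a quadratic using only function-value queries.
rng = np.random.default_rng(0)
f = lambda x: np.sum((x - 0.3) ** 2)
x = np.zeros(5)
for t in range(100):
    x = frank_wolfe_step(x, zo_gradient(f, x, rng=rng), step=2.0 / (t + 2))
```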

11 citations


Proceedings ArticleDOI
01 Nov 2020
TL;DR: In this article, the authors propose several variants of the MATCHA algorithm and show that MATCHA can work with many other activation schemes and decentralized computation tasks, and can reduce the communication delay for free in decentralized environments.
Abstract: Decentralized stochastic gradient descent (SGD) has recently become one of the most promising methods to use data parallelism in order to train a machine learning model on a network of arbitrarily connected nodes/edge devices. Although the error convergence of decentralized SGD has been well studied in the last decade, most of the previous works do not explicitly consider how the network topology influences the overall convergence time. Communicating over all available links in the network may give faster error convergence; however, it also incurs higher communication overhead. The MATCHA algorithm proposed in [1] achieves a win-win in this error-runtime trade-off by judiciously sampling the communication graph. In this paper, we propose several variants of the MATCHA algorithm and show that MATCHA can work with many other activation schemes and decentralized computation tasks. It is a flexible framework to reduce the communication delay for free in decentralized environments.
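
The sketch below illustrates the graph-sampling idea behind a MATCHA-style scheme: the communication graph is decomposed into matchings, each matching is activated independently in each iteration, and only the activated links gossip-average their models after a local SGD step. The greedy decomposition, the single fixed activation probability, and the pairwise-averaging mixing rule are simplifying assumptions; the actual algorithm chooses the sampling probabilities more carefully to balance error convergence against communication time.

```python
# Illustrative sketch of matching-based link sampling for decentralized SGD.
# Decomposition, activation probability, and mixing rule are placeholders.
import numpy as np

def greedy_matching_decomposition(edges, num_nodes):
    """Greedily split the edge set into matchings (sets of disjoint edges)."""
    matchings, remaining = [], list(edges)
    while remaining:
        used, matching, rest = set(), [], []
        for u, v in remaining:
            if u in used or v in used:
                rest.append((u, v))
            else:
                matching.append((u, v))
                used.update((u, v))
        matchings.append(matching)
        remaining = rest
    return matchings

def decentralized_step(models, grads, matchings, activation_prob, lr, rng):
    """Local SGD step followed by gossip over the sampled matchings."""
    models = [w - lr * g for w, g in zip(models, grads)]
    for matching in matchings:
        if rng.random() < activation_prob:      # sample this matching
            for u, v in matching:                # pairwise averaging on its links
                avg = 0.5 * (models[u] + models[v])
                models[u], models[v] = avg, avg.copy()
    return models
```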

3 citations


Posted Content
TL;DR: In this paper, a simple and efficient Bayesian Optimization (BO) based approach for developing black-box adversarial attacks is proposed, using only the output label (hard label) returned for a queried data input.
Abstract: We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models solely based on information limited to the output label (hard label) for a queried data input. We propose a simple and efficient Bayesian Optimization (BO) based approach for developing black-box adversarial attacks. Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace. We demonstrate the efficacy of our proposed attack method by evaluating both ℓ∞ and ℓ2 norm constrained untargeted and targeted hard label black-box attacks on three standard datasets - MNIST, CIFAR-10 and ImageNet. Our proposed approach consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries compared to the current state-of-the-art black-box adversarial attacks.
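
As a small companion to the BO attack sketch given for the conference version of this work earlier in the list, the snippet below illustrates the two norm constraints evaluated here by projecting a candidate perturbation onto an ℓ∞ or ℓ2 ball of radius eps before it is added to the input; the radius, helper name, and usage are illustrative assumptions.

```python
# Projection of a candidate perturbation onto an l_inf or l_2 ball of radius
# eps (illustrative helper; radius and shapes are placeholders).
import numpy as np

def project_perturbation(delta, eps, norm="linf"):
    if norm == "linf":
        return np.clip(delta, -eps, eps)               # clamp each coordinate
    scale = min(1.0, eps / (np.linalg.norm(delta) + 1e-12))
    return delta * scale                               # shrink if outside the l2 ball
```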

1 citation