
Showing papers by "Nitin H. Vaidya published in 2021"


Journal ArticleDOI
TL;DR: This work is among the first to study Byzantine-resilient optimization where no central coordinating agent exists, and it is the first to characterize the structures of the convex coefficients of the achievable global objectives.
Abstract: We consider the problem of multiagent optimization wherein an unknown subset of agents suffer Byzantine faults and thus behave adversarially. We assume that each agent $i$ has a local cost function $f_i$, and the overarching goal of the good agents is to collaboratively minimize a global objective that properly aggregates these local cost functions. To the best of our knowledge, we are among the first to study Byzantine-resilient optimization where no central coordinating agent exists, and we are the first to characterize the structures of the convex coefficients of the achievable global objectives. Dealing with Byzantine faults is very challenging. For example, in contrast to fault-free networks, reaching Byzantine-resilient agreement even in the simplest setting is far from trivial. We take a step toward solving the proposed Byzantine-resilient multiagent optimization problem by focusing on scalar local cost functions. Our results may provide useful insights for general local cost functions.

29 citations


Proceedings ArticleDOI
25 May 2021
TL;DR: In this article, the authors considered the problem of exact Byzantine fault-tolerance in multi-agent decentralized optimization and proposed the first decentralized algorithm with provable exact fault-tolerance against a bounded fraction of faulty agents, provided the non-faulty agents satisfy the necessary property named 2f-redundancy.
Abstract: This paper considers the problem of exact Byzantine fault-tolerance in multi-agent decentralized optimization. We consider a complete peer-to-peer network of $n$ agents; each agent has a local cost function, however, up to $f$ of the agents are Byzantine faulty. Such faulty agents may not follow a prescribed algorithm correctly, and may share arbitrary incorrect information about their costs. The goal of the exact fault-tolerance problem is to design a decentralized algorithm that allows all the non-faulty agents to compute a common minimum point of only non-faulty agents' aggregate cost function. We propose the first-ever decentralized algorithm with provable exact fault-tolerance against a bounded fraction of faulty agents, provided the non-faulty agents have the necessary property named 2f-redundancy, defined later in the paper.
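The 2f-redundancy property, informally, requires that the minimum of the aggregate cost of any n - 2f non-faulty agents coincide with the minimum of all non-faulty agents' aggregate cost. A deliberately simple scalar illustration (our own, not from the paper), in which identical quadratic costs trivially satisfy the property:

```python
from itertools import combinations
from statistics import mean

# Toy check of 2f-redundancy for scalar quadratics f_i(x) = (x - a_i)^2.
# The minimizer of a subset's aggregate cost is the mean of its a_i values,
# so the property holds here iff every (n - 2f)-subset has the same mean.
a = [2.5] * 5                  # n = 5 agents with identical minimizers --
                               # an extreme way to satisfy 2f-redundancy
n, f = len(a), 1
subset_minimizers = {mean(s) for s in combinations(a, n - 2 * f)}
print(subset_minimizers)       # every (n - 2f)-subset agrees on the minimizer
```

With distinct minimizers a_i, the subset means would disagree, which is exactly why 2f-redundancy is a non-trivial assumption on the cost functions.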

17 citations


Proceedings ArticleDOI
21 Jul 2021
TL;DR: In this paper, the authors consider Byzantine fault-tolerance in distributed multi-agent optimization and obtain necessary and sufficient conditions for achieving (f, ε)-resilience, characterizing the correlation between relaxation in redundancy and approximation in resilience.
Abstract: This paper considers the problem of Byzantine fault-tolerance in distributed multi-agent optimization. In this problem, each agent has a local cost function, and in the fault-free case, the goal is to design a distributed algorithm that allows all the agents to find a minimum point of all the agents' aggregate cost function. We consider a scenario where some agents might be Byzantine faulty that renders the original goal of computing a minimum point of all the agents' aggregate cost vacuous. A more reasonable objective for an algorithm in this scenario is to allow all the non-faulty agents to compute the minimum point of only the non-faulty agents' aggregate cost. Prior work shows that if there are up to f (out of n) Byzantine agents then a minimum point of the non-faulty agents' aggregate cost can be computed exactly if and only if the non-faulty agents' costs satisfy a certain redundancy property called 2f-redundancy. However, 2f-redundancy is an ideal property that can be satisfied only in systems free from noise or uncertainties, which can make the goal of exact fault-tolerance unachievable in some applications. Thus, we introduce the notion of (f, ε)-resilience, a generalization of exact fault-tolerance wherein the objective is to find an approximate minimum point of the non-faulty aggregate cost, with ε accuracy. This approximate fault-tolerance can be achieved under a weaker condition that is easier to satisfy in practice, compared to 2f-redundancy. We obtain necessary and sufficient conditions for achieving (f, ε)-resilience characterizing the correlation between relaxation in redundancy and approximation in resilience. In the case where the agents' cost functions are differentiable, we obtain conditions for (f, ε)-resilience of the distributed gradient-descent method when equipped with robust gradient aggregation, such as comparative gradient elimination or coordinate-wise trimmed mean.
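Coordinate-wise trimmed mean, one of the robust aggregation rules named above, can be sketched in a few lines: in each coordinate, discard the f largest and f smallest entries among the n received gradients and average the rest. The gradients and fault count below are made up for illustration:

```python
import numpy as np

def coordinate_wise_trimmed_mean(gradients, f):
    """In each coordinate, drop the f largest and f smallest of the n
    received values, then average the remaining n - 2f values."""
    g = np.sort(np.stack(gradients), axis=0)   # sort each coordinate independently
    return g[f : len(gradients) - f].mean(axis=0)

# Toy usage: four agents, the last one Byzantine, f = 1.
grads = [np.array([1.0, 2.0]), np.array([1.2, 1.8]),
         np.array([0.8, 2.2]), np.array([50.0, -50.0])]
agg = coordinate_wise_trimmed_mean(grads, f=1)
```

The outlier's entries fall into the trimmed tails of both coordinates, so the aggregate stays close to the honest agents' average.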

15 citations


Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this paper, the authors consider the Byzantine fault-tolerance problem in the distributed stochastic gradient descent (D-SGD) method and propose a norm-based gradient-filter, named comparative gradient elimination (CGE), that robustifies the D-SGD method against Byzantine agents.
Abstract: This paper considers the Byzantine fault-tolerance problem in the distributed stochastic gradient descent (D-SGD) method – a popular algorithm for distributed multi-agent machine learning. In this problem, each agent samples data points independently from a certain data-generating distribution. In the fault-free case, the D-SGD method allows all the agents to learn a mathematical model best fitting the data collectively sampled by all agents. We consider the case when a fraction of agents may be Byzantine faulty. Such faulty agents may not follow a prescribed algorithm correctly, and may render the traditional D-SGD method ineffective by sharing arbitrary incorrect stochastic gradients. We propose a norm-based gradient-filter, named comparative gradient elimination (CGE), that robustifies the D-SGD method against Byzantine agents. We show that the CGE gradient-filter guarantees fault-tolerance against a bounded fraction of Byzantine agents under standard stochastic assumptions, and is computationally simpler compared to many existing gradient-filters such as multi-KRUM, geometric median-of-means, and the spectral filters. We empirically show, by simulating distributed learning on neural networks, that the fault-tolerance of CGE is comparable to that of existing gradient-filters. We also empirically show that exponential averaging of stochastic gradients improves the fault-tolerance of a generic gradient-filter.
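The CGE gradient-filter is norm-based: sort the n received gradients by Euclidean norm, eliminate the f largest, and average the remaining n - f. A minimal sketch with made-up gradients (one Byzantine outlier among four agents):

```python
import numpy as np

def cge_filter(gradients, f):
    """Comparative gradient elimination (CGE): discard the f gradients
    with the largest Euclidean norms and average the rest."""
    norms = [np.linalg.norm(g) for g in gradients]
    keep = np.argsort(norms)[: len(gradients) - f]  # indices of the n - f smallest norms
    return np.mean([gradients[i] for i in keep], axis=0)

# Toy usage: three honest gradients near [1, 1] and one Byzantine outlier.
grads = [np.array([1.0, 1.1]), np.array([0.9, 1.0]),
         np.array([1.1, 0.9]), np.array([100.0, -100.0])]
agg = cge_filter(grads, f=1)
```

Sorting n norms costs O(n log n), which is the source of the computational simplicity claimed relative to filters like multi-KRUM that compare gradients pairwise.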

7 citations


Posted Content
TL;DR: In this paper, the authors consider the problem of Byzantine fault-tolerance in distributed multi-agent optimization, where some agents might be Byzantine faulty that renders the original goal of computing a minimum point of all the agents' aggregate cost vacuous.
Abstract: This paper considers the problem of Byzantine fault-tolerance in distributed multi-agent optimization. In this problem, each agent has a local cost function, and in the fault-free case, the goal is to design a distributed algorithm that allows all the agents to find a minimum point of all the agents' aggregate cost function. We consider a scenario where some agents might be Byzantine faulty that renders the original goal of computing a minimum point of all the agents' aggregate cost vacuous. A more reasonable objective for an algorithm in this scenario is to allow all the non-faulty agents to compute the minimum point of only the non-faulty agents' aggregate cost. Prior work shows that if there are up to $f$ (out of $n$) Byzantine agents then a minimum point of the non-faulty agents' aggregate cost can be computed exactly if and only if the non-faulty agents' costs satisfy a certain redundancy property called $2f$-redundancy. However, $2f$-redundancy is an ideal property that can be satisfied only in systems free from noise or uncertainties, which can make the goal of exact fault-tolerance unachievable in some applications. Thus, we introduce the notion of $(f,\epsilon)$-resilience, a generalization of exact fault-tolerance wherein the objective is to find an approximate minimum point of the non-faulty aggregate cost, with $\epsilon$ accuracy. This approximate fault-tolerance can be achieved under a weaker condition that is easier to satisfy in practice, compared to $2f$-redundancy. We obtain necessary and sufficient conditions for achieving $(f,\epsilon)$-resilience characterizing the correlation between relaxation in redundancy and approximation in resilience. In the case where the agents' cost functions are differentiable, we obtain conditions for $(f,\epsilon)$-resilience of the distributed gradient-descent method when equipped with robust gradient aggregation.

5 citations


Proceedings ArticleDOI
21 Jul 2021
TL;DR: In this article, the authors consider contention resolution algorithms augmented with predictions about the network and prove lower bounds on the expected time complexity with respect to the Shannon entropy of the corresponding network size random variable, for both the collision detection and no collision detection assumptions.
Abstract: In this paper, we consider contention resolution algorithms that are augmented with predictions about the network. We begin by studying the natural setup in which the algorithm is provided a distribution defined over the possible network sizes that predicts the likelihood of each size occurring. The goal is to leverage the predictive power of this distribution to improve on worst-case time complexity bounds. Using a novel connection between contention resolution and information theory, we prove lower bounds on the expected time complexity with respect to the Shannon entropy of the corresponding network size random variable, for both the collision detection and no collision detection assumptions. We then analyze upper bounds for these settings, assuming now that the distribution provided as input might differ from the actual distribution generating network sizes. We express their performance with respect to both entropy and the statistical divergence between the two distributions---allowing us to quantify the cost of poor predictions. Finally, we turn our attention to the related perfect advice setting, parameterized with a length b ≥ 0, in which all active processes in a given execution are provided the best possible b bits of information about their network. We provide tight bounds on the speed-up possible with respect to b for deterministic and randomized algorithms, with and without collision detection. These bounds provide a fundamental limit on the maximum power that can be provided by any predictive model with a bounded output size.
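The lower bounds above are stated with respect to the Shannon entropy of the network-size random variable. As a reminder of the quantity involved, here is the standard definition applied to a hypothetical predicted distribution (our example, not the paper's):

```python
import math

def shannon_entropy(dist):
    """H(p) = -sum over the support of p(s) * log2 p(s), for a
    predicted network-size distribution given as {size: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical prediction: the network size is 1, 2, 4, or 8, equally likely.
pred = {1: 0.25, 2: 0.25, 4: 0.25, 8: 0.25}
h = shannon_entropy(pred)   # uniform over 4 outcomes -> 2.0 bits
```

Intuitively, a low-entropy (sharply concentrated) prediction leaves little uncertainty about the network size, which is what allows the expected time complexity to beat worst-case bounds.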

4 citations


Posted Content
TL;DR: An asynchronous DGD algorithm wherein, in each iteration, the server waits for only (any) n − r agents instead of all n agents; under (r, ε)-redundancy, the original multi-agent optimization problem remains solvable with ε accuracy despite the removal of up to r agents from the system.
Abstract: This paper considers the problem of asynchronous distributed multi-agent optimization on server-based system architecture. In this problem, each agent has a local cost, and the goal for the agents is to collectively find a minimum of their aggregate cost. A standard algorithm to solve this problem is the iterative distributed gradient-descent (DGD) method being implemented collaboratively by the server and the agents. In the synchronous setting, the algorithm proceeds from one iteration to the next only after all the agents complete their expected communication with the server. However, such synchrony can be expensive and even infeasible in real-world applications. We show that waiting for all the agents is unnecessary in many applications of distributed optimization, including distributed machine learning, due to redundancy in the cost functions (or {\em data}). Specifically, we consider a generic notion of redundancy named $(r,\epsilon)$-redundancy implying solvability of the original multi-agent optimization problem with $\epsilon$ accuracy, despite the removal of up to $r$ (out of total $n$) agents from the system. We present an asynchronous DGD algorithm where in each iteration the server only waits for (any) $n-r$ agents, instead of all the $n$ agents. Assuming $(r,\epsilon)$-redundancy, we show that our asynchronous algorithm converges to an approximate solution with error that is linear in $\epsilon$ and $r$. Moreover, we also present a generalization of our algorithm to tolerate some Byzantine faulty agents in the system. Finally, we demonstrate the improved communication efficiency of our algorithm through experiments on MNIST and Fashion-MNIST using the benchmark neural network LeNet.
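The server-side rule (dispatch to all n agents, update as soon as the first n − r gradients arrive) can be sketched as follows. The agent costs, delays, and step size are all illustrative, not taken from the paper:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

targets = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical local minimizers a_j

def agent_gradient(j, x):
    """Agent j's gradient for the local cost (x - a_j)^2; the random
    sleep stands in for heterogeneous compute and network latency."""
    time.sleep(random.uniform(0.0, 0.02))
    return 2.0 * (x - targets[j])

def async_dgd_step(x, r, step_size=0.1):
    """One asynchronous DGD iteration: query all n agents, but update x
    using only the first n - r gradients to arrive."""
    n = len(targets)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(agent_gradient, j, x) for j in range(n)]
        grads = []
        for fut in as_completed(futures):
            grads.append(fut.result())
            if len(grads) == n - r:
                break   # skip the r slowest responses this round
        # (exiting the with-block still joins the stragglers; a real
        # server would simply ignore their late replies)
    return x - step_size * sum(grads) / len(grads)
```

With r = 0 this reduces to the synchronous DGD step; with r > 0 each iteration proceeds at the pace of the (n − r)-th fastest agent, which is where the communication-efficiency gain comes from.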

3 citations


Journal ArticleDOI
01 Jul 2021
TL;DR: A distributed optimization protocol that preserves statistical privacy of agents’ local cost functions against a passive adversary that corrupts some agents in the network, while ensuring accuracy of the computed solution.
Abstract: We present a distributed optimization protocol that preserves statistical privacy of agents’ local cost functions against a passive adversary that corrupts some agents in the network. The protocol is a composition of a distributed “zero-sum” obfuscation protocol that obfuscates the agents’ local cost functions, and a standard non-private distributed optimization method. We show that our protocol protects the statistical privacy of the agents’ local cost functions against a passive adversary that corrupts up to $t$ arbitrary agents as long as the communication network has $(t+1)$-vertex connectivity. The “zero-sum” obfuscation protocol preserves the sum of the agents’ local cost functions and therefore ensures accuracy of the computed solution.
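The core of the zero-sum idea: agents add random perturbations that cancel in aggregate, so each obfuscated local cost reveals little on its own while the sum, and hence its minimizer, is unchanged. A centralized toy sketch with hypothetical linear masks (the actual protocol generates the obfuscating functions distributively over the network):

```python
import random

def zero_sum_masks(n, scale=10.0, seed=0):
    """n random values that sum exactly to zero; agent i adds masks[i] * x
    to its local cost as a simple linear obfuscation term."""
    rng = random.Random(seed)
    masks = [rng.uniform(-scale, scale) for _ in range(n - 1)]
    masks.append(-sum(masks))   # last mask cancels the others exactly
    return masks

a = [1.0, 2.0, 3.0]             # hypothetical local minimizers
masks = zero_sum_masks(len(a))

def aggregate(x):               # true aggregate cost
    return sum((x - ai) ** 2 for ai in a)

def obfuscated_aggregate(x):    # aggregate of the obfuscated local costs
    return sum((x - ai) ** 2 + s * x for ai, s in zip(a, masks))
```

Because the masks sum to zero, the two aggregates coincide at every x, so any distributed optimizer run on the obfuscated costs still finds a minimizer of the true aggregate.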

3 citations


Proceedings ArticleDOI
25 Oct 2021
TL;DR: In this article, the authors characterize redundancies in agents' cost functions that are necessary and sufficient for provable Byzantine resilience in distributed optimization, and discuss the implications of these results in the context of federated learning.
Abstract: Federated learning has gained significant attention in recent years owing to the development of hardware and rapid growth in data collection. However, its ability to incorporate a large number of participating agents with various data sources makes federated learning susceptible to adversarial agents. This paper summarizes our recent results on server-based Byzantine fault-tolerant distributed optimization with applicability to resilience in federated learning. Specifically, we characterize redundancies in agents' cost functions that are necessary and sufficient for provable Byzantine resilience in distributed optimization. We discuss the implications of these results in the context of federated learning.

1 citation



Posted Content
TL;DR: In this paper, the authors consider the problem of Byzantine fault-tolerance in the P2P distributed gradient-descent method, where a certain number of Byzantine faulty agents may not follow an algorithm correctly, and may share arbitrary incorrect information to prevent other non-faulty agents from solving the optimization problem.
Abstract: We consider the problem of Byzantine fault-tolerance in the peer-to-peer (P2P) distributed gradient-descent method -- a prominent algorithm for distributed optimization in a P2P system. In this problem, the system comprises multiple agents, and each agent has a local cost function. In the fault-free case, when all the agents are honest, the P2P distributed gradient-descent method allows all the agents to reach a consensus on a solution that minimizes their aggregate cost. However, we consider a scenario where a certain number of agents may be Byzantine faulty. Such faulty agents may not follow an algorithm correctly, and may share arbitrary incorrect information to prevent other non-faulty agents from solving the optimization problem. In the presence of Byzantine faulty agents, a more reasonable goal is to allow all the non-faulty agents to reach a consensus on a solution that minimizes the aggregate cost of all the non-faulty agents. We refer to this fault-tolerance goal as $f$-resilience, where $f$ is the maximum number of Byzantine faulty agents in a system of $n$ agents, with $f < n$. Most prior work on fault-tolerance in P2P distributed optimization considers only approximate fault-tolerance wherein, unlike $f$-resilience, all the non-faulty agents compute a minimum point of a non-uniformly weighted aggregate of their cost functions. We propose a fault-tolerance mechanism that confers provable $f$-resilience to the P2P distributed gradient-descent method, provided the non-faulty agents satisfy the necessary condition of $2f$-redundancy, defined later in the paper. Moreover, compared to prior work, our algorithm is applicable to a larger class of high-dimensional convex distributed optimization problems.

Posted Content
TL;DR: In this paper, the authors consider the problem of Byzantine fault-tolerance in federated machine learning and propose a novel technique named comparative elimination (CE) for federated local SGD.
Abstract: We consider the problem of Byzantine fault-tolerance in federated machine learning. In this problem, the system comprises multiple agents, each with local data, and a trusted centralized coordinator. In the fault-free setting, the agents collaborate with the coordinator to find a minimizer of the aggregate of their local cost functions defined over their local data. We consider a scenario where some agents ($f$ out of $N$) are Byzantine faulty. Such agents need not follow a prescribed algorithm correctly, and may communicate arbitrary incorrect information to the coordinator. In the presence of Byzantine agents, a more reasonable goal for the non-faulty agents is to find a minimizer of the aggregate cost function of only the non-faulty agents. This particular goal is commonly referred to as exact fault-tolerance. Recent work has shown that exact fault-tolerance is achievable if and only if the non-faulty agents satisfy the property of $2f$-redundancy. Now, under this property, techniques are known to impart exact fault-tolerance to the distributed implementation of the classical stochastic gradient-descent (SGD) algorithm. However, we do not know of any such techniques for the federated local SGD algorithm, a more commonly used method for federated machine learning. To address this issue, we propose a novel technique named comparative elimination (CE). We show that, under $2f$-redundancy, the federated local SGD algorithm with CE can indeed obtain exact fault-tolerance in the deterministic setting when the non-faulty agents can accurately compute gradients of their local cost functions. In the general stochastic case, when agents can only compute unbiased noisy estimates of their local gradients, our algorithm achieves approximate fault-tolerance with approximation error proportional to the variance of stochastic gradients and the fraction of Byzantine agents.

Posted Content
TL;DR: In this article, the authors consider the setting where communication between nodes is modelled via a directed hypergraph and identify tight network conditions for Byzantine consensus in the presence of Byzantine faulty nodes.
Abstract: Byzantine consensus is a classical problem in distributed computing. Each node in a synchronous system starts with a binary input. The goal is to reach agreement in the presence of Byzantine faulty nodes. We consider the setting where communication between nodes is modelled via a directed hypergraph. In the classical point-to-point communication model, the communication between nodes is modelled as a simple graph where all messages sent on an edge are private between the two endpoints of the edge. This allows a faulty node to equivocate, i.e., lie differently to its different neighbors. Different models have been proposed in the literature that weaken equivocation. In the local broadcast model, every message transmitted by a node is received identically and correctly by all of its neighbors. In the hypergraph model, every message transmitted by a node on a hyperedge is received identically and correctly by all nodes on the hyperedge. Tight network conditions are known for each of the three cases for undirected (hyper)graphs. For the directed models, tight conditions are known for the point-to-point and local broadcast models. In this work, we consider the directed hypergraph model that encompasses all the models above. Each directed hyperedge consists of a single head (sender) and at least one tail (receiver). This models a local multicast channel where messages transmitted by the sender are received identically by all the receivers in the hyperedge. For this model, we identify tight network conditions for consensus. We observe how the directed hypergraph model reduces to each of the three models above under specific conditions. In each case, we relate our network condition to the corresponding known tight conditions. The directed hypergraph model also encompasses other practical network models of interest that have not been explored previously, as elaborated in the paper.
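The directed hyperedge of this model (one head, at least one tail, messages received identically by every tail) is easy to state as a type, which also makes the reductions to the earlier models concrete. The class name and node labels below are ours, for illustration only:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DirectedHyperedge:
    """A local multicast channel: one head (sender) and at least one tail
    (receiver); every message the head sends on this edge is received
    identically and correctly by all tails."""
    head: str
    tails: frozenset   # non-empty set of receivers

    def __post_init__(self):
        if not self.tails:
            raise ValueError("a directed hyperedge needs at least one tail")

# Point-to-point edge (u, v): a directed hyperedge with exactly one tail.
p2p = DirectedHyperedge("u", frozenset({"v"}))

# Local broadcast by u: a single hyperedge covering all of u's neighbors.
broadcast = DirectedHyperedge("u", frozenset({"v", "w", "x"}))
```

Varying the tail sets between these two extremes yields the intermediate multicast topologies the abstract refers to.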

Posted Content
TL;DR: In this paper, the authors consider contention resolution algorithms augmented with predictions about the network and prove lower bounds on the expected time complexity with respect to the Shannon entropy of the corresponding network size random variable, for both the collision detection and no collision detection assumptions.
Abstract: In this paper, we consider contention resolution algorithms that are augmented with predictions about the network. We begin by studying the natural setup in which the algorithm is provided a distribution defined over the possible network sizes that predicts the likelihood of each size occurring. The goal is to leverage the predictive power of this distribution to improve on worst-case time complexity bounds. Using a novel connection between contention resolution and information theory, we prove lower bounds on the expected time complexity with respect to the Shannon entropy of the corresponding network size random variable, for both the collision detection and no collision detection assumptions. We then analyze upper bounds for these settings, assuming now that the distribution provided as input might differ from the actual distribution generating network sizes. We express their performance with respect to both entropy and the statistical divergence between the two distributions -- allowing us to quantify the cost of poor predictions. Finally, we turn our attention to the related perfect advice setting, parameterized with a length $b\geq 0$, in which all active processes in a given execution are provided the best possible $b$ bits of information about their network. We provide tight bounds on the speed-up possible with respect to $b$ for deterministic and randomized algorithms, with and without collision detection. These bounds provide a fundamental limit on the maximum power that can be provided by any predictive model with a bounded output size.

Book ChapterDOI
28 Jun 2021
TL;DR: In this paper, the authors considered the problem of minimizing the communication cost of a multiparty equality protocol under the local broadcast model for the case where the underlying communication graph is undirected.
Abstract: In the multiparty equality problem, each of the n nodes starts with a k-bit input. If there is a mismatch between the inputs, then at least one node must be able to detect it. The cost of a multiparty equality protocol is the total number of bits sent in the protocol. We consider the problem of minimizing this communication cost under the local broadcast model for the case where the underlying communication graph is undirected. In the local broadcast model of communication, a message sent by a node is received identically by all of its neighbors. This is in contrast to the classical point-to-point communication model, where a message sent by a node to one of its neighbors is received only by its intended recipient.