Showing papers on "Communication complexity published in 2020"


Proceedings ArticleDOI
01 Jan 2020
TL;DR: This work proposes an actively secure four-party protocol (4PC) and a framework for PPML, showcasing its applications on four of the most widely known machine learning algorithms -- Linear Regression, Logistic Regression, Neural Networks, and Convolutional Neural Networks.
Abstract: Machine learning has started to be deployed in fields such as healthcare and finance, which involves dealing with a lot of sensitive data. This has propelled the need for and growth of privacy-preserving machine learning (PPML). We propose an actively secure four-party protocol (4PC) and a framework for PPML, showcasing its applications on four of the most widely known machine learning algorithms -- Linear Regression, Logistic Regression, Neural Networks, and Convolutional Neural Networks. Our 4PC protocol, which tolerates at most one malicious corruption, is practically efficient compared to Gordon et al. (ASIACRYPT 2018), as the fourth party in our protocol is not active in the online phase except in the input sharing and output reconstruction stages. Concretely, we reduce the online communication compared to theirs by one ring element. We use the protocol to build an efficient mixed-world framework (Trident) to switch between the Arithmetic, Boolean, and Garbled worlds. Our framework operates in the offline-online paradigm over rings and is instantiated in an outsourced setting for machine learning, where the data is secretly shared among the servers. We also propose conversions especially relevant to privacy-preserving machine learning. With the privilege of having an extra honest party, we outperform the current state-of-the-art, ABY3 (for three parties), in terms of both rounds and communication complexity. The highlights of our framework include using a minimal number of expensive circuits overall compared to ABY3. This can be seen in our technique for truncation, which does not affect the online cost of multiplication and removes the need for any circuits in the offline phase. Our B2A conversion has an improvement of 7× in rounds and 18× in communication complexity.
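
A toy illustration of the kind of ring arithmetic such offline-online frameworks build on (this is plain 3-out-of-3 additive sharing for intuition only, not the actual Trident sharing scheme; the function names are illustrative):

import secrets

MASK = (1 << 64) - 1  # arithmetic modulo 2^64, matching a 64-bit CPU word

def share(x, n=3):
    # Split x into n additive shares: x = sum(shares) mod 2^64.
    shares = [secrets.randbelow(1 << 64) for _ in range(n - 1)]
    shares.append((x - sum(shares)) & MASK)
    return shares

def reconstruct(shares):
    return sum(shares) & MASK

def add_shares(a, b):
    # Addition is non-interactive: each party adds its own shares locally.
    return [(ai + bi) & MASK for ai, bi in zip(a, b)]

a, b = share(20), share(22)
assert reconstruct(add_shares(a, b)) == 42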

132 citations


Proceedings ArticleDOI
30 Oct 2020
TL;DR: Using CrypTFlow2, the first secure inference over ImageNet-scale DNNs like ResNet50 and DenseNet121 is presented; these DNNs are at least an order of magnitude larger than those considered in prior work on 2-party DNN inference.
Abstract: We present CrypTFlow2, a cryptographic framework for secure inference over realistic Deep Neural Networks (DNNs) using secure 2-party computation. CrypTFlow2 protocols are both correct -- i.e., their outputs are bitwise equivalent to the cleartext execution -- and efficient -- they outperform the state-of-the-art protocols in both latency and scale. At the core of CrypTFlow2, we have new 2PC protocols for secure comparison and division, designed carefully to balance round and communication complexity for secure inference tasks. Using CrypTFlow2, we present the first secure inference over ImageNet-scale DNNs like ResNet50 and DenseNet121. These DNNs are at least an order of magnitude larger than those considered in prior work on 2-party DNN inference. Even on the benchmarks considered by prior work, CrypTFlow2 requires an order of magnitude less communication and 20x-30x less time than the state-of-the-art.

109 citations


Posted Content
TL;DR: In this article, a distributed caching optimization algorithm via belief propagation (BP) for minimizing the downloading latency is proposed, where the authors derive the delay minimization objective function and formulate an optimization problem.
Abstract: Heterogeneous cellular networks (HCNs) with embedded small cells are considered, where multiple mobile users wish to download network content of different popularity. By caching data in the small-cell base stations (SBSs), we design distributed caching optimization algorithms via belief propagation (BP) for minimizing the downloading latency. First, we derive the delay-minimization objective function (OF) and formulate an optimization problem. Then we develop a framework for modeling the underlying HCN topology with the aid of a factor graph. Furthermore, a distributed BP algorithm is proposed based on the network's factor graph. Next, we prove that a fixed point of convergence exists for our distributed BP algorithm. In order to reduce the complexity of the BP, we propose a heuristic BP algorithm. Furthermore, we evaluate the average downloading performance of our HCN for different numbers and locations of base stations (BSs) and mobile users (MUs), with the aid of stochastic geometry theory. By modeling the node distributions using a Poisson point process, we derive expressions for the average factor graph degree distribution, as well as an upper bound on the outage probability for random caching schemes. We also improve the performance of random caching. Our simulations show that (1) the proposed distributed BP algorithm has a near-optimal delay performance, approaching that of the high-complexity exhaustive search method, (2) the modified BP offers a good delay performance at a low communication complexity, (3) both the average degree distribution and the outage upper bound analysis relying on stochastic geometry match well with our Monte-Carlo simulations, and (4) the optimization based on the upper bound provides both a better outage and a better delay performance than the benchmarks.

98 citations


Posted Content
TL;DR: This paper first explicitly characterize the behavior of the FedAvg algorithm, and shows that without strong and unrealistic assumptions on the problem structure, the algorithm can behave erratically for non-convex problems (e.g., diverge to infinity).
Abstract: Federated Learning (FL) has become a popular paradigm for learning from distributed data. To effectively utilize data at different devices without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model, in which multiple local updates are performed using local data before the local models are sent to the cloud for aggregation. However, these schemes typically require strong assumptions, such as that the local data are independently and identically distributed (i.i.d.) or that the sizes of the local gradients are bounded. In this paper, we first explicitly characterize the behavior of the FedAvg algorithm and show that, without strong and unrealistic assumptions on the problem structure, the algorithm can behave erratically for non-convex problems (e.g., diverge to infinity). Aiming at designing FL algorithms that are provably fast and require as few assumptions as possible, we propose a new algorithm design strategy from the primal-dual optimization perspective. Our strategy yields a family of algorithms that take the same CTA model as existing algorithms, but can deal with non-convex objectives, achieve the best possible optimization and communication complexity, and handle both the full-batch and mini-batch local computation models. Most importantly, the proposed algorithms are {\it communication efficient}, in the sense that the communication pattern can be adaptive to the level of heterogeneity among the local data. To the best of our knowledge, this is the first algorithmic framework for FL that achieves all the above properties.
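
A minimal sketch of the "computation then aggregation" pattern described above (toy quadratic objectives; fedavg_round and the hyperparameters are illustrative stand-ins, not the paper's proposed algorithm):

import numpy as np

def local_sgd(x, grad_fn, steps=5, lr=0.1):
    # Multiple local updates before any communication.
    for _ in range(steps):
        x = x - lr * grad_fn(x)
    return x

def fedavg_round(x_global, device_grads):
    local_models = [local_sgd(x_global.copy(), g) for g in device_grads]
    return np.mean(local_models, axis=0)  # aggregation in the cloud

# Two devices with different (non-i.i.d.) quadratic losses.
grads = [lambda x: 2 * (x - 1.0), lambda x: 2 * (x + 3.0)]
x = np.zeros(1)
for _ in range(50):
    x = fedavg_round(x, grads)
print(x)  # settles between the two local minimizers (here near -1.0)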

85 citations


Posted Content
TL;DR: This work establishes the first lower bounds for this formulation of personalized federated learning, for both the communication complexity and the local oracle complexity, and designs several optimal methods matching these lower bounds in almost all regimes.
Abstract: In this work, we consider the optimization formulation of personalized federated learning recently introduced by Hanzely and Richtarik (2020) which was shown to give an alternative explanation to the workings of local {\tt SGD} methods. Our first contribution is establishing the first lower bounds for this formulation, for both the communication complexity and the local oracle complexity. Our second contribution is the design of several optimal methods matching these lower bounds in almost all regimes. These are the first provably optimal methods for personalized federated learning. Our optimal methods include an accelerated variant of {\tt FedProx}, and an accelerated variance-reduced version of {\tt FedAvg}/Local {\tt SGD}. We demonstrate the practical superiority of our methods through extensive numerical experiments.
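
For reference, the formulation in question (paraphrasing Hanzely and Richtarik (2020); the notation here is mine) trades off local models $x_i$ against their average:

\min_{x_1,\dots,x_n \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^n f_i(x_i) \;+\; \frac{\lambda}{2n}\sum_{i=1}^n \left\| x_i - \bar{x} \right\|^2, \qquad \bar{x} := \frac{1}{n}\sum_{i=1}^n x_i.

As $\lambda \to 0$ each device trains a purely local model, while $\lambda \to \infty$ forces consensus and recovers the usual single-model federated objective.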

84 citations


Book
20 Feb 2020
TL;DR: Communication complexity is the mathematical study of scenarios where several parties need to communicate to achieve a common goal, a situation that naturally appears during computation; this book presents the most recent developments of the field in an accessible form.
Abstract: Communication complexity is the mathematical study of scenarios where several parties need to communicate to achieve a common goal, a situation that naturally appears during computation. This introduction presents the most recent developments in an accessible form, providing the language to unify several disjoint research subareas. Written as a guide for a graduate course on communication complexity, it will interest a broad audience in computer science, from advanced undergraduates to researchers in areas ranging from theory to algorithm design to distributed computing. The first part presents basic theory in a clear and illustrative way, offering beginners an entry into the field. The second part describes applications including circuit complexity, proof complexity, streaming algorithms, extension complexity of polytopes, and distributed computing. Proofs throughout the text use ideas from a wide range of mathematics, including geometry, algebra, and probability. Each chapter contains numerous examples, figures, and exercises to aid understanding.
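
A canonical example of the kind of result the field studies (standard textbook material, not specific to this book's treatment): for the equality function on n-bit inputs, deterministic protocols need about n bits, while public-coin randomized protocols need only constantly many:

\mathrm{EQ}_n(x,y) = \mathbb{1}[x = y], \qquad D(\mathrm{EQ}_n) \ge n \ \text{(via the fooling set } \{(x,x) : x \in \{0,1\}^n\}\text{)}, \qquad R^{\mathrm{pub}}_{1/3}(\mathrm{EQ}_n) = O(1).

The randomized protocol simply compares inner products of $x$ and $y$ with a few shared random strings.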

70 citations


Posted Content
TL;DR: It is shown that FedGAN converges and has performance similar to a general distributed GAN while reducing communication complexity, and that it is robust to reduced communication.
Abstract: We propose Federated Generative Adversarial Network (FedGAN) for training a GAN across distributed sources of non-independent-and-identically-distributed data subject to communication and privacy constraints. Our algorithm uses local generators and discriminators which are periodically synced via an intermediary that averages and broadcasts the generator and discriminator parameters. We theoretically prove the convergence of FedGAN with both equal and two time-scale updates of the generator and discriminator, under standard assumptions, using stochastic approximation and communication-efficient stochastic gradient descent. We experiment with FedGAN on toy examples (2D systems, mixed Gaussians, and Swiss roll), image datasets (MNIST, CIFAR-10, and CelebA), and time series datasets (household electricity consumption and electric vehicle charging sessions). We show that FedGAN converges and has performance similar to a general distributed GAN while reducing communication complexity. We also show its robustness to reduced communication.
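
A minimal sketch of the sync rule described above (toy numpy parameters; local_step is a placeholder for a real generator/discriminator training step, not the paper's code):

import numpy as np

def local_step(gen, disc):
    return gen, disc  # placeholder: one local GAN update on private data

def sync_round(params, k=20):
    # Each agent takes k local steps; larger k means less communication.
    updated = []
    for gen, disc in params:
        for _ in range(k):
            gen, disc = local_step(gen, disc)
        updated.append((gen, disc))
    # The intermediary averages and broadcasts both networks' parameters.
    gen_avg = np.mean([g for g, _ in updated], axis=0)
    disc_avg = np.mean([d for _, d in updated], axis=0)
    return [(gen_avg.copy(), disc_avg.copy()) for _ in params]

params = [(np.ones(4) * i, np.zeros(4)) for i in range(3)]
params = sync_round(params)
print(params[0][0])  # every agent now holds the averaged generator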

69 citations


Posted Content
TL;DR: A comprehensive survey of communication-efficient distributed training algorithms, covering both system-level and algorithmic-level optimizations, that helps readers understand which algorithms are more efficient under specific distributed environments and extrapolates potential directions for further optimization.
Abstract: Distributed deep learning has become very common as a way to reduce overall training time by exploiting multiple computing devices (e.g., GPUs/TPUs) as the sizes of deep models and data sets increase. However, data communication between computing devices can be a bottleneck limiting system scalability. How to address the communication problem in distributed deep learning has recently become a hot research topic. In this paper, we provide a comprehensive survey of communication-efficient distributed training algorithms, covering both system-level and algorithmic-level optimizations. At the system level, we demystify the system design and implementation choices that reduce communication cost. At the algorithmic level, we compare different algorithms in terms of theoretical convergence bounds and communication complexity. Specifically, we first propose a taxonomy of data-parallel distributed training algorithms with four main dimensions: communication synchronization, system architectures, compression techniques, and parallelism of communication and computing. We then discuss the studies addressing the problems along these four dimensions and compare their communication cost. We further compare the convergence rates of different algorithms, which indicate how fast the algorithms converge to the solution in terms of iterations. Based on the system-level communication cost analysis and the theoretical convergence speed comparison, we help readers understand which algorithms are more efficient under specific distributed environments, and we extrapolate potential directions for further optimization.

62 citations


Proceedings Article
12 Jul 2020
TL;DR: This work proposes an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates).
Abstract: Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized computing and local stochastic gradient estimates can outperform state-of-the-art centralized algorithms in applications involving highly non-convex problems, such as training deep neural networks. In this work, we propose a decentralized stochastic algorithm to deal with certain smooth non-convex problems where there are m nodes in the system, and each node has a large number of samples (denoted as n). Differently from the majority of the existing decentralized learning algorithms for either stochastic or finite-sum problems, our focus is on both reducing the total communication rounds among the nodes and accessing the minimum number of local data samples. In particular, we propose an algorithm named D-GET (decentralized gradient estimation and tracking), which jointly performs decentralized gradient estimation (which estimates the local gradient using a subset of local samples) and gradient tracking (which tracks the global full gradient using local estimates). We show that, to achieve an $\epsilon$-stationary solution of the deterministic finite-sum problem, the proposed algorithm achieves an $O(mn^{1/2}\epsilon^{-1})$ sample complexity and an $O(\epsilon^{-1})$ communication complexity. These bounds significantly improve upon the best existing bounds of $O(mn\epsilon^{-1})$ and $O(\epsilon^{-1})$, respectively. Similarly, for online problems, the proposed method achieves an $O(m\epsilon^{-3/2})$ sample complexity and an $O(\epsilon^{-1})$ communication complexity.
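
A sketch of the gradient-tracking building block that D-GET combines with sampled gradient estimates (plain deterministic gradient tracking for intuition, not the full algorithm; the network and losses are toys):

import numpy as np

def gradient_tracking(W, grad, x0, lr=0.05, iters=200):
    x = x0.copy()                                   # one row per node
    g = np.array([grad(i, x[i]) for i in range(len(x))])
    y = g.copy()                                    # tracks the global gradient
    for _ in range(iters):
        x = W @ x - lr * y                          # mix with neighbors, then descend
        g_new = np.array([grad(i, x[i]) for i in range(len(x))])
        y = W @ y + g_new - g                       # gradient-tracking update
        g = g_new
    return x

# 3 fully connected nodes, quadratic local losses with different minimizers.
W = np.array([[.5, .25, .25], [.25, .5, .25], [.25, .25, .5]])
grad = lambda i, xi: 2 * (xi - [1.0, 2.0, 6.0][i])
print(gradient_tracking(W, grad, np.zeros((3, 1))))  # each row approaches 3.0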

52 citations


Journal ArticleDOI
TL;DR: This work proposes a random walk algorithm that uses a fixed step size and converges faster to the solution than the existing random walk incremental algorithms.
Abstract: This paper introduces a new algorithm for consensus optimization in a multi-agent network, where all agents collaboratively find a minimizer for the sum of their private functions. All decentralized algorithms rely on communication between adjacent nodes. One class of algorithms uses communication between some or all pairs of adjacent agents at each iteration. Another class uses a random walk incremental strategy, which sequentially activates a succession of agents. Existing incremental algorithms require diminishing step sizes to converge to the solution, and their convergence is slow. In this work, we propose a random walk algorithm that uses a fixed step size and converges to the solution faster than the existing random walk incremental algorithms. Our algorithm uses only one link to communicate the latest information from one agent to another. Since this style of communication mimics a man walking in a network, we call our algorithm Walkman. We establish convergence for convex and nonconvex objectives. For decentralized least squares, we derive a linear rate of convergence and obtain a better communication complexity than those of other decentralized algorithms. Numerical experiments verify our analysis.
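
A sketch of the random-walk style of communication described above (plain incremental gradient for intuition; the actual Walkman update adds an ADMM-type correction, which is what permits a fixed step size with exact convergence):

import random

neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # a triangle network
targets = [1.0, 2.0, 6.0]                      # agent i privately holds (x - t_i)^2
grad = lambda i, x: 2 * (x - targets[i])

x, node = 0.0, 0
for _ in range(3000):
    x -= 0.01 * grad(node, x)              # only the active agent updates
    node = random.choice(neighbors[node])  # iterate travels over a single link
print(x)  # hovers near the consensus minimizer 3.0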

Journal ArticleDOI
TL;DR: A novel multihop average consensus time synchronization (MACTS) scheme is developed with an innovative implementation, achieving a convergence rate hundreds of times that of average TimeSync (ATS).
Abstract: Average consensus theory is intensely popular for building time synchronization in wireless sensor networks (WSNs). However, average consensus-based time synchronization algorithms rely on iteration, which poses challenges for efficiency, as they entail high communication cost and long convergence time in a large-scale WSN. Based on the observation that the greater the algebraic connectivity, the faster the convergence, a novel multihop average consensus time synchronization (MACTS) scheme is developed with an innovative implementation in this article. By employing a multihop communication model, we show that virtual communication links among multihop nodes are generated and the algebraic connectivity of the network increases. Meanwhile, a multihop controller is developed to balance convergence time, accuracy, and communication complexity. Moreover, accurate relative clock offset estimation is achieved by delay compensation. Implementing MACTS based on the popular one-way broadcast model and taking multihop over short distances, we achieve a convergence rate hundreds of times that of average TimeSync (ATS).
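
A sketch of the baseline consensus iteration that MACTS accelerates (toy line network; MACTS's contribution is adding virtual multihop links, which raises algebraic connectivity and shortens this loop):

import numpy as np

def consensus_step(offsets, adj, eps=0.2):
    # Each node nudges its clock offset toward its neighbors' offsets.
    new = offsets.copy()
    for i in range(len(offsets)):
        for j in np.flatnonzero(adj[i]):
            new[i] += eps * (offsets[j] - offsets[i])
    return new

adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
offsets = np.array([0.0, 3.0, 5.0, 8.0])
for _ in range(100):
    offsets = consensus_step(offsets, adj)
print(offsets)  # all four clocks approach the network average, 4.0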

Proceedings ArticleDOI
18 May 2020
TL;DR: Techniques are presented that help scale threshold signature schemes, verifiable secret sharing, and distributed key generation protocols to hundreds of thousands of participants and beyond, and that generalize to any Lagrange-based threshold scheme, not just threshold signatures.
Abstract: The resurging interest in Byzantine fault tolerant systems will demand more scalable threshold cryptosystems. Unfortunately, current systems scale poorly, requiring time quadratic in the number of participants. In this paper, we present techniques that help scale threshold signature schemes (TSS), verifiable secret sharing (VSS) and distributed key generation (DKG) protocols to hundreds of thousands of participants and beyond. First, we use efficient algorithms for evaluating polynomials at multiple points to speed up computing Lagrange coefficients when aggregating threshold signatures. As a result, we can aggregate a 130,000 out of 260,000 BLS threshold signature in just 6 seconds (down from 30 minutes). Second, we show how "authenticating" such multipoint evaluations can speed up proving polynomial evaluations, a key step in communication-efficient VSS and DKG protocols. As a result, we reduce the asymptotic (and concrete) computational complexity of VSS and DKG protocols from quadratic time to quasilinear time, at a small increase in communication complexity. For example, using our DKG protocol, we can securely generate a key for the BLS scheme above in 2.3 hours (down from 8 days). Our techniques improve performance for thresholds as small as 255 and generalize to any Lagrange-based threshold scheme, not just threshold signatures. Our work has certain limitations: we require a trusted setup, we focus on synchronous VSS and DKG protocols and we do not address the worst-case complaint overhead in DKGs. Nonetheless, we hope it will spark new interest in designing large-scale distributed systems.
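
For intuition, the Lagrange step being accelerated looks as follows; this is the naive quadratic-time version over a toy prime field (the paper's point is that multipoint evaluation brings such computations down to quasilinear time):

P = 2**61 - 1  # toy prime field (real schemes use the curve's scalar field)

def lagrange_at_zero(points):
    # points: [(x_i, y_i)] with distinct x_i; returns the interpolating
    # polynomial evaluated at 0, i.e. the secret in Shamir sharing.
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

# Secret 42 shared via f(x) = 42 + 7x (threshold 2): any two shares suffice.
shares = [(1, (42 + 7 * 1) % P), (3, (42 + 7 * 3) % P)]
assert lagrange_at_zero(shares) == 42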

Posted Content
TL;DR: A novel algorithm is proposed that can achieve near optimal communication complexity, matching the known lower bound up to a logarithmic factor of the condition number of the problem.
Abstract: This paper considers the decentralized optimization problem, which has applications in large scale machine learning, sensor networks, and control theory. We propose a novel algorithm that can achieve near optimal communication complexity, matching the known lower bound up to a logarithmic factor of the condition number of the problem. Our theoretical results give affirmative answers to the open problem on whether there exists an algorithm that can achieve a communication complexity (nearly) matching the lower bound depending on the global condition number instead of the local one. Moreover, the proposed algorithm achieves the optimal computation complexity matching the lower bound up to universal constants. Furthermore, to achieve a linear convergence rate, our algorithm \emph{doesn't} require the individual functions to be (strongly) convex. Our method relies on a novel combination of known techniques including Nesterov's accelerated gradient descent, multi-consensus and gradient-tracking. The analysis is new, and may be applied to other related problems. Empirical studies demonstrate the effectiveness of our method for machine learning applications.

Journal ArticleDOI
TL;DR: A bit query (BQ) strategy based M-ary query tree protocol ( BQMT) is presented, which can not only eliminate idle queries but also separate collided tags into many small subsets and make full use of the collided bits.
Abstract: Tag collision avoidance has been viewed as one of the most important research problems in RFID communication, and bit tracking technology has been widely embedded in query tree (QT) based algorithms to tackle this challenge. Existing solutions leave room to greatly improve reading performance because collision queries and empty queries are not fully exploited. In this paper, a bit query (BQ) strategy based M-ary query tree protocol (BQMT) is presented, which can not only eliminate idle queries but also separate collided tags into many small subsets and make full use of the collided bits. To further optimize reading performance, a modified dual prefixes matching (MDPM) mechanism is presented to allow multiple tags to respond in the same slot, thus significantly reducing the number of queries. Theoretical analysis and simulations validate the effectiveness of the proposed BQMT and MDPM, which outperform existing QT-based algorithms. The BQMT and MDPM can also be combined into BQ-MDPM to improve reading performance in terms of system efficiency, total identification time, communication complexity, and average energy cost.
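
A sketch of the plain binary query tree that BQMT's bit-query strategy improves on (simplified reader loop; real bit-tracking protocols locate the first collided bit rather than re-transmitting whole IDs):

def query_tree(tags):
    # The reader queries a prefix; tags whose IDs match it respond.
    stack, reads = [""], []
    while stack:
        prefix = stack.pop()
        matching = [t for t in tags if t.startswith(prefix)]
        if len(matching) == 1:
            reads.append(matching[0])              # one reply: identified
        elif len(matching) > 1:
            stack += [prefix + "0", prefix + "1"]  # collision: split query
        # len == 0 is an idle query, a wasted slot BQMT aims to eliminate
    return reads

tags = ["01011010", "01011100", "11000001"]
assert sorted(query_tree(tags)) == sorted(tags)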

Proceedings ArticleDOI
22 Jun 2020
TL;DR: In this paper, the authors consider the classical problem of maximizing a monotone submodular function subject to a cardinality constraint and show that the possibility of querying infeasible sets can be exploited to beat the 1/2-approximation barrier, presenting a tight 2/3-approximation taking exponential time.
Abstract: We consider the classical problem of maximizing a monotone submodular function subject to a cardinality constraint, which, due to its numerous applications, has recently been studied in various computational models. We consider a clean multi-player model that lies between the offline and streaming model, and study it under the aspect of one-way communication complexity. Our model captures the streaming setting (by considering a large number of players), and, in addition, two player approximation results for it translate into the robust setting. We present tight one-way communication complexity results for our model, which, due to the above-mentioned connections, have multiple implications in the data stream and robust setting. Even for just two players, a prior information-theoretic hardness result implies that no approximation factor above 1/2 can be achieved in our model, if only queries to feasible sets, i.e., sets respecting the cardinality constraint, are allowed. We show that the possibility of querying infeasible sets can actually be exploited to beat this bound, by presenting a tight 2/3-approximation taking exponential time, and an efficient 0.514-approximation. To the best of our knowledge, this is the first example where querying a submodular function on infeasible sets leads to provably better results. Through the above-mentioned link to the robust setting, both of these algorithms improve on the current state-of-the-art for robust submodular maximization, showing that approximation factors beyond 1/2 are possible. Moreover, exploiting the link of our model to streaming, we settle the approximability for streaming algorithms by presenting a tight 1/2+ε hardness result, based on the construction of a new family of coverage functions. This improves on a prior 1−1/e+ε hardness and matches, up to an arbitrarily small margin, the best known approximation algorithm.

Book ChapterDOI
17 Aug 2020
TL;DR: In this article, the authors studied the communication complexity of unconditionally secure MPC with guaranteed output delivery over point-to-point channels for corruption threshold (t < n/2), assuming the existence of a public broadcast channel.
Abstract: We study the communication complexity of unconditionally secure MPC with guaranteed output delivery over point-to-point channels for corruption threshold \(t < n/2\), assuming the existence of a public broadcast channel. We ask the question: "is it possible to construct MPC in this setting such that the communication complexity per multiplication gate is linear in the number of parties?" While a number of works have focused on reducing the communication complexity in this setting, the answer to the above question has remained elusive until now. We also focus on the concrete communication complexity of evaluating each multiplication gate.

Proceedings ArticleDOI
18 May 2020
TL;DR: This work improves upon previous random beacon approaches with HydRand, a novel distributed protocol based on publicly-verifiable secret sharing (PVSS) to ensure unpredictability, bias-resistance, and public-verifiability of a continuous sequence of random beacon values.
Abstract: A reliable source of randomness is not only an essential building block in various cryptographic, security, and distributed systems protocols, but also plays an integral part in the design of many new blockchain proposals. Consequently, the topic of publicly-verifiable, bias-resistant and unpredictable randomness has recently enjoyed increased attention. In particular random beacon protocols, aimed at continuous operation, can be a vital component for current Proof-of-Stake based distributed ledger proposals. We improve upon previous random beacon approaches with HydRand, a novel distributed protocol based on publicly-verifiable secret sharing (PVSS) to ensure unpredictability, bias-resistance, and public-verifiability of a continuous sequence of random beacon values. Furthermore, HydRand provides guaranteed output delivery of randomness at regular and predictable intervals in the presence of adversarial behavior and does not rely on a trusted dealer for the initial setup. Compared to existing PVSS based approaches that strive to achieve similar properties, our solution improves scalability by lowering the communication complexity from $O(n^3)$ to $O(n^2)$. Furthermore, we are the first to present a detailed comparison of recently described schemes and protocols that can be used for implementing random beacons.

Posted Content
TL;DR: IBFT is a simple and elegant Byzantine fault-tolerant consensus algorithm that is used to implement state machine replication in the Quorum blockchain and has $O(n^2)$ total communication complexity.
Abstract: This paper presents IBFT, a simple and elegant Byzantine fault-tolerant consensus algorithm that is used to implement state machine replication in the \emph{Quorum} blockchain. IBFT assumes a partially synchronous communication model, where safety does not depend on any timing assumptions and only liveness depends on periods of synchrony. The algorithm is deterministic, leader-based, and optimally resilient - tolerating $f$ faulty processes out of $n$, where $n \geq 3f+1$. During periods of good communication, IBFT achieves termination in three message delays and has $O(n^2)$ total communication complexity.

Proceedings ArticleDOI
13 Jul 2020
TL;DR: A framework for the winner selection problem in voting is studied, in which a voting rule is seen as a combination of an elicitation rule and an aggregation rule; for the k-selection problem, the best communication complexity is shown to be $\tilde{\Theta}(m/(kd))$ when the rule uses deterministic elicitation and $\tilde{\Theta}(m/(kd^3))$ when the rule uses randomized elicitation.
Abstract: In recent work, Mandal et al. [2019] study a novel framework for the winner selection problem in voting, in which a voting rule is seen as a combination of an elicitation rule and an aggregation rule. The elicitation rule asks voters to respond to a query based on their preferences over a set of alternatives, and the aggregation rule aggregates voter responses to return a winning alternative. They study the tradeoff between the communication complexity of a voting rule, which measures the number of bits of information each voter must send in response to its query, and its distortion, which measures the quality of the winning alternative in terms of utilitarian social welfare. They prove upper and lower bounds on the communication complexity required to achieve a desired level of distortion, but their bounds are not tight. Importantly, they also leave open the question whether the best randomized rule can significantly outperform the best deterministic rule. We settle this question in the affirmative. For a winner selection rule to achieve distortion d with m alternatives, we show that the communication complexity required is $\tilde{\Theta}(m/d)$ when using deterministic elicitation, and $\tilde{\Theta}(m/d^3)$ when using randomized elicitation; both bounds are tight up to logarithmic factors. Our upper bound leverages recent advances in streaming algorithms. To establish our lower bound, we derive a new lower bound on a multi-party communication complexity problem. We then study the k-selection problem in voting, where the goal is to select a set of k alternatives. For a k-selection rule that achieves distortion d with m alternatives, we show that the best communication complexity is $\tilde{\Theta}(m/(kd))$ when the rule uses deterministic elicitation and $\tilde{\Theta}(m/(kd^3))$ when the rule uses randomized elicitation. Our optimal bounds yield the non-trivial implication that the k-selection problem becomes strictly easier as k increases.
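
For context, distortion here is the usual worst-case welfare ratio from this literature (paraphrased; $\vec{u}$ ranges over normalized utility profiles and $f(\vec{u})$ is the, possibly random, winner):

\mathrm{dist}(f) \;=\; \sup_{\vec{u}} \; \frac{\max_{a} \sum_i u_i(a)}{\mathbb{E}\left[ \sum_i u_i(f(\vec{u})) \right]}.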

Posted Content
TL;DR: In this paper, the authors present an algorithm that, for the first time, achieves round synchronization with expected linear message complexity and expected constant latency, for use in Byzantine state machine replication protocols.
Abstract: State Machine Replication (SMR) solutions often divide time into rounds, with a designated leader driving decisions in each round. Progress is guaranteed once all correct processes synchronize to the same round, and the leader of that round is correct. Recently suggested Byzantine SMR solutions such as HotStuff, Tendermint, and LibraBFT achieve progress with a linear message complexity and a constant time complexity once such round synchronization occurs. But round synchronization itself incurs an additional cost. By Dolev and Reischuk's lower bound, any deterministic solution must have $\Omega(n^2)$ communication complexity. Yet the question of randomized round synchronization with an expected linear message complexity remained open. We present an algorithm that, for the first time, achieves round synchronization with expected linear message complexity and expected constant latency. Existing protocols can use our round synchronization algorithm to solve Byzantine SMR with the same asymptotic performance.

Posted Content
TL;DR: This paper proposes a construction which can transform any contractive compressor into an induced unbiased compressor, and shows that this approach leads to vast improvements over EF, including reduced memory requirements, better communication complexity guarantees and fewer assumptions.
Abstract: Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed compute systems. A key bottleneck of such systems is the communication overhead for exchanging information across the workers, such as stochastic gradients. Among the many techniques proposed to remedy this issue, one of the most successful is the framework of compressed communication with error feedback (EF). EF remains the only known technique that can deal with the error induced by contractive compressors which are not unbiased, such as Top-$K$. In this paper, we propose a new and theoretically and practically better alternative to EF for dealing with contractive compressors. In particular, we propose a construction which can transform any contractive compressor into an induced unbiased compressor. Following this transformation, existing methods able to work with unbiased compressors can be applied. We show that our approach leads to vast improvements over EF, including reduced memory requirements, better communication complexity guarantees and fewer assumptions. We further extend our results to federated learning with partial participation following an arbitrary distribution over the nodes, and demonstrate the benefits thereof. We perform several numerical experiments which validate our theoretical findings.
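
A sketch of one natural reading of this transformation (hedged: the paper's exact construction may differ in details): compress with the contractive compressor C, then re-inject the residual through any unbiased compressor Q, so the sum is unbiased because E[Q(r)] = r for all r.

import numpy as np

def top_k(x, k):  # contractive but biased
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def rand_k(x, k, rng):  # unbiased: E[rand_k(x)] = x
    out = np.zeros_like(x)
    idx = rng.choice(len(x), size=k, replace=False)
    out[idx] = x[idx] * (len(x) / k)  # importance-weight the survivors
    return out

def induced(x, k, rng):
    c = top_k(x, k)
    return c + rand_k(x - c, k, rng)  # E[...] = c + (x - c) = x

rng = np.random.default_rng(0)
x = np.array([5.0, -3.0, 0.5, 0.1])
avg = np.mean([induced(x, 2, rng) for _ in range(20000)], axis=0)
print(avg)  # ≈ x, an empirical check of unbiasedness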

Posted Content
TL;DR: This paper presents new protocols with improved communication complexity in almost all settings of BB and BA, i.e., protocols that solve BB/BA with long inputs of $l$ bits using lower costs than single-bit instances.
Abstract: Byzantine broadcast (BB) and Byzantine agreement (BA) are two most fundamental problems and essential building blocks in distributed computing, and improving their efficiency is of interest to both theoreticians and practitioners. In this paper, we study extension protocols of BB and BA, i.e., protocols that solve BB/BA with long inputs of $l$ bits using lower costs than $l$ single-bit instances. We present new protocols with improved communication complexity in almost all settings: authenticated BA/BB with $t

Book ChapterDOI
16 Nov 2020
TL;DR: This work shows asynchronous BA protocols with (expected) subquadratic communication complexity tolerating an adaptive adversary who can corrupt f ≤ (1−ε)n/3 of the parties (for any ε > 0) and shows a secure-computation protocol in the same threat model that has o(n) communication when computing no-input functionalities with short output.
Abstract: Understanding the communication complexity of Byzantine agreement (BA) is a fundamental problem in distributed computing. In particular, for protocols involving a large number of parties (as in, e.g., the context of blockchain protocols), it is important to understand the dependence of the communication on the number of parties n. Although adaptively secure BA protocols with \(o(n^2)\) communication are known in the synchronous and partially synchronous settings, no such protocols are known in the fully asynchronous case.

Proceedings Article
05 Jan 2020
TL;DR: The communication complexity of optimization tasks which generalize linear systems is considered, showing improved upper or lower bounds for every value of $p \ge 1$; sampling and sketching techniques are optimal neither in the dependence on $d$ nor in the dependence on the approximation $\epsilon$, thus motivating new techniques from optimization to solve these problems.
Abstract: We consider the communication complexity of a number of distributed optimization problems. We start with the problem of solving a linear system. Suppose there is a coordinator together with $s$ servers $P_1, \ldots, P_s$, the $i$-th of which holds a subset $A^{(i)} x = b^{(i)}$ of $n_i$ constraints of a linear system in $d$ variables, and the coordinator would like to output an $x \in \mathbb{R}^d$ for which $A^{(i)} x = b^{(i)}$ for $i = 1, \ldots, s$. We assume each coefficient of each constraint is specified using $L$ bits. We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is [MATH HERE] and [MATH HERE], respectively. We obtain similar results for the blackboard communication model. As a result of independent interest, we show that the probability a random matrix with integer entries in $\{-2^L, \ldots, 2^L\}$ is invertible is $1 - 2^{-\Theta(dL)}$, whereas previously only $1 - 2^{-\Theta(d)}$ was known. When there is no solution to the linear system, a natural alternative is to find the solution minimizing the $\ell_p$ loss, which is the $\ell_p$ regression problem. While this problem has been studied, we give improved upper or lower bounds for every value of $p \ge 1$. One takeaway message is that sampling and sketching techniques, which are commonly used in earlier work on distributed optimization, are optimal neither in the dependence on $d$ nor in the dependence on the approximation $\epsilon$, thus motivating new techniques from optimization to solve these problems. Towards this end, we consider the communication complexity of optimization tasks which generalize linear systems, such as linear, semidefinite, and convex programming. For linear programming, we first resolve the communication complexity when $d$ is constant, showing it is [MATH HERE] in the point-to-point model. For general $d$ and in the point-to-point model, we show an $O(sd^3 L)$ upper bound and an [MATH HERE] lower bound. In fact, we show that if one perturbs the coefficients randomly by numbers as small as $2^{-\Theta(L)}$, then the upper bound is $O(sd^2 L) + \mathrm{poly}(dL)$, and so this bound holds for almost all linear programs. Our study motivates understanding the bit complexity of linear programming, which is related to the running time in the unit-cost RAM model with words of $O(\log(nd))$ bits, and we give the fastest known algorithms for linear programming in this model.
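
The invertibility claim is easy to probe empirically; a quick Monte Carlo sketch (exact rational elimination, so no floating-point false positives; d and L here are tiny toys):

from fractions import Fraction
import random

def is_singular(d, L, rng):
    M = [[Fraction(rng.randint(-2**L, 2**L)) for _ in range(d)] for _ in range(d)]
    for col in range(d):                      # exact Gaussian elimination
        pivot = next((r for r in range(col, d) if M[r][col]), None)
        if pivot is None:
            return True                       # no pivot: rank deficient
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return False

rng = random.Random(0)
print(sum(is_singular(3, 1, rng) for _ in range(10000)))
# singular matrices are rare even at d=3, L=1, and rarer still as L grows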

Proceedings ArticleDOI
01 Jan 2020
TL;DR: This paper focuses on the specific case of actively secure three-party computation with an honest majority and is interested in solutions which allow the evaluation of arithmetic circuits over real-world CPU word sizes, such as 32- and 64-bit words.
Abstract: Secure multiparty computation (MPC) allows a set of mutually distrustful parties to compute a public function on their private inputs without revealing anything beyond the output of the computation. This paper focuses on the specific case of actively secure three-party computation with an honest majority. In particular, we are interested in solutions which allow the evaluation of arithmetic circuits over real-world CPU word sizes, such as 32- and 64-bit words. Our starting point is the novel compiler of Damgard et al. from CRYPTO 2018. First, we present an improved version of it which reduces the online communication complexity by a factor of 2. Next, we replace their preprocessing protocol (with arithmetic modulo a large prime) with a more efficient preprocessing which only performs arithmetic modulo powers of two. Finally, we present a novel "postprocessing" check which replaces the preprocessing phase. These protocols offer different efficiency tradeoffs and can therefore outperform each other in different deployment settings. We demonstrate this with benchmarks in a LAN and different WAN settings. Concretely, we achieve a throughput of 1 million 64-bit multiplications per second with parties located in different continents and 3 million in one location.

Journal ArticleDOI
TL;DR: Numerical tests on the IEEE 123-bus network not only corroborate that the proposed algorithms are more efficient in eliminating voltage violations and minimizing network loss compared with two benchmarks but also validate their effectiveness for online implementations.
Abstract: The unbalanced nature of the distribution networks (DNs) and communication asynchrony pose considerable challenges to the distributed voltage regulation. In this paper, two distributed voltage control algorithms are proposed to overcome these challenges in multiphase unbalanced DNs. The proposed algorithms can be leveraged in online implementations to cope with the fast-varying system operating conditions. By adopting the linearized multiphase DistFlow model, the voltage control problem is formulated as a convex quadratic programming problem for which a synchronous distributed algorithm is developed based on the dual ascent method. To account for communication delays, an asynchronous distributed algorithm is proposed evolving from the synchronous one by incorporating an event-triggered communication protocol. Furthermore, closed-form solutions to the optimization subproblems are derived to enhance the computational efficiency, and communication complexity is reduced significantly to the extent that only neighborhood information exchange is required. Finally, the convergence of the proposed algorithms to the global optimality is established analytically. Numerical tests on the IEEE 123-bus network not only corroborate that our proposed algorithms are more efficient in eliminating voltage violations and minimizing network loss compared with two benchmarks but also validate their effectiveness for online implementations.
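
For reference, the generic dual ascent iteration underlying the synchronous algorithm (schematic: $f$, $A$, $b$ stand in for the paper's linearized DistFlow objective and coupling constraints, not its exact model):

x^{k+1} = \arg\min_{x} \; f(x) + (\lambda^k)^\top (A x - b), \qquad \lambda^{k+1} = \lambda^k + \alpha \left( A x^{k+1} - b \right).

When $f$ and the constraints are separable across buses, the primal step splits into local subproblems (here solved in closed form), which is what keeps communication to neighborhood exchanges.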

Posted Content
TL;DR: This work introduces a decentralized training algorithm in which each worker exchanges a highly compressed model with a single, dynamically selected peer per communication round, removing the parameter server and reducing communication traffic while preserving convergence.
Abstract: Distributed learning techniques such as federated learning have enabled multiple workers to train machine learning models together to reduce the overall training time. However, current distributed training algorithms (centralized or decentralized) suffer from the communication bottleneck on multiple low-bandwidth workers (and also on the server under the centralized architecture). Although decentralized algorithms generally have lower communication complexity than their centralized counterparts, they still suffer from the communication bottleneck for workers with low network bandwidth. To deal with the communication problem while preserving convergence performance, we introduce a novel decentralized training algorithm with the following key features: 1) It does not require a parameter server to maintain the model during training, which avoids the communication pressure on any single peer. 2) Each worker only needs to communicate with a single peer at each communication round, using a highly compressed model, which can significantly reduce the communication traffic on the worker. We theoretically prove that our sparsification algorithm still preserves convergence properties. 3) Each worker dynamically selects its peer at different communication rounds to better utilize the bandwidth resources. We conduct experiments with convolutional neural networks on 32 workers to verify the effectiveness of our proposed algorithm compared to seven existing methods. Experimental results show that our algorithm significantly reduces communication traffic and generally selects relatively high-bandwidth peers.

Proceedings ArticleDOI
01 Aug 2020
TL;DR: An extensive empirical study is performed to understand how hashgraph’s structure affects performance, and it is observed that hashgraph can improve latency by an order of magnitude over HoneyBadgerBFT and BEAT, while keeping throughput constant with the same number of nodes.
Abstract: Atomic broadcast protocols are increasingly used to build distributed ledgers. The most robust protocols achieve byzantine fault tolerance (BFT) and operate in asynchronous networks. Recent proposals such as HoneyBadgerBFT (ACM CCS ‘16) and BEAT (ACM CCS ‘18) achieve optimal communication complexity, growing linearly as a function of the number of nodes present. Although asymptotically optimal, their practical performance precludes their use in demanding applications. Further performance improvements to HoneyBadgerBFT and BEAT are not obvious as they run two separate sub-protocols for broadcast and voting, each of which has already been optimized. We describe how hashgraph — an asynchronous BFT atomic broadcast protocol (ABFT) — departs in structure from prior work by not using communication to vote, only to broadcast transactions. We perform an extensive empirical study to understand how hashgraph’s structure affects performance. We observe that hashgraph can improve latency by an order of magnitude over HoneyBadgerBFT and BEAT, while keeping throughput constant with the same number of nodes; similarly, throughput can increase by up to an order of magnitude while maintaining latency. Furthermore, we test hashgraph’s capability for high performance, and conclude that it can achieve sufficiently high throughput and low latency to support demanding practical applications.