
Showing papers presented at "International Conference on Distributed Computing in 2017"


Proceedings ArticleDOI
13 May 2017
TL;DR: It is proved that any o(log n)-round randomized distributed algorithm for any LCL problem on bounded degree graphs can be automatically sped up to run in 2^O(sqrt(log log n)) rounds.
Abstract: Locally Checkable Labeling (LCL) problems include essentially all the classic problems of LOCAL distributed algorithms. In a recent enlightening revelation, Chang and Pettie [FOCS'17] showed that any LCL (on bounded degree graphs) that has an o(log n)-round randomized algorithm can be solved in T_(LLL)(n) rounds, which is the randomized complexity of solving (a relaxed variant of) the Lovasz Local Lemma (LLL) on bounded degree n-node graphs. Currently, the best known upper bound on T_(LLL)(n) is O(log n), by Chung, Pettie, and Su [PODC'14], while the best known lower bound is Omega(log log n), by Brandt et al. [STOC'16]. Chang and Pettie conjectured that there should be an O(log log n)-round algorithm (on bounded degree graphs). Making the first step of progress towards this conjecture, and providing a significant improvement on the algorithm of Chung et al. [PODC'14], we prove that T_(LLL)(n)= 2^O(sqrt(log log n)). Thus, any o(log n)-round randomized distributed algorithm for any LCL problem on bounded degree graphs can be automatically sped up to run in 2^O(sqrt(log log n)) rounds. Using this improvement and a number of other ideas, we also improve the complexity of a number of graph coloring problems (in arbitrary degree graphs) from the O(log n)-round results of Chung, Pettie and Su [PODC'14] to 2^O(sqrt(log log n)). These problems include defective coloring, frugal coloring, and list vertex-coloring.
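As a concrete anchor for the constructive Lovasz Local Lemma algorithmics discussed above, here is a minimal sequential sketch of Moser-Tardos resampling on a toy instance (a hypothetical example with trivially disjoint bad events; the paper's contribution is a much faster distributed variant):

```python
import random

def moser_tardos(n, bad_events, rng=random.Random(0)):
    """Moser-Tardos resampling: start from a uniformly random assignment,
    and while some bad event is violated, resample the variables it
    depends on. bad_events: list of (variables, is_bad) pairs, where
    is_bad(x) is True when the event is violated under assignment x."""
    x = [rng.randint(0, 1) for _ in range(n)]
    while True:
        violated = [(vs, p) for vs, p in bad_events if p(x)]
        if not violated:
            return x
        vs, _ = rng.choice(violated)
        for v in vs:  # resample only the chosen event's own variables
            x[v] = rng.randint(0, 1)

# toy instance: a "bad event" for each disjoint pair of equal bits
events = [((2 * i, 2 * i + 1), lambda x, i=i: x[2 * i] == x[2 * i + 1])
          for i in range(3)]
assignment = moser_tardos(6, events)
```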

67 citations


Proceedings ArticleDOI
Manuela Fischer1
02 Mar 2017
TL;DR: Improved deterministic distributed algorithms are presented for a number of well-studied matching problems, which are simpler, faster, more accurate, and/or more general than their known counterparts.
Abstract: We present improved deterministic distributed algorithms for a number of well-studied matching problems, which are simpler, faster, more accurate, and/or more general than their known counterparts. The common denominator of these results is a deterministic distributed rounding method for certain linear programs, which is the first such rounding method, to our knowledge. A sampling of our end results is as follows. - An O(log^2 Delta log n)-round deterministic distributed algorithm for computing a maximal matching, in n-node graphs with maximum degree Delta. This is the first improvement in about 20 years over the celebrated O(log^4 n)-round algorithm of Hanckowiak, Karonski, and Panconesi [SODA'98, PODC'99]. - A deterministic distributed algorithm for computing a (2+epsilon)-approximation of maximum matching in O(log^2 Delta log(1/epsilon) + log^* n) rounds. This is exponentially faster than the classic O(Delta + log^* n)-round 2-approximation of Panconesi and Rizzi [DIST'01]. With some modifications, the algorithm can also find an epsilon-maximal matching which leaves only an epsilon-fraction of the edges on unmatched nodes. - An O(log^2 Delta log(1/epsilon) + log^* n)-round deterministic distributed algorithm for computing a (2+epsilon)-approximation of a maximum weighted matching, and also for the more general problem of maximum weighted b-matching. These improve over the O(log^4 n log_(1+epsilon) W)-round (6+epsilon)-approximation algorithm of Panconesi and Sozio [DIST'10], where W denotes the maximum normalized weight. - A deterministic local computation algorithm for a (2+epsilon)-approximation of maximum matching with 2^O(log^2 Delta) log^* n queries. This improves almost exponentially over the previous deterministic constant approximations which have query-complexity of 2^Omega(Delta log Delta) log^* n.
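For reference, the object these algorithms compute can be pinned down with the trivial sequential greedy baseline (a sketch of the definition, not the paper's O(log^2 Delta log n)-round distributed rounding method):

```python
def maximal_matching(edges):
    """Greedy maximal matching: scan edges once, keep an edge whenever
    both endpoints are still unmatched. The result is maximal: every
    remaining edge shares an endpoint with a kept edge."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching, matched

path = [(0, 1), (1, 2), (2, 3)]
m, covered = maximal_matching(path)
```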

54 citations


Proceedings Article
01 Jan 2017
TL;DR: This paper provides tools for derandomizing solutions to local problems when the n nodes can only send O(log n)-bit messages in each round of communication, and investigates the curious gap between randomized and deterministic solutions under bandwidth restrictions.
Abstract: This paper addresses the cornerstone family of local problems in distributed computing, and investigates the curious gap between randomized and deterministic solutions under bandwidth restrictions. Our main contribution is in providing tools for derandomizing solutions to local problems, when the n nodes can only send O(log n)-bit messages in each round of communication. Our framework mostly follows the derandomization approach of Luby (J Comput Syst Sci 47(2):250-286, 1993) combined with the power of all-to-all communication. Our key results are as follows: first, we show that in the congested clique model, which allows all-to-all communication, there is a deterministic maximal independent set algorithm that runs in O(log^2 Delta) rounds, where Delta is the maximum degree. When Delta = O(n^(1/3)), the bound improves to O(log Delta). In addition, we deterministically construct a (2k-1)-spanner with O(k n^(1+1/k) log n) edges in O(k log n) rounds in the congested clique model.
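The randomized starting point for such derandomization is Luby's MIS algorithm; here is a sequential simulation of its rounds, as an illustrative sketch:

```python
import random

def luby_mis(adj, rng=random.Random(1)):
    """Sequential simulation of Luby's randomized MIS rounds: every live
    node draws a random value, strict local minima join the MIS, and
    winners together with their neighbors leave the graph. adj maps each
    node to its set of neighbors."""
    live, mis = set(adj), set()
    while live:
        r = {v: rng.random() for v in live}
        winners = {v for v in live
                   if all(r[v] < r[u] for u in adj[v] & live)}
        mis |= winners
        for v in winners:        # remove winners and their neighborhoods
            live -= adj[v]
        live -= winners
    return mis

ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
mis = luby_mis(ring)
```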

46 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: It is formally proved that the standard Alice-Bob framework is incapable of providing a super-linear lower bound for exact weighted APSP, whose complexity remains an intriguing open question.
Abstract: We present the first super-linear lower bounds for natural graph problems in the CONGEST model, answering a long-standing open question. Specifically, we show that any exact computation of a minimum vertex cover or a maximum independent set requires a near-quadratic number of rounds in the CONGEST model, as well as any algorithm for computing the chromatic number of the graph. We further show that such strong lower bounds are not limited to NP-hard problems, by showing two simple graph problems in P which require a quadratic and near-quadratic number of rounds. Finally, we address the problem of computing an exact solution to weighted all-pairs-shortest-paths (APSP), which arguably may be considered as a candidate for having a super-linear lower bound. We show a simple linear lower bound for this problem, which implies a separation between the weighted and unweighted cases, since the latter is known to have a sub-linear complexity. We also formally prove that the standard Alice-Bob framework is incapable of providing a super-linear lower bound for exact weighted APSP, whose complexity remains an intriguing open question.

37 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: The main features of the Recoverable Mutual Exclusion algorithm are that it satisfies FCFS, it ensures that each process recovers in a wait-free manner, and in the absence of failures, it guarantees a worst-case Remote Memory Reference (RMR) complexity of O(lg n) on both Cache Coherent (CC) and Distributed Shared Memory (DSM) machines.
Abstract: Traditional mutual exclusion locks are not resilient to failures: if there is a power outage, the memory is wiped out. Thus, when the system comes back on, the lock will have to be restored to the initial state, i.e., all processes are rolled back to the Remainder section and all variables are reset to their initial values. Recently, Golab and Ramaraju showed that we can improve this state of the art by exploiting the Non-Volatile RAM (NVRAM). They designed algorithms that, by maintaining shared variables in NVRAM, allow processes to recover from crashes on their own without a need for a global reset, even though a crash can wipe out the local memory of a process. We present a Recoverable Mutual Exclusion algorithm using the commonly supported CAS primitive. The main features of our algorithm are that it satisfies FCFS, it ensures that each process recovers in a wait-free manner, and in the absence of failures, it guarantees a worst-case Remote Memory Reference (RMR) complexity of O(lg n) on both Cache Coherent (CC) and Distributed Shared Memory (DSM) machines, where n is the number of processes for which the algorithm is designed. This bound matches the Omega(lg n) RMR lower bound by Attiya, Hendler, and Woelfel for Mutual Exclusion algorithms that use comparison primitives.
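To fix ideas about the CAS primitive the algorithm builds on, here is a toy compare-and-swap lock with simulated shared memory (hypothetical names; this is far simpler than the paper's recoverable FCFS algorithm, and neither recoverable nor fair):

```python
class Word:
    """A simulated shared-memory word supporting compare-and-swap; in
    the recoverable setting this word would live in NVRAM so its value
    survives a crash."""
    def __init__(self, value=None):
        self.value = value

    def cas(self, expected, new):
        if self.value == expected:
            self.value = new
            return True
        return False

def try_acquire(lock, pid):
    # a process owns the lock iff it CAS'd its id into the free (None) word
    return lock.cas(None, pid)

def release(lock, pid):
    # only the owner's CAS back to None succeeds
    return lock.cas(pid, None)

lock = Word()
```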

29 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: This is the first sublinear-time distributed DFS algorithm, improving on a three decades-old O(n) algorithm of Awerbuch (1985), which remains the best known for general graphs.
Abstract: We present a randomized distributed algorithm that computes a Depth-First Search (DFS) tree in ~O(D) rounds, in any planar network G=(V,E) with diameter D, with high probability. This is the first sublinear-time distributed DFS algorithm, improving on a three decades-old O(n) algorithm of Awerbuch (1985), which remains the best known for general graphs. Furthermore, this ~O(D) round complexity is nearly-optimal as Omega(D) is a trivial lower bound. A key technical ingredient in our results is the development of a distributed method for (recursively) computing a separator path, which is a path whose removal from the graph leaves connected components that are all a constant factor smaller. We believe that the general method we develop for computing path separators recursively might be of broader interest, and may provide the first step towards solving many other problems.
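The object being computed, a DFS tree represented as a parent map, can be sketched with a plain sequential DFS (not the paper's ~O(D)-round distributed method):

```python
def dfs_tree(adj, root):
    """Build a DFS tree as a parent map: each node's parent is the
    vertex that discovered it during the depth-first traversal."""
    parent = {root: None}

    def visit(u):
        for v in adj[u]:
            if v not in parent:
                parent[v] = u   # tree edge: v discovered from u
                visit(v)

    visit(root)
    return parent

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
tree = dfs_tree(adj, 0)
```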

29 citations


Proceedings ArticleDOI
05 Oct 2017
TL;DR: A novel axiomatic specification of GSC is proposed and proved equivalent to the operational definition of the model; using it, it is shown that clients that do not communicate out-of-band cannot tell the difference between GSP, TSO, and dual TSO.
Abstract: Modern distributed systems often achieve availability and scalability by providing consistency guarantees about the data they manage weaker than linearizability. We consider a class of such consistency models that, despite this weakening, guarantee that clients eventually agree on a global sequence of operations, while seeing a subsequence of this final sequence at any given point of time. Examples of such models include the classical Total Store Order (TSO) and recently proposed dual TSO, Global Sequence Protocol (GSP) and Ordered Sequential Consistency. We define a unified model, called Global Sequence Consistency (GSC), that has the above models as its special cases, and investigate its key properties. First, we propose a condition under which multiple objects each satisfying GSC can be composed so that the whole set of objects satisfies GSC. Second, we prove an interesting relationship between special cases of GSC - GSP, TSO and dual TSO: we show that clients that do not communicate out-of-band cannot tell the difference between these models. To obtain these results, we propose a novel axiomatic specification of GSC and prove its equivalence to the operational definition of the model.
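The defining GSC guarantee, that each client's view at any point is a subsequence of the final agreed sequence of operations, can be checked with a short helper (a hypothetical illustration, not code from the paper):

```python
def is_subsequence(observed, final):
    """True iff `observed` appears in `final` in order, possibly with
    gaps -- i.e., the client's view is a subsequence of the agreed
    global sequence of operations."""
    it = iter(final)
    return all(op in it for op in observed)  # `in` consumes the iterator

final = ["w(x,1)", "w(y,2)", "w(x,3)", "r(y)"]
```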

15 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: A summary of recent work introducing a method to automatically produce concurrent data structures for NUMA architectures.
Abstract: Recent work introduced a method to automatically produce concurrent data structures for NUMA architectures. We present a summary of that work.

15 citations


Proceedings Article
01 Jan 2017
TL;DR: This paper presents the first sub-logarithmic deterministic algorithm for connected components in the broadcast congested clique, the first solution for this problem faster than a simple parallel implementation of Boruvka's algorithm.
Abstract: Recently, very fast deterministic and randomized algorithms have been obtained for connectivity and minimum spanning tree in the unicast congested clique. In contrast, no solution faster than a simple parallel implementation of Boruvka's algorithm has been known for either problem in the broadcast congested clique. In this announcement, we present the first sub-logarithmic deterministic algorithm for connected components in the broadcast congested clique.
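The baseline mentioned above, Boruvka-style component merging, can be sketched sequentially (each phase corresponds to a round of the parallel implementation; this is the baseline, not the paper's sub-logarithmic algorithm):

```python
def boruvka_components(n, edges):
    """Boruvka-style connected components: in each phase, every
    component picks one incident outgoing edge and merges along it.
    The number of components roughly halves per phase, giving the
    O(log n)-phase baseline."""
    comp = list(range(n))

    def find(x):                      # union-find with path halving
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    changed = True
    while changed:
        changed = False
        pick = {}                     # component root -> chosen edge
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                pick.setdefault(ru, (u, v))
                pick.setdefault(rv, (u, v))
        for u, v in pick.values():    # merge along the chosen edges
            ru, rv = find(u), find(v)
            if ru != rv:
                comp[ru] = rv
                changed = True
    return [find(x) for x in range(n)]

labels = boruvka_components(5, [(0, 1), (1, 2), (3, 4)])
```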

13 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: This work studies distributed algorithms implemented in a simplified biologically inspired model for stochastic spiking neural networks; it shows that, in contrast to similarity testing, randomness does not provide significant computational advantages for indexing (the neuro-RAM), and establishes a separation between feedforward networks whose gates spike with sigmoidal probability functions and well-studied deterministic sigmoidal networks, whose gates output real-number sigmoid values and which can implement a neuro-RAM much more efficiently.
Abstract: We study distributed algorithms implemented in a simplified biologically inspired model for stochastic spiking neural networks. We focus on tradeoffs between computation time and network complexity, along with the role of noise and randomness in efficient neural computation. It is widely accepted that neural spike responses, and neural computation in general, is inherently stochastic. In recent work, we explored how this stochasticity could be leveraged to solve the 'winner-take-all' leader election task. Here, we focus on using randomness in neural algorithms for similarity testing and compression. In the most basic setting, given two n-length patterns of firing neurons, we wish to distinguish if the patterns are equal or epsilon-far from equal. Randomization allows us to solve this task with a very compact network, using O((sqrt(n) log n)/epsilon) auxiliary neurons, which is sublinear in the input size. At the heart of our solution is the design of a t-round neural random access memory, or indexing network, which we call a neuro-RAM. This module can be implemented with O(n/t) auxiliary neurons and is useful in many applications beyond similarity testing - e.g., we discuss its application to compression via random projection. Using a VC dimension-based argument, we show that the tradeoff between runtime and network size in our neuro-RAM is nearly optimal. To the best of our knowledge, we are the first to apply these techniques to stochastic spiking networks. Our result has several implications - since our neuro-RAM can be implemented with deterministic threshold gates, it demonstrates that, in contrast to similarity testing, randomness does not provide significant computational advantages for this problem. It also establishes a separation between our networks, which spike with a sigmoidal probability function, and well-studied deterministic sigmoidal networks, whose gates output real number values, and which can implement a neuro-RAM much more efficiently.

11 citations


Proceedings ArticleDOI
16 Oct 2017
TL;DR: A new algorithm for multi-writer, multi-reader atomic registers in the DCR model that guarantees safety under all conditions and termination under a natural condition is given, which improves on the best prior protocol for this problem by requiring fewer rounds, fewer nodes to participate in the quorum, and a less restrictive liveness condition.
Abstract: This paper considers the problem of building fault-tolerant shared objects when processes can crash and recover but lose their persistent state on recovery. This Diskless Crash-Recovery (DCR) model matches the way many long-lived systems are built. We show that it presents new challenges, as operations that are recorded at a quorum may not persist after some of the processes in that quorum crash and then recover. To address this problem, we introduce the notion of crash-consistent quorums, where no recoveries happen during the quorum responses. We show that relying on crash-consistent quorums enables a recovery procedure that can recover all operations that successfully finished. Crash-consistent quorums can be easily identified using a mechanism we term the crash vector, which tracks the causal relationship between crashes, recoveries, and other operations. We apply crash-consistent quorums and crash vectors to build two storage primitives. We give a new algorithm for multi-writer, multi-reader atomic registers in the DCR model that guarantees safety under all conditions and termination under a natural condition. It improves on the best prior protocol for this problem by requiring fewer rounds, fewer nodes to participate in the quorum, and a less restrictive liveness condition. We also present a more efficient single-writer, single-reader atomic set - a virtual stable storage abstraction. It can be used to lift any existing algorithm from the traditional Crash-Recovery model to the DCR model. We examine a specific application, state machine replication, and show that existing diskless protocols can violate their correctness guarantees, while ours offers a general and correct solution.
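A simplified sketch of the crash-consistency test (hypothetical encoding: one crash counter per node, vectors merged by elementwise maximum; the paper's crash vectors track richer causal information):

```python
def merge(a, b):
    """Crash vectors merge by elementwise maximum."""
    return [max(x, y) for x, y in zip(a, b)]

def crash_consistent(responses):
    """responses: list of (responder_index, crash_vector). The quorum is
    crash-consistent iff no response carries a higher crash count for
    some responder than that responder reported for itself -- i.e.,
    nobody in the quorum crashed and recovered while the quorum was
    being gathered."""
    combined = responses[0][1]
    for _, cv in responses[1:]:
        combined = merge(combined, cv)
    return all(cv[i] == combined[i] for i, cv in responses)

ok = crash_consistent([(0, [1, 0, 2]), (1, [1, 0, 2])])
bad = crash_consistent([(0, [1, 0, 2]), (1, [2, 0, 2])])  # node 0 recovered mid-quorum
```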

Proceedings ArticleDOI
01 Jan 2017
TL;DR: The meeting problem for k>=2 searchers in a polygon P (possibly with holes) is studied; if P has a rotational symmetry of order sigma (where sigma = 1 corresponds to no rotational symmetry), then k = sigma+1 searchers are sufficient to solve the problem, and the bound is tight.
Abstract: The Meeting problem for k>=2 searchers in a polygon P (possibly with holes) consists in making the searchers move within P, according to a distributed algorithm, in such a way that at least two of them eventually come to see each other, regardless of their initial positions. The polygon is initially unknown to the searchers, and its edges obstruct both movement and vision. Depending on the shape of P, we minimize the number of searchers k for which the Meeting problem is solvable. Specifically, if P has a rotational symmetry of order sigma (where sigma=1 corresponds to no rotational symmetry), we prove that k=sigma+1 searchers are sufficient, and the bound is tight. Furthermore, we give an improved algorithm that optimally solves the Meeting problem with k=2 searchers in all polygons whose barycenter is not in a hole (which includes the polygons with no holes). Our algorithms can be implemented in a variety of standard models of mobile robots operating in Look-Compute-Move cycles. For instance, if the searchers have memory but are anonymous, asynchronous, and have no agreement on a coordinate system or a notion of clockwise direction, then our algorithms work even if the initial memory contents of the searchers are arbitrary and possibly misleading. Moreover, oblivious searchers can execute our algorithms as well, encoding information by carefully positioning themselves within the polygon. This code is computable with basic arithmetic operations (provided that the coordinates of the polygon's vertices are algebraic real numbers in some global coordinate system), and each searcher can geometrically construct its own destination point at each cycle using only a compass. We stress that such memoryless searchers may be located anywhere in the polygon when the execution begins, and hence the information they initially encode is arbitrary. Our algorithms use a self-stabilizing map construction subroutine which is of independent interest.

Proceedings Article
01 Jan 2017
TL;DR: This work proves that for all positive integers n and k, there is a linearizable object that is computationally equivalent to the k-set agreement task among n processes; these are the first linearizable objects proven to be computationally equivalent to set agreement tasks.
Abstract: We first prove that there are uncountably many objects with distinct computational powers. More precisely, we show that there is an uncountable set of objects such that for any two of them, at least one cannot be implemented from the other (and registers) in a wait-free manner. We then strengthen this result by showing that there are uncountably many linearizable objects with distinct computational powers. To do so, we prove that for all positive integers n and k, there is a linearizable object that is computationally equivalent to the k-set agreement task among n processes. To the best of our knowledge, these are the first linearizable objects proven to be computationally equivalent to set agreement tasks.

Proceedings Article
16 Oct 2017
TL;DR: This paper designs a rendezvous algorithm with appropriate properties for the task of approach of two mobile agents having the same limited range of vision and moving asynchronously in the plane, thereby giving a positive answer to the question of whether there exists a deterministic approach algorithm always working at a cost polynomial in the initial distance between the agents and the length of the shortest label.
Abstract: In this paper we study the task of approach of two mobile agents having the same limited range of vision and moving asynchronously in the plane. This task consists in getting them in finite time within each other's range of vision. The agents execute the same deterministic algorithm and are assumed to have a compass showing the cardinal directions as well as a unit measure. On the other hand, they do not share any global coordinate system (like GPS), cannot communicate and have distinct labels. Each agent knows its label but does not know the label of the other agent or the initial position of the other agent relative to its own. The route of an agent is a sequence of segments that are subsequently traversed in order to achieve approach. For each agent, the computation of its route depends only on its algorithm and its label. An adversary chooses the initial positions of both agents in the plane and controls the way each of them moves along every segment of the routes, in particular by arbitrarily varying the speeds of the agents. Roughly speaking, the goal of the adversary is to prevent the agents from solving the task, or at least to ensure that the agents have covered as much distance as possible before seeing each other. A deterministic approach algorithm is a deterministic algorithm that always allows two agents with any distinct labels to solve the task of approach regardless of the choices and the behavior of the adversary. The cost of a complete execution of an approach algorithm is the length of both parts of route travelled by the agents until approach is completed. Let Δ and l be the initial distance separating the agents and the length of (the binary representation of) the shortest label, respectively. Assuming that Δ and l are unknown to both agents, does there exist a deterministic approach algorithm always working at a cost that is polynomial in Δ and l?
Actually the problem of approach in the plane reduces to the network problem of rendezvous in an infinite oriented grid, which consists in ensuring that both agents end up meeting at the same time at a node or on an edge of the grid. By designing such a rendezvous algorithm with appropriate properties, as we do in this paper, we provide a positive answer to the above question. Our result turns out to be an important step forward from a computational point of view, as the other algorithms that solve the same problem either have an exponential cost in the initial separating distance and in the labels of the agents, or require each agent to know its starting position in a global system of coordinates, or only work under a much less powerful adversary.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: In this article, the authors introduce the descriptor abstract data type (ADT) which captures the usage of descriptors by lock-free algorithms and develop a weak descriptor ADT which has weaker semantics, but can be implemented significantly more efficiently.
Abstract: In many lock-free algorithms, threads help one another, and each operation creates a descriptor that describes how other threads should help it. Allocating and reclaiming descriptors introduces significant space and time overhead. We introduce the first descriptor abstract data type (ADT), which captures the usage of descriptors by lock-free algorithms. We then develop a weak descriptor ADT which has weaker semantics, but can be implemented significantly more efficiently. We show how a large class of lock-free algorithms can be transformed to use weak descriptors, and demonstrate our technique by transforming several algorithms, including the leading k-compare-and-swap (k-CAS) algorithm. The original k-CAS algorithm allocates at least k+1 new descriptors per k-CAS. In contrast, our implementation allocates two descriptors per process, and each process simply reuses its two descriptors. Experiments on a variety of workloads show significant performance improvements over implementations that reclaim descriptors, and reductions of up to three orders of magnitude in peak memory usage.
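The sequence-number trick behind reusable descriptors can be sketched as follows (hypothetical class and method names, not the paper's exact ADT; real implementations pack the sequence number and pointer into a single CAS-able word):

```python
class ReusableDescriptor:
    """Per-process reusable descriptor: the sequence number is bumped on
    every new operation, so a helper holding a stale (descriptor, seq)
    reference detects reuse instead of reading another operation's
    data."""
    def __init__(self):
        self.seq, self.payload = 0, None

    def publish(self, payload):
        self.seq += 1
        self.payload = payload
        return self.seq          # helpers capture (descriptor, seq)

    def read(self, seq):
        # None signals "stale": the descriptor was recycled since publish
        return self.payload if self.seq == seq else None

d = ReusableDescriptor()
s1 = d.publish("op-1")
first_read = d.read(s1)
s2 = d.publish("op-2")           # reuse: s1 is now stale
```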

Proceedings Article
01 Jan 2017
TL;DR: It is shown that, unlike in progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation; opaque HyTM algorithms are then presented that provide progressiveness for a subset of transactions and are optimal in terms of hardware instrumentation.
Abstract: State-of-the-art software transactional memory (STM) implementations achieve good performance by carefully avoiding the overhead of incremental validation (i.e., re-reading previously read data items to avoid inconsistency) while still providing progressiveness (allowing transactional aborts only due to data conflicts). Hardware transactional memory (HTM) implementations promise even better performance, but offer no progress guarantees. Thus, they must be combined with STMs, leading to hybrid TMs (HyTMs) in which hardware transactions must be instrumented (i.e., access metadata) to detect contention with software transactions. We show that, unlike in progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation. In fact, this result holds even if hardware transactions can read metadata non-speculatively. We then present opaque HyTM algorithms providing progressiveness for a subset of transactions that are optimal in terms of hardware instrumentation. We explore the concurrency vs. hardware instrumentation vs. software validation trade-offs for these algorithms. Our experiments with Intel and IBM POWER8 HTMs seem to suggest that (i) the cost of concurrency also exists in practice, (ii) it is important to implement HyTMs that provide progressiveness for a maximal set of transactions without incurring high hardware instrumentation overhead or using global contending bottlenecks and (iii) there is no easy way to derive more efficient HyTMs by taking advantage of non-speculative accesses within hardware.
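Incremental validation, the overhead at issue, amounts to re-checking the versions of all previously read locations; a toy sketch with a hypothetical versioned-memory representation:

```python
def validate(read_set, memory):
    """Incremental validation as in software TMs: re-read every location
    in the transaction's read set and check that its version is still
    the one observed at read time."""
    return all(memory[loc][1] == version
               for loc, version in read_set.items())

memory = {"x": (10, 3), "y": (7, 1)}   # location -> (value, version)
read_set = {"x": 3, "y": 1}            # location -> version seen at read
before = validate(read_set, memory)
memory["y"] = (8, 2)                   # a concurrent writer bumps y's version
after = validate(read_set, memory)
```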

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper considers several closely-related problems in synchronous dynamic networks with oblivious adversaries, and proves novel Omega(d + poly(m)) lower bounds on their time complexity (in rounds), which are the first non-trivial lower bounds and also the first lower bounds with a poly(m) term.
Abstract: This paper considers several closely-related problems in synchronous dynamic networks with oblivious adversaries, and proves novel Omega(d + poly(m)) lower bounds on their time complexity (in rounds). Here d is the dynamic diameter of the dynamic network and m is the total number of nodes. Before this work, the only known lower bounds on these problems under oblivious adversaries were the trivial Omega(d) lower bounds. Our novel lower bounds are hence the first non-trivial lower bounds and also the first lower bounds with a poly(m) term. Our proof relies on a novel reduction from a certain two-party communication complexity problem. Our central proof technique is unique in the sense that we consider the communication complexity with a special leaker. The leaker helps Alice and Bob in the two-party problem, by disclosing to Alice and Bob certain "non-critical" information about the problem instance that they are solving.

Proceedings ArticleDOI
01 Aug 2017
TL;DR: In this article, the authors presented a deterministic distributed algorithm that, given an unweighted n-vertex graph G = (V,E) and a parameter k > 2, constructs a (2k-1)-spanner with O(k n^(1+1/k)) edges within O(2^k n^(1/2 - 1/k)) rounds for every even k.
Abstract: Graph spanners are fundamental graph structures with a wide range of applications in distributed networks. We consider a standard synchronous message passing model where in each round O(log n) bits can be transmitted over every edge (the CONGEST model). The state of the art of deterministic distributed spanner constructions suffers from large messages. The only exception is the work of Derbel et al., which computes an optimal-sized (2k-1)-spanner but uses O(n^(1-1/k)) rounds. In this paper, we significantly improve this bound. We present a deterministic distributed algorithm that given an unweighted n-vertex graph G = (V,E) and a parameter k > 2, constructs a (2k-1)-spanner with O(k n^(1+1/k)) edges within O(2^k n^(1/2 - 1/k)) rounds for every even k. For odd k, the number of rounds is O(2^k n^(1/2 - 1/(2k))). For the weighted case, we provide the first deterministic construction of a 3-spanner with O(n^(3/2)) edges that uses O(log n)-size messages and ~O(1) rounds. If the vertices have IDs in [1,Theta(n)], the spanner is computed in only 2 rounds!
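For contrast with the distributed construction, the classic sequential greedy (2k-1)-spanner of Althoefer et al. is only a few lines (shown as a baseline, not the paper's CONGEST algorithm):

```python
from collections import deque

def bfs_dist(adj, s, t, limit):
    """Hop distance from s to t, capped: returns limit + 1 if t is
    farther than limit hops (or unreachable)."""
    dist, q = {s: 0}, deque([s])
    while q:
        x = q.popleft()
        if x == t:
            return dist[x]
        if dist[x] < limit:
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    q.append(y)
    return limit + 1

def greedy_spanner(n, edges, k):
    """Greedy (2k-1)-spanner: keep an edge only if its endpoints are
    currently more than 2k-1 hops apart in the spanner built so far."""
    adj = [set() for _ in range(n)]
    spanner = []
    for u, v in edges:
        if bfs_dist(adj, u, v, 2 * k - 1) > 2 * k - 1:
            adj[u].add(v)
            adj[v].add(u)
            spanner.append((u, v))
    return spanner

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
span = greedy_spanner(4, k4, k=2)   # a 3-spanner of K4: a star
```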

Proceedings ArticleDOI
01 Feb 2017
TL;DR: In this paper, the authors considered the problem of estimating the graph size, where one is given only local access to the graph, and formally defined a query model in which one starts with a seed node and is allowed to make queries about neighbours of nodes that have already been seen.
Abstract: We consider the problem of estimating the graph size, where one is given only local access to the graph. We formally define a query model in which one starts with a seed node and is allowed to make queries about neighbours of nodes that have already been seen. In the case of undirected graphs, an estimator of Katzir et al. (2014) based on a sample from the stationary distribution pi uses O(1/||pi||_2 + d_avg) queries; we prove that this is tight. In addition, we establish this as a lower bound even when the algorithm is allowed to crawl the graph arbitrarily; the results of Katzir et al. give an upper bound that is worse by a multiplicative factor t_mix(1/n^4). The picture becomes significantly different in the case of directed graphs. We show that without strong assumptions on the graph structure, the number of nodes cannot be predicted to within a constant multiplicative factor without using a number of queries that are at least linear in the number of nodes; in particular, rapid mixing and small diameter, properties that most real-world networks exhibit, do not suffice. The question of interest is whether any algorithm can beat breadth-first search. We introduce a new parameter, generalising the well-studied conductance, such that if a suitable bound on it exists and is known to the algorithm, the number of queries required is sublinear in the number of edges; we show that this is tight.
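A simplified, uniform-sampling cousin of the Katzir et al. estimator is the birthday-paradox estimator (the real estimator reweights collisions by node degrees to handle stationary, non-uniform samples):

```python
import random

def estimate_n(samples):
    """Birthday-paradox size estimator: with r samples and c colliding
    pairs, E[c] = r(r-1)/(2n) under uniform sampling, so n is estimated
    as r(r-1)/(2c)."""
    r = len(samples)
    c = sum(samples[i] == samples[j]
            for i in range(r) for j in range(i + 1, r))
    return float('inf') if c == 0 else r * (r - 1) / (2 * c)

rng = random.Random(0)
n_true = 100
est = estimate_n([rng.randrange(n_true) for _ in range(200)])
```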

Proceedings Article
01 Jan 2017
TL;DR: P8TM is a novel approach that mitigates the capacity limitations of hardware transactional memory on IBM's POWER8 architecture by leveraging a key combination of hardware and software techniques to support different execution paths, and relies on self-tuning mechanisms aimed at dynamically switching between execution modes to best adapt to the workload characteristics.
Abstract: Transactional memory (TM) aims at simplifying concurrent programming via the familiar abstraction of atomic transactions. Recently, Intel and IBM have integrated hardware based TM (HTM) implementations in commodity processors, paving the way for the mainstream adoption of the TM paradigm. Yet, existing HTM implementations suffer from a crucial limitation, which hampers the adoption of HTM as a general technique for regulating concurrent access to shared memory: the inability to execute transactions whose working sets exceed the capacity of CPU caches. In this article we propose P8TM, a novel approach that mitigates this limitation on IBM’s POWER8 architecture by leveraging a key combination of hardware and software techniques to support different execution paths. P8TM also relies on self-tuning mechanisms aimed at dynamically switching between different execution modes to best adapt to the workload characteristics. In-depth evaluation with several benchmarks indicates that P8TM can achieve striking performance gains in workloads that stress the capacity limitations of HTM, while achieving performance on par with HTM even in unfavourable workloads.

Proceedings ArticleDOI
16 Oct 2017
TL;DR: In this paper, the authors introduce error-sensitive proof-labeling schemes, which guarantee that the number of nodes detecting illegal states is linearly proportional to the edit distance between the current state and the set of legal states.
Abstract: Proof-labeling schemes are known mechanisms providing nodes of networks with certificates that can be verified locally by distributed algorithms. Given a boolean predicate on network states, such schemes make it possible to check whether the predicate is satisfied by the actual state of the network, by having nodes interact with their neighbors only. Proof-labeling schemes are typically designed for enforcing fault-tolerance, by making sure that if the current state of the network is illegal with respect to some given predicate, then at least one node will detect it. Such a node can raise an alarm, or launch a recovery procedure enabling the system to return to a legal state. In this paper, we introduce error-sensitive proof-labeling schemes. These are proof-labeling schemes which guarantee that the number of nodes detecting illegal states is linearly proportional to the edit distance between the current state and the set of legal states. With error-sensitive proof-labeling schemes, states that are far from satisfying the predicate will be detected by many nodes, enabling a fast return to legality. We provide a structural characterization of the set of boolean predicates on network states for which there exist error-sensitive proof-labeling schemes. This characterization allows us to show that classical predicates such as acyclicity and leader admit error-sensitive proof-labeling schemes, while others, like regular subgraphs, do not. We also focus on compact error-sensitive proof-labeling schemes. In particular, we show that the known proof-labeling schemes for spanning tree and minimum spanning tree, using certificates on O(log n) bits and on O(log^2 n) bits, respectively, are error-sensitive, as long as the trees are locally represented by adjacency lists, and not just by parent pointers.
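The spanning-tree scheme mentioned above certifies the tree by giving each node the root's identity, a parent pointer, and its distance to the root. A minimal verifier sketch follows; names are illustrative, and it deliberately ignores the adjacency-list subtlety the abstract raises.

```python
def verify_spanning_tree(adj, cert):
    # cert[v] = (root_id, parent, dist); returns the set of alarming nodes.
    alarms = set()
    for v, (root, parent, dist) in cert.items():
        ok = all(cert[u][0] == root for u in adj[v])  # neighbours agree on root
        if v == root:
            ok = ok and parent == v and dist == 0
        else:
            # the parent must be a neighbour lying one step closer to the root
            ok = ok and parent in adj[v] and cert[parent][2] == dist - 1
        if not ok:
            alarms.add(v)
    return alarms

# Path 0 - 1 - 2 rooted at node 0.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
good = {0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 1, 2)}
bad = {0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 1, 5)}  # corrupted distance
```

On the valid certificate no node alarms; corrupting one distance field makes the affected node detect the inconsistency with its parent.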

Proceedings ArticleDOI
01 Jan 2017
TL;DR: It is shown that many techniques developed in the context of predicate detection are applicable to the stable marriage problem and techniques in computation slicing can be used to represent the set of all constrained stable matchings.
Abstract: We show that many techniques developed in the context of predicate detection are applicable to the stable marriage problem. The standard Gale-Shapley algorithm can be derived as a special case of detecting linear predicates. We also show that techniques in computation slicing can be used to represent the set of all constrained stable matchings.
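For reference, this is the standard Gale-Shapley deferred-acceptance algorithm that the abstract recasts as a special case of linear predicate detection (the classic textbook version, not the paper's derivation):

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    # Returns the proposer-optimal stable matching as {proposer: acceptor}.
    free = list(proposer_prefs)                  # proposers not yet matched
    next_idx = {p: 0 for p in proposer_prefs}    # next acceptor to propose to
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}  # lower rank = preferred
    engaged = {}                                 # acceptor -> proposer
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_idx[p]]
        next_idx[p] += 1
        if a not in engaged:
            engaged[a] = p
        elif rank[a][p] < rank[a][engaged[a]]:
            free.append(engaged[a])              # a trades up; old partner freed
            engaged[a] = p
        else:
            free.append(p)                       # a rejects p
    return {p: a for a, p in engaged.items()}

match = gale_shapley({'a': ['x', 'y'], 'b': ['y', 'x']},
                     {'x': ['b', 'a'], 'y': ['a', 'b']})
```

Here both proposers obtain their first choice, so the resulting matching is trivially stable.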

Proceedings ArticleDOI
01 Jan 2017
TL;DR: The first self-stabilizing algorithm for leader election in arbitrary topologies whose space complexity is O(max{log Delta, log log n}) bits per node, where n is the network size and Delta its degree is presented.
Abstract: We present the first self-stabilizing algorithm for leader election in arbitrary topologies whose space complexity is O(max{log Delta, log log n}) bits per node, where n is the network size and Delta its degree. This complexity is sub-logarithmic in n when Delta = n^o(1).

Proceedings Article
01 Jan 2017
TL;DR: This analysis shows that, starting from any initial configuration, the Undecided-State Dynamics reaches a monochromatic configuration within O(log^2 n) rounds, with high probability, and proves that if the initial configuration has bias Omega(sqrt(n log n)), then the dynamics converges toward the initial majority color within a polylogarithmic number of rounds.

Proceedings Article
01 Jan 2017
TL;DR: It is proved that this protocol runs in the optimal number of communication rounds among all protocols belonging to a specific class of crash-tolerant consensus protocols in directed graphs.
Abstract: We revisit the problem of distributed consensus in directed graphs tolerating crash failures; we improve the round and communication complexity of the existing protocols. Moreover, we prove that our protocol requires the optimal number of communication rounds required by any protocol belonging to a specific class of crash-tolerant consensus protocols in directed graphs.
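The paper's directed-graph protocol is not reproduced in the abstract. As a point of comparison, here is a sketch of the classic FloodSet algorithm, which solves crash-tolerant consensus on a complete undirected network in f+1 synchronous rounds; the simplified crash model below (a crashed process sends nothing from its crash round on) is an assumption of this sketch.

```python
def flood_set(values, f, crashes=None):
    # values[i]: input of process i; f: crash bound; crashes maps a round
    # number to the set of processes crashing at its start.  After f+1
    # rounds of flooding, every surviving process decides the minimum of
    # the set of values it has seen.
    crashes = crashes or {}
    alive = set(range(len(values)))
    known = [{v} for v in values]
    for rnd in range(f + 1):
        alive -= crashes.get(rnd, set())
        msgs = [set(known[i]) if i in alive else set()
                for i in range(len(values))]
        for i in alive:
            for m in msgs:
                known[i] |= m
    return {i: min(known[i]) for i in alive}

decisions = flood_set([3, 1, 2], f=1)
```

With no crashes, a single round already floods every value, and all three processes decide the minimum input 1.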


Proceedings Article
01 Jan 2017
TL;DR: This paper exhibits a local graph dynamic that mixes provably fast, in contrast with previous results, which required at least Ω(N polylog(N)) operations per node before the graph had properly mixed.
Abstract: In peer-to-peer networks, it is desirable that the logical topology of connections between the constituting nodes make a well-connected graph, i.e., a graph with low diameter and high expansion. At the same time, this graph should evolve only through local modifications. These requirements prompt the following question: are there local graph dynamics that i) create a well-connected graph in equilibrium, and ii) converge rapidly to this equilibrium? In this paper we provide an affirmative answer by exhibiting a local graph dynamic that mixes provably fast. Specifically, for a graph on N nodes, mixing has occurred after each node has performed O(polylog(N)) operations. This is in contrast with previous results, which required at least Ω(N polylog(N)) operations per node before the graph had properly mixed.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper presents a fast coalescence protocol that converges in O(sqrt(n) log^2 n) parallel time with high probability in the model with an initial leader (equivalently, the model with a base station), which achieves a substantial speed-up compared with the naive implementation taking Omega(n) time.
Abstract: The coalescence protocol plays an important role in the population protocol model. The conceptual structure of the protocol is for two agents holding two non-zero values a, b respectively to take a transition (a,b) -> (a+b, 0), where + is an arbitrary commutative binary operation. Obviously, it eventually aggregates the sum of all initial values. In this paper, we present a fast coalescence protocol that converges in O(sqrt(n) log^2 n) parallel time with high probability in the model with an initial leader (equivalently, the model with a base station), which achieves a substantial speed-up compared with the naive implementation taking Omega(n) time.
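A random-scheduler simulation of the stated transition makes the aggregation behaviour concrete; note this simulates the naive protocol, not the paper's faster leader-assisted one.

```python
import random

def coalesce(values, rng):
    # Repeatedly let a uniformly random pair of agents interact; when both
    # hold non-zero values, apply the transition (a, b) -> (a + b, 0).
    vals = list(values)
    interactions = 0
    while sum(1 for v in vals if v != 0) > 1:
        i, j = rng.sample(range(len(vals)), 2)
        interactions += 1
        if vals[i] != 0 and vals[j] != 0:
            vals[i], vals[j] = vals[i] + vals[j], 0
    return vals, interactions

vals, steps = coalesce([1] * 8, random.Random(1))
```

Whatever the interaction order, the sum is invariant, so the run ends with a single agent holding the total (here, 8) and all others holding 0.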

Proceedings ArticleDOI
01 Jan 2017
TL;DR: The reported research answers this question affirmatively for bounded degree networks by introducing a stone age algorithm (operating under the most restrictive form of the model) that, given a designated I/O node, constructs a tour in the network that enables the simulation of the Turing machine’s tape.
Abstract: What can be computed by a network of n randomized finite state machines communicating under the stone age model (a generalization of the beeping model’s communication scheme)? The inherent linear upper bound on the total space of the network implies that its global computational power is not larger than that of a randomized linear space Turing machine, but is this tight? The reported research answers this question affirmatively for bounded degree networks by introducing a stone age algorithm (operating under the most restrictive form of the model) that, given a designated I/O node, constructs a tour in the network that enables the simulation of the Turing machine’s tape. To construct the tour, it is first shown how to 2-hop color the network concurrently with building a spanning tree with high probability.