Journal ArticleDOI

A Scalable and Energy-Efficient Concurrent Binary Search Tree With Fatnodes

TL;DR: Introduces FatCBST, a scalable and energy-efficient concurrent binary search tree built from fatnodes, together with algorithms for its basic operations; FatCBST scales well and delivers high performance-per-watt values compared to state-of-the-art implementations.
Abstract: In the recent past, devising algorithms for concurrent data structures has been driven by the need for scalability. Further, there is increased traction across the industry towards power-efficient concurrent data structure designs. In this context, we introduce a scalable and energy-efficient concurrent binary search tree with fatnodes (namely, FatCBST), and present algorithms to perform basic operations on it. Unlike a single node with one value, a fatnode consists of a set of values. FatCBST minimizes structural changes while performing update operations on the tree. In addition, fatnodes help to exploit spatial locality in the cache hierarchy and also reduce the height of the tree. FatCBST allows multiple threads to perform update operations on an existing fatnode simultaneously. Experimental results show that for low-contention workloads as well as large set sizes, FatCBST scales well and provides high performance-per-watt values compared to state-of-the-art implementations. For high-contention workloads with small set sizes, FatCBST suffers from contention.
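The paper's implementation is not reproduced on this page. As a point of reference, here is a minimal Java sketch of the fatnode idea under stated assumptions: a fixed per-node capacity, a sentinel value marking free slots, and per-slot CAS so that several threads can update one fatnode concurrently, as the abstract describes. The class name, capacity, and slot scheme are illustrative, not taken from the paper.

    import java.util.concurrent.atomic.AtomicLongArray;

    // Hypothetical fatnode: one tree node holding a small set of keys
    // instead of a single key. Names and capacity are assumptions.
    final class FatNode {
        static final int CAPACITY = 8;            // keys per fatnode (assumed)
        static final long EMPTY = Long.MIN_VALUE; // sentinel for a free slot

        final AtomicLongArray slots;              // key slots, claimed via CAS
        volatile FatNode left, right;             // child fatnodes

        FatNode() {
            slots = new AtomicLongArray(CAPACITY);
            for (int i = 0; i < CAPACITY; i++) slots.set(i, EMPTY);
        }

        // Several threads may claim different free slots of the same fatnode
        // concurrently; only a full node forces a structural change, which is
        // what keeps restructuring rare and updates cache-friendly.
        boolean tryInsert(long key) {
            for (int i = 0; i < CAPACITY; i++) {
                if (slots.get(i) == EMPTY && slots.compareAndSet(i, EMPTY, key)) {
                    return true;                  // slot claimed atomically
                }
            }
            return false;                         // node full: caller descends or splits
        }

        boolean contains(long key) {
            for (int i = 0; i < CAPACITY; i++) {
                if (slots.get(i) == key) return true; // scan one contiguous array
            }
            return false;
        }
    }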

References
Journal ArticleDOI
TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
Abstract: A concurrent object is a data object shared by concurrent processes. Linearizability is a correctness condition for concurrent objects that exploits the semantics of abstract data types. It permits a high degree of concurrency, yet it permits programmers to specify and reason about concurrent objects using known techniques from the sequential domain. Linearizability provides the illusion that each operation applied by concurrent processes takes effect instantaneously at some point between its invocation and its response, implying that the meaning of a concurrent object's operations can be given by pre- and post-conditions. This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
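To make the linearization-point idea concrete, here is a small Java example (an illustrative sketch, not from the cited paper): each increment of this counter appears to take effect at a single instant, the successful compareAndSet, somewhere between invocation and response, so every concurrent execution is equivalent to some sequential order of increments.

    import java.util.concurrent.atomic.AtomicInteger;

    // A linearizable counter: the linearization point (LP) of increment()
    // is its successful CAS; of get(), the volatile read.
    final class LinearizableCounter {
        private final AtomicInteger value = new AtomicInteger();

        int increment() {
            while (true) {
                int current = value.get();
                if (value.compareAndSet(current, current + 1)) {
                    return current + 1; // LP: the CAS above succeeded
                }
                // CAS failed: another increment linearized first; retry
            }
        }

        int get() {
            return value.get();         // LP: this volatile read
        }
    }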

3,396 citations


"A Scalable and Energy-Efficient Con..." refers background in this paper

  • ...Linearizability is a correctness condition for concurrent objects [20], and it deals with linearization points (LPs) of each of the operations....

  • ...deadlock avoidance, (ii) liveness property—that ensures cooperation among threads while performing the tasks [19], and (iii) linearizable property—that ensures a correctness condition for concurrent objects [20], and it deals with linearization points (LPs) of each operation....

Journal ArticleDOI
TL;DR: A hierarchy of objects is derived such that no object at one level has a wait-free implementation in terms of objects at lower levels, and it is shown that atomic read/write registers, which have been the focus of much recent attention, are at the bottom of the hierarchy.
Abstract: A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of the other processes. The problem of constructing a wait-free implementation of one data object from another lies at the heart of much recent work in concurrent algorithms, concurrent data structures, and multiprocessor architectures. First, we introduce a simple and general technique, based on reduction to a consensus protocol, for proving statements of the form, “there is no wait-free implementation of X by Y.” We derive a hierarchy of objects such that no object at one level has a wait-free implementation in terms of objects at lower levels. In particular, we show that atomic read/write registers, which have been the focus of much recent attention, are at the bottom of the hierarchy: they cannot be used to construct wait-free implementations of many simple and familiar data types. Moreover, classical synchronization primitives such as test&set and fetch&add, while more powerful than read and write, are also computationally weak, as are the standard message-passing primitives. Second, nevertheless, we show that there do exist simple universal objects from which one can construct a wait-free implementation of any sequential object.
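The hierarchy can be illustrated in a few lines of Java (an illustrative sketch, not from the cited paper): compare-and-swap solves consensus for any number of threads, which is exactly what places it above read/write registers in the hierarchy.

    import java.util.concurrent.atomic.AtomicReference;

    // Consensus from compare-and-swap: the first thread to CAS its
    // proposal in wins, and every thread returns that same value.
    // Plain read/write registers cannot implement this wait-free
    // even for two threads, which is why they sit at the bottom
    // of the hierarchy.
    final class CasConsensus<T> {
        private final AtomicReference<T> decision = new AtomicReference<>(null);

        T decide(T proposal) {
            decision.compareAndSet(null, proposal); // only the first CAS succeeds
            return decision.get();                  // everyone agrees on the winner
        }
    }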

2,013 citations


"A Scalable and Energy-Efficient Con..." refers background in this paper

  • ...ules that can be generated from the sequential code and satisfy the linearizability [13]....

Book
Maurice Herlihy
14 Mar 2008
TL;DR: Transactional memory as discussed by the authors is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Abstract: Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed. As a result, system designers and software engineers can no longer rely on increasing clock speed to hide software bloat. Instead, they must somehow learn to make effective use of increasing parallelism. This adaptation will not be easy. Conventional synchronization techniques based on locks and conditions are unlikely to be effective in such a demanding environment. Coarse-grained locks, which protect relatively large amounts of data, do not scale, and fine-grained locks introduce substantial software engineering problems. Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions. This synchronization model promises to alleviate many (not all) of the problems associated with locking, and there is a growing community of researchers working on both software and hardware support for this approach. This talk will survey the area, with a focus on open research problems.
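The optimistic, retry-on-conflict style that transactional memory generalizes can be shown on a single memory cell in Java (an illustrative sketch; real STM/HTM systems extend this to multi-location transactions):

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.UnaryOperator;

    // Optimistic "transaction" on one cell: read a snapshot, compute a
    // new state speculatively, commit with CAS; a conflicting writer
    // forces a retry instead of blocking any thread.
    final class TransactionalCell<T> {
        private final AtomicReference<T> state;

        TransactionalCell(T initial) {
            state = new AtomicReference<>(initial);
        }

        T update(UnaryOperator<T> transaction) {
            while (true) {
                T snapshot = state.get();                 // begin
                T proposed = transaction.apply(snapshot); // speculate
                if (state.compareAndSet(snapshot, proposed)) {
                    return proposed;                      // commit
                }
                // conflict: another thread committed first; retry
            }
        }
    }

Usage would look like cell.update(n -> n + 1) on a TransactionalCell<Integer>; note that the transaction function may run more than once, so it must be side-effect free.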

1,268 citations

Journal ArticleDOI

590 citations


"A Scalable and Energy-Efficient Con..." refers background or methods in this paper

  • ...deadlock avoidance, (ii) liveness property—that ensures cooperation among threads while performing the tasks [19], and (iii) linearizable property—that ensures a correctness condition for concurrent objects [20], and it deals with linearization points (LPs) of each operation....

  • ...As the rebalancing thread executes continuously, to reduce the processing power, we use a simple backOff() scheme [19].... (a sketch of such a backoff follows this list)

  • ...Safety property states that in a concurrent implementation, threads should never create a deadlock [19]....

  • ...There are different types of synchronization techniques such as coarse-grained, fine-grained, optimistic, lazy, and non-blocking [19]....

  • ...Progress guarantee states that lock-free algorithms should provide system-wide progress [19]....
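A simple exponential backoff of the kind the backOff() scheme [19] refers to might look as follows in Java; the delay bounds are illustrative assumptions, not values from the paper:

    import java.util.concurrent.ThreadLocalRandom;

    // Exponential backoff: each failed attempt sleeps for a random
    // duration whose ceiling doubles, trading a little latency for
    // much less wasted CPU time (and energy) under contention.
    final class Backoff {
        private final int maxDelayMillis;
        private int limit;

        Backoff(int minDelayMillis, int maxDelayMillis) {
            this.limit = minDelayMillis;       // must be > 0
            this.maxDelayMillis = maxDelayMillis;
        }

        void backOff() throws InterruptedException {
            int delay = ThreadLocalRandom.current().nextInt(limit) + 1;
            limit = Math.min(maxDelayMillis, 2 * limit);
            Thread.sleep(delay);
        }
    }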

Proceedings ArticleDOI
Howard S. David, Eugene Gorbatov, Ulf R. Hanebutte, Rahul Khanna, Christian Le
18 Aug 2010
TL;DR: This paper proposes a new approach for measuring memory power, demonstrates its applicability to a novel power limiting algorithm, and shows up to 40% lower performance impact compared to the state-of-the-art baseline across the power limiting range.
Abstract: The drive for higher performance and energy efficiency in data-centers has influenced trends toward increased power and cooling requirements in the facilities. Since enterprise servers rarely operate at their peak capacity, efficient power capping is deemed a critical component of modern enterprise computing environments. In this paper we propose a new power measurement and power limiting architecture for main memory. Specifically, we describe a new approach for measuring memory power and demonstrate its applicability to a novel power limiting algorithm. We implement and evaluate our approach on modern servers and show that we achieve up to 40% lower performance impact when compared to the state-of-the-art baseline across the power limiting range.

533 citations


"A Scalable and Energy-Efficient Con..." refers background in this paper

  • ...To measure the energy consumption, we consider the jRAPL tool [27], which is a framework for profiling Java programs executing on CPUs with support of Running Average Power Limit (RAPL).... (a minimal counter-reading sketch follows this list)

  • ...RAPL [28] facilitates fine-granular power consumption calculation....
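jRAPL [27] wraps the RAPL energy counters for Java programs; the same counters are also exposed by Linux through the powercap sysfs interface, and a bare-bones reading loop (an illustrative sketch, not the paper's measurement harness) looks like this:

    import java.nio.file.Files;
    import java.nio.file.Path;

    // Read the package-0 RAPL energy counter (microjoules, cumulative,
    // wraps around) via the standard Linux powercap interface.
    // Reading energy_uj may require elevated permissions.
    final class RaplReader {
        private static final Path ENERGY_UJ =
                Path.of("/sys/class/powercap/intel-rapl:0/energy_uj");

        static long readEnergyMicroJoules() throws java.io.IOException {
            return Long.parseLong(Files.readString(ENERGY_UJ).trim());
        }

        public static void main(String[] args) throws Exception {
            long before = readEnergyMicroJoules();
            long start = System.nanoTime();

            // ... run the concurrent-tree workload here ...
            Thread.sleep(1000);

            double joules = (readEnergyMicroJoules() - before) / 1e6;
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%.2f J over %.2f s = %.2f W%n",
                    joules, seconds, joules / seconds);
        }
    }

Performance-per-watt, as reported in the paper's evaluation, then follows as operations completed divided by average watts.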