Journal ArticleDOI

A Scalable and Energy-Efficient Concurrent Binary Search Tree With Fatnodes

TL;DR: Introduces FatCBST, a scalable and energy-efficient concurrent binary search tree built from fatnodes, together with algorithms for its basic operations; FatCBST scales well and delivers high performance-per-watt values compared to state-of-the-art implementations.
Abstract: In the recent past, devising algorithms for concurrent data structures has been driven by the need for scalability. Further, there is increased traction across the industry towards power-efficient concurrent data structure designs. In this context, we introduce a scalable and energy-efficient concurrent binary search tree with fatnodes (namely, FatCBST), and present algorithms to perform basic operations on it. Unlike a single node with one value, a fatnode consists of a set of values. FatCBST minimizes structural changes while performing update operations on the tree. In addition, fatnodes help to exploit spatial locality in the cache hierarchy and also reduce the height of the tree. FatCBST allows multiple threads to perform update operations on an existing fatnode simultaneously. Experimental results show that for low-contention workloads as well as large set sizes, FatCBST scales well and provides high performance-per-watt values compared to state-of-the-art implementations. For high-contention workloads with small set sizes, FatCBST suffers from contention.
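The paper's implementation is not reproduced on this page. As a point of reference, here is a minimal Java sketch of the fatnode idea under stated assumptions: a fixed per-node capacity, a sentinel value marking free slots, and per-slot CAS so that several threads can update one fatnode concurrently, as the abstract describes. The class name, capacity, and slot scheme are illustrative, not taken from the paper.

    import java.util.concurrent.atomic.AtomicLongArray;

    // Hypothetical fatnode: one tree node holding a small set of keys
    // instead of a single key. Names and capacity are assumptions.
    final class FatNode {
        static final int CAPACITY = 8;            // keys per fatnode (assumed)
        static final long EMPTY = Long.MIN_VALUE; // sentinel for a free slot

        final AtomicLongArray slots;              // key slots, claimed via CAS
        volatile FatNode left, right;             // child fatnodes

        FatNode() {
            slots = new AtomicLongArray(CAPACITY);
            for (int i = 0; i < CAPACITY; i++) slots.set(i, EMPTY);
        }

        // Several threads may claim different free slots of the same fatnode
        // concurrently; only a full node forces a structural change, which is
        // what keeps restructuring rare and updates cache-friendly.
        boolean tryInsert(long key) {
            for (int i = 0; i < CAPACITY; i++) {
                if (slots.get(i) == EMPTY && slots.compareAndSet(i, EMPTY, key)) {
                    return true;                  // slot claimed atomically
                }
            }
            return false;                         // node full: caller descends or splits
        }

        boolean contains(long key) {
            for (int i = 0; i < CAPACITY; i++) {
                if (slots.get(i) == key) return true; // scan one contiguous array
            }
            return false;
        }
    }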

References
Journal ArticleDOI
TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
Abstract: A concurrent object is a data object shared by concurrent processes. Linearizability is a correctness condition for concurrent objects that exploits the semantics of abstract data types. It permits a high degree of concurrency, yet it permits programmers to specify and reason about concurrent objects using known techniques from the sequential domain. Linearizability provides the illusion that each operation applied by concurrent processes takes effect instantaneously at some point between its invocation and its response, implying that the meaning of a concurrent object's operations can be given by pre- and post-conditions. This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
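To make the linearization-point idea concrete, here is a small Java example (an illustrative sketch, not from the cited paper): each increment of this counter appears to take effect at a single instant, the successful compareAndSet, somewhere between invocation and response, so every concurrent execution is equivalent to some sequential order of increments.

    import java.util.concurrent.atomic.AtomicInteger;

    // A linearizable counter: the linearization point (LP) of increment()
    // is its successful CAS; of get(), the volatile read.
    final class LinearizableCounter {
        private final AtomicInteger value = new AtomicInteger();

        int increment() {
            while (true) {
                int current = value.get();
                if (value.compareAndSet(current, current + 1)) {
                    return current + 1; // LP: the CAS above succeeded
                }
                // CAS failed: another increment linearized first; retry
            }
        }

        int get() {
            return value.get();         // LP: this volatile read
        }
    }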

3,396 citations


"A Scalable and Energy-Efficient Con..." refers background in this paper

  • ...Linearizability is a correctness condition for concurrent objects [20], and it deals with linearization points (LPs) of each of the operations....

  • ...deadlock avoidance, (ii) liveness property—that ensures cooperation among threads while performing the tasks [19], and (iii) linearizable property—that ensures a correctness condition for concurrent objects [20], and it deals with linearization points (LPs) of each operation....

Journal ArticleDOI
TL;DR: A hierarchy of objects is derived such that no object at one level has a wait-free implementation in terms of objects at lower levels, and it is shown that atomic read/write registers, which have been the focus of much recent attention, are at the bottom of the hierarchy.
Abstract: A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of the other processes. The problem of constructing a wait-free implementation of one data object from another lies at the heart of much recent work in concurrent algorithms, concurrent data structures, and multiprocessor architectures. First, we introduce a simple and general technique, based on reduction to a consensus protocol, for proving statements of the form, “there is no wait-free implementation of X by Y.” We derive a hierarchy of objects such that no object at one level has a wait-free implementation in terms of objects at lower levels. In particular, we show that atomic read/write registers, which have been the focus of much recent attention, are at the bottom of the hierarchy: they cannot be used to construct wait-free implementations of many simple and familiar data types. Moreover, classical synchronization primitives such as test&set and fetch&add, while more powerful than read and write, are also computationally weak, as are the standard message-passing primitives. Second, nevertheless, we show that there do exist simple universal objects from which one can construct a wait-free implementation of any sequential object.
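The hierarchy can be illustrated in a few lines of Java (an illustrative sketch, not from the cited paper): compare-and-swap solves consensus for any number of threads, which is exactly what places it above read/write registers in the hierarchy.

    import java.util.concurrent.atomic.AtomicReference;

    // Consensus from compare-and-swap: the first thread to CAS its
    // proposal in wins, and every thread returns that same value.
    // Plain read/write registers cannot implement this wait-free
    // even for two threads, which is why they sit at the bottom
    // of the hierarchy.
    final class CasConsensus<T> {
        private final AtomicReference<T> decision = new AtomicReference<>(null);

        T decide(T proposal) {
            decision.compareAndSet(null, proposal); // only the first CAS succeeds
            return decision.get();                  // everyone agrees on the winner
        }
    }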

2,013 citations


"A Scalable and Energy-Efficient Con..." refers background in this paper

  • ...ules that can be generated from the sequential code and satisfy the linearizability [13]....

Book
Maurice Herlihy
14 Mar 2008
TL;DR: Transactional memory as discussed by the authors is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Abstract: Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed. As a result, system designers and software engineers can no longer rely on increasing clock speed to hide software bloat. Instead, they must somehow learn to make effective use of increasing parallelism. This adaptation will not be easy. Conventional synchronization techniques based on locks and conditions are unlikely to be effective in such a demanding environment. Coarse-grained locks, which protect relatively large amounts of data, do not scale, and fine-grained locks introduce substantial software engineering problems. Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions. This synchronization model promises to alleviate many (not all) of the problems associated with locking, and there is a growing community of researchers working on both software and hardware support for this approach. This talk will survey the area, with a focus on open research problems.
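The optimistic, retry-on-conflict style that transactional memory generalizes can be shown on a single memory cell in Java (an illustrative sketch; real STM/HTM systems extend this to multi-location transactions):

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.UnaryOperator;

    // Optimistic "transaction" on one cell: read a snapshot, compute a
    // new state speculatively, commit with CAS; a conflicting writer
    // forces a retry instead of blocking any thread.
    final class TransactionalCell<T> {
        private final AtomicReference<T> state;

        TransactionalCell(T initial) {
            state = new AtomicReference<>(initial);
        }

        T update(UnaryOperator<T> transaction) {
            while (true) {
                T snapshot = state.get();                 // begin
                T proposed = transaction.apply(snapshot); // speculate
                if (state.compareAndSet(snapshot, proposed)) {
                    return proposed;                      // commit
                }
                // conflict: another thread committed first; retry
            }
        }
    }

Usage would look like cell.update(n -> n + 1) on a TransactionalCell<Integer>; note that the transaction function may run more than once, so it must be side-effect free.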

1,268 citations

Journal ArticleDOI

590 citations


"A Scalable and Energy-Efficient Con..." refers background or methods in this paper

  • ...deadlock avoidance, (ii) liveness property—that ensures cooperation among threads while performing the tasks [19], and (iii) linearizable property—that ensures a correctness condition for concurrent objects [20], and it deals with linearization points (LPs) of each operation....

  • ...As the rebalancing thread executes continuously, to reduce the processing power, we use a simple backOff() scheme [19].... (a sketch of such a backoff follows this list)

  • ...Safety property states that in a concurrent implementation, threads should never create a deadlock [19]....

  • ...There are different types of synchronization techniques such as coarse-grained, fine-grained, optimistic, lazy, and non-blocking [19]....

  • ...Progress guarantee states that lock-free algorithms should provide system-wide progress [19]....
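A simple exponential backoff of the kind the backOff() scheme [19] refers to might look as follows in Java; the delay bounds are illustrative assumptions, not values from the paper:

    import java.util.concurrent.ThreadLocalRandom;

    // Exponential backoff: each failed attempt sleeps for a random
    // duration whose ceiling doubles, trading a little latency for
    // much less wasted CPU time (and energy) under contention.
    final class Backoff {
        private final int maxDelayMillis;
        private int limit;

        Backoff(int minDelayMillis, int maxDelayMillis) {
            this.limit = minDelayMillis;       // must be > 0
            this.maxDelayMillis = maxDelayMillis;
        }

        void backOff() throws InterruptedException {
            int delay = ThreadLocalRandom.current().nextInt(limit) + 1;
            limit = Math.min(maxDelayMillis, 2 * limit);
            Thread.sleep(delay);
        }
    }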

Proceedings ArticleDOI
Howard S. David, Eugene Gorbatov, Ulf R. Hanebutte, Rahul Khanna, Christian Le
18 Aug 2010
TL;DR: This paper proposes a new approach for measuring memory power, demonstrates its applicability to a novel power limiting algorithm, and shows up to 40% lower performance impact compared to the state-of-the-art baseline across the power limiting range.
Abstract: The drive for higher performance and energy efficiency in data-centers has influenced trends toward increased power and cooling requirements in the facilities. Since enterprise servers rarely operate at their peak capacity, efficient power capping is deemed a critical component of modern enterprise computing environments. In this paper we propose a new power measurement and power limiting architecture for main memory. Specifically, we describe a new approach for measuring memory power and demonstrate its applicability to a novel power limiting algorithm. We implement and evaluate our approach on modern servers and show that we achieve up to 40% lower performance impact when compared to the state-of-the-art baseline across the power limiting range.

533 citations


"A Scalable and Energy-Efficient Con..." refers background in this paper

  • ...To measure the energy consumption, we consider the jRAPL tool [27], which is a framework for profiling Java programs executing on CPUs with support of Running Average Power Limit (RAPL).... (a minimal counter-reading sketch follows this list)

  • ...RAPL [28] facilitates fine-granular power consumption calculation....
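jRAPL [27] wraps the RAPL energy counters for Java programs; the same counters are also exposed by Linux through the powercap sysfs interface, and a bare-bones reading loop (an illustrative sketch, not the paper's measurement harness) looks like this:

    import java.nio.file.Files;
    import java.nio.file.Path;

    // Read the package-0 RAPL energy counter (microjoules, cumulative,
    // wraps around) via the standard Linux powercap interface.
    // Reading energy_uj may require elevated permissions.
    final class RaplReader {
        private static final Path ENERGY_UJ =
                Path.of("/sys/class/powercap/intel-rapl:0/energy_uj");

        static long readEnergyMicroJoules() throws java.io.IOException {
            return Long.parseLong(Files.readString(ENERGY_UJ).trim());
        }

        public static void main(String[] args) throws Exception {
            long before = readEnergyMicroJoules();
            long start = System.nanoTime();

            // ... run the concurrent-tree workload here ...
            Thread.sleep(1000);

            double joules = (readEnergyMicroJoules() - before) / 1e6;
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%.2f J over %.2f s = %.2f W%n",
                    joules, seconds, joules / seconds);
        }
    }

Performance-per-watt, as reported in the paper's evaluation, then follows as operations completed divided by average watts.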