Reusable Concurrent Data Types
Vincent Gramoli¹ and Rachid Guerraoui²
¹ NICTA and University of Sydney, vincent.gramoli@sydney.edu.au
² EPFL, rachid.guerraoui@epfl.ch
Abstract. This paper contributes to addressing the fundamental challenge of building Concurrent Data Types (CDT) that are reusable and scalable at the same time. We do so by proposing the abstraction of Polymorphic Transactions (PT): a new programming abstraction that offers different compatible transactions that can run concurrently in the same application.

We outline the commonality of the problem in various object-oriented languages and implement PT and a reusable package in Java. With PT, annotating a sequential ADT guarantees that novice programmers obtain an atomic and deadlock-free CDT, and lets an advanced programmer leverage the application semantics to get higher performance.

We compare our polymorphic synchronization against transaction-based, lock-based and lock-free synchronizations on SPARC and x86-64 architectures, and we integrate our methodology into a travel reservation benchmark. Although our reusable CDTs are sometimes less efficient than non-composable handcrafted CDTs from the JDK, they outperform all reusable Java CDTs.
1 Introduction
Abstract data types (ADTs) have proven instrumental in making sequential programs reusable [1]. ADTs promote (a) extensibility, when an ADT is specialized through, for example, inheritance by overriding or adding new methods, and (b) composability, when two ADTs are combined into another ADT whose methods invoke the original ones. Key to this reusability is that there is no need to know the internals of an ADT to reuse it: its interface suffices. With the advent of multi-core architectures, many programs are expected to scale with a large number of cores: ADTs thus need to be shared by many threads.
Unfortunately, most ADTs that export shared methods, often called Concurrent Data Types (CDTs), are not reusable: the programmer can hardly build upon them. For example, programmers cannot reuse the popular concurrent data types of the C++, Java and C# libraries. CDTs typically export a set of methods, guaranteeing that, even if invoked concurrently, each of these methods always appears as if it were executed in sequence. This property, known as atomicity (or linearizability [2]), lets the programmer reason in terms of sequential accesses. However, atomicity is generally not preserved under extension or composition, hence annihilating reusability.
Basically, CDTs are synchronized using either lock-based (i.e., mutual exclusion)
or lock-free primitives (e.g., compare-and-swap). On the one hand, CDTs that rely
on locks have limited composability, as a user could accidentally write two composite methods that deadlock by calling, in different orders, two existing methods that require distinct locks. The same CDTs might not be extensible either, as adding a new method may require knowing the lock granularity used by existing methods. On the other hand, lock-free CDTs relying on hardware primitives can generally modify only one or two memory words atomically, requiring the user to precisely identify these words before obtaining a scalable and atomic composite method. Knowing these internals may, however, not even help extend lock-free CDTs, as we will describe in Section 2.
Some synchronization schemes do enable reusability, yet their performance does not scale with concurrency. Typically, Transactional Memory (TM) systems ensure that, within a sequence of shared memory reads/writes, either all execute atomically (the transaction commits) or none of them execute (the transaction aborts) [3,4]. One can exploit TM to write an atomic CDT easily: it suffices to (a) write the bare sequential code of the ADT and then (b) encapsulate each of the methods of the resulting ADT into a transaction. Transactional methods commit only if their execution is equivalent to a serial one. TMs typically provide composability [5], as a new composite operation encapsulated in a transaction can invoke multiple existing methods from a (transactional) CDT. Also, specific transactions facilitate extensibility by preventing anomalies when inheriting from an existing CDT [6]. Nevertheless, classic transactions are overly conservative and clearly hamper scalability simply because they cannot exploit the application semantics [7,8,9,10,11,12].
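
To make steps (a) and (b) concrete, here is a minimal sketch, assuming a Deuce-like STM in which methods marked with its @Atomic annotation are instrumented at load time; the class and its fields are purely illustrative and do not come from the paper.

    import org.deuce.Atomic;

    // A bare sequential ADT (an integer set over a sorted linked list) written
    // without any synchronization.  Marking each public method as a transaction
    // is all that is needed to obtain an atomic, deadlock-free CDT.
    public class TransactionalIntSet {
        private static final class Node {
            final int value;
            Node next;
            Node(int value, Node next) { this.value = value; this.next = next; }
        }

        private Node head;

        @Atomic                              // (b) encapsulate the method in a transaction
        public boolean add(int v) {          // (a) the body is plain sequential code
            Node prev = null, curr = head;
            while (curr != null && curr.value < v) { prev = curr; curr = curr.next; }
            if (curr != null && curr.value == v) return false;
            Node node = new Node(v, curr);
            if (prev == null) head = node; else prev.next = node;
            return true;
        }

        @Atomic
        public boolean contains(int v) {
            for (Node curr = head; curr != null; curr = curr.next)
                if (curr.value == v) return true;
            return false;
        }
    }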
In light of this lack of scalability, expert programmers implement handcrafted libraries whose semantics are difficult to understand, to say the least: instead of being simply equivalent to a sequential execution (or atomic), an iteration over a CDT would typically return different results depending on the current status of concurrent updates to the same CDT. This strategy clearly promotes scalability while preventing a programmer, who ignores the underlying implementation details, from reusing the abstraction. The C++ Threading Building Blocks library, the java.util.concurrent package and the C# System libraries all adopt this strategy, hence limiting the ability of novices to write concurrent code in mainstream object-oriented languages.
In this paper, we propose the Polymorphic Transaction (PT) methodology, which
helps write concurrent programs that are both scalable and reusable. Its main novelty
is not in providing a novel transaction semantics but in combining multiple of them to scale to high levels of parallelism, as they let advanced programmers exploit the application semantics. The PT methodology achieves better scalability than classic TM systems because it ensures the atomicity of the CDT operations but not of their read/write sequences. It also retains the appealing simplicity of TM systems as novice programmers obtain a safe (but less efficient) concurrent program if they ignore these semantics. In summary, it gives a framework for all programmers to write software pieces that combine with one another. To illustrate the performance potential of the PT methodology, we (a) implemented the polymorphic software transactional memory (PSTM), (b) built on top of it a Java package of reusable CDTs that we use as a new TM benchmark suite on x86-64 and SPARC architectures, (c) compared this library to the JDK (including java.util.concurrent) and (d) integrated our solution into the STAMP travel reservation application, called vacation [13].

Table 1. The use-cases in which we applied the PT methodology

  Use-cases of the PT methodology   Data structure    Type         Annotated methods   Non-protected methods   Total
  ReusableLinkedQueue               Linked list       Queue        13                  2                       15
  ReusableVector                    Vector            Collection   37                  11                      48
  ReusableLinkedListSortedSet       Linked list       Set          11                  4                       15
  ReusableHashMap                   Hash table        Map          11                  3                       14
  ReusableSkipListSet               Skip list         Set          11                  4                       15
  Vacation                          Red-black trees   Database     3                   88                      91
  Total                                                            86                  112                     198
In contrast with lock-based and lock-free libraries, our library is reusable, thereby simplifying the life of concurrent programmers. In fact, we prove that our semantics combine with one another, which translates into the composability and extensibility of our library, as opposed to mainstream Java, C++ and C# concurrent libraries. To write an atomic (linearizable) CDT, the programmer writes a semantically equivalent bare sequential ADT and annotates each of its methods with one of the existing transaction forms, without altering the sequential code. To reuse existing CDTs, the programmer can either (a) compose these CDTs by invoking their methods in a method annotated with one existing transaction form, or (b) extend these CDTs by inheriting from them and adding new methods annotated with one of the transaction forms. If the form of the annotation is omitted, the default form guarantees atomicity regardless of the application semantics. The four forms of PSTM, detailed in Section 3, are as follows (a small code sketch using these forms appears right after the list):
– Hand-over-hand: A form of transaction that allows update methods to run concurrently. It builds upon a locking technique where each accessed location remains protected until the next location(s) within the same sequence gets protected. This technique is known as chain-locking, lock-coupling, or hand-over-hand locking [14]. As opposed to hand-over-hand locking, a hand-over-hand transaction may abort and release all its locks rather than blocking, thus being deadlock-free. (Hand-over-hand transactions guarantee elastic-opacity [9].)
– Snapshot: A form of transaction that allows read-only methods to run concurrently with updates. This form exploits multiversion concurrency control [15] to provide snapshot isolation, a property of production database systems that allows reads to execute at a different time from writes. Snapshot-isolated transactions are prone to the write-skew problem when they concurrently read a set of data and later update disjoint subsets of these data; however, our form applies exclusively to read-only methods and guarantees atomicity.
– Opacity: The default form of transaction. Similar to the strict serializability targeted by database systems, opacity guarantees that transactions execute as if all their accesses were executed at some indivisible point in time (serializability) between the time they are invoked and the time they return (strictness). In contrast with database transactions, opaque transactions are guaranteed to never observe an inconsistent state of the system (even transiently), be they doomed to abort or still pending [16].
– Irrevocability: The form of transaction that never aborts [17]. This form can be used to enforce that an atomic series of accesses executes exactly once. It is typically useful for executing I/O operations or invoking legacy code that cannot be rolled back; however, this form should be avoided when possible as it prevents transactions from executing concurrently.
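
The sketch below shows how these forms are meant to be used on a bare sequential queue. The annotation names are hypothetical placeholders defined locally (the actual names in the PSTM package may differ), and the method bodies deliberately contain no synchronization: the transactional instrumentation, not shown here, is what would make them safe under concurrency.

    import java.lang.annotation.*;
    import java.util.NoSuchElementException;

    // Hypothetical stand-ins for the four PSTM forms.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface HandOverHand {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Snapshot {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Opaque {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Irrevocable {}

    // A bare sequential FIFO queue: each method picks one transaction form.
    class AnnotatedIntQueue {
        private static final class Node {
            final int value; Node next;
            Node(int value) { this.value = value; }
        }
        private Node head, tail;
        private int count;

        @HandOverHand                 // update method: may run concurrently with other updates
        public void push(int v) {
            Node n = new Node(v);
            if (tail == null) { head = tail = n; } else { tail.next = n; tail = n; }
            count++;
        }

        @HandOverHand
        public int pop() {
            if (head == null) throw new NoSuchElementException();
            int v = head.value;
            head = head.next;
            if (head == null) tail = null;
            count--;
            return v;
        }

        @Snapshot                     // read-only method: runs concurrently with updates
        public int size() { return count; }

        @Opaque                       // default form: the composite method stays atomic
        public int popTwoAndSum() { return pop() + pop(); }

        @Irrevocable                  // never aborts: suitable for I/O or legacy code
        public void drainTo(java.io.PrintStream out) {
            while (head != null) out.println(pop());
        }
    }

The composite popTwoAndSum illustrates composition: whatever forms protect the original pop, annotating the composite method keeps the whole sequence atomic.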
A novel aspect of this work is to allow several transactional forms in the same application, hence raising a new and interesting compatibility challenge: guaranteeing that methods synchronized with different semantics do not affect the semantics of each other when accessing the same mutable data concurrently. For example, consider a hand-over-hand transaction, t_h, reading x before a concurrent opaque transaction, t_o, writes x. This write-after-read (WAR) conflict would typically be detected by t_o but ignored by t_h. Upon writing and detecting the conflict, if t_o resolves the conflict by aborting or delaying one of the two transactions, then concurrency would be suboptimal. Conversely, if t_o ignores the conflict, it may violate its semantics by committing, say if a later conflict on y requires that t_o be serialized before t_h. To cope with this, we prevent a WAR conflict from being resolved eagerly by the transaction that conflicts by writing; instead, it is always resolved by the transaction that conflicts by reading (regardless of its form). This is described in Section 4 along with the resolution of write-after-write (WAW) and read-after-write (RAW) conflicts.
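
The following sketch is not the paper's PSTM code; it is only meant to make the reader-side policy concrete under simple assumptions. The writer installs its value and bumps a per-location version without inspecting readers, and the reading transaction aborts itself if validation later finds that a version it observed has changed. A real STM would add the locking and memory ordering omitted here.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicLong;

    // One versioned memory location.
    final class VersionedCell {
        volatile int value;
        final AtomicLong version = new AtomicLong();
    }

    // Reader-side resolution of write-after-read conflicts (sketch).
    final class ReadingTransaction {
        private final Map<VersionedCell, Long> readSet = new HashMap<>();

        int read(VersionedCell c) {
            long v = c.version.get();          // remember the version we read under
            int x = c.value;
            readSet.put(c, v);
            return x;
        }

        static void write(VersionedCell c, int newValue) {
            // The writer ignores WAR conflicts: it neither aborts nor delays readers.
            c.value = newValue;
            c.version.incrementAndGet();
        }

        boolean validateAndCommit() {
            for (Map.Entry<VersionedCell, Long> e : readSet.entrySet()) {
                if (e.getKey().version.get() != e.getValue()) {
                    return false;              // the reader resolves the conflict by aborting itself
                }
            }
            return true;                       // no intervening writes: commit
        }
    }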
To integrate our methodology in the Java programming language, we extended the Deuce [18] bytecode instrumentation framework, so that synchronizing a bare sequential method simply consists of annotating it with either a hand-over-hand, a snapshot, an opaque or an irrevocable transaction. As detailed in Section 5, the produced bytecode is automatically instrumented so that shared reads/writes get redirected to the transactional reads/writes of the appropriate form featured by PSTM. We only annotated a few methods in our benchmarks (cf. Table 1): all methods they call are automatically instrumented. We compared our reusable package to the JDK packages. First, we devised reusable CDTs using specific but restrictive techniques from the JDK like java.util.Collections.synchronizedSet or java.util.concurrent.CopyOnWriteArraySet. Note that we could also have used our own implementation of a universal construction [19] to achieve similar results. Second, we tested mainstream non-reusable CDTs like the lock-based java.util.Vector or the lock-free java.util.concurrent.ConcurrentLinkedQueue [20].
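
As a reminder of why such JDK techniques are restrictive, the snippet below (our own illustration, not taken from the paper) shows that a wrapper such as Collections.synchronizedSet makes each individual method atomic, but a composite check-then-act is only atomic if the caller manually holds the wrapper's monitor, just as the JDK documentation requires for iteration.

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    public class SynchronizedSetComposition {
        static final Set<Integer> set = Collections.synchronizedSet(new HashSet<>());

        // Atomic only because the whole check-then-act sequence holds the
        // wrapper's monitor; forgetting the synchronized block silently breaks
        // atomicity without any compiler or runtime warning.
        static boolean addIfAbsentAndEven(int v) {
            synchronized (set) {
                if (v % 2 != 0 || set.contains(v)) return false;
                return set.add(v);
            }
        }
    }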
While our implementation could benefit from recent speculative hardware instructions, even in its software form the PT methodology significantly improves the performance of existing reusable techniques from the JDK (2.4× speedup). We also tested as a baseline the performance of non-reusable but well-engineered JDK CDTs and observed great differences: while our CDTs could, in some executions, outperform the non-reusable JDK CDTs by 4×, our experiments also outline circumstances where reusability comes at a cost. All these experimental results are reported in Section 6.
Finally, we discuss the related work in Section 7 and conclude in Section 8.

2 Overview
Most concurrent object-oriented libraries trade off reusability for efficiency. We distinguish their two reusability limitations, namely extensibility and composability issues, and describe how the PT methodology addresses them.
2.1 Extensibility
Illustrating the issue. In Java, the ConcurrentLinkedQueue type of the JDK 7 exports an inconsistent size method. The problem comes from the fact that this CDT aims at implementing the lock-free algorithm from Michael and Scott, designed to provide efficient offer (i.e., push) and poll (i.e., pop) [20], but also at implementing the Collection interface, including a size method, for a neat integration in the Java API. On the one hand, a size method is useful to count the number of elements comprised in this collection: although size remains optional, various Collection CDTs do provide it. On the other hand, the algorithm of Michael and Scott was optimized to export deadlock-free offer and poll without aiming at supporting a size method or allowing extensibility.
The problem of extending Michael and Scott's algorithm with a size, which could access concurrently the same data as offer and poll, is far from trivial, precisely due to the way the algorithm was originally proposed. In short, the algorithm was made deadlock-free by relying exclusively on compare-and-swap for synchronization. Comparing-and-swapping versions of the data structure to compute the size would annihilate effective concurrency, while using locks to protect the data structure would not prevent offer and poll from concurrently updating the structure. This lack of extensibility, which is inherent to the synchronization used, led expert programmers to implement a non-atomic size method.
Specifically, this size consists of traversing the underlying linked list from the head to the tail while elements are pushed at the head and popped at the tail. Assume that some elements are moved from the tail to the head, one after the other, so that the size s changes by ±1. As the size method does not protect the head and the tail of the queue, it simply ignores any of these moved elements and returns an incorrect value way smaller than s − 1. Precisely because predicting the outcomes of this size requires understanding the implementation internals, the resulting CDT is not reusable.
We reported this ConcurrentLinkedQueue issue to the JSR166 expert group. Following up on our report, a warning about this unexpected behavior has appeared in the documentation of the class ConcurrentLinkedQueue on the JSR166 site since revision 1.54, and the issue is still present in the JDK 7. Since then, other researchers unaware of this warning have observed the same problem [21]. This size problem simply illustrates the more general lack of extensibility. One may think of using ArrayBlockingQueue to obtain a correct size that returns the current value of a counter; however, such a size implementation requires modifying all insertion and removal methods to make them adjust the counter. Apart from the size example, a programmer would have similar problems as soon as she tries to extend these CDTs with, for example, a sum method.
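
The behavior is easy to observe with the JDK alone. The experiment below (our own illustration, not code from the paper) keeps moving a single element from one end of a ConcurrentLinkedQueue to the other while repeatedly calling size(): since the real size only alternates between 999 and 1000, any other observed value shows that size is not atomic, matching the class documentation's warning that the result may be inaccurate under concurrent modification.

    import java.util.concurrent.ConcurrentLinkedQueue;

    public class InconsistentSize {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentLinkedQueue<Integer> q = new ConcurrentLinkedQueue<>();
            for (int i = 0; i < 1000; i++) q.offer(i);          // real size: 1000

            Thread mover = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    Integer x = q.poll();                        // remove at one end...
                    if (x != null) q.offer(x);                   // ...re-insert at the other
                }
            });
            mover.start();

            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
            for (int i = 0; i < 1_000_000; i++) {
                int s = q.size();                                // non-atomic traversal
                min = Math.min(min, s);
                max = Math.max(max, s);
            }
            mover.interrupt();
            mover.join();
            System.out.println("observed size range: [" + min + ", " + max + "], real size: 999 or 1000");
        }
    }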
The PT solution. Figure 1 illustrates how to exploit the PT methodology to cope with the ConcurrentLinkedQueue issue. It requires that the methods pop and push accessing

References

Concurrency Control and Recovery in Database Systems (book)
TL;DR: In this article, the design and implementation of concurrency control and recovery mechanisms for transaction management in centralized and distributed database systems is described.

Linearizability: a correctness condition for concurrent objects (journal article)
TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.

Transactional memory: architectural support for lock-free data structures (conference paper)
TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.

Software transactional memory (conference paper)
TL;DR: STM is used to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones, based on implementing a k-word compare&swap STM-transaction, a novel software method for supporting flexible transactional programming of synchronization operations.

The Art of Multiprocessor Programming (book)
TL;DR: Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Frequently Asked Questions (15)
Q1. What contributions have the authors mentioned in the paper "Reusable concurrent data types"?

This paper contributes to addressing the fundamental challenge of building Concurrent Data Types (CDT) that are reusable and scalable at the same time. The authors do so by proposing the abstraction of Polymorphic Transactions (PT): a new programming abstraction that offers different compatible transactions that can run concurrently in the same application.

Future work includes (a) formalizing a framework to derive incompatibilities of synchronization semantics and (b) optimizing their current implementation through concurrent irrevocable transactions [56] or transactional instruction extensions with Java opcodes to reduce overhead.

Although monomorphic STMs stop scaling at 32 threads, they are more efficient than PSTM at lower levels of parallelism, confirming their observations on micro-benchmarks.

Note that requiring CDTs to be accessed transactionally can be enforced in Java through the use of pre-existing setters and getters as, for example, when accessing ThreadLocal variables. 

Composability is guaranteed by the fact that whatever forms protect original methods, the programmer always has the possibility to derive a composite annotated method that will execute atomically. 

The PT methodology achieves better scalability than classic TM systems because it ensures the atomicity of the CDT operations but not of their read/write sequences. 

In particular, while two traversals may be originally annotated as hand-over-hand ignoring some conflicts for the sake of concurrency, a new composite method annotated as opaque that reuses them switches their semantics to opaque. 

Its main novelty is not in providing a novel transaction semantics but in combining multiple of them to scale to high levels of parallelism as they let advanced programmers exploit the application semantics.

The authors see two reasons: (a) some overhead is induced by the extra bookkeeping of their synchronizations that triggers the Java garbage collector more often, (b) the atomicity of the reusable size and updates precludes a lot of non-atomic executions allowed by the non-reusable skip list. 

Maintaining the minimum of versions per object that maximizes the variety of output histories comes at a cost [33]: the proposed useless-prefix multi-version (UP MV) STM guarantees this property but, as a drawback, does not support invisible reads. 

One can easily deduce a linearization point for each operation of a transaction form, at which the transaction of the corresponding form appears to execute instantaneously. 

A potential risk is that non-transactional accesses would typically observe transient states if they could access transactional CDTs as the authors do not provide strong atomicity [38]. 

Precisely because predicting the outcomes of this size requires to understand the implementation internals, the resulting CDT is not reusable. 

An advantage of their transaction annotations is that each method, be it private (e.g., ensureCapacityHelper) or public (e.g., ensureCapacity), can be annotated as a transaction.

The PT methodology helps reach this goal by allowing collaborative development of scalable libraries that any programmer can compose and extend, hence confirming their recent observation [11].