Reusable Concurrent Data Types
Vincent Gramoli¹ and Rachid Guerraoui²
¹ NICTA and University of Sydney, vincent.gramoli@sydney.edu.au
² EPFL, rachid.guerraoui@epfl.ch
Abstract. This paper contributes to addressing the fundamental challenge of building Concurrent Data Types (CDT) that are reusable and scalable at the same time. We do so by proposing the abstraction of Polymorphic Transactions (PT): a new programming abstraction that offers different compatible transactions that can run concurrently in the same application.

We outline the commonality of the problem in various object-oriented languages and implement PT and a reusable package in Java. With PT, annotating a sequential ADT guarantees that novice programmers obtain an atomic and deadlock-free CDT, and lets an advanced programmer leverage the application semantics to get higher performance.

We compare our polymorphic synchronization against transaction-based, lock-based and lock-free synchronizations on SPARC and x86-64 architectures, and we integrate our methodology into a travel reservation benchmark. Although our reusable CDTs are sometimes less efficient than non-composable handcrafted CDTs from the JDK, they outperform all reusable Java CDTs.
1 Introduction
Abstract data types (ADTs) have proven instrumental in making sequential programs reusable [1]. ADTs promote (a) extensibility, when an ADT is specialized through, for example, inheritance by overriding or adding new methods, and (b) composability, when two ADTs are combined into another ADT whose methods invoke the original ones. Key to this reusability is that there is no need to know the internals of an ADT to reuse it: its interface suffices. With the advent of multi-core architectures, many programs are expected to scale with a large number of cores: ADTs thus need to be shared by many threads.
Unfortunately, most ADTs that export shared methods, often called Concurrent Data Types (CDTs), are not reusable: the programmer can hardly build upon them. For example, programmers cannot reuse the popular concurrent data types of the C++, Java and C# libraries. CDTs typically export a set of methods, guaranteeing that, even if invoked concurrently, each of these methods always appears as if it were executed in sequence. This property, known as atomicity (or linearizability [2]), lets the programmer reason in terms of sequential accesses. However, atomicity is generally not preserved under extension or composition, hence annihilating reusability.
Basically, CDTs are synchronized using either lock-based (i.e., mutual exclusion)
or lock-free primitives (e.g., compare-and-swap). On the one hand, CDTs that rely
on locks have limited composability, as a user could accidentally write two composite methods that deadlock by calling, in different orders, two existing methods that require distinct locks. The same CDTs might not be extensible either, as adding a new method may require knowing the lock granularity used by existing methods. On the other hand, lock-free CDTs relying on hardware primitives can generally modify only one or two memory words atomically, requiring the user to precisely identify these words before obtaining a scalable and atomic composite method. Knowing these internals may, however, not even help extend lock-free CDTs, as we will describe in Section 2.
Some synchronization schemes do enable reusability, yet their performance does not scale with concurrency. Typically, Transactional Memory (TM) systems ensure that, within a sequence of shared memory reads/writes, either all execute atomically (the transaction commits) or none of them execute (the transaction aborts) [3,4]. One can exploit TM to write an atomic CDT easily: it suffices to (a) write the bare sequential code of the ADT and then (b) encapsulate each of the methods of the resulting ADT into a transaction. Transactional methods commit only if their execution is equivalent to a serial one. TMs typically provide composability [5], as a new composite operation encapsulated in a transaction can invoke multiple existing methods from a (transactional) CDT. Also, specific transactions facilitate extensibility by preventing anomalies when inheriting from an existing CDT [6]. Nevertheless, classic transactions are overly conservative and clearly hamper scalability simply because they cannot exploit the application semantics [7,8,9,10,11,12].
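
To make steps (a) and (b) concrete, here is a minimal sketch, assuming a Deuce-like STM in which methods marked with its @Atomic annotation are instrumented at load time; the class and its fields are purely illustrative and do not come from the paper.

    import org.deuce.Atomic;

    // A bare sequential ADT (an integer set over a sorted linked list) written
    // without any synchronization.  Marking each public method as a transaction
    // is all that is needed to obtain an atomic, deadlock-free CDT.
    public class TransactionalIntSet {
        private static final class Node {
            final int value;
            Node next;
            Node(int value, Node next) { this.value = value; this.next = next; }
        }

        private Node head;

        @Atomic                              // (b) encapsulate the method in a transaction
        public boolean add(int v) {          // (a) the body is plain sequential code
            Node prev = null, curr = head;
            while (curr != null && curr.value < v) { prev = curr; curr = curr.next; }
            if (curr != null && curr.value == v) return false;
            Node node = new Node(v, curr);
            if (prev == null) head = node; else prev.next = node;
            return true;
        }

        @Atomic
        public boolean contains(int v) {
            for (Node curr = head; curr != null; curr = curr.next)
                if (curr.value == v) return true;
            return false;
        }
    }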
In light of this lack of scalability, expert programmers implement handcrafted libraries whose semantics are difficult to understand, to say the least: instead of being simply equivalent to a sequential execution (or atomic), an iteration over a CDT would typically return different results depending on the current status of concurrent updates to the same CDT. This strategy clearly promotes scalability while preventing a programmer, who ignores the underlying implementation details, from reusing the abstraction. The C++ Threading Building Blocks library, the java.util.concurrent package and the C# System libraries all adopt this strategy, hence limiting the ability of novices to write concurrent code in mainstream object-oriented languages.
In this paper, we propose the Polymorphic Transaction (PT) methodology, which
helps write concurrent programs that are both scalable and reusable. Its main novelty
is not in providing a novel transaction semantics but in combining multiple of them to scale to high levels of parallelism, as they let advanced programmers exploit the application semantics. The PT methodology achieves better scalability than classic TM systems because it ensures the atomicity of the CDT operations but not of their read/write sequences. It also retains the appealing simplicity of TM systems as novice programmers obtain a safe (but less efficient) concurrent program if they ignore these semantics. In summary, it gives a framework for all programmers to write software pieces that combine with one another. To illustrate the performance potential of the PT methodology, we (a) implemented the polymorphic software transactional memory (PSTM), (b) built on top of it a Java package of reusable CDTs that we use as a new TM benchmark suite on x86-64 and SPARC architectures, (c) compared this library to the JDK (including java.util.concurrent) and (d) integrated our solution into the STAMP travel reservation application, called vacation [13].

Table 1. The use-cases in which we applied the PT methodology

  Use-cases of the PT methodology   Data structure    Type         Annotated methods   Non-protected methods   Total
  ReusableLinkedQueue               Linked list       Queue        13                  2                       15
  ReusableVector                    Vector            Collection   37                  11                      48
  ReusableLinkedListSortedSet       Linked list       Set          11                  4                       15
  ReusableHashMap                   Hash table        Map          11                  3                       14
  ReusableSkipListSet               Skip list         Set          11                  4                       15
  Vacation                          Red-black trees   Database     3                   88                      91
  Total                                                            86                  112                     198
In contrast with lock-based and lock-free libraries, our library is reusable, thereby simplifying the life of concurrent programmers. In fact, we prove that our semantics combine with one another, which translates into the composability and extensibility of our library, as opposed to mainstream Java, C++ and C# concurrent libraries. To write an atomic (linearizable) CDT, the programmer writes a semantically equivalent bare sequential ADT and annotates each of its methods with one of the existing transaction forms, without altering the sequential code. To reuse existing CDTs, the programmer can either (a) compose these CDTs by invoking their methods in a method annotated with one existing transaction form, or (b) extend these CDTs by inheriting from them and adding new methods annotated with one of the transaction forms. If the form of the annotation is omitted, the default form guarantees atomicity regardless of the application semantics. The four forms of PSTM, detailed in Section 3, are as follows (a small code sketch using these forms appears right after the list):
– Hand-over-hand: A form of transaction that allows update methods to run concurrently. It builds upon a locking technique where each accessed location remains protected until the next location(s) within the same sequence gets protected. This technique is known as chain-locking, lock-coupling, or hand-over-hand locking [14]. As opposed to hand-over-hand locking, a hand-over-hand transaction may abort and release all its locks rather than blocking, thus being deadlock-free. (Hand-over-hand transactions guarantee elastic-opacity [9].)
– Snapshot: A form of transaction that allows read-only methods to run concurrently with updates. This form exploits multiversion concurrency control [15] to provide snapshot isolation, a property of production database systems that allows reads to execute at a different time from writes. Snapshot-isolated transactions are prone to the write-skew problem when they concurrently read a set of data and later update disjoint subsets of these data; however, our form applies exclusively to read-only methods and guarantees atomicity.
– Opacity: The default form of transaction. Similar to the strict serializability targeted by database systems, opacity guarantees that transactions execute as if all their accesses were executed at some indivisible point in time (serializability) between the time they are invoked and the time they return (strictness). In contrast with database transactions, opaque transactions are guaranteed to never observe an inconsistent state of the system (even transiently), be they doomed to abort or still pending [16].
– Irrevocability: The form of transaction that never aborts [17]. This form can be used to enforce that an atomic series of accesses executes exactly once. It is typically useful for executing I/O operations or invoking legacy code that cannot be rolled back; however, this form should be avoided when possible as it prevents transactions from executing concurrently.
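
The sketch below shows how these forms are meant to be used on a bare sequential queue. The annotation names are hypothetical placeholders defined locally (the actual names in the PSTM package may differ), and the method bodies deliberately contain no synchronization: the transactional instrumentation, not shown here, is what would make them safe under concurrency.

    import java.lang.annotation.*;
    import java.util.NoSuchElementException;

    // Hypothetical stand-ins for the four PSTM forms.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface HandOverHand {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Snapshot {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Opaque {}
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface Irrevocable {}

    // A bare sequential FIFO queue: each method picks one transaction form.
    class AnnotatedIntQueue {
        private static final class Node {
            final int value; Node next;
            Node(int value) { this.value = value; }
        }
        private Node head, tail;
        private int count;

        @HandOverHand                 // update method: may run concurrently with other updates
        public void push(int v) {
            Node n = new Node(v);
            if (tail == null) { head = tail = n; } else { tail.next = n; tail = n; }
            count++;
        }

        @HandOverHand
        public int pop() {
            if (head == null) throw new NoSuchElementException();
            int v = head.value;
            head = head.next;
            if (head == null) tail = null;
            count--;
            return v;
        }

        @Snapshot                     // read-only method: runs concurrently with updates
        public int size() { return count; }

        @Opaque                       // default form: the composite method stays atomic
        public int popTwoAndSum() { return pop() + pop(); }

        @Irrevocable                  // never aborts: suitable for I/O or legacy code
        public void drainTo(java.io.PrintStream out) {
            while (head != null) out.println(pop());
        }
    }

The composite popTwoAndSum illustrates composition: whatever forms protect the original pop, annotating the composite method keeps the whole sequence atomic.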
A novel aspect of this work is to allow several transactional forms in the same application, hence raising a new and interesting compatibility challenge: guaranteeing that methods synchronized with different semantics do not affect the semantics of each other when accessing the same mutable data concurrently. For example, consider a hand-over-hand transaction, t_h, reading x before a concurrent opaque transaction, t_o, writes x. This write-after-read (WAR) conflict would typically be detected by t_o but ignored by t_h. Upon writing and detecting the conflict, if t_o resolves the conflict by aborting or delaying one of the two transactions, then concurrency would be suboptimal. Conversely, if t_o ignores the conflict, it may violate its semantics by committing, say if a later conflict on y requires that t_o be serialized before t_h. To cope with this, we prevent a WAR conflict from being resolved eagerly by the transaction that conflicts by writing; instead, it is always resolved by the transaction that conflicts by reading (regardless of its form). This is described in Section 4 along with the resolution of write-after-write (WAW) and read-after-write (RAW) conflicts.
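
The following sketch is not the paper's PSTM code; it is only meant to make the reader-side policy concrete under simple assumptions. The writer installs its value and bumps a per-location version without inspecting readers, and the reading transaction aborts itself if validation later finds that a version it observed has changed. A real STM would add the locking and memory ordering omitted here.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicLong;

    // One versioned memory location.
    final class VersionedCell {
        volatile int value;
        final AtomicLong version = new AtomicLong();
    }

    // Reader-side resolution of write-after-read conflicts (sketch).
    final class ReadingTransaction {
        private final Map<VersionedCell, Long> readSet = new HashMap<>();

        int read(VersionedCell c) {
            long v = c.version.get();          // remember the version we read under
            int x = c.value;
            readSet.put(c, v);
            return x;
        }

        static void write(VersionedCell c, int newValue) {
            // The writer ignores WAR conflicts: it neither aborts nor delays readers.
            c.value = newValue;
            c.version.incrementAndGet();
        }

        boolean validateAndCommit() {
            for (Map.Entry<VersionedCell, Long> e : readSet.entrySet()) {
                if (e.getKey().version.get() != e.getValue()) {
                    return false;              // the reader resolves the conflict by aborting itself
                }
            }
            return true;                       // no intervening writes: commit
        }
    }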
To integrate our methodology in the Java programming language, we extended the Deuce [18] bytecode instrumentation framework, so that synchronizing a bare sequential method simply consists of annotating it with either a hand-over-hand, a snapshot, an opaque or an irrevocable transaction. As detailed in Section 5, the produced bytecode is automatically instrumented so that shared reads/writes get redirected to the transactional reads/writes of the appropriate form featured by PSTM. We only annotated a few methods in our benchmarks (cf. Table 1): all methods they call are automatically instrumented. We compared our reusable package to the JDK packages. First, we devised reusable CDTs using specific but restrictive techniques from the JDK like java.util.Collections.synchronizedSet or java.util.concurrent.CopyOnWriteArraySet. Note that we could also have used our own implementation of a universal construction [19] to achieve similar results. Second, we tested mainstream non-reusable CDTs like the lock-based java.util.Vector or the lock-free java.util.concurrent.ConcurrentLinkedQueue [20].
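
As a reminder of why such JDK techniques are restrictive, the snippet below (our own illustration, not taken from the paper) shows that a wrapper such as Collections.synchronizedSet makes each individual method atomic, but a composite check-then-act is only atomic if the caller manually holds the wrapper's monitor, just as the JDK documentation requires for iteration.

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.Set;

    public class SynchronizedSetComposition {
        static final Set<Integer> set = Collections.synchronizedSet(new HashSet<>());

        // Atomic only because the whole check-then-act sequence holds the
        // wrapper's monitor; forgetting the synchronized block silently breaks
        // atomicity without any compiler or runtime warning.
        static boolean addIfAbsentAndEven(int v) {
            synchronized (set) {
                if (v % 2 != 0 || set.contains(v)) return false;
                return set.add(v);
            }
        }
    }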
While our implementation could benefit from recent speculative hardware instructions, even in its software form the PT methodology significantly improves the performance of existing reusable techniques from the JDK (2.4× speedup). We also tested as a baseline the performance of non-reusable but well-engineered JDK CDTs and observed great differences: while our CDTs could, in some executions, outperform the non-reusable JDK CDTs by 4×, our experiments also outline circumstances where reusability comes at a cost. All these experimental results are reported in Section 6.
Finally, we discuss the related work in Section 7 and conclude in Section 8.

2 Overview
Most concurrent object-oriented libraries trade off reusability for efficiency. We distinguish their two reusability limitations, namely extensibility and composability issues, and describe how the PT methodology addresses them.
2.1 Extensibility
Illustrating the issue. In Java, the ConcurrentLinkedQueue type of the JDK 7 exports an inconsistent size method. The problem comes from the fact that this CDT aims at implementing the lock-free algorithm from Michael and Scott, designed to provide efficient offer (i.e., push) and poll (i.e., pop) [20], but also at implementing the Collection interface, including a size method, for a neat integration in the Java API. On the one hand, a size method is useful to count the number of elements comprised in this collection: although size remains optional, various Collection CDTs do provide it. On the other hand, the algorithm of Michael and Scott was optimized to export deadlock-free offer and poll without aiming at supporting a size method or allowing extensibility.
The problem of extending Michael and Scott's algorithm with a size, which could access concurrently the same data as offer and poll, is far from trivial, precisely due to the way the algorithm was originally proposed. In short, the algorithm was made deadlock-free by relying exclusively on compare-and-swap for synchronization. Comparing-and-swapping versions of the data structure to compute the size would annihilate effective concurrency, while using locks to protect the data structure would not prevent offer and poll from concurrently updating the structure. This lack of extensibility, which is inherent to the synchronization used, led expert programmers to implement a non-atomic size method.
Specifically, this size consists of traversing the underlying linked list from the head to the tail while elements are pushed at the head and popped at the tail. Assume that some elements are moved from the tail to the head, one after the other, so that the size s changes by ±1. As the size method does not protect the head and the tail of the queue, it simply ignores any of these moved elements and returns an incorrect value way smaller than s − 1. Precisely because predicting the outcomes of this size requires understanding the implementation internals, the resulting CDT is not reusable.
We reported this ConcurrentLinkedQueue issue to the JSR166 expert group. Following up on our report, a warning about this unexpected behavior has appeared in the documentation of the class ConcurrentLinkedQueue on the JSR166 site since revision 1.54, and the issue is still present in the JDK 7. Since then, other researchers unaware of this warning have observed the same problem [21]. This size problem simply illustrates the more general lack of extensibility. One may think of using ArrayBlockingQueue to obtain a correct size that returns the current value of a counter; however, such a size implementation requires modifying all insertion and removal methods to make them adjust the counter. Apart from the size example, a programmer would have similar problems as soon as she tries to extend these CDTs with, for example, a sum method.
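
The behavior is easy to observe with the JDK alone. The experiment below (our own illustration, not code from the paper) keeps moving a single element from one end of a ConcurrentLinkedQueue to the other while repeatedly calling size(): since the real size only alternates between 999 and 1000, any other observed value shows that size is not atomic, matching the class documentation's warning that the result may be inaccurate under concurrent modification.

    import java.util.concurrent.ConcurrentLinkedQueue;

    public class InconsistentSize {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentLinkedQueue<Integer> q = new ConcurrentLinkedQueue<>();
            for (int i = 0; i < 1000; i++) q.offer(i);          // real size: 1000

            Thread mover = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    Integer x = q.poll();                        // remove at one end...
                    if (x != null) q.offer(x);                   // ...re-insert at the other
                }
            });
            mover.start();

            int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
            for (int i = 0; i < 1_000_000; i++) {
                int s = q.size();                                // non-atomic traversal
                min = Math.min(min, s);
                max = Math.max(max, s);
            }
            mover.interrupt();
            mover.join();
            System.out.println("observed size range: [" + min + ", " + max + "], real size: 999 or 1000");
        }
    }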
The PT solution. Figure 1 illustrates how to exploit the PT methodology to cope with the ConcurrentLinkedQueue issue. It requires that the methods pop and push accessing

References

Concurrency Control and Recovery in Database Systems (book)
TL;DR: In this article, the design and implementation of concurrency control and recovery mechanisms for transaction management in centralized and distributed database systems is described.

Linearizability: a correctness condition for concurrent objects (journal article)
TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.

Transactional memory: architectural support for lock-free data structures (conference paper)
TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.

Software transactional memory (conference paper)
TL;DR: STM is used to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones, based on implementing a k-word compare&swap STM-transaction, a novel software method for supporting flexible transactional programming of synchronization operations.

The Art of Multiprocessor Programming (book)
TL;DR: Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Frequently Asked Questions (15)
Q1. What contributions have the authors mentioned in the paper "Reusable concurrent data types"?

This paper contributes to addressing the fundamental challenge of building Concurrent Data Types (CDT) that are reusable and scalable at the same time. The authors do so by proposing the abstraction of Polymorphic Transactions (PT): a new programming abstraction that offers different compatible transactions that can run concurrently in the same application.

Future work includes (a) formalizing a framework to derive incompatibilities of synchronization semantics and (b) optimizing their current implementation through concurrent irrevocable transactions [56] or transactional instruction extensions with Java opcodes to reduce overhead.

Although monomorphic STMs stop scaling at 32 threads, they are more efficient than PSTM at lower levels of parallelism, confirming their observations on micro-benchmarks.

Note that requiring CDTs to be accessed transactionally can be enforced in Java through the use of pre-existing setters and getters as, for example, when accessing ThreadLocal variables. 

Composability is guaranteed by the fact that whatever forms protect original methods, the programmer always has the possibility to derive a composite annotated method that will execute atomically. 

The PT methodology achieves better scalability than classic TM systems because it ensures the atomicity of the CDT operations but not of their read/write sequences. 

In particular, while two traversals may be originally annotated as hand-over-hand ignoring some conflicts for the sake of concurrency, a new composite method annotated as opaque that reuses them switches their semantics to opaque. 

Its main novelty is not in providing a novel transaction semantics but in combining multiple of them to scale to high levels of parallelism as they let advanced programmers exploit the application semantics.

The authors see two reasons: (a) some overhead is induced by the extra bookkeeping of their synchronizations that triggers the Java garbage collector more often, (b) the atomicity of the reusable size and updates precludes a lot of non-atomic executions allowed by the non-reusable skip list. 

Maintaining the minimum of versions per object that maximizes the variety of output histories comes at a cost [33]: the proposed useless-prefix multi-version (UP MV) STM guarantees this property but, as a drawback, does not support invisible reads. 

One can easily deduce a linearization point for each operation of a transaction form, at which the transaction of the corresponding form appears to execute instantaneously. 

A potential risk is that non-transactional accesses would typically observe transient states if they could access transactional CDTs as the authors do not provide strong atomicity [38]. 

Precisely because predicting the outcomes of this size requires to understand the implementation internals, the resulting CDT is not reusable. 

An advantage of their transaction annotations is that each method, be it private (e.g., ensureCapacityHelper) or public (e.g., ensureCapacity), can be annotated as a transaction.

The PT methodology helps reach this goal by allowing collaborative development of scalable libraries that any programmer can compose and extend, hence confirming their recent observation [11].