
On the correctness of transactional memory

TLDR
Opacity is defined as a property of concurrent transaction histories, its graph-theoretical interpretation is given, and it is proved that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, Ω(k) steps for an operation to terminate.
Abstract

Transactional memory (TM) is perceived as an appealing alternative to critical sections for general purpose concurrent programming. Despite the large amount of recent work on TM implementations, however, very little effort has been devoted to precisely defining what guarantees these implementations should provide. A formal description of such guarantees is necessary in order to check the correctness of TM systems, as well as to establish TM optimality results and inherent trade-offs.

This paper presents opacity, a candidate correctness criterion for TM implementations. We define opacity as a property of concurrent transaction histories and give its graph theoretical interpretation. Opacity captures precisely the correctness requirements that have been intuitively described by many TM designers. Most TM systems we know of do ensure opacity.

At a very first approximation, opacity can be viewed as an extension of the classical database serializability property with the additional requirement that even non-committed transactions are prevented from accessing inconsistent states. Capturing this requirement precisely, in the context of general objects, and without precluding pragmatic strategies that are often used by modern TM implementations, such as versioning, invisible reads, lazy updates, and open nesting, is not trivial.

As a use case of opacity, we prove the first lower bound on the complexity of TM implementations. Basically, we show that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, Ω(k) steps for an operation to terminate, where k is the total number of objects shared by transactions. This (tight) bound precisely captures an inherent trade-off in the design of TM systems. The bound also highlights a fundamental gap between systems in which transactions can be fully isolated from the outside environment, e.g., databases or certain specialized transactional languages, and systems that lack such isolation capabilities, e.g., general TM frameworks.


Stretching Transactional Memory
Aleksandar Dragojević    Rachid Guerraoui    Michał Kapałka
École Polytechnique Fédérale de Lausanne, School of Computer and Communication Sciences, I&C, Switzerland
{aleksandar.dragojevic, rachid.guerraoui, michal.kapalka}@epfl.ch
Abstract
Transactional memory (TM) is an appealing abstraction for pro-
gramming multi-core systems. Potential target applications for TM,
such as business software and video games, are likely to involve
complex data structures and large transactions, requiring specific
software solutions (STM). So far, however, STMs have been mainly
evaluated and optimized for smaller-scale benchmarks.
We revisit the main STM design choices from the perspec-
tive of complex workloads and propose a new STM, which we
call SwissTM. In short, SwissTM is lock- and word-based and
uses (1) optimistic (commit-time) conflict detection for read/write
conflicts and pessimistic (encounter-time) conflict detection for
write/write conflicts, as well as (2) a new two-phase contention
manager that ensures the progress of long transactions while induc-
ing no overhead on short ones. SwissTM outperforms state-of-the-
art STM implementations, namely RSTM, TL2, and TinySTM, in
our experiments on STMBench7, STAMP, Lee-TM and red-black
tree benchmarks.
Beyond SwissTM, we present the most complete evaluation to
date of the individual impact of various STM design choices on the
ability to support the mixed workloads of large applications.
Categories and Subject Descriptors D.1.3 [Programming Tech-
niques]: Concurrent Programming; D.2.8 [Software Engineering]:
Metrics—performance measures
General Terms Measurement, Performance, Experimentation
Keywords Software transactional memories, Benchmarks
1. Introduction
Transactional memory (TM) is an appealing abstraction for making
concurrent programming accessible to a wide community of non-
expert programmers while avoiding the pitfalls of critical sections.
With a TM, application threads communicate by executing opera-
tions on shared data inside lightweight in-memory transactions. A
transaction performs a number of actions and then either commits,
in which case all the actions are applied to shared data atomically,
or aborts, in which case the effects of those actions are rolled back
and never visible to other transactions. From a programmer’s per-
spective, the TM paradigm is very promising as it promotes pro-
gram composition [20], in contrast to explicit locking, while still
providing the illusion that all shared objects are protected by some
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
PLDI’09, June 15–20, 2009, Dublin, Ireland.
Copyright © 2009 ACM 978-1-60558-392-1/09/06...$5.00
global lock. Yet, it offers the possibility of performance comparable
to hand-crafted, fine-grained locking.
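The programming model described above can be illustrated with a toy redo-log transaction. This is a minimal single-threaded sketch of the commit/abort semantics only (the names and API are illustrative, not SwissTM's); a real TM would add conflict detection and concurrency control:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy illustration: a transaction buffers its writes in a redo log and
// applies them to shared data only on commit; on abort, nothing is visible.
struct ToyTx {
    std::map<std::string, int>& shared;    // shared key/value store
    std::map<std::string, int> write_log;  // buffered (redo-log) updates

    explicit ToyTx(std::map<std::string, int>& s) : shared(s) {}

    int read(const std::string& k) {
        auto it = write_log.find(k);       // read-after-write sees own updates
        return it != write_log.end() ? it->second : shared[k];
    }
    void write(const std::string& k, int v) { write_log[k] = v; }

    void commit() {                        // all buffered actions applied together
        for (auto& [k, v] : write_log) shared[k] = v;
        write_log.clear();
    }
    void abort() { write_log.clear(); }    // effects rolled back, never visible
};

// A composable "transfer" needs no knowledge of which locks protect a or b.
void transfer(ToyTx& tx, const std::string& a, const std::string& b, int amount) {
    tx.write(a, tx.read(a) - amount);
    tx.write(b, tx.read(b) + amount);
}
```

The point of the sketch is composition: `transfer` can be nested inside a larger transaction without any lock-ordering discipline, which is what explicit fine-grained locking makes hard.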
A possible target of TMs are large applications such as business
software or video games: the size of these applications make them
ideal candidates to benefit from emerging multi-core architectures.
Such applications typically involve dynamic and non-uniform data
structures consisting of many objects of various complexity. For
example, a video gameplay simulation can use up to 10,000 active
interacting game objects, each having mutable state, being updated
30–60 times per second, and causing changes to 5–10 other objects
on every update [40]. Unless a TM is used, making such code
thread-safe and scalable on multi-cores is a daunting task [40]. The
big size and complexity of such applications can, in turn, easily
lead to large transactions, for these can naturally be composed [20].
Some TM interfaces [1], in fact, promote the encapsulation of entire
applications within very few transactions.
The motivation of this work is to explore the ability of software
mechanisms to effectively support mixed workloads consisting of
small and large transactions, as well as possibly complex data
structures. We believe this to be of practical relevance because even
if hardware TM support becomes widely available in the future, it is
likely that only smaller-scale transactional workloads will be fully
executed in hardware, while software support will still be needed
for transactions with large read and write sets. For example, the
hybrid hardware/software scheme proposed in [26] switches from
full hardware TM to full software TM when it encounters large
transactions. The ability of STM systems to effectively deal with
large transactions will be crucial in these settings as well.
Since the seminal paper on a software TM (STM) that supported
dynamic data structures and unbounded transactions [22], all mod-
ern STMs are supposed to handle complex workloads [22, 27, 10,
31, 21, 2, 35, 29]. A wide variety of STM techniques, mainly in-
spired by database algorithms, have been explored. The big chal-
lenge facing STM researchers is to determine the right combination
of strategies that suit the requirements of concurrent applications—
requirements that are significantly different than those of database
applications. So far, however, most STM experiments have been
performed using benchmarks characterized by small transactions,
simple and uniform data structures, or regular data access patterns.
While such experiments reveal performance differences between
STM implementations, they are not fully representative of com-
plex workloads that STMs are likely to get exposed to once used in
real applications. Worse, they can mislead STM implementors by
promoting certain strategies that may perform well in small-scale
applications but are counter-productive with complex workloads.
Examples of such strategies, which we discuss in more detail later
in the paper, include the following.
1. The commit-time locking scheme, used for instance in TL2 [10],
is indeed effective for short transactions, but might waste sig-
nificant work of longer transactions that eventually abort due
to write/write conflicts. This is because write/write conflicts,
which usually lead to transaction aborts¹, are detected too late.
2. The encounter-time locking scheme, used by most STMs, e.g.,
TinySTM [31], McRT-STM [35, 29], and Bartok-STM [21] im-
mediately aborts a transaction that tries to read a memory loca-
tion locked by another transaction. Hence, read/write conflicts,
which can often be handled without aborts, are detected very
early and resolved by aborting readers. Long transactions that
write memory locations commonly read by other transactions
might thus end up blocking many other transactions, and for a
long time, thus slowing down the system overall.
3. The timid contention management scheme, used by many
STMs, especially word-based ones such as TL2 and TinySTM,
and which aborts transactions immediately upon a conflict, favors
short transactions. Contention managers such as Greedy [16]
or Serializer [34] are more appropriate for large transactions,
but are hardly ever used due to the overhead they impose on
short transactions.
It is appealing but challenging to come up with strategies that
account both for long transactions and complex workloads, as well
as for short transactions and simple data structures: these might
indeed typically co-exist in real applications. This paper is a first
step towards taking that challenge up. We perform that step through
SwissTM, a new lock- and word-based STM. The main distinctive
features of SwissTM are:
A conflict detection scheme that detects (a) write/write con-
flicts eagerly, in order to prevent transactions that are doomed
to abort from running and wasting resources, and (b) read/write
conflicts late, in order to optimistically allow more parallelism
between transactions. In short, transactions eagerly acquire ob-
jects for writing, which helps detect write/write conflicts as
soon as they appear. This also avoids wasting work of trans-
actions that are already doomed to abort after a write/write con-
flict. By using invisible reads and allowing transactions to read
objects acquired for writing, SwissTM detects read/write con-
flicts late, thus increasing inter-transaction parallelism. A time-
based scheme [10, 33] is used to reduce the cost of transaction
validation with invisible reads.
A two-phase contention manager that incurs no overhead on
read-only and short read-write transactions while favoring the
progress of transactions that have performed a significant num-
ber of updates. Basically, transactions that are short or read-only
use the simple but inexpensive timid contention management
scheme, aborting on first encountered conflict. Transactions that
are more complex switch dynamically to the Greedy mecha-
nism that involves more overhead but favors these transactions,
preventing starvation. Additionally, transactions that abort due
to write/write conflicts back-off for a period proportional to the
number of their successive aborts, hence reducing contention
on memory hot spots.
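The back-off rule in the last point can be sketched as follows. The constants and the random source are illustrative (the paper does not specify them here); the property being shown is only that the waiting bound grows linearly with the number of successive aborts:

```cpp
#include <cassert>
#include <cstdlib>

// Sketch of randomized linear back-off: a transaction that has aborted
// `succ_aborts` times in a row on write/write conflicts waits a random
// interval whose upper bound grows linearly with succ_aborts, reducing
// contention on memory hot spots.
unsigned backoff_interval(unsigned succ_aborts, unsigned seed) {
    const unsigned kUnit = 64;             // illustrative base delay unit
    if (succ_aborts == 0) return 0;        // first attempt: no delay
    std::srand(seed);                      // seeded here only for reproducibility
    unsigned bound = kUnit * succ_aborts;  // linear, not exponential, growth
    return std::rand() % bound;            // uniform in [0, bound)
}
```

Linear (rather than exponential) growth keeps the penalty for a long transaction that repeatedly loses write/write conflicts moderate, while still spreading retries apart.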
We evaluate SwissTM with state-of-the-art STMs by using
benchmarks that cover a large part of the complexity space. We
start with STMBench7 [18], which involves (1) non-uniform data
structures of significant size, and (2) a mix of operations of various
length and data access patterns. Then, we move to Lee-TM [4]—a
benchmark with large but regular transactions—and STAMP [8]—
a collection of realistic medium-scale workloads. Finally, we eval-
uate SwissTM with a red-black tree microbenchmark that involves
very short and simple transactions. SwissTM outperforms state-of-
the-art STMs—RSTM [27], TL2 [10], and TinySTM [31]—in all
the considered benchmarks. For example, in the read-dominated
workload of STMBench7 (90% of read-only operations), SwissTM
outperforms the other STMs by up to 65%, and in the write-
dominated workload (10% of read-only operations)—by up to
10%. Also, SwissTM provides better scalability than the other
STMs, especially for the read-dominated and read-write (60% of
read-only operations) workloads of STMBench7.

¹ Pure write/write conflicts do not necessarily lead to transaction aborts, but
are very rare—most transactions read memory locations before updating
them.

STM design choices
Acquire  Reads      CM               Effectiveness
lazy     invisible  any              +
eager    visible    any              +
eager    invisible  Polka            +
eager    invisible  timid or Greedy  ++
mixed    invisible  timid or Greedy  +++
mixed    invisible  2-phase          ++++
Table 1. A summary comparison of the effectiveness of selected
combinations of STM design choices in mixed workloads.
We compare SwissTM to RSTM, TL2, and TinySTM for two
reasons.
They constitute the state-of-the-art performance-wise, among
the publicly available library-based STMs. Furthermore, just
like SwissTM, they can be used to manually instrument concur-
rent applications with transactional accesses. Indeed, our goal
is to evaluate the performance of the core STM algorithm, not
the efficiency of the higher layers such as STM compilers. We
did not use for instance McRT-STM [35, 29], because it does
not expose such a low-level API to a programmer. Evaluat-
ing STM-aware compilers (which naturally introduce additional
overheads above the low-level STM interface [42, 6]) is largely
an orthogonal issue;
They represent a wide spectrum of known TM design choices:
obstruction-free vs. lock-based implementation, eager vs. lazy
updates, invisible vs. visible reads, and word-level vs. object-
level access granularity. They also allow for experiments with
a variety of contention management strategies, from simply
aborting a transaction on a conflict, through exponential back-
off, up to advanced contention managers like Greedy [16],
Serializer [34], or Polka [41].
We report on our SwissTM (trial-and-error) experience, which
we believe is interesting in its own right. It is the first to date that
evaluates the ability of software solutions to provide good per-
formance to large transactions and complex objects without intro-
ducing significant overheads on short transactions and simple data
structures. We evaluate the individual impact of various STM de-
sign choices on the ability to support mixed workloads. A summary
of our observations is presented in Table 1.
From an implementation perspective, we also evaluate the im-
pact of the locking granularity. Word-based STM implementations
have so far used either word-level locking (e.g., TL2 and TinySTM)
or cache-line-level locking (e.g., McRT-STM C/C++). Our sensitivity
analysis shows that a lock granularity of four words outperforms
both word-level and cache line-level locking by 4% and 5% re-
spectively across all benchmarks we considered.
To summarize, the main contributions of this paper are (1) the
design and implementation of an STM that performs particularly
well with large-scale complex transactional workloads while hav-
ing good performance in small-scale ones, and (2) an extensive ex-
perimental evaluation of STM strategies and implementations from
the perspective of complex applications with mixed workloads.

The rest of the paper is organized as follows. In Section 2,
we give a short overview of STM design space and benchmarks.
We then present SwissTM in Section 3. In Sections 4 and 5, we
present the results of our experimental evaluation: first, we com-
pare the performance of SwissTM to that of TL2, TinySTM, and
RSTM, and, second, we evaluate the individual impact of the de-
sign choices underlying SwissTM.
2. Background
Transactional memory was first proposed in hardware (HTM) [23].
So far, most HTMs support only limited-size transactions and of-
ten do not ensure transaction progress upon specific system events,
e.g., interrupts, context switches, or function calls [9]. While there
have been proposals for truly dynamic HTMs (e.g. [3, 32]), it
is very likely that actual HTM implementations will still have
some of these limitations. Hybrid approaches either execute short
transactions in hardware and fall back to software for longer ones
(e.g., [26]), or accelerate certain operations of an STM in hard-
ware. This work focuses on pure software solutions (STM) [37].
In this section, we survey some distinctive features of STMs and
discuss the three representative STMs we focus on in our evalua-
tion: RSTM [27], TL2 [10], and TinySTM [31] (see [24] for a full
survey). We also give a short description of the benchmarks used in
our experiments.
2.1 STM Design Space
The main task of an STM is to detect conflicts among concurrent
transactions and resolve them. Deciding what to do when conflicts
arise is performed by a (conceptually) separate component called
a contention manager [22]. A concept closely related to conflict
detection is that of validation. Validating a transaction consists of
checking its read set (i.e., the set of locations² the transaction has
already read) for consistency.
Two classes of STMs can be distinguished, word-based and
object-based, depending on the granularity at which they perform
logging. RSTM is object-based while TL2 and TinySTM are word-
based. There are also two general classes of STM implementa-
tions: lock-based and obstruction-free. Lock-based STMs, first pro-
posed in [19, 12], implement some variant of the two-phase lock-
ing protocol [13]. Obstruction-free STMs [22] do not use any
blocking mechanisms (such as locks), and guarantee progress even
when some of the transactions are delayed. RSTM (version 3) is
obstruction-free, while TL2 and TinySTM internally use locks.
Conflict detection. Most STMs employ the single-writer-multiple-
readers strategy; accesses to the same location by concurrent trans-
actions conflict when at least one of the accesses is a write (update).
In order to commit, a transaction T must eventually acquire every
location x that is updated by T. Acquisition can be eager, i.e., at
the time of the first update operation of T on x, or lazy, i.e., at the
commit time of T. A transaction T that reads x can be either visible
or invisible [27] to other transactions accessing x. When T is invis-
ible, T has the sole responsibility of detecting conflicts on x with
transactions that write x concurrently, i.e., validating its read set.
The time complexity of a basic validation algorithm is proportional
to the size of the read set, but can be boosted with a global commit
counter heuristic (RSTM), or a time-based scheme [10, 31] (TL2
and TinySTM).
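The global-commit-counter heuristic mentioned above can be sketched as follows. This is an illustrative reconstruction of the idea, not RSTM's actual code: if no transaction has committed since the reader last validated, its read set cannot have been overwritten, so the O(|read set|) check can be skipped:

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// One global counter, bumped by every committing writer (sketch).
std::atomic<unsigned> commit_counter{0};

struct ReadEntry { const unsigned* lock; unsigned version; };

struct ReaderTx {
    unsigned last_seen = 0;             // commit_counter value at last validation
    std::vector<ReadEntry> read_log;

    bool validate_full() {              // O(read-set) version comparison
        for (auto& e : read_log)
            if (*e.lock != e.version) return false;
        return true;
    }
    bool validate() {
        unsigned now = commit_counter.load();
        if (now == last_seen) return true;  // fast path: no commits meanwhile
        bool ok = validate_full();
        if (ok) last_seen = now;            // extend the validity window
        return ok;
    }
};
```

Time-based schemes such as those of TL2 and TinySTM refine this idea: they compare per-location version timestamps against the transaction's snapshot time instead of re-checking the whole read set on every read.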
A mixed invalidation conflict detection scheme (first proposed
in [39]) eagerly detects write/write conflicts while lazily detecting
read/write conflicts (it is a mix between pure lazy and pure eager
schemes). A similar conflict detection scheme is provided by more
general (but also more expensive) multi-versioning schemes used in
LSA-STM [33] and JVSTM [7]. Mixed invalidation, which under-
lies SwissTM, has never been used with lock-based or word-based
STMs, nor has it been evaluated with any large-scale workload.

² These are memory words in word-based STMs and objects in object-based
STMs.
RSTM supports lazy and eager acquisition, as well as visi-
ble and invisible reads (i.e., four algorithm variants). TL2 and
TinySTM use, respectively, lazy and eager acquisition. Both TL2
and TinySTM employ invisible reads.
Contention management. The contention manager decides what
a given transaction (attacker) should do in case of a conflict with
another transaction (victim). Possible outcomes are: aborting the
attacker, aborting the victim, or forcing the attacker to retry after
some period.
The simplest scheme (which we call timid) is to always abort the
attacker (possibly with a short back-off). This is the default scheme
in TL2 and TinySTM. More involved contention managers were
proposed in [41, 36, 16], and are provided with RSTM. They can
also be combined at run-time [15]. Polka [41] assigns every trans-
action a priority that is equal to the number of objects the trans-
action accessed so far. Whenever the attacker waits, its priority is
temporarily increased by one. If the attacker has a lower priority
than the victim, it will be forced to wait (using exponential back-
off to calculate the wait interval), otherwise the victim gets aborted.
Greedy assigns each transaction a unique, monotonically increas-
ing timestamp on its start. The transaction with the lower times-
tamp always wins. An important property of Greedy is that, un-
like other contention managers we mention, it avoids starvation of
transactions. Polka has been shown to provide best performance in
smaller-scale benchmarks previously [41], while our experiments
show that Greedy performs better in large-scale workloads (Sec-
tion 5). Serializer is very similar to Greedy except that it assigns a
new timestamp to a transaction on every restart, and thus does not
prevent starvation or even livelocks of transactions.
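The key difference between Greedy and Serializer can be made concrete. In this simplified sketch (names are illustrative), a transaction draws a unique timestamp from a global counter at its first start; Greedy keeps it across restarts, so every transaction eventually becomes the oldest and wins:

```cpp
#include <atomic>
#include <cassert>

// Global timestamp source for the Greedy policy (sketch).
std::atomic<unsigned long> greedy_clock{0};

struct GreedyTx {
    unsigned long ts;
    GreedyTx() : ts(greedy_clock.fetch_add(1)) {}  // assigned once, at first start
    void restart() { /* ts deliberately kept */ }  // Serializer would re-draw ts here
};

// True if the attacker must yield (wait/abort itself); false if it may
// abort the victim and proceed. Lower timestamp (older) always wins.
bool attacker_yields(const GreedyTx& attacker, const GreedyTx& victim) {
    return victim.ts < attacker.ts;
}
```

Because the timestamp is never re-drawn, a transaction's priority can only improve relative to newcomers; re-drawing on every restart, as Serializer does, is what loses the starvation-freedom guarantee.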
2.2 STM Benchmarks
In this section, we give an overview of the benchmarks we use
in our experiments. These represent a large spectrum of workload
types: from simple data structures with small transactions (the red-
black tree microbenchmark) to complex applications with possibly
long transactions (STMBench7). All the benchmarks we used are
implemented in C/C++.
STMBench7. STMBench7 [18] is a synthetic benchmark whose
workloads aim to represent realistic, complex, object-oriented
applications that are an important target for STMs. STMBench7
exhibits a large variety of operations (from very short, read-only
operations to very long ones that modify large parts of the data
structure) and workloads (from workloads consisting mostly of
read-only transactions to write-dominated ones). The data structure
used by STMBench7 is many orders of magnitude larger than in
other typical STM benchmarks. Also, its transactions are longer
and access larger numbers of objects.
STMBench7 is inherently object-based and its implementations
also use standard language libraries. A thin wrapper, described
in [11], is thus necessary to use STMBench7 with word-based
STMs (TL2, TinySTM, and SwissTM).
STAMP. STAMP [8] is a TM benchmarking suite that consists of
eight different transactional programs and ten workloads.³ STAMP
applications are representative of various real-world workloads, in-
cluding bioinformatics, engineering, computer graphics, and ma-
chine learning. While STAMP covers a broad range of possible
STM uses, it does not involve very long transactions, such as those
that might be produced by average, non-expert programmers or
generated automatically by a compiler along the lines of [1]. Fur-
thermore, some STAMP algorithms (e.g., bayes) split logical op-
erations into multiple transactions and use intricate programming
techniques that might not be representative of average program-
mers’ skills.

³ We used STAMP version 0.9.9.
Lee-TM. Lee-TM [4] is a benchmark that offers large, realistic
workloads and is based on Lee’s circuit routing algorithm. The al-
gorithm takes pairs of points (e.g., of an integrated circuit) as its
input and produces non-intersecting routes between them. While
transactions of Lee-TM are significant in size, they exhibit very
regular access patterns—every transaction first reads a large num-
ber of locations (searching for suitable paths) and then updates a
small number of them (setting up a path). Moreover, the bench-
mark uses very simple objects (each can be represented as a single
integer variable). It is worth noting that STAMP contains an appli-
cation (called labyrinth) that uses the same algorithm as Lee-TM.
However, Lee-TM uses real-world input sets that make it more re-
alistic than labyrinth. Lee-TM distribution includes two input data
sets: memory and main circuit boards.
Red-black tree. The prevailing way of measuring the perfor-
mance of STMs has been through microbenchmarks. The widely
used (first in [22]) red-black tree microbenchmark consists of
short transactions that insert, lookup, and remove elements from
a red-black tree data structure. Short and simple transactions of mi-
crobenchmarks are good for testing mechanics of STM itself and
comparing low-level details of various implementations.
3. SwissTM
SwissTM is a lock-based STM that uses invisible reads and counter-
based heuristics (the same as in TinySTM and TL2). It features
eager write/write and lazy read/write conflict detection, as well as
a two-phase contention manager with random linear back-off. The
API of SwissTM is word-based, as it enables transactional access
to arbitrary memory words. SwissTM uses a redo-logging scheme
(partially to support its conflict detection scheme).
3.1 Programming model
Similarly to most other STM libraries, SwissTM guarantees opac-
ity [17]. Opacity is similar to serializability in database sys-
tems [30]. The main difference is that all transactions always ob-
serve consistent states of the system. This means that transactions
cannot, e.g., use stale values, and that they do not require periodic
validation or sandboxing to prevent infinite loops or crashes due to
accesses to inconsistent memory states.
SwissTM is a weakly atomic STM, i.e., it does not provide
any guarantees for the code that accesses the same data from both
inside and outside of transactions. SwissTM is not privatization
safe [38]. This could make programming with SwissTM slightly
more difficult in certain cases, but did not affect us, as none of the
benchmarks we use requires privatization-safe STM.
When programming with SwissTM, programmers have to re-
place all memory references to shared data from inside transactions
with SwissTM calls for reading and writing memory words. The
programming model can be improved by using an STM compiler
(as in e.g. [21, 2, 14, 29]). While the compiler instrumentation can
degrade performance due to over-instrumentation [42] and possi-
bly even change the characteristics of the workload slightly (e.g.
numbers and ratio of transactional read and write operations), the
compiler instrumentation remains a largely orthogonal issue to the
performance of an STM library.
The other three STMs we compare to in our experiments provide
the same semantic guarantees as SwissTM. Also, strengthening
the guarantees (as described in Section 6) would have a similar
performance impact on all STMs we use.
3.2 Algorithm
We give the pseudo-code of SwissTM in Algorithm 1. The algo-
rithm invokes contention manager functions (cm-*), which are de-
fined in Algorithm 2 and described below. All transactions share a
global commit counter commit-ts incremented by every non-read-
only transaction upon commit. Each memory word m is mapped
to a pair of locks in a global lock table: r-lock (read) and w-lock
(write). Lock w-lock is acquired by a writer T of m (eagerly) to pre-
vent other transactions from writing to m. Lock r-lock is acquired
by T at commit time to prevent other transactions from reading
word m and, as a result, observing inconsistent states of words writ-
ten by T. In addition, when r-lock is unlocked, it contains the ver-
sion number of m. Every transaction T has a transaction descriptor
tx that contains (among other data): (1) the value of commit-ts read
at the start or subsequent validation of T , and (2) read and write
logs of T.
Transaction start. Every transaction T, upon its start, reads the
global counter commit-ts and stores its value in tx.valid-ts (line 2).
Reading. When reading location addr, transaction T first reads
the value of w-lock to detect possible read-after-write cases. If T is
the owner of w-lock, then T can return the value from its write log
immediately, which is the last value T has written to addr (line 6).
Otherwise, i.e., when some other transaction owns w-lock or when
w-lock is unlocked, T reads the value of r-lock, then the value of
addr, and then again the value of r-lock. Transaction T repeats these
three reads until (1) two values of r-lock are the same, meaning
that T has read consistent values of r-lock and addr, and (2) r-lock
is unlocked (lines 8–15). When r-lock is unlocked, it contains the
current version v of addr. If v is lower or equal to the validation
timestamp tx.valid-ts of T (which means that addr has not changed
since T’s last validation or start), T returns the value at addr read
in line 18. Otherwise, T revalidates its read set. If the revalidation
does not succeed, T rolls back (line 17). If it succeeds, the read
operation returns and T extends its validation timestamp tx.valid-ts
to the current value of commit-ts (line 56).
Writing. Whenever some transaction T writes to a memory lo-
cation addr, T first checks if T is the owner of the lock w-lock
corresponding to addr. If it is, T updates the value of addr in its
write log and returns (lines 21–23). Otherwise, T tries to acquire
w-lock by atomically replacing, using a compare-and-swap (CAS)
operation, value unlocked with the pointer to T’s write log en-
try that contains the new value of addr (line 29). If CAS does not
succeed, T asks the contention manager whether to rollback and
retry or wait for the current owner of the lock to finish (line 26). In
order to guarantee opacity, T has to revalidate its read set if the cur-
rent version of addr (contained in r-lock) is higher than its validity
timestamp tx.valid-ts (lines 31–32).
Validation. To validate itself, T compares the versions of all
memory locations read so far to their versions at the point they
were initially read by T (lines 51–52). These versions are stored in
T’s read log. If there is a mismatch between any version numbers,
the validation fails (line 52).
Commit. A read-only transaction T can commit immediately, as
its read log is guaranteed to be consistent (line 35). A transaction T
that is not read-only first locks all read locks of memory locations T
has written to (line 36). Then, T increments commit-ts (line 37) and
re-validates its read log. If the validation does not succeed, T rolls
back and restarts (lines 38–41). Upon successful validation, T tra-
verses its write set, updates values of all written memory locations,
and releases the corresponding read and write locks (lines 42–45).
When releasing read locks, T writes the new value of commit-ts to
those locks.

Algorithm 1: Pseudo-code representation of SwissTM.

 1: function start(tx)
 2:   tx.valid-ts ← commit-ts;
 3:   cm-start(tx);

 4: function read-word(tx, addr)
 5:   (r-lock, w-lock) ← map-addr-to-locks(addr);
 6:   if is-locked-by(w-lock, tx) then return get-value(w-lock, addr);
 7:   version ← read(r-lock);
 8:   while true do
 9:     if version = locked then
10:       version ← read(r-lock);
11:       continue;
12:     value ← read(addr);
13:     version2 ← read(r-lock);
14:     if version = version2 then break;
15:     version ← version2;
16:   add-to-read-log(tx, r-lock, version);
17:   if version > tx.valid-ts and not extend(tx) then rollback(tx);
18:   return value;

19: function write-word(tx, addr, value)
20:   (r-lock, w-lock) ← map-addr-to-locks(addr);
21:   if is-locked-by(w-lock, tx) then
22:     update-log-entry(w-lock, addr, value);
23:     return;
24:   while true do
25:     if is-locked(w-lock) then
26:       if cm-should-abort(tx, w-lock) then rollback(tx);
27:       else continue;
28:     log-entry ← add-to-write-log(tx, w-lock, addr, value);
29:     if compare&swap(w-lock, unlocked, log-entry) then
30:       break;
31:   if read(r-lock) > tx.valid-ts and not extend(tx) then
32:     rollback(tx);
33:   cm-on-write(tx);

34: function commit(tx)
35:   if is-read-only(tx) then return;
36:   for log-entry in tx.write-log do write(log-entry.r-lock, locked);
37:   ts ← increment&get(commit-ts);
38:   if ts > tx.valid-ts + 1 and not validate(tx) then
39:     for log-entry in tx.write-log do
40:       write(log-entry.r-lock, log-entry.version);
41:     rollback(tx);
42:   for log-entry in tx.write-log do
43:     write(log-entry.addr, log-entry.value);
44:     write(log-entry.r-lock, ts);
45:     write(log-entry.w-lock, unlocked);

46: function rollback(tx)
47:   for log-entry in tx.write-log do
48:     write(log-entry.w-lock, unlocked);
49:   cm-on-rollback(tx);

50: function validate(tx)
51:   for log-entry in tx.read-log do
52:     if log-entry.version ≠ read(log-entry.r-lock) and not is-locked-by(log-entry.r-lock, tx) then return false;
53:   return true;

54: function extend(tx)
55:   ts ← read(commit-ts);
56:   if validate(tx) then tx.valid-ts ← ts; return true;
57:   return false;
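The retry loop in read-word (Algorithm 1, lines 7–15) is essentially a versioned, seqlock-style read: the version lock is read before and after the data word, and the read is retried until the location is unlocked and the two version reads agree, so the returned value and its version form a consistent pair. A minimal C++ sketch of just this loop, with all names and types invented for illustration (this is not SwissTM's actual code):

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// Sentinel stored in the version lock while a writer is committing.
constexpr uintptr_t LOCKED = ~uintptr_t{0};

struct VersionedWord {
    std::atomic<uintptr_t> r_lock{0}; // version number, or LOCKED
    std::atomic<uintptr_t> value{0};  // the memory word itself
};

// Returns the value together with the version it was read under,
// mirroring lines 7-15 of read-word.
std::pair<uintptr_t, uintptr_t> consistent_read(const VersionedWord& w) {
    uintptr_t version = w.r_lock.load(std::memory_order_acquire);
    while (true) {
        if (version == LOCKED) {                      // a writer is committing: spin
            version = w.r_lock.load(std::memory_order_acquire);
            continue;
        }
        uintptr_t value = w.value.load(std::memory_order_acquire);
        uintptr_t version2 = w.r_lock.load(std::memory_order_acquire);
        if (version == version2)                      // unchanged: pair is consistent
            return {value, version};
        version = version2;                           // changed under us: retry
    }
}
```

In the full algorithm, the version returned here is what gets appended to the read log and later compared during validation.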
Algorithm 2: Pseudo-code of the two-phase contention manager (Wn is a constant).

 1: function cm-start(tx)
 2:   if not-restart(tx) then tx.cm-ts ← ∞;

 3: function cm-on-write(tx)
 4:   if tx.cm-ts = ∞ and size(tx.write-log) = Wn then tx.cm-ts ← increment&get(greedy-ts);

 5: function cm-should-abort(tx, w-lock)
 6:   if tx.cm-ts = ∞ then return true;
 7:   lock-owner ← owner(w-lock);
 8:   if lock-owner.cm-ts < tx.cm-ts then return true;
 9:   else abort(lock-owner); return false;

10: function cm-on-rollback(tx)
11:   wait-random(tx.succ-abort-count);
Rollback. On rollback, transaction T releases all write locks it
holds (lines 47–48), and then restarts itself.
Contention management. We give the pseudo-code of our two-
phase contention manager in Algorithm 2. The contention manager
gets invoked by Algorithm 1 (1) at transaction start (cm-start in
line 3), (2) on a write/write conflict (cm-should-abort in line 26),
(3) after a successful write (cm-on-write in line 33), and (4) after a
restart (cm-on-rollback in line 49). Every transaction, upon executing
its Wn-th write (where we set Wn to 10), increments the global
counter greedy-ts and stores its value in tx.cm-ts (line 4). Hence,
short transactions (those that execute fewer than Wn writes) do not
access greedy-ts, which would otherwise become a memory hot spot;
this reduces contention and the number of cache misses. Transactions
that have already incremented greedy-ts are in the second
phase of the contention management scheme, and others are in the
first phase. Upon a conflict, a transaction that is still in the first
phase gets restarted immediately (line 6). If both conflicting transactions
are already in the second phase, the transaction with the
higher value of cm-ts is restarted (lines 8–9). This prioritizes transactions
that have performed more work. Conceptually, transactions
in the first phase have an infinite value of cm-ts (set in line 2). This
means that (longer) transactions in the second phase have higher
priority than (short) transactions in the first phase. After restarting,
transactions are delayed using a randomized back-off scheme
(line 11). This reduces the probability of a transaction being
aborted many times repeatedly because of the same conflict.
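The two-phase policy above can be sketched in a few lines of C++. This is an illustrative model only, not SwissTM's actual interfaces: cm_ts, greedy_ts, and the helper names are assumptions, and the real implementation aborts the lock owner asynchronously rather than via a direct call.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <limits>

// "Infinity" marks a transaction still in the first phase.
constexpr uint64_t INF = std::numeric_limits<uint64_t>::max();
constexpr std::size_t WN = 10; // phase-switch threshold (the paper sets Wn = 10)

std::atomic<uint64_t> greedy_ts{0}; // global counter, touched only by long transactions

struct Tx {
    uint64_t cm_ts = INF;   // greedy timestamp; INF while in the first phase
    std::size_t writes = 0; // number of writes performed so far
};

// Called after each successful write: on the Wn-th write, enter the second
// phase by acquiring a greedy timestamp.
void cm_on_write(Tx& tx) {
    if (++tx.writes == WN && tx.cm_ts == INF)
        tx.cm_ts = ++greedy_ts;
}

// Called on a write/write conflict.
// true  -> tx restarts itself; false -> the lock owner is aborted instead.
bool cm_should_abort(const Tx& tx, const Tx& lock_owner) {
    if (tx.cm_ts == INF) return true;             // first phase: back off immediately
    if (lock_owner.cm_ts < tx.cm_ts) return true; // owner started its 2nd phase earlier: it wins
    return false;                                 // tx has priority: owner gets aborted
}
```

A lower cm_ts means the transaction entered the second phase earlier and has presumably done more work, so it wins conflicts against younger or first-phase transactions.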
3.3 Implementation Highlights
We implemented SwissTM in C++ (g++ 4.0.1 compiler). We used
the (fairly portable) atomic_ops library [5] to implement atomic
operations. Currently, SwissTM works on 32-bit x86 Linux
2.6.x and OS X 10.5 platforms (a 64-bit port is in progress).
Lock table. To map memory word m to a lock table entry, we
take the address a of m and shift it to the right by 4 (it would be 5
with 64-bit words). This makes four consecutive memory words map
to the same lock (we empirically selected this value, as explained in
Section 5). Then, we set all high-order bits to zero. As the lock
table contains 2^22 entries in our implementation, we simply perform
a logical AND between the shifted address and 2^22 - 1 to get
the index into the table. Figure 1 depicts the mapping scheme.
Having multiple consecutive memory words mapped to the same
lock table entry can result in false conflicts, where unrelated memory
words get locked together, but this does not cause any problems in
practice.
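The mapping above amounts to a shift and a mask. A small C++ sketch under the paper's 32-bit parameters (names are ours, not SwissTM's):

```cpp
#include <cstdint>

// Address-to-lock-table index: drop the low 4 bits so that four consecutive
// 4-byte words share one entry, then mask to the 2^22-entry table size.
constexpr uintptr_t SHIFT = 4;             // would be 5 with 64-bit words
constexpr uintptr_t TABLE_SIZE = 1u << 22; // 2^22 lock-table entries

uintptr_t lock_index(uintptr_t addr) {
    return (addr >> SHIFT) & (TABLE_SIZE - 1); // AND with 2^22 - 1
}
```

For example, addresses 0x1000 through 0x100F (one 16-byte group) all map to the same entry, while 0x1010 maps to the next one; two addresses 2^26 bytes apart also collide, which is the source of the false conflicts mentioned above.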

Citations
Proceedings ArticleDOI

NOrec: streamlining STM by abolishing ownership records

TL;DR: An ownership-record-free software transactional memory (STM) system that combines extremely low overhead with unusually clean semantics is presented, and the experience suggests that NOrec may be an ideal candidate for such a software system.
Book

Transactional Memory, 2nd Edition

TL;DR: This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early spring 2010.
Proceedings ArticleDOI

No compromises: distributed transactions with consistency, availability, and performance

TL;DR: It is shown that a main memory distributed computing platform called FaRM can provide distributed transactions with strict serializability, high performance, durability, and high availability in modern data centers.
Proceedings ArticleDOI

Transactional Memory Architecture and Implementation for IBM System Z

TL;DR: The implementation in the IBM zEnterprise EC12 (zEC12) microprocessor generation, focusing on how transactional memory can be embedded into the existing cache design and multiprocessor shared-memory infrastructure, is described.
Proceedings ArticleDOI

Stretching transactional memory

TL;DR: SwissTM is lock- and word-based and uses a new two-phase contention manager that ensures the progress of long transactions while inducing no overhead on short ones, and outperforms state-of-the-art STM implementations, namely RSTM, TL2, and TinySTM.
References
Book

Transaction Processing: Concepts and Techniques

Jim Gray, +1 more
TL;DR: Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk.
Journal ArticleDOI

Linearizability: a correctness condition for concurrent objects

TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
Proceedings ArticleDOI

Transactional memory: architectural support for lock-free data structures

TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Proceedings ArticleDOI

Software transactional memory

TL;DR: STM is used to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction, a novel software method for supporting flexible transactional programming of synchronization operations.
Proceedings ArticleDOI

A critique of ANSI SQL isolation levels

TL;DR: It is shown that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered, and new phenomena that better characterize isolation types are introduced.
Frequently Asked Questions (14)
Q1. What have the authors contributed in "Stretching transactional memory" ?

The authors revisit the main STM design choices from the perspective of complex workloads and propose a new STM, which they call SwissTM. Beyond SwissTM, the authors present the most complete evaluation to date of the individual impact of various STM design choices on the ability to support the mixed workloads of large applications. 

Further experiments might be needed in this direction. Two main directions along which the authors plan to improve the semantical guarantees of SwissTM are: (1) adding compiler support, and (2) making SwissTM privatization-safe. There exist a number of STM C/C++ compilers that have open interfaces supporting different STM libraries (e.g., [14, 29]) and the authors plan to integrate SwissTM with one of them. While this algorithm is simple, it would probably significantly impact the performance of SwissTM [42], and the authors plan to investigate other options, possibly using techniques similar to [28] or [25].

A possible target of TMs are large applications such as business software or video games: the size of these applications make them ideal candidates to benefit from emerging multi-core architectures. 

The motivation of this work is to explore the ability of software mechanisms to effectively support mixed workloads consisting of small and large transactions, as well as possibly complex data structures. 

It might seem beneficial to make transactions restart as soon as possible after conflicts that force them to rollback, as waiting just decreases the reaction time before the transaction re-executes. 

When T is invisible, T has the sole responsibility of detecting conflicts on x with transactions that write x concurrently, i.e., validating its read set. 

SwissTM significantly outperforms all other STMs for both read-dominated and read-write workloads, while also achieving superior scalability. 

To map memory word m to a lock table entry, the authors take the address a of m, shift it to the right by 4 (it would be 5 with 64-bit words). 

To summarize, the main contributions of this paper are (1) the design and implementation of an STM that performs particularly well with large-scale complex transactional workloads while having good performance in small-scale ones, and (2) an extensive experimental evaluation of STM strategies and implementations from the perspective of complex applications with mixed workloads. 

Short and simple transactions of microbenchmarks are good for testing mechanics of STM itself and comparing low-level details of various implementations. 

So far, however, most STM experiments have been performed using benchmarks characterized by small transactions, simple and uniform data structures, or regular data access patterns. 

It is interesting to note here that, while using different lock granularities does impact performance, the impact of coarser lock granularities is not significant enough to prevent SwissTM from scaling (e.g., due to an increased number of false conflicts).

Because of this, lazy conflict detection STMs react too slowly to write/write conflicts (which are good signs that transactions cannot proceed in parallel) and results in transactions performing work that has to be rolled back later.