
On the correctness of transactional memory

TLDR
Opacity is defined as a property of concurrent transaction histories, its graph-theoretical interpretation is given, and it is proved that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, Ω(k) steps for an operation to terminate.
Abstract

Transactional memory (TM) is perceived as an appealing alternative to critical sections for general purpose concurrent programming. Despite the large amount of recent work on TM implementations, however, very little effort has been devoted to precisely defining what guarantees these implementations should provide. A formal description of such guarantees is necessary in order to check the correctness of TM systems, as well as to establish TM optimality results and inherent trade-offs.

This paper presents opacity, a candidate correctness criterion for TM implementations. We define opacity as a property of concurrent transaction histories and give its graph theoretical interpretation. Opacity captures precisely the correctness requirements that have been intuitively described by many TM designers. Most TM systems we know of do ensure opacity.

At a very first approximation, opacity can be viewed as an extension of the classical database serializability property with the additional requirement that even non-committed transactions are prevented from accessing inconsistent states. Capturing this requirement precisely, in the context of general objects, and without precluding pragmatic strategies that are often used by modern TM implementations, such as versioning, invisible reads, lazy updates, and open nesting, is not trivial.

As a use case of opacity, we prove the first lower bound on the complexity of TM implementations. Basically, we show that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, Ω(k) steps for an operation to terminate, where k is the total number of objects shared by transactions. This (tight) bound precisely captures an inherent trade-off in the design of TM systems. The bound also highlights a fundamental gap between systems in which transactions can be fully isolated from the outside environment, e.g., databases or certain specialized transactional languages, and systems that lack such isolation capabilities, e.g., general TM frameworks.


Stretching Transactional Memory
Aleksandar Dragojević    Rachid Guerraoui    Michał Kapałka
École Polytechnique Fédérale de Lausanne, School of Computer and Communication Sciences, I&C, Switzerland
{aleksandar.dragojevic, rachid.guerraoui, michal.kapalka}@epfl.ch
Abstract
Transactional memory (TM) is an appealing abstraction for pro-
gramming multi-core systems. Potential target applications for TM,
such as business software and video games, are likely to involve
complex data structures and large transactions, requiring specific
software solutions (STM). So far, however, STMs have been mainly
evaluated and optimized for smaller-scale benchmarks.
We revisit the main STM design choices from the perspec-
tive of complex workloads and propose a new STM, which we
call SwissTM. In short, SwissTM is lock- and word-based and
uses (1) optimistic (commit-time) conflict detection for read/write
conflicts and pessimistic (encounter-time) conflict detection for
write/write conflicts, as well as (2) a new two-phase contention
manager that ensures the progress of long transactions while induc-
ing no overhead on short ones. SwissTM outperforms state-of-the-
art STM implementations, namely RSTM, TL2, and TinySTM, in
our experiments on STMBench7, STAMP, Lee-TM and red-black
tree benchmarks.
Beyond SwissTM, we present the most complete evaluation to
date of the individual impact of various STM design choices on the
ability to support the mixed workloads of large applications.
Categories and Subject Descriptors D.1.3 [Programming Tech-
niques]: Concurrent Programming; D.2.8 [Software Engineering]:
Metrics—performance measures
General Terms Measurement, Performance, Experimentation
Keywords Software transactional memories, Benchmarks
1. Introduction
Transactional memory (TM) is an appealing abstraction for making
concurrent programming accessible to a wide community of non-
expert programmers while avoiding the pitfalls of critical sections.
With a TM, application threads communicate by executing opera-
tions on shared data inside lightweight in-memory transactions. A
transaction performs a number of actions and then either commits,
in which case all the actions are applied to shared data atomically,
or aborts, in which case the effects of those actions are rolled back
and never visible to other transactions. From a programmer’s per-
spective, the TM paradigm is very promising as it promotes pro-
gram composition [20], in contrast to explicit locking, while still
providing the illusion that all shared objects are protected by some
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
PLDI’09, June 15–20, 2009, Dublin, Ireland.
Copyright © 2009 ACM 978-1-60558-392-1/09/06...$5.00
global lock. Yet, it offers the possibility of performance comparable
to hand-crafted, fine-grained locking.
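The programming model described above can be illustrated with a toy redo-log transaction. This is a minimal single-threaded sketch of the commit/abort semantics only (the names and API are illustrative, not SwissTM's); a real TM would add conflict detection and concurrency control:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy illustration: a transaction buffers its writes in a redo log and
// applies them to shared data only on commit; on abort, nothing is visible.
struct ToyTx {
    std::map<std::string, int>& shared;    // shared key/value store
    std::map<std::string, int> write_log;  // buffered (redo-log) updates

    explicit ToyTx(std::map<std::string, int>& s) : shared(s) {}

    int read(const std::string& k) {
        auto it = write_log.find(k);       // read-after-write sees own updates
        return it != write_log.end() ? it->second : shared[k];
    }
    void write(const std::string& k, int v) { write_log[k] = v; }

    void commit() {                        // all buffered actions applied together
        for (auto& [k, v] : write_log) shared[k] = v;
        write_log.clear();
    }
    void abort() { write_log.clear(); }    // effects rolled back, never visible
};

// A composable "transfer" needs no knowledge of which locks protect a or b.
void transfer(ToyTx& tx, const std::string& a, const std::string& b, int amount) {
    tx.write(a, tx.read(a) - amount);
    tx.write(b, tx.read(b) + amount);
}
```

The point of the sketch is composition: `transfer` can be nested inside a larger transaction without any lock-ordering discipline, which is what explicit fine-grained locking makes hard.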
A possible target of TMs are large applications such as business
software or video games: the size of these applications make them
ideal candidates to benefit from emerging multi-core architectures.
Such applications typically involve dynamic and non-uniform data
structures consisting of many objects of various complexity. For
example, a video gameplay simulation can use up to 10,000 active
interacting game objects, each having mutable state, being updated
30–60 times per second, and causing changes to 5–10 other objects
on every update [40]. Unless a TM is used, making such code
thread-safe and scalable on multi-cores is a daunting task [40]. The
big size and complexity of such applications can, in turn, easily
lead to large transactions, for these can naturally be composed [20].
Some TM interfaces [1], in fact, promote the encapsulation of entire
applications within very few transactions.
The motivation of this work is to explore the ability of software
mechanisms to effectively support mixed workloads consisting of
small and large transactions, as well as possibly complex data
structures. We believe this to be of practical relevance because even
if hardware TM support becomes widely available in the future, it is
likely that only smaller-scale transactional workloads will be fully
executed in hardware, while software support will still be needed
for transactions with large read and write sets. For example, the
hybrid hardware/software scheme proposed in [26] switches from
full hardware TM to full software TM when it encounters large
transactions. The ability of STM systems to effectively deal with
large transactions will be crucial in these settings as well.
Since the seminal paper on a software TM (STM) that supported
dynamic data structures and unbounded transactions [22], all mod-
ern STMs are supposed to handle complex workloads [22, 27, 10,
31, 21, 2, 35, 29]. A wide variety of STM techniques, mainly in-
spired by database algorithms, have been explored. The big chal-
lenge facing STM researchers is to determine the right combination
of strategies that suit the requirements of concurrent applications—
requirements that are significantly different than those of database
applications. So far, however, most STM experiments have been
performed using benchmarks characterized by small transactions,
simple and uniform data structures, or regular data access patterns.
While such experiments reveal performance differences between
STM implementations, they are not fully representative of com-
plex workloads that STMs are likely to get exposed to once used in
real applications. Worse, they can mislead STM implementors by
promoting certain strategies that may perform well in small-scale
applications but are counter-productive with complex workloads.
Examples of such strategies, which we discuss in more detail later
in the paper, include the following.
1. The commit-time locking scheme, used for instance in TL2 [10],
is indeed effective for short transactions, but might waste sig-
nificant work of longer transactions that eventually abort due
to write/write conflicts. This is because write/write conflicts,
which usually lead to transaction aborts¹, are detected too late.
2. The encounter-time locking scheme, used by most STMs, e.g.,
TinySTM [31], McRT-STM [35, 29], and Bartok-STM [21] im-
mediately aborts a transaction that tries to read a memory loca-
tion locked by another transaction. Hence, read/write conflicts,
which can often be handled without aborts, are detected very
early and resolved by aborting readers. Long transactions that
write memory locations commonly read by other transactions
might thus end up blocking many other transactions, and for a
long time, thus slowing down the system overall.
3. The timid contention management scheme, used by many
STMs, especially word-based ones such as TL2 and TinySTM,
and which aborts transactions immediately upon a conflict, favors
short transactions. Contention managers such as Greedy [16]
or Serializer [34] are more appropriate for large transactions,
but are hardly ever used due to the overhead they impose on
short transactions.
It is appealing but challenging to come up with strategies that
account both for long transactions and complex workloads, as well
as for short transactions and simple data structures: these might
indeed typically co-exist in real applications. This paper is a first
step towards taking that challenge up. We perform that step through
SwissTM, a new lock- and word-based STM. The main distinctive
features of SwissTM are:
A conflict detection scheme that detects (a) write/write con-
flicts eagerly, in order to prevent transactions that are doomed
to abort from running and wasting resources, and (b) read/write
conflicts late, in order to optimistically allow more parallelism
between transactions. In short, transactions eagerly acquire ob-
jects for writing, which helps detect write/write conflicts as
soon as they appear. This also avoids wasting work of trans-
actions that are already doomed to abort after a write/write con-
flict. By using invisible reads and allowing transactions to read
objects acquired for writing, SwissTM detects read/write con-
flicts late, thus increasing inter-transaction parallelism. A time-
based scheme [10, 33] is used to reduce the cost of transaction
validation with invisible reads.
A two-phase contention manager that incurs no overhead on
read-only and short read-write transactions while favoring the
progress of transactions that have performed a significant num-
ber of updates. Basically, transactions that are short or read-only
use the simple but inexpensive timid contention management
scheme, aborting on first encountered conflict. Transactions that
are more complex switch dynamically to the Greedy mecha-
nism that involves more overhead but favors these transactions,
preventing starvation. Additionally, transactions that abort due
to write/write conflicts back-off for a period proportional to the
number of their successive aborts, hence reducing contention
on memory hot spots.
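The back-off rule in the last point can be sketched as follows. The constants and the random source are illustrative (the paper does not specify them here); the property being shown is only that the waiting bound grows linearly with the number of successive aborts:

```cpp
#include <cassert>
#include <cstdlib>

// Sketch of randomized linear back-off: a transaction that has aborted
// `succ_aborts` times in a row on write/write conflicts waits a random
// interval whose upper bound grows linearly with succ_aborts, reducing
// contention on memory hot spots.
unsigned backoff_interval(unsigned succ_aborts, unsigned seed) {
    const unsigned kUnit = 64;             // illustrative base delay unit
    if (succ_aborts == 0) return 0;        // first attempt: no delay
    std::srand(seed);                      // seeded here only for reproducibility
    unsigned bound = kUnit * succ_aborts;  // linear, not exponential, growth
    return std::rand() % bound;            // uniform in [0, bound)
}
```

Linear (rather than exponential) growth keeps the penalty for a long transaction that repeatedly loses write/write conflicts moderate, while still spreading retries apart.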
We evaluate SwissTM with state-of-the-art STMs by using
benchmarks that cover a large part of the complexity space. We
start with STMBench7 [18], which involves (1) non-uniform data
structures of significant size, and (2) a mix of operations of various
length and data access patterns. Then, we move to Lee-TM [4]—a
benchmark with large but regular transactions—and STAMP [8]—
a collection of realistic medium-scale workloads. Finally, we eval-
uate SwissTM with a red-black tree microbenchmark that involves
very short and simple transactions. SwissTM outperforms state-of-
the-art STMs—RSTM [27], TL2 [10], and TinySTM [31]—in all
the considered benchmarks. For example, in the read-dominated
workload of STMBench7 (90% of read-only operations), SwissTM
outperforms the other STMs by up to 65%, and in the write-
dominated workload (10% of read-only operations)—by up to
10%. Also, SwissTM provides better scalability than the other
STMs, especially for the read-dominated and read-write (60% of
read-only operations) workloads of STMBench7.

¹ Pure write/write conflicts do not necessarily lead to transaction aborts, but
are very rare—most transactions read memory locations before updating
them.

STM design choices
Acquire  Reads      CM               Effectiveness
lazy     invisible  any              +
eager    visible    any              +
eager    invisible  Polka            +
eager    invisible  timid or Greedy  ++
mixed    invisible  timid or Greedy  +++
mixed    invisible  2-phase          ++++
Table 1. A summary comparison of the effectiveness of selected
combinations of STM design choices in mixed workloads.
We compare SwissTM to RSTM, TL2, and TinySTM for two
reasons.
They constitute the state-of-the-art performance-wise, among
the publicly available library-based STMs. Furthermore, just
like SwissTM, they can be used to manually instrument concur-
rent applications with transactional accesses. Indeed, our goal
is to evaluate the performance of the core STM algorithm, not
the efficiency of the higher layers such as STM compilers. We
did not use for instance McRT-STM [35, 29], because it does
not expose such a low-level API to a programmer. Evaluat-
ing STM-aware compilers (which naturally introduce additional
overheads above the low-level STM interface [42, 6]) is largely
an orthogonal issue;
They represent a wide spectrum of known TM design choices:
obstruction-free vs. lock-based implementation, eager vs. lazy
updates, invisible vs. visible reads, and word-level vs. object-
level access granularity. They also allow for experiments with
a variety of contention management strategies, from simply
aborting a transaction on a conflict, through exponential back-
off, up to advanced contention managers like Greedy [16],
Serializer [34], or Polka [41].
We report on our SwissTM (trial-and-error) experience, which
we believe is interesting in its own right. It is the first to date that
evaluates the ability of software solutions to provide good per-
formance to large transactions and complex objects without intro-
ducing significant overheads on short transactions and simple data
structures. We evaluate the individual impact of various STM de-
sign choices on the ability to support mixed workloads. A summary
of our observations is presented in Table 1.
From an implementation perspective, we also evaluate the im-
pact of the locking granularity. Word-based STM implementations
have so far used either word-level locking (e.g., TL2 and TinySTM)
or cache-line-level locking (e.g., McRT-STM C/C++). Our sensitivity
analysis shows that a lock granularity of four words outperforms
both word-level and cache line-level locking by 4% and 5% re-
spectively across all benchmarks we considered.
To summarize, the main contributions of this paper are (1) the
design and implementation of an STM that performs particularly
well with large-scale complex transactional workloads while hav-
ing good performance in small-scale ones, and (2) an extensive ex-
perimental evaluation of STM strategies and implementations from
the perspective of complex applications with mixed workloads.

The rest of the paper is organized as follows. In Section 2,
we give a short overview of STM design space and benchmarks.
We then present SwissTM in Section 3. In Sections 4 and 5, we
present the results of our experimental evaluation: first, we com-
pare the performance of SwissTM to that of TL2, TinySTM, and
RSTM, and, second, we evaluate the individual impact of the de-
sign choices underlying SwissTM.
2. Background
Transactional memory was first proposed in hardware (HTM) [23].
So far, most HTMs support only limited-size transactions and of-
ten do not ensure transaction progress upon specific system events,
e.g., interrupts, context switches, or function calls [9]. While there
have been proposals for truly dynamic HTMs (e.g. [3, 32]), it
is very likely that actual HTM implementations will still have
some of these limitations. Hybrid approaches either execute short
transactions in hardware and fall back to software for longer ones
(e.g., [26]), or accelerate certain operations of an STM in hard-
ware. This work focuses on pure software solutions (STM) [37].
In this section, we survey some distinctive features of STMs and
discuss the three representative STMs we focus on in our evalua-
tion: RSTM [27], TL2 [10], and TinySTM [31] (see [24] for a full
survey). We also give a short description of the benchmarks used in
our experiments.
2.1 STM Design Space
The main task of an STM is to detect conflicts among concurrent
transactions and resolve them. Deciding what to do when conflicts
arise is performed by a (conceptually) separate component called
a contention manager [22]. A concept closely related to conflict
detection is that of validation. Validating a transaction consists of
checking its read set (i.e., the set of locations² the transaction has
already read) for consistency.
Two classes of STMs can be distinguished, word-based and
object-based, depending on the granularity at which they perform
logging. RSTM is object-based while TL2 and TinySTM are word-
based. There are also two general classes of STM implementa-
tions: lock-based and obstruction-free. Lock-based STMs, first pro-
posed in [19, 12], implement some variant of the two-phase lock-
ing protocol [13]. Obstruction-free STMs [22] do not use any
blocking mechanisms (such as locks), and guarantee progress even
when some of the transactions are delayed. RSTM (version 3) is
obstruction-free, while TL2 and TinySTM internally use locks.
Conflict detection. Most STMs employ the single-writer-multiple-
readers strategy; accesses to the same location by concurrent trans-
actions conflict when at least one of the accesses is a write (update).
In order to commit, a transaction T must eventually acquire every
location x that is updated by T. Acquisition can be eager, i.e., at
the time of the first update operation of T on x, or lazy, i.e., at the
commit time of T. A transaction T that reads x can be either visible
or invisible [27] to other transactions accessing x. When T is invis-
ible, T has the sole responsibility of detecting conflicts on x with
transactions that write x concurrently, i.e., validating its read set.
The time complexity of a basic validation algorithm is proportional
to the size of the read set, but can be boosted with a global commit
counter heuristic (RSTM), or a time-based scheme [10, 31] (TL2
and TinySTM).
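The global-commit-counter heuristic mentioned above can be sketched as follows. This is an illustrative reconstruction of the idea, not RSTM's actual code: if no transaction has committed since the reader last validated, its read set cannot have been overwritten, so the O(|read set|) check can be skipped:

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// One global counter, bumped by every committing writer (sketch).
std::atomic<unsigned> commit_counter{0};

struct ReadEntry { const unsigned* lock; unsigned version; };

struct ReaderTx {
    unsigned last_seen = 0;             // commit_counter value at last validation
    std::vector<ReadEntry> read_log;

    bool validate_full() {              // O(read-set) version comparison
        for (auto& e : read_log)
            if (*e.lock != e.version) return false;
        return true;
    }
    bool validate() {
        unsigned now = commit_counter.load();
        if (now == last_seen) return true;  // fast path: no commits meanwhile
        bool ok = validate_full();
        if (ok) last_seen = now;            // extend the validity window
        return ok;
    }
};
```

Time-based schemes such as those of TL2 and TinySTM refine this idea: they compare per-location version timestamps against the transaction's snapshot time instead of re-checking the whole read set on every read.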
A mixed invalidation conflict detection scheme (first proposed
in [39]) eagerly detects write/write conflicts while lazily detecting
read/write conflicts (it is a mix between pure lazy and pure eager
schemes). A similar conflict detection scheme is provided by more
general (but also more expensive) multi-versioning schemes used in
LSA-STM [33] and JVSTM [7]. Mixed invalidation, which under-
lies SwissTM, has never been used with lock-based or word-based
STMs, nor has it been evaluated with any large-scale workload.

² These are memory words in word-based STMs and objects in object-based
STMs.
RSTM supports lazy and eager acquisition, as well as visi-
ble and invisible reads (i.e., four algorithm variants). TL2 and
TinySTM use, respectively, lazy and eager acquisition. Both TL2
and TinySTM employ invisible reads.
Contention management. The contention manager decides what
a given transaction (attacker) should do in case of a conflict with
another transaction (victim). Possible outcomes are: aborting the
attacker, aborting the victim, or forcing the attacker to retry after
some period.
The simplest scheme (which we call timid) is to always abort the
attacker (possibly with a short back-off). This is the default scheme
in TL2 and TinySTM. More involved contention managers were
proposed in [41, 36, 16], and are provided with RSTM. They can
also be combined at run-time [15]. Polka [41] assigns every trans-
action a priority that is equal to the number of objects the trans-
action accessed so far. Whenever the attacker waits, its priority is
temporarily increased by one. If the attacker has a lower priority
than the victim, it will be forced to wait (using exponential back-
off to calculate the wait interval), otherwise the victim gets aborted.
Greedy assigns each transaction a unique, monotonically increas-
ing timestamp on its start. The transaction with the lower times-
tamp always wins. An important property of Greedy is that, un-
like other contention managers we mention, it avoids starvation of
transactions. Polka has been shown to provide best performance in
smaller-scale benchmarks previously [41], while our experiments
show that Greedy performs better in large-scale workloads (Sec-
tion 5). Serializer is very similar to Greedy except that it assigns a
new timestamp to a transaction on every restart, and thus does not
prevent starvation or even livelocks of transactions.
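The key difference between Greedy and Serializer can be made concrete. In this simplified sketch (names are illustrative), a transaction draws a unique timestamp from a global counter at its first start; Greedy keeps it across restarts, so every transaction eventually becomes the oldest and wins:

```cpp
#include <atomic>
#include <cassert>

// Global timestamp source for the Greedy policy (sketch).
std::atomic<unsigned long> greedy_clock{0};

struct GreedyTx {
    unsigned long ts;
    GreedyTx() : ts(greedy_clock.fetch_add(1)) {}  // assigned once, at first start
    void restart() { /* ts deliberately kept */ }  // Serializer would re-draw ts here
};

// True if the attacker must yield (wait/abort itself); false if it may
// abort the victim and proceed. Lower timestamp (older) always wins.
bool attacker_yields(const GreedyTx& attacker, const GreedyTx& victim) {
    return victim.ts < attacker.ts;
}
```

Because the timestamp is never re-drawn, a transaction's priority can only improve relative to newcomers; re-drawing on every restart, as Serializer does, is what loses the starvation-freedom guarantee.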
2.2 STM Benchmarks
In this section, we give an overview of the benchmarks we use
in our experiments. These represent a large spectrum of workload
types: from simple data structures with small transactions (the red-
black tree microbenchmark) to complex applications with possibly
long transactions (STMBench7). All the benchmarks we used are
implemented in C/C++.
STMBench7. STMBench7 [18] is a synthetic benchmark whose
workloads aim to represent realistic, complex, object-oriented
applications that are an important target for STMs. STMBench7
exhibits a large variety of operations (from very short, read-only
operations to very long ones that modify large parts of the data
structure) and workloads (from workloads consisting mostly of
read-only transactions to write-dominated ones). The data structure
used by STMBench7 is many orders of magnitude larger than in
other typical STM benchmarks. Also, its transactions are longer
and access larger numbers of objects.
STMBench7 is inherently object-based and its implementations
also use standard language libraries. A thin wrapper, described
in [11], is thus necessary to use STMBench7 with word-based
STMs (TL2, TinySTM, and SwissTM).
STAMP. STAMP [8] is a TM benchmarking suite that consists of
eight different transactional programs and ten workloads.³ STAMP
applications are representative of various real-world workloads, in-
cluding bioinformatics, engineering, computer graphics, and ma-
chine learning. While STAMP covers a broad range of possible
STM uses, it does not involve very long transactions, such as those
that might be produced by average, non-expert programmers or
generated automatically by a compiler along the lines of [1]. Fur-
thermore, some STAMP algorithms (e.g., bayes) split logical op-
erations into multiple transactions and use intricate programming
techniques that might not be representative of average program-
mers’ skills.

³ We used STAMP version 0.9.9.
Lee-TM. Lee-TM [4] is a benchmark that offers large, realistic
workloads and is based on Lee’s circuit routing algorithm. The al-
gorithm takes pairs of points (e.g., of an integrated circuit) as its
input and produces non-intersecting routes between them. While
transactions of Lee-TM are significant in size, they exhibit very
regular access patterns—every transaction first reads a large num-
ber of locations (searching for suitable paths) and then updates a
small number of them (setting up a path). Moreover, the bench-
mark uses very simple objects (each can be represented as a single
integer variable). It is worth noting that STAMP contains an appli-
cation (called labyrinth) that uses the same algorithm as Lee-TM.
However, Lee-TM uses real-world input sets that make it more re-
alistic than labyrinth. Lee-TM distribution includes two input data
sets: memory and main circuit boards.
Red-black tree. The prevailing way of measuring the perfor-
mance of STMs has been through microbenchmarks. The widely
used (first in [22]) red-black tree microbenchmark consists of
short transactions that insert, lookup, and remove elements from
a red-black tree data structure. Short and simple transactions of mi-
crobenchmarks are good for testing mechanics of STM itself and
comparing low-level details of various implementations.
3. SwissTM
SwissTM is a lock-based STM that uses invisible reads and counter-
based heuristics (the same as in TinySTM and TL2). It features
eager write/write and lazy read/write conflict detection, as well as
a two-phase contention manager with random linear back-off. The
API of SwissTM is word-based, as it enables transactional access
to arbitrary memory words. SwissTM uses a redo-logging scheme
(partially to support its conflict detection scheme).
3.1 Programming model
Similarly to most other STM libraries, SwissTM guarantees opac-
ity [17]. Opacity is similar to serializability in database sys-
tems [30]. The main difference is that all transactions always ob-
serve consistent states of the system. This means that transactions
cannot, e.g., use stale values, and that they do not require periodic
validation or sandboxing to prevent infinite loops or crashes due to
accesses to inconsistent memory states.
SwissTM is a weakly atomic STM, i.e., it does not provide
any guarantees for the code that accesses the same data from both
inside and outside of transactions. SwissTM is not privatization
safe [38]. This could make programming with SwissTM slightly
more difficult in certain cases, but did not affect us, as none of the
benchmarks we use requires privatization-safe STM.
When programming with SwissTM, programmers have to re-
place all memory references to shared data from inside transactions
with SwissTM calls for reading and writing memory words. The
programming model can be improved by using an STM compiler
(as in e.g. [21, 2, 14, 29]). While the compiler instrumentation can
degrade performance due to over-instrumentation [42] and possi-
bly even change the characteristics of the workload slightly (e.g.
numbers and ratio of transactional read and write operations), the
compiler instrumentation remains a largely orthogonal issue to the
performance of an STM library.
The other three STMs we compare to in our experiments provide
the same semantic guarantees as SwissTM. Also, strengthening
the guarantees (as described in Section 6) would have a similar
performance impact on all STMs we use.
3.2 Algorithm
We give the pseudo-code of SwissTM in Algorithm 1. The algo-
rithm invokes contention manager functions (cm-*), which are de-
fined in Algorithm 2 and described below. All transactions share a
global commit counter commit-ts incremented by every non-read-
only transaction upon commit. Each memory word m is mapped
to a pair of locks in a global lock table: r-lock (read) and w-lock
(write). Lock w-lock is acquired by a writer T of m (eagerly) to pre-
vent other transactions from writing to m. Lock r-lock is acquired
by T at commit time to prevent other transactions from reading
word m and, as a result, observing inconsistent states of words writ-
ten by T. In addition, when r-lock is unlocked, it contains the ver-
sion number of m. Every transaction T has a transaction descriptor
tx that contains (among other data): (1) the value of commit-ts read
at the start or subsequent validation of T , and (2) read and write
logs of T.
Transaction start. Every transaction T, upon its start, reads the
global counter commit-ts and stores its value in tx.valid-ts (line 2).
Reading. When reading location addr, transaction T first reads
the value of w-lock to detect possible read-after-write cases. If T is
the owner of w-lock, then T can return the value from its write log
immediately, which is the last value T has written to addr (line 6).
Otherwise, i.e., when some other transaction owns w-lock or when
w-lock is unlocked, T reads the value of r-lock, then the value of
addr, and then again the value of r-lock. Transaction T repeats these
three reads until (1) two values of r-lock are the same, meaning
that T has read consistent values of r-lock and addr, and (2) r-lock
is unlocked (lines 8–15). When r-lock is unlocked, it contains the
current version v of addr. If v is lower or equal to the validation
timestamp tx.valid-ts of T (which means that addr has not changed
since T’s last validation or start), T returns the value at addr read
in line 18. Otherwise, T revalidates its read set. If the revalidation
does not succeed, T rolls back (line 17). If it succeeds, the read
operation returns and T extends its validation timestamp tx.valid-ts
to the current value of commit-ts (line 56).
Writing. Whenever some transaction T writes to a memory lo-
cation addr, T first checks if T is the owner of the lock w-lock
corresponding to addr. If it is, T updates the value of addr in its
write log and returns (lines 21–23). Otherwise, T tries to acquire
w-lock by atomically replacing, using a compare-and-swap (CAS)
operation, value unlocked with the pointer to T’s write log en-
try that contains the new value of addr (line 29). If CAS does not
succeed, T asks the contention manager whether to rollback and
retry or wait for the current owner of the lock to finish (line 26). In
order to guarantee opacity, T has to revalidate its read set if the cur-
rent version of addr (contained in r-lock) is higher than its validity
timestamp tx.valid-ts (lines 31–32).
Validation. To validate itself, T compares the versions of all
memory locations read so far to their versions at the point they
were initially read by T (lines 51–52). These versions are stored in
T’s read log. If there is a mismatch between any version numbers,
the validation fails (line 52).
Commit. A read-only transaction T can commit immediately, as
its read log is guaranteed to be consistent (line 35). A transaction T
that is not read-only first locks all read locks of memory locations T
has written to (line 36). Then, T increments commit-ts (line 37) and
re-validates its read log. If the validation does not succeed, T rolls
back and restarts (lines 38–41). Upon successful validation, T tra-
verses its write set, updates values of all written memory locations,
and releases the corresponding read and write locks (lines 42–45).
When releasing read locks, T writes the new value of commit-ts to
those locks.

Algorithm 1: Pseudo-code representation of SwissTM.

 1: function start(tx)
 2:   tx.valid-ts ← commit-ts;
 3:   cm-start(tx);

 4: function read-word(tx, addr)
 5:   (r-lock, w-lock) ← map-addr-to-locks(addr);
 6:   if is-locked-by(w-lock, tx) then return get-value(w-lock, addr);
 7:   version ← read(r-lock);
 8:   while true do
 9:     if version = locked then
10:       version ← read(r-lock);
11:       continue;
12:     value ← read(addr);
13:     version2 ← read(r-lock);
14:     if version = version2 then break;
15:     version ← version2;
16:   add-to-read-log(tx, r-lock, version);
17:   if version > tx.valid-ts and not extend(tx) then rollback(tx);
18:   return value;

19: function write-word(tx, addr, value)
20:   (r-lock, w-lock) ← map-addr-to-locks(addr);
21:   if is-locked-by(w-lock, tx) then
22:     update-log-entry(w-lock, addr, value);
23:     return;
24:   while true do
25:     if is-locked(w-lock) then
26:       if cm-should-abort(tx, w-lock) then rollback(tx);
27:       else continue;
28:     log-entry ← add-to-write-log(tx, w-lock, addr, value);
29:     if compare&swap(w-lock, unlocked, log-entry) then
30:       break;
31:   if read(r-lock) > tx.valid-ts and not extend(tx) then
32:     rollback(tx);
33:   cm-on-write(tx);

34: function commit(tx)
35:   if is-read-only(tx) then return;
36:   for log-entry in tx.write-log do write(log-entry.r-lock, locked);
37:   ts ← increment&get(commit-ts);
38:   if ts > tx.valid-ts + 1 and not validate(tx) then
39:     for log-entry in tx.write-log do
40:       write(log-entry.r-lock, log-entry.version);
41:     rollback(tx);
42:   for log-entry in tx.write-log do
43:     write(log-entry.addr, log-entry.value);
44:     write(log-entry.r-lock, ts);
45:     write(log-entry.w-lock, unlocked);

46: function rollback(tx)
47:   for log-entry in tx.write-log do
48:     write(log-entry.w-lock, unlocked);
49:   cm-on-rollback(tx);

50: function validate(tx)
51:   for log-entry in tx.read-log do
52:     if log-entry.version ≠ read(log-entry.r-lock) and not is-locked-by(log-entry.r-lock, tx) then return false;
53:   return true;

54: function extend(tx)
55:   ts ← read(commit-ts);
56:   if validate(tx) then tx.valid-ts ← ts; return true;
57:   return false;
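The retry loop in read-word (Algorithm 1, lines 7–15) is essentially a versioned, seqlock-style read: the version lock is read before and after the data word, and the read is retried until the location is unlocked and the two version reads agree, so the returned value and its version form a consistent pair. A minimal C++ sketch of just this loop, with all names and types invented for illustration (this is not SwissTM's actual code):

```cpp
#include <atomic>
#include <cstdint>
#include <utility>

// Sentinel stored in the version lock while a writer is committing.
constexpr uintptr_t LOCKED = ~uintptr_t{0};

struct VersionedWord {
    std::atomic<uintptr_t> r_lock{0}; // version number, or LOCKED
    std::atomic<uintptr_t> value{0};  // the memory word itself
};

// Returns the value together with the version it was read under,
// mirroring lines 7-15 of read-word.
std::pair<uintptr_t, uintptr_t> consistent_read(const VersionedWord& w) {
    uintptr_t version = w.r_lock.load(std::memory_order_acquire);
    while (true) {
        if (version == LOCKED) {                      // a writer is committing: spin
            version = w.r_lock.load(std::memory_order_acquire);
            continue;
        }
        uintptr_t value = w.value.load(std::memory_order_acquire);
        uintptr_t version2 = w.r_lock.load(std::memory_order_acquire);
        if (version == version2)                      // unchanged: pair is consistent
            return {value, version};
        version = version2;                           // changed under us: retry
    }
}
```

In the full algorithm, the version returned here is what gets appended to the read log and later compared during validation.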
Algorithm 2: Pseudo-code of the two-phase contention manager (Wn is a constant).

 1: function cm-start(tx)
 2:   if not-restart(tx) then tx.cm-ts ← ∞;

 3: function cm-on-write(tx)
 4:   if tx.cm-ts = ∞ and size(tx.write-log) = Wn then tx.cm-ts ← increment&get(greedy-ts);

 5: function cm-should-abort(tx, w-lock)
 6:   if tx.cm-ts = ∞ then return true;
 7:   lock-owner ← owner(w-lock);
 8:   if lock-owner.cm-ts < tx.cm-ts then return true;
 9:   else abort(lock-owner); return false;

10: function cm-on-rollback(tx)
11:   wait-random(tx.succ-abort-count);
Rollback. On rollback, transaction T releases all write locks it
holds (lines 47–48), and then restarts itself.
Contention management. We give the pseudo-code of our two-
phase contention manager in Algorithm 2. The contention manager
gets invoked by Algorithm 1 (1) at transaction start (cm-start in
line 3), (2) on a write/write conflict (cm-should-abort in line 26),
(3) after a successful write (cm-on-write in line 33), and (4) after a
restart (cm-on-rollback in line 49). Every transaction, upon executing
its Wn-th write (where we set Wn to 10), increments the global
counter greedy-ts and stores its value in tx.cm-ts (line 4). Hence,
short transactions (those that execute fewer than Wn writes) do not
access greedy-ts, which would otherwise become a memory hot spot;
this reduces contention and the number of cache misses. Transactions
that have already incremented greedy-ts are in the second
phase of the contention management scheme, and others are in the
first phase. Upon a conflict, a transaction that is still in the first
phase gets restarted immediately (line 6). If both conflicting transactions
are already in the second phase, the transaction with the
higher value of cm-ts is restarted (lines 8–9). This prioritizes transactions
that have performed more work. Conceptually, transactions
in the first phase have an infinite value of cm-ts (set in line 2). This
means that (longer) transactions in the second phase have higher
priority than (short) transactions in the first phase. After restarting,
transactions are delayed using a randomized back-off scheme
(line 11). This reduces the probability of a transaction being
aborted many times repeatedly because of the same conflict.
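The two-phase policy above can be sketched in a few lines of C++. This is an illustrative model only, not SwissTM's actual interfaces: cm_ts, greedy_ts, and the helper names are assumptions, and the real implementation aborts the lock owner asynchronously rather than via a direct call.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <limits>

// "Infinity" marks a transaction still in the first phase.
constexpr uint64_t INF = std::numeric_limits<uint64_t>::max();
constexpr std::size_t WN = 10; // phase-switch threshold (the paper sets Wn = 10)

std::atomic<uint64_t> greedy_ts{0}; // global counter, touched only by long transactions

struct Tx {
    uint64_t cm_ts = INF;   // greedy timestamp; INF while in the first phase
    std::size_t writes = 0; // number of writes performed so far
};

// Called after each successful write: on the Wn-th write, enter the second
// phase by acquiring a greedy timestamp.
void cm_on_write(Tx& tx) {
    if (++tx.writes == WN && tx.cm_ts == INF)
        tx.cm_ts = ++greedy_ts;
}

// Called on a write/write conflict.
// true  -> tx restarts itself; false -> the lock owner is aborted instead.
bool cm_should_abort(const Tx& tx, const Tx& lock_owner) {
    if (tx.cm_ts == INF) return true;             // first phase: back off immediately
    if (lock_owner.cm_ts < tx.cm_ts) return true; // owner started its 2nd phase earlier: it wins
    return false;                                 // tx has priority: owner gets aborted
}
```

A lower cm_ts means the transaction entered the second phase earlier and has presumably done more work, so it wins conflicts against younger or first-phase transactions.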
3.3 Implementation Highlights
We implemented SwissTM in C++ (g++ 4.0.1 compiler). We used
the (fairly portable) atomic_ops library [5] to implement atomic
operations. Currently, SwissTM works on 32-bit x86 Linux
2.6.x and OS X 10.5 platforms (a 64-bit port is in progress).
Lock table. To map memory word m to a lock table entry, we
take the address a of m and shift it to the right by 4 (it would be 5
with 64-bit words). This makes four consecutive memory words map
to the same lock (we empirically selected this value, as explained in
Section 5). Then, we set all high-order bits to zero. As the lock
table contains 2^22 entries in our implementation, we simply perform
a logical AND between the shifted address and 2^22 - 1 to get
the index into the table. Figure 1 depicts the mapping scheme.
Having multiple consecutive memory words mapped to the same
lock table entry can result in false conflicts, where unrelated memory
words get locked together, but this does not cause any problems in
practice.
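The mapping above amounts to a shift and a mask. A small C++ sketch under the paper's 32-bit parameters (names are ours, not SwissTM's):

```cpp
#include <cstdint>

// Address-to-lock-table index: drop the low 4 bits so that four consecutive
// 4-byte words share one entry, then mask to the 2^22-entry table size.
constexpr uintptr_t SHIFT = 4;             // would be 5 with 64-bit words
constexpr uintptr_t TABLE_SIZE = 1u << 22; // 2^22 lock-table entries

uintptr_t lock_index(uintptr_t addr) {
    return (addr >> SHIFT) & (TABLE_SIZE - 1); // AND with 2^22 - 1
}
```

For example, addresses 0x1000 through 0x100F (one 16-byte group) all map to the same entry, while 0x1010 maps to the next one; two addresses 2^26 bytes apart also collide, which is the source of the false conflicts mentioned above.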

Citations
Proceedings ArticleDOI

NOrec: streamlining STM by abolishing ownership records

TL;DR: An ownership-record-free software transactional memory (STM) system that combines extremely low overhead with unusually clean semantics is presented, and the experience suggests that NOrec may be an ideal candidate for such a software system.
Book

Transactional Memory, 2nd Edition

TL;DR: This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early spring 2010.
Proceedings ArticleDOI

No compromises: distributed transactions with consistency, availability, and performance

TL;DR: It is shown that a main memory distributed computing platform called FaRM can provide distributed transactions with strict serializability, high performance, durability, and high availability in modern data centers.
Proceedings ArticleDOI

Transactional Memory Architecture and Implementation for IBM System Z

TL;DR: The implementation in the IBM zEnterprise EC12 (zEC12) microprocessor generation, focusing on how transactional memory can be embedded into the existing cache design and multiprocessor shared-memory infrastructure, is described.
Proceedings ArticleDOI

Stretching transactional memory

TL;DR: SwissTM is lock- and word-based and uses a new two-phase contention manager that ensures the progress of long transactions while inducing no overhead on short ones, and outperforms state-of-the-art STM implementations, namely RSTM, TL2, and TinySTM.
References
Book

Transaction Processing: Concepts and Techniques

Jim Gray, +1 more
TL;DR: Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk.
Journal ArticleDOI

Linearizability: a correctness condition for concurrent objects

TL;DR: This paper defines linearizability, compares it to other correctness conditions, presents and demonstrates a method for proving the correctness of implementations, and shows how to reason about concurrent objects, given they are linearizable.
Proceedings ArticleDOI

Transactional memory: architectural support for lock-free data structures

TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Proceedings ArticleDOI

Software transactional memory

TL;DR: STM is used to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction, a novel software method for supporting flexible transactional programming of synchronization operations.
Proceedings ArticleDOI

A critique of ANSI SQL isolation levels

TL;DR: It is shown that these phenomena and the ANSI SQL definitions fail to properly characterize several popular isolation levels, including the standard locking implementations of the levels covered, and new phenomena that better characterize isolation types are introduced.
Frequently Asked Questions (14)
Q1. What have the authors contributed in "Stretching transactional memory" ?

The authors revisit the main STM design choices from the perspective of complex workloads and propose a new STM, which they call SwissTM. Beyond SwissTM, the authors present the most complete evaluation to date of the individual impact of various STM design choices on the ability to support the mixed workloads of large applications. 

Further experiments might be needed in this direction. Two main directions along which the authors plan to improve the semantical guarantees of SwissTM are: (1) adding compiler support, and (2) making SwissTM privatization-safe. There exist a number of STM C/C++ compilers that have open interfaces supporting different STM libraries (e.g., [14, 29]) and the authors plan to integrate SwissTM with one of them. While this algorithm is simple, it would probably significantly impact the performance of SwissTM [42], and the authors plan to investigate other options, possibly using techniques similar to [28] or [25].

A possible target of TMs are large applications such as business software or video games: the size of these applications make them ideal candidates to benefit from emerging multi-core architectures. 

The motivation of this work is to explore the ability of software mechanisms to effectively support mixed workloads consisting of small and large transactions, as well as possibly complex data structures. 

It might seem beneficial to make transactions restart as soon as possible after conflicts that force them to rollback, as waiting just decreases the reaction time before the transaction re-executes. 

When T is invisible, T has the sole responsibility of detecting conflicts on x with transactions that write x concurrently, i.e., validating its read set. 

SwissTM significantly outperforms all other STMs for both read-dominated and read-write workloads, while also achieving superior scalability. 

To map memory word m to a lock table entry, the authors take the address a of m, shift it to the right by 4 (it would be 5 with 64-bit words). 

To summarize, the main contributions of this paper are (1) the design and implementation of an STM that performs particularly well with large-scale complex transactional workloads while having good performance in small-scale ones, and (2) an extensive experimental evaluation of STM strategies and implementations from the perspective of complex applications with mixed workloads. 

Short and simple transactions of microbenchmarks are good for testing mechanics of STM itself and comparing low-level details of various implementations. 

So far, however, most STM experiments have been performed using benchmarks characterized by small transactions, simple and uniform data structures, or regular data access patterns. 

It is interesting to note here that, while using different lock granularities does impact performance, the impact of coarser lock granularities is not significant enough to prevent SwissTM from scaling (e.g., due to an increased number of false conflicts).

Because of this, lazy conflict detection STMs react too slowly to write/write conflicts (which are good signs that transactions cannot proceed in parallel) and results in transactions performing work that has to be rolled back later.