Journal Article

Transactional locking II

TL;DR: This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique; TL2 is ten-fold faster than a single lock.
Abstract: The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique. TL2 improves on state-of-the-art STMs in the following ways: (1) unlike all other STMs, it fits seamlessly with any system's memory life-cycle, including those using malloc/free; (2) unlike all other lock-based STMs, it efficiently avoids periods of unsafe execution, that is, using its novel version-clock validation, user code is guaranteed to operate only on consistent memory states; and (3) in a sequence of high performance benchmarks, while providing these new properties, it delivered overall performance comparable to (and in many cases better than) that of all former STM algorithms, both lock-based and non-blocking. Perhaps more importantly, on various benchmarks, TL2 delivers performance that is competitive with the best hand-crafted fine-grained concurrent structures. Specifically, it is ten-fold faster than a single lock. We believe these characteristics make TL2 a viable candidate for deployment of transactional memory today, long before hardware transactional support is available.
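
To make the mechanism concrete, here is a minimal sketch of a TL2-style read and commit path: per-location versioned locks, a read-version (rv) sampled from the global clock at transaction start, post-validation of every read, and commit-time locking that releases each lock stamped with a new clock value. This is an illustration under simplifying assumptions, not the authors' implementation; names such as Slot and Tx are ours, and TL2's lock striping and its rv+1 read-set-validation fast path are omitted.

```cpp
// tl2_sketch.cpp -- illustrative TL2-style STM core (hypothetical names).
// Build: g++ -std=c++17 -pthread tl2_sketch.cpp
#include <atomic>
#include <cstdint>
#include <unordered_map>
#include <vector>

static std::atomic<uint64_t> global_clock{0};    // global version clock

struct Slot {                                    // one transactional word
    std::atomic<uint64_t> lock{0};               // bit 0 = locked; bits 63..1 = version
    std::atomic<uint64_t> value{0};
};

struct Tx {
    uint64_t rv = 0;                             // read-version, sampled at begin
    std::unordered_map<Slot*, uint64_t> writes;  // redo log (the write-set)
    std::vector<Slot*> reads;                    // the read-set

    void begin() { rv = global_clock.load(); writes.clear(); reads.clear(); }

    void write(Slot& s, uint64_t v) { writes[&s] = v; }   // buffered until commit

    // Returns false to signal "abort and retry": the caller never observes
    // an inconsistent value, which is the point of version-clock validation.
    bool read(Slot& s, uint64_t& out) {
        auto it = writes.find(&s);               // read-after-write: use redo log
        if (it != writes.end()) { out = it->second; return true; }
        uint64_t v1 = s.lock.load();
        out = s.value.load();
        uint64_t v2 = s.lock.load();
        if ((v1 & 1) || v1 != v2 || (v1 >> 1) > rv) return false;
        reads.push_back(&s);
        return true;
    }

    bool commit() {
        std::vector<Slot*> locked;               // for rollback of acquired locks
        for (auto& [s, v] : writes) {            // 1. lock the write-set
            uint64_t w = s->lock.load();
            if ((w & 1) || !s->lock.compare_exchange_strong(w, w | 1)) {
                for (Slot* l : locked) l->lock.fetch_and(~uint64_t(1));
                return false;
            }
            locked.push_back(s);
        }
        uint64_t wv = global_clock.fetch_add(1) + 1;   // 2. write-version
        for (Slot* s : reads) {                  // 3. validate the read-set
            uint64_t w = s->lock.load();
            bool locked_by_other = (w & 1) && !writes.count(s);
            if (locked_by_other || (w >> 1) > rv) {
                for (Slot* l : locked) l->lock.fetch_and(~uint64_t(1));
                return false;
            }
        }
        for (auto& [s, v] : writes) {            // 4. write back, then release
            s->value.store(v);                   //    each lock stamped with wv
            s->lock.store(wv << 1);
        }
        return true;
    }
};
```

The post-validation in read() is what keeps user code on consistent memory states; the commit sequence (lock, increment clock, validate, write back) makes the whole transaction appear to take effect at the clock increment.
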
Citations
Book
Maurice Herlihy
14 Mar 2008
TL;DR: Transactional memory as discussed by the authors is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Abstract: Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed. As a result, system designers and software engineers can no longer rely on increasing clock speed to hide software bloat. Instead, they must somehow learn to make effective use of increasing parallelism. This adaptation will not be easy. Conventional synchronization techniques based on locks and conditions are unlikely to be effective in such a demanding environment. Coarse-grained locks, which protect relatively large amounts of data, do not scale, and fine-grained locks introduce substantial software engineering problems. Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions. This synchronization model promises to alleviate many (not all) of the problems associated with locking, and there is a growing community of researchers working on both software and hardware support for this approach. This talk will survey the area, with a focus on open research problems.

1,268 citations

Proceedings ArticleDOI
30 Sep 2008
TL;DR: This paper introduces the Stanford Transactional Applications for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems and uses the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.
Abstract: Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Applications for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.

934 citations


Cites background from "Transactional locking II"

  • ...The largest read and write set sizes are found in labyrinth+, with values of 783 and 779 cache lines, respectively (24.5 KB and 24.3 KB, respectively)....

  • ...TM microbenchmarks are typically composed of transactions that execute a few operations on a data structure like a hash table or red-black tree [12, 13, 25, 27, 34]....

  • ...To benefit from the increasing number of cores per chip, application developers have to develop parallel programs and deal with cumbersome issues such as synchronization tradeoffs, deadlock avoidance, and races....

Proceedings ArticleDOI
20 Oct 2006
TL;DR: Using a simulated multiprocessor with HTM support, the viability of the HyTM approach is demonstrated: it can provide performance and scalability approaching that of an unbounded HTM implementation, without the need to support all transactions with complicated HTM support.
Abstract: Transactional memory (TM) promises to substantially reduce the difficulty of writing correct, efficient, and scalable concurrent programs. But "bounded" and "best-effort" hardware TM proposals impose unreasonable constraints on programmers, while more flexible software TM implementations are considered too slow. Proposals for supporting "unbounded" transactions in hardware entail significantly higher complexity and risk than best-effort designs. We introduce Hybrid Transactional Memory (HyTM), an approach to implementing TM in software so that it can use best-effort hardware TM (HTM) to boost performance but does not depend on HTM. Thus programmers can develop and test transactional programs in existing systems today, and can enjoy the performance benefits of HTM support when it becomes available. We describe our prototype HyTM system, comprising a compiler and a library. The compiler allows a transaction to be attempted using best-effort HTM, and retried using the software library if it fails. We have used our prototype to "transactify" part of the Berkeley DB system, as well as several benchmarks. By disabling the optional use of HTM, we can run all of these tests on existing systems. Furthermore, by using a simulated multiprocessor with HTM support, we demonstrate the viability of the HyTM approach: it can provide performance and scalability approaching that of an unbounded HTM implementation, without the need to support all transactions with complicated HTM support.
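
The retry structure the abstract describes (attempt in best-effort HTM, fall back to the software library) can be sketched with today's hardware. Everything below is an assumption-laden illustration, not the paper's prototype: Intel RTM intrinsics stand in for best-effort HTM, a single global lock stands in for the STM library, and stm_tx_count, run_in_stm, and hytm_execute are invented names. The essential move matches the paper: the hardware path is augmented with a check that aborts it if a software transaction is in flight.

```cpp
// hytm_sketch.cpp -- illustrative hybrid-TM retry loop (not the paper's code).
// Build: g++ -std=c++17 -mrtm -pthread hytm_sketch.cpp
#include <immintrin.h>   // _xbegin/_xend/_xabort (Intel RTM)
#include <atomic>
#include <mutex>

static std::atomic<int> stm_tx_count{0};  // active software transactions
static std::mutex stm_lock;               // stand-in for a real STM fallback

template <class Body>
void run_in_stm(Body body) {
    stm_tx_count.fetch_add(1);            // make ourselves visible to HTM paths
    {
        std::lock_guard<std::mutex> g(stm_lock);
        body();
    }
    stm_tx_count.fetch_sub(1);
}

template <class Body>
void hytm_execute(Body body, int htm_attempts = 3) {
    for (int i = 0; i < htm_attempts; ++i) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Reading the counter puts it in this hardware transaction's read
            // set, so a software transaction starting later aborts us rather
            // than racing with us -- the conflict check the paper calls for.
            if (stm_tx_count.load(std::memory_order_relaxed) != 0)
                _xabort(0x01);
            body();
            _xend();                      // hardware commit
            return;
        }
        // Hardware abort (capacity, conflict, or explicit): retry, then...
    }
    run_in_stm(body);                     // ...fall back to the software path
}
```
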

525 citations


Cites background from "Transactional locking II"

  • ...Kumar et al. [12] recently proposed using HTM to optimize the Dynamic Software Transactional Memory (DSTM) of Herlihy et al....

  • ...In this section, we report on our simulation studies, in which we used the simple rand-array microbenchmark to evaluate HyTM's ability to mix HTM and STM transactions, and to compare its performance in various configurations against standard lock-based approaches....

  • ...The key idea to achieving correct interaction between software transactions (i.e., those executed using the STM library) and hardware transactions (i.e., those executed using HTM support) is to augment hardware transactions with additional code that ensures that the transaction does not commit if it conflicts with an ongoing software transaction....

  • ...Apart from boosting the performance of HyTM, best-effort HTM also supports a number of other useful purposes, such as selectively eliding locks, optimizing nonblocking data structures, and optimizing the Dynamic Software Transactional Memory (DSTM) system of Herlihy et al....

  • ...Independently, Harris and Fraser also developed an STM that uses a table of ownership records [8]....

Proceedings ArticleDOI
03 Nov 2013
TL;DR: A commit protocol based on optimistic concurrency control that provides serializability while avoiding all shared-memory writes for records that were only read; with it, Silo achieves excellent performance and scalability on modern multicore machines.
Abstract: Silo is a new in-memory database that achieves excellent performance and scalability on modern multicore machines. Silo was designed from the ground up to use system memory and caches efficiently. For instance, it avoids all centralized contention points, including that of centralized transaction ID assignment. Silo's key contribution is a commit protocol based on optimistic concurrency control that provides serializability while avoiding all shared-memory writes for records that were only read. Though this might seem to complicate the enforcement of a serial order, correct logging and recovery are provided by linking periodically-updated epochs with the commit protocol. Silo provides the same guarantees as any serializable database without unnecessary scalability bottlenecks or much additional latency. Silo achieves almost 700,000 transactions per second on a standard TPC-C workload mix on a 32-core machine, as well as near-linear scalability. Considered per core, this is several times higher than previously reported results.
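
The shape of that commit protocol can be sketched briefly. The code below is a simplification under our own assumptions (a single record type, a naive TID choice), not Silo's implementation; what it preserves is the property the abstract highlights: validation reads the TIDs of read-only records but never writes to them.

```cpp
// silo_occ_sketch.cpp -- illustrative Silo-style OCC commit (hypothetical names).
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <map>
#include <vector>

static std::atomic<uint64_t> global_epoch{1};  // advanced periodically, off the
                                               // critical path of every commit
struct Record {
    std::atomic<uint64_t> tid{0};              // bit 0 = lock; rest = epoch/seq
    uint64_t value = 0;
};

struct ReadEntry { Record* rec; uint64_t observed_tid; };

bool commit(const std::vector<ReadEntry>& read_set,
            std::map<Record*, uint64_t>& write_set) {  // sorted: avoids deadlock
    for (auto& [r, v] : write_set) {           // phase 1: lock the write-set
        uint64_t t;
        do { t = r->tid.load() & ~1ull; }      // expect "unlocked"...
        while (!r->tid.compare_exchange_weak(t, t | 1));  // ...then set the bit
    }
    uint64_t epoch = global_epoch.load();      // groups commits for recovery
    for (const auto& e : read_set) {           // phase 2: validate reads only
        uint64_t t = e.rec->tid.load();        // a load -- no shared write here
        bool locked_by_other = (t & 1) && !write_set.count(e.rec);
        if (locked_by_other || (t & ~1ull) != (e.observed_tid & ~1ull)) {
            for (auto& [r, v] : write_set) r->tid.fetch_and(~1ull);  // unlock
            return false;                      // abort: something changed
        }
    }
    uint64_t new_tid = epoch << 32;            // phase 3: pick a TID above all
    for (const auto& e : read_set)             // observed TIDs (simplified here)
        new_tid = std::max(new_tid, e.observed_tid & ~1ull);
    for (auto& [r, v] : write_set)
        new_tid = std::max(new_tid, r->tid.load() & ~1ull);
    new_tid += 2;                              // keep the lock bit clear
    for (auto& [r, v] : write_set) {
        r->value = v;
        r->tid.store(new_tid);                 // publish and unlock in one store
    }
    return true;
}
```
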

509 citations


Cites result from "Transactional locking II"

  • ...Recent STM systems, including TL2 [9], are based on optimistic concurrency control and maintain read and write sets similar to those in OCC databases....

Proceedings ArticleDOI
20 Feb 2008
TL;DR: Opacity is defined as a property of concurrent transaction histories and its graph theoretical interpretation is given, and it is proved that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, Ω(k) steps for an operation to terminate.
Abstract: Transactional memory (TM) is perceived as an appealing alternative to critical sections for general purpose concurrent programming. Despite the large amount of recent work on TM implementations, however, very little effort has been devoted to precisely defining what guarantees these implementations should provide. A formal description of such guarantees is necessary in order to check the correctness of TM systems, as well as to establish TM optimality results and inherent trade-offs. This paper presents opacity, a candidate correctness criterion for TM implementations. We define opacity as a property of concurrent transaction histories and give its graph theoretical interpretation. Opacity captures precisely the correctness requirements that have been intuitively described by many TM designers. Most TM systems we know of do ensure opacity. At a very first approximation, opacity can be viewed as an extension of the classical database serializability property with the additional requirement that even non-committed transactions are prevented from accessing inconsistent states. Capturing this requirement precisely, in the context of general objects, and without precluding pragmatic strategies that are often used by modern TM implementations, such as versioning, invisible reads, lazy updates, and open nesting, is not trivial. As a use case of opacity, we prove the first lower bound on the complexity of TM implementations. Basically, we show that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, Ω(k) steps for an operation to terminate, where k is the total number of objects shared by transactions. This (tight) bound precisely captures an inherent trade-off in the design of TM systems. The bound also highlights a fundamental gap between systems in which transactions can be fully isolated from the outside environment, e.g., databases or certain specialized transactional languages, and systems that lack such isolation capabilities, e.g., general TM frameworks.
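
A two-transaction example makes the criterion tangible. The snippet below is illustrative pseudo-C++ (the atomicity is supplied by some TM, indicated in comments): under serializability alone, the doomed reader may crash before the system ever aborts it; opacity forbids it from seeing the mixed snapshot at all.

```cpp
// Shared state maintained by some TM; assumed invariant: x != y.
int x = 1, y = 2;

void writer() {   // executed as one transaction
    x = 3;        // preserves the invariant...
    y = 1;        // ...once both writes commit together
}

void reader() {   // executed as one transaction, but destined to abort
    int a = x;    // sees x == 1 (before writer's commit)
    int b = y;    // sees y == 1 (after writer's commit): a == b!
    int z = 1 / (a - b);   // divide-by-zero inside a "zombie" transaction
    (void)z;
}
// An opaque TM never lets reader() observe {a == 1, b == 1}, even though the
// history is serializable once reader() is aborted. TL2's version-clock
// validation provides exactly this guarantee.
```
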

500 citations


Cites background from "Transactional locking II"

  • ...These strategies are usually implemented using the single-writer multiple-readers pattern, with either explicit locks (e.g., TL2) or “virtual”, revocable ones (e.g., obstruction-free TMs, such as DSTM, ASTM and SXM), sometimes with a multi-versioning scheme (e.g., LSA-STM and JVSTM) or specialized optimization strategies....

  • ...This is confirmed by the already mentioned counterexample TM implementations that have the time complexity either constant or at least independent of k (e.g., RSTM, JVSTM, TL2, etc.)....

  • ...Most transactional memory systems we know of ensure opacity, including DSTM [14], ASTM [18], SXM [13], JVSTM [5], TL2 [6], LSA-STM [25] and RSTM [19]....

  • ...However, it is commonly argued that sandboxing is expensive and applicable only to specific run-time environments [6]....

  • ...It is worth noting that TL2 has a constant time complexity, although it ensures opacity, uses invisible reads, and is single-version....

References
Journal ArticleDOI
TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Abstract: In this paper trade-offs among certain computational factors in hash coding are analyzed. The paradigm problem considered is that of testing a series of messages one-by-one for membership in a given set of messages. Two new hash-coding methods are examined and compared with a particular conventional hash-coding method. The computational factors considered are the size of the hash area (space), the time required to identify a message as a nonmember of the given set (reject time), and an allowable error frequency. The new methods are intended to reduce the amount of space required to contain the hash-coded information from that associated with conventional methods. The reduction in space is accomplished by exploiting the possibility that a small fraction of errors of commission may be tolerable in some applications, in particular, applications in which a large amount of data is involved and a core resident hash area is consequently not feasible using conventional methods. In such applications, it is envisaged that overall performance could be improved by using a smaller core resident hash area in conjunction with the new methods and, when necessary, by using some secondary and perhaps time-consuming test to “catch” the small fraction of errors associated with the new methods. An example is discussed which illustrates possible areas of application for the new methods. Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
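
The scheme is compact in code. Below is a minimal sketch with parameters and a double-hashing trick chosen by us, not taken from the paper: K probes into an M-bit array give possible false positives but no false negatives, which is exactly the space/error trade-off being analyzed.

```cpp
// bloom_sketch.cpp -- minimal Bloom filter (illustrative parameters).
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

template <std::size_t M = 1 << 16, int K = 4>
class BloomFilter {
    std::bitset<M> bits_;
    static std::size_t probe(const std::string& key, int i) {
        // Double hashing (h1 + i*h2) simulates K independent hash functions.
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = (h1 * 0x9e3779b97f4a7c15ull) | 1;  // cheap odd second hash
        return (h1 + static_cast<std::size_t>(i) * h2) % M;
    }
public:
    void insert(const std::string& key) {
        for (int i = 0; i < K; ++i) bits_.set(probe(key, i));
    }
    bool maybe_contains(const std::string& key) const {
        for (int i = 0; i < K; ++i)
            if (!bits_.test(probe(key, i))) return false;  // definitely absent
        return true;                                       // present, probably
    }
};
```

The citing passage below is this structure in action: TL2's transactional load asks the filter first, a negative answer proves the address is not in the write-set, and a rare false positive only costs a redundant write-set lookup.
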

7,390 citations


"Transactional locking II" refers methods in this paper

  • ...The transactional load first checks (using a Bloom filter [24]) to see if the load address already appears in the write-set....

Proceedings ArticleDOI
01 May 1993
TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Abstract: A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object. In highly concurrent systems, lock-free data structures avoid common problems associated with conventional locking techniques, including priority inversion, convoying, and difficulty of avoiding deadlock. This paper introduces transactional memory, a new multiprocessor architecture intended to make lock-free synchronization as efficient (and easy to use) as conventional techniques based on mutual exclusion. Transactional memory allows programmers to define customized read-modify-write operations that apply to multiple, independently-chosen words of memory. It is implemented by straightforward extensions to any multiprocessor cache-coherence protocol. Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.

2,406 citations


"Transactional locking II" refers background in this paper

  • ...The transactional memory programming paradigm of Herlihy and Moss [1] is gaining momentum as the approach of choice for replacing locks in concurrent programming....

Proceedings ArticleDOI
13 Jul 2003
TL;DR: This paper proposes a new form of software transactional memory designed to support dynamic-sized data structures, together with a novel non-blocking implementation of this STM that uses modular contention managers to ensure progress in practice.
Abstract: We propose a new form of software transactional memory (STM) designed to support dynamic-sized data structures, and we describe a novel non-blocking implementation. The non-blocking property we consider is obstruction-freedom. Obstruction-freedom is weaker than lock-freedom; as a result, it admits substantially simpler and more efficient implementations. A novel feature of our obstruction-free STM implementation is its use of modular contention managers to ensure progress in practice. We illustrate the utility of our dynamic STM with a straightforward implementation of an obstruction-free red-black tree, thereby demonstrating a sophisticated non-blocking dynamic data structure that would be difficult to implement by other means. We also present the results of simple preliminary performance experiments that demonstrate that an "early release" feature of our STM is useful for reducing contention, and that our STM lends itself to the effective use of modular contention managers.
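
The modular-contention-manager idea reduces to a small pluggable interface: when a transaction finds an object held by another, a policy object decides whether to wait or to abort the holder. The sketch below is modeled on the paper's description, with invented names; DSTM's actual managers (e.g., the Polite backoff manager) are richer.

```cpp
// contention_manager_sketch.cpp -- illustrative DSTM-style policy interface.
#include <chrono>
#include <thread>

struct ContentionManager {
    virtual ~ContentionManager() = default;
    // Called when we collide with a transaction that owns the object we want.
    // Return true to abort that owner, false to back off and retry ourselves.
    virtual bool should_abort_owner(int my_retries) = 0;
};

// A Polite-style policy: exponential backoff, then get aggressive. Eventually
// aborting the owner is what keeps the system making progress in practice
// under obstruction-freedom.
struct BackoffManager : ContentionManager {
    bool should_abort_owner(int my_retries) override {
        if (my_retries < 8) {
            std::this_thread::sleep_for(
                std::chrono::microseconds(1u << my_retries));
            return false;    // give the owner time to finish
        }
        return true;         // enough patience: claim the object
    }
};
```
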

1,068 citations


"Transactional locking II" refers methods in this paper

  • ...We present here a set of microbenchmarks that have become standard in the community [25], comparing a sequential red-black tree made concurrent using various algorithms representing state-of-the-art non-blocking [6] and lock-based [5, 18] STMs....

Journal ArticleDOI
TL;DR: STM is used to provide a general, highly concurrent method for translating sequential object implementations to non-blocking ones, based on implementing a k-word compare&swap STM-transaction; it outperforms Herlihy’s translation method for sufficiently large numbers of processors.
Abstract: As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load–Linked/Store–Conditional operation on a single word. Building on the hardware based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a Load–Linked/Store–Conditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures shows that our method always outperforms the non-blocking translation methods in the style of Barnes, and outperforms Herlihy’s translation method for sufficiently large numbers of processors. The key to the efficiency of our software-transactional approach is that unlike Barnes style methods, it is not based on a costly “recursive helping” policy.
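
The k-word compare&swap at the core of the method is simple to state as a transaction. In this sketch tm_atomic is a hypothetical stand-in realized with a global lock; the paper instead builds a non-blocking STM from Load-Linked/Store-Conditional, so treat this only as a statement of the primitive's semantics.

```cpp
// kcas_sketch.cpp -- k-word compare&swap expressed as a transaction.
#include <cstddef>
#include <mutex>

static std::mutex tm_mutex;  // stand-in for a real (non-blocking) STM
template <class Fn>
void tm_atomic(Fn fn) { std::lock_guard<std::mutex> g(tm_mutex); fn(); }

// Atomically: if every addr[i] still holds expected[i], store all desired[i];
// otherwise change nothing. Returns whether the swap happened.
bool k_word_cas(long* addr[], const long expected[], const long desired[],
                std::size_t k) {
    bool swapped = false;
    tm_atomic([&] {
        for (std::size_t i = 0; i < k; ++i)
            if (*addr[i] != expected[i]) return;  // any mismatch: do nothing
        for (std::size_t i = 0; i < k; ++i) *addr[i] = desired[i];
        swapped = true;
    });
    return swapped;
}
```
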

880 citations


"Transactional locking II" refers methods in this paper

  • ...STM design has come a long way since the first STM algorithm by Shavit and Touitou [12], which provided a non-blocking implementation of static transactions (see [5–8, 15, 9–13])....

Journal ArticleDOI
02 Mar 2004
TL;DR: To explore the costs and benefits of TCC, the characteristics of an optimal transaction-based memory system are studied, and the effect of different design parameters on the performance of real systems is examined.
Abstract: In this paper, we propose a new shared memory model: Transactional memory Coherence and Consistency (TCC). TCC provides a model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference consistency. TCC greatly simplifies parallel software by eliminating the need for synchronization using conventional locks and semaphores, along with their complexities. TCC hardware must combine all writes from each transaction region in a program into a single packet and broadcast this packet to the permanent shared memory state atomically as a large block. This simplifies the coherence hardware because it reduces the need for small, low-latency messages and completely eliminates the need for conventional snoopy cache coherence protocols, as multiple speculatively written versions of a cache line may safely coexist within the system. Meanwhile, automatic, hardware-controlled rollback of speculative transactions resolves any correctness violations that may occur when several processors attempt to read and write the same data simultaneously. The cost of this simplified scheme is higher interprocessor bandwidth. To explore the costs and benefits of TCC, we study the characteristics of an optimal transaction-based memory system, and examine how different design parameters could affect the performance of real systems. Across a spectrum of applications, the TCC model itself did not limit available parallelism. Most applications are easily divided into transactions requiring only small write buffers, on the order of 4-8 KB. The broadcast requirements of TCC are high, but are well within the capabilities of CMPs and small-scale SMPs with high-speed interconnects.

771 citations


"Transactional locking II" refers background in this paper

  • ...Providing large scale transactions in hardware tends to introduce large degrees of complexity into the design [1–4]....

  • ...There are currently proposals for hardware implementations of transactional memory (HTM) [1–4], purely software based ones, i....
