Journal Article

Transactional locking II

TL;DR: This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique; TL2 is ten-fold faster than a single lock.
Abstract: The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique. TL2 improves on state-of-the-art STMs in the following ways: (1) unlike all other STMs, it fits seamlessly with any system's memory life-cycle, including those using malloc/free; (2) unlike all other lock-based STMs, it efficiently avoids periods of unsafe execution, that is, using its novel version-clock validation, user code is guaranteed to operate only on consistent memory states; and (3) in a sequence of high performance benchmarks, while providing these new properties, it delivered overall performance comparable to (and in many cases better than) that of all former STM algorithms, both lock-based and non-blocking. Perhaps more importantly, on various benchmarks, TL2 delivers performance that is competitive with the best hand-crafted fine-grained concurrent structures. Specifically, it is ten-fold faster than a single lock. We believe these characteristics make TL2 a viable candidate for deployment of transactional memory today, long before hardware transactional support is available.
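
To make the mechanism concrete, here is a minimal sketch of a TL2-style read and commit path: per-location versioned locks, a read-version (rv) sampled from the global clock at transaction start, post-validation of every read, and commit-time locking that releases each lock stamped with a new clock value. This is an illustration under simplifying assumptions, not the authors' implementation; names such as Slot and Tx are ours, and TL2's lock striping and its rv+1 read-set-validation fast path are omitted.

```cpp
// tl2_sketch.cpp -- illustrative TL2-style STM core (hypothetical names).
// Build: g++ -std=c++17 -pthread tl2_sketch.cpp
#include <atomic>
#include <cstdint>
#include <unordered_map>
#include <vector>

static std::atomic<uint64_t> global_clock{0};    // global version clock

struct Slot {                                    // one transactional word
    std::atomic<uint64_t> lock{0};               // bit 0 = locked; bits 63..1 = version
    std::atomic<uint64_t> value{0};
};

struct Tx {
    uint64_t rv = 0;                             // read-version, sampled at begin
    std::unordered_map<Slot*, uint64_t> writes;  // redo log (the write-set)
    std::vector<Slot*> reads;                    // the read-set

    void begin() { rv = global_clock.load(); writes.clear(); reads.clear(); }

    void write(Slot& s, uint64_t v) { writes[&s] = v; }   // buffered until commit

    // Returns false to signal "abort and retry": the caller never observes
    // an inconsistent value, which is the point of version-clock validation.
    bool read(Slot& s, uint64_t& out) {
        auto it = writes.find(&s);               // read-after-write: use redo log
        if (it != writes.end()) { out = it->second; return true; }
        uint64_t v1 = s.lock.load();
        out = s.value.load();
        uint64_t v2 = s.lock.load();
        if ((v1 & 1) || v1 != v2 || (v1 >> 1) > rv) return false;
        reads.push_back(&s);
        return true;
    }

    bool commit() {
        std::vector<Slot*> locked;               // for rollback of acquired locks
        for (auto& [s, v] : writes) {            // 1. lock the write-set
            uint64_t w = s->lock.load();
            if ((w & 1) || !s->lock.compare_exchange_strong(w, w | 1)) {
                for (Slot* l : locked) l->lock.fetch_and(~uint64_t(1));
                return false;
            }
            locked.push_back(s);
        }
        uint64_t wv = global_clock.fetch_add(1) + 1;   // 2. write-version
        for (Slot* s : reads) {                  // 3. validate the read-set
            uint64_t w = s->lock.load();
            bool locked_by_other = (w & 1) && !writes.count(s);
            if (locked_by_other || (w >> 1) > rv) {
                for (Slot* l : locked) l->lock.fetch_and(~uint64_t(1));
                return false;
            }
        }
        for (auto& [s, v] : writes) {            // 4. write back, then release
            s->value.store(v);                   //    each lock stamped with wv
            s->lock.store(wv << 1);
        }
        return true;
    }
};
```

The post-validation in read() is what keeps user code on consistent memory states; the commit sequence (lock, increment clock, validate, write back) makes the whole transaction appear to take effect at the clock increment.
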
Citations
Book
Maurice Herlihy
14 Mar 2008
TL;DR: Transactional memory as discussed by the authors is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Abstract: Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed. As a result, system designers and software engineers can no longer rely on increasing clock speed to hide software bloat. Instead, they must somehow learn to make effective use of increasing parallelism. This adaptation will not be easy. Conventional synchronization techniques based on locks and conditions are unlikely to be effective in such a demanding environment. Coarse-grained locks, which protect relatively large amounts of data, do not scale, and fine-grained locks introduce substantial software engineering problems. Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions. This synchronization model promises to alleviate many (not all) of the problems associated with locking, and there is a growing community of researchers working on both software and hardware support for this approach. This talk will survey the area, with a focus on open research problems.

1,268 citations

Proceedings ArticleDOI
30 Sep 2008
TL;DR: This paper introduces the Stanford Transactional Applications for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems and uses the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.
Abstract: Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Applications for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.

934 citations


Cites background from "Transactional locking II"

  • ...The largest read and write set sizes are found in labyrinth+, with values of 783 and 779 cache lines, respectively (24.5 KB and 24.3 KB, respectively)....

  • ...TM microbenchmarks are typically composed of transactions that execute a few operations on a data structure like a hash table or red-black tree [12, 13, 25, 27, 34]....

  • ...To benefit from the increasing number of cores per chip, application developers have to develop parallel programs and deal with cumbersome issues such as synchronization tradeoffs, deadlock avoidance, and races....

Proceedings ArticleDOI
20 Oct 2006
TL;DR: Using a simulated multiprocessor with HTM support, the viability of the HyTM approach is demonstrated: it can provide performance and scalability approaching that of an unbounded HTM implementation, without the need to support all transactions with complicated HTM support.
Abstract: Transactional memory (TM) promises to substantially reduce the difficulty of writing correct, efficient, and scalable concurrent programs. But "bounded" and "best-effort" hardware TM proposals impose unreasonable constraints on programmers, while more flexible software TM implementations are considered too slow. Proposals for supporting "unbounded" transactions in hardware entail significantly higher complexity and risk than best-effort designs. We introduce Hybrid Transactional Memory (HyTM), an approach to implementing TM in software so that it can use best-effort hardware TM (HTM) to boost performance but does not depend on HTM. Thus programmers can develop and test transactional programs in existing systems today, and can enjoy the performance benefits of HTM support when it becomes available. We describe our prototype HyTM system, comprising a compiler and a library. The compiler allows a transaction to be attempted using best-effort HTM, and retried using the software library if it fails. We have used our prototype to "transactify" part of the Berkeley DB system, as well as several benchmarks. By disabling the optional use of HTM, we can run all of these tests on existing systems. Furthermore, by using a simulated multiprocessor with HTM support, we demonstrate the viability of the HyTM approach: it can provide performance and scalability approaching that of an unbounded HTM implementation, without the need to support all transactions with complicated HTM support.
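
The retry structure the abstract describes (attempt in best-effort HTM, fall back to the software library) can be sketched with today's hardware. Everything below is an assumption-laden illustration, not the paper's prototype: Intel RTM intrinsics stand in for best-effort HTM, a single global lock stands in for the STM library, and stm_tx_count, run_in_stm, and hytm_execute are invented names. The essential move matches the paper: the hardware path is augmented with a check that aborts it if a software transaction is in flight.

```cpp
// hytm_sketch.cpp -- illustrative hybrid-TM retry loop (not the paper's code).
// Build: g++ -std=c++17 -mrtm -pthread hytm_sketch.cpp
#include <immintrin.h>   // _xbegin/_xend/_xabort (Intel RTM)
#include <atomic>
#include <mutex>

static std::atomic<int> stm_tx_count{0};  // active software transactions
static std::mutex stm_lock;               // stand-in for a real STM fallback

template <class Body>
void run_in_stm(Body body) {
    stm_tx_count.fetch_add(1);            // make ourselves visible to HTM paths
    {
        std::lock_guard<std::mutex> g(stm_lock);
        body();
    }
    stm_tx_count.fetch_sub(1);
}

template <class Body>
void hytm_execute(Body body, int htm_attempts = 3) {
    for (int i = 0; i < htm_attempts; ++i) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Reading the counter puts it in this hardware transaction's read
            // set, so a software transaction starting later aborts us rather
            // than racing with us -- the conflict check the paper calls for.
            if (stm_tx_count.load(std::memory_order_relaxed) != 0)
                _xabort(0x01);
            body();
            _xend();                      // hardware commit
            return;
        }
        // Hardware abort (capacity, conflict, or explicit): retry, then...
    }
    run_in_stm(body);                     // ...fall back to the software path
}
```
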

525 citations


Cites background from "Transactional locking II"

  • ...Kumar et al. [12] recently proposed using HTM to optimize the Dynamic Software Transactional Memory (DSTM) of Herlihy et al....

  • ...In this section, we report on our simulation studies, in which we used the simple rand-array microbenchmark to evaluate HyTM's ability to mix HTM and STM transactions, and to compare its performance in various configurations against standard lock-based approaches....

  • ...The key idea to achieving correct interaction between software transactions (i.e., those executed using the STM library) and hardware transactions (i.e., those executed using HTM support) is to augment hardware transactions with additional code that ensures that the transaction does not commit if it conflicts with an ongoing software transaction....

  • ...Apart from boosting the performance of HyTM, best-effort HTM also supports a number of other useful purposes, such as selectively eliding locks, optimizing nonblocking data structures, and optimizing the Dynamic Software Transactional Memory (DSTM) system of Herlihy et al....

  • ...Independently, Harris and Fraser also developed an STM that uses a table of ownership records [8]....

Proceedings ArticleDOI
03 Nov 2013
TL;DR: A commit protocol based on optimistic concurrency control that provides serializability while avoiding all shared-memory writes for records that were only read; with it, Silo achieves excellent performance and scalability on modern multicore machines.
Abstract: Silo is a new in-memory database that achieves excellent performance and scalability on modern multicore machines. Silo was designed from the ground up to use system memory and caches efficiently. For instance, it avoids all centralized contention points, including that of centralized transaction ID assignment. Silo's key contribution is a commit protocol based on optimistic concurrency control that provides serializability while avoiding all shared-memory writes for records that were only read. Though this might seem to complicate the enforcement of a serial order, correct logging and recovery are provided by linking periodically-updated epochs with the commit protocol. Silo provides the same guarantees as any serializable database without unnecessary scalability bottlenecks or much additional latency. Silo achieves almost 700,000 transactions per second on a standard TPC-C workload mix on a 32-core machine, as well as near-linear scalability. Considered per core, this is several times higher than previously reported results.
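
The shape of that commit protocol can be sketched briefly. The code below is a simplification under our own assumptions (a single record type, a naive TID choice), not Silo's implementation; what it preserves is the property the abstract highlights: validation reads the TIDs of read-only records but never writes to them.

```cpp
// silo_occ_sketch.cpp -- illustrative Silo-style OCC commit (hypothetical names).
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <map>
#include <vector>

static std::atomic<uint64_t> global_epoch{1};  // advanced periodically, off the
                                               // critical path of every commit
struct Record {
    std::atomic<uint64_t> tid{0};              // bit 0 = lock; rest = epoch/seq
    uint64_t value = 0;
};

struct ReadEntry { Record* rec; uint64_t observed_tid; };

bool commit(const std::vector<ReadEntry>& read_set,
            std::map<Record*, uint64_t>& write_set) {  // sorted: avoids deadlock
    for (auto& [r, v] : write_set) {           // phase 1: lock the write-set
        uint64_t t;
        do { t = r->tid.load() & ~1ull; }      // expect "unlocked"...
        while (!r->tid.compare_exchange_weak(t, t | 1));  // ...then set the bit
    }
    uint64_t epoch = global_epoch.load();      // groups commits for recovery
    for (const auto& e : read_set) {           // phase 2: validate reads only
        uint64_t t = e.rec->tid.load();        // a load -- no shared write here
        bool locked_by_other = (t & 1) && !write_set.count(e.rec);
        if (locked_by_other || (t & ~1ull) != (e.observed_tid & ~1ull)) {
            for (auto& [r, v] : write_set) r->tid.fetch_and(~1ull);  // unlock
            return false;                      // abort: something changed
        }
    }
    uint64_t new_tid = epoch << 32;            // phase 3: pick a TID above all
    for (const auto& e : read_set)             // observed TIDs (simplified here)
        new_tid = std::max(new_tid, e.observed_tid & ~1ull);
    for (auto& [r, v] : write_set)
        new_tid = std::max(new_tid, r->tid.load() & ~1ull);
    new_tid += 2;                              // keep the lock bit clear
    for (auto& [r, v] : write_set) {
        r->value = v;
        r->tid.store(new_tid);                 // publish and unlock in one store
    }
    return true;
}
```
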

509 citations


Cites result from "Transactional locking II"

  • ...Recent STM systems, including TL2 [9], are based on optimistic concurrency control and maintain read and write sets similar to those in OCC databases....

Proceedings ArticleDOI
20 Feb 2008
TL;DR: Opacity is defined as a property of concurrent transaction histories and its graph theoretical interpretation is given, and it is proved that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, Ω(k) steps for an operation to terminate.
Abstract: Transactional memory (TM) is perceived as an appealing alternative to critical sections for general purpose concurrent programming. Despite the large amount of recent work on TM implementations, however, very little effort has been devoted to precisely defining what guarantees these implementations should provide. A formal description of such guarantees is necessary in order to check the correctness of TM systems, as well as to establish TM optimality results and inherent trade-offs. This paper presents opacity, a candidate correctness criterion for TM implementations. We define opacity as a property of concurrent transaction histories and give its graph theoretical interpretation. Opacity captures precisely the correctness requirements that have been intuitively described by many TM designers. Most TM systems we know of do ensure opacity. At a very first approximation, opacity can be viewed as an extension of the classical database serializability property with the additional requirement that even non-committed transactions are prevented from accessing inconsistent states. Capturing this requirement precisely, in the context of general objects, and without precluding pragmatic strategies that are often used by modern TM implementations, such as versioning, invisible reads, lazy updates, and open nesting, is not trivial. As a use case of opacity, we prove the first lower bound on the complexity of TM implementations. Basically, we show that every single-version TM system that uses invisible reads and does not abort non-conflicting transactions requires, in the worst case, Ω(k) steps for an operation to terminate, where k is the total number of objects shared by transactions. This (tight) bound precisely captures an inherent trade-off in the design of TM systems. The bound also highlights a fundamental gap between systems in which transactions can be fully isolated from the outside environment, e.g., databases or certain specialized transactional languages, and systems that lack such isolation capabilities, e.g., general TM frameworks.
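
A two-transaction example makes the criterion tangible. The snippet below is illustrative pseudo-C++ (the atomicity is supplied by some TM, indicated in comments): under serializability alone, the doomed reader may crash before the system ever aborts it; opacity forbids it from seeing the mixed snapshot at all.

```cpp
// Shared state maintained by some TM; assumed invariant: x != y.
int x = 1, y = 2;

void writer() {   // executed as one transaction
    x = 3;        // preserves the invariant...
    y = 1;        // ...once both writes commit together
}

void reader() {   // executed as one transaction, but destined to abort
    int a = x;    // sees x == 1 (before writer's commit)
    int b = y;    // sees y == 1 (after writer's commit): a == b!
    int z = 1 / (a - b);   // divide-by-zero inside a "zombie" transaction
    (void)z;
}
// An opaque TM never lets reader() observe {a == 1, b == 1}, even though the
// history is serializable once reader() is aborted. TL2's version-clock
// validation provides exactly this guarantee.
```
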

500 citations


Cites background from "Transactional locking II"

  • ...These strategies are usually implemented using the single-writer multiple-readers pattern, with either explicit locks (e.g., TL2) or “virtual”, revocable ones (e.g., obstruction-free TMs, such as DSTM, ASTM and SXM), sometimes with a multi-versioning scheme (e.g., LSA-STM and JVSTM) or specialized optimization strategies....

  • ...This is confirmed by the already mentioned counterexample TM implementations that have the time complexity either constant or at least independent of k (e.g., RSTM, JVSTM, TL2, etc.)....

  • ...Most transactional memory systems we know of ensure opacity, including DSTM [14], ASTM [18], SXM [13], JVSTM [5], TL2 [6], LSA-STM [25] and RSTM [19]....

  • ...However, it is commonly argued that sandboxing is expensive and applicable only to specific run-time environments [6]....

  • ...It is worth noting that TL2 has a constant time complexity, although it ensures opacity, uses invisible reads, and is single-version....

References
Journal ArticleDOI
TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Abstract: In this paper trade-offs among certain computational factors in hash coding are analyzed. The paradigm problem considered is that of testing a series of messages one-by-one for membership in a given set of messages. Two new hash-coding methods are examined and compared with a particular conventional hash-coding method. The computational factors considered are the size of the hash area (space), the time required to identify a message as a nonmember of the given set (reject time), and an allowable error frequency. The new methods are intended to reduce the amount of space required to contain the hash-coded information from that associated with conventional methods. The reduction in space is accomplished by exploiting the possibility that a small fraction of errors of commission may be tolerable in some applications, in particular, applications in which a large amount of data is involved and a core resident hash area is consequently not feasible using conventional methods. In such applications, it is envisaged that overall performance could be improved by using a smaller core resident hash area in conjunction with the new methods and, when necessary, by using some secondary and perhaps time-consuming test to “catch” the small fraction of errors associated with the new methods. An example is discussed which illustrates possible areas of application for the new methods. Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
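
The scheme is compact in code. Below is a minimal sketch with parameters and a double-hashing trick chosen by us, not taken from the paper: K probes into an M-bit array give possible false positives but no false negatives, which is exactly the space/error trade-off being analyzed.

```cpp
// bloom_sketch.cpp -- minimal Bloom filter (illustrative parameters).
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

template <std::size_t M = 1 << 16, int K = 4>
class BloomFilter {
    std::bitset<M> bits_;
    static std::size_t probe(const std::string& key, int i) {
        // Double hashing (h1 + i*h2) simulates K independent hash functions.
        std::size_t h1 = std::hash<std::string>{}(key);
        std::size_t h2 = (h1 * 0x9e3779b97f4a7c15ull) | 1;  // cheap odd second hash
        return (h1 + static_cast<std::size_t>(i) * h2) % M;
    }
public:
    void insert(const std::string& key) {
        for (int i = 0; i < K; ++i) bits_.set(probe(key, i));
    }
    bool maybe_contains(const std::string& key) const {
        for (int i = 0; i < K; ++i)
            if (!bits_.test(probe(key, i))) return false;  // definitely absent
        return true;                                       // present, probably
    }
};
```

The citing passage below is this structure in action: TL2's transactional load asks the filter first, a negative answer proves the address is not in the write-set, and a rare false positive only costs a redundant write-set lookup.
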

7,390 citations


"Transactional locking II" refers methods in this paper

  • ...The transactional load first checks (using a Bloom filter [24]) to see if the load address already appears in the write-set....

Proceedings ArticleDOI
01 May 1993
TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Abstract: A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object. In highly concurrent systems, lock-free data structures avoid common problems associated with conventional locking techniques, including priority inversion, convoying, and difficulty of avoiding deadlock. This paper introduces transactional memory, a new multiprocessor architecture intended to make lock-free synchronization as efficient (and easy to use) as conventional techniques based on mutual exclusion. Transactional memory allows programmers to define customized read-modify-write operations that apply to multiple, independently-chosen words of memory. It is implemented by straightforward extensions to any multiprocessor cache-coherence protocol. Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.

2,406 citations


"Transactional locking II" refers background in this paper

  • ...The transactional memory programming paradigm of Herlihy and Moss [1] is gaining momentum as the approach of choice for replacing locks in concurrent programming....

Proceedings ArticleDOI
13 Jul 2003
TL;DR: This paper proposes a new form of software transactional memory designed to support dynamic-sized data structures, together with a novel non-blocking implementation of this STM that uses modular contention managers to ensure progress in practice.
Abstract: We propose a new form of software transactional memory (STM) designed to support dynamic-sized data structures, and we describe a novel non-blocking implementation. The non-blocking property we consider is obstruction-freedom. Obstruction-freedom is weaker than lock-freedom; as a result, it admits substantially simpler and more efficient implementations. A novel feature of our obstruction-free STM implementation is its use of modular contention managers to ensure progress in practice. We illustrate the utility of our dynamic STM with a straightforward implementation of an obstruction-free red-black tree, thereby demonstrating a sophisticated non-blocking dynamic data structure that would be difficult to implement by other means. We also present the results of simple preliminary performance experiments that demonstrate that an "early release" feature of our STM is useful for reducing contention, and that our STM lends itself to the effective use of modular contention managers.
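
The modular-contention-manager idea reduces to a small pluggable interface: when a transaction finds an object held by another, a policy object decides whether to wait or to abort the holder. The sketch below is modeled on the paper's description, with invented names; DSTM's actual managers (e.g., the Polite backoff manager) are richer.

```cpp
// contention_manager_sketch.cpp -- illustrative DSTM-style policy interface.
#include <chrono>
#include <thread>

struct ContentionManager {
    virtual ~ContentionManager() = default;
    // Called when we collide with a transaction that owns the object we want.
    // Return true to abort that owner, false to back off and retry ourselves.
    virtual bool should_abort_owner(int my_retries) = 0;
};

// A Polite-style policy: exponential backoff, then get aggressive. Eventually
// aborting the owner is what keeps the system making progress in practice
// under obstruction-freedom.
struct BackoffManager : ContentionManager {
    bool should_abort_owner(int my_retries) override {
        if (my_retries < 8) {
            std::this_thread::sleep_for(
                std::chrono::microseconds(1u << my_retries));
            return false;    // give the owner time to finish
        }
        return true;         // enough patience: claim the object
    }
};
```
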

1,068 citations


"Transactional locking II" refers methods in this paper

  • ...We present here a set of microbenchmarks that have become standard in the community [25], comparing a sequential red-black tree made concurrent using various algorithms representing state-of-the-art non-blocking [6] and lock-based [5, 18] STMs....

Journal ArticleDOI
TL;DR: STM is used to provide a general, highly concurrent method for translating sequential object implementations to non-blocking ones, based on implementing a k-word compare&swap STM-transaction; it outperforms Herlihy’s translation method for sufficiently large numbers of processors.
Abstract: As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load–Linked/Store–Conditional operation on a single word. Building on the hardware based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a Load–Linked/Store–Conditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures shows that our method always outperforms the non-blocking translation methods in the style of Barnes, and outperforms Herlihy’s translation method for sufficiently large numbers of processors. The key to the efficiency of our software-transactional approach is that unlike Barnes style methods, it is not based on a costly “recursive helping” policy.
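
The k-word compare&swap at the core of the method is simple to state as a transaction. In this sketch tm_atomic is a hypothetical stand-in realized with a global lock; the paper instead builds a non-blocking STM from Load-Linked/Store-Conditional, so treat this only as a statement of the primitive's semantics.

```cpp
// kcas_sketch.cpp -- k-word compare&swap expressed as a transaction.
#include <cstddef>
#include <mutex>

static std::mutex tm_mutex;  // stand-in for a real (non-blocking) STM
template <class Fn>
void tm_atomic(Fn fn) { std::lock_guard<std::mutex> g(tm_mutex); fn(); }

// Atomically: if every addr[i] still holds expected[i], store all desired[i];
// otherwise change nothing. Returns whether the swap happened.
bool k_word_cas(long* addr[], const long expected[], const long desired[],
                std::size_t k) {
    bool swapped = false;
    tm_atomic([&] {
        for (std::size_t i = 0; i < k; ++i)
            if (*addr[i] != expected[i]) return;  // any mismatch: do nothing
        for (std::size_t i = 0; i < k; ++i) *addr[i] = desired[i];
        swapped = true;
    });
    return swapped;
}
```
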

880 citations


"Transactional locking II" refers methods in this paper

  • ...STM design has come a long way since the first STM algorithm by Shavit and Touitou [12], which provided a non-blocking implementation of static transactions (see [5–8, 15, 9–13])....

Journal ArticleDOI
02 Mar 2004
TL;DR: To explore the costs and benefits of TCC, the characteristics of an optimal transaction-based memory system are studied, and the effect of different design parameters on the performance of real systems is examined.
Abstract: In this paper, we propose a new shared memory model: Transactional memory Coherence and Consistency (TCC). TCC provides a model in which atomic transactions are always the basic unit of parallel work, communication, memory coherence, and memory reference consistency. TCC greatly simplifies parallel software by eliminating the need for synchronization using conventional locks and semaphores, along with their complexities. TCC hardware must combine all writes from each transaction region in a program into a single packet and broadcast this packet to the permanent shared memory state atomically as a large block. This simplifies the coherence hardware because it reduces the need for small, low-latency messages and completely eliminates the need for conventional snoopy cache coherence protocols, as multiple speculatively written versions of a cache line may safely coexist within the system. Meanwhile, automatic, hardware-controlled rollback of speculative transactions resolves any correctness violations that may occur when several processors attempt to read and write the same data simultaneously. The cost of this simplified scheme is higher interprocessor bandwidth. To explore the costs and benefits of TCC, we study the characteristics of an optimal transaction-based memory system, and examine how different design parameters could affect the performance of real systems. Across a spectrum of applications, the TCC model itself did not limit available parallelism. Most applications are easily divided into transactions requiring only small write buffers, on the order of 4-8 KB. The broadcast requirements of TCC are high, but are well within the capabilities of CMPs and small-scale SMPs with high-speed interconnects.

771 citations


"Transactional locking II" refers background in this paper

  • ...Providing large scale transactions in hardware tends to introduce large degrees of complexity into the design [1–4]....

  • ...There are currently proposals for hardware implementations of transactional memory (HTM) [1–4], purely software based ones, i....
