Book

Transactional Memory

12 Jan 2007
TL;DR: This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early summer 2006.
Abstract: The advent of multicore processors has renewed interest in the idea of incorporating transactions into the programming model used to write parallel programs. This approach, known as transactional memory, offers an alternative, and hopefully better, way to coordinate concurrent threads. The ACI (atomicity, consistency, isolation) properties of transactions provide a foundation to ensure that concurrent reads and writes of shared data do not produce inconsistent or incorrect results. At a higher level, a computation wrapped in a transaction executes atomically: either it completes successfully and commits its result in its entirety, or it aborts. In addition, isolation ensures the transaction produces the same result as if no other transactions were executing concurrently. Although transactions are not a parallel programming panacea, they shift much of the burden of synchronizing and coordinating parallel computations from the programmer to the compiler, runtime system, and hardware. The challenge for system implementers is to build an efficient transactional memory infrastructure. This book presents an overview of the state of the art in the design and implementation of transactional memory systems, as of early summer 2006.
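
To make the model concrete, here is a minimal sketch of a transactional block in C++. It assumes GCC's experimental -fgnu-tm extension, and the Account type and transfer function are invented for illustration; a lock-based version would require the programmer to pick and correctly order locks, whereas here conflict detection and rollback are the TM system's job.

    // Minimal sketch of the transactional programming model, assuming GCC's
    // experimental TM support. Compile with: g++ -fgnu-tm transfer.cpp
    #include <cstdio>

    struct Account { long balance; };

    void transfer(Account& from, Account& to, long amount) {
        // The block either commits in its entirety or aborts and retries;
        // synchronization is handled by the compiler, runtime, and hardware
        // rather than by programmer-managed locks.
        __transaction_atomic {
            from.balance -= amount;
            to.balance   += amount;
        }
    }

    int main() {
        Account a{100}, b{0};
        transfer(a, b, 40);
        std::printf("a=%ld b=%ld\n", a.balance, b.balance);
    }
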
Citations
Proceedings ArticleDOI
09 Jun 2007
TL;DR: For certain workloads, SigTM can match the performance of a full-featured hardware TM system, while for workloads with large read-sets it can be up to two times slower.
Abstract: We propose signature-accelerated transactional memory (SigTM), a hybrid TM system that reduces the overhead of software transactions. SigTM uses hardware signatures to track the read-set and write-set for pending transactions and perform conflict detection between concurrent threads. All other transactional functionality, including data versioning, is implemented in software. Unlike previously proposed hybrid TM systems, SigTM requires no modifications to the hardware caches, which reduces hardware cost and simplifies support for nested transactions and multithreaded processor cores. SigTM is also the first hybrid TM system to provide strong isolation guarantees between transactional blocks and non-transactional accesses without additional read and write barriers in non-transactional code. Using a set of parallel programs that make frequent use of coarse-grain transactions, we show that SigTM accelerates software transactions by 30% to 280%. For certain workloads, SigTM can match the performance of a full-featured hardware TM system, while for workloads with large read-sets it can be up to two times slower. Overall, we show that SigTM combines the performance characteristics and strong isolation guarantees of hardware TM implementations with the low cost and flexibility of software TM systems.
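
The signature mechanism can be sketched in software, with the table size and hash functions chosen arbitrarily for illustration (real signatures are fixed hardware bit vectors fed by coherence messages): a transaction hashes the addresses it accesses into a Bloom-filter-like bit vector, and remote committed writes are tested against the local read signature. False positives cause spurious aborts, but conflicts are never missed.

    // Toy model of SigTM-style signatures: a Bloom-filter summary of a
    // transaction's read set, tested against addresses written by others.
    #include <bitset>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    constexpr std::size_t kSigBits = 1024;   // arbitrary illustration size

    struct Signature {
        std::bitset<kSigBits> bits;
        // Two cheap hashes of the address stand in for hardware hash circuits.
        void insert(std::uintptr_t a) {
            bits.set(a % kSigBits);
            bits.set((a ^ (a >> 7)) * 2654435761u % kSigBits);
        }
        bool mayContain(std::uintptr_t a) const {   // no false negatives
            return bits.test(a % kSigBits) &&
                   bits.test((a ^ (a >> 7)) * 2654435761u % kSigBits);
        }
    };

    int main() {
        long x = 0, y = 0;
        Signature readSig;
        readSig.insert(reinterpret_cast<std::uintptr_t>(&x));  // txn reads x

        // Conflict check against another thread's committed write to y:
        if (readSig.mayContain(reinterpret_cast<std::uintptr_t>(&y)))
            std::puts("possible conflict: abort and retry");
        else
            std::puts("no conflict");
    }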

340 citations


Cites background or methods from "Transactional Memory"

  • ...The lack of strong isolation causes this code to behave unpredictably with all STM approaches [18]....

  • ...Instead of arguing whether or not TM should eliminate the data races in lock-based synchronization, we examine the privatization code in Figure 3 [18]....

  • ...An important feature for TM systems is strong isolation, which facilitates predictable code behavior [18]....

  • ...There are several alternative implementations for both approaches [1, 18]....

  • ...As a result, STM systems may produce incorrect or unpredictable results even for simple parallel programs that would work correctly with lock-based synchronization [18, 12, 24]....

Proceedings ArticleDOI
07 Mar 2009
TL;DR: CTrigger focuses on a special type of interleavings that are inherently correlated with atomicity violation bugs, and uses trace analysis to systematically identify (likely) feasible unserializable interleavings with low occurrence probability so that they can be exposed in large programs.
Abstract: Multicore hardware is making concurrent programs pervasive. Unfortunately, concurrent programs are prone to bugs. Among different types of concurrency bugs, atomicity violation bugs are common and important. Existing techniques to detect atomicity violation bugs suffer from one limitation: requiring bugs to manifest during monitored runs, which is an open problem in concurrent program testing. This paper makes two contributions. First, it studies the interleaving characteristics of the common practice in concurrent program testing (i.e., running a program over and over) to understand why atomicity violation bugs are hard to expose. Second, it proposes CTrigger to effectively and efficiently expose atomicity violation bugs in large programs. CTrigger focuses on a special type of interleavings (i.e., unserializable interleavings) that are inherently correlated with atomicity violation bugs, and uses trace analysis to systematically identify (likely) feasible unserializable interleavings with low occurrence probability. CTrigger then uses minimum execution perturbation to exercise low-probability interleavings and expose difficult-to-catch atomicity violations. We evaluate CTrigger with real-world atomicity violation bugs from four server/desktop applications (Apache, MySQL, Mozilla, and PBZIP2) and three SPLASH-2 applications on 8-core machines. CTrigger efficiently exposes the tested bugs within 1 to 235 seconds, two to four orders of magnitude faster than stress testing. Without CTrigger, some of these bugs do not manifest even after 7 full days of stress testing. In addition, even without deterministic replay support, once a bug is exposed, CTrigger can help programmers reliably reproduce it for diagnosis. Our tested bugs are reproduced by CTrigger mostly within 5 seconds, 300 to over 60,000 times faster than stress testing.
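
The bug class CTrigger targets fits in a few lines. The sketch below is an invented example, not code from the paper: the reader's check and use of a shared pointer are assumed atomic but are not, and only a rare interleaving with the writer exposes the bug.

    // A classic atomicity violation: the check and the use in reader() are
    // assumed atomic but are not, so the rare interleaving
    //   reader: check  ->  writer: shared = nullptr  ->  reader: use
    // dereferences null. (Deliberately racy code, for illustration only.)
    #include <cstdio>
    #include <thread>

    char* shared = nullptr;

    void reader() {
        if (shared != nullptr)             // check
            std::printf("%c\n", *shared);  // use: not atomic with the check
    }

    void writer() {
        shared = nullptr;                  // may land between check and use
    }

    int main() {
        char c = 'x';
        shared = &c;
        std::thread t1(reader), t2(writer);
        t1.join();
        t2.join();
    }

Stress testing rarely lands in the narrow window between the check and the use; CTrigger's approach is to identify such windows from traces and then perturb execution to hit them deliberately.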

324 citations

Proceedings ArticleDOI
15 Oct 2014
TL;DR: This paper identifies failure-atomic sections of code based on existing critical sections, describes a log-based implementation that can recover a consistent state after a failure, identifies the need to rapidly flush CPU caches as a core implementation bottleneck, and suggests partial solutions.
Abstract: Non-volatile main memory, such as memristors or phase change memory, can revolutionize the way programs persist data. In-memory objects can themselves be persistent without the need for a separate persistent data storage format. However, the challenge is to ensure that such data remains consistent if a failure occurs during execution. In this paper, we present our system, called Atlas, which adds durability semantics to lock-based code, typically allowing us to automatically maintain a globally consistent state even in the presence of failures. We identify failure-atomic sections of code based on existing critical sections and describe a log-based implementation that can be used to recover a consistent state after a failure. We discuss several subtle semantic issues and implementation tradeoffs. We identify the need to rapidly flush CPU caches as a core implementation bottleneck and suggest partial solutions. Experimental results confirm the practicality of our approach and provide insight into the overheads of such a system.
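
As a loose illustration of the undo-logging idea (not Atlas's actual API: nvm_store and the log layout are invented here, and persistence barriers are reduced to x86 clflush/sfence):

    // Toy undo-logging store for failure-atomic sections: persist the old
    // value before overwriting it, so recovery can roll back a partially
    // executed section after a crash. Assumes x86 cache-flush instructions.
    #include <immintrin.h>   // _mm_clflush, _mm_sfence
    #include <cstdint>
    #include <vector>

    struct UndoEntry { std::uint64_t* addr; std::uint64_t old; };
    static std::vector<UndoEntry> undo_log;   // would itself live in NVM

    void nvm_store(std::uint64_t* addr, std::uint64_t val) {
        undo_log.push_back({addr, *addr});     // 1. log the old value...
        _mm_clflush(&undo_log.back());
        _mm_sfence();                          // ...and persist it first
        *addr = val;                           // 2. then do the store
        _mm_clflush(addr);
        _mm_sfence();                          //    and persist the data
    }

    // At the end of a failure-atomic section (lock release) the log can be
    // truncated; after a crash, recovery replays it backwards.

The flush-and-fence pair on every logged store is precisely the cache-flushing cost the paper identifies as a bottleneck.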

271 citations

Proceedings ArticleDOI
09 Jun 2007
TL;DR: The authors identify a set of performance pathologies that can degrade performance in proposed HTM designs and show that improved conflict resolution can eliminate these pathologies, letting designers build robust HTM systems.
Abstract: Hardware Transactional Memory (HTM) systems reflect choices from three key design dimensions: conflict detection, version management, and conflict resolution. Previously proposed HTMs represent three points in this design space: lazy conflict detection, lazy version management, committer wins (LL); eager conflict detection, lazy version management, requester wins (EL); and eager conflict detection, eager version management, and requester stalls with conservative deadlock avoidance (EE). To isolate the effects of these high-level design decisions, we develop a common framework that abstracts away differences in cache write policies, interconnects, and ISA to compare these three design points. Not surprisingly, the relative performance of these systems depends on the workload. Under light transactional loads they perform similarly, but under heavy loads they differ by up to 80%. None of the systems performs best on all of our benchmarks. We identify seven performance pathologies (interactions between workload and system that degrade performance) as the root cause of many performance differences: FriendlyFire, StarvingWriter, SerializedCommit, FutileStall, StarvingElder, RestartConvoy, and DuelingUpgrades. We discuss when and on which systems these pathologies can occur and show that they actually manifest within TM workloads. The insight provided by these pathologies motivated four enhanced systems that often significantly reduce transactional memory overhead. Importantly, by avoiding transaction pathologies, each enhanced system performs well across our suite of benchmarks.
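
The three design points can be captured as a small configuration type; the sketch below is descriptive only, restating the abstract's taxonomy rather than implementing anything:

    // The HTM design space as characterized in the paper: conflict
    // detection, version management, and conflict resolution.
    enum class ConflictDetection { Eager, Lazy };
    enum class VersionManagement { Eager, Lazy };
    enum class Resolution { CommitterWins, RequesterWins, RequesterStalls };

    struct HtmDesign {
        ConflictDetection detect;
        VersionManagement version;
        Resolution resolve;
    };

    // The three previously proposed points the paper compares:
    constexpr HtmDesign LL{ConflictDetection::Lazy,  VersionManagement::Lazy,
                           Resolution::CommitterWins};
    constexpr HtmDesign EL{ConflictDetection::Eager, VersionManagement::Lazy,
                           Resolution::RequesterWins};
    constexpr HtmDesign EE{ConflictDetection::Eager, VersionManagement::Eager,
                           Resolution::RequesterStalls};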

256 citations

Journal ArticleDOI
TL;DR: The promise of STM may be undermined by its overheads and limited workload applicability.
Abstract: TM (transactional memory) is a concurrency control paradigm that provides atomic and isolated execution for regions of code. TM is considered by many researchers to be one of the most promising solutions...

252 citations

References
Journal ArticleDOI
TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Abstract: In this paper trade-offs among certain computational factors in hash coding are analyzed. The paradigm problem considered is that of testing a series of messages one-by-one for membership in a given set of messages. Two new hash-coding methods are examined and compared with a particular conventional hash-coding method. The computational factors considered are the size of the hash area (space), the time required to identify a message as a nonmember of the given set (reject time), and an allowable error frequency. The new methods are intended to reduce the amount of space required to contain the hash-coded information from that associated with conventional methods. The reduction in space is accomplished by exploiting the possibility that a small fraction of errors of commission may be tolerable in some applications, in particular, applications in which a large amount of data is involved and a core-resident hash area is consequently not feasible using conventional methods. In such applications, it is envisaged that overall performance could be improved by using a smaller core-resident hash area in conjunction with the new methods and, when necessary, by using some secondary and perhaps time-consuming test to “catch” the small fraction of errors associated with the new methods. An example is discussed which illustrates possible areas of application for the new methods. Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
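
The technique analyzed here is what is now called a Bloom filter. A minimal sketch follows; the table size, hash count, and double-hashing scheme are illustrative choices, not the paper's:

    // Minimal Bloom filter: K hashed bits per element in an M-bit table.
    // Lookups may return false positives (errors of commission) but never
    // false negatives, so a miss is a definite, fast reject.
    #include <bitset>
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <string>

    template <std::size_t M, std::size_t K>
    class BloomFilter {
        std::bitset<M> bits_;
        static std::size_t nthHash(const std::string& s, std::size_t i) {
            // Derive K hashes from two via double hashing: h1 + i*h2.
            std::size_t h1 = std::hash<std::string>{}(s);
            std::size_t h2 = h1 * 2654435761u + 0x9e3779b9u;
            return (h1 + i * h2) % M;
        }
    public:
        void insert(const std::string& s) {
            for (std::size_t i = 0; i < K; ++i) bits_.set(nthHash(s, i));
        }
        bool mayContain(const std::string& s) const {
            for (std::size_t i = 0; i < K; ++i)
                if (!bits_.test(nthHash(s, i))) return false;  // reject
            return true;  // member, or a tolerated false positive
        }
    };

    int main() {
        BloomFilter<1024, 3> f;
        f.insert("alpha");
        std::printf("alpha: %d, beta: %d\n",
                    f.mayContain("alpha"), f.mayContain("beta"));
    }

A miss is a fast, definite reject; a hit may be a false positive, to be caught by the slower secondary test the paper envisages.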

7,390 citations


"Transactional Memory" refers background or methods in this paper

  • ...A number of papers discuss transaction handlers that invoke arbitrary pieces of code when a transaction commits or aborts [3, 20, 29, 30]....

  • ...A memory load first checks the write set (using a Bloom filter [29]) to determine if the transaction previously updated the object....

Journal ArticleDOI
Gerard J. Holzmann
01 May 1997
TL;DR: This paper gives an overview of the design and structure of the SPIN verifier, reviews its theoretical foundation, and surveys significant practical applications.
Abstract: SPIN is an efficient verification system for models of distributed software systems. It has been used to detect design errors in applications ranging from high-level descriptions of distributed algorithms to detailed code for controlling telephone exchanges. The paper gives an overview of the design and structure of the verifier, reviews its theoretical foundation, and surveys significant practical applications.

4,159 citations


"Transactional Memory" refers background or methods in this paper

  • ...Scherer and Scott describe a number of contention resolution policies and provide examples to show that no policy is uniformly better than all other policies [26]....

  • ...Another novel aspect of this system is that the STM algorithm is written in Promela, so it can be directly verified with the SPIN model checker [26], an exercise that found several data races....

Journal ArticleDOI
TL;DR: It is argued that a transaction needs to lock a logical rather than a physical subset of the database, and an implementation of predicate locks which satisfies the consistency condition is suggested.
Abstract: In database systems, users access shared data under the assumption that the data satisfies certain consistency constraints. This paper defines the concepts of transaction, consistency and schedule and shows that consistency requires that a transaction cannot request new locks after releasing a lock. Then it is argued that a transaction needs to lock a logical rather than a physical subset of the database. These subsets may be specified by predicates. An implementation of predicate locks which satisfies the consistency condition is suggested.
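
The consistency condition formalized here, now known as two-phase locking, is easy to state in code. A minimal sketch of the discipline, with invented class and method names:

    // Two-phase locking: a growing phase (acquire only) followed by a
    // shrinking phase (release only); no lock is ever requested after one
    // has been released, which is what the paper shows consistency requires.
    #include <mutex>
    #include <vector>

    class TwoPhaseTxn {
        std::vector<std::mutex*> held_;
        bool shrinking_ = false;
    public:
        bool acquire(std::mutex& m) {
            if (shrinking_) return false;  // would violate two-phase locking
            m.lock();                      // growing phase
            held_.push_back(&m);
            return true;
        }
        void commit() {
            shrinking_ = true;             // enter shrinking phase
            for (auto* m : held_) m->unlock();
            held_.clear();
        }
    };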

2,031 citations


"Transactional Memory" refers background in this paper

  • ...This approach has subtleties that make automatic translation a challenge [5, 33]....

  • ...Many of the concepts and implementation principles for STM were anticipated in a paper by Lomet in 1977 [4], which was published soon after the classic paper by Eswaran [5] on two-phase locking and transactions....

  • ...[5] introduced the terms weak atomicity and strong atomicity....

  • ...In a concurrent program, combining context-sensitive analysis with the synchronization analysis necessary to understand the communications between even two threads results in an undecidable problem [5]....

  • ...Knowing the memory locations enabled the STM system to acquire ownership of them with the two-phase locking protocol [5]....

Proceedings ArticleDOI
20 Aug 1995
TL;DR: STM is used to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction, a novel software method for supporting flexible transactional programming of synchronization operations.
Abstract: As we learn from the literature, flexibility in choosing synchronization operations greatly simplifies the task of designing highly concurrent programs. Unfortunately, existing hardware is inflexible and is at best on the level of a Load–Linked/Store–Conditional operation on a single word. Building on the hardware based transactional synchronization methodology of Herlihy and Moss, we offer software transactional memory (STM), a novel software method for supporting flexible transactional programming of synchronization operations. STM is non-blocking, and can be implemented on existing machines using only a Load–Linked/Store–Conditional operation. We use STM to provide a general highly concurrent method for translating sequential object implementations to non-blocking ones based on implementing a k-word compare&swap STM-transaction. Empirical evidence collected on simulated multiprocessor architectures shows that our method always outperforms the non-blocking translation methods in the style of Barnes, and outperforms Herlihy’s translation method for sufficiently large numbers of processors. The key to the efficiency of our software-transactional approach is that unlike Barnes style methods, it is not based on a costly “recursive helping” policy.
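
A heavily simplified stand-in for the k-word compare&swap at the core of the construction. The real algorithm is non-blocking, built from ownership acquisition plus cooperative helping; this lock-based sketch only shows the primitive's semantics:

    // k-word compare&swap semantics: atomically compare k distinct words
    // against expected values and, only if all match, install new values.
    #include <algorithm>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    struct Word { std::mutex m; long value; };

    bool kcas(const std::vector<Word*>& words,
              const std::vector<long>& expected,
              const std::vector<long>& desired) {
        // Lock the (distinct) words in a global address order. Ordered
        // acquisition avoids deadlock, but unlike the paper's non-blocking
        // algorithm, a preempted lock holder here can block everyone else.
        std::vector<Word*> order(words);
        std::sort(order.begin(), order.end());
        for (auto* w : order) w->m.lock();

        bool ok = true;
        for (std::size_t i = 0; ok && i < words.size(); ++i)
            ok = (words[i]->value == expected[i]);
        if (ok)
            for (std::size_t i = 0; i < words.size(); ++i)
                words[i]->value = desired[i];

        for (auto* w : order) w->m.unlock();
        return ok;
    }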

1,369 citations


"Transactional Memory" refers background in this paper

  • ...Herlihy and Wing [8] proposed linearizability as a correctness condition for operations on shared concurrent objects....

  • ...Scientific programming languages, such as High Performance Fortran (HPF) [8], directly support data parallel programming with a collection of operators on matrices and ways to combine these operations....

  • ...Shavit and Touitou's 1995 PODC paper [8] coined the term "software transactional memory" and described the first software implementation of transactional memory....

  • ...Most papers assume the correctness criteria from database transactions (serializability [7]) or concurrent data structures (linearizability [8])....

  • ...Scott [8] identifies four practical policies for detecting conflicts (Fig....

Book
Maurice Herlihy
14 Mar 2008
TL;DR: Transactional memory as discussed by the authors is a computational model in which threads synchronize by optimistic, lock-free transactions, and there is a growing community of researchers working on both software and hardware support for this approach.
Abstract: Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed. As a result, system designers and software engineers can no longer rely on increasing clock speed to hide software bloat. Instead, they must somehow learn to make effective use of increasing parallelism. This adaptation will not be easy. Conventional synchronization techniques based on locks and conditions are unlikely to be effective in such a demanding environment. Coarse-grained locks, which protect relatively large amounts of data, do not scale, and fine-grained locks introduce substantial software engineering problems. Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions. This synchronization model promises to alleviate many (not all) of the problems associated with locking, and there is a growing community of researchers working on both software and hardware support for this approach. This talk will survey the area, with a focus on open research problems.

1,268 citations