Proceedings Article

STM2: A Parallel STM for High Performance Simultaneous Multithreading Systems

TLDR
This paper presents STM2, a novel parallel STM designed for high performance, aggressive multithreading systems, which shows speedups between 1.8x and 5.2x over the tested STM systems, on average, with peaks up to 12.8x.
Abstract
Extracting high performance from modern chip multithreading (CMT) processors is a complex task, especially for large CMT systems. Programmers must efficiently parallelize performance-critical software while avoiding deadlocks and race conditions. Transactional memory (TM) is a promising programming model that allows programmers to focus on parallelism rather than on maintaining correctness and avoiding deadlock. Software-only implementations (STMs) are especially compelling because they run on commodity hardware and are therefore highly portable. Unfortunately, STM systems usually suffer from high overheads, which may limit their use, especially at scale. In this paper we present STM2, a novel parallel STM designed for high performance, aggressive multithreading systems. STM2 significantly lowers runtime overhead by offloading read-set validation, bookkeeping and conflict detection to auxiliary threads running on sibling hardware threads. Auxiliary threads perform STM operations in parallel with their paired application threads and absorb STM overhead, significantly improving performance. We exploit the fact that, on modern multi-core processors, sets of cores can share L1 or L2 caches. This lets us achieve closer coupling between the application thread and the auxiliary thread than is possible on a traditional multi-processor system. Our experiments, performed on an IBM POWER7 machine, a state-of-the-art, aggressive multi-threaded system, show that our approach outperforms several well-known STM implementations. In particular, STM2 shows speedups between 1.8x and 5.2x over the tested STM systems, on average, with peaks up to 12.8x.
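
The offloading idea in the abstract can be illustrated with a small toy model. The sketch below is not the STM2 implementation: the class `OffloadedTx`, its methods, and the `(value, version)` memory layout are hypothetical names chosen for illustration, and ordinary Python threads stand in for sibling hardware threads. It only shows the division of labor, where the application thread logs each read and moves on while an auxiliary thread keeps the read set and validates it.

```python
import queue
import threading


class OffloadedTx:
    """Toy transaction: read-set bookkeeping and validation run on a helper thread."""

    def __init__(self, memory):
        self.memory = memory               # shared map: address -> (value, version)
        self.inbox = queue.Queue()         # read records handed to the helper
        self.conflict = threading.Event()  # set by the helper on a stale read
        self.aux = threading.Thread(target=self._aux_loop)
        self.aux.start()

    def _aux_loop(self):
        # Auxiliary thread: owns the read set and re-validates it on every message.
        read_set = {}
        while True:
            msg = self.inbox.get()
            if msg is not None:
                addr, seen_version = msg
                read_set.setdefault(addr, seen_version)  # keep first observed version
            for addr, seen_version in read_set.items():
                if self.memory[addr][1] != seen_version:
                    self.conflict.set()
            if msg is None:                # sentinel: final validation is done
                return

    def read(self, addr):
        # Application thread: log the read and continue without validating.
        value, version = self.memory[addr]
        self.inbox.put((addr, version))
        return value

    def commit(self):
        # Ask the helper for a final validation and wait for its verdict.
        self.inbox.put(None)
        self.aux.join()
        return not self.conflict.is_set()


if __name__ == "__main__":
    mem = {"x": (10, 0), "y": (20, 0)}
    tx = OffloadedTx(mem)
    tx.read("x")
    mem["x"] = (11, 1)   # simulated conflicting write by another transaction
    print("committed" if tx.commit() else "aborted")
```

In this toy version the queue is the only coupling between the two threads. The abstract attributes STM2's tighter coupling to caches shared between the paired hardware threads, which this sketch does not attempt to model.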


Citations
Proceedings Article

Sandboxing transactional memory

TL;DR: Transactional memory systems for managed languages can leverage type safety, just-in-time compilation, and fully monitored exceptions to sandbox transactions, isolating the rest of the system from damaging effects of inconsistent speculation.
Proceedings Article

Using elimination and delegation to implement a scalable NUMA-friendly stack

TL;DR: This work proposes the first NUMA-friendly stack design that improves data locality and minimizes interconnect contention by using a dedicated server thread that performs all operations requested by the client threads, and by combining elimination and delegation.
Proceedings Article

Remote Invalidation: Optimizing the Critical Path of Memory Transactions

TL;DR: Remote Invalidation (or RInval) is a new STM algorithm that reduces overheads and improves STM performance by remote execution of commit and invalidation routines and cache-aligned communication, and reduces the overhead of spin locking and cache misses on shared locks.

Transactional Semantics with Zombies

TL;DR: This paper focuses on the run-time level, where the semantics of individual operations (start, read, write, try-commit) govern the interactions between the compiler and the TM system.

On Improving Transactional Memory: Optimistic Transactional Boosting, Remote Execution, and Hybrid Transactions

Ahmed Hassan
TL;DR: This dissertation designs an optimistic methodology for transactional boosting to specifically enhance the performance of transactional data structures, and proposes a hybrid TM solution that exploits the new HTM features of Intel's currently released Haswell processor.
References
Proceedings Article

Validity of the single processor approach to achieving large scale computing capabilities

TL;DR: In this paper, the authors argue that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution.
Proceedings Article

Transactional memory: architectural support for lock-free data structures

TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Proceedings Article

The implementation of the Cilk-5 multithreaded language

TL;DR: Cilk-5's novel "two-clone" compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler are presented.
Proceedings Article

STAMP: Stanford Transactional Applications for Multi-Processing

TL;DR: This paper introduces the Stanford Transactional Applications for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems, and uses the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.
Journal Article

Transactional locking II

TL;DR: This paper introduces the transactional locking II (TL2) algorithm, a software transactional memory (STM) algorithm based on a combination of commit-time locking and a novel global version-clock based validation technique, which is ten-fold faster than a single lock.