Topic

Transactional memory

About: Transactional memory is a research topic. Over the lifetime, 2,365 publications have been published within this topic, receiving 60,818 citations.


Papers
Journal Article
TL;DR: This work proposes GPU-LocalTM, a lightweight and efficient transactional memory (TM) for GPU local memory, which provides speedups of 1.1X to 100X over serialized critical sections.
Abstract: Graphics Processing Units (GPUs) have become the accelerator of choice for data-parallel applications, enabling the execution of thousands of threads in a Single Instruction, Multiple Thread (SIMT) fashion. Using OpenCL terminology, GPUs offer a global memory space shared by all the threads in the GPU, as well as a local memory space shared by only a subset of the threads. Programmers can use local memory as a scratchpad to improve the performance of their applications thanks to its lower latency compared to global memory. In the SIMT execution model, however, the data-locking mechanisms used to protect shared data limit scalability. To take full advantage of the lower latency that local memory affords, and to provide an efficient synchronization mechanism, we propose GPU-LocalTM, a lightweight and efficient transactional memory (TM) for GPU local memory. To minimize the storage resources required for TM support, GPU-LocalTM allocates transactional metadata in the existing memory resources. Additionally, GPU-LocalTM implements different conflict detection mechanisms that can be matched to the characteristics of the application. For the workloads studied in our simulation-based evaluation, GPU-LocalTM provides speedups ranging from 1.1X to 100X over serialized critical sections.

11 citations
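
GPU-LocalTM itself is a hardware proposal evaluated in simulation, so there is no public API to show. The sketch below only illustrates, in plain single-threaded C++, the general word-granularity conflict-detection bookkeeping that TM designs of this kind rely on: a transaction claims ownership of each word it writes, and a failed claim signals a conflict. All names here (owner, try_write, release) are invented for the illustration and are not part of GPU-LocalTM.

// Generic word-granularity conflict detection sketch (not GPU-LocalTM's mechanism).
// Build with: g++ -O2 tm_conflict_sketch.cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int kSlots = 256;          // one ownership slot per hashed word
std::atomic<int> owner[kSlots];      // 0 = free, otherwise id of the owning transaction

int slot_of(const void* addr) {      // hash a word address onto a slot
    return static_cast<int>(reinterpret_cast<std::uintptr_t>(addr) / sizeof(int) % kSlots);
}

// A transaction tries to acquire exclusive ownership of every word it writes.
// If another transaction already owns the word, a conflict is detected and the
// writer reports failure (a real TM would abort and roll back its writes).
bool try_write(int tx_id, int* addr, int value, std::vector<int>& acquired) {
    int s = slot_of(addr);
    int expected = 0;
    if (!owner[s].compare_exchange_strong(expected, tx_id) && expected != tx_id) {
        return false;                // conflict: someone else owns this word
    }
    acquired.push_back(s);
    *addr = value;                   // speculative write (undo logging omitted)
    return true;
}

void release(const std::vector<int>& acquired) {
    for (int s : acquired) owner[s].store(0);   // commit/abort: drop ownership
}

int main() {
    int shared = 0;
    std::vector<int> log1, log2;
    printf("tx1 write: %s\n", try_write(1, &shared, 42, log1) ? "ok" : "conflict");
    printf("tx2 write: %s\n", try_write(2, &shared, 7,  log2) ? "ok" : "conflict");
    release(log1);
    release(log2);
    printf("shared = %d\n", shared);
}

According to the abstract, GPU-LocalTM keeps this kind of transactional metadata in the GPU's existing local-memory resources and offers several conflict detection mechanisms that can be matched to the application, rather than relying on software CAS loops as in the sketch above.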

Book Chapter
Florian Haas, Sebastian Weis, Theo Ungerer, Gilles Pokam, Youfeng Wu
03 Apr 2017
TL;DR: This paper proposes a software/hardware hybrid approach that targets Intel's current x86 multi-core platforms of the Core and Xeon family and leverages hardware transactional memory (Intel TSX) to support implicit checkpoint creation and fast rollback.
Abstract: The demand for fault-tolerant execution on high-performance computer systems increases due to higher fault rates resulting from smaller structure sizes. As an alternative to hardware-based lockstep solutions, software-based fault-tolerance mechanisms can increase the reliability of multi-core commercial off-the-shelf (COTS) CPUs while being cheaper and more flexible. This paper proposes a software/hardware hybrid approach, which targets Intel's current x86 multi-core platforms of the Core and Xeon family. We leverage hardware transactional memory (Intel TSX) to support implicit checkpoint creation and fast rollback. Redundant execution of processes and signature-based comparison of their computations provides error detection, and transactional wrapping enables error recovery. Existing applications are enhanced towards fault-tolerant redundant execution by post-link binary instrumentation. Hardware enhancements to further increase the applicability of the approach are proposed and evaluated with the SPEC CPU 2006 benchmarks. The resulting performance overhead is 47% on average, assuming the existence of the proposed hardware support.

11 citations
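
The Intel TSX primitive this paper builds on can be sketched with the RTM intrinsics from immintrin.h: _xbegin() starts a transaction and takes an implicit checkpoint, _xend() commits, and _xabort() discards all speculative updates. The snippet below is a minimal, generic illustration of checkpoint-and-rollback in which an explicit abort stands in for a failed comparison of redundant executions; it is not the paper's binary instrumentation or signature scheme, and the "signature" computed here is a placeholder.

// Minimal RTM checkpoint/rollback sketch (not the paper's instrumentation).
// Build with: g++ -O2 -mrtm tsx_sketch.cpp   (requires a TSX/RTM-capable CPU)
#include <immintrin.h>
#include <cstdio>

int balance = 100;

bool transfer_checked(int amount, unsigned expected_signature) {
    unsigned status = _xbegin();               // implicit checkpoint of registers/memory
    if (status == _XBEGIN_STARTED) {
        balance -= amount;                     // speculative update
        unsigned signature = static_cast<unsigned>(balance);   // placeholder "signature"
        if (signature != expected_signature) {
            _xabort(0x01);                     // mismatch: hardware rolls everything back
        }
        _xend();                               // commit: updates become visible
        return true;
    }
    // Transaction aborted (explicitly or by the hardware): state was rolled back.
    return false;
}

int main() {
    bool ok = transfer_checked(30, 70);        // signatures agree -> commit
    printf("commit=%d balance=%d\n", ok, balance);
    ok = transfer_checked(30, 0);              // mismatch -> abort, rollback to 70
    printf("commit=%d balance=%d\n", ok, balance);
}

In the paper, the decision between committing and rolling back corresponds roughly to the signature-based comparison of the redundant processes; the sketch only shows the underlying TSX commit/abort mechanics.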

Proceedings Article
23 May 2009
TL;DR: This paper considers the application of Transactional Memory as a means of handling concurrent accesses to shared data and compares its performance with straightforward parallel versions of the algorithm based on traditional synchronization primitives; it also combines TM with Helper Threading to increase the granularity of parallelism and avoid excessive synchronization.
Abstract: In this paper we use Dijkstra's algorithm as a challenging, hard-to-parallelize paradigm to test the efficacy of several parallelization techniques on a multicore architecture. We consider the application of Transactional Memory (TM) as a means of handling concurrent accesses to shared data and compare its performance with straightforward parallel versions of the algorithm based on traditional synchronization primitives. To increase the granularity of parallelism and avoid excessive synchronization, we combine TM with Helper Threading (HT). Our simulation results demonstrate that the straightforward parallelization of Dijkstra's algorithm with traditional locks and barriers has, as expected, disappointing performance. On the other hand, TM by itself is able to provide some performance improvement in several cases, while the version based on TM and HT exhibits a significant performance improvement, reaching a speedup of up to 1.46.

11 citations
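
As a rough illustration of how TM applies to Dijkstra's algorithm, the relax step below is wrapped in a GCC transactional block (__transaction_atomic, compiled with -fgnu-tm): the read-compare-update of a tentative distance executes atomically, and conflicting updates from other threads cause re-execution instead of lock contention. This is only a sketch of the general approach under those assumptions; the paper's evaluation uses its own TM configuration in a simulated multicore, and its Helper Threading scheme is not shown.

// TM-protected edge relaxation sketch (generic, not the paper's TM system).
// Build with: g++ -O2 -fgnu-tm -pthread dijkstra_tm_sketch.cpp
#include <climits>
#include <cstdio>
#include <thread>

constexpr int kNodes = 3;
int dist[kNodes] = {0, INT_MAX, INT_MAX};   // tentative distances, node 0 is the source

// Relax edge (u -> v) with weight w. The whole read-compare-update must be
// atomic; the transaction detects conflicting updates from other threads and
// re-executes instead of serializing on a lock.
void relax(int u, int v, int w) {
    __transaction_atomic {
        if (dist[u] != INT_MAX && dist[u] + w < dist[v]) {
            dist[v] = dist[u] + w;
        }
    }
}

int main() {
    std::thread t1(relax, 0, 1, 5);         // two threads race to relax node 1
    std::thread t2(relax, 0, 1, 3);
    t1.join();
    t2.join();
    relax(1, 2, 2);
    printf("dist = [%d, %d, %d]\n", dist[0], dist[1], dist[2]);   // expect 0, 3, 5
}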

Book Chapter
01 Aug 2014
TL;DR: This paper compares reusable CDTs with polymorphic synchronization against transaction-based, lock-based, and lock-free synchronizations on SPARC and x86-64 architectures, showing that they outperform all reusable Java CDTs.
Abstract: This paper contributes to addressing the fundamental challenge of building Concurrent Data Types (CDTs) that are reusable and scalable at the same time. We do so by proposing the abstraction of Polymorphic Transactions (PT): a new programming abstraction that offers different compatible transactions that can run concurrently in the same application. We outline the commonality of the problem in various object-oriented languages and implement PT and a reusable package in Java. With PT, annotating sequential ADTs guarantees that a novice programmer obtains an atomic and deadlock-free CDT, while an advanced programmer can leverage the application semantics to get higher performance. We compare our polymorphic synchronization against transaction-based, lock-based, and lock-free synchronizations on SPARC and x86-64 architectures, and we integrate our methodology into a travel reservation benchmark. Although our reusable CDTs are sometimes less efficient than non-composable handcrafted CDTs from the JDK, they outperform all reusable Java CDTs.

11 citations
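
The PT abstraction itself is a Java package built on annotations, so the following is only a loose, hypothetical analogue in C++ of the underlying idea: the data type is written once as sequential code, and the synchronization strategy (a coarse lock, or a hardware transaction with a fallback lock) is chosen per instantiation. The policy and class names (MutexSync, TsxSync, BoundedSet) are invented for this sketch and do not correspond to the paper's API.

// Policy-parameterized synchronization sketch (not the paper's PT package).
// Build with: g++ -O2 -std=c++14 -mrtm poly_sync_sketch.cpp  (TsxSync needs an RTM CPU)
#include <immintrin.h>
#include <atomic>
#include <cstdio>
#include <mutex>

struct MutexSync {                              // pessimistic policy: coarse lock
    std::mutex m;
    template <class F> auto run(F f) { std::lock_guard<std::mutex> g(m); return f(); }
};

struct TsxSync {                                // optimistic policy: hardware TM with
    std::atomic<bool> locked{false};            // a spinlock as the fallback path
    template <class F> auto run(F f) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            if (locked.load()) _xabort(0xff);   // subscribe to the fallback lock
            auto r = f();                       // speculative execution
            _xend();
            return r;
        }
        while (locked.exchange(true)) _mm_pause();   // abort path: run under the lock
        auto r = f();
        locked.store(false);
        return r;
    }
};

template <class Sync>
class BoundedSet {                              // one sequential ADT, written once
    int items[64];
    int n = 0;
    Sync sync;
public:
    bool insert(int x) {
        return sync.run([&] {
            for (int i = 0; i < n; ++i) if (items[i] == x) return false;
            if (n == 64) return false;
            items[n++] = x;
            return true;
        });
    }
};

int main() {
    BoundedSet<MutexSync> locked_set;           // same ADT, lock-based instance
    BoundedSet<TsxSync>   tx_set;               // same ADT, transactional instance
    printf("lock insert: %d, tm insert: %d\n", locked_set.insert(7), tx_set.insert(7));
}

The transactional policy here uses Intel RTM with a spinlock fallback purely to keep the example self-contained; the paper's transactions are software-level Java constructs and are composable in ways this sketch does not capture.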

Journal Article
TL;DR: It is shown that some spare aborts cannot be avoided, and that there is an inherent tradeoff between the overhead of a TM and the extent to which it reduces the number of spare aborts.
Abstract: This paper takes a step toward developing a theory for understanding aborts in transactional memory systems (TMs). Existing TMs may abort many transactions that could, in fact, commit without violating correctness. We call such unnecessary aborts spare aborts. We classify what kinds of spare aborts can be eliminated, and which cannot. We further study what kinds of spare aborts can be avoided efficiently. Specifically, we show that some spare aborts cannot be avoided, and that there is an inherent tradeoff between the overhead of a TM and the extent to which it reduces the number of spare aborts. We also present an efficient example TM algorithm that avoids certain kinds of spare aborts, and analyze its properties and performance.

11 citations


Network Information
Related Topics (5)
Compiler: 26.3K papers, 578.5K citations, 87% related
Cache: 59.1K papers, 976.6K citations, 86% related
Parallel algorithm: 23.6K papers, 452.6K citations, 84% related
Model checking: 16.9K papers, 451.6K citations, 84% related
Programming paradigm: 18.7K papers, 467.9K citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    16
2022    40
2021    29
2020    63
2019    70
2018    88