Book Chapter

Reconstructing hardware transactional memory for workload optimized systems

TLDR
It is argued that Hardware Transactional Memory (HTM) can be a suitable implementation choice for workload optimized systems, and that knowledge of the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
Abstract
Workload optimized systems, consisting of a large number of general and special purpose cores and supporting shared memory programming, are slowly becoming prevalent. One of the major impediments to effective parallel programming on these systems is lock-based synchronization. An alternative synchronization solution called Transactional Memory (TM) is currently being explored. We observe that most of the TM design proposals in the literature are tailored to the constraints of general purpose computing platforms. Given that workload optimized systems utilize wider hardware design spaces and on-chip parallelism, we argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for these systems. We re-evaluate the criteria to be satisfied by an HTM and identify possible scope for relaxations in the context of workload optimized systems. Based on the relaxed criteria, we demonstrate the scope for building HTM design variants such that each variant caters to a specific workload requirement. We carry out suitable experiments to bring out the trade-offs between the design variants. Overall, we show how knowledge about the workload is extremely useful for making appropriate design choices in a workload optimized HTM.


Citations
Journal Article

Parallel Scientific Computation: A Structured Approach using BSP and MPI

TL;DR: This is the first textbook to provide a comprehensive overview of the technical aspects of building parallel programs using BSP and BSPlib; it is contemporary, well presented, and balanced between concepts and the technical depth required for developing parallel algorithms.