Book Chapter
Reconstructing hardware transactional memory for workload optimized systems
Kunal Korgaonkar, Prabhat Jain, Deepak Tomar, Kashyap Garimella, V. Kamakoti, and four others
pp. 1-15
TL;DR: It is argued that Hardware Transactional Memory (HTM) can be a suitable implementation choice for workload optimized systems, and that knowledge about the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
Abstract:
Workload optimized systems, consisting of a large number of general-purpose and special-purpose cores and supporting shared memory programming, are slowly becoming prevalent. One of the major impediments to effective parallel programming on these systems is lock-based synchronization. An alternative synchronization solution, Transactional Memory (TM), is currently being explored. We observe that most TM design proposals in the literature are tailored to the constraints of general-purpose computing platforms. Given that workload optimized systems utilize wider hardware design spaces and more on-chip parallelism, we argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for these systems. We re-evaluate the criteria an HTM must satisfy and identify where those criteria can be relaxed in the context of workload optimized systems. Based on the relaxed criteria, we demonstrate the scope for building HTM design variants such that each variant caters to a specific workload requirement. We carry out experiments to bring out the trade-offs between the design variants. Overall, we show how knowledge about the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
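As a rough illustration of how workload knowledge can steer an HTM design choice, the Python sketch below models a transaction's read and write sets and checks them against two hypothetical hardware knobs: the conflict-tracking granularity and a bounded tracking capacity. The knob names (`granularity`, `capacity`), the helper functions, and the simple bounded-capacity model are assumptions made for this example; they are not the design variants evaluated in the chapter. The conflict rule used (a write overlapping the other transaction's reads or writes) is the standard read/write-set rule for TM conflict detection.

```python
from dataclasses import dataclass, field


@dataclass
class HTMVariant:
    """Hypothetical HTM configuration (illustrative only, not from the chapter)."""
    granularity: int  # bytes per conflict-tracking unit, e.g. 64 (cache line) or 8 (word)
    capacity: int     # max distinct tracked units per transaction before a capacity abort


@dataclass
class Transaction:
    reads: set[int] = field(default_factory=set)   # byte addresses read
    writes: set[int] = field(default_factory=set)  # byte addresses written


def tracked_units(addrs: set[int], variant: HTMVariant) -> set[int]:
    """Map byte addresses to the tracking units the hardware would record."""
    return {a // variant.granularity for a in addrs}


def fits(tx: Transaction, variant: HTMVariant) -> bool:
    """A bounded HTM can only run a transaction whose footprint fits its tracking buffers."""
    return len(tracked_units(tx.reads | tx.writes, variant)) <= variant.capacity


def conflicts(t1: Transaction, t2: Transaction, variant: HTMVariant) -> bool:
    """Concurrent transactions conflict if either writes a unit the other reads or writes."""
    r1, w1 = tracked_units(t1.reads, variant), tracked_units(t1.writes, variant)
    r2, w2 = tracked_units(t2.reads, variant), tracked_units(t2.writes, variant)
    return bool(w1 & (r2 | w2)) or bool(w2 & r1)


if __name__ == "__main__":
    line = HTMVariant(granularity=64, capacity=64)  # cache-line tracking
    word = HTMVariant(granularity=8, capacity=64)   # word tracking, same buffer size

    # Two transactions writing adjacent 8-byte words: a false conflict at line granularity.
    a, b = Transaction(writes={0}), Transaction(writes={8})
    print(conflicts(a, b, line), conflicts(a, b, word))  # True False

    # A transaction reading 100 consecutive words (800 bytes).
    big = Transaction(reads=set(range(0, 800, 8)))
    print(fits(big, line), fits(big, word))  # True False
```

The two checks pull in opposite directions: finer tracking removes false conflicts between transactions that touch neighbouring words, but the same buffer then covers a smaller memory footprint. Under this toy model, a workload known to run short transactions over fine-grained shared data would favour word tracking, while one with larger transactional footprints would favour line tracking.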