Book Chapter

Reconstructing hardware transactional memory for workload optimized systems

TL;DR: The authors argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for workload optimized systems, and that knowledge of the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
Abstract
Workload optimized systems, consisting of a large number of general and special purpose cores and with support for shared memory programming, are slowly becoming prevalent. One of the major impediments to effective parallel programming on these systems is lock-based synchronization. An alternate synchronization solution, called Transactional Memory (TM), is currently being explored. We observe that most of the TM design proposals in the literature are tailored to match the constraints of general purpose computing platforms. Given that workload optimized systems utilize wider hardware design spaces and on-chip parallelism, we argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for these systems. We re-evaluate the criteria to be satisfied by an HTM and identify possible scope for relaxations in the context of workload optimized systems. Based on the relaxed criteria, we demonstrate the scope for building HTM design variants, such that each variant caters to a specific workload requirement. We carry out suitable experiments to bring out the trade-offs between the design variants. Overall, we show how knowledge of the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
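The optimistic execution that the abstract contrasts with lock-based synchronization is not directly expressible in portable code (real HTMs expose it through ISA extensions), but the read-validate-retry pattern at its core can be sketched in plain Python. The `VersionedCell` class and its internal lock below are illustrative stand-ins for hardware conflict detection and atomic commit, not anything from the paper:

```python
import threading

class VersionedCell:
    """A shared cell guarded by a version counter: readers snapshot,
    writers validate and bump the version. Illustrates the optimistic
    read-validate-retry pattern underlying transactional memory."""

    def __init__(self, value=0):
        self._lock = threading.Lock()  # stand-in for the atomic commit step
        self.version = 0
        self.value = value

    def transact(self, update):
        """Apply `update(old) -> new` transactionally, retrying on conflict."""
        while True:
            # Speculative read phase: snapshot version and value.
            seen_version, seen_value = self.version, self.value
            new_value = update(seen_value)          # compute outside the lock
            with self._lock:                        # commit phase
                if self.version == seen_version:    # validate: no conflict
                    self.value = new_value
                    self.version += 1
                    return new_value
            # Validation failed: another writer committed first; retry.
```

Note that the lock covers only the validate-and-commit step; the speculative computation runs without holding it. HTM generalizes this idea to arbitrary read and write sets tracked by the hardware.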


Citations
Journal Article

Parallel Scientific Computation: A Structured Approach using BSP and MPI

TL;DR: This is the first textbook to provide a comprehensive overview of the technical aspects of building parallel programs using BSP and BSPlib; it is contemporary, well presented, and balanced between concepts and the technical depth required for developing parallel algorithms.
References
Proceedings Article

Heap data management for limited local memory (LLM) multi-core processors

TL;DR: A semi-automatic, scalable scheme for heap data management is proposed that hides this complexity in a library with a more natural programming interface; for embedded applications, where the maximum heap size can be known at compile time, optimizations to the heap management that significantly improve application performance are also proposed.
Proceedings Article

Efficient dynamic heap allocation of scratch-pad memory

TL;DR: This paper presents the Scratch-Pad Memory Allocator, a light-weight memory management algorithm specifically designed for small on-chip memories; it manages small memories efficiently and scales well under load when multiple competing cores access shared memory.
Journal Article

Compiler-directed scratchpad memory management via graph coloring

TL;DR: This article introduces a general-purpose compiler approach, called memory coloring, to assign static data aggregates, such as arrays and structs, in a program to an SPM, and shows that this methodology is capable of managing SPMs efficiently and effectively for large embedded applications.
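The memory-coloring approach summarized above builds on register-allocation-style graph coloring. As an illustration only (the greedy heuristic and names below are assumptions, not the article's algorithm), coloring an interference graph of data aggregates can be sketched as:

```python
def greedy_color(interference):
    """Greedily color an interference graph. Nodes are data aggregates
    (e.g. arrays, structs), an edge means overlapping live ranges, and
    each color corresponds to one reusable scratchpad region.
    `interference` maps node -> set of conflicting nodes."""
    colors = {}
    # Visit highest-degree nodes first, a common greedy ordering.
    for node in sorted(interference, key=lambda n: -len(interference[n])):
        taken = {colors[n] for n in interference[node] if n in colors}
        color = 0
        while color in taken:   # pick the lowest color not used by a neighbor
            color += 1
        colors[node] = color
    return colors
```

Aggregates assigned the same color never live at the same time, so they can share one scratchpad region; the compiler described in the article adds live-range splitting and SPM sizing on top of this basic idea.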
Proceedings Article

Dynamic trace selection using performance monitoring hardware sampling

TL;DR: The profiling system provides a framework for collecting information required for performing run-time optimization, and the results show that the profile and patching techniques are able to capture 58% of execution time across various SPEC2000 integer benchmarks.