Book Chapter

Reconstructing hardware transactional memory for workload optimized systems

TLDR
It is argued that Hardware Transactional Memory (HTM) can be a suitable implementation choice for workload optimized systems, and that knowledge about the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
Abstract
Workload optimized systems, consisting of a large number of general and special purpose cores and supporting shared memory programming, are slowly becoming prevalent. One of the major impediments to effective parallel programming on these systems is lock-based synchronization. An alternative synchronization solution, Transactional Memory (TM), is currently being explored. We observe that most of the TM design proposals in the literature are tailored to the constraints of general purpose computing platforms. Given that workload optimized systems utilize wider hardware design spaces and on-chip parallelism, we argue that Hardware Transactional Memory (HTM) can be a suitable implementation choice for these systems. We re-evaluate the criteria that an HTM must satisfy and identify possible scope for relaxing them in the context of workload optimized systems. Based on the relaxed criteria, we demonstrate the scope for building HTM design variants, each catering to a specific workload requirement. We carry out experiments to bring out the trade-offs between the design variants. Overall, we show how knowledge about the workload is extremely useful for making appropriate design choices in a workload optimized HTM.
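To make the contrast with lock-based synchronization concrete, here is a minimal software sketch of the transactional semantics an HTM provides in hardware. It assumes lazy versioning (speculative writes are buffered until commit) and commit-time conflict detection; it is not the chapter's design, and all names (`TMemory`, `Transaction`, `atomic`, `Abort`) are hypothetical. A real HTM typically tracks read and write sets in the cache and detects conflicts through the coherence protocol rather than with a global commit lock.

```python
import threading


class Abort(Exception):
    """Raised when commit-time validation detects a conflict."""


class TMemory:
    """Toy shared memory: one value and one version counter per word (hypothetical)."""

    def __init__(self, size):
        self.values = [0] * size
        self.versions = [0] * size
        # Serializes commits; stands in for the hardware's atomic commit step.
        self.commit_lock = threading.Lock()


class Transaction:
    """Lazy versioning: writes are buffered and published only at commit."""

    def __init__(self, mem):
        self.mem = mem
        self.read_set = {}   # addr -> version observed at first read
        self.write_set = {}  # addr -> speculatively written value

    def read(self, addr):
        if addr in self.write_set:          # read-your-own-writes
            return self.write_set[addr]
        if addr not in self.read_set:
            # Record the version before the value; a concurrent commit in
            # between makes the recorded version stale, so validation aborts.
            self.read_set[addr] = self.mem.versions[addr]
        return self.mem.values[addr]

    def write(self, addr, value):
        self.write_set[addr] = value        # invisible to others until commit

    def commit(self):
        with self.mem.commit_lock:
            # Conflict detection: abort if any location we read has changed.
            for addr, seen in self.read_set.items():
                if self.mem.versions[addr] != seen:
                    raise Abort()
            # Publish buffered writes: value first, then version.
            for addr, value in self.write_set.items():
                self.mem.values[addr] = value
                self.mem.versions[addr] += 1


def atomic(mem, body):
    """Run body(tx) as a transaction, re-executing on conflict until it commits."""
    while True:
        tx = Transaction(mem)
        result = body(tx)
        try:
            tx.commit()
            return result
        except Abort:
            continue  # drop buffered writes and retry


# Usage: four threads increment a shared counter with no user-visible lock.
def increment(tx):
    tx.write(0, tx.read(0) + 1)


def worker(mem, n):
    for _ in range(n):
        atomic(mem, increment)


mem = TMemory(1)
threads = [threading.Thread(target=worker, args=(mem, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mem.values[0])  # 4000
```

A single commit lock is the simplest correct choice for a sketch. HTM designs differ in when they detect conflicts (eagerly on each access or lazily at commit) and where they hold speculative state, which are the kinds of design dimensions a workload-specific variant might tune.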


Citations
Journal Article

Parallel Scientific Computation: A Structured Approach using BSP and MPI

TL;DR: This is the first textbook to provide a comprehensive overview of the technical aspects of building parallel programs using BSP and BSPlib; it is contemporary, well presented, and balanced between concepts and the technical depth required for developing parallel algorithms.