Proceedings ArticleDOI

Author's retrospective for: Improving the performance of speculatively parallel applications on the Hydra CMP

Abstract
Our 1999 paper described how to use hardware with thread-level speculation (TLS) support to effectively parallelize a number of serial application benchmarks with minimal programmer intervention required. The ability of TLS hardware to allow programmers to parallelize code almost arbitrarily and then performance tune afterwards, based on feedback supplied by the TLS system, provided significant improvements to programmer productivity and made parallel programming much less error-prone. Since this paper appeared, we have investigated other hardware variations that could provide similar benefits in terms of programmer productivity, such as ones based on an extension of transactional memory. Unfortunately, these concepts have not been implemented on any real systems. As a result, there is still an opportunity to implement schemes like the ones that we described in this paper in order to dramatically ease parallel programming in future systems.
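To make the TLS execution model in the abstract concrete, the following is a minimal, software-only sketch (not the Hydra hardware protocol): each loop iteration runs speculatively against a snapshot of memory while its read and write sets are tracked; iterations then commit in original loop order, and any iteration that read a location written by an earlier iteration is squashed and re-executed with up-to-date values. The function `run_speculative_loop` and the example `body` are hypothetical names for illustration only.

```python
# Toy model of thread-level speculation (TLS) on a loop.
# Illustrative sketch only; real TLS does this in hardware, in parallel.

def run_speculative_loop(data, body, n):
    """Run body(i, read, write) for i in range(n) with TLS-style
    in-order commit and squash-on-conflict semantics."""
    snapshot = list(data)          # state all iterations speculate from
    speculative = []
    for i in range(n):
        reads, writes = set(), {}
        def read(idx, reads=reads, writes=writes):
            reads.add(idx)
            return writes.get(idx, snapshot[idx])   # read-your-own-writes
        def write(idx, val, writes=writes):
            writes[idx] = val
        body(i, read, write)
        speculative.append((reads, writes))

    committed = set()   # locations written by already-committed iterations
    squashes = 0
    for i in range(n):
        reads, writes = speculative[i]
        if reads & committed:      # true dependence violated: squash, redo
            squashes += 1
            reads, writes = set(), {}
            def read(idx, reads=reads, writes=writes):
                reads.add(idx)
                return writes.get(idx, data[idx])   # sees committed state
            def write(idx, val, writes=writes):
                writes[idx] = val
            body(i, read, write)
        for idx, val in writes.items():   # commit in original loop order
            data[idx] = val
        committed |= set(writes)
    return data, squashes


# Mostly independent loop; iteration 2 carries a true dependence on
# iteration 1's result, so it is squashed once and re-executed.
vals = [1, 2, 3, 4, 5]
def body(i, read, write):
    if i == 2:
        write(i, read(i - 1) + read(i))
    else:
        write(i, read(i) * 10)

result, squashes = run_speculative_loop(vals, body, 5)
print(result, squashes)   # [10, 20, 23, 40, 50] 1
```

The point the retrospective makes is visible here: the programmer writes the loop as if it were sequential, and the TLS machinery detects the one violated dependence at runtime instead of requiring the programmer to prove independence up front.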


References
Proceedings ArticleDOI

Simultaneous multithreading: maximizing on-chip parallelism

TL;DR: Simultaneous multithreading has the potential to achieve four times the throughput of a superscalar and double that of fine-grain multithreading, making it an attractive alternative to single-chip multiprocessors.
Book

Programming with POSIX threads

TL;DR: This book offers an in-depth description of the IEEE operating system interface standard POSIX (Portable Operating System Interface) threads, commonly called Pthreads, and explains basic concepts such as asynchronous programming, the lifecycle of a thread, and synchronization.
Proceedings ArticleDOI

Transactional Memory Architecture and Implementation for IBM System z

TL;DR: The implementation in the IBM zEnterprise EC12 (zEC12) microprocessor generation, focusing on how transactional memory can be embedded into the existing cache design and multiprocessor shared-memory infrastructure, is described.
Proceedings ArticleDOI

Programming with transactional coherence and consistency (TCC)

TL;DR: Two basic programming language constructs for decomposing programs into transactions are described: a loop conversion syntax and a general transaction-forking mechanism. With these, writing correct parallel programs requires only small, incremental changes to correct sequential programs.
Proceedings ArticleDOI

The Jrpm system for dynamically parallelizing Java programs

TL;DR: Results demonstrate that Jrpm can exploit thread-level parallelism with minimal effort from the programmer. Performance was achieved by automatic selection of thread decompositions by the hardware profiler, intra-procedural optimizations on code compiled dynamically into speculative threads, and some minor programmer transformations for exposing parallelism that cannot be performed automatically.