Proceedings Article
SWEL: hardware cache coherence protocols to map shared data onto shared caches
Seth H. Pugsley, Josef Spjut, David Nellans, Rajeev Balasubramonian
pp. 465-476
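SWEL's premise is that most blocks are either private to a core or read-only, so they need no coherence. Only blocks that are both shared and written do. A small classifier can illustrate this. What follows is a minimal Python sketch under stated assumptions. The names `Sharing`, `BlockInfo`, and `needs_coherence` are hypothetical. Tracking exact reader and writer sets per block is a simplification chosen for clarity; it is not SWEL's actual hardware state encoding.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Sharing(Enum):
    PRIVATE = "private"                # accessed by a single core only
    READ_ONLY = "read-only"            # accessed by several cores, never written
    SHARED_WRITTEN = "shared-written"  # accessed by several cores and written


@dataclass
class BlockInfo:
    # Hypothetical per-block record of which cores have read or written it.
    readers: Set[int] = field(default_factory=set)
    writers: Set[int] = field(default_factory=set)

    def record(self, core: int, is_write: bool) -> None:
        # Remember the accessing core in the reader or writer set.
        (self.writers if is_write else self.readers).add(core)

    def classify(self) -> Sharing:
        cores = self.readers | self.writers
        if len(cores) <= 1:
            return Sharing.PRIVATE      # one core, reads and writes alike
        if not self.writers:
            return Sharing.READ_ONLY    # many readers, no writer
        return Sharing.SHARED_WRITTEN


def needs_coherence(block: BlockInfo) -> bool:
    # Only shared, written blocks require coherence operations.
    return block.classify() is Sharing.SHARED_WRITTEN


if __name__ == "__main__":
    blk = BlockInfo()
    blk.record(core=0, is_write=False)
    blk.record(core=0, is_write=True)
    print(blk.classify().name, needs_coherence(blk))  # PRIVATE False
    blk.record(core=1, is_write=False)
    print(blk.classify().name, needs_coherence(blk))  # SHARED_WRITTEN True

    ro = BlockInfo()
    for core in range(4):
        ro.record(core=core, is_write=False)
    print(ro.classify().name, needs_coherence(ro))    # READ_ONLY False
```

Only the `SHARED_WRITTEN` case would trigger coherence work. According to the summary, the protocol handles that case with a simple broadcast-based snooping fallback. The sketch does not model how stale copies are dealt with when a block changes class.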
TL;DR: Based on the premise that most blocks are either private to a core or read-only, and hence do not require coherence, a novel coherence protocol is proposed that greatly reduces the number of coherence operations and falls back on a simple broadcast-based snooping protocol for the infrequent cases where coherence is required.
Citations
Proceedings Article
DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism
Byn Choi, Rakesh Komuravelli, Hyojin Sung, Robert Smolinski, Nima Honarmand, Sarita V. Adve, Vikram Adve, Nicholas P. Carter, Ching-Tsun Chou
TL;DR: DeNovo is presented, a hardware architecture motivated by a disciplined shared-memory programming model. The model allows DeNovo to seamlessly integrate message-passing-like interactions within a global address space, for lower design complexity and better performance and efficiency.
Proceedings Article
Complexity-effective multicore coherence
Alberto Ros, Stefanos Kaxiras
TL;DR: A virtually costless coherence scheme is shown to outperform a MESI directory protocol while also reducing shared-cache and network energy consumption, for 15 parallel benchmarks on 16 cores.
Patent
System and method for simplifying cache coherence using multiple write policies
Stefanos Kaxiras, Alberto Ros
TL;DR: A multi-core cache coherence system with a local/shared cache hierarchy is described. The system includes multiple processor cores, a main memory, and a local cache memory associated with each core for storing cache lines accessible only by the associated core.
Proceedings Article
TSO-CC: Consistency directed cache coherence for TSO
Marco Elver, Vijay Nagarajan
TL;DR: TSO-CC does not track sharers. Instead, it relies on self-invalidation and on timestamp-based detection of potential acquires to satisfy the TSO memory consistency model lazily, and it achieves average performance comparable to a MESI directory protocol.
Proceedings Article
DeNovoND: efficient hardware support for disciplined non-determinism
TL;DR: DeNovoND is proposed: a system that supports lock-based, disciplined non-determinism with the simplicity, performance, and energy benefits of DeNovo. It comes with a coherence protocol that requires no transient states, invalidation traffic, or directories and does not incur false sharing.
References
Proceedings Article
The SPLASH-2 programs: characterization and methodological considerations
TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of the fundamental properties and architectural interactions that are important for understanding them well. These include computational load balance, communication-to-computation ratio and traffic needs, important working-set sizes, and issues related to spatial locality.
Proceedings Article
The PARSEC benchmark suite: characterization and architectural implications
TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of chip multiprocessors (CMPs). It shows that the suite covers a wide spectrum of working sets, locality, data sharing, synchronization, and off-chip traffic.
Journal Article
The NAS Parallel Benchmarks
David H. Bailey, Eric Barszcz, John T. Barton, D. S. Browning, Russell Carter, Leonardo Dagum, Rod Fatoohi, Paul O. Frederickson, T. A. Lasinski, Robert Schreiber, Horst D. Simon, V. Venkatakrishnan, Sisira Weeratunga
TL;DR: A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. The benchmarks mimic the computation and data movement characteristics of large-scale computational fluid dynamics applications.
Journal Article
Simics: A full system simulation platform
Peter S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, Fredrik Larsson, A. Moestedt, Bengt Werner
TL;DR: Simics is a platform for full system simulation that can run actual firmware and completely unmodified kernel and driver code, and it provides both functional accuracy for running commercial workloads and sufficient timing accuracy to interface to detailed hardware models.
Book
Parallel Computer Architecture: A Hardware/Software Approach
TL;DR: This book explains the forces behind the convergence of shared-memory, message-passing, data-parallel, and data-driven computing architectures. It provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on an understanding of hardware-software interactions.