Open Access Journal Article

The Execution Migration Machine: Directoryless Shared-Memory Architecture

Abstract
For certain applications involving chip multiprocessors with more than 16 cores, a directoryless architecture with fine-grained and partial-context thread migration can outperform directory-based coherence, providing lighter on-chip traffic and reduced verification complexity.


Citations
Proceedings Article

Highly scalable near memory processing with migrating threads on the emu system architecture

TL;DR: Presents a new, highly scalable PGAS memory-centric system architecture in which migrating threads travel to the data they access; a comparison of key parameters with a variety of today's systems of differing architectures indicates the potential advantages.
Proceedings Article

Zero Directory Eviction Victim: Unbounded Coherence Directory and Core Cache Isolation

TL;DR: The Zero Directory Eviction Victim (ZeroDEV) coherence protocol eliminates directory and coherence-information eviction victims in a cache-coherent chip multiprocessor (CMP).
Proceedings Article

Designing Algorithms for the EMU Migrating-threads-based Architecture

TL;DR: This work identifies several design considerations that must be addressed when developing applications for the new EMU architecture and evaluates their performance effects on the EMU-chick hardware.
Proceedings Article

Efficient parallel packet processing using a shared memory many-core processor with hardware support to accelerate communication

TL;DR: A thorough characterization of a multithreaded packet processing application is performed to quantify the opportunities from exploiting concurrency, as well as identify scalability bottlenecks in futuristic shared memory multicores.
Journal Article

Hardware-Level Thread Migration to Reduce On-Chip Data Movement Via Reinforcement Learning

TL;DR: Reinforcement learning (RL) is proposed to learn relatively complex data access patterns and thereby improve on hardware-level thread migration techniques; it is shown that a migration policy which recognizes more complex data access patterns can be learned.