NCDE: In-Network Caching for Directory Entries to Expedite Data Access in Tiled-Chip Multiprocessors
TLDR
In this paper, the authors explore the opportunity of mitigating problems associated with shared data access via in-network caching for directory entries (NCDE), which can utilize every input port's virtual channels to hold directory entries.
Abstract:
The processing of data-intensive applications, followed by an unprecedented amount of data traffic, drives an explosive number of accesses to the memory subsystem. The overloaded memory subsystem experiences increased data access latency. To expedite data access, a network caching technique has emerged that leverages network-on-chip (NoC) virtual channels (VCs) as an expanded memory subsystem. Previous network caching studies focused on using the VCs of the NoC's local input port as a victim cache to reduce local data access latency. In contrast, we explore the opportunity of mitigating problems associated with shared data access via in-network caching for directory entries (NCDE), which can use the VCs of every input port to hold directory entries. NCDE exploits VCs as victim and prefetch buffers for directory entries, which respectively reduce directory eviction-induced invalidations and simplify cache-to-cache (C2C) data transfer. The effectiveness of NCDE was evaluated using the gem5 full-system simulator, and the results show that the average memory access time (AMAT) and workload execution time were reduced by 7.69% and 5.82%, respectively. The cost of this reduction in data access latency is a negligible router area overhead of 1.56%.
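The victim-buffer idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: the class `DirectoryWithVictimBuffer`, the LRU replacement policy, and the capacities are all assumptions chosen for illustration, and the victim buffer merely stands in for spare VC storage. It shows only why parking an evicted directory entry, instead of discarding it, can avoid invalidating the sharers of that block.

```python
from collections import OrderedDict


class DirectoryWithVictimBuffer:
    """Toy LRU directory with a small victim buffer (hypothetical stand-in for VC slots)."""

    def __init__(self, dir_capacity, victim_capacity):
        self.dir_capacity = dir_capacity
        self.victim_capacity = victim_capacity
        self.directory = OrderedDict()  # block address -> set of sharer core ids
        self.victims = OrderedDict()    # evicted entries parked in the buffer
        self.invalidations = 0          # sharer copies invalidated when an entry is lost

    def access(self, addr, core):
        """Record that `core` now shares block `addr`."""
        if addr in self.directory:
            self.directory.move_to_end(addr)  # refresh LRU position
            self.directory[addr].add(core)
            return
        # Directory miss: recover the sharer set from the victim buffer if it is there.
        sharers = self.victims.pop(addr, set())
        sharers.add(core)
        if len(self.directory) >= self.dir_capacity:
            old_addr, old_sharers = self.directory.popitem(last=False)  # evict LRU entry
            self._park(old_addr, old_sharers)
        self.directory[addr] = sharers

    def _park(self, addr, sharers):
        """Keep an evicted entry if there is room; otherwise its sharers are invalidated."""
        if self.victim_capacity == 0:
            self.invalidations += len(sharers)
            return
        if len(self.victims) >= self.victim_capacity:
            _, dropped = self.victims.popitem(last=False)
            self.invalidations += len(dropped)
        self.victims[addr] = sharers


def count_invalidations(trace, dir_capacity, victim_capacity):
    """Replay (address, core) pairs and return the number of invalidated sharer copies."""
    model = DirectoryWithVictimBuffer(dir_capacity, victim_capacity)
    for addr, core in trace:
        model.access(addr, core)
    return model.invalidations


# A made-up access trace over three blocks and a two-entry directory.
trace = [("A", 0), ("B", 1), ("C", 2), ("A", 3), ("B", 0)]
without_buffer = count_invalidations(trace, dir_capacity=2, victim_capacity=0)
with_buffer = count_invalidations(trace, dir_capacity=2, victim_capacity=2)
```

On this made-up trace every directory eviction costs one invalidation without the buffer, while the buffered variant recovers each evicted entry before it is lost. The model ignores network traffic, VC contention, and the prefetch path, so it says nothing about the magnitude of the reported gains.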
References
Journal Article
The gem5 simulator
Nathan Binkert, Bradford M. Beckmann, Gabriel Black, Steven K. Reinhardt, Ali G. Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, Darien Wood
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, makes gem5 a valuable full-system simulation tool.
Journal Article
A high-performance, portable implementation of the MPI message passing interface standard
TL;DR: The Message Passing Interface (MPI) is a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists.
Journal Article
An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS
Sriram R. Vangal, Jason Howard, Greg Ruhl, Saurabh Dighe, H. Wilson, James W. Tschanz, D. Finan, A. Singh, Tiju Jacob, Shailendra Jain, Vasantha Erraguntla, Clark Roberts, Yatin Hoskote, Nitin Borkar, Shekhar Borkar
TL;DR: This paper presents an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz.
Book
A Primer on Memory Consistency and Cache Coherence
TL;DR: This primer provides readers with a basic understanding of consistency and coherence, and presents both high-level concepts and specific, concrete examples from real-world systems.
Journal Article
Knights Landing: Second-Generation Intel Xeon Phi Product
Avinash Sodani, Roger Gramunt, Jesus Corbal, Ho-Seop Kim, Krishna N. Vinod, Sundaram Chinthamani, Steven R. Hutsell, Rajat Agarwal, Yen-Chen Liu
TL;DR: The architecture of Knights Landing, the second-generation Intel Xeon Phi product family, which targets high-performance computing and other highly parallel workloads, provides a significant increase in scalar and vector performance and a big boost in memory bandwidth compared to the prior generation, called Knights Corner.