Proceedings ArticleDOI

Router Buffer Caching for Managing Shared Cache Blocks in Tiled Multi-Core Processors

TL;DR: Wang et al. propose a congestion management technique in the LLC that equips the NoC router with small storage to keep a copy of heavily shared cache blocks, together with a prediction classifier in the LLC controller to identify such blocks.
Abstract: Multiple cores in a tiled multi-core processor are connected using a network-on-chip (NoC) mechanism. All these cores share the last-level cache (LLC). For large LLCs, a non-uniform cache architecture design is generally adopted, where the LLC is split into multiple slices. Accessing highly shared cache blocks from an LLC slice by several cores simultaneously results in congestion at the LLC, which in turn increases the access latency. To deal with this issue, we propose a congestion management technique in the LLC that equips the NoC router with small storage to keep a copy of heavily shared cache blocks. To identify highly shared cache blocks, we also propose a prediction classifier in the LLC controller. We implement our technique in Sniper, an architectural simulator for multi-core systems, and evaluate its effectiveness by running a set of parallel benchmarks. Our experimental results show that the proposed technique is effective in reducing the LLC access time.
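
As a rough illustration of the mechanism described in the abstract, the following Python sketch shows one way a sharer-count classifier in the LLC controller could flag heavily shared blocks and a small router-side buffer could serve copies of them. It is not the authors' implementation: the sharer threshold, buffer size, LRU policy, and all identifiers are assumptions, and invalidation of router copies on writes is omitted.

```python
from collections import OrderedDict

SHARER_THRESHOLD = 4       # assumption: a block is "heavily shared" once this many distinct cores request it
ROUTER_BUFFER_ENTRIES = 8  # assumption: size of the small storage added to each NoC router

class SharingClassifier:
    """Tracks distinct requesting cores per LLC block and flags heavily shared blocks."""
    def __init__(self):
        self.sharers = {}  # block address -> set of requesting core ids

    def record_access(self, block_addr, core_id):
        self.sharers.setdefault(block_addr, set()).add(core_id)

    def is_heavily_shared(self, block_addr):
        return len(self.sharers.get(block_addr, ())) >= SHARER_THRESHOLD

class RouterCopyBuffer:
    """Small LRU buffer in the router holding copies of heavily shared blocks."""
    def __init__(self, capacity=ROUTER_BUFFER_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()  # block address -> cached copy of the block

    def lookup(self, block_addr):
        if block_addr in self.entries:
            self.entries.move_to_end(block_addr)  # refresh LRU position
            return self.entries[block_addr]
        return None

    def insert(self, block_addr, data):
        if block_addr in self.entries:
            self.entries.move_to_end(block_addr)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # evict the least recently used copy
        self.entries[block_addr] = data

def handle_llc_request(classifier, router_buf, block_addr, core_id, llc_read):
    """Serve one request; copy heavily shared blocks into the router buffer on the way back."""
    cached = router_buf.lookup(block_addr)
    if cached is not None:
        return cached                             # served at the router, avoiding the congested LLC slice
    classifier.record_access(block_addr, core_id)
    data = llc_read(block_addr)                   # normal access to the home LLC slice
    if classifier.is_heavily_shared(block_addr):
        router_buf.insert(block_addr, data)
    return data
```

In this sketch, a request that hits the router buffer never reaches the home LLC slice, which is the intended source of the latency and congestion reduction.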
Citations
Journal ArticleDOI
TL;DR: In this paper, the authors explore the opportunity of mitigating problems associated with shared data access via in-network caching for directory entries (NCDE), which can utilize every input port's virtual channels to hold directory entries.
Abstract: The processing of data-intensive applications, accompanied by an unprecedented amount of data traffic, drives explosive accesses to the memory subsystem. The overloaded memory subsystem experiences increased data access latency. To expedite data access, a network caching technique that leverages network-on-chip (NoC) virtual channels (VCs) as an expanded memory subsystem has emerged. Previous network caching studies focused on utilizing VCs on the NoC’s local input port as a victim cache to reduce local data access latency. In contrast to previous studies, we explore the opportunity of mitigating problems associated with shared data access via in-network caching for directory entries (NCDE), which can utilize every input port’s VCs to hold directory entries. NCDE exploits VCs as victim and prefetch buffers for directory entries, which respectively reduce directory eviction-induced invalidations and simplify cache-to-cache (C2C) data transfers. The effectiveness of NCDE was evaluated using a gem5 full-system simulator, and the results show that the average memory access time (AMAT) and workload execution time were reduced by 7.69% and 5.82%, respectively. As the cost of this faster data access, implementing NCDE incurs a negligible router area overhead of 1.56%.
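
A minimal sketch of the spill idea described above, assuming an LRU-managed directory and a handful of spare VC slots acting as a victim buffer for evicted directory entries; the identifiers and sizes are illustrative, not taken from the paper.

```python
from collections import OrderedDict

class DirectoryWithVCSpill:
    """Directory whose evicted entries spill into spare VC slots instead of being dropped."""
    def __init__(self, dir_entries=256, vc_slots=16):
        self.capacity = dir_entries
        self.entries = OrderedDict()    # block addr -> set of sharer core ids
        self.vc_capacity = vc_slots     # assumed number of spare VC slots used as a victim buffer
        self.vc_victim = OrderedDict()  # spilled (evicted) directory entries

    def install(self, block_addr, sharers):
        """Install an entry, spilling the LRU victim into the VC buffer rather than invalidating its sharers."""
        if block_addr not in self.entries and len(self.entries) >= self.capacity:
            victim_addr, victim_sharers = self.entries.popitem(last=False)
            if len(self.vc_victim) >= self.vc_capacity:
                self.vc_victim.popitem(last=False)   # oldest spilled entry is finally dropped
            self.vc_victim[victim_addr] = victim_sharers
        self.entries[block_addr] = sharers
        self.entries.move_to_end(block_addr)

    def lookup(self, block_addr):
        """Return the sharer set, checking the VC victim buffer before declaring a directory miss."""
        if block_addr in self.entries:
            self.entries.move_to_end(block_addr)
            return self.entries[block_addr]
        if block_addr in self.vc_victim:
            # Hit among spilled entries: restore it instead of rebuilding sharer state via invalidations.
            sharers = self.vc_victim.pop(block_addr)
            self.install(block_addr, sharers)
            return sharers
        return None
```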
References
Journal ArticleDOI
TL;DR: This work presents the first comprehensive evaluation of NoC susceptibility to PV effects, and proposes an array of architectural improvements in the form of a new router design-called SturdiSwitch-to increase resiliency to these effects.
Abstract: The advent of diminutive technology feature sizes has led to escalating transistor densities. Burgeoning transistor counts are casting a dark shadow on modern chip design: global interconnect delays are dominating gate delays and affecting overall system performance. Networks-on-Chip (NoC) are viewed as a viable solution to this problem because of their scalability and optimized electrical properties. However, on-chip routers are susceptible to another artifact of deep submicron technology, Process Variation (PV). PV is a consequence of manufacturing imperfections, which may lead to degraded performance and even erroneous behavior. In this work, we present the first comprehensive evaluation of NoC susceptibility to PV effects, and we propose an array of architectural improvements in the form of a new router design-called SturdiSwitch-to increase resiliency to these effects. Through extensive reengineering of critical components, SturdiSwitch provides increased immunity to PV while improving performance and increasing area and power efficiency.

59 citations

Proceedings ArticleDOI
01 Sep 2015
TL;DR: This work develops sophisticated benchmarks that allow in-depth investigations of the Intel Haswell-EP micro-architecture with full memory location and coherence state control, including important memory latency and bandwidth characteristics as well as the cost of core-to-core transfers.
Abstract: A major challenge in the design of contemporary microprocessors is the increasing number of cores in conjunction with the persevering need for cache coherence. To achieve this, the memory subsystem steadily gains complexity that has evolved to levels beyond the comprehension of most application performance analysts. The Intel Haswell-EP architecture is such an example. It includes considerable advancements regarding memory hierarchy, on-chip communication, and cache coherence mechanisms compared to the previous generation. We have developed sophisticated benchmarks that allow us to perform in-depth investigations with full memory location and coherence state control. Using these benchmarks we investigate performance data and architectural properties of the Haswell-EP micro-architecture, including important memory latency and bandwidth characteristics as well as the cost of core-to-core transfers. This allows us to further the understanding of such complex designs by documenting implementation details that are either not publicly available at all, or only indirectly documented through patents.
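
The core measurement technique behind such benchmarks is a chain of dependent (pointer-chasing) loads over a randomly permuted array. The sketch below shows only that structure in Python; the actual study relies on pinned native code with coherence-state preparation, so the constants, names, and timing method here are assumptions, and absolute numbers from this sketch are dominated by interpreter overhead.

```python
import random
import time

def build_chain(num_entries):
    """Random cyclic permutation so the next load cannot be predicted by the prefetcher."""
    order = list(range(num_entries))
    random.shuffle(order)
    chain = [0] * num_entries
    for i in range(num_entries - 1):
        chain[order[i]] = order[i + 1]
    chain[order[-1]] = order[0]
    return chain

def measure_latency(chain, iterations=1_000_000):
    """Walk the chain; each step depends on the previous load, exposing per-access latency."""
    idx = 0
    start = time.perf_counter()
    for _ in range(iterations):
        idx = chain[idx]
    elapsed = time.perf_counter() - start
    return elapsed / iterations  # average seconds per dependent access

# Example: a chain much larger than the LLC would (in native code) expose DRAM latency.
latency = measure_latency(build_chain(1 << 20))
print(f"~{latency * 1e9:.1f} ns per dependent access (interpreter overhead included)")
```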

57 citations

Book
23 May 2011
TL;DR: The book attempts a synthesis of recent cache research that has focused on innovations for multi-core processors and is an excellent starting point for early-stage graduate students, researchers, and practitioners who wish to understand the landscape of recent cache research.
Abstract: A key determinant of overall system performance and power dissipation is the cache hierarchy since access to off-chip memory consumes many more cycles and energy than on-chip accesses. In addition, multi-core processors are expected to place ever higher bandwidth demands on the memory system. All these issues make it important to avoid off-chip memory access by improving the efficiency of the on-chip cache. Future multi-core processors will have many large cache banks connected by a network and shared by many cores. Hence, many important problems must be solved: cache resources must be allocated across many cores, data must be placed in cache banks that are near the accessing core, and the most important data must be identified for retention. Finally, difficulties in scaling existing technologies require adapting to and exploiting new technology constraints. The book attempts a synthesis of recent cache research that has focused on innovations for multi-core processors. It is an excellent starting point for early-stage graduate students, researchers, and practitioners who wish to understand the landscape of recent cache research. The book is suitable as a reference for advanced computer architecture classes as well as for experienced researchers and VLSI engineers. Table of Contents: Basic Elements of Large Cache Design / Organizing Data in CMP Last Level Caches / Policies Impacting Cache Hit Rates / Interconnection Networks within Large Caches / Technology / Concluding Remarks

57 citations

Proceedings ArticleDOI
19 Jun 2014
TL;DR: This work proposes a locality-aware selective data replication protocol for the last-level cache (LLC) that aims to lower memory access latency and energy by replicating only high locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low.
Abstract: Next generation multicores will process massive data with varying degree of locality. Harnessing on-chip data locality to optimize the utilization of cache and network resources is of fundamental importance. We propose a locality-aware selective data replication protocol for the last-level cache (LLC). Our goal is to lower memory access latency and energy by replicating only high locality cache lines in the LLC slice of the requesting core, while simultaneously keeping the off-chip miss rate low. Our approach relies on low overhead yet highly accurate in-hardware run-time classification of data locality at the cache line granularity, and only allows replication for cache lines with high reuse. Furthermore, our classifier captures the LLC pressure at the existing replica locations and adapts its replication decision accordingly. The locality tracking mechanism is decoupled from the sharer tracking structures that cause scalability concerns in traditional coherence protocols. Moreover, the complexity of our protocol is low since no additional coherence states are created. On a set of parallel benchmarks, our protocol reduces the overall energy by 16%, 14%, 13% and 21% and the completion time by 4%, 9%, 6% and 13% when compared to the previously proposed Victim Replication, Adaptive Selective Replication, Reactive-NUCA and Static-NUCA LLC management schemes.
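
A minimal sketch of reuse-based replication gating in the spirit of this protocol, assuming a simple per-line reuse counter and an illustrative pressure-adjusted threshold; none of the names or constants come from the paper.

```python
REUSE_THRESHOLD = 3  # assumption: replicate lines with at least this many observed local reuses

class ReplicationClassifier:
    """Per cache-line reuse counters that gate replication into the requester's LLC slice."""
    def __init__(self):
        self.reuse = {}  # (core_id, line_addr) -> observed reuse count

    def record_access(self, core_id, line_addr):
        key = (core_id, line_addr)
        self.reuse[key] = self.reuse.get(key, 0) + 1

    def should_replicate(self, core_id, line_addr, local_slice_pressure):
        # Raise the bar when the requesting core's LLC slice is under pressure,
        # so replicas do not displace useful local lines.
        threshold = REUSE_THRESHOLD + (2 if local_slice_pressure > 0.9 else 0)
        return self.reuse.get((core_id, line_addr), 0) >= threshold

# Usage: the classifier records each LLC access; a replica is created in the
# requester's slice only once the line has demonstrated high reuse.
clf = ReplicationClassifier()
for _ in range(4):
    clf.record_access(core_id=2, line_addr=0x80)
print(clf.should_replicate(2, 0x80, local_slice_pressure=0.5))  # True
```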

36 citations

Proceedings ArticleDOI
25 Apr 2006
TL;DR: A novel L2 cache organization for CMPs is presented, and it is demonstrated that this cache organization can considerably lower the number of block migrations between the L2 portions that are closer to each core, thus providing better performance and power efficiency.
Abstract: Chip multiprocessors (CMPs) are becoming a popular way of exploiting the ever-increasing number of on-chip transistors. At the same time, the location of data on the chip can play a critical role in the performance of these CMPs because of the growing on-chip storage capacities and the relative cost of wire delays. It is important to locate the data at the right place at the right time in the on-chip cache hierarchy. This paper presents a novel L2 cache organization for CMPs with these goals in mind. We first study the data sharing characteristics of a wide spectrum of multi-threaded applications and show that, while there are a considerable number of L2 accesses to shared data, the volume of this data is relatively low. Consequently, it is important to keep this shared data fairly close to all processor cores for both performance and power reasons. Motivated by this observation, we propose a small center cell cache residing in the middle of the processor cores which provides fast access to its contents. We demonstrate that this cache organization can considerably lower the number of block migrations between the L2 portions that are closer to each core, thus providing better performance and power.
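
One way to picture the resulting placement decision is sketched below, assuming a hypothetical sharer-count test and slice names; the paper's actual mechanism for detecting shared data is not reproduced here.

```python
def choose_l2_home(line_addr, sharer_cores, nearest_slice_of):
    """Return which L2 structure should hold the line (illustrative policy)."""
    if len(sharer_cores) > 1:
        return "center_cell_cache"        # shared data: central location is fast for all cores
    (owner,) = sharer_cores
    return nearest_slice_of(owner)        # private data: minimize wire delay for the owning core

# Example with a hypothetical 4-core layout:
nearest = {0: "l2_slice_NW", 1: "l2_slice_NE", 2: "l2_slice_SW", 3: "l2_slice_SE"}
print(choose_l2_home(0x100, {1, 3}, nearest.get))  # shared -> center_cell_cache
print(choose_l2_home(0x140, {2}, nearest.get))     # private -> l2_slice_SW
```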

11 citations