Proceedings ArticleDOI

Router Buffer Caching for Managing Shared Cache Blocks in Tiled Multi-Core Processors

TLDR
Wang et al. propose a congestion management technique in the LLC that equips the NoC router with small storage to keep copies of heavily shared cache blocks, along with a prediction classifier in the LLC controller to identify those blocks.
Abstract
Multiple cores in a tiled multi-core processor are connected using a network-on-chip mechanism. All these cores share the last-level cache (LLC). For large-sized LLCs, generally, non-uniform cache architecture design is considered, where the LLC is split into multiple slices. Accessing highly shared cache blocks from an LLC slice by several cores simultaneously results in congestion at the LLC, which in turn increases the access latency. To deal with this issue, we propose a congestion management technique in the LLC that equips the NoC router with small storage to keep a copy of heavily shared cache blocks. To identify highly shared cache blocks, we also propose a prediction classifier in the LLC controller. We implement our technique in Sniper, an architectural simulator for multi-core systems, and evaluate its effectiveness by running a set of parallel benchmarks. Our experimental results show that the proposed technique is effective in reducing the LLC access time.


Citations
Journal ArticleDOI

NCDE: In-Network Caching for Directory Entries to Expedite Data Access in Tiled-Chip Multiprocessors

- 01 Jan 2023 -

TL;DR: The authors explore mitigating the problems associated with shared-data access via in-network caching for directory entries (NCDE), which can utilize every input port's virtual channels to hold directory entries.
References
Proceedings ArticleDOI

Network caching for Chip Multiprocessors

TL;DR: This paper develops three network caching designs to reduce L1 miss latencies and demonstrates that the network caching architecture provides good scalability and robust performance.
Book ChapterDOI

In-Network Caching for Chip Multiprocessors

TL;DR: In the proposed technique, shared data from read response packets that pass through a router are cached in its data store to reduce the number of hops required to service future read requests, with the potential to reduce memory access latency.
Proceedings ArticleDOI

Network Victim Cache: Leveraging Network-on-Chip for Managing Shared Caches in Chip Multiprocessors

TL;DR: The network victim cache architecture removes the directory structure from shared L2 caches and instead stores directory information for blocks recently cached by L1 caches in the network interface components, decreasing on-chip directory memory overhead and improving scalability.