Journal Article

Optimization of Intercache Traffic Entanglement in Tagless Caches With Tiling Opportunities

TLDR
New replacement policies and energy-friendly mechanisms for tagless LLCs, such as restricted block caching and victim tag buffer caching, are proposed to incorporate L4 eviction costs into L3 replacement decisions efficiently and to address entanglement overheads and pathologies.
Abstract
So-called “tagless” caches have become common as a means of managing the vast L4 last-level caches (LLCs) enabled by increasing device density, emerging memory technologies, and advanced integration capabilities (e.g., 3-D). Tagless schemes often result in intercache entanglement between the tagless cache (L4) and the cache (L3) stewarding its metadata. We explore new cache organization policies that mitigate overheads stemming from this intercache-level replacement entanglement. We incorporate support for explicit tiling shapes that better match software access patterns, improving the spatial and temporal locality of large block allocations in many essential computational kernels. To address entanglement overheads and pathologies, we propose new replacement policies and energy-friendly mechanisms for tagless LLCs, such as restricted block caching (RBC) and victim tag buffer caching (VBC), which efficiently incorporate L4 eviction costs into L3 replacement decisions. We evaluate our schemes on a range of software-tiled linear algebra kernels. RBC and VBC reduce memory traffic by 83/4.4/67% and 69/35.5/76% for 8/32/64 MB L4s, respectively. In addition, RBC and VBC provide speedups of 16/0.3/0.6% and 15.7/1.8/0.8%, respectively, for systems with 8/32/64 MB L4s, over a tagless cache with an LRU policy in the L3. We also show that matching the shape of the hardware allocation for each tagless region's superblocks to the access order of the software tile improves latency by 13.4% over the baseline tagless cache and reduces memory traffic by 51% relative to linear superblocks.
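The benefit of shaping superblocks to match a software tile can be illustrated with a toy model. The sketch below is not the paper's mechanism: it assumes a row-major matrix measured in elements, and the function name `superblocks_touched`, the matrix width, and the superblock shapes are all hypothetical choices made for illustration. It counts how many distinct superblocks one software tile touches when a superblock covers a full matrix row (a "linear" shape) versus a square region of the same capacity.

```python
def superblocks_touched(tile_rows, tile_cols, row0, col0, sb_rows, sb_cols):
    """Count distinct superblocks touched by one software tile.

    Toy model (assumption, not from the paper): the matrix is addressed by
    (row, col) in elements, and each superblock covers an aligned region of
    sb_rows x sb_cols elements.
    """
    touched = set()
    for r in range(row0, row0 + tile_rows):
        for c in range(col0, col0 + tile_cols):
            # Identify the superblock by its aligned (row, col) block index.
            touched.add((r // sb_rows, c // sb_cols))
    return len(touched)


N_COLS = 1024            # hypothetical matrix width, in elements
TILE = 32                # hypothetical 32 x 32 software tile

# Linear superblock: 1024 consecutive elements, i.e. one full matrix row.
linear = superblocks_touched(TILE, TILE, 0, 0, sb_rows=1, sb_cols=N_COLS)

# Tile-shaped superblock: same capacity (32 * 32 = 1024 elements), square.
shaped = superblocks_touched(TILE, TILE, 0, 0, sb_rows=TILE, sb_cols=TILE)

print(linear)  # 32: every tile row lands in a different superblock
print(shaped)  # 1: the whole tile fits in a single superblock
```

In this model, fewer superblocks per tile means fewer large-block allocations for the same working set, which is the intuition behind matching allocation shape to tile access order. The model ignores misaligned tiles, element size, and all replacement behavior.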

Citations
Proceedings Article

Trends and Opportunities for SRAM Based In-Memory and Near-Memory Computation

TL;DR: In this article, an I-NMC accelerator is proposed for Sparse Matrix Multiplication (SMM) that can speed up index handling by 10x-60x and improve energy efficiency by 10x-70x, depending on the workload dimensions.
References
Proceedings Article

A fully associative, tagless DRAM cache

TL;DR: By completely eliminating data structures for cache tag management, from either on-die SRAM or in-package DRAM, the proposed DRAM cache achieves the best scalability and hit latency, while maintaining the high hit rate of a fully associative cache.
Proceedings Article

A low-power phase change memory based hybrid cache architecture

TL;DR: The experimental results show that the PRAM based cache architectures achieve close to 80% reduction in the leakage energy consumption of an L1-L2 cache hierarchy.
Proceedings Article

Adaptive Cache Bypassing for Inclusive Last Level Caches

TL;DR: The key insight is that the lifetime of a bypassed line, assuming a well-designed bypassing algorithm, should be short in upper level caches and is most likely dead when its tag is evicted from the bypass buffer.
Proceedings Article

Low power data-aware STT-RAM based hybrid cache architecture

TL;DR: This paper proposes a novel data-aware hybrid STT-RAM/SRAM cache architecture which stores data in the two partitions based on their bit counts and employs an asymmetric low-power 5T-SRAM structure which has high reliability for majority 'one' data.
Proceedings Article

Efficient footprint caching for Tagless DRAM Caches

TL;DR: TDC opens up unique opportunities to realize efficient footprint caching with higher prediction accuracy and a lower hardware cost than the original footprint caching scheme, and the resulting design, called Footprint-augmented Tagless DRAM Cache (F-TDC), significantly improves the bandwidth efficiency of TDC, and hence its performance and energy efficiency.