Journal ArticleDOI
Optimization of Intercache Traffic Entanglement in Tagless Caches With Tiling Opportunities
S. R. Swamy Saranam Chongala, Sumitha George, Hariram Thirucherai Govindarajan, Jagadish B. Kotra, Madhu Mutyam, John Sampson, Mahmut Kandemir, Vijaykrishnan Narayanan +7 more
TL;DR: New replacement policies and energy-friendly mechanisms for tagless LLCs, such as restricted block caching and victim tag buffer caching, are proposed to incorporate L4 eviction costs into L3 replacement decisions efficiently and to address entanglement overheads and pathologies.
Abstract:
So-called “tagless” caches have become common as a means to deal with the vast L4 last-level caches (LLCs) enabled by increasing device density, emerging memory technologies, and advanced integration capabilities (e.g., 3-D). Tagless schemes often result in intercache entanglement between the tagless cache (L4) and the cache (L3) stewarding its metadata. We explore new cache organization policies that mitigate overheads stemming from this intercache-level replacement entanglement. We incorporate support for explicit tiling shapes that can better match software access patterns to improve the spatial and temporal locality of large block allocations in many essential computational kernels. To address entanglement overheads and pathologies, we propose new replacement policies and energy-friendly mechanisms for tagless LLCs, such as restricted block caching (RBC) and victim tag buffer caching (VBC), to incorporate L4 eviction costs into L3 replacement decisions efficiently. We evaluate our schemes on a range of linear algebra kernels that are software tiled. RBC and VBC demonstrate reductions in memory traffic of 83/4.4/67% and 69/35.5/76% for 8/32/64 MB L4s, respectively. In addition, RBC and VBC provide speedups of 16/0.3/0.6% and 15.7/1.8/0.8%, respectively, for systems with 8/32/64 MB L4, over a tagless cache with an LRU policy in the L3. We also show that matching the shape of the hardware allocation for each tagless region's superblocks to the access order of the software tile improves latency by 13.4% over the baseline tagless cache, with reductions in memory traffic of 51% over linear superblocks.
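The abstract's "software tiled" linear algebra kernels refer to the standard loop-blocking transformation, in which a kernel is restructured to work on cache-resident tiles. As a minimal illustrative sketch (the matrix size N and tile edge T below are hypothetical choices, not values from the paper; real tile shapes would be tuned to the L3/L4 superblock geometry the paper describes):

```c
/* Loop tiling (blocking) sketch for matrix multiply.
   N and T are illustrative; a tuned kernel would match T to the
   cache/superblock geometry. */
#include <assert.h>
#include <string.h>

#define N 64
#define T 16 /* hypothetical tile edge */

/* Straightforward triple loop: for large N, B is streamed
   column-wise with poor temporal locality. */
static void mm_naive(const double A[N][N], const double B[N][N],
                     double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++)
                s += A[i][k] * B[k][j];
            C[i][j] = s;
        }
}

/* Tiled version: iterate over T x T tiles so each tile's working
   set stays cache-resident, improving spatial/temporal locality. */
static void mm_tiled(const double A[N][N], const double B[N][N],
                     double C[N][N]) {
    memset(C, 0, sizeof(double) * N * N);
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++)
                        for (int j = jj; j < jj + T; j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

The paper's hardware contribution is complementary: rather than only blocking the loops in software, the shape of each tagless superblock allocation is matched to the software tile's access order, so that the large-block metadata tracked in the L3 lines up with how the kernel actually touches memory.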
Citations
Proceedings ArticleDOI
Trends and Opportunities for SRAM Based In-Memory and Near-Memory Computation
Srivatsa Srinivasa, Akshay Krishna Ramanathan, Jainaveen Sundaram, Kurian Dileep J, Srinivasan Gopal, Nilesh Jain, Anuradha Srinivasan, Ravi Iyer, Vijaykrishnan Narayanan, Tanay Karnik +9 more
TL;DR: In this article, an I-NMC accelerator is proposed for Sparse Matrix Multiplication (SMM), which speeds up index handling by 10x-60x and improves energy efficiency by 10x-70x, depending on the workload dimensions.
References
Proceedings ArticleDOI
A Survey of Trends in Non-Volatile Memory Technologies: 2000-2014
Kosuke Suzuki, Steven Swanson +1 more
TL;DR: A survey of non-volatile memory technology papers published between 2000 and 2014 in leading journals and conference proceedings in the area of integrated circuit design and semiconductor devices is presented.
Journal ArticleDOI
A Survey of Cache Bypassing Techniques
TL;DR: This paper presents a survey of cache bypassing techniques for CPUs, GPUs and CPU-GPU heterogeneous systems, and for caches designed with SRAM, non-volatile memory (NVM) and die-stacked DRAM, and underscores their differences and similarities.
Journal ArticleDOI
IBM zEnterprise 196 microprocessor and cache subsystem
Fadi Y. Busaba, Michael A. Blake, Brian W. Curran, Michael Fee, C. Jacobi, Pak-Kin Mak, Brian R. Prasky, Craig R. Walters +7 more
TL;DR: The IBM zEnterprise® 196 (z196) system, announced in the second quarter of 2010, is the latest generation of the IBM System z® mainframe, designed with a new microprocessor and memory subsystems, which distinguishes it from its z10® predecessor.
The Span Cache: Software Controlled Tag Checks and Cache Line Size
TL;DR: The span cache is a hardware-software design for a new kind of energy-efficient microprocessor data cache with two key features: direct addressing and software-controlled line size. The latter lets the compiler specify how much data to fetch on a miss, allowing greater cache utilization and reducing memory bandwidth requirements.
Proceedings ArticleDOI
MDACache: caching for multi-dimensional-access memories
Sumitha George, Minli Julie Liao, Huaipan Jiang, Jagadish B. Kotra, Mahmut Kandemir, Jack Sampson, Vijaykrishnan Narayanan +6 more
TL;DR: A taxonomy of the different ways of connecting row and column access preferences at the application level to an MDA memory through an MDA cache hierarchy is described, and the sensitivity of the resulting benefits to the working-set-to-cache-capacity ratio, as well as to MDA technology assumptions, is explored.