Open Access, Proceedings Article

Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing

TLDR
This work highlights that exploiting spatial locality enables much more effective cache sharing and proposes a simple yet effective mechanism to measure both spatial and temporal locality at run-time, which significantly outperforms the existing approaches.
Abstract
In modern multi-core processors, last-level caches (LLCs) are typically shared among multiple cores. Previous work has shown that such sharing is beneficial because different workloads need different amounts of cache capacity, and that logically partitioning the capacity can improve system performance. However, previous work on partitioning shared LLCs has not explored the heterogeneity in spatial locality among workloads: all cores use the same block (line) size in the shared LLC. In this work, we highlight that exploiting spatial locality enables much more effective cache sharing. The fundamental reason is that many memory-intensive workloads need drastically less cache capacity when a large block size is employed, so they can donate more capacity to other workloads. To leverage spatial locality for cache partitioning, we first propose a simple yet effective mechanism to measure both spatial and temporal locality at run-time. The locality information is then used to determine both the proper block size and the capacity assigned to each workload. Our experiments show that our Spatial Locality-aware Cache Partitioning (SLCP) significantly outperforms previous approaches. We also present several case studies that dissect the effectiveness of SLCP compared to existing approaches.
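
The effect described in the abstract, that the best block size differs between workloads, can be illustrated with a toy simulation. The sketch below is not the SLCP mechanism from the paper; it is a minimal, fully associative LRU cache model, and the function name `count_misses`, the two synthetic traces, and the 1 KiB capacity are all assumptions chosen for illustration.

```python
from collections import OrderedDict


def count_misses(addresses, capacity_bytes, block_bytes):
    """Count misses in a fully associative LRU cache (toy model, not SLCP)."""
    num_blocks = capacity_bytes // block_bytes
    cache = OrderedDict()  # block number -> None, ordered from LRU to MRU
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


# Hypothetical trace with high spatial locality: sequential 8-byte accesses.
streaming = [i * 8 for i in range(4096)]

# Hypothetical trace with low spatial locality: 16 addresses, 256 bytes apart,
# revisited in a loop.
sparse = [(i % 16) * 256 for i in range(4096)]

CAPACITY = 1024  # bytes, assumed for illustration

streaming_misses = {b: count_misses(streaming, CAPACITY, b) for b in (64, 256)}
sparse_misses = {b: count_misses(sparse, CAPACITY, b) for b in (64, 256)}
```

Under these assumptions, the streaming trace misses four times less often with 256-byte blocks than with 64-byte blocks, while the sparse trace fits in the cache with 64-byte blocks but thrashes with 256-byte blocks. A single block size for all cores therefore cannot suit both workloads.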



Citations
Proceedings Article

Application Clustering Policies to Address System Fairness with Intel’s Cache Allocation Technology

TL;DR: This paper proposes a family of clustering-based cache partitioning policies to address fairness in systems that feature Intel’s Cache Allocation Technology (CAT), a hardware mechanism that can be controlled from userspace software and that allows partitions to be created in the LLC and different groups of applications to be assigned to them.
Journal Article

A Survey of Techniques for Cache Partitioning in Multicore Processors

TL;DR: This article presents a survey of techniques for partitioning shared caches in multicore processors, categorizes the techniques based on important characteristics, and provides a bird’s eye view of the field of cache partitioning.
Journal Article

A Software Cache Partitioning System for Hash-Based Caches

TL;DR: This article extends page coloring to work on recent multicore architectures by proposing a mechanism able to handle their hash-based LLC addressing scheme, and implements this mechanism in the Linux kernel.
Journal Article

Exploring Energy-Efficient Cache Design in Emerging Mobile Platforms

TL;DR: This article proposes to dynamically partition the L2 cache into the user and kernel segments to minimize overall cache size and integrates the short-retention STT-RAM into this dynamic partition-based cache design for maximal energy savings.
Journal Article

Cache memory locality optimization for implementation of computer vision and image processing algorithms

TL;DR: The proposed optimization is applied on a set of image processing operations such as image intensity transformation, image filtering, geometric transformation, and CNN to enhance performance by increasing the cache memory utilization.
References
Proceedings Article

The SPLASH-2 programs: characterization and methodological considerations

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well, including the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.
Journal Article

The SimpleScalar tool set, version 2.0

TL;DR: This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors.
Proceedings Article

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
Proceedings Article

Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

TL;DR: In this article, the authors propose a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources.
Journal Article

DRAMSim2: A Cycle Accurate Memory System Simulator

TL;DR: The process of validating DRAMSim2 timing against manufacturer Verilog models in an effort to prove the accuracy of simulation results is described.