Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing

doi:10.1109/ICPP.2015.24

pp. 150-159

TLDR

This work shows that exploiting spatial locality enables much more effective cache sharing. It proposes a simple yet effective mechanism to measure both spatial and temporal locality at run time, and it uses that information to partition the cache in a way that significantly outperforms existing approaches.
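The run-time locality mechanism is not described here. What follows is a purely illustrative sketch of one way spatial locality *could* be estimated. It is not the SLCP mechanism. The sketch replays an address trace through a small LRU cache of 256-byte regions and records how many of each region's 64-byte sub-blocks are touched before the region is evicted. An average near 1.0 suggests that a larger block size would pay off. A value near 0.25 (1 of 4 sub-blocks) suggests that large blocks would mostly waste capacity. The function name `subblock_utilization`, the region and line sizes, and the region-cache capacity are all hypothetical.

```python
from collections import OrderedDict


def subblock_utilization(trace, region_bytes=256, line_bytes=64, capacity_regions=64):
    """Hypothetical spatial-locality estimate: average fraction of line_bytes
    sub-blocks touched per region_bytes region while the region is resident
    in a small fully associative LRU cache of regions."""
    subblocks = region_bytes // line_bytes
    cache = OrderedDict()  # region id -> set of touched sub-block indices (LRU order)
    touched_counts = []    # sub-blocks touched per region, recorded at eviction

    for addr in trace:
        region = addr // region_bytes
        sub = (addr % region_bytes) // line_bytes
        if region in cache:
            cache.move_to_end(region)  # mark as most recently used
        else:
            if len(cache) >= capacity_regions:
                _, evicted = cache.popitem(last=False)  # evict least recently used
                touched_counts.append(len(evicted))
            cache[region] = set()
        cache[region].add(sub)

    # Regions still resident at the end of the trace also count.
    touched_counts.extend(len(s) for s in cache.values())
    if not touched_counts:
        return 0.0
    return sum(touched_counts) / (len(touched_counts) * subblocks)


if __name__ == "__main__":
    streaming = range(0, 4096, 8)       # every 8th byte address: all sub-blocks touched
    strided = range(0, 256 * 100, 256)  # one access per 256-byte region
    print(subblock_utilization(streaming))  # 1.0
    print(subblock_utilization(strided))    # 0.25
```

A real hardware design would sample only a few sets rather than track every region. It would also need a temporal-locality measure (for example, reuse within the region cache) alongside this spatial one.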

Abstract:

In modern multi-core processors, last-level caches (LLCs) are typically shared among multiple cores. Previous work has shown that such sharing is beneficial because different workloads need different amounts of cache capacity, and logically partitioning that capacity can improve system performance. However, previous work on partitioning shared LLCs has not explored the heterogeneity in spatial locality among workloads: all cores use the same block (line) size in the shared LLC. In this work, we highlight that exploiting spatial locality enables much more effective cache sharing. The fundamental reason is that, for many memory-intensive workloads, cache capacity requirements drop drastically when a large block size is used, so these workloads can donate more capacity to other workloads. To leverage spatial locality for cache partitioning, we first propose a simple yet effective mechanism to measure both spatial and temporal locality at run time. The locality information is then used to determine both the appropriate block size and the capacity assigned to each workload. Our experiments show that Spatial Locality-aware Cache Partitioning (SLCP) significantly outperforms previous approaches. We also present several case studies that dissect the effectiveness of SLCP compared to existing approaches.
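The following sketch makes the capacity-donation argument concrete. It assumes that each workload's miss curve (misses versus number of LLC ways owned) is already known for each candidate block size. In hardware, such curves can be estimated with sampled shadow tags. The sketch tries every combination of block sizes, allocates ways with a simple greedy marginal-utility rule, and keeps the combination with the fewest total misses. This is a hypothetical simplification, not the SLCP algorithm. The function names, the 4-way budget, and the miss counts are invented for illustration. A real design would also weigh the extra memory bandwidth that each large-block miss consumes.

```python
from itertools import product


def greedy_allocate(curves, total_ways):
    """Give each way, one at a time, to the workload whose misses drop the most.
    curves[i][w] = misses of workload i when it owns w ways (total_ways + 1 entries).
    Ties go to the lowest-index workload."""
    alloc = [0] * len(curves)
    for _ in range(total_ways):
        gains = [c[a] - c[a + 1] for c, a in zip(curves, alloc)]
        best = max(range(len(curves)), key=lambda i: gains[i])
        alloc[best] += 1
    total = sum(c[a] for c, a in zip(curves, alloc))
    return alloc, total


def choose_block_sizes_and_ways(miss_curves, total_ways):
    """miss_curves[i] maps block size (bytes) -> miss curve for workload i.
    Returns (block sizes, ways per workload, total misses) for the best combination."""
    best = None
    options = [sorted(m.items()) for m in miss_curves]
    for combo in product(*options):
        sizes = [size for size, _ in combo]
        curves = [curve for _, curve in combo]
        alloc, total = greedy_allocate(curves, total_ways)
        if best is None or total < best[2]:
            best = (sizes, alloc, total)
    return best


if __name__ == "__main__":
    miss_curves = [
        {64: [100, 70, 45, 25, 15], 256: [40, 15, 14, 13, 12]},   # workload A: strong spatial locality
        {64: [100, 60, 35, 20, 12], 256: [110, 80, 55, 35, 25]},  # workload B: little spatial locality
    ]
    print(choose_block_sizes_and_ways(miss_curves, total_ways=4))
    # → ([256, 64], [1, 3], 35)
```

With these invented numbers, workload A needs 2 of the 4 ways under 64-byte blocks. With 256-byte blocks it needs only 1, which frees a way for workload B and cuts total misses from 80 to 35. Greedy allocation can be suboptimal when miss curves are not convex, which is why utility-based cache partitioning uses a lookahead variant.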
