Cache Exclusivity and Sharing: Theory and Optimization

This article presents a new metric, the victim footprint (VFP), which is measured once per program in its solo execution. The VFPs of different programs can then be combined to compute the performance of any exclusive cache hierarchy, replacing parallel testing with theoretical analysis.

Abstract:

A problem on multicore systems is cache sharing, where the cache occupancy of a program depends on the cache usage of peer programs. An exclusive cache hierarchy, as used on AMD processors, is an effective solution that allows processor cores to have a large private cache while still benefiting from a shared cache. The shared cache stores the “victims” (i.e., data evicted from private caches), so performance depends on how the victims of co-run programs interact in the shared cache. This article presents a new metric called the victim footprint (VFP). It is measured once per program in its solo execution and can then be combined to compute the performance of any exclusive cache hierarchy, replacing parallel testing with theoretical analysis. The work evaluates the VFP by using it to analyze cache sharing by parallel mixes of sequential programs, comparing the accuracy of the theory to hardware counter results, and measuring the benefit of exclusivity-aware analysis and optimization.
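To make the victim interaction concrete, here is a minimal brute-force simulator of an exclusive two-level hierarchy. It is not the VFP theory; it is the kind of co-run simulation (a stand-in for parallel testing) that the theory aims to replace. Assumptions, all illustrative: each core has a fully associative LRU private cache; a private-cache victim is inserted into one shared, fully associative LRU cache; a shared-cache hit moves the block back to the private cache (so a block lives in only one level); co-run traces are interleaved round-robin; and co-run programs share no data. The names `LRU` and `simulate_exclusive` are hypothetical.

```python
from collections import OrderedDict


class LRU:
    """Fully associative LRU cache of block identifiers (illustrative model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest (LRU) first, newest (MRU) last

    def lookup(self, block):
        """Return True on a hit and promote the block to MRU."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        return False

    def remove(self, block):
        self.blocks.pop(block, None)

    def insert(self, block):
        """Insert block as MRU; return the evicted LRU block, or None."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            victim, _ = self.blocks.popitem(last=False)
            return victim
        return None


def simulate_exclusive(traces, private_size, shared_size):
    """Interleave per-program traces round-robin through an exclusive hierarchy.

    Returns one (private_hits, shared_hits, misses) tuple per program.
    """
    privates = [LRU(private_size) for _ in traces]
    shared = LRU(shared_size)
    stats = [[0, 0, 0] for _ in traces]
    longest = max(len(t) for t in traces)
    for i in range(longest):
        for pid, trace in enumerate(traces):
            if i >= len(trace):
                continue
            # Tag blocks with the program id: assumes no data sharing.
            block = (pid, trace[i])
            if privates[pid].lookup(block):
                stats[pid][0] += 1
                continue
            if shared.lookup(block):
                # Exclusivity: the block leaves the shared cache on a hit.
                shared.remove(block)
                stats[pid][1] += 1
            else:
                stats[pid][2] += 1
            # The private-cache victim (if any) becomes a shared-cache entry;
            # whatever the shared cache evicts leaves the hierarchy.
            victim = privates[pid].insert(block)
            if victim is not None:
                shared.insert(victim)
    return [tuple(s) for s in stats]


if __name__ == "__main__":
    cyclic = [0, 1, 2, 3] * 3
    print(simulate_exclusive([cyclic], 2, 2))          # solo run
    print(simulate_exclusive([cyclic, cyclic], 2, 2))  # two co-run copies
```

With a cyclic trace over four blocks, a private capacity of 2 and a shared capacity of 2, a solo run fits in the combined private-plus-shared capacity and hits in the shared cache from the second pass on, giving `[(0, 8, 4)]`. Two such programs co-running compete for the same two shared slots, thrash the shared cache, and each gets `(0, 0, 12)`. Predicting this kind of outcome from solo measurements, without running the mix, is the role the abstract assigns to the VFP.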
