Proceedings ArticleDOI

Performance evaluation of exclusive cache hierarchies

TLDR
The results of two-level cache memory simulations are presented, the impact of exclusive caching on system performance is examined, and significant performance advantages are shown for some benchmarks through the use of an exclusive organization.
Abstract
Memory hierarchy performance, specifically cache memory capacity, is a constraining factor in the performance of modern computers. This paper presents the results of two-level cache memory simulations and examines the impact of exclusive caching on system performance. Exclusive caching enables higher capacity with the same cache area by eliminating redundant copies. The experiments presented compare an exclusive cache hierarchy with an inclusive cache hierarchy using similar L1 and L2 parameters. Experiments indicate that significant performance advantages can be gained for some benchmarks through the use of an exclusive organization. The performance differences are illustrated using the L2 cache miss and execution time metrics. The most significant improvement shown is a 16% reduction in execution time, with an average reduction of 8% for the smallest cache configuration tested. With a victim buffer and a victim cache of equal size for the exclusive and inclusive hierarchies respectively, some benchmarks show increased execution time for exclusive caches, because a victim cache can reduce conflict misses significantly while a victim buffer can introduce worst-case penalties. Given the inconsistent performance improvement, the added complexity of an exclusive cache hierarchy must be justified by the specifics of the application and system.
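
To make the organizational difference concrete, the following minimal Python sketch contrasts inclusive and exclusive fill policies in a two-level hierarchy. It is an illustrative model, not the simulator used in the paper; the LRU replacement, block-granularity addresses, and capacities are assumptions for the example, and the inclusive path omits back-invalidation of L1 on an L2 eviction.

# Minimal sketch (not the paper's simulator) contrasting inclusive and
# exclusive fill policies. Capacities and LRU replacement are assumed.
from collections import OrderedDict

class Cache:
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block address -> True, in LRU order

    def lookup(self, addr):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)  # refresh LRU position
            return True
        return False

    def insert(self, addr):
        victim = None
        if len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)  # evict LRU block
        self.blocks[addr] = True
        return victim

def access_inclusive(l1, l2, addr):
    # On a miss the block is filled into both levels, so every L1 block
    # also occupies an L2 frame (the redundant copy noted in the abstract).
    if l1.lookup(addr):
        return "L1 hit"
    if not l2.lookup(addr):
        l2.insert(addr)   # fill L2 from memory
    l1.insert(addr)       # block now resides in both L1 and L2
    return "L1 miss"

def access_exclusive(l1, l2, addr):
    # A block lives in exactly one level: L2 holds only L1 victims.
    if l1.lookup(addr):
        return "L1 hit"
    if addr in l2.blocks:
        del l2.blocks[addr]       # move, not copy, the block up to L1
    victim = l1.insert(addr)
    if victim is not None:
        l2.insert(victim)         # demote the L1 victim into L2
    return "L1 miss"

Because the exclusive L2 holds only blocks demoted from L1, the two levels together retain as many distinct blocks as their combined capacity, which is the capacity advantage described above; the demotion traffic on the L1-to-L2 path is also one reason an exclusive design typically includes a victim buffer.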


Citations
Proceedings ArticleDOI

Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies

TL;DR: This work proposes Temporal Locality Aware (TLA) cache management policies to allow an inclusive LLC to be aware of the temporal locality of lines in the core caches and shows that these policies improve inclusive cache performance without requiring any additional hardware structures.
Proceedings ArticleDOI

Bypass and insertion algorithms for exclusive last-level caches

TL;DR: Detailed execution-driven simulation results show that a combination of the best insertion and bypass policies delivers an improvement of up to 61.2% and on average 3.5% in terms of instructions retired per cycle for single-threaded dynamic instruction traces running on a 2 MB 16-way exclusive LLC compared to a baseline exclusive design in the presence of well-tuned multi-stream hardware prefetchers.
Proceedings ArticleDOI

High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches

TL;DR: This paper investigates increasing the size of the smaller private caches in the hierarchy, rather than the shared LLC, to improve average cache access latency for workloads whose working set fits in the larger private cache, while retaining the benefits of a shared LLC.
Journal ArticleDOI

FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion

TL;DR: FLEXclusion is proposed, a design that dynamically selects between exclusion and non-inclusion depending on workload behavior; it reduces on-chip LLC insertion traffic by 72.6% and improves performance by 5.9%, with negligible hardware changes.

Non-Inclusion Property in Multi-level Caches Revisited

TL;DR: This paper argues that the inclusion property, a prime candidate for simplifying memory coherence protocols in multiprocessor systems, makes inefficient use of cache memory real estate on the chip due to duplication of data on multiple levels of cache.
References
Book

Computer Architecture: A Quantitative Approach

TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Journal ArticleDOI

Cache Memories

TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.
Book

Parallel Computer Architecture: A Hardware/Software Approach

TL;DR: This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.
Proceedings ArticleDOI

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

TL;DR: In this article, hardware techniques for improving cache performance are presented, including a small fully-associative victim cache placed between a cache and its refill path, and prefetch buffers in which prefetched data is placed in the buffer rather than in the cache.
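
A brief Python sketch of the victim-cache idea this paper introduces: a small fully-associative buffer holds blocks recently evicted from a direct-mapped cache, so a conflict miss can be serviced by swapping with the buffer instead of refilling from the next level. The 256-set cache, 4-entry buffer, and address arithmetic are assumptions for the illustration, not parameters from the paper.

# Illustrative victim cache in front of a direct-mapped L1 (after Jouppi).
from collections import deque

NUM_SETS = 256                   # direct-mapped L1, one block per set (assumed)
l1 = [None] * NUM_SETS           # set index -> cached block address
victim_cache = deque(maxlen=4)   # small fully-associative FIFO buffer

def access(addr):
    idx = addr % NUM_SETS
    if l1[idx] == addr:
        return "L1 hit"
    if addr in victim_cache:
        victim_cache.remove(addr)
        if l1[idx] is not None:
            victim_cache.append(l1[idx])  # swap the conflicting block out
        l1[idx] = addr
        return "victim cache hit"         # conflict miss avoided
    if l1[idx] is not None:
        victim_cache.append(l1[idx])      # park the evicted block
    l1[idx] = addr                        # refill from the next level
    return "miss"

Two blocks that map to the same set, accessed alternately, thrash a plain direct-mapped cache but ping-pong harmlessly between l1 and victim_cache here, which is how a small fully-associative cache can remove most conflict misses.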
ReportDOI

My Cache or Yours? Making Storage More Exclusive

TL;DR: In this article, the authors explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both.