Proceedings ArticleDOI
Performance evaluation of exclusive cache hierarchies
Ying Zheng, Brian Davis, M. Jordan +2 more
- pp 89-96
TL;DR: The results of two-level cache memory simulations are presented, the impact of exclusive caching on system performance is examined, and the results indicate that significant performance advantages can be gained for some benchmarks through the use of an exclusive organization.
Abstract:
Memory hierarchy performance, specifically cache memory capacity, is a constraining factor in the performance of modern computers. This paper presents the results of two-level cache memory simulations and examines the impact of exclusive caching on system performance. Exclusive caching enables higher capacity with the same cache area by eliminating redundant copies. The experiments presented compare an exclusive cache hierarchy with an inclusive cache hierarchy utilizing similar L1 and L2 parameters. Experiments indicate that significant performance advantages can be gained for some benchmarks through the use of an exclusive organization. The performance differences are illustrated using the L2 cache miss and execution time metrics. The most significant improvement shown is a 16% reduction in execution time, with an average reduction of 8% for the smallest cache configuration tested. With an equal-size victim buffer (exclusive) and victim cache (inclusive), some benchmarks show increased execution time for exclusive caches, because a victim cache can reduce conflict misses significantly while a victim buffer can introduce worst-case penalties. Considering the inconsistent performance improvement, the increased complexity of an exclusive cache hierarchy needs to be justified based upon the specifics of the application and system.
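The capacity argument in the abstract can be illustrated with a toy model. The following is a hypothetical sketch, not the paper's simulator: the sizes, block-address trace, fully-associative organization, and LRU replacement are all simplifying assumptions chosen to make the inclusive/exclusive contrast visible in a few lines.

```python
# Toy two-level cache model (illustrative only, not the paper's simulator).
# Inclusive mode duplicates every L2-resident block that is also in L1;
# exclusive mode moves blocks between levels, so effective capacity is L1 + L2.
from collections import OrderedDict

class TwoLevel:
    def __init__(self, l1_lines, l2_lines, exclusive):
        self.l1 = OrderedDict()          # block -> None, LRU order (front = LRU)
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.exclusive = exclusive
        self.l2_misses = 0               # accesses that go all the way to memory

    def access(self, block):
        if block in self.l1:             # L1 hit
            self.l1.move_to_end(block)
            return
        if block in self.l2:             # L2 hit
            if self.exclusive:
                del self.l2[block]       # move, don't copy: no duplicate allowed
            else:
                self.l2.move_to_end(block)
        else:                            # miss in both levels
            self.l2_misses += 1
            if not self.exclusive:
                self._fill_l2(block)     # inclusive: fill both levels
        self._fill_l1(block)

    def _fill_l1(self, block):
        self.l1[block] = None
        if len(self.l1) > self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            if self.exclusive:
                self._fill_l2(victim)    # exclusive: L2 holds L1's victims

    def _fill_l2(self, block):
        self.l2[block] = None
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            if not self.exclusive:
                self.l1.pop(victim, None)  # inclusion: back-invalidate L1 copy
```

Sweeping a 12-block working set twice through a 4-line L1 and 8-line L2 shows the effect: the exclusive hierarchy holds all 12 blocks (only the 12 compulsory misses occur), while the inclusive hierarchy's effective capacity is just the 8-line L2, so the second pass thrashes.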
Citations
Proceedings ArticleDOI
Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies
TL;DR: This work proposes Temporal Locality Aware (TLA) cache management policies to allow an inclusive LLC to be aware of the temporal locality of lines in the core caches and shows that these policies improve inclusive cache performance without requiring any additional hardware structures.
Proceedings ArticleDOI
Bypass and insertion algorithms for exclusive last-level caches
TL;DR: Detailed execution-driven simulation results show that a combination of the best insertion and bypass policies delivers an improvement of up to 61.2% and on average 3.5% in terms of instructions retired per cycle for single-threaded dynamic instruction traces running on a 2 MB 16-way exclusive LLC compared to a baseline exclusive design in the presence of well-tuned multi-stream hardware prefetchers.
Proceedings ArticleDOI
High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches
TL;DR: This paper investigates increasing the size of the smaller private caches in the hierarchy, rather than the shared LLC, to improve average cache access latency for workloads whose working set fits into the larger private cache, while retaining the benefits of a shared LLC.
Journal ArticleDOI
FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion
TL;DR: FLEXclusion is proposed, a design that dynamically selects between exclusion and non-inclusion depending on workload behavior and reduces the on-chip LLC insertion traffic by 72.6% and improves performance by 5.9% when implemented with negligible hardware changes.
Non-Inclusion Property in Multi-level Caches Revisited
TL;DR: This paper argues that the inclusion property, a prime candidate for simplifying memory coherence protocols in multiprocessor systems, makes inefficient use of cache memory real estate on the chip due to duplication of data on multiple levels of cache.
References
Book
Computer Architecture: A Quantitative Approach
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Journal ArticleDOI
Cache Memories
TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.
Book
Parallel Computer Architecture: A Hardware/Software Approach
TL;DR: This book explains the forces behind this convergence of shared-memory, message-passing, data parallel, and data-driven computing architectures and provides comprehensive discussions of parallel programming for high performance and of workload-driven evaluation, based on understanding hardware-software interactions.
Proceedings ArticleDOI
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
TL;DR: In this article, a hardware technique to improve cache performance is presented: a small fully-associative cache is placed between a cache and its refill path, and prefetched data are placed in this buffer rather than in the cache itself.
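The victim-cache variant of this idea, which the abstract above contrasts with a victim buffer, can be sketched in a few lines. This is a hypothetical toy model: the direct-mapped main cache, modulo indexing, and LRU victim buffer are illustrative assumptions, not the paper's evaluated design.

```python
# Toy victim cache (illustrative sketch): a direct-mapped cache backed by a
# tiny fully-associative LRU buffer that parks conflict victims, so lines
# that ping-pong between two addresses can be recovered without a memory miss.
from collections import OrderedDict

class VictimCached:
    def __init__(self, sets, victim_entries):
        self.sets = sets
        self.lines = [None] * sets       # direct-mapped: one tag per set
        self.victims = OrderedDict()     # fully associative, LRU order
        self.victim_entries = victim_entries
        self.misses = 0                  # misses that go to memory

    def access(self, block):
        idx = block % self.sets          # modulo indexing (toy assumption)
        if self.lines[idx] == block:
            return                       # direct-mapped hit
        if block in self.victims:
            del self.victims[block]      # victim-cache hit: swap, no memory miss
        else:
            self.misses += 1             # true miss: fetch from memory
        evicted = self.lines[idx]
        self.lines[idx] = block
        if evicted is not None:          # displaced line parks in the victim cache
            self.victims[evicted] = None
            if len(self.victims) > self.victim_entries:
                self.victims.popitem(last=False)
```

Alternating between two blocks that map to the same set (e.g. blocks 0 and 4 with 4 sets) costs only the two compulsory misses, since each displaced line is found in the victim cache on the next access; without the victim cache, every access would miss.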
ReportDOI
My Cache or Yours? Making Storage More Exclusive
Theodore M. Wong, John Wilkes +1 more
TL;DR: In this article, the authors explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both.