Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses
Frequently Asked Questions (10)
Q2. What future work is mentioned in the paper "Reducing cache pollution through detection and elimination of non-temporal memory accesses"?
Future work will explore other hardware mechanisms for handling non-temporal data hints from software, as well as possible applications in scheduling.
Q3. How did the authors measure the cycles and instruction counts?
The authors used the performance counters in the processor to measure the cycles and instruction counts using the perf framework provided by recent Linux kernels.
Q4. What is the implicit assumption that caches can be modeled to be?
Since the authors use StatStack, they have implicitly assumed that caches can be modeled as fully associative, i.e., that conflict misses are insignificant.
Q5. What is the reason for the speedup when running with victims?
The speedup when running with applications from the two victim categories can largely be attributed to a reduction in the total bandwidth requirement of the mix.
Q6. What is the way to manage the cache for these applications?
Managing the cache for these applications is likely to improve throughput, both when they are running in isolation and in a mix with other applications.
Q7. What is the main advantage of using a non-temporal instruction to bypass the entire cache?
Most hardware implementations of cache management instructions allow the non-temporal data to live in parts of the cache hierarchy, such as the L1, before it is evicted to memory.
Q8. What is the stack distance distribution for LRU caches?
The stack distance distribution enables the application's miss ratio to be computed for any given cache size, by simply computing the fraction of memory accesses with a stack distance greater than the desired cache size.
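This computation can be sketched as follows. The trace and the convention that an access hits only when its stack distance is strictly smaller than the cache size (in lines) are illustrative assumptions, not data or definitions from the paper:

```python
import math

def miss_ratio(stack_distances, cache_lines):
    """Fraction of accesses whose stack distance is at least the cache
    size in lines; under LRU these accesses miss. Cold misses are
    represented by an infinite stack distance."""
    misses = sum(1 for d in stack_distances if d >= cache_lines)
    return misses / len(stack_distances)

# Hypothetical per-access stack distances for a short trace.
trace = [0, 1, 3, 7, 2, 15, 1, math.inf, 4, 9]

# The same distribution yields the miss ratio for any cache size.
for lines in (4, 8, 16):
    print(f"{lines} lines -> miss ratio {miss_ratio(trace, lines):.2f}")
```

Because the full distribution is measured once, sweeping the cache size is just a recount, which is what makes the StatStack-style modeling cheap.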
Q9. How can the authors reclassify applications based on their replacement ratios?
Using a modified StatStack implementation, the authors can reclassify applications based on their replacement ratios after applying cache management; this allows them to reason about how cache management impacts performance.
Q10. How can the authors determine whether the next access to the data used by an instruction will be a cache miss?
By looking at the forward stack distances of an instruction, the authors can easily determine whether the next access to the data used by that instruction will be a cache miss, i.e., whether the instruction is non-temporal.