Topic

Cache pollution

About: Cache pollution is a research topic. Over its lifetime, 11,353 publications have been published within this topic, receiving 262,139 citations.


Papers
Proceedings ArticleDOI
01 Dec 2012
TL;DR: A new way to use dynamic reuse distances to further improve cache management policies is proposed, including a replacement policy that prevents replacing a cache line until a certain number of accesses to its cache set, called the Protecting Distance (PD), have occurred.
Abstract: Cache management policies such as replacement, bypass, or shared cache partitioning have been relying on data reuse behavior to predict the future. This paper proposes a new way to use dynamic reuse distances to further improve such policies. A new replacement policy is proposed which prevents replacing a cache line until a certain number of accesses to its cache set, called a Protecting Distance (PD). The policy protects a cache line long enough for it to be reused, but not beyond that to avoid cache pollution. This can be combined with a bypass mechanism that also relies on dynamic reuse analysis to bypass lines with less expected reuse. A miss fetch is bypassed if there are no unprotected lines. A hit rate model based on dynamic reuse history is proposed and the PD that maximizes the hit rate is dynamically computed. The PD is recomputed periodically to track a program's memory access behavior and phases. Next, a new multi-core cache partitioning policy is proposed using the concept of protection. It manages lifetimes of lines from different cores (threads) in such a way that the overall hit rate is maximized. The average per-thread lifetime is reduced by decreasing the thread's PD. The single-core PD-based replacement policy with bypass achieves an average speedup of 4.2% over the DIP policy, while the average speedups over DIP are 1.5% for dynamic RRIP (DRRIP) and 1.6% for sampling dead-block prediction (SDP). The 16-core PD-based partitioning policy improves the average weighted IPC by 5.2%, throughput by 6.4% and fairness by 9.9% over thread-aware DRRIP (TA-DRRIP). The required hardware is evaluated and the overhead is shown to be manageable.
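To make the protection mechanism concrete, below is a minimal single-set sketch of PD-style replacement (an illustrative reconstruction, not the paper's hardware design): every access to the set ages all resident lines, a hit re-arms the accessed line's remaining protecting distance, and a miss either fills or replaces an unprotected line, or bypasses the cache when every line is still protected. The way count and PD value are assumed for illustration, and the dynamic PD computation from the hit-rate model is omitted.

# Minimal sketch of Protecting-Distance-based replacement for one cache set.
# The protecting distance (PD) and the way count are illustrative parameters,
# not values taken from the paper.

class PDSet:
    def __init__(self, num_ways=16, pd=64):
        self.pd = pd                      # protecting distance, in set accesses
        self.num_ways = num_ways
        self.lines = {}                   # tag -> remaining protecting distance (RPD)

    def access(self, tag):
        # Every access to the set ages all resident lines by one.
        for t in self.lines:
            if self.lines[t] > 0:
                self.lines[t] -= 1

        if tag in self.lines:             # hit: re-protect the line
            self.lines[tag] = self.pd
            return "hit"

        # Miss: fill a free way, or evict an unprotected line (RPD == 0).
        if len(self.lines) < self.num_ways:
            self.lines[tag] = self.pd
            return "miss-fill"
        victims = [t for t, rpd in self.lines.items() if rpd == 0]
        if victims:
            del self.lines[victims[0]]
            self.lines[tag] = self.pd
            return "miss-replace"
        return "miss-bypass"              # all lines protected: bypass the fetch

In the full policy the PD is not fixed but recomputed periodically from sampled reuse-distance history so that the modeled hit rate is maximized; that feedback loop is left out of the sketch.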

159 citations

Proceedings ArticleDOI
02 Apr 2004
TL;DR: The results show that the PLRU techniques can approximate and even outperform LRU with much lower complexity for a wide range of cache organizations; however, a relatively large gap between LRU and the optimal replacement policy, of up to 50%, indicates that new research aimed at closing the gap is necessary.
Abstract: Replacement policy, one of the key factors determining the effectiveness of a cache, becomes even more important with the latest technological trends toward highly associative caches. State-of-the-art processors employ various policies such as Random, Least Recently Used (LRU), Round-Robin, and PLRU (Pseudo LRU), indicating that there is no common wisdom about the best one. The optimal yet unattainable policy would replace the cache memory block whose next reference is the farthest away in the future, among all memory blocks present in the set. In our quest for a replacement policy as close to optimal as possible, we thoroughly explored the design space of existing replacement mechanisms using the SimpleScalar toolset and the SPEC CPU2000 benchmark suite, across a wide range of cache sizes and organizations. In order to better understand the behavior of different policies, we introduced new measures, such as the cumulative distribution of cache hits in the LRU stack. We also dynamically monitored the number of cache misses per 100,000 instructions. Our results show that the PLRU techniques can approximate and even outperform LRU with much lower complexity, for a wide range of cache organizations. However, a relatively large gap between LRU and the optimal replacement policy, of up to 50%, indicates that new research aimed at closing the gap is necessary. The cumulative distribution of cache hits in the LRU stack indicates a very good potential for way prediction using LRU information, since the percentage of hits to the bottom of the LRU stack is relatively high.
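For reference on the complexity argument, here is a small sketch of tree-based PLRU for a 4-way set, which needs only three direction bits per set instead of a full recency ordering. It illustrates the general technique rather than the paper's implementation; the paper also evaluates an MRU-bit PLRU variant.

# Sketch of tree-based PLRU for a 4-way set: three direction bits instead of
# a full LRU ordering. Bit value 0 means "the pseudo-LRU way is in the left
# subtree"; touching a way flips the bits on its path to point away from it.

class TreePLRU4:
    def __init__(self):
        self.bits = [0, 0, 0]   # bits[0] = root, bits[1] = left node, bits[2] = right node

    def touch(self, way):
        # Point the tree away from the way that was just accessed.
        self.bits[0] = 0 if way >= 2 else 1
        if way < 2:
            self.bits[1] = 0 if way == 1 else 1
        else:
            self.bits[2] = 0 if way == 3 else 1

    def victim(self):
        # Follow the bits down to the pseudo-least-recently-used way.
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3

Tracing it by hand, touching ways 0, 1, 2, 3 in order leaves victim() pointing at way 0, the same choice true LRU would make; the approximation can diverge from true LRU under other access patterns, which is the price paid for the lower complexity.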

158 citations

Proceedings ArticleDOI
19 Jun 2014
TL;DR: This paper proposes the memory request prioritization buffer (MRPB), a hardware structure that improves the caching efficiency of massively parallel workloads by applying two prioritization methods, request reordering and cache bypassing, to memory requests before they access a cache.
Abstract: Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches, hoping to provide smoother reductions in memory access traffic and latency. Unfortunately, GPU caches often have mixed or unpredictable performance impact due to cache contention that results from the high thread counts in GPUs. We propose the memory request prioritization buffer (MRPB) to ease GPU programming and improve GPU performance. This hardware structure improves the caching efficiency of massively parallel workloads by applying two prioritization methods, request reordering and cache bypassing, to memory requests before they access a cache. MRPB then releases requests into the cache in a more cache-friendly order. The result is drastically reduced cache contention and improved use of the limited per-thread cache capacity. For a simulated 16KB L1 cache, MRPB improves the average performance of the entire PolyBench and Rodinia suites by 2.65× and 1.27×, respectively, outperforming a state-of-the-art GPU cache management technique.
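The following is a rough software sketch of the reordering-plus-bypassing idea, assuming a hypothetical cache interface (is_blocked, access, bypass) and per-warp FIFO signatures; the queue count and the bypass trigger are illustrative assumptions rather than the MRPB parameters evaluated in the paper.

# Rough sketch of MRPB-style request reordering in front of a GPU L1 cache.
# Requests are parked in per-warp FIFOs and drained one FIFO at a time, so a
# warp's accesses reach the cache clustered together instead of interleaved
# with other warps' accesses.

from collections import deque

class HypotheticalCache:
    # Stand-in cache interface, only so the sketch runs end to end.
    def is_blocked(self, addr):
        return False                         # pretend the target set has free resources
    def access(self, req):
        print("cache access", hex(req["addr"]))
    def bypass(self, req):
        print("bypass to memory", hex(req["addr"]))

class MRPB:
    def __init__(self, num_queues=8):
        self.queues = [deque() for _ in range(num_queues)]

    def push(self, req):
        # Hash each request to a FIFO by its warp id (its "signature").
        self.queues[req["warp_id"] % len(self.queues)].append(req)

    def drain(self, cache):
        # Fixed-priority drain: empty one FIFO before touching the next,
        # releasing requests into the cache in a more cache-friendly order.
        for q in self.queues:
            while q:
                req = q.popleft()
                if cache.is_blocked(req["addr"]):
                    cache.bypass(req)        # contended: go around the cache
                else:
                    cache.access(req)

mrpb = MRPB()
for wid, addr in [(0, 0x100), (1, 0x900), (0, 0x140), (1, 0x940)]:
    mrpb.push({"warp_id": wid, "addr": addr})
mrpb.drain(HypotheticalCache())              # warp 0's requests drain together, then warp 1's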

158 citations

Journal ArticleDOI
01 May 2002
TL;DR: The extent to which detailed timing characteristics of past memory reference events are strongly predictive of future program reference behavior is shown, and a family of timekeeping techniques is proposed that optimizes behavior based on observations about particular cache time durations, such as the cache access interval or the cache dead time.
Abstract: Techniques for analyzing and improving memory referencing behavior continue to be important for achieving good overall program performance due to the ever-increasing performance gap between processors and main memory. This paper offers a fresh perspective on the problem of predicting and optimizing memory behavior. Namely, we show quantitatively the extent to which detailed timing characteristics of past memory reference events are strongly predictive of future program reference behavior. We propose a family of timekeeping techniques that optimize behavior based on observations about particular cache time durations, such as the cache access interval or the cache dead time. Timekeeping techniques can be used to build small, simple, and high-accuracy (often 90% or more) predictors for identifying conflict misses, for predicting dead blocks, and even for estimating the time at which the next reference to a cache frame will occur and the address that will be accessed. Based on these predictors, we demonstrate two new and complementary time-based hardware structures: (1) a time-based victim cache that improves performance by storing only conflict miss lines with likely reuse, and (2) a time-based prefetching technique that homes in on the right address to prefetch, and the right time to schedule the prefetch. Our victim cache technique improves performance over previous proposals by better selection of what to place in the victim cache. Our prefetching technique outperforms similar prior hardware prefetching proposals, despite being orders of magnitude smaller. Overall, these techniques improve performance by more than 11% across the SPEC2000 benchmark suite.
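The dead-block portion of the idea can be sketched as a per-block timer: a block that has sat idle for much longer than its own recent access interval is predicted dead. The threshold multiplier below is an illustrative assumption, not the paper's calibrated value, and the victim-cache and prefetch-scheduling uses of the same timing information are not shown.

# Minimal sketch of a timekeeping-style dead-block test. Dead times tend to
# dwarf access intervals, so a small multiple of a block's own observed
# interval makes a simple cutoff for declaring it dead.

class BlockTimer:
    def __init__(self, dead_multiplier=2):
        self.last_access = 0         # cycle of the most recent reference
        self.access_interval = 0     # cycles between the last two references
        self.dead_multiplier = dead_multiplier

    def touch(self, now):
        # Record a reference to the block at cycle `now`.
        self.access_interval = now - self.last_access
        self.last_access = now

    def predicted_dead(self, now):
        # Predict dead once the idle time exceeds a multiple of the interval.
        idle = now - self.last_access
        return self.access_interval > 0 and idle > self.dead_multiplier * self.access_interval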

157 citations

Book
03 Jan 1990
TL;DR: This book addresses the cache design problem and its solution, covering performance-directed cache design and multi-level cache hierarchies.
Abstract: 1. Introduction 2. Background Material 3. The Cache Design Problem and Its Solution 4. Performance-Directed Cache Design 5. Multi-Level Cache Hierarchies 6. Summary, Implications and Conclusions Appendix A. Validation of the Empirical Results Appendix B. Modelling Write Strategy Effects

156 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 93% related
Compiler: 26.3K papers, 578.5K citations, 89% related
Scalability: 50.9K papers, 931.6K citations, 87% related
Server: 79.5K papers, 1.4M citations, 86% related
Static routing: 25.7K papers, 576.7K citations, 84% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    42
2022    110
2021    12
2020    20
2019    15
2018    30