scispace - formally typeset
Search or ask a question
Topic

Cache pollution

About: Cache pollution is a research topic. Over the lifetime, 11353 publications have been published within this topic receiving 262139 citations.


Papers
More filters
Proceedings ArticleDOI
04 Dec 2010
TL;DR: The zcache is presented, a cache design that allows much higher associativity than the number of physical ways, and it is shown that zcaches provide higher performance and better energy efficiency than conventional caches without incurring the overheads of designs with a large number of ways.
Abstract: The ever-increasing importance of main memory latency and bandwidth is pushing CMPs towards caches with higher capacity and associativity. Associativity is typically improved by increasing the number of ways. This reduces conflict misses, but increases hit latency and energy, placing a stringent trade-off on cache design. We present the zcache, a cache design that allows much higher associativity than the number of physical ways (e.g. a 64-associative cache with 4 ways). The zcache draws on previous research on skew-associative caches and cuckoo hashing. Hits, the common case, require a single lookup, incurring the latency and energy costs of a cache with a very low number of ways. On a miss, additional tag lookups happen off the critical path, yielding an arbitrarily large number of replacement candidates for the incoming block. Unlike conventional designs, the zcache provides associativity by increasing the number of replacement candidates, but not the number of cache ways. To understand the implications of this approach, we develop a general analysis framework that allows to compare associativity across different cache designs (e.g. a set-associative cache and a zcache) by representing associativity as a probability distribution. We use this framework to show that for zcaches, associativity depends only on the number of replacement candidates, and is independent of other factors (such as the number of cache ways or the workload). We also show that, for the same number of replacement candidates, the associativity of a zcache is superior than that of a set-associative cache for most workloads. Finally, we perform detailed simulations of multithreaded and multiprogrammed workloads on a large-scale CMP with zcache as the last-level cache. We show that zcaches provide higher performance and better energy efficiency than conventional caches without incurring the overheads of designs with a large number of ways.

203 citations

Patent
16 May 1991
TL;DR: In this paper, a microprocessor partially decodes instructions retrieved from main memory before placing them into the microprocessor's integrated instruction cache, each storage location in the instruction cache includes two slots for decoded instructions.
Abstract: A microprocessor partially decodes instructions retrieved from main memory before placing them into the microprocessor's integrated instruction cache. Each storage location in the instruction cache includes two slots for decoded instructions. One slot controls one of the microprocessor's integer pipelines and a port to the microprocessor's data cache. A second slot controls the second integer pipeline or one of the microprocessor's floating point units. The instructions retrieved from main memory are decoded by a loader unit which decodes the instructions from the compact form as stored in main memory and places them into the two slots of the instruction cache entry according to their functions. In addition, auxiliary information is placed in the cache entry along with the instruction to control parallel execution as well as emulation of complex instructions. A bit in each instruction cache entry indicates whether the instructions in the two slots are independent, so that they can be executed in parallel, or dependent, so that they must be executed sequentially. Using a single bit for this purpose allows two dependent instructions to be stored in the slots of the single cache entry.

203 citations

Journal ArticleDOI
TL;DR: This work presents the first quantitative, analytical results for the predictability of replacement policies, and introduces three metrics, evict, fill, and mls that capture aspects of cache-state predictability.
Abstract: Hard real-time systems must obey strict timing constraints. Therefore, one needs to derive guarantees on the worst-case execution times of a system's tasks. In this context, predictable behavior of system components is crucial for the derivation of tight and thus useful bounds. This paper presents results about the predictability of common cache replacement policies. To this end, we introduce three metrics, evict, fill, and mls that capture aspects of cache-state predictability. A thorough analysis of the LRU, FIFO, MRU, and PLRU policies yields the respective values under these metrics. To the best of our knowledge, this work presents the first quantitative, analytical results for the predictability of replacement policies. Our results support empirical evidence in static cache analysis.

202 citations

Proceedings Article
Arun Iyengar1, Jim Challenger1
08 Dec 1997
TL;DR: The DynamicWeb cache is analyzed, which resulted in near-optimal performance for many cases and 58% of optimal performance in the worst case on systems which invoke server programs via CGI.
Abstract: Dynamic Web pages can seriously reduce the performance of Web servers. One technique for improving performance is to cache dynamic Web pages. We have developed the Dynamic Web cache which is particularly well-suited for dynamic pages. Our cache has improved performance significantly at several commercial Web sites. This paper analyzes the design and performance of the DynamicWeb cache. It also presents a model for analyzing overall system performance in the presence of caching. Our cache can satisfy several hundred requests per second. On systems which invoke server programs via CGI, the DynamicWeb cache results in near-optimal performance, where optimal performance is that which would be achieved by a hypothetical cache which consumed no CPU cycles. On a system we tested which invoked server programs via ICAPI which has significantly less overhead than CGI, the DynamicWeb cache resulted in near-optimal performance for many cases and 58% of optimal performance in the worst case. The DynamicWeb cache achieved a hit rate of around 80% when it was deployed to support the official Internet Web site for the 1996 Atlanta Olympic games.

201 citations

Proceedings ArticleDOI
05 Jan 1997
TL;DR: In this article, the effect of cache misses on the performance of sorting algorithms was investigated both experimentally and analytically, and it was shown that high cache miss penalties lead to worse overall performance than the efficient comparison based sorting algorithms.
Abstract: We investigate the effect that caches have on the performance of sorting algorithms both experimentally and analytically. To address the performance problems that high cache miss penalties introduce we restructure mergesort, quicksort, and heapsort in order to improve their cache locality. For all three algorithms the improvement in cache performance leads to a reduction in total execution time. We also investigate the performance of radix sort. Despite the extremely low instruction count incurred by this linear time sorting algorithm, its relatively poor cache performance results in worse overall performance than the efficient comparison based sorting algorithms. For each algorithm we provide an analysis that closely predicts the number of cache misses incurred by the algorithm.

200 citations


Network Information
Related Topics (5)
Cache
59.1K papers, 976.6K citations
93% related
Compiler
26.3K papers, 578.5K citations
89% related
Scalability
50.9K papers, 931.6K citations
87% related
Server
79.5K papers, 1.4M citations
86% related
Static routing
25.7K papers, 576.7K citations
84% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202342
2022110
202112
202020
201915
201830