Topic

Cache pollution

About: Cache pollution is a research topic. Over the lifetime, 11,353 publications have been published within this topic, receiving 262,139 citations.


Papers
Proceedings ArticleDOI
09 Jan 1999
TL;DR: It is shown that for the first two optimizations, instruction-based prediction, using few predictor entries per node, outpaces address-based schemes, and that for the producer-consumer optimization, which uses speculative execution, low misspeculation rates show promise for performance improvements.
Abstract: We propose Instruction-based Prediction as a means to optimize directory-based cache-coherent NUMA shared memory. Instruction-based prediction is based on observing the behavior of load and store instructions in relation to coherence events and predicting their future behavior. Although this technique is well established in the uniprocessor world, it has not been widely applied to optimizing transparent shared memory. Typically, in this environment, prediction is based on data-block access history (address-based prediction) in the form of adaptive cache coherence protocols. The advantage of instruction-based prediction is that it requires few hardware resources, in the form of small prediction structures per node, to match (or exceed) the performance of address-based prediction. To show the potential of instruction-based prediction we propose and evaluate three different optimizations: (i) a migratory-sharing optimization, (ii) a wide-sharing optimization, and (iii) a producer-consumer optimization based on speculative execution. With execution-driven simulation and a set of nine benchmarks we show that (i) for the first two optimizations, instruction-based prediction, using few predictor entries per node, outpaces address-based schemes, and (ii) for the producer-consumer optimization, which uses speculative execution, low misspeculation rates show promise for performance improvements.
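The core mechanism is easy to picture in code. Below is a minimal sketch of a PC-indexed (instruction-based) migratory-sharing predictor of the kind the paper argues for, assuming a small direct-mapped table of 2-bit saturating counters per node; the table size, training event, and all identifiers are illustrative assumptions, not taken from the paper.

```c
/* Minimal sketch: per-node, PC-indexed migratory-sharing predictor
 * built from 2-bit saturating counters (names and sizes are illustrative). */
#include <stdint.h>
#include <stdbool.h>

#define PRED_ENTRIES 64                    /* "few predictor entries per node" */

static uint8_t counters[PRED_ENTRIES];     /* 2-bit saturating counters */

static unsigned pred_index(uint64_t pc)
{
    return (unsigned)((pc >> 2) & (PRED_ENTRIES - 1));  /* drop alignment bits */
}

/* Train on coherence feedback: did the block touched by this instruction
 * later show the read-then-write-by-another-node (migratory) pattern? */
void pred_train(uint64_t pc, bool was_migratory)
{
    unsigned i = pred_index(pc);
    if (was_migratory) {
        if (counters[i] < 3) counters[i]++;
    } else {
        if (counters[i] > 0) counters[i]--;
    }
}

/* On the next miss by the same instruction, predict whether to request
 * exclusive ownership up front instead of issuing a plain read. */
bool pred_migratory(uint64_t pc)
{
    return counters[pred_index(pc)] >= 2;
}
```

Because the table is indexed by instruction address rather than data address, its size is independent of the working set, which is why such small structures can keep up with address-based history tables.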

95 citations

Proceedings ArticleDOI
30 Nov 2008
TL;DR: This paper proposes a safe static instruction cache analysis method for multi-level non-inclusive caches, and shows that in all cases WCET estimations are much tighter when considering the cache hierarchy than when considering only the L1 cache.
Abstract: With the advent of increasingly complex hardware in real-time embedded systems (processors with performance enhancing features such as pipelines, cache hierarchy, multiple cores), many processors now have a set-associative L2 cache. Thus, there is a need for considering cache hierarchies when validating the temporal behavior of real-time systems, in particular when estimating tasks' worst-case execution times (WCETs). In this paper, we propose a safe static instruction cache analysis method for multi-level non-inclusive caches. The proposed method is evaluated on medium-size and large programs. We show that the method is reasonably tight. We further show that in all cases WCET estimations are much tighter when considering the cache hierarchy than when considering only the L1 cache. An evaluation of the analysis time is conducted, demonstrating that analyzing the cache hierarchy has a reasonable computation time.
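To make the kind of analysis involved concrete, here is a minimal sketch of the "must" abstract domain commonly used for static LRU instruction-cache analysis in WCET estimation (one cache set, abstract-interpretation style); the associativity, block numbering, and names are assumptions for illustration, not the paper's multi-level method itself.

```c
/* Minimal sketch of a "must" abstract cache state for one set of an
 * A-way LRU cache; a block present in this state is guaranteed cached
 * on every path, so the access can be classified "always hit". */
#define WAYS       4        /* associativity of one cache set (assumed) */
#define MAX_BLOCKS 16       /* program blocks mapping to this set (assumed) */
#define NOT_CACHED (-1)

typedef struct {
    int age[MAX_BLOCKS];    /* upper bound on LRU age, or NOT_CACHED */
} must_state;

void must_init(must_state *s)
{
    for (int b = 0; b < MAX_BLOCKS; b++)
        s->age[b] = NOT_CACHED;
}

/* Abstract LRU update: accessing block b ages every block that was
 * younger than b, then puts b at age 0. */
void must_access(must_state *s, int b)
{
    int old = (s->age[b] == NOT_CACHED) ? WAYS : s->age[b];
    for (int c = 0; c < MAX_BLOCKS; c++) {
        if (c != b && s->age[c] != NOT_CACHED && s->age[c] < old) {
            s->age[c]++;
            if (s->age[c] >= WAYS)
                s->age[c] = NOT_CACHED;      /* may have been evicted */
        }
    }
    s->age[b] = 0;
}

/* Join at control-flow merges: a block stays guaranteed only if it is
 * guaranteed on both incoming paths, with the more pessimistic age. */
void must_join(must_state *out, const must_state *a, const must_state *b)
{
    for (int c = 0; c < MAX_BLOCKS; c++)
        out->age[c] =
            (a->age[c] == NOT_CACHED || b->age[c] == NOT_CACHED)
                ? NOT_CACHED
                : (a->age[c] > b->age[c] ? a->age[c] : b->age[c]);
}

/* Classification used by the WCET bound: "always hit" if the block is
 * guaranteed cached in the state reached just before the access. */
int always_hit(const must_state *s, int b)
{
    return s->age[b] != NOT_CACHED;
}
```

The multi-level extension in the paper additionally decides, per access, whether it may reach the L2 at all, which is what makes non-inclusive hierarchies harder than a single L1 analysis.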

95 citations

Journal ArticleDOI
14 Jun 2014
TL;DR: This paper presents, for the first time, a detailed design-space exploration of caches that utilize statistical compression and shows that more aggressive approaches like Huffman coding, which have been neglected in the past due to the high processing overhead for (de)compression, are suitable techniques for caches and memory.
Abstract: Low utilization of on-chip cache capacity limits performance and wastes energy because of the long latency, limited bandwidth, and energy consumption associated with off-chip memory accesses. Value replication is an important source of low capacity utilization. While prior cache compression techniques manage to code frequent values densely, they trade off a high compression ratio for low decompression latency, thus missing opportunities to utilize capacity more effectively. This paper presents, for the first time, a detailed design-space exploration of caches that utilize statistical compression. We show that more aggressive approaches like Huffman coding, which have been neglected in the past due to the high processing overhead for (de)compression, are suitable techniques for caches and memory. Based on our key observation that value locality varies little over time and across applications, we first demonstrate that the overhead of statistics acquisition for code generation is low because new encodings are needed rarely, making it possible to off-load it to software routines. We then show that the high compression ratio obtained by Huffman coding makes it possible to utilize the performance benefits of 4X larger last-level caches with about 50% lower power consumption than such larger caches.
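A minimal sketch of the key idea follows, assuming a tiny precomputed code table in which frequent 32-bit values get short codewords and every other value pays an escape penalty; the table contents, sizes, and names are illustrative, and in the paper's design the code is generated offline in software from sampled value statistics.

```c
/* Minimal sketch: estimate how well a 64-byte cache line compresses under
 * a statistical (Huffman-style) code where frequent values get short
 * codewords and rare values are emitted as escape + raw bits. */
#include <stdint.h>
#include <stdio.h>

#define LINE_WORDS 16          /* 64-byte line as 16 x 32-bit words */
#define ESCAPE_BITS (4 + 32)   /* assumed escape codeword plus raw value */

/* Tiny illustrative code table: (value, codeword length in bits). */
static const struct { uint32_t value; unsigned bits; } code_table[] = {
    { 0x00000000u, 1 },        /* zero is typically the most frequent value */
    { 0xFFFFFFFFu, 3 },
    { 0x00000001u, 4 },
};

static unsigned codeword_bits(uint32_t v)
{
    for (unsigned i = 0; i < sizeof code_table / sizeof code_table[0]; i++)
        if (code_table[i].value == v)
            return code_table[i].bits;
    return ESCAPE_BITS;        /* value not in the frequent-value code */
}

/* Compressed size of one line, in bits. */
unsigned compressed_line_bits(const uint32_t line[LINE_WORDS])
{
    unsigned bits = 0;
    for (int w = 0; w < LINE_WORDS; w++)
        bits += codeword_bits(line[w]);
    return bits;
}

int main(void)
{
    uint32_t line[LINE_WORDS] = { 0 };    /* mostly-zero line compresses well */
    line[0] = 0xDEADBEEFu;                /* one uncommon word takes the escape path */
    printf("compressed: %u bits of %u\n",
           compressed_line_bits(line), LINE_WORDS * 32u);
    return 0;
}
```

The compression gain comes entirely from how skewed the value distribution is, which is why the paper's observation that value locality is stable over time and across applications matters: the code table rarely needs to be rebuilt.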

95 citations

Patent
31 Oct 2011
TL;DR: In this paper, the authors present a processor with a plurality of cores and a cache memory coupled to the cores and including a plurality of partitions, together with logic that can dynamically vary the size of the cache memory based on the memory boundedness of a workload executed on at least one of the cores.
Abstract: In one embodiment, the present invention is directed to a processor having a plurality of cores and a cache memory coupled to the cores and including a plurality of partitions. The processor can further include a logic to dynamically vary a size of the cache memory based on a memory boundedness of a workload executed on at least one of the cores. Other embodiments are described and claimed.
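As a rough illustration of the claimed control logic, here is a sketch that grows or shrinks the number of active cache partitions from a memory-boundedness metric (misses per kilo-instruction); the thresholds, counters, and the cache_set_active_partitions hook are hypothetical and not taken from the patent.

```c
/* Minimal sketch: vary how many cache partitions stay active based on how
 * memory-bound the running workload is (all names and thresholds assumed). */
#include <stdint.h>
#include <stdio.h>

#define MAX_PARTITIONS 8
#define MIN_PARTITIONS 1

/* Hypothetical hook into the cache hardware; stubbed here for illustration. */
static void cache_set_active_partitions(unsigned n)
{
    printf("active cache partitions: %u\n", n);
}

static unsigned active = MAX_PARTITIONS;

/* Called once per sampling interval with counters read over that interval. */
void resize_cache(uint64_t llc_misses, uint64_t retired_instructions)
{
    /* Misses per kilo-instruction as a crude memory-boundedness metric. */
    uint64_t mpki = (llc_misses * 1000) /
                    (retired_instructions ? retired_instructions : 1);

    if (mpki > 20 && active < MAX_PARTITIONS)
        active++;                 /* memory-bound: grow the cache */
    else if (mpki < 2 && active > MIN_PARTITIONS)
        active--;                 /* compute-bound: shrink and save power */

    cache_set_active_partitions(active);
}
```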

95 citations

Patent
05 May 1980
TL;DR: The addressable cache memory feature overcomes the latency delay which inherently occurs in seeking the beginning of a region to be accessed on the disk drive mass storage in a multiprocessor system as discussed by the authors.
Abstract: In a multiprocessor system, a controllable cache store interface to a shared disk memory employs a plurality of storage partitions whose access is interleaved in a time domain multiplexed manner on a common bus with the shared disk to enable high speed sharing of the disk storage by all of the processors. The communication between each processor and its corresponding cache memory partition can be overlapped with each other and with accesses between the cache memory and the commonly shared disk memory. The addressable cache memory feature overcomes the latency delay which inherently occurs in seeking the beginning of a region to be accessed on the disk drive mass storage.
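A minimal sketch of the time-domain-multiplexed sharing idea is shown below: each shared-bus slot is granted to the next cache partition that has pending disk traffic, in round-robin order, so every processor's cache transfers overlap with the others'. The data structures and names are illustrative, since the patent does not specify this interface.

```c
/* Minimal sketch: round-robin, time-multiplexed grant of a shared bus to
 * per-processor cache partitions fronting a common disk store. */
#define NUM_PROCESSORS 4

typedef struct {
    int owner;               /* processor whose partition this is */
    int pending_requests;    /* outstanding transfers to/from the shared disk */
} cache_partition;

/* Grant one bus slot per call, rotating over the partitions so that each
 * processor's cache traffic is interleaved on the common bus. */
int grant_next_slot(cache_partition parts[NUM_PROCESSORS])
{
    static int next = 0;
    for (int tried = 0; tried < NUM_PROCESSORS; tried++) {
        int p = (next + tried) % NUM_PROCESSORS;
        if (parts[p].pending_requests > 0) {
            parts[p].pending_requests--;
            next = (p + 1) % NUM_PROCESSORS;
            return parts[p].owner;       /* this partition uses the bus now */
        }
    }
    return -1;                           /* bus idle this slot */
}
```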

95 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (93% related)
Compiler: 26.3K papers, 578.5K citations (89% related)
Scalability: 50.9K papers, 931.6K citations (87% related)
Server: 79.5K papers, 1.4M citations (86% related)
Static routing: 25.7K papers, 576.7K citations (84% related)
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    42
2022    110
2021    12
2020    20
2019    15
2018    30