Topic

Cache pollution

About: Cache pollution is a research topic. Over the lifetime, 11,353 publications have been published within this topic, receiving 262,139 citations.


Papers
Proceedings ArticleDOI
09 Jan 1999
TL;DR: It is shown that for the first two optimizations, instruction-based prediction, using few predictor entries per node, outpaces address-based schemes, and that for the producer-consumer optimization, which uses speculative execution, low misspeculation rates show promise for performance improvements.
Abstract: We propose Instruction-based Prediction as a means to optimize directory-based cache-coherent NUMA shared memory. Instruction-based prediction is based on observing the behavior of load and store instructions in relation to coherence events and predicting their future behavior. Although this technique is well established in the uniprocessor world, it has not been widely applied to optimizing transparent shared memory. Typically, in this environment, prediction is based on data-block access history (address-based prediction) in the form of adaptive cache coherence protocols. The advantage of instruction-based prediction is that it requires few hardware resources, in the form of small prediction structures per node, to match (or exceed) the performance of address-based prediction. To show the potential of instruction-based prediction we propose and evaluate three different optimizations: (i) a migratory-sharing optimization, (ii) a wide-sharing optimization, and (iii) a producer-consumer optimization based on speculative execution. With execution-driven simulation and a set of nine benchmarks we show that (i) for the first two optimizations, instruction-based prediction, using few predictor entries per node, outpaces address-based schemes, and (ii) for the producer-consumer optimization, which uses speculative execution, low misspeculation rates show promise for performance improvements.
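The core mechanism is easy to picture in code. Below is a minimal sketch of a PC-indexed (instruction-based) migratory-sharing predictor of the kind the paper argues for, assuming a small direct-mapped table of 2-bit saturating counters per node; the table size, training event, and all identifiers are illustrative assumptions, not taken from the paper.

```c
/* Minimal sketch: per-node, PC-indexed migratory-sharing predictor
 * built from 2-bit saturating counters (names and sizes are illustrative). */
#include <stdint.h>
#include <stdbool.h>

#define PRED_ENTRIES 64                    /* "few predictor entries per node" */

static uint8_t counters[PRED_ENTRIES];     /* 2-bit saturating counters */

static unsigned pred_index(uint64_t pc)
{
    return (unsigned)((pc >> 2) & (PRED_ENTRIES - 1));  /* drop alignment bits */
}

/* Train on coherence feedback: did the block touched by this instruction
 * later show the read-then-write-by-another-node (migratory) pattern? */
void pred_train(uint64_t pc, bool was_migratory)
{
    unsigned i = pred_index(pc);
    if (was_migratory) {
        if (counters[i] < 3) counters[i]++;
    } else {
        if (counters[i] > 0) counters[i]--;
    }
}

/* On the next miss by the same instruction, predict whether to request
 * exclusive ownership up front instead of issuing a plain read. */
bool pred_migratory(uint64_t pc)
{
    return counters[pred_index(pc)] >= 2;
}
```

Because the table is indexed by instruction address rather than data address, its size is independent of the working set, which is why such small structures can keep up with address-based history tables.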

95 citations

Proceedings ArticleDOI
30 Nov 2008
TL;DR: This paper proposes a safe static instruction cache analysis method for multi-level non-inclusive caches, and shows that in all cases WCET estimations are much tighter when considering the cache hierarchy than when considering only the L1 cache.
Abstract: With the advent of increasingly complex hardware in real-time embedded systems (processors with performance enhancing features such as pipelines, cache hierarchy, multiple cores), many processors now have a set-associative L2 cache. Thus, there is a need for considering cache hierarchies when validating the temporal behavior of real-time systems, in particular when estimating tasks' worst-case execution times (WCETs). In this paper, we propose a safe static instruction cache analysis method for multi-level non-inclusive caches. The proposed method is evaluated on medium-size and large programs. We show that the method is reasonably tight. We further show that in all cases WCET estimations are much tighter when considering the cache hierarchy than when considering only the L1 cache. An evaluation of the analysis time is conducted, demonstrating that analyzing the cache hierarchy has a reasonable computation time.
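To make the kind of analysis involved concrete, here is a minimal sketch of the "must" abstract domain commonly used for static LRU instruction-cache analysis in WCET estimation (one cache set, abstract-interpretation style); the associativity, block numbering, and names are assumptions for illustration, not the paper's multi-level method itself.

```c
/* Minimal sketch of a "must" abstract cache state for one set of an
 * A-way LRU cache; a block present in this state is guaranteed cached
 * on every path, so the access can be classified "always hit". */
#define WAYS       4        /* associativity of one cache set (assumed) */
#define MAX_BLOCKS 16       /* program blocks mapping to this set (assumed) */
#define NOT_CACHED (-1)

typedef struct {
    int age[MAX_BLOCKS];    /* upper bound on LRU age, or NOT_CACHED */
} must_state;

void must_init(must_state *s)
{
    for (int b = 0; b < MAX_BLOCKS; b++)
        s->age[b] = NOT_CACHED;
}

/* Abstract LRU update: accessing block b ages every block that was
 * younger than b, then puts b at age 0. */
void must_access(must_state *s, int b)
{
    int old = (s->age[b] == NOT_CACHED) ? WAYS : s->age[b];
    for (int c = 0; c < MAX_BLOCKS; c++) {
        if (c != b && s->age[c] != NOT_CACHED && s->age[c] < old) {
            s->age[c]++;
            if (s->age[c] >= WAYS)
                s->age[c] = NOT_CACHED;      /* may have been evicted */
        }
    }
    s->age[b] = 0;
}

/* Join at control-flow merges: a block stays guaranteed only if it is
 * guaranteed on both incoming paths, with the more pessimistic age. */
void must_join(must_state *out, const must_state *a, const must_state *b)
{
    for (int c = 0; c < MAX_BLOCKS; c++)
        out->age[c] =
            (a->age[c] == NOT_CACHED || b->age[c] == NOT_CACHED)
                ? NOT_CACHED
                : (a->age[c] > b->age[c] ? a->age[c] : b->age[c]);
}

/* Classification used by the WCET bound: "always hit" if the block is
 * guaranteed cached in the state reached just before the access. */
int always_hit(const must_state *s, int b)
{
    return s->age[b] != NOT_CACHED;
}
```

The multi-level extension in the paper additionally decides, per access, whether it may reach the L2 at all, which is what makes non-inclusive hierarchies harder than a single L1 analysis.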

95 citations

Journal ArticleDOI
14 Jun 2014
TL;DR: This paper presents, for the first time, a detailed design-space exploration of caches that utilize statistical compression and shows that more aggressive approaches like Huffman coding, which have been neglected in the past due to the high processing overhead for (de)compression, are suitable techniques for caches and memory.
Abstract: Low utilization of on-chip cache capacity limits performance and wastes energy because of the long latency, limited bandwidth, and energy consumption associated with off-chip memory accesses. Value replication is an important source of low capacity utilization. While prior cache compression techniques manage to code frequent values densely, they trade off a high compression ratio for low decompression latency, thus missing opportunities to utilize capacity more effectively. This paper presents, for the first time, a detailed design-space exploration of caches that utilize statistical compression. We show that more aggressive approaches like Huffman coding, which have been neglected in the past due to the high processing overhead for (de)compression, are suitable techniques for caches and memory. Based on our key observation that value locality varies little over time and across applications, we first demonstrate that the overhead of statistics acquisition for code generation is low because new encodings are needed rarely, making it possible to off-load it to software routines. We then show that the high compression ratio obtained by Huffman coding makes it possible to utilize the performance benefits of 4X larger last-level caches with about 50% lower power consumption than such larger caches.
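A minimal sketch of the key idea follows, assuming a tiny precomputed code table in which frequent 32-bit values get short codewords and every other value pays an escape penalty; the table contents, sizes, and names are illustrative, and in the paper's design the code is generated offline in software from sampled value statistics.

```c
/* Minimal sketch: estimate how well a 64-byte cache line compresses under
 * a statistical (Huffman-style) code where frequent values get short
 * codewords and rare values are emitted as escape + raw bits. */
#include <stdint.h>
#include <stdio.h>

#define LINE_WORDS 16          /* 64-byte line as 16 x 32-bit words */
#define ESCAPE_BITS (4 + 32)   /* assumed escape codeword plus raw value */

/* Tiny illustrative code table: (value, codeword length in bits). */
static const struct { uint32_t value; unsigned bits; } code_table[] = {
    { 0x00000000u, 1 },        /* zero is typically the most frequent value */
    { 0xFFFFFFFFu, 3 },
    { 0x00000001u, 4 },
};

static unsigned codeword_bits(uint32_t v)
{
    for (unsigned i = 0; i < sizeof code_table / sizeof code_table[0]; i++)
        if (code_table[i].value == v)
            return code_table[i].bits;
    return ESCAPE_BITS;        /* value not in the frequent-value code */
}

/* Compressed size of one line, in bits. */
unsigned compressed_line_bits(const uint32_t line[LINE_WORDS])
{
    unsigned bits = 0;
    for (int w = 0; w < LINE_WORDS; w++)
        bits += codeword_bits(line[w]);
    return bits;
}

int main(void)
{
    uint32_t line[LINE_WORDS] = { 0 };    /* mostly-zero line compresses well */
    line[0] = 0xDEADBEEFu;                /* one uncommon word takes the escape path */
    printf("compressed: %u bits of %u\n",
           compressed_line_bits(line), LINE_WORDS * 32u);
    return 0;
}
```

The compression gain comes entirely from how skewed the value distribution is, which is why the paper's observation that value locality is stable over time and across applications matters: the code table rarely needs to be rebuilt.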

95 citations

Patent
31 Oct 2011
TL;DR: In this paper, the authors present a processor with a plurality of cores and a cache memory coupled to the cores and including a plurality of partitions, together with logic that can dynamically vary the size of the cache memory based on the memory boundedness of a workload executed on at least one of the cores.
Abstract: In one embodiment, the present invention is directed to a processor having a plurality of cores and a cache memory coupled to the cores and including a plurality of partitions. The processor can further include a logic to dynamically vary a size of the cache memory based on a memory boundedness of a workload executed on at least one of the cores. Other embodiments are described and claimed.
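As a rough illustration of the claimed control logic, here is a sketch that grows or shrinks the number of active cache partitions from a memory-boundedness metric (misses per kilo-instruction); the thresholds, counters, and the cache_set_active_partitions hook are hypothetical and not taken from the patent.

```c
/* Minimal sketch: vary how many cache partitions stay active based on how
 * memory-bound the running workload is (all names and thresholds assumed). */
#include <stdint.h>
#include <stdio.h>

#define MAX_PARTITIONS 8
#define MIN_PARTITIONS 1

/* Hypothetical hook into the cache hardware; stubbed here for illustration. */
static void cache_set_active_partitions(unsigned n)
{
    printf("active cache partitions: %u\n", n);
}

static unsigned active = MAX_PARTITIONS;

/* Called once per sampling interval with counters read over that interval. */
void resize_cache(uint64_t llc_misses, uint64_t retired_instructions)
{
    /* Misses per kilo-instruction as a crude memory-boundedness metric. */
    uint64_t mpki = (llc_misses * 1000) /
                    (retired_instructions ? retired_instructions : 1);

    if (mpki > 20 && active < MAX_PARTITIONS)
        active++;                 /* memory-bound: grow the cache */
    else if (mpki < 2 && active > MIN_PARTITIONS)
        active--;                 /* compute-bound: shrink and save power */

    cache_set_active_partitions(active);
}
```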

95 citations

Patent
05 May 1980
TL;DR: The addressable cache memory feature overcomes the latency delay which inherently occurs in seeking the beginning of a region to be accessed on the disk drive mass storage in a multiprocessor system as discussed by the authors.
Abstract: In a multiprocessor system, a controllable cache store interface to a shared disk memory employs a plurality of storage partitions whose access is interleaved in a time domain multiplexed manner on a common bus with the shared disk to enable high speed sharing of the disk storage by all of the processors. The communication between each processor and its corresponding cache memory partition can be overlapped with each other and with accesses between the cache memory and the commonly shared disk memory. The addressable cache memory feature overcomes the latency delay which inherently occurs in seeking the beginning of a region to be accessed on the disk drive mass storage.
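A minimal sketch of the time-domain-multiplexed sharing idea is shown below: each shared-bus slot is granted to the next cache partition that has pending disk traffic, in round-robin order, so every processor's cache transfers overlap with the others'. The data structures and names are illustrative, since the patent does not specify this interface.

```c
/* Minimal sketch: round-robin, time-multiplexed grant of a shared bus to
 * per-processor cache partitions fronting a common disk store. */
#define NUM_PROCESSORS 4

typedef struct {
    int owner;               /* processor whose partition this is */
    int pending_requests;    /* outstanding transfers to/from the shared disk */
} cache_partition;

/* Grant one bus slot per call, rotating over the partitions so that each
 * processor's cache traffic is interleaved on the common bus. */
int grant_next_slot(cache_partition parts[NUM_PROCESSORS])
{
    static int next = 0;
    for (int tried = 0; tried < NUM_PROCESSORS; tried++) {
        int p = (next + tried) % NUM_PROCESSORS;
        if (parts[p].pending_requests > 0) {
            parts[p].pending_requests--;
            next = (p + 1) % NUM_PROCESSORS;
            return parts[p].owner;       /* this partition uses the bus now */
        }
    }
    return -1;                           /* bus idle this slot */
}
```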

95 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (93% related)
Compiler: 26.3K papers, 578.5K citations (89% related)
Scalability: 50.9K papers, 931.6K citations (87% related)
Server: 79.5K papers, 1.4M citations (86% related)
Static routing: 25.7K papers, 576.7K citations (84% related)
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    42
2022    110
2021    12
2020    20
2019    15
2018    30