
Cache pollution

About: Cache pollution is a research topic. Over the lifetime of the topic, 11,353 publications have been published, receiving 262,139 citations.


Papers
Dissertation
01 Jan 1989
TL;DR: Measurements of actual supercomputer cache performance have not been previously undertaken; PFC-Sim, a program-driven event-tracing facility that can simulate data cache performance of very long programs, is used to measure the performance of various cache structures.
Abstract: Measurements of actual supercomputer cache performance have not been previously undertaken. PFC-Sim is a program-driven event-tracing facility that can simulate data cache performance of very long programs. PFC-Sim simulates the cache concurrently with program execution, allowing very long traces to be used; programs with traces in excess of 4 billion entries have been used to measure the performance of various cache structures. PFC-Sim was used to measure the cache performance of array references in RiCEPS, a benchmark set of supercomputer applications. Data cache hit ratios varied on average between 70% for a 16K cache and 91% for a 256K cache. Programs with very large working sets generate poor cache performance even with large caches. The hit ratios of individual references are measured to be either 0% or 100%. By locating the references that miss, attempts to improve memory performance can focus on references where improvement is possible. The compiler can estimate the number of loop iterations that can execute without filling the cache, the overflow iteration. The overflow iteration, combined with the dependence graph, can be used to determine at each reference whether execution will result in hits or misses. Program transformation can improve cache performance by reordering computation to move references to the same memory location closer together, thereby eliminating cache misses. Using the overflow iteration, the compiler can often perform this transformation automatically. Standard blocking transformations cannot be used on many loop nests that contain transformation-preventing dependences. Wavefront blocking allows any loop nest to be blocked when the components of its dependence vectors are bounded. When cache misses cannot be eliminated, software prefetching can overlap the miss delays with computation. Software prefetching uses a special instruction to preload values into the cache. A cache load resembles a register load in structure, but it does not block computation; it only moves the addressed data into the cache, where a later register load will find it. The compiler can inform the cache (on average) over 100 cycles before a load is required, so cache misses can be serviced in parallel with computation.

210 citations
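The dissertation's two compiler remedies, reordering computation for locality and software prefetching, can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not PFC-Sim or the thesis's compiler: __builtin_prefetch is the GCC/Clang intrinsic standing in for the dissertation's "cache load" instruction, and PF_DIST is an assumed prefetch distance approximating the 100-cycle lead time mentioned above.

```c
#include <stddef.h>

/* Locality by reordering: the i-j order walks each row with unit
 * stride, so every cache line fetched is fully used before eviction;
 * the j-i order would touch a new line on almost every access. */
void sum_rows(size_t n, double a[n][n], double out[n])
{
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < n; j++)
            s += a[i][j];            /* unit stride: mostly cache hits */
        out[i] = s;
    }
}

/* Software prefetching: issue a non-blocking "cache load" well before
 * the value is needed so the miss latency overlaps with computation. */
#define PF_DIST 16                   /* elements ahead; machine-specific guess */

double dot(size_t n, const double x[n], const double y[n])
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n) {       /* stay within the arrays */
            __builtin_prefetch(&x[i + PF_DIST], 0, 0);
            __builtin_prefetch(&y[i + PF_DIST], 0, 0);
        }
        s += x[i] * y[i];
    }
    return s;
}
```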

Proceedings ArticleDOI
03 Dec 2003
TL;DR: NuRAPID is proposed, which leverages sequential tag-data access to decouple data placement from tag placement, resulting in higher performance and substantially lower cache energy.
Abstract: Wire delays continue to grow as the dominant component of latency for large caches. A recent work proposed an adaptive, non-uniform cache architecture (NUCA) to manage large, on-chip caches. By exploiting the variation in access time across widely-spaced subarrays, NUCA allows fast access to close subarrays while retaining slow access to far subarrays. While the idea of NUCA is attractive, NUCA does not employ design choices commonly used in large caches, such as sequential tag-data access for low power. Moreover, NUCA couples data placement with tag placement, foregoing the flexibility of data placement and replacement that is possible in a non-uniform access cache. Consequently, NUCA can place only a few blocks within a given cache set in the fastest subarrays, and must employ a high-bandwidth switched network to swap blocks within the cache for high performance. In this paper, we propose the "Non-uniform access with Replacement And Placement usIng Distance associativity" cache, or NuRAPID, which leverages sequential tag-data access to decouple data placement from tag placement. Distance associativity, the placement of data at a certain distance (and latency), is separated from set associativity, the placement of tags within a set. This decoupling enables NuRAPID to place flexibly the vast majority of frequently-accessed data in the fastest subarrays, with fewer swaps than NUCA. Distance associativity fundamentally changes the trade-offs made by NUCA's best-performing design, resulting in higher performance and substantially lower cache energy. A one-ported, non-banked NuRAPID cache improves performance by 3% on average and up to 15% compared to a multi-banked NUCA with an infinite-bandwidth switched network, while reducing L2 cache energy by 77%.

210 citations
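To make the decoupling concrete, here is a hypothetical data-structure sketch in C: tags stay in their set-associative positions, but each tag carries a forward pointer into a data array organized by distance group, so hot blocks can be placed in the fastest group regardless of set. All sizes, field names, and the lookup function are illustrative assumptions, not the paper's design.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Distance associativity, sketched (all parameters are assumptions):
 * tag placement is set-associative, but data frames are grouped by
 * distance from the core; group 0 is the fastest subarray group. */
#define NUM_SETS   1024
#define WAYS       8
#define NUM_GROUPS 4
#define FRAMES_PER_GROUP (NUM_SETS * WAYS / NUM_GROUPS)
#define WORDS_PER_BLOCK  8           /* 64-byte blocks */

struct tag_entry {
    uint64_t tag;
    bool     valid;
    uint8_t  group;                  /* which distance group holds the data */
    uint32_t frame;                  /* frame index within that group */
};

static struct tag_entry tags[NUM_SETS][WAYS];
static uint64_t data[NUM_GROUPS][FRAMES_PER_GROUP][WORDS_PER_BLOCK];

/* Lookup: the tag match is ordinary set-associative search; only the
 * forward pointer decides which (near or far) subarray holds the data.
 * Promoting a hot block means moving one data frame and updating the
 * pointers -- the tags never move, which is the decoupling. */
uint64_t *cache_lookup(uint64_t addr)
{
    uint64_t block = addr / 64;
    uint64_t set   = block % NUM_SETS;
    for (int w = 0; w < WAYS; w++) {
        struct tag_entry *e = &tags[set][w];
        if (e->valid && e->tag == block)
            return data[e->group][e->frame];   /* latency depends on group */
    }
    return NULL;                               /* miss */
}
```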

Book ChapterDOI
17 Aug 2006
TL;DR: This work shows that access-driven cache-based attacks are becoming easier to understand and analyze, and when such attacks are mounted against systems performing AES, only a very limited number of encryptions are required to recover the whole key with a high probability of success.
Abstract: An access-driven attack is a class of cache-based side channel analysis. Like the time-driven attack, the cache's timings are under inspection as a source of information leakage. Access-driven attacks scrutinize the cache behavior with a finer granularity, rather than evaluating the overall execution time. Access-driven attacks leverage the ability to detect whether a cache line has been evicted, or not, as the primary mechanism for mounting an attack. In this paper we focus on the case of AES and we show that the vast majority of processors suffer from this cache-based vulnerability. Our best results are in fact obtained on a processor without multi-threading capabilities -- in contrast to previous work in this area, which had suggested that multi-threading actually improved, or even made possible, this class of attack. Despite the technical difficulties of mounting such attacks, our work shows that access-driven cache-based attacks are becoming easier to understand and analyze. Also, when such attacks are mounted against systems performing AES, only a very limited number of encryptions is required to recover the whole key with a high probability of success, thanks to our last-round analysis based on the ciphertext.

208 citations
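The primitive the abstract relies on, detecting whether a cache line has been evicted, can be illustrated with a short C sketch. This is a flush-and-reload style illustration using the x86 intrinsics _mm_clflush and __rdtsc; the paper's access-driven attack instead detects evictions through cache-set contention and OS scheduling, and HIT_THRESHOLD, time_load, and victim_touched_line are hypothetical names with machine-specific values.

```c
#include <stdint.h>
#include <x86intrin.h>      /* __rdtsc, _mm_clflush (GCC/Clang on x86) */

#define HIT_THRESHOLD 80    /* cycles; assumed, must be calibrated per CPU */

/* Time a single load of *p in cycles. */
static inline uint64_t time_load(const volatile uint8_t *p)
{
    uint64_t start = __rdtsc();
    (void)*p;                            /* the probed memory access */
    return __rdtsc() - start;
}

/* One evict-and-time round against one monitored line: evict it, let the
 * victim run (e.g., one AES encryption that may index a lookup table on
 * this line), then reload.  A fast reload means the victim brought the
 * line back, leaking which table entries -- and hence key bytes -- were used. */
int victim_touched_line(volatile uint8_t *line, void (*run_victim)(void))
{
    _mm_clflush((const void *)line);     /* evict the monitored line */
    run_victim();
    return time_load(line) < HIT_THRESHOLD;
}
```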

Patent
23 Mar 2007
TL;DR: A cache appliance includes an object cache layer and a byte cache layer, each configured to store information to storage devices included in the appliance; an application proxy layer may also be included.
Abstract: A cache includes an object cache layer and a byte cache layer, each configured to store information to storage devices included in the cache appliance. An application proxy layer may also be included. In addition, the object cache layer may be configured to identify content that should not be cached by the byte cache layer, which itself may be configured to compress contents of the object cache layer. In some cases the contents of the byte cache layer may be stored as objects within the object cache.

208 citations
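A minimal sketch of the layering the patent describes, with every name and the skip heuristic being our illustrative assumptions: each object passes through the object cache layer, which can flag content (for example, already-compressed media) that the byte cache layer should not try to compress or deduplicate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Two-layer cache sketch: an object cache in front of a byte cache.
 * The stub bodies only print; a real appliance would index objects by
 * key and deduplicate/compress byte ranges on its storage devices. */
struct cached_object {
    const char *key;
    const void *bytes;
    size_t      len;
    bool        skip_byte_cache;   /* set by the object layer (assumed flag) */
};

static void object_cache_put(const struct cached_object *o)
{
    printf("object cache: stored %s (%zu bytes)\n", o->key, o->len);
}

static void byte_cache_put(const void *bytes, size_t len)
{
    (void)bytes;
    printf("byte cache: compressed/deduplicated %zu bytes\n", len);
}

/* Store path: the object layer always records the object; the byte
 * layer is skipped for content the object layer marked as unsuitable
 * for byte-level caching. */
void cache_store(const struct cached_object *o)
{
    object_cache_put(o);
    if (!o->skip_byte_cache)
        byte_cache_put(o->bytes, o->len);
}
```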

Proceedings ArticleDOI
09 Apr 2013
TL;DR: A complete framework to analyze and profile task memory access patterns and a novel kernel-level cache management technique to enforce an efficient and deterministic cache allocation of the most frequently accessed memory areas are proposed.
Abstract: Multi-core architectures are shaking the fundamental assumption that, in real-time systems, the worst-case execution time (WCET) used to analyze the schedulability of the complete system can be calculated on individual tasks. This is not true even in an approximate sense on a modern multi-core chip, due to interference caused by hardware resource sharing. In this work we propose (1) a complete framework to analyze and profile task memory access patterns and (2) a novel kernel-level cache management technique to enforce an efficient and deterministic cache allocation of the most frequently accessed memory areas. In this way, we provide a powerful tool to address one of the main sources of interference in a system where the last level of cache is shared among two or more CPUs. The technique has been implemented on commercial hardware, and our evaluations show that it can be used to significantly improve the predictability of a given set of critical tasks.

207 citations
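One common mechanism behind kernel-level deterministic cache allocation is page coloring, sketched below: physical pages whose "color" bits differ map to disjoint groups of cache sets, so a kernel can back a critical task's hot memory areas with reserved colors that other cores never touch. The cache parameters here are assumptions for illustration; the paper's technique additionally combines profiling with cache lockdown on its target hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* Page-coloring arithmetic for a hypothetical 2 MiB, 16-way LLC with
 * 64-byte lines and 4 KiB pages (assumed parameters).  Pages of
 * different colors occupy disjoint cache sets, so they can never
 * evict each other. */
#define LINE_SIZE   64u
#define CACHE_SIZE  (2u * 1024 * 1024)
#define WAYS        16u
#define PAGE_SIZE   4096u

#define NUM_SETS      (CACHE_SIZE / (LINE_SIZE * WAYS))   /* 2048 sets  */
#define SETS_PER_PAGE (PAGE_SIZE / LINE_SIZE)             /* 64 sets    */
#define NUM_COLORS    (NUM_SETS / SETS_PER_PAGE)          /* 32 colors  */

static unsigned page_color(uint64_t phys_addr)
{
    return (unsigned)((phys_addr / PAGE_SIZE) % NUM_COLORS);
}

int main(void)
{
    printf("available colors: %u\n", NUM_COLORS);
    printf("page at 0x10000 -> color %u\n", page_color(0x10000));
    printf("page at 0x18000 -> color %u\n", page_color(0x18000));
    return 0;
}
```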


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (93% related)
Compiler: 26.3K papers, 578.5K citations (89% related)
Scalability: 50.9K papers, 931.6K citations (87% related)
Server: 79.5K papers, 1.4M citations (86% related)
Static routing: 25.7K papers, 576.7K citations (84% related)
Performance Metrics
No. of papers in the topic in previous years:

Year   Papers
2023   42
2022   110
2021   12
2020   20
2019   15
2018   30