Topic: Cache invalidation

About: Cache invalidation is a research topic. Over its lifetime, 10,539 publications have been published within this topic, receiving 245,409 citations.


Papers
Proceedings ArticleDOI
11 Sep 2010
TL;DR: Increasing cache efficiency can improve performance by reducing miss rate, or alternately, improve power and energy by allowing a smaller cache with the same miss rate.
Abstract: Caches mitigate the long memory latency that limits the performance of modern processors. However, caches can be quite inefficient. On average, a cache block in a 2MB L2 cache is dead 59% of the time, i.e., it will not be referenced again before it is evicted. Increasing cache efficiency can improve performance by reducing miss rate, or alternatively, improve power and energy by allowing a smaller cache with the same miss rate. This paper proposes using predicted dead blocks to hold blocks evicted from other sets. When these evicted blocks are referenced again, the access can be satisfied from the other set, avoiding a costly access to main memory. The pool of predicted dead blocks can be thought of as a virtual victim cache. For a set of memory-intensive single-threaded workloads, a virtual victim cache in a 16-way set associative 2MB L2 cache reduces misses by 26%, yields a geometric mean speedup of 12.1% and improves cache efficiency by 27% on average, where cache efficiency is defined as the average time during which cache blocks contain live information. This virtual victim cache yields a lower average miss rate than a fully-associative LRU cache of the same capacity. For a set of multi-core workloads, the virtual victim cache improves throughput performance by 4% over LRU while improving cache efficiency by 62%. Alternatively, a 1.7MB virtual victim cache achieves about the same performance as a larger 2MB L2 cache, reducing the number of SRAM cells required by 16%, thus maintaining performance while reducing power and area.

75 citations
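
A minimal Python sketch of the virtual victim cache idea from the abstract above: blocks evicted from one set are parked in frames of a partner set that a dead-block predictor believes are no longer live, and lookups also probe that partner set before going to memory. The set-pairing scheme, the simple idle-time predictor, and the sizes are illustrative assumptions, not the paper's actual design.

```python
class Block:
    def __init__(self):
        self.tag = None         # block tag (None = invalid)
        self.last_use = -1      # timestamp of last access
        self.is_victim = False  # True if parked here from a partner set


class VirtualVictimCache:
    def __init__(self, num_sets=16, ways=4, dead_after=64):
        self.sets = [[Block() for _ in range(ways)] for _ in range(num_sets)]
        self.dead_after = dead_after  # idle accesses before a frame counts as "dead"
        self.time = 0

    def _partner(self, set_idx):
        # Pair each set with one partner set (assumption: flip the low index bit).
        return set_idx ^ 1

    def _lookup(self, set_idx, tag):
        for blk in self.sets[set_idx]:
            if blk.tag == tag:
                return blk
        return None

    def access(self, set_idx, tag):
        """Return True on a hit in the home set or the partner set, False on a miss."""
        self.time += 1
        blk = self._lookup(set_idx, tag) or self._lookup(self._partner(set_idx), tag)
        if blk is not None:
            blk.last_use = self.time
            blk.is_victim = False
            return True
        self._fill(set_idx, tag)
        return False

    def _fill(self, set_idx, tag):
        ways = self.sets[set_idx]
        victim = min(ways, key=lambda b: b.last_use)      # LRU within the set
        if victim.tag is not None:
            self._park_in_partner(set_idx, victim.tag)    # try to save the evictee
        victim.tag, victim.last_use, victim.is_victim = tag, self.time, False

    def _park_in_partner(self, set_idx, tag):
        # Place the evicted block into a predicted-dead frame of the partner set.
        for blk in self.sets[self._partner(set_idx)]:
            if self.time - blk.last_use > self.dead_after:  # crude dead-block test
                blk.tag, blk.last_use, blk.is_victim = tag, self.time, True
                return
```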

Proceedings ArticleDOI
06 Aug 2001
TL;DR: In this paper, a new L1 data cache structure that combines a Specialized Stack Cache (SSC) and a Pseudo Set-Associative Cache (PSAC) is proposed.
Abstract: The L1 data cache is a time-critical module and, at the same time, a major consumer of energy. To reduce its energy-delay product, we apply two principles of low-power design: specialize part of the cache structure and break the cache down into smaller caches. To this end, we propose a new L1 data cache structure that combines a Specialized Stack Cache (SSC) and a Pseudo Set-Associative Cache (PSAC). Individually, our SSC and PSAC designs have a lower energy-delay product than previously-proposed related designs. In addition, their combined operation is very effective. Relative to a conventional 2-way 32 KB data cache, a design containing a 4-way 32 KB PSAC and a 512 B SSC reduces the energy-delay product of several applications by an average of 44%.

75 citations
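
The sketch below illustrates, under stated assumptions, how a small Specialized Stack Cache can be paired with a pseudo set-associative cache (modeled here in the column-associative style): stack-region accesses go to the tiny SSC, while other accesses probe a fast primary bank first and a slower secondary bank only on a primary miss. The stack-detection heuristic, bank sizes, and swap policy are assumptions for illustration, not the paper's exact SSC/PSAC design.

```python
BLOCK = 32          # bytes per line (assumption)
STACK_LINES = 16    # 16 x 32 B = 512 B SSC, matching the abstract's SSC size
PSAC_LINES = 512    # lines per PSAC bank (size here is illustrative)

stack_cache = {}                 # index -> line tag (tiny direct-mapped SSC)
psac_banks = [dict(), dict()]    # two banks probed one after the other

def is_stack_access(addr, sp, stack_window=1 << 20):
    # Assumption: anything within 1 MiB below the stack pointer is a stack access.
    return 0 <= sp - addr < stack_window

def access(addr, sp):
    """Return which structure served the access: 'ssc', 'psac_fast', 'psac_slow', or 'miss'."""
    line = addr // BLOCK
    if is_stack_access(addr, sp):
        idx = line % STACK_LINES
        if stack_cache.get(idx) == line:
            return "ssc"
        stack_cache[idx] = line                     # fill the SSC on a miss
        return "miss"
    idx = line % PSAC_LINES
    if psac_banks[0].get(idx) == line:
        return "psac_fast"                          # first (fast) probe hits
    if psac_banks[1].get(idx) == line:
        # Second (slow) probe hits: swap so the hot line sits in bank 0 next time
        # (None simply marks an empty slot).
        psac_banks[0][idx], psac_banks[1][idx] = line, psac_banks[0].get(idx)
        return "psac_slow"
    # Miss: displace the bank-0 resident into bank 1, then fill bank 0.
    if idx in psac_banks[0]:
        psac_banks[1][idx] = psac_banks[0][idx]
    psac_banks[0][idx] = line
    return "miss"
```

The serial two-probe lookup is what lets a PSAC approach 2-way hit rates at close to direct-mapped access energy, which is the trade-off the abstract targets.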

Patent
21 Dec 1998
TL;DR: In this article, a cache-coherent data transfer during a memory read operation in a multiprocessing computer system is described, where a source node sends a read command to a target processing node to read data from a designated memory location in a system memory associated with the target node.
Abstract: A messaging scheme that accomplishes cache-coherent data transfers during a memory read operation in a multiprocessing computer system is described. A source processing node sends a read command to a target processing node to read data from a designated memory location in a system memory associated with the target processing node. In response to the read command, the target processing node transmits a probe command to all the remaining processing nodes in the computer system regardless of whether one or more of the remaining nodes have a copy of the data cached in their respective cache memories. The probe command causes each node to maintain cache coherency by appropriately changing the state of the cache block containing the requested data and by causing the node having an updated copy of the cache block to send the cache block to the source node. Each processing node that receives a probe command sends, in return, a probe response indicating whether that processing node has a cached copy of the data and the state of the cached copy if the responding node has the cached copy. The target node sends a read response including the requested data to the source node. The source node waits for responses from the target node and from each of the remaining nodes in the system and acknowledges the receipt of the requested data by sending a source done response to the target node.

75 citations
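
A simplified Python model of the probe-based read flow in the abstract above: the target (home) node probes every remaining node, each probe response reports the cached state, a node holding a modified copy forwards it directly to the source, and the source combines the probe responses with the target's read response before acknowledging. The node classes, state names, and call structure are illustrative assumptions; a real implementation exchanges these steps as packets over a point-to-point interconnect.

```python
from enum import Enum

class State(Enum):
    INVALID = 0
    SHARED = 1
    MODIFIED = 2

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}      # addr -> (State, data)
        self.memory = {}     # only meaningful on the home node for an address

    def probe(self, addr, source):
        """Handle a probe: report the cached state, downgrade to SHARED,
        and forward a dirty copy directly to the source node."""
        state, data = self.cache.get(addr, (State.INVALID, None))
        if state is State.MODIFIED:
            source.receive_data(addr, data)           # dirty copy goes to the requester
        if state is not State.INVALID:
            self.cache[addr] = (State.SHARED, data)   # keep coherence after the read
        return state                                   # this is the probe response

    def receive_data(self, addr, data):
        self.cache[addr] = (State.SHARED, data)

def coherent_read(source, target, others, addr):
    # 1. Source sends a read command to the target (home) node.
    # 2. Target probes every remaining node, whether or not it caches the line.
    responses = [node.probe(addr, source) for node in others]
    # 3. Target also sends a read response carrying the memory copy; the source
    #    keeps the forwarded dirty copy instead if some node held the line MODIFIED.
    if State.MODIFIED not in responses:
        source.receive_data(addr, target.memory.get(addr))
    # 4. Source waits for all probe responses plus the read response, then sends
    #    a "source done" back to the target (represented here by returning).
    return source.cache[addr]

# Example: the requester ends up with the fresh copy, and all copies are SHARED.
home, requester, peer = Node("home"), Node("cpu0"), Node("cpu1")
home.memory[0x80] = "stale"
peer.cache[0x80] = (State.MODIFIED, "fresh")
print(coherent_read(requester, home, [peer], 0x80))
```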

Journal ArticleDOI
TL;DR: This paper shows how the introduction of a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache where miss ratio depends solely on working set behavior.
Abstract: Applications with regular patterns of memory access can experience high levels of cache conflict misses. In shared-memory multiprocessors conflict misses can be increased significantly by the data transpositions required for parallelization. Techniques such as blocking which are introduced within a single thread to improve locality, can result in yet more conflict misses. The tension between minimizing cache conflicts and the other transformations needed for efficient parallelization leads to complex optimization problems for parallelizing compilers. This paper shows how the introduction of a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache where miss ratio depends solely on working set behavior. We examine the impact of pseudorandom cache indexing on processor cycle times and present practical solutions to some of the major implementation issues for this type of cache. Our conclusions are supported by simulations of a superscalar out-of-order processor executing the SPEC95 benchmarks, as well as from cache simulations of individual loop kernels to illustrate specific effects. We present measurements of instructions committed per cycle (IPC) when comparing the performance of different cache architectures on whole-program benchmarks such as the SPEC95 suite.

75 citations
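
To make the pseudorandom-indexing idea concrete, the sketch below XOR-folds the tag bits into the set index so that power-of-two strides, which all collide in one set under conventional indexing, spread across many sets. The XOR-folding hash is one common pseudorandom index function chosen here for illustration, not necessarily the exact function evaluated in the paper.

```python
NUM_SETS = 1024                            # must be a power of two for this sketch
INDEX_BITS = NUM_SETS.bit_length() - 1
BLOCK_BITS = 6                             # 64-byte blocks

def conventional_index(addr):
    # Standard indexing: the low-order line-address bits select the set.
    return (addr >> BLOCK_BITS) & (NUM_SETS - 1)

def pseudorandom_index(addr):
    """XOR-fold the remaining (tag) bits into the index bits, chunk by chunk."""
    bits = addr >> BLOCK_BITS
    index = 0
    while bits:
        index ^= bits & (NUM_SETS - 1)
        bits >>= INDEX_BITS
    return index

# A power-of-two stride that hammers a single set with conventional indexing
# is spread across many sets by the pseudorandom index function.
stride = NUM_SETS * 64                     # stride of one full cache way, in bytes
addrs = [i * stride for i in range(16)]
print({conventional_index(a) for a in addrs})   # one set: repetitive conflict misses
print({pseudorandom_index(a) for a in addrs})   # sixteen different sets
```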

Patent
10 Jul 2007
TL;DR: In this paper, a cache controller is configured to track a prefetch hit rate reflecting the percentage of hits on the data cache that involve prefetched data lines and to disable data prefetching if the prefetch hit rate falls below a defined threshold.
Abstract: A system and method taught herein control data prefetching for a data cache by tracking prefetch hits and overall hits for the data cache. Data prefetching for the data cache is disabled based on the tracking of prefetch hits and data prefetching is enabled for the data cache based on the tracking of overall hits. For example, in one or more embodiments, a cache controller is configured to track a prefetch hit rate reflecting the percentage of hits on the data cache that involve prefetched data lines and disable data prefetching if the prefetch hit rate falls below a defined threshold. The cache controller also tracks an overall hit rate reflecting the overall percentage of data cache hits (versus misses) and enables data prefetching if the overall hit rate falls below a defined threshold.

75 citations
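
A minimal sketch of the control policy described in the patent abstract above: over each evaluation interval, the controller disables prefetching when prefetched lines account for too small a share of hits, and re-enables it when the overall hit rate drops. The window size and the two thresholds are illustrative assumptions.

```python
class PrefetchController:
    def __init__(self, window=1024, prefetch_hit_floor=0.05, overall_hit_floor=0.80):
        self.window = window                         # accesses per evaluation interval
        self.prefetch_hit_floor = prefetch_hit_floor # min share of hits on prefetched lines
        self.overall_hit_floor = overall_hit_floor   # min overall hit rate
        self.prefetch_enabled = True
        self._reset_counters()

    def _reset_counters(self):
        self.accesses = 0
        self.hits = 0
        self.prefetch_hits = 0       # hits whose line was brought in by a prefetch

    def record_access(self, hit, hit_on_prefetched_line):
        self.accesses += 1
        if hit:
            self.hits += 1
            if hit_on_prefetched_line:
                self.prefetch_hits += 1
        if self.accesses >= self.window:
            self._evaluate()

    def _evaluate(self):
        overall_rate = self.hits / self.accesses
        prefetch_rate = self.prefetch_hits / max(self.hits, 1)
        if self.prefetch_enabled and prefetch_rate < self.prefetch_hit_floor:
            self.prefetch_enabled = False            # prefetches are not earning hits
        elif not self.prefetch_enabled and overall_rate < self.overall_hit_floor:
            self.prefetch_enabled = True             # miss rate is climbing; try again
        self._reset_counters()
```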


Network Information

Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 93% related
Scalability: 50.9K papers, 931.6K citations, 88% related
Server: 79.5K papers, 1.4M citations, 88% related
Network packet: 159.7K papers, 2.2M citations, 83% related
Dynamic Source Routing: 32.2K papers, 695.7K citations, 83% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    44
2022    117
2021    4
2020    8
2019    7
2018    20