Topic: Cache invalidation

About: Cache invalidation is a research topic. Over its lifetime, 10,539 publications have been published within this topic, receiving 245,409 citations.


Papers
Proceedings ArticleDOI
11 Sep 2010
TL;DR: Increasing cache efficiency can improve performance by reducing miss rate, or alternately, improve power and energy by allowing a smaller cache with the same miss rate.
Abstract: Caches mitigate the long memory latency that limits the performance of modern processors. However, caches can be quite inefficient. On average, a cache block in a 2MB L2 cache is dead 59% of the time, i.e., it will not be referenced again before it is evicted. Increasing cache efficiency can improve performance by reducing miss rate, or alternatively, improve power and energy by allowing a smaller cache with the same miss rate. This paper proposes using predicted dead blocks to hold blocks evicted from other sets. When these evicted blocks are referenced again, the access can be satisfied from the other set, avoiding a costly access to main memory. The pool of predicted dead blocks can be thought of as a virtual victim cache. For a set of memory-intensive single-threaded workloads, a virtual victim cache in a 16-way set associative 2MB L2 cache reduces misses by 26%, yields a geometric mean speedup of 12.1% and improves cache efficiency by 27% on average, where cache efficiency is defined as the average time during which cache blocks contain live information. This virtual victim cache yields a lower average miss rate than a fully-associative LRU cache of the same capacity. For a set of multi-core workloads, the virtual victim cache improves throughput performance by 4% over LRU while improving cache efficiency by 62%. Alternatively, a 1.7MB virtual victim cache achieves about the same performance as a larger 2MB L2 cache, reducing the number of SRAM cells required by 16%, thus maintaining performance while reducing power and area.

75 citations
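
A minimal Python sketch of the virtual victim cache idea from the abstract above: blocks evicted from one set are parked in frames of a partner set that a dead-block predictor believes are no longer live, and lookups also probe that partner set before going to memory. The set-pairing scheme, the simple idle-time predictor, and the sizes are illustrative assumptions, not the paper's actual design.

```python
class Block:
    def __init__(self):
        self.tag = None         # block tag (None = invalid)
        self.last_use = -1      # timestamp of last access
        self.is_victim = False  # True if parked here from a partner set


class VirtualVictimCache:
    def __init__(self, num_sets=16, ways=4, dead_after=64):
        self.sets = [[Block() for _ in range(ways)] for _ in range(num_sets)]
        self.dead_after = dead_after  # idle accesses before a frame counts as "dead"
        self.time = 0

    def _partner(self, set_idx):
        # Pair each set with one partner set (assumption: flip the low index bit).
        return set_idx ^ 1

    def _lookup(self, set_idx, tag):
        for blk in self.sets[set_idx]:
            if blk.tag == tag:
                return blk
        return None

    def access(self, set_idx, tag):
        """Return True on a hit in the home set or the partner set, False on a miss."""
        self.time += 1
        blk = self._lookup(set_idx, tag) or self._lookup(self._partner(set_idx), tag)
        if blk is not None:
            blk.last_use = self.time
            blk.is_victim = False
            return True
        self._fill(set_idx, tag)
        return False

    def _fill(self, set_idx, tag):
        ways = self.sets[set_idx]
        victim = min(ways, key=lambda b: b.last_use)      # LRU within the set
        if victim.tag is not None:
            self._park_in_partner(set_idx, victim.tag)    # try to save the evictee
        victim.tag, victim.last_use, victim.is_victim = tag, self.time, False

    def _park_in_partner(self, set_idx, tag):
        # Place the evicted block into a predicted-dead frame of the partner set.
        for blk in self.sets[self._partner(set_idx)]:
            if self.time - blk.last_use > self.dead_after:  # crude dead-block test
                blk.tag, blk.last_use, blk.is_victim = tag, self.time, True
                return
```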

Proceedings ArticleDOI
06 Aug 2001
TL;DR: In this paper, a new L1 data cache structure that combines a Specialized Stack Cache (SSC) and a Pseudo Set-Associative Cache (PSAC) is proposed.
Abstract: The L1 data cache is a time-critical module and, at the same time, a major consumer of energy. To reduce its energy-delay product, we apply two principles of low-power design: specialize part of the cache structure and break the cache down into smaller caches. To this end, we propose a new L1 data cache structure that combines a Specialized Stack Cache (SSC) and a Pseudo Set-Associative Cache (PSAC). Individually, our SSC and PSAC designs have a lower energy-delay product than previously-proposed related designs. In addition, their combined operation is very effective. Relative to a conventional 2-way 32 KB data cache, a design containing a 4-way 32 KB PSAC and a 512 B SSC reduces the energy-delay product of several applications by an average of 44%.

75 citations
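
The sketch below illustrates, under stated assumptions, how a small Specialized Stack Cache can be paired with a pseudo set-associative cache (modeled here in the column-associative style): stack-region accesses go to the tiny SSC, while other accesses probe a fast primary bank first and a slower secondary bank only on a primary miss. The stack-detection heuristic, bank sizes, and swap policy are assumptions for illustration, not the paper's exact SSC/PSAC design.

```python
BLOCK = 32          # bytes per line (assumption)
STACK_LINES = 16    # 16 x 32 B = 512 B SSC, matching the abstract's SSC size
PSAC_LINES = 512    # lines per PSAC bank (size here is illustrative)

stack_cache = {}                 # index -> line tag (tiny direct-mapped SSC)
psac_banks = [dict(), dict()]    # two banks probed one after the other

def is_stack_access(addr, sp, stack_window=1 << 20):
    # Assumption: anything within 1 MiB below the stack pointer is a stack access.
    return 0 <= sp - addr < stack_window

def access(addr, sp):
    """Return which structure served the access: 'ssc', 'psac_fast', 'psac_slow', or 'miss'."""
    line = addr // BLOCK
    if is_stack_access(addr, sp):
        idx = line % STACK_LINES
        if stack_cache.get(idx) == line:
            return "ssc"
        stack_cache[idx] = line                     # fill the SSC on a miss
        return "miss"
    idx = line % PSAC_LINES
    if psac_banks[0].get(idx) == line:
        return "psac_fast"                          # first (fast) probe hits
    if psac_banks[1].get(idx) == line:
        # Second (slow) probe hits: swap so the hot line sits in bank 0 next time
        # (None simply marks an empty slot).
        psac_banks[0][idx], psac_banks[1][idx] = line, psac_banks[0].get(idx)
        return "psac_slow"
    # Miss: displace the bank-0 resident into bank 1, then fill bank 0.
    if idx in psac_banks[0]:
        psac_banks[1][idx] = psac_banks[0][idx]
    psac_banks[0][idx] = line
    return "miss"
```

The serial two-probe lookup is what lets a PSAC approach 2-way hit rates at close to direct-mapped access energy, which is the trade-off the abstract targets.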

Patent
21 Dec 1998
TL;DR: In this article, a cache-coherent data transfer during a memory read operation in a multiprocessing computer system is described, where a source node sends a read command to a target processing node to read data from a designated memory location in a system memory associated with the target node.
Abstract: A messaging scheme that accomplishes cache-coherent data transfers during a memory read operation in a multiprocessing computer system is described. A source processing node sends a read command to a target processing node to read data from a designated memory location in a system memory associated with the target processing node. In response to the read command, the target processing node transmits a probe command to all the remaining processing nodes in the computer system regardless of whether one or more of the remaining nodes have a copy of the data cached in their respective cache memories. The probe command causes each node to maintain cache coherency by appropriately changing the state of the cache block containing the requested data and by causing the node having an updated copy of the cache block to send the cache block to the source node. Each processing node that receives a probe command sends, in return, a probe response indicating whether that processing node has a cached copy of the data and the state of the cached copy if the responding node has the cached copy. The target node sends a read response including the requested data to the source node. The source node waits for responses from the target node and from each of the remaining nodes in the system and acknowledges the receipt of the requested data by sending a source done response to the target node.

75 citations
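
A simplified Python model of the probe-based read flow in the abstract above: the target (home) node probes every remaining node, each probe response reports the cached state, a node holding a modified copy forwards it directly to the source, and the source combines the probe responses with the target's read response before acknowledging. The node classes, state names, and call structure are illustrative assumptions; a real implementation exchanges these steps as packets over a point-to-point interconnect.

```python
from enum import Enum

class State(Enum):
    INVALID = 0
    SHARED = 1
    MODIFIED = 2

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}      # addr -> (State, data)
        self.memory = {}     # only meaningful on the home node for an address

    def probe(self, addr, source):
        """Handle a probe: report the cached state, downgrade to SHARED,
        and forward a dirty copy directly to the source node."""
        state, data = self.cache.get(addr, (State.INVALID, None))
        if state is State.MODIFIED:
            source.receive_data(addr, data)           # dirty copy goes to the requester
        if state is not State.INVALID:
            self.cache[addr] = (State.SHARED, data)   # keep coherence after the read
        return state                                   # this is the probe response

    def receive_data(self, addr, data):
        self.cache[addr] = (State.SHARED, data)

def coherent_read(source, target, others, addr):
    # 1. Source sends a read command to the target (home) node.
    # 2. Target probes every remaining node, whether or not it caches the line.
    responses = [node.probe(addr, source) for node in others]
    # 3. Target also sends a read response carrying the memory copy; the source
    #    keeps the forwarded dirty copy instead if some node held the line MODIFIED.
    if State.MODIFIED not in responses:
        source.receive_data(addr, target.memory.get(addr))
    # 4. Source waits for all probe responses plus the read response, then sends
    #    a "source done" back to the target (represented here by returning).
    return source.cache[addr]

# Example: the requester ends up with the fresh copy, and all copies are SHARED.
home, requester, peer = Node("home"), Node("cpu0"), Node("cpu1")
home.memory[0x80] = "stale"
peer.cache[0x80] = (State.MODIFIED, "fresh")
print(coherent_read(requester, home, [peer], 0x80))
```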

Journal ArticleDOI
TL;DR: This paper shows how the introduction of a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache where miss ratio depends solely on working set behavior.
Abstract: Applications with regular patterns of memory access can experience high levels of cache conflict misses. In shared-memory multiprocessors conflict misses can be increased significantly by the data transpositions required for parallelization. Techniques such as blocking which are introduced within a single thread to improve locality, can result in yet more conflict misses. The tension between minimizing cache conflicts and the other transformations needed for efficient parallelization leads to complex optimization problems for parallelizing compilers. This paper shows how the introduction of a pseudorandom element into the cache index function can effectively eliminate repetitive conflict misses and produce a cache where miss ratio depends solely on working set behavior. We examine the impact of pseudorandom cache indexing on processor cycle times and present practical solutions to some of the major implementation issues for this type of cache. Our conclusions are supported by simulations of a superscalar out-of-order processor executing the SPEC95 benchmarks, as well as from cache simulations of individual loop kernels to illustrate specific effects. We present measurements of instructions committed per cycle (IPC) when comparing the performance of different cache architectures on whole-program benchmarks such as the SPEC95 suite.

75 citations
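
To make the pseudorandom-indexing idea concrete, the sketch below XOR-folds the tag bits into the set index so that power-of-two strides, which all collide in one set under conventional indexing, spread across many sets. The XOR-folding hash is one common pseudorandom index function chosen here for illustration, not necessarily the exact function evaluated in the paper.

```python
NUM_SETS = 1024                            # must be a power of two for this sketch
INDEX_BITS = NUM_SETS.bit_length() - 1
BLOCK_BITS = 6                             # 64-byte blocks

def conventional_index(addr):
    # Standard indexing: the low-order line-address bits select the set.
    return (addr >> BLOCK_BITS) & (NUM_SETS - 1)

def pseudorandom_index(addr):
    """XOR-fold the remaining (tag) bits into the index bits, chunk by chunk."""
    bits = addr >> BLOCK_BITS
    index = 0
    while bits:
        index ^= bits & (NUM_SETS - 1)
        bits >>= INDEX_BITS
    return index

# A power-of-two stride that hammers a single set with conventional indexing
# is spread across many sets by the pseudorandom index function.
stride = NUM_SETS * 64                     # stride of one full cache way, in bytes
addrs = [i * stride for i in range(16)]
print({conventional_index(a) for a in addrs})   # one set: repetitive conflict misses
print({pseudorandom_index(a) for a in addrs})   # sixteen different sets
```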

Patent
10 Jul 2007
TL;DR: In this paper, a cache controller is configured to track a prefetch hit rate reflecting the percentage of hits on the data cache that involve prefetched data lines and to disable data prefetching if the prefetch hit rate falls below a defined threshold.
Abstract: A system and method taught herein control data prefetching for a data cache by tracking prefetch hits and overall hits for the data cache. Data prefetching for the data cache is disabled based on the tracking of prefetch hits and data prefetching is enabled for the data cache based on the tracking of overall hits. For example, in one or more embodiments, a cache controller is configured to track a prefetch hit rate reflecting the percentage of hits on the data cache that involve prefetched data lines and disable data prefetching if the prefetch hit rate falls below a defined threshold. The cache controller also tracks an overall hit rate reflecting the overall percentage of data cache hits (versus misses) and enables data prefetching if the overall hit rate falls below a defined threshold.

75 citations
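
A minimal sketch of the control policy described in the patent abstract above: over each evaluation interval, the controller disables prefetching when prefetched lines account for too small a share of hits, and re-enables it when the overall hit rate drops. The window size and the two thresholds are illustrative assumptions.

```python
class PrefetchController:
    def __init__(self, window=1024, prefetch_hit_floor=0.05, overall_hit_floor=0.80):
        self.window = window                         # accesses per evaluation interval
        self.prefetch_hit_floor = prefetch_hit_floor # min share of hits on prefetched lines
        self.overall_hit_floor = overall_hit_floor   # min overall hit rate
        self.prefetch_enabled = True
        self._reset_counters()

    def _reset_counters(self):
        self.accesses = 0
        self.hits = 0
        self.prefetch_hits = 0       # hits whose line was brought in by a prefetch

    def record_access(self, hit, hit_on_prefetched_line):
        self.accesses += 1
        if hit:
            self.hits += 1
            if hit_on_prefetched_line:
                self.prefetch_hits += 1
        if self.accesses >= self.window:
            self._evaluate()

    def _evaluate(self):
        overall_rate = self.hits / self.accesses
        prefetch_rate = self.prefetch_hits / max(self.hits, 1)
        if self.prefetch_enabled and prefetch_rate < self.prefetch_hit_floor:
            self.prefetch_enabled = False            # prefetches are not earning hits
        elif not self.prefetch_enabled and overall_rate < self.overall_hit_floor:
            self.prefetch_enabled = True             # miss rate is climbing; try again
        self._reset_counters()
```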


Network Information

Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 93% related
Scalability: 50.9K papers, 931.6K citations, 88% related
Server: 79.5K papers, 1.4M citations, 88% related
Network packet: 159.7K papers, 2.2M citations, 83% related
Dynamic Source Routing: 32.2K papers, 695.7K citations, 83% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    44
2022    117
2021    4
2020    8
2019    7
2018    20