Topic

Cache invalidation

About: Cache invalidation is a research topic. Over its lifetime, 10,539 publications have been published within this topic, receiving 245,409 citations.


Papers
Proceedings ArticleDOI
14 Jun 2008
TL;DR: This work develops a generic CMP algorithm with an associated tiling sequence and provides a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm.
Abstract: We present cache-efficient chip multiprocessor (CMP) algorithms with good speed-up for some widely used dynamic programming algorithms. We consider three types of caching systems for CMPs: D-CMP with a private cache for each core, S-CMP with a single cache shared by all cores, and Multicore, which has private L1 caches and a shared L2 cache. We derive results for three classes of problems: local dependency dynamic programming (LDDP), Gaussian Elimination Paradigm (GEP), and the parenthesis problem. For each class of problems, we develop a generic CMP algorithm with an associated tiling sequence. We then tailor this tiling sequence to each caching model and provide a parallel schedule that results in a cache-efficient parallel execution up to the critical path length of the underlying dynamic programming algorithm. We present experimental results on an 8-core Opteron for two sequence alignment problems that are important examples of LDDP. Our experimental results show good speed-ups for simple versions of our algorithms.

86 citations
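
The tiling idea above translates directly into code. Below is a minimal sketch of a tiled local-dependency DP, using edit distance as a stand-in for the paper's LDDP class; the tile size B and the sequential wavefront schedule are illustrative assumptions, not the paper's actual algorithms or parallel schedules.

```python
# Minimal sketch of a cache-conscious tiled evaluation of a
# local-dependency DP, with edit distance standing in for the LDDP
# class. Tiles are visited in wavefront (anti-diagonal) order, so a
# parallel schedule could run each anti-diagonal's tiles concurrently;
# they run sequentially here for clarity. Tile size B is an assumed
# tuning parameter (pick it so a tile's working set fits in cache).

def tiled_edit_distance(a: str, b: str, B: int = 64) -> int:
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]  # DP table with boundaries
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j

    tiles_i = (n + B - 1) // B
    tiles_j = (m + B - 1) // B
    # Tile (ti, tj) needs only tiles (ti-1, tj), (ti, tj-1), (ti-1, tj-1),
    # all of which lie on earlier anti-diagonals and are already done.
    for d in range(tiles_i + tiles_j - 1):
        for ti in range(max(0, d - tiles_j + 1), min(tiles_i, d + 1)):
            tj = d - ti
            for i in range(ti * B + 1, min((ti + 1) * B, n) + 1):
                for j in range(tj * B + 1, min((tj + 1) * B, m) + 1):
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    D[i][j] = min(D[i - 1][j] + 1,         # deletion
                                  D[i][j - 1] + 1,         # insertion
                                  D[i - 1][j - 1] + cost)  # substitution

    return D[n][m]

print(tiled_edit_distance("kitten", "sitting"))  # prints 3
```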

Proceedings ArticleDOI
18 Apr 1994
TL;DR: A decoupled sectored cache will allow the same level of performance as a non-sectored cache, but at a significantly lower hardware cost.
Abstract: Sectored caches have been used for many years in order to reconcile low tag array size and small or medium block size. In a sectored cache, a single address tag is associated with a sector consisting of several cache lines, while validity, dirty and coherency tags are associated with each of the inner cache lines. Usually in a cache, a cache line location is statically linked to one and only one address tag word location. In the decoupled sectored cache introduced in the paper, this monolithic association is broken; the address tag location associated with a cache line location is dynamically chosen at fetch time among several possible locations. The tag volume on a decoupled sectored cache is in the same range as the tag volume in a traditional sectored cache; but the hit ratio on a decoupled sectored cache is very close to the hit ratio on a non-sectored cache. A decoupled sectored cache will allow the same level of performance as a non-sectored cache, but at a significantly lower hardware cost.

86 citations
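
As a rough illustration of the decoupling, here is a toy Python model of one set of a decoupled sectored cache: each line frame carries a small selector naming the tag slot it belongs to, rather than being hard-wired to a single tag as in a classic sectored cache. The geometry (4 lines per sector, 4 tag slots per set) and the random replacement policy are assumptions for the sketch, not the paper's design.

```python
import random

class DecoupledSectoredCache:
    """Toy model of one set of a decoupled sectored cache.

    Instead of binding each line frame to exactly one address tag, every
    frame stores a selector pointing at one of NTAGS tag slots; the tag
    location for a line is chosen dynamically at fill time.
    """
    SECTOR = 4   # lines per sector (assumed)
    NTAGS = 4    # tag slots per set, the "several possible locations" (assumed)

    def __init__(self):
        self.tags = [None] * self.NTAGS  # sector address tags
        # One row of frames per line-offset within a sector; each entry is
        # a selector into self.tags, or None if the frame is invalid.
        self.frames = {off: [None] * self.NTAGS for off in range(self.SECTOR)}

    def access(self, sector_tag: int, offset: int) -> bool:
        """Return True on hit. `offset` is the line's index within its sector."""
        # Hit check: some frame at this offset selects a matching tag slot.
        for sel in self.frames[offset]:
            if sel is not None and self.tags[sel] == sector_tag:
                return True
        # Miss: reuse a tag slot already holding this sector if present,
        # otherwise victimize a random tag slot (simplified policy).
        if sector_tag in self.tags:
            sel = self.tags.index(sector_tag)
        else:
            sel = random.randrange(self.NTAGS)
            self.tags[sel] = sector_tag
            # Invalidate frames that still pointed at the victim tag slot.
            for off in self.frames:
                self.frames[off] = [s if s != sel else None
                                    for s in self.frames[off]]
        # Fill one frame at this offset (random way, simplified eviction).
        self.frames[offset][random.randrange(self.NTAGS)] = sel
        return False

c = DecoupledSectoredCache()
print(c.access(0x12, 0))  # False: cold miss, allocates a tag slot
print(c.access(0x12, 1))  # False: line miss, but shares the existing tag
print(c.access(0x12, 0))  # True: hit via the dynamically chosen tag
```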

Proceedings ArticleDOI
13 Jun 2015
TL;DR: Bandwidth Efficient ARchitecture (BEAR) for DRAM caches integrates three components, one each for reducing the bandwidth consumed by miss detection, miss fill, and writeback probes; it reduces the bandwidth consumption of the DRAM cache by 32%, which reduces cache hit latency by 24% and increases overall system performance by 10%.
Abstract: Die stacking memory technology can enable gigascale DRAM caches that can operate at 4x-8x higher bandwidth than commodity DRAM. Such caches can improve system performance by servicing data at a faster rate when the requested data is found in the cache, potentially increasing the memory bandwidth of the system by 4x-8x. Unfortunately, a DRAM cache uses the available memory bandwidth not only for data transfer on cache hits, but also for other secondary operations such as cache miss detection, fill on cache miss, and writeback lookup and content update on dirty evictions from the last-level on-chip cache. Ideally, we want the bandwidth consumed for such secondary operations to be negligible, and have almost all the bandwidth be available for transfer of useful data from the DRAM cache to the processor. We evaluate a 1GB DRAM cache, architected as Alloy Cache, and show that even the most bandwidth-efficient proposal for DRAM cache consumes 3.8x bandwidth compared to an idealized DRAM cache that does not consume any bandwidth for secondary operations. We also show that redesigning the DRAM cache to minimize the bandwidth consumed by secondary operations can potentially improve system performance by 22%. To that end, this paper proposes Bandwidth Efficient ARchitecture (BEAR) for DRAM caches. BEAR integrates three components, one each for reducing the bandwidth consumed by miss detection, miss fill, and writeback probes. BEAR reduces the bandwidth consumption of DRAM cache by 32%, which reduces cache hit latency by 24% and increases overall system performance by 10%. BEAR, with negligible overhead, outperforms an idealized SRAM Tag-Store design that incurs an unacceptable overhead of 64 megabytes, as well as Sector Cache designs that incur an SRAM storage overhead of 6 megabytes.

86 citations
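
A back-of-the-envelope model makes the bandwidth accounting concrete: total DRAM-cache traffic is the useful hit data plus the secondary operations the abstract lists (miss-detection probes, miss fills, writeback probes). All byte counts and rates below are illustrative assumptions, not measurements from the paper.

```python
# Illustrative bandwidth accounting for a DRAM cache. The constants and
# rates are assumptions chosen to show the shape of the argument, not
# the paper's measured 3.8x figure.
LINE = 64   # bytes of data payload per cache-line transfer (assumed)
TAG = 8     # extra bytes streamed per tag/metadata probe (assumed)

def dram_cache_traffic(accesses: int, hit_rate: float, dirty_rate: float) -> dict:
    hits = int(accesses * hit_rate)
    misses = accesses - hits
    t = {
        "hit data (useful)":  hits * LINE,
        "miss-detect probes": accesses * TAG,     # every access checks tags
        "miss fills":         misses * LINE,      # install the line on a miss
        "writeback probes":   int(accesses * dirty_rate) * (TAG + LINE),
    }
    t["total"] = sum(t.values())
    return t

t = dram_cache_traffic(1_000_000, hit_rate=0.5, dirty_rate=0.2)
ideal = t["hit data (useful)"]  # an idealized cache spends bandwidth only on hits
print(f"secondary operations inflate traffic to {t['total'] / ideal:.2f}x the ideal")
```

Each of BEAR's three components attacks one of the secondary terms in this sum, which is why shrinking them translates almost directly into recovered hit bandwidth.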

Patent
07 Mar 1997
TL;DR: In this article, a battery backup mirrored cache memory module for a cache dynamic random access memory (DRAM) system is described. The module senses the Vcc level supplied through the cache controller (310) to the cache memory and, if that level falls below a preset threshold, the battery backup apparatus (400) switches the cache memory array to a backup battery Vcc source (220) and enables a backup refresh control generator unit (230) that is also powered by the backup battery source.
Abstract: A battery backup mirrored cache memory module (210) for a cache dynamic random access memory (DRAM (200)) system senses the Vcc level supplied through the cache controller (310) to the cache memory; if the controller-supplied Vcc falls below a preset threshold level, the battery backup apparatus (400) switches (210) the cache memory array to a backup battery Vcc source (220) and to a backup refresh control generator unit (230) that is also powered by the backup battery Vcc source (220). The cache DRAM (200), backup battery (220), and backup refresh generator are physically contained in a single module (400) that can be disconnected from the cache controller and host while preserving cache memory contents. The backup system is installed in an operating system for recovery of the cache memory contents and/or resumption of execution of the program that was running when the Vcc power failure occurred.

86 citations
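
The switchover logic the patent describes can be sketched as a small monitor: sense the controller-supplied Vcc and, when it drops below the preset threshold, move the DRAM array onto the battery supply and enable the backup refresh generator. The threshold and voltage values below are assumptions for illustration, not figures from the patent.

```python
# Minimal sketch of the Vcc-sensing switchover described in the patent.
# VCC_THRESHOLD and the sample voltages are illustrative assumptions.

VCC_THRESHOLD = 4.5  # volts; assumed preset threshold

class BackupModule:
    def __init__(self):
        self.on_battery = False
        self.backup_refresh_enabled = False

    def sense(self, controller_vcc: float) -> None:
        if controller_vcc < VCC_THRESHOLD and not self.on_battery:
            # Power failure: preserve DRAM contents on the battery and
            # hand refresh over to the backup refresh generator.
            self.on_battery = True
            self.backup_refresh_enabled = True
        elif controller_vcc >= VCC_THRESHOLD and self.on_battery:
            # Host power restored: return to controller-supplied Vcc.
            self.on_battery = False
            self.backup_refresh_enabled = False

m = BackupModule()
for vcc in (5.0, 4.9, 4.2, 0.0, 5.0):
    m.sense(vcc)
    print(f"Vcc={vcc:.1f}V on_battery={m.on_battery}")
```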

Proceedings ArticleDOI
01 Apr 1992
TL;DR: A new technique is presented for reducing direct-mapped cache misses caused by conflicts for a particular cache line; simulations show an average reduction in miss rate of 33% for a 32KB instruction cache with 16B lines.
Abstract: Most recent cache designs use direct-mapped caches to provide the fast access time required by modern high-speed CPUs. Unfortunately, direct-mapped caches have higher miss rates than set-associative caches, largely because direct-mapped caches are more sensitive to conflicts between items needed frequently in the same phase of program execution. This paper presents a new technique for reducing direct-mapped cache misses caused by conflicts for a particular cache line. A small finite state machine recognizes the common instruction reference patterns where storing an instruction in the cache actually harms performance. Such instructions are dynamically excluded, that is, they are passed directly through the cache without being stored. This reduces misses to the instructions that would have been replaced. The effectiveness of dynamic exclusion is dependent on the severity of cache conflicts and thus on the particular program and cache size of interest. However, across the SPEC benchmarks, simulation results show an average reduction in miss rate of 33% for a 32KB instruction cache with 16B lines. In addition, applying dynamic exclusion to one level of a cache hierarchy can improve the performance of the next level, since instructions do not need to be stored at both levels. Finally, dynamic exclusion also improves combined instruction and data cache miss rates.

86 citations
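
A simplified sketch of the bypass mechanism: each direct-mapped frame keeps one "sticky" bit standing in for the paper's finite state machine, so a conflicting line is passed around the cache once instead of evicting a line that keeps hitting. The real design also tracks a per-instruction hit-last bit; that refinement is omitted here.

```python
# Simplified sketch of dynamic exclusion for a direct-mapped instruction
# cache. A miss on a "sticky" frame is served around the cache (bypassed)
# rather than evicting the resident line, so two conflicting lines stop
# thrashing each other. This is an approximation of the paper's FSM.

class ExclusionCache:
    def __init__(self, nframes: int = 1024):
        self.nframes = nframes
        self.tags = [None] * nframes     # resident tag per frame
        self.sticky = [False] * nframes  # protect the resident line once

    def access(self, addr: int) -> str:
        frame = addr % self.nframes
        tag = addr // self.nframes
        if self.tags[frame] == tag:
            self.sticky[frame] = True    # resident line proved useful
            return "hit"
        if self.sticky[frame]:
            self.sticky[frame] = False   # bypass once, then allow replacement
            return "bypass"              # line served without being stored
        self.tags[frame] = tag           # normal fill on an unprotected frame
        self.sticky[frame] = True
        return "miss+fill"

# Two addresses that map to the same frame no longer evict each other
# on every access:
c = ExclusionCache(nframes=4)
for a in (0, 4, 0, 4, 0):
    print(a, c.access(a))   # miss+fill, bypass, hit, bypass, hit
```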


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 93% related
Scalability: 50.9K papers, 931.6K citations, 88% related
Server: 79.5K papers, 1.4M citations, 88% related
Network packet: 159.7K papers, 2.2M citations, 83% related
Dynamic Source Routing: 32.2K papers, 695.7K citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year   Papers
2023   44
2022   117
2021   4
2020   8
2019   7
2018   20