Reducing traffic generated by conflict misses in caches

doi:10.1145/977091.977123

Proceedings article

pp. 235-239

TLDR

Experimental results show that the BCC and SCC caches significantly reduce traffic in many cases and, overall, incur the same number of cache misses as a direct-mapped cache.

Abstract:

Off-chip memory accesses are a major source of power consumption in embedded processors. To reduce the traffic between the processor and off-chip memory, and to hide memory latency, nearly all embedded processors include a cache on the same die as the processor core. Because small caches dissipate less power and are cheaper than large ones, a small cache is preferable. Likewise, because set-associative caches consume more power than direct-mapped caches, a direct-mapped cache is preferable to a set-associative one. Small direct-mapped caches, however, generally incur many conflict misses. In this paper we propose and evaluate a structure called the Conflict Detection Table (CDT), which can be used to determine whether a memory access is expected to hit the cache. If a hit is expected but a miss occurs, a conflict is detected and appropriate action can be taken. We also propose two cache structures that employ this technique: the Bypass in Case of Conflict (BCC) cache and the Sub-block in Case of Conflict (SCC) cache. When a conflict is detected, the BCC cache bypasses the cache, whereas the SCC cache fetches only a sub-block of the missing cache block. Experimental results using several embedded workloads show that the BCC and SCC caches reduce traffic significantly in many cases, while overall incurring the same number of cache misses as the direct-mapped cache. This indicates that the BCC and SCC caches reduce power consumption with a negligible loss in performance.
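The abstract does not say how the CDT is organized, so the following is only a minimal toy model to make the BCC and SCC policies concrete. It assumes a hypothetical CDT that remembers the addresses of recently evicted blocks and treats a miss on any of them as a conflict (the block would have hit had it not been displaced). The class names, the parameters `num_lines`, `block_size`, `sub_block_size`, `word_size` and `cdt_entries`, and their default values are illustrative assumptions, not details from the paper. The model counts misses and bytes fetched from off-chip memory for a plain direct-mapped cache (`dm`), a BCC-style policy that fetches only the requested word on a conflict, and an SCC-style policy that fetches only the sub-block holding that word.

```python
from collections import OrderedDict


class ConflictDetectionTable:
    """Hypothetical CDT (an assumption, not the paper's design): an LRU list
    of recently evicted block addresses. A miss on a listed block is
    classified as a conflict miss, because that block was resident until
    another block displaced it."""

    def __init__(self, entries=8):
        self.entries = entries
        self.evicted = OrderedDict()

    def record_eviction(self, block_addr):
        self.evicted[block_addr] = True
        self.evicted.move_to_end(block_addr)
        if len(self.evicted) > self.entries:
            self.evicted.popitem(last=False)  # forget the oldest eviction

    def is_conflict(self, block_addr):
        return block_addr in self.evicted


class DirectMappedCache:
    """Toy direct-mapped cache that counts misses and off-chip traffic (bytes).
    policy: "dm" (plain), "bcc" (bypass on conflict), "scc" (sub-block on conflict)."""

    def __init__(self, policy="dm", num_lines=64, block_size=32,
                 sub_block_size=8, word_size=4, cdt_entries=8):
        assert policy in ("dm", "bcc", "scc")
        assert block_size % sub_block_size == 0
        self.policy = policy
        self.num_lines = num_lines
        self.block_size = block_size
        self.sub_block_size = sub_block_size
        self.word_size = word_size
        self.full_mask = (1 << (block_size // sub_block_size)) - 1
        self.tags = [None] * num_lines
        self.valid = [0] * num_lines  # one valid bit per sub-block
        self.cdt = ConflictDetectionTable(cdt_entries)
        self.misses = 0
        self.traffic = 0

    def _replace(self, index, tag, mask):
        old_tag = self.tags[index]
        if old_tag is not None:
            # Remember the displaced block so a later miss on it counts as a conflict.
            self.cdt.record_eviction(old_tag * self.num_lines + index)
        self.tags[index] = tag
        self.valid[index] = mask

    def access(self, addr):
        """Simulate one access to byte address addr; return True on a hit."""
        block_addr = addr // self.block_size
        index = block_addr % self.num_lines
        tag = block_addr // self.num_lines
        sub_bit = 1 << ((addr % self.block_size) // self.sub_block_size)
        if self.tags[index] == tag and self.valid[index] & sub_bit:
            return True
        self.misses += 1
        if self.tags[index] == tag:
            # Line partially filled by an earlier SCC fetch: get the missing sub-block.
            self.traffic += self.sub_block_size
            self.valid[index] |= sub_bit
        elif self.policy != "dm" and self.cdt.is_conflict(block_addr):
            if self.policy == "bcc":
                # Bypass: fetch only the requested word and leave the line untouched.
                self.traffic += self.word_size
            else:
                # SCC: fetch only the sub-block that holds the requested word.
                self.traffic += self.sub_block_size
                self._replace(index, tag, sub_bit)
        else:
            # Ordinary miss: fetch and install the whole block.
            self.traffic += self.block_size
            self._replace(index, tag, self.full_mask)
        return False


if __name__ == "__main__":
    # Two blocks 2048 bytes apart map to the same line (64 lines x 32 bytes).
    trace = [0, 2048] * 10
    for policy in ("dm", "bcc", "scc"):
        cache = DirectMappedCache(policy=policy)
        for addr in trace:
            cache.access(addr)
        print(policy, "misses:", cache.misses, "traffic bytes:", cache.traffic)
```

On this contrived ping-pong trace, where two blocks compete for one line, both the BCC-style and SCC-style policies move fewer bytes than the plain direct-mapped cache. In this model BCC keeps the resident block and pays only for single words, while SCC still swaps the line but refills it one sub-block at a time. The trace is synthetic and says nothing about the paper's measured embedded workloads.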
