Proceedings Article
Reducing traffic generated by conflict misses in caches
Pepijn de Langen, Ben Juurlink +1 more
- pp 235-239
TL;DR: Experimental results show that the BCC and SCC caches significantly reduce the amount of traffic in many cases and that overall they incur the same number of cache misses as the direct-mapped cache.
Abstract:
Off-chip memory accesses are a major source of power consumption in embedded processors. To reduce the traffic between the processor and the off-chip memory, as well as to hide the memory latency, nearly all embedded processors have a cache on the same die as the processor core. Because small caches dissipate less power and are cheaper than large caches, a small cache is preferable to a large one. Furthermore, because set-associative caches consume more power than direct-mapped caches, a direct-mapped cache is preferable to a set-associative one. Small, direct-mapped caches generally incur many conflict misses, however. In this paper we propose and evaluate a structure called the Conflict Detection Table (CDT). This table can be used to determine whether a memory access is expected to hit the cache. If a hit is expected and a miss occurs, a conflict is detected and appropriate action can be taken. In addition, we propose two cache structures that employ this technique: the Bypass in Case of Conflict (BCC) cache and the Sub-block in Case of Conflict (SCC) cache. The BCC cache bypasses the cache when a conflict is detected, whereas the SCC cache fetches a sub-block of the missing cache block in such a case. Experimental results using several embedded workloads show that the BCC and SCC caches reduce the amount of traffic significantly in many cases. Furthermore, overall they incur the same number of cache misses as the direct-mapped cache. This shows that the BCC and SCC caches reduce the amount of power consumed with a negligible reduction in performance.
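The conflict-detection and bypass idea described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' design: the abstract does not say how the CDT is organized, so it is modelled here as a small LRU table of recently referenced block numbers. `BLOCK_SIZE`, `WORD_SIZE`, and all class and function names are hypothetical, and a bypassed access is assumed to transfer a single word.

```python
from collections import OrderedDict

BLOCK_SIZE = 32  # bytes per cache block (assumed value)
WORD_SIZE = 4    # bytes transferred on a bypassed access (assumed value)


class DirectMappedCache:
    """Tag store of a direct-mapped cache, indexed by block number."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.tags = [None] * num_sets

    def lookup(self, block):
        return self.tags[block % self.num_sets] == block

    def fill(self, block):
        self.tags[block % self.num_sets] = block


class ConflictDetectionTable:
    """Hypothetical CDT: remembers the most recently referenced blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def expects_hit(self, block):
        # A recently referenced block would still be cached if no other
        # block had evicted it, so a hit is expected.
        return block in self.blocks

    def record(self, block):
        self.blocks.pop(block, None)
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop least recently used


def simulate(addresses, num_sets, cdt_capacity=None):
    """Return (misses, bytes of off-chip traffic) for a byte-address trace.

    With cdt_capacity=None this is a plain direct-mapped cache; otherwise
    a BCC-style cache bypasses on every detected conflict.
    """
    cache = DirectMappedCache(num_sets)
    cdt = ConflictDetectionTable(cdt_capacity) if cdt_capacity else None
    misses = traffic = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if not cache.lookup(block):
            misses += 1
            if cdt is not None and cdt.expects_hit(block):
                # Hit expected but miss occurred: conflict, so bypass.
                traffic += WORD_SIZE
            else:
                cache.fill(block)
                traffic += BLOCK_SIZE
        if cdt is not None:
            cdt.record(block)
    return misses, traffic
```

For a trace in which two blocks that map to the same set are accessed alternately, e.g. `simulate([0, 4 * BLOCK_SIZE] * 4, num_sets=4)`, the plain direct-mapped cache refetches a full block on every access. Passing `cdt_capacity=4` lets one block stay resident while the other is served by word-sized bypass transfers. An SCC-style variant would instead fetch a sub-block on a detected conflict; that is not modelled here.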
Citations
Patent
Reducing layout conflicts among code units with caller-callee relationships
TL;DR: A code placement technique that organizes code units to at least reduce layout conflicts among caller/callee code units is presented in this paper, where the code preparation environment determines those code units of a code representation that have overlapping memory mappings with their counterpart code units.
Proceedings Article
Dynamic techniques to reduce memory traffic in embedded systems
Ben Juurlink, Pepijn de Langen +1 more
TL;DR: This paper measures how much traffic is generated by small, direct-mapped caches and what the minimal amount of traffic is, which yields an upper bound on the amount of traffic that can be saved by utilizing the on-chip memory more effectively.
Proceedings Article
Cache Miss-Aware Dynamic Stack Allocation
TL;DR: Experimental results show that dynamic stack allocation significantly reduces cache misses from 4% to 42% in various benchmarks with relatively small power consumption and no extra delay.
Energy reduction techniques for caches and multiprocessors
TL;DR: This dissertation studies several techniques that aim to reduce energy consumption in processors by decreasing the amount of data transferred between a processor and external memory, while improving or at least maintaining performance.
Journal Article
Data Cache System based on the Selective Bank Algorithm for Embedded System
Bo Sung Jung, Jung Hoon Lee +1 more
TL;DR: This paper presents a high performance and low power cache structure with a bank selection mechanism that enhances exploitation of spatial and temporal locality and reduces conflict misses and cache pollution at the same time.
References
Book
Computer Architecture: A Quantitative Approach
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Proceedings Article
MiBench: A free, commercially representative embedded benchmark suite
Matthew R. Guthaus, Jeff Ringenberg, Daniel J. Ernst, Todd Austin, Trevor Mudge, Richard B. Brown
TL;DR: A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors.
Proceedings Article
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems
TL;DR: MediaBench is a benchmark suite designed to fill the gap between the compiler community and embedded applications developers. It was constructed through a three-step process: intuition and market driven initial selection, experimental measurement, and integration with system synthesis algorithms to establish usefulness.
Proceedings Article
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
TL;DR: This article presents hardware techniques to improve cache performance: a small fully-associative cache is placed between a cache and its refill path, and prefetched data is placed in a buffer rather than in the cache.