Tuning Caches to Applications for Low-Energy Embedded Systems
References
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
CACTI: an enhanced cache access and cycle time model
Selective cache ways: on-demand cache resource allocation
Evaluating future microprocessors: The SimpleScalar tool set
Frequently Asked Questions (14)
Q2. What is the parameter to search first?
When developing a good heuristic, the parameter (cache size, line size, associativity, or way prediction) with the largest impact on performance and energy would likely be the best parameter to search first.
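As a sketch, such a parameter-ordered search could look like the following; the candidate values and the `energy` cost function are hypothetical placeholders, not the paper's actual configuration space:

```python
# Sketch of a parameter-ordered tuning loop: search the most impactful
# parameter (cache size) first, then line size, then associativity.
# The candidate values and the energy() cost function are hypothetical.
def tune_cache(energy,
               sizes=(2048, 4096, 8192),
               line_sizes=(16, 32, 64),
               assocs=(1, 2, 4)):
    best = {"size": sizes[0], "line": line_sizes[0], "assoc": assocs[0]}
    for param, candidates in (("size", sizes),
                              ("line", line_sizes),
                              ("assoc", assocs)):
        # Try each candidate with the other parameters held fixed,
        # and keep the lowest-energy choice before moving on.
        best[param] = min(candidates,
                          key=lambda c: energy({**best, param: c}))
    return best
```

For a cost function where the parameters contribute independently, this greedy ordering finds the same configuration as an exhaustive sweep while evaluating far fewer points.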
Q3. What is the basic intuition behind the interlaced heuristic?
The basic intuition behind their heuristic is that interlacing the exploration allows for better modeling and tuning of the interdependencies among the different levels of the cache hierarchy.
Q4. How long did it take to generate the data for the nine benchmarks?
It took over one month of continuous simulation time on an UltraSparc compute server to generate the data for their nine benchmarks.
Q5. Why are direct-mapped caches popular in embedded microprocessor architectures?
Direct-mapped caches are popular in embedded microprocessor architectures due to their simplicity and good hit rates for many applications.
Q6. How did the authors simulate the Powerstone and MediaBench benchmarks?
The authors simulated numerous Powerstone [9] and MediaBench [18] benchmarks using SimpleScalar [19], a cycle-accurate simulator that includes a MIPS-like microprocessor model, to obtain the number of cache accesses and cache misses for each benchmark and configuration explored.
Q7. What is the main reason why the FV cache was proposed?
The FV cache was proposed based on the observation that a small number of distinct frequently occurring data values often occupy a large portion of program memory data spaces and therefore account for a large portion of memory accesses [27].
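This observation is easy to reproduce on a toy data trace; the sketch below (an invented trace, not data from the paper) measures how large a share of accesses the top few distinct values cover:

```python
from collections import Counter

# Toy illustration of the frequent-value observation: a few distinct
# values often cover most of a program's data accesses.
def fv_coverage(trace, n):
    """Share of accesses covered by the n most frequent values."""
    top = Counter(trace).most_common(n)
    return sum(count for _, count in top) / len(trace)

# Invented trace: the values 0 and 1 dominate, with a tail of rarer values.
trace = [0] * 60 + [1] * 25 + [255] * 5 + list(range(100, 110))
```

Here just two distinct values already cover 85% of the 100 accesses, which is the kind of skew that makes a small FV cache worthwhile.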
Q8. How are FVs stored?
Instead of synthesizing the FVs on-chip, a register file may be used to store FVs so that they can be rewritten on each activation of a different program.
Q9. What is the target architecture for the two-level cache tuning heuristic?
The target architecture for their two-level cache tuning heuristic contains separate level one instruction and data caches and separate level two instruction and data caches.
Q10. How did the authors obtain the power consumed by their cache tuner?
The authors obtained the power consumed by their cache tuner through simulation of a synthesized version of the tuner written in VHDL.
Q11. How do Ghosh et al. [15] use an analytical model to efficiently explore the cache design space?
Ghosh et al. [15] use an analytical model to efficiently explore cache size and associativity, directly computing a cache configuration that meets the designers' performance constraints.
Q12. What is the impact of varying the cache size on energy and miss rate?
The authors observed that varying the cache size had the largest average impact on energy and miss rate – changing the cache size can impact the energy by a factor of two or more.
Q13. How many configurations does the heuristic search?
Their search heuristic is quite effective: it searches on average only 5.8 configurations, compared to 27 configurations for an exhaustive approach.
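The 27-configuration exhaustive space corresponds to three candidate values for each of three parameters; the candidate values below are hypothetical, only the 3 × 3 × 3 structure is taken from the source:

```python
import itertools

# Exhaustive exploration enumerates every combination of the three
# parameters; with three candidates each, that is 3 * 3 * 3 = 27 runs.
# The candidate values here are hypothetical.
sizes = (2048, 4096, 8192)
line_sizes = (16, 32, 64)
assocs = (1, 2, 4)
exhaustive = list(itertools.product(sizes, line_sizes, assocs))

# A parameter-at-a-time heuristic instead evaluates at most
# 3 + 3 + 3 = 9 configurations, and fewer when it terminates a
# parameter's sweep early, consistent with the reported average of 5.8.
```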
Q14. How did the authors extend the heuristic for a two level cache?
The authors extended the heuristic described in Section 3.3 to a two-level cache by tuning the level-one cache while holding the level-two cache at its smallest size, then tuning the level-two cache using the same heuristic.
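That two-phase ordering can be sketched as follows; `tune_one_level`, the energy functions, and the candidate sizes are all invented for illustration:

```python
# Sketch of the two-level extension: tune L1 while L2 is held at its
# smallest size, then tune L2 with L1 fixed at its chosen configuration.
# tune_one_level() is a hypothetical single-level tuner that returns the
# lowest-energy candidate given the other level's (fixed) size.
def tune_one_level(energy, candidates, fixed):
    return min(candidates, key=lambda c: energy(c, fixed))

def tune_two_levels(energy_l1, energy_l2, l1_sizes, l2_sizes):
    # Phase 1: tune L1 with L2 pinned to its smallest candidate.
    l1 = tune_one_level(energy_l1, l1_sizes, fixed=min(l2_sizes))
    # Phase 2: tune L2 with L1 fixed at the configuration just chosen.
    l2 = tune_one_level(energy_l2, l2_sizes, fixed=l1)
    return l1, l2
```

Pinning L2 to its smallest size during phase 1 keeps the L1 sweep cheap, and the later interlaced heuristic refines this by revisiting the interdependencies between the two levels.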