Using dead blocks as a virtual victim cache
References
The SimpleScalar tool set, version 2.0
A study of replacement algorithms for a virtual-storage computer
Automatically characterizing large scale program behavior
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches
Frequently Asked Questions (20)
Q2. What are the future works mentioned in the paper "Using dead blocks as a virtual victim cache" ?
The authors see several future directions for this work. The VVC allows reducing the associativity and size of the cache while maintaining performance, but the potential for reducing the number of sets has not been explored. A more discriminating technique could further improve performance by filtering out cold data.
Q3. What is the reason why dead blocks lead to poor cache efficiency?
Dead blocks lead to poor cache efficiency [15, 4] because, after the last access to a block, it resides in the cache for a long time before it is evicted.
Q4. How much is the overhead of the predictor?
The overhead of the predictor and VVC metadata is 76KB, which is 3.4% of the total 2MB cache space (including both the data and tag arrays).
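As a sanity check, the percentage treats the denominator as data plus tags. A minimal worked version, assuming the tag array adds roughly 187KB (about 47 bits of tag and state per 64-byte block; the paper's exact geometry may differ):

```latex
\[
\frac{76\,\mathrm{KB}}{\underbrace{2048\,\mathrm{KB}}_{\text{data}} + \underbrace{187\,\mathrm{KB}}_{\text{tags, assumed}}} \approx 0.034 = 3.4\%
\]
```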
Q5. What is the technique used to replace dead blocks with other blocks?
The authors' technique also replaces predicted dead blocks with other blocks, but the other blocks are victims from other sets, effectively extending associativity in the same way a victim cache does.
Q6. What MPKI does the VVC achieve at reduced capacity?
At a capacity of 1.7MB, representing an associativity of 13, the VVC achieves an average MPKI of 9.9, just above the 9.7 MPKI of the 2MB baseline cache.
Q7. What is the use of dead block predictor?
This dead block predictor is used to prefetch data into predicted dead blocks in the L1 data cache, enabling lookahead prefetching and eliminating the need for prefetch buffers.
Q8. How can a virtual victim cache be established?
By coupling sets together in small groups and moving blocks that overflow one set into the predicted dead blocks (which the authors call receiver blocks) of a “partner set,” a virtual victim cache can be established with little additional overhead.
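As an illustration, here is a minimal C sketch of the spill path, assuming a one-bit partner-set mapping and a per-line predicted-dead bit. The geometry (NUM_SETS, ASSOC) and helper names (partner_set, spill_to_virtual_victim_cache) are illustrative choices, not the paper's:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 2048            /* assumed L2 geometry */
#define ASSOC    16

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     predicted_dead;     /* set by the dead block predictor */
} line_t;

line_t cache[NUM_SETS][ASSOC];

/* Pair sets by flipping the low index bit; the paper couples small
 * groups of sets, and this one-bit pairing is just one simple choice. */
static inline uint32_t partner_set(uint32_t set) {
    return set ^ 1u;
}

/* On eviction from `set`, try to retire the victim into an invalid or
 * predicted-dead "receiver" block of the partner set instead of
 * dropping it from the chip. */
bool spill_to_virtual_victim_cache(uint32_t set, line_t victim) {
    uint32_t p = partner_set(set);
    for (int way = 0; way < ASSOC; way++) {
        if (!cache[p][way].valid || cache[p][way].predicted_dead) {
            cache[p][way] = victim;              /* receiver block reused */
            cache[p][way].predicted_dead = false;
            return true;                         /* victim retained */
        }
    }
    return false;                                /* no receiver available */
}
```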
Q9. How does the reference trace dead block predictor work?
The reference trace predictor encodes the path of memory access instructions leading to a memory reference as the truncated sum of the instructions’ addresses.
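A minimal C sketch of that encoding, assuming a 16-bit truncated signature and a table of two-bit saturating counters (both widths are assumptions, not figures from the paper):

```c
#include <stdint.h>

#define SIG_BITS 16
#define SIG_MASK ((1u << SIG_BITS) - 1)

/* Fold the next memory-access instruction's address into the block's
 * reference trace: a truncated sum of instruction addresses. */
static inline uint32_t update_trace(uint32_t trace, uint64_t pc) {
    return (uint32_t)((trace + pc) & SIG_MASK);
}

/* Counters indexed by trace signature; a high count means "blocks
 * reaching this trace have tended to die here." */
static uint8_t dead_table[1u << SIG_BITS];

/* Train on the outcome: bump the counter for a trace that ended a
 * block's live range, decay it if the block was referenced again. */
void train(uint32_t trace, int block_died) {
    if (block_died) { if (dead_table[trace] < 3) dead_table[trace]++; }
    else            { if (dead_table[trace] > 0) dead_table[trace]--; }
}

int predict_dead(uint32_t trace) {
    return dead_table[trace] >= 2;   /* threshold is illustrative */
}
```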
Q10. What is the way to view the idea?
Another way to view the idea is as an enhanced combination of block insertion (i.e. placement) policy, search strategy, and replacement policy.
Q11. How were the benchmarks selected?
The authors choose a memory-intensive subset of the benchmarks based on the following criteria: a benchmark is used if (1) it does not cause an abnormal termination in the baseline sim-outorder simulator for the chosen simpoint, and (2) increasing the size of the L2 cache from 1MB to 2MB results in at least a 5% speedup.
Q12. What are the benefits of victim caches?
Victim caches do not appreciably reduce capacity misses, nor conflict misses where the reference pattern does not produce a new reference to the victim quickly, but they provide excellent miss reduction for a small additional amount of state and complexity.
Q13. What is the way to reduce the number of accesses to the tag array?
Reducing the number of accesses to the tag array through a more intelligent search strategy could improve the power behavior of the cache.
Q14. How many false positives are there for the Lai et al. predictor?
At a 128KB hardware budget, the new predictor has a false positive misprediction rate of 3.4% compared with 4.2% for the Lai et al. predictor.
Q15. What is the potential for reducing the number of sets?
The VVC allows reducing the associativity and size of the cache while maintaining performance, but the potential for reducing the number of sets has not been explored.
Q16. How does the VVC improve cache efficiency?
This strategy reduces the number of cache misses per thousand instructions (MPKI) by 11.7% on average with a 2MB L2 cache, yields an average speedup of 12.5% over the baseline, and improves cache efficiency by 15% on average.
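For reference, MPKI is the standard metric

```latex
\[
\mathrm{MPKI} = 1000 \times \frac{\text{cache misses}}{\text{instructions executed}},
\]
```

so an 11.7% reduction from the 2MB baseline's MPKI of 9.7 (see Q6) corresponds to roughly 9.7 × (1 − 0.117) ≈ 8.6.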
Q17. What happens when a block placed in the partner set is accessed again?
The block is refilled into its original set from the adjacent set, and the copy in the adjacent set is marked invalid.
Q18. What is the penalty for the additional tag match and fill?
A small penalty for the additional tag match and fill will accrue to this access, but this access is considered a hit in the L2 for purposes of counting hits and misses (analogously, an access to a virtually-addressed cache following a TLB miss may still be considered a hit, albeit with an extra delay).
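Continuing the C sketch from Q8, the lookup path might look like the following; choose_victim stands in for whatever replacement policy the cache uses and is purely illustrative:

```c
typedef enum { HIT, HIT_IN_PARTNER, MISS } lookup_t;

/* Stand-in replacement policy: always evicts way 0 to keep the
 * sketch short; a real cache would use LRU or similar. */
static int choose_victim(uint32_t set, line_t *out) {
    *out = cache[set][0];
    return 0;
}

lookup_t l2_lookup(uint32_t set, uint64_t tag) {
    for (int w = 0; w < ASSOC; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return HIT;                        /* ordinary L2 hit */

    uint32_t p = partner_set(set);
    for (int w = 0; w < ASSOC; w++) {
        if (cache[p][w].valid && cache[p][w].tag == tag) {
            line_t blk = cache[p][w];
            cache[p][w].valid = false;         /* invalidate partner copy */
            line_t victim;
            int vway = choose_victim(set, &victim);
            cache[set][vway] = blk;            /* refill the original set */
            if (victim.valid)
                (void)spill_to_virtual_victim_cache(set, victim);
            return HIT_IN_PARTNER;             /* still counted as a hit,
                                                  with the extra tag match
                                                  and fill delay */
        }
    }
    return MISS;
}
```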
Q19. What does a memory-level parallelism aware cache replacement policy rely on?
A memory-level parallelism aware cache replacement policy relies on the fact that isolated misses hurt performance more than parallel misses [19].
Q20. Why do the authors choose a 64KB victim cache?
The authors choose a 64KB victim cache because it requires approximately the same amount of SRAM, including the tag array, as the extra structures of the VVC.