Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers
Frequently Asked Questions (15)
Q2. How many KB of storage does FDP use for its prefetching structures?
In total, FDP uses 3.1 KB for its prefetching structures, and requires logic which can detect if an address is within an existing stream, allocate new streams, add and remove items from a Bloom filter, and calculate the dynamic settings of the prefetcher based on feedback mechanisms.
Q3. What is the advantage of a confirmation-based prefetcher?
Confirmation-based prefetchers have the advantage that once a pattern has been confirmed, many prefetches can be issued along that pattern, far ahead of the program’s actual access stream.
Q4. What is the key idea behind a prefetch sandbox?
The key idea behind a prefetch sandbox is to track prefetch requests generated by a candidate prefetch pattern, without actually issuing those prefetch requests to the memory system.
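A minimal sketch of such a sandbox, assuming a small Bloom filter; the bit count, hash count, and SHA-256-based hashing below are illustrative software choices, not the paper's hardware parameters:

```python
import hashlib

class SandboxBloomFilter:
    """Minimal Bloom-filter sandbox: candidate prefetch addresses are
    recorded here instead of being issued to the memory system."""

    def __init__(self, num_bits=2048, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, line_addr):
        # Derive k bit positions from the cache-line address
        # (illustrative hashing; real hardware would use cheap hash circuits).
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{line_addr}:{i}".encode()).digest()
            yield int.from_bytes(h[:4], "little") % self.num_bits

    def add(self, line_addr):
        # "Issue" a sandboxed prefetch: set the bits, generate no memory traffic.
        for p in self._positions(line_addr):
            self.bits[p] = True

    def maybe_contains(self, line_addr):
        # True if line_addr may have been sandbox-prefetched
        # (Bloom filters allow false positives, never false negatives).
        return all(self.bits[p] for p in self._positions(line_addr))

    def reset(self):
        self.bits = [False] * self.num_bits
```

A later demand access that tests positive in the filter indicates the candidate pattern would have prefetched that line, so the candidate's accuracy can be measured without polluting the cache or the memory bus.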
Q5. What is the aggressive level of prefetching?
Sandbox Prefetching (SBP) represents another class of prefetcher, and combines the ideas of global confirmation with immediate action to aggressively, and safely, perform prefetches.
Q6. Why do the authors add a number of items to the Bloom filter?
Because each candidate prefetcher generates only a single prefetch address per access, the authors add a number of items to the Bloom filter equal to the number of L2 accesses in an evaluation period.
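Since roughly one item is inserted per L2 access, the sandbox's false-positive rate can be estimated with the standard Bloom-filter formula; the filter size and hash count below are illustrative, not the paper's configuration:

```python
import math

def bloom_false_positive_rate(m_bits, k_hashes, n_items):
    """Classic estimate (1 - e^(-k*n/m))^k of a Bloom filter's
    false-positive probability after inserting n items.
    With one insertion per L2 access, n is the number of L2
    accesses in an evaluation period."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes
```

For example, a 2048-bit filter with 4 hash functions holding 256 items gives a false-positive rate of roughly 2.4%, which bounds how much a candidate's score can be inflated by filter aliasing.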
Q7. How many sandboxes are used to evaluate a candidate prefetcher?
There is only one sandbox per core, and the candidate prefetchers are evaluated one at a time, in a time multiplexed fashion, with the sandbox being reset in between each evaluation.
Q8. How do the authors test the prefetch accuracy of a stream?
There is a set of candidate prefetchers which are evaluated by simulating their prefetch action by adding prefetch addresses to a sandbox Bloom filter, rather than issuing real prefetches, and by testing subsequent cache access addresses to see if they are part of a strided stream.
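The evaluation loop for one candidate can be sketched as follows; a Python set stands in for the hardware Bloom filter (so this sketch has no false positives), and the period length and offset-style candidates are illustrative:

```python
def evaluate_candidate(offset, l2_access_lines, period=256):
    """Score one candidate offset prefetcher over an evaluation period.
    Each L2 access first checks the sandbox (a hit means an earlier
    simulated prefetch would have covered this access), then records
    the candidate's simulated prefetch of line + offset."""
    sandbox = set()  # stand-in for the sandbox Bloom filter
    score = 0
    for line in l2_access_lines[:period]:
        if line in sandbox:          # an earlier sandboxed prefetch covers this access
            score += 1
        sandbox.add(line + offset)   # simulate the prefetch; no memory traffic
    return score
```

On a unit-stride stream of ten line addresses, the +1 candidate scores 9 while the +5 candidate scores only 5, so the +1 candidate would look more accurate for that stream.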
Q9. How does the proposed Sandbox Prefetching technique improve performance?
Their results show that using the proposed Sandbox Prefetching technique improves the average performance of 14 memory-intensive benchmarks in the SPEC2006 suite by 47.6% compared to no prefetching, by 18.7% compared to the state-of-the-art Feedback Directed Prefetching, and by 1.4% compared to the Access Map Pattern Matching Prefetcher, which has a considerably larger storage and logic requirement compared to Sandbox Prefetching.
Q10. How does the simulation simulate a DRAM main memory?
The authors evaluate Sandbox Prefetching using the Wind River Simics full system simulator [2], which has been augmented to precisely model a DRAM main memory system by integrating the USIMM DRAM simulator [4].
Q11. How much does SBP improve on the lbm?
Compared to FDP, SBP improves performance across single-threaded workloads by an average of 18.7%, with a maximum improvement of 68.8% in the lbm workload.
Q12. What is the common way to calculate the prefetch degree?
The only performance-critical logic generates prefetch addresses from a reference address and a set of offsets that have been predetermined to have high evaluation scores, which is not unusual for a prefetching mechanism.
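A sketch of that critical-path structure: once some offsets have passed their sandbox evaluations, the real prefetch addresses are simply the reference line plus each winning offset. The function name and degree cap here are hypothetical, introduced only for illustration:

```python
def generate_prefetches(ref_line, winning_offsets, max_degree=8):
    """Emit real prefetch line addresses for a demand access: one per
    offset whose sandbox evaluation score passed the threshold.
    max_degree is an illustrative cap on prefetches per access."""
    return [ref_line + off for off in winning_offsets[:max_degree]]
```

For example, with winning offsets [1, 2, -1] and a demand access to line 100, the prefetcher would issue prefetches for lines 101, 102, and 99.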
Q13. How does SBP perform in the single-threaded workloads where AMPM sees its greatest improvements?
In the single-threaded workloads where AMPM sees its largest performance improvements over No PF, SBP consistently achieves even higher performance.
Q14. How many L2 accesses are used to evaluate a candidate?
Each candidate is evaluated for a fixed number of L2 accesses, and then the contents of the sandbox are reset, and the next candidate is evaluated.
Q15. What is the difference between immediate and next-line prefetchers?
Because immediate prefetchers work at the granularity of individual cache lines rather than streams, a next-line prefetcher can perfectly prefetch the second cache line of such two-line linked-list nodes.