Program locality analysis using reuse distance
Citations
Survey of scheduling techniques for addressing shared resources in multicore processors
HOTL: a higher order theory of locality
Is reuse distance applicable to data locality analysis on chip multiprocessors
Evaluating iterative optimization across 1000 datasets
PARDA: A Fast Parallel Reuse Distance Analysis Algorithm
References
A data locality optimizing algorithm
High-Performance Compilers for Parallel Computing
Evaluation techniques for storage hierarchies
Self-adjusting binary search trees
The space complexity of approximating the frequency moments
Frequently Asked Questions (12)
Q2. Why is locality considered a fundamental concept in computing?
Locality is considered a fundamental concept in computing because, as the authors note, understanding a computation requires understanding its use of data.
Q3. How many accesses are there between successive tree compressions?
Since one tree node is added for each access, at least M/2 accesses occur between successive tree compressions.
Q4. How did they improve the efficiency of the tree-based methods?
Almasi et al. [2002] showed that, by recording the empty regions instead of the last accesses in the trace, the efficiency of vector- and tree-based methods could be improved by 20% to 40%.
Q5. What is the simplest way to model a buffer as a stack?
Mattson et al. [1970] showed that buffer memory can be modeled as a stack if the buffer-management policy satisfies the inclusion property: a smaller buffer always holds a subset of the data held by a larger buffer.
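The stack model above admits a direct sketch. Assuming an LRU-managed buffer, the reuse (stack) distance of an access is the number of distinct elements touched since the previous access to the same element:

```python
def stack_distances(trace):
    """Mattson-style stack simulation for an LRU-managed buffer.

    The stack keeps distinct elements in recency order (most recent
    first).  Each access reports the number of distinct elements above
    the accessed one -- its reuse distance -- then moves it to the top.
    """
    stack, dists = [], []
    for x in trace:
        if x in stack:
            d = stack.index(x)      # distinct elements more recent than x
            stack.pop(d)
        else:
            d = float('inf')        # first access to x
        stack.insert(0, x)          # x is now most recently used
        dists.append(d)
    return dists
```

Because LRU satisfies the inclusion property, a buffer of size k hits exactly the accesses with distance below k, so one simulation yields the miss ratio for every buffer size. This naive list scan costs O(n) per access; the vector- and tree-based methods discussed in the paper reduce that cost.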
Q6. What is the cost of a reuse-distance analysis?
To measure only the cost of reuse-distance analysis, the hashing step is bypassed by pre-computing the last access time in all analyzers (except for KHW, which does not need the access time).
Q7. How can reuse-distance prediction be improved?
Using more than two training inputs may produce a better prediction, because more data may reduce the noise from imprecise reuse distance measurement and histogram construction.
Q8. How can an analyzer determine the approximate reuse distance?
By properly adjusting these time ranges, an analyzer can examine the trace and compute approximate reuse distance in effectively constant time regardless of the length of the trace.
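One way to make the time-range idea concrete is the following sketch; the bucket layout here is an illustrative assumption of mine, not the paper's exact data structure. Last accesses are kept in buckets whose capacity doubles with age, so each access scans only a logarithmic number of buckets and the reported distance carries a bounded error of about one bucket:

```python
from collections import OrderedDict

class RangeAnalyzer:
    """Illustrative sketch of approximate reuse-distance analysis using
    time ranges of geometrically growing size (an assumption-based
    simplification, not the paper's exact algorithm).

    buckets[0] holds the single most recent distinct element; bucket i
    holds up to 2**i of the next-most-recent ones.  An access's distance
    is estimated from the sizes of the newer buckets, so the error is
    bounded by the size of the bucket the element was found in.
    """

    def __init__(self):
        self.buckets = []   # each bucket: OrderedDict, oldest entry first

    def access(self, x):
        d, newer = None, 0
        for b in self.buckets:
            if x in b:
                d = newer + len(b) // 2   # midpoint estimate within bucket
                del b[x]
                break
            newer += len(b)
        if d is None:
            d = float('inf')              # first access to x
        if not self.buckets:
            self.buckets.append(OrderedDict())
        self.buckets[0][x] = None         # x becomes most recent
        i = 0
        while len(self.buckets[i]) > 2 ** i:   # cascade overflows to older buckets
            if i + 1 == len(self.buckets):
                self.buckets.append(OrderedDict())
            old, _ = self.buckets[i].popitem(last=False)  # oldest element overflows
            self.buckets[i + 1][old] = None
            i += 1
        return d
```

With M distinct elements there are O(log M) buckets, so the per-access cost stays effectively constant regardless of trace length, which is the property the answer above describes.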
Q9. What is the method for calculating the average distance of a group of reuse distances?
For a group of reuse distances, the authors compute the ratio of their average distances in the two executions, di/d̂i, and pick fi to be the pattern function closest to that ratio.
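Assuming a candidate set of pattern functions (constant, square-root, and linear in data size are illustrative choices, not necessarily the paper's exact set), the selection step can be sketched as:

```python
import math

def pick_pattern(d1, d2, s1, s2):
    """Pick the pattern function whose predicted distance ratio is
    closest to the observed ratio of average reuse distances.

    d1, d2 -- average reuse distance of one group in the two runs
    s1, s2 -- data sizes of the two runs
    """
    observed = d2 / d1
    candidates = {                      # predicted ratio under each pattern
        'constant': 1.0,
        'sqrt':     math.sqrt(s2 / s1),
        'linear':   s2 / s1,
    }
    return min(candidates, key=lambda p: abs(candidates[p] - observed))
```

For example, if the average distance quadruples when the data size quadruples, the observed ratio matches the linear pattern exactly and `pick_pattern(100, 400, 1000, 4000)` returns `'linear'`.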
Q10. Why might the consistency across inputs be due to consistency in programmers’ coding style?
The consistency across inputs might be due to consistency in programmers’ coding style, for example, the distribution of function sizes.
Q11. How many accesses are transferred to the approximate trace?
To satisfy the first requirement, the authors transfer the last accesses of c data elements from the precise trace to the approximate trace when the size of the precise trace exceeds 2c.
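The two-part trace can be sketched as follows; the class name `TwoPartTrace` and the choice to track only a set of distinct elements in the approximate part are illustrative assumptions of mine:

```python
from collections import OrderedDict

class TwoPartTrace:
    """Sketch of splitting the trace into a precise and an approximate
    part (an assumption-based simplification of the scheme above).

    The precise part holds the last accesses of at most 2c distinct
    elements in recency order.  When it grows past 2c, the oldest c
    entries are transferred to the approximate part, which here merely
    remembers which elements it holds; a real analyzer would keep time
    ranges there to bound the approximation error.
    """

    def __init__(self, c):
        self.c = c
        self.precise = OrderedDict()   # element -> None, oldest first
        self.in_approx = set()

    def access(self, x):
        if x in self.precise:
            keys = list(self.precise)
            d = len(keys) - 1 - keys.index(x)   # exact: elements newer than x
            del self.precise[x]
        elif x in self.in_approx:
            d = len(self.precise)               # lower bound on the distance
            self.in_approx.discard(x)
        else:
            d = float('inf')                    # first access to x
        self.precise[x] = None                  # x becomes most recent
        if len(self.precise) > 2 * self.c:      # transfer c oldest entries
            for _ in range(self.c):
                old, _ = self.precise.popitem(last=False)
                self.in_approx.add(old)
        return d
```

Transferring c elements at a time, rather than one, keeps the transfer cost amortized over at least c accesses.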
Q12. How is the prediction accuracy shown in Table VI?
Table VI shows the prediction accuracy when the size of the largest training run is reduced to 1.6%, 3%, and 13% of the size used previously in Table II.