Modeling performance variation due to cache sharing
Frequently Asked Questions (13)
Q2. What are the future works in "Modeling performance variation due to cache sharing" ?
In future work, the authors plan to extend their analytical method to include such bandwidth-sharing effects. Due to its speed, simple input data, and accuracy, the method can be used to build efficient tools for software developers and system designers, and is fast enough to be leveraged in scheduling and operating-system design.
Q3. What is the purpose of the Cache Pirating model?
Cache Pirating uses hardware performance monitoring facilities to measure target application properties at runtime, such as cache misses, hits, and execution cycles.
Q4. What is the reason why an application receives less cache space?
When an application receives less cache space, its bandwidth usage increases: it misses more often in the L3, and that data must be fetched from memory again.
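A back-of-the-envelope sketch of the relation described in this answer: each additional L3 miss fetches one cache line from memory, so a smaller cache allocation (more misses) translates directly into higher memory-bandwidth demand. The function name and the 64-byte line size are illustrative assumptions, not from the paper.

```python
def memory_bandwidth_bytes_per_s(l3_misses, seconds, line_bytes=64):
    """Estimate memory-bandwidth demand from an L3 miss count.

    Each miss is assumed to fetch one cache line (64 B here) from DRAM,
    so more misses over the same interval mean higher bandwidth usage.
    """
    return l3_misses * line_bytes / seconds
```

Doubling the miss count over the same interval doubles the estimated bandwidth, which is the effect the answer describes.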
Q5. What are the four interference benchmarks that the authors selected?
For their evaluation, the authors selected four interference benchmarks that represent four different phase behaviors: Single-Phase (omnetpp), Dual-Phase (bwaves), Few-Phase (astar/lakes) and Multi-Phase (mcf).
Q6. How many steps did the authors use to measure the cache size?
The authors measured cache-size-dependent data using cache pirating in 16 steps of 768 kB (the equivalent of one way) up to 12 MB, and used a sample window size of 100 million instructions.
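A minimal reconstruction of the sweep arithmetic in this answer: 16 steps of one L3 way (768 kB) each cover the full 12 MB cache. The variable names are illustrative.

```python
# One L3 way on the reference machine is 768 kB; 16 ways span the 12 MB cache.
WAY_SIZE_KB = 768
NUM_WAYS = 16

# Cache sizes at which cache pirating measures the target application.
steps_kb = [WAY_SIZE_KB * n for n in range(1, NUM_WAYS + 1)]

print(steps_kb[0], steps_kb[-1])  # first step 768 kB, last step 12288 kB = 12 MB
```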
Q7. How many cores can the authors use to model the system throughput?
Since all of the techniques the authors integrate in this method scale beyond two cores, the authors demonstrate that their method can scale as well by estimating the system throughput when co-running a mix of four applications on their four core reference system.
Q8. How do the authors extend the cache pirate method?
In this paper, the authors extend the cache pirate method to produce time-dependent data by dividing the execution into sample windows, sampling the performance counters at regular intervals.
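A minimal sketch of this windowing idea, assuming the raw input is a stream of per-interval performance-counter deltas (instructions retired, L3 misses); the function name and tuple layout are hypothetical, not the paper's implementation.

```python
def split_into_windows(samples, window_insns=100_000_000):
    """Group per-interval counter deltas into fixed-size sample windows.

    `samples` is a list of (instructions, l3_misses) deltas read from the
    performance counters at regular intervals. A window closes once it
    has accumulated at least `window_insns` instructions (100 M, as in
    the paper's evaluation).
    """
    windows, cur_insns, cur_misses = [], 0, 0
    for insns, misses in samples:
        cur_insns += insns
        cur_misses += misses
        if cur_insns >= window_insns:
            windows.append((cur_insns, cur_misses))
            cur_insns, cur_misses = 0, 0
    if cur_insns:  # keep the final partial window
        windows.append((cur_insns, cur_misses))
    return windows
```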
Q9. What are the three different categories of techniques to explore and understand multicore performance?
Techniques to explore and understand multicore performance can generally be divided into three categories: full-system simulation, partial simulation/modeling, and higher-level modeling.
Q10. What is the key difficulty in modeling time-dependent cache sharing?
The key difficulty in modeling time-dependent cache sharing is to determine which parts of the application (i.e., sample windows or phases) will co-execute.
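One way to make this difficulty concrete is a two-pointer walk over the two applications' per-window durations (e.g., in cycles), recording which window indices overlap in time. This is an illustrative sketch of the alignment problem, not the paper's actual algorithm.

```python
def co_executing_pairs(durations_a, durations_b):
    """Return (i, j) index pairs of windows from applications A and B
    that overlap in time, given each window's duration in cycles.

    Both applications start at time zero; whichever window ends first
    is advanced, so the walk enumerates all co-executing window pairs.
    """
    pairs, i, j = [], 0, 0
    t_a = t_b = 0.0
    while i < len(durations_a) and j < len(durations_b):
        pairs.append((i, j))
        end_a, end_b = t_a + durations_a[i], t_b + durations_b[j]
        if end_a <= end_b:
            t_a, i = end_a, i + 1
        else:
            t_b, j = end_b, j + 1
    return pairs
```

Note that in the real model the durations themselves depend on cache sharing, which is exactly what makes the alignment hard: slowing one application down changes which windows co-execute.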
Q11. What is the average error of the windows-based method?
On average, the windows-based method has an error of 0.39% and a maximum error of 2.2% (bzip2 + omnetpp), while the phase-based method has an average error of 0.41% and a maximum of 1.8% (omnetpp + bwaves).
Q12. How do the authors estimate the performance of a mixed workload?
In order to accurately estimate the performance of a mixed workload, the authors need to run it multiple times and estimate its performance distribution.
Q13. How many times did the authors run each experiment?
In order to get an accurate representation of the performance, the authors ran each experiment (target-interference pair) 100 times with random start offsets for the target.
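The measurement protocol in this answer can be sketched as a small Monte Carlo loop: replay the target against an interference trace many times, each run starting the target at a random offset, then summarize the resulting distribution. The slowdown-per-window representation here is a simplifying assumption for illustration only.

```python
import random
import statistics

def run_distribution(target_costs, interference_slowdowns, runs=100, seed=0):
    """Estimate the target's performance distribution over random offsets.

    `target_costs` is the target's per-window base cost, and
    `interference_slowdowns` gives the slowdown factor imposed by each
    interference window. Each of the `runs` trials picks a random start
    offset into the interference trace, mimicking the paper's protocol
    of 100 runs per target-interference pair.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        off = rng.randrange(len(interference_slowdowns))
        total = sum(
            cost * interference_slowdowns[(off + i) % len(interference_slowdowns)]
            for i, cost in enumerate(target_costs)
        )
        totals.append(total)
    return statistics.mean(totals), statistics.pstdev(totals)
```

With a phase-varying interference trace the spread of `totals` is nonzero, which is precisely the run-to-run performance variation the paper sets out to model.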