Co-scheduling Amdahl applications on cache-partitioned systems
Citations
Co-Scheduling HPC Workloads on Cache-Partitioned CMP Platforms
Co-scheduling for large-scale applications: memory and resilience
An Adaptive Self-Scheduling Loop Scheduler
An Analytical Bound for Choosing Trivial Strategies in Co-scheduling
References
Validity of the single processor approach to achieving large scale computing capabilities
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches
The NAS parallel benchmarks—summary and preliminary results
Evaluating STT-RAM as an energy-efficient main memory alternative
Related Papers (5)
Energy-efficient Real-time Scheduling on Multicores: A Novel Approach to Model Cache Contention
Frequently Asked Questions (12)
Q2. What future work have the authors mentioned in the paper "Co-scheduling Amdahl applications on cache-partitioned systems"?
Future work will be devoted to gaining access to, and conducting real experiments on, a cache-partitioned system with a high core count: this would allow the authors to further validate the accuracy of the model and to confirm the impact of these promising results. On the theoretical side, the authors plan to focus on the problem with integer numbers of processors, and they hope to derive results that could help design even more efficient heuristics.
Q3. How do the authors simplify the design of the heuristics?
The authors simplify the design of the heuristics by temporarily allocating processors as if the applications were perfectly parallel, and then concentrating on strategies that partition the cache efficiently among some applications (and give no cache fraction to the remaining ones).
Q4. What is the cache configuration used for the simulations?
For the simulations, the authors use a cache configuration representing an Intel Xeon CPU E5-2690, with a 40MB last level cache per processor of 8 cores.
Q5. What is the heuristic for a naive approach?
Extensive simulation results demonstrate that the use of dominant partitions always leads to better results than more naive approaches, as soon as there is a small sequential fraction of work in application speedup profiles.
Q6. What is the main difficulty of co-scheduling?
The main difficulty of co-scheduling is to decide which applications to execute concurrently, and how many cores to assign to each of them.
Q7. What is the effect of the ratio processors/applications on performance?
The authors show that the ratio processors/applications has a significant impact on performance: when many processors are available for a few applications, it is less crucial to use efficient cache-partitioning and all applications can share the cache, hence Fair obtains good results, close to DomS-MinRatio.
Q8. What is the power law of cache misses?
The power law states that if m0 is the miss rate of a workload for a baseline cache size C0, then the miss rate m for a new cache size C can be expressed as m = m0 · (C0 / C)^α, where α is the sensitivity factor from the Power Law of Cache Misses [HSPE08, RKB+09, KSS12]; α typically ranges between 0.3 and 0.7, with an average of 0.5.
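The power law above can be sketched in a few lines; this is an illustrative helper (not the authors' code), with the baseline (m0, C0) and sensitivity α taken as inputs:

```python
# Sketch of the Power Law of Cache Misses: m = m0 * (C0 / C)**alpha.
# m0 and c0 are a measured baseline miss rate and cache size; alpha is
# the application's sensitivity factor (typically 0.3-0.7, average 0.5).

def miss_rate(cache_size, m0, c0, alpha=0.5):
    """Predicted miss rate for a new cache size, given a baseline."""
    return m0 * (c0 / cache_size) ** alpha

# Example: doubling the cache from a 10% baseline miss rate with
# alpha = 0.5 divides the miss rate by sqrt(2), i.e. about 0.0707.
print(miss_rate(cache_size=2.0, m0=0.10, c0=1.0))
```

With α closer to 0.7 the application is more cache-sensitive and the same cache increase buys a larger miss-rate reduction.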
Q9. How is the last level cache (LLC) latency calculated?
According to the literature [KKSM13, MHSN15, PB14], the last level cache (LLC) latency is on average four to ten times lower than the DDR latency, and the authors enforce a ratio of 5.88 in the simulations.
Q10. What is the solution to CSCPP-Ext?
Lemma 3. Given a set of applications T1, ..., Tn and a partition (I_C, Ī_C), the optimal solution to CSCPP-Ext(I_C, Ī_C) is x_i = (w_i f_i d_i)^(1/(α+1)) / Σ_{j ∈ I_C} (w_j f_j d_j)^(1/(α+1)) if i ∈ I_C, and x_i = 0 otherwise.
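The closed form of Lemma 3 can be sketched as follows; the helper name and the list-based encoding of the parameters w, f, d (which follow the paper's notation) are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of Lemma 3: each application i in the selected
# set I_C receives a cache fraction x_i proportional to
# (w_i * f_i * d_i)**(1/(alpha+1)); applications outside I_C get 0.

def cache_fractions(w, f, d, selected, alpha=0.5):
    """Optimal cache fractions per Lemma 3 for the chosen set I_C."""
    weights = {i: (w[i] * f[i] * d[i]) ** (1.0 / (alpha + 1.0))
               for i in selected}
    total = sum(weights.values())
    return [weights[i] / total if i in selected else 0.0
            for i in range(len(w))]

# Two identical selected applications split the cache evenly,
# and the unselected third application gets no cache at all.
print(cache_fractions([1, 1, 1], [1, 1, 1], [1, 1, 1], selected={0, 1}))
```

By construction the fractions over I_C sum to 1, so the whole cache is always allocated among the selected applications.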
Q11. How many processors are used in the AllProcCache heuristic?
Results are normalized with the makespan of AllProcCache, which is the execution without any co-scheduling: in the AllProcCache heuristic, applications are executed sequentially, each using all processors and all the cache.
Q12. What is the heuristic for a large number of applications?
With more applications, the authors obtain the same ranking of heuristics, except that Fair is always the worst heuristic: since there are fewer processors on average per application, a good co-scheduling policy is necessary (see [ABD+17] for detailed results).