A case for NUMA-aware contention management on multicore systems
Citations
464 citations
Cites background from "A case for NUMA-aware contention ma..."
...In multi-socket servers, one can isolate workloads across NUMA channels [9, 73], but this approach constrains DRAM capacity allocation and address interleaving....
[...]
314 citations
Cites methods from "A case for NUMA-aware contention ma..."
...This could certainly be done: for example, Zhang et al. [40] used memory reference counts to approximate memory bandwidth consumption on SMP machines; West et al. [38] used cache miss and reference counts to estimate cache occupancy of competing threads on multicore machines; VM3 [20] profiled applications' cache-misses per instruction to estimate effective cache sizes in a consolidated virtual machine environment; Cuanta [17] introduced a cache loader micro-benchmark to profile application performance under varying cache-usage pressure; and Blagodurov [7] and Zhuravlev [43] applied heuristics based on cache miss rates to guide contention-aware scheduling....
[...]
289 citations
Cites background from "A case for NUMA-aware contention ma..."
...The most comprehensive work to date on NUMA-aware contention management is the DINO scheduler [5], which spreads memory intensive threads across memory domains and accordingly migrates the corresponding memory pages....
[...]
...Some of these works were designed for UMA systems [16, 21, 33] and are inefficient on NUMA systems because they fail to address or even accentuate issues such as remote access latencies and contention on memory controllers and on the interconnect links [5]....
[...]
217 citations
Cites methods from "A case for NUMA-aware contention ma..."
...While bulk data movement is a key operation in many applications and operating systems, contemporary systems perform this movement inefficiently, by transferring data from DRAM to the processor, and then back to DRAM, across a narrow off-chip channel....
[...]
163 citations
Cites background from "A case for NUMA-aware contention ma..."
...In a multi-core domain, existing work tries to minimize the memory access latency by thread-to-core mapping [21, 38, 51], or memory allocation policy [22, 27, 34]....
[...]
References
4,019 citations
"A case for NUMA-aware contention ma..." refers to methods in this paper
...In order to rapidly evaluate various memory migration strategies, we designed a simulator based on a widely used binary instrumentation tool for x86 binaries called Pin [15]....
[...]
900 citations
"A case for NUMA-aware contention ma..." refers to methods in this paper
...The SGI Origin 2000 system [4] implemented the following hardware-supported [13] mechanism for colocation of computation and memory....
[...]
289 citations
"A case for NUMA-aware contention ma..." refers to background or methods in this paper
...The DINO algorithm introduced in our work complements [19] as it is designed to mitigate contention between applications....
[...]
...Many research efforts addressed efficient co-location of the computation and related memory on the same node [14, 3, 12, 19, 1, 4]....
[...]
...In [19] the authors group threads of the same application that are likely to share data onto neighbouring cores to minimize the costs of data sharing between them....
[...]
...However, when this assumption does not hold, DINO can be extended to predict when co-scheduling threads on the same domain is more beneficial than separating them, using techniques described in [9] or [19]....
[...]
274 citations
"A case for NUMA-aware contention ma..." refers to background in this paper
...in [14] introduced AMPS, an operating system scheduler for asymmetric multicore systems that supports NUMA architectures....
[...]
...Many research efforts addressed efficient co-location of the computation and related memory on the same node [14, 3, 12, 19, 1, 4]....
[...]