ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers
Citations
520 citations
Cites methods from "ATLAS: A scalable and high-performa..."
...For these results we only evaluated the 32 workloads with 50% memory-intensive benchmarks, as this scenario of balanced memory-intensive and non-memory-intensive benchmarks is likely to be common in future systems [22]....
[...]
385 citations
Additional excerpts
...Maximum Slowdown [12, 29, 30] Reduction 6% 12% 23%...
[...]
...
Number of Cores                              2     4     8
Number of Workloads                          138   50    40
Weighted Speedup [14] Improvement            15%   20%   27%
Instruction Throughput Improvement           14%   15%   25%
Harmonic Speedup [35] Improvement            13%   16%   29%
Maximum Slowdown [12, 29, 30] Reduction      6%    12%   23%
Memory Bandwidth/Instruction [49] Reduction  29%   27%   28%
Memory Energy/Instruction Reduction          19%   17%   17%
Table 7: Effect of RowClone on multi-core performance, fairness, bandwidth, and energy
To provide more insight into the benefits of RowClone on multi-core systems, we classify our copy/initialization-intensive benchmarks into two categories: 1) Moderately copy/initialization-intensive (compile, mcached, and mysql) and highly copy/initialization-intensive (bootup, forkbench, and shell)....
[...]
375 citations
Cites background or methods from "ATLAS: A scalable and high-performa..."
...If there are multiple memory controllers, this information is sent to a centralized meta-controller at the end of a quantum, similarly to what is done in ATLAS [5]....
[...]
...Grouping of threads into clusters happens in a synchronized manner across all memory controllers to better exploit bank-level parallelism [5, 14]....
[...]
...Compared to ATLAS [5], the best previous algorithm in terms of system throughput, TCM improves system throughput and reduces maximum slowdown by 4....
[...]
..., by forming a priority order based on a metric that corresponds to memory intensity, as done in [5])....
[...]
...Finally, as previous work has shown, it is desirable that scheduling decisions are made in a synchronized manner across all banks [5, 14, 12], so that concurrent requests of each thread are serviced in parallel, without being serialized due to interference from other threads....
[...]
358 citations
338 citations
References
4,019 citations
3,915 citations
3,122 citations
"ATLAS: A scalable and high-performa..." refers to background or methods in this paper
...propose deployable modifications to IP to enhance its evolvability [37]; but does not admit the expressiveness afforded by XIA....
[...]
...RPT over CCN: RPT also can be integrated with a broad class of content-aware networks, including CCN [37] and SmartRE [15]....
[...]
...CCN is designed to operate on top of unreliable packet delivery service, and thus Interest and Data packets may be lost [37]....
[...]
...We use the term content-aware networks to refer to the variety of architectural proposals [14, 32, 37, 42] and devices [2, 4, 10, 35] that cache data and remove duplicates to alleviate congestion (i....
[...]
...In the CCN-over-jumbo-UDP protocol case [37], a client generates three Interest packets (325 bytes) and receives five 1500-byte packets (6873 bytes) to fetch a Web page [37]....
[...]
2,639 citations
2,608 citations