Journal Article
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
Onur Mutlu, Thomas Moscibroda
Vol. 36, Iss. 3, pp. 63-74
TL;DR: A parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities, can provide different service levels (including purely opportunistic service) to threads with different priorities, and is simpler to implement than STFM.
Abstract:
In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destroy other threads’ DRAM-bank-level parallelism. Requests whose latencies would otherwise have been overlapped could effectively become serialized. As a result, both fairness and system throughput degrade, and some threads can starve for long periods of time.
This paper proposes a fundamentally new approach to designing a shared DRAM controller that provides quality of service to threads while also improving system throughput. Our parallelism-aware batch scheduler (PAR-BS) design is based on two key ideas. First, PAR-BS processes DRAM requests in batches to provide fairness and to avoid starvation of requests. Second, to optimize system throughput, PAR-BS employs a parallelism-aware DRAM scheduling policy that aims to process requests from a thread in parallel in the DRAM banks, thereby reducing the memory-related stall time experienced by the thread. PAR-BS seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities.
We evaluate the design trade-offs involved in PAR-BS and compare it to four previously proposed DRAM scheduler designs on 4-, 8-, and 16-core systems. Our evaluations show that, averaged over 100 4-core workloads, PAR-BS improves fairness by 1.11X and system throughput by 8.3% compared to the best previous scheduling technique, Stall-Time Fair Memory (STFM) scheduling. Based on simple request prioritization rules, PAR-BS is also simpler to implement than STFM.
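The two key ideas in the abstract, batching for fairness and a parallelism-aware policy that services one thread's requests across banks together, can be illustrated with a toy, step-based model. This is a minimal sketch under stated assumptions, not the paper's hardware design. The marking cap (`MARKING_CAP = 5`), the ranking heuristic (fewest marked requests on a thread's most-loaded bank first, ties broken by total marked requests), the four-rule priority order, and the "mark a thread with priority p only every p-th batch, never mark opportunistic threads" treatment of priorities are assumptions based on our reading of the full paper. All names (`Request`, `form_batch`, `rank_threads`, `pick_next`, `schedule`) are hypothetical. Each bank serves one request per step, and a new batch is formed only after every marked request has been serviced.

```python
from collections import defaultdict
from dataclasses import dataclass

MARKING_CAP = 5  # assumption: max requests marked per thread per bank per batch


@dataclass(eq=False)  # eq=False so list.remove() matches by identity
class Request:
    thread: int
    bank: int
    row: int
    arrival: int          # arrival order, used for oldest-first tie-breaking
    marked: bool = False  # True while the request belongs to the current batch


def form_batch(queue, batch_id, priorities, cap=MARKING_CAP):
    """Mark up to `cap` oldest requests per (thread, bank) pair.
    Assumption: a thread with priority p >= 1 is marked only every p-th batch;
    priority None means opportunistic (never marked)."""
    per_pair = defaultdict(list)
    for r in queue:
        p = priorities.get(r.thread, 1)
        if p is not None and batch_id % p == 0:
            per_pair[(r.thread, r.bank)].append(r)
    for reqs in per_pair.values():
        for r in sorted(reqs, key=lambda q: q.arrival)[:cap]:
            r.marked = True


def rank_threads(queue):
    """Shortest-job-first style ranking (assumed heuristic): smallest per-bank
    maximum of marked requests first, then smallest total, then thread id.
    Returns {thread: rank}, where 0 is the highest rank."""
    load = defaultdict(lambda: defaultdict(int))
    for r in queue:
        if r.marked:
            load[r.thread][r.bank] += 1
    order = sorted(load, key=lambda t: (max(load[t].values()),
                                        sum(load[t].values()), t))
    return {t: i for i, t in enumerate(order)}


def pick_next(queue, bank, open_row, ranks):
    """Apply the prioritization rules to the requests waiting for one bank."""
    waiting = [r for r in queue if r.bank == bank]
    if not waiting:
        return None
    unranked = len(ranks)  # threads without marked requests rank last
    return min(waiting, key=lambda r: (
        not r.marked,                   # 1. marked (batched) requests first
        r.row != open_row,              # 2. row-buffer hits first
        ranks.get(r.thread, unranked),  # 3. higher-ranked thread first
        r.arrival))                     # 4. oldest first


def schedule(queue, num_banks, priorities=None):
    """Drain `queue`, serving at most one request per bank per step.
    Returns (step, thread, bank) tuples in service order."""
    queue, priorities = list(queue), priorities or {}
    open_rows = [None] * num_banks
    order, ranks, step, batch_id = [], {}, 0, 0
    while queue:
        if not any(r.marked for r in queue):  # previous batch fully serviced
            form_batch(queue, batch_id, priorities)
            ranks, batch_id = rank_threads(queue), batch_id + 1
        for b in range(num_banks):
            r = pick_next(queue, b, open_rows[b], ranks)
            if r is not None:
                queue.remove(r)
                open_rows[b] = r.row
                order.append((step, r.thread, b))
        step += 1
    return order


# Two threads, each with one request to bank 0 and one to bank 1, arriving interleaved.
reqs = [Request(0, 0, 1, 0), Request(1, 1, 1, 1),
        Request(1, 0, 2, 2), Request(0, 1, 2, 3)]
print(schedule(reqs, num_banks=2))  # [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
```

With plain oldest-first service per bank, bank 0 would start with thread 0 and bank 1 with thread 1, so both threads would wait until step 1. Ranking makes both banks serve thread 0 in step 0, overlapping its two latencies, while thread 1 still finishes in step 1.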
Citations
Journal Article
Dynamic Resource Partitioning for Heterogeneous Multi-Core-Based Cloud Computing in Smart Cities
TL;DR: A dynamic resource partitioning (DRP) method for single-ISA heterogeneous multi-cores is proposed, which partitions the shared resources according to both the threads' requirements for the shared resources and the performance of the cores they run on.
Proceedings Article
A Flexible Framework for Throttling-Enabled Multicore Management (TEMM)
TL;DR: A flexible framework for Throttling-Enabled Multicore Management (TEMM) that efficiently finds a high-quality hardware execution throttling configuration for a user-specified resource management objective is proposed.
Patent
Memory-controller-parallelism-aware scheduling for multiple memory controllers
TL;DR: In this patent, the priority information for each thread of a plurality of threads is based on the maximum of a plurality of local memory bandwidth usage indicators for that thread, each of which corresponds to a respective memory controller.
Patent
Scheduling for multiple memory controllers
TL;DR: In this patent, a virtual-time-based quality-of-service (QoS) scheduling technique for multi-processor systems is described, in which the virtual finish time of a memory request is based on a share of system memory bandwidth associated with that request.
Journal Article
BPM/BPM+: Software-based dynamic memory partitioning mechanisms for mitigating DRAM bank-/channel-level interferences in multicore systems
TL;DR: A dynamic mechanism built on BPM/BPM+ is proposed that assigns appropriate bank/channel resources based on application memory/bandwidth demands, monitored through the performance monitoring unit (PMU) and a low-overhead OS page-table scanning process, to achieve benefits in the presence of diverse application memory needs.
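The multiple-memory-controller patent listed among the citing works bases each thread's priority on the maximum of its per-controller (local) bandwidth usage indicators. A minimal sketch of that idea follows, under two assumptions that the summary does not state: each indicator is a single bandwidth number per (thread, controller) pair, and a thread with a lower maximum usage receives a higher priority. The function names `global_usage` and `thread_priorities` are hypothetical.

```python
def global_usage(local_usage):
    """local_usage[t] lists thread t's bandwidth-usage indicator at each
    memory controller; return each thread's maximum across controllers."""
    return {t: max(per_mc) for t, per_mc in local_usage.items()}


def thread_priorities(local_usage):
    """Rank threads by global usage, 0 = highest priority.
    Assumption: a lighter maximum load earns a higher priority;
    ties are broken by thread id."""
    usage = global_usage(local_usage)
    order = sorted(usage, key=lambda t: (usage[t], t))
    return {t: rank for rank, t in enumerate(order)}


# Every thread uses 4.0 units of bandwidth in total, spread differently.
local = {
    0: [3.0, 1.0],  # concentrated on controller 0
    1: [2.0, 2.0],  # evenly spread
    2: [0.5, 3.5],  # concentrated on controller 1
}
print(thread_priorities(local))  # {1: 0, 0: 1, 2: 2}
```

Because all three threads consume the same total bandwidth, a sum-based indicator could not tell them apart; the per-controller maximum favors thread 1, whose load is spread evenly across controllers.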