Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

doi:10.1145/1394608.1382128

Journal Article

Onur Mutlu and one co-author

Vol. 36, Iss. 3, pp. 63-74

TLDR

A parallelism-aware batch scheduler (PAR-BS) that seamlessly incorporates support for system-level thread priorities, can provide different service levels (including purely opportunistic service) to threads with different priorities, and is simpler to implement than Stall-Time Fair Memory (STFM) scheduling.
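The priority support summarized above can be made concrete with a small sketch. PAR-BS groups requests into batches (the first key idea in the abstract below). One way to express priorities, assumed here and modeled on the mechanism described in the full paper, is to mark a thread with priority level X into only every X-th batch, and never mark a purely opportunistic thread, so its requests are serviced only when no marked requests compete for a bank. The Python below shows only that marking decision; the `Priority` encoding and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical encoding for this sketch: an int X >= 1 means "marked in every
# X-th batch" (1 = every batch); None means purely opportunistic (never marked).
Priority = Optional[int]


def threads_marked_in_batch(priorities: Dict[str, Priority], batch_index: int) -> List[str]:
    """Return the threads whose requests are marked when batch `batch_index` forms."""
    return [
        thread
        for thread, level in priorities.items()
        if level is not None and batch_index % level == 0
    ]


def marking_counts(priorities: Dict[str, Priority], num_batches: int) -> Dict[str, int]:
    """Count how many of the first `num_batches` batches include each thread."""
    counts = {thread: 0 for thread in priorities}
    for b in range(num_batches):
        for thread in threads_marked_in_batch(priorities, b):
            counts[thread] += 1
    return counts


priorities = {"high": 1, "low": 2, "background": None}
print(threads_marked_in_batch(priorities, 1))  # ['high']
print(marking_counts(priorities, 4))           # {'high': 4, 'low': 2, 'background': 0}
```

Under these assumptions, a lower-priority thread still receives guaranteed service, just in fewer batches, while the opportunistic thread gets only leftover bank time.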

Abstract:

In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destroy other threads' DRAM-bank-level parallelism. Requests whose latencies would otherwise have been overlapped could effectively become serialized. As a result both fairness and system throughput degrade, and some threads can starve for long time periods.

This paper proposes a fundamentally new approach to designing a shared DRAM controller that provides quality of service to threads, while also improving system throughput. Our parallelism-aware batch scheduler (PAR-BS) design is based on two key ideas. First, PAR-BS processes DRAM requests in batches to provide fairness and to avoid starvation of requests. Second, to optimize system throughput, PAR-BS employs a parallelism-aware DRAM scheduling policy that aims to process requests from a thread in parallel in the DRAM banks, thereby reducing the memory-related stall-time experienced by the thread. PAR-BS seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities.

We evaluate the design trade-offs involved in PAR-BS and compare it to four previously proposed DRAM scheduler designs on 4-, 8-, and 16-core systems. Our evaluations show that, averaged over 100 4-core workloads, PAR-BS improves fairness by 1.11X and system throughput by 8.3% compared to the best previous scheduling technique, Stall-Time Fair Memory (STFM) scheduling. Based on simple request prioritization rules, PAR-BS is also simpler to implement than STFM.
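The claim that one thread's requests can destroy another thread's bank-level parallelism can be seen in a toy model. This Python sketch assumes every request is already queued at time 0, each bank serves its queue in order with a fixed latency, and a thread stalls until its last outstanding request completes. `stall_times` and the queue layout are hypothetical illustrations, not the paper's simulator.

```python
from typing import Dict, List


def stall_times(bank_queues: List[List[str]], latency: int = 1) -> Dict[str, int]:
    """Return, per thread, the time at which its last request completes.

    Simplified model: all requests are present at time 0, each bank services
    its queue in order, one request at a time, each taking `latency`.
    """
    finish: Dict[str, int] = {}
    for queue in bank_queues:
        for slot, thread in enumerate(queue):
            done = (slot + 1) * latency
            finish[thread] = max(finish.get(thread, 0), done)
    return finish


# The same four requests (A and B each have one request in bank 0 and bank 1),
# serviced in two different orders.
conventional = [["A", "B"], ["B", "A"]]    # the banks pick different threads first
parallel_aware = [["A", "B"], ["A", "B"]]  # both banks service thread A first

print(stall_times(conventional))    # {'A': 2, 'B': 2}
print(stall_times(parallel_aware))  # {'A': 1, 'B': 2}
```

In the conventional order, A's two requests are effectively serialized and both threads stall for two latencies. Servicing A in both banks first overlaps A's requests, cutting its stall to one latency without delaying B, so the average stall drops from 2 to 1.5. Ordering threads consistently across banks is the intuition behind a parallelism-aware policy.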

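The abstract names PAR-BS's two ideas, batching and parallelism-aware prioritization, without stating the rules. The Python sketch below is modeled on the rules described in the full paper, but every detail should be read as an assumption. A batch is formed by marking up to a cap of the oldest requests per thread per bank (`MARKING_CAP`, value illustrative), and a new batch would form only after all marked requests are serviced (not modeled here). Threads are ranked shortest-job-first style by maximum per-bank load, then total load. Each bank picks marked requests first, row hits next, higher-ranked threads next, and older requests last. `Request`, `form_batch`, `rank_threads`, and `pick_next` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed per-thread, per-bank cap on marked requests; the value is illustrative.
MARKING_CAP = 5


@dataclass
class Request:
    thread: int
    bank: int
    row: int
    arrival: int          # smaller value = older request
    marked: bool = False  # True if the request belongs to the current batch


def form_batch(queue: List[Request], cap: int = MARKING_CAP) -> None:
    """Mark up to `cap` oldest requests of each thread to each bank."""
    groups: Dict[Tuple[int, int], List[Request]] = defaultdict(list)
    for req in queue:
        groups[(req.thread, req.bank)].append(req)
    for reqs in groups.values():
        reqs.sort(key=lambda r: r.arrival)
        for req in reqs[:cap]:
            req.marked = True


def rank_threads(queue: List[Request]) -> Dict[int, int]:
    """Rank threads that have marked requests; rank 0 is serviced first.

    Lower maximum per-bank load wins, then lower total load. Remaining ties
    are broken by thread id here so the result is deterministic.
    """
    load: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for req in queue:
        if req.marked:
            load[req.thread][req.bank] += 1
    order = sorted(
        load,
        key=lambda t: (max(load[t].values()), sum(load[t].values()), t),
    )
    return {thread: i for i, thread in enumerate(order)}


def pick_next(bank_queue: List[Request], open_row: Optional[int],
              rank: Dict[int, int]) -> Request:
    """Pick one bank's next request: marked first, then row hit,
    then higher-ranked thread, then oldest."""
    lowest = len(rank)  # threads without marked requests rank last
    return min(
        bank_queue,
        key=lambda r: (not r.marked, r.row != open_row,
                       rank.get(r.thread, lowest), r.arrival),
    )


queue = [Request(0, 0, 10, 0), Request(0, 1, 20, 1),
         Request(1, 0, 30, 2), Request(1, 0, 31, 3), Request(1, 1, 40, 4)]
form_batch(queue)
rank = rank_threads(queue)  # {0: 0, 1: 1}: thread 0 has the lighter max bank load
bank0 = [r for r in queue if r.bank == 0]
print(pick_next(bank0, open_row=30, rank=rank).row)    # 30 (row hit within the batch)
print(pick_next(bank0, open_row=None, rank=rank).row)  # 10 (higher-ranked thread 0)
```

Under these assumptions, the maximum per-bank load approximates how long a thread's share of the batch takes when its requests proceed in parallel across banks, so servicing lighter threads first lowers average stall time. Meanwhile, the marked-first rule bounds how long any request can be bypassed by later arrivals.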