scispace - formally typeset
Journal ArticleDOI

Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

Onur Mutlu, +1 more
- Vol. 36, Iss: 3, pp 63-74
Reads0
Chats0
TLDR
A parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities, and is also simpler to implement than STFM.
Abstract
In a chip-multiprocessor (CMP) system, the DRAM system isshared among cores. In a shared DRAM system, requests from athread can not only delay requests from other threads by causingbank/bus/row-buffer conflicts but they can also destroy other threads’DRAM-bank-level parallelism. Requests whose latencies would otherwisehave been overlapped could effectively become serialized. As aresult both fairness and system throughput degrade, and some threadscan starve for long time periods.This paper proposes a fundamentally new approach to designinga shared DRAM controller that provides quality of service to threads,while also improving system throughput. Our parallelism-aware batchscheduler (PAR-BS) design is based on two key ideas. First, PARBSprocesses DRAM requests in batches to provide fairness and toavoid starvation of requests. Second, to optimize system throughput,PAR-BS employs a parallelism-aware DRAM scheduling policythat aims to process requests from a thread in parallel in the DRAMbanks, thereby reducing the memory-related stall-time experienced bythe thread. PAR-BS seamlessly incorporates support for system-levelthread priorities and can provide different service levels, includingpurely opportunistic service, to threads with different priorities.We evaluate the design trade-offs involved in PAR-BS and compareit to four previously proposed DRAM scheduler designs on 4-, 8-, and16-core systems. Our evaluations show that, averaged over 100 4-coreworkloads, PAR-BS improves fairness by 1.11X and system throughputby 8.3% compared to the best previous scheduling technique, Stall-Time Fair Memory (STFM) scheduling. Based on simple request prioritizationrules, PAR-BS is also simpler to implement than STFM.

read more

Citations
More filters
Dissertation

Architecture de contrôleur mémoire configurable et continuité de service pour l'accès à la mémoire externe dans les systèmes multiprocesseurs intégrés à base de réseaux sur puce

TL;DR: In this article, the authors introduce an architecture of controleur de memoire totalement configurable basee sur des blocs fonctionnels configurables, and proposons un modele de simulation associe relativement precis temporellement and a haut niveau d'abstraction.
Journal ArticleDOI

Dynamic and discrete cache insertion policies for managing shared last level caches in large multicores

TL;DR: This study introduces Adaptive Discrete and deprioritized Application PrioriTization (ADAPT), an LLC management policy addressing the large multi-cores where the LLC associativity degree is smaller than the number of applications.
Journal ArticleDOI

Sectored DRAM: An Energy-Efficient High-Throughput and Practical Fine-Grained DRAM Architecture

TL;DR: Sectored DRAM’s DRAM energy savings, combined with its system performance improvement, allows system-wide energy savings of up to 23% and can be implemented with low hardware cost.
Journal ArticleDOI

PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization

TL;DR: A new page-allocation-based optimization that works seamlessly together with some existing hardware and software optimizations to eliminate significantly more row buffer conflicts is presented.
Proceedings ArticleDOI

SOUP-N-SALAD: Allocation-Oblivious Access Latency Reduction with Asymmetric DRAM Microarchitectures

TL;DR: The first technique is SALAD, a new DRAM device architecture that provides symmetric access latency with asymmetric DRAM bank organizations, and the resulting design, called SOUP-N-SALAD, improves IPC and EDP by 9.6% and 18.2% over the baseline DDR4 device, respectively, for memory-intensive SPEC CPU2006 workloads without any software modifications.
References
More filters
Journal ArticleDOI

Pin: building customized program analysis tools with dynamic instrumentation

TL;DR: The goals are to provide easy-to-use, portable, transparent, and efficient instrumentation, and to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Proceedings ArticleDOI

Memory access scheduling

TL;DR: This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.
Book

An efficient algorithm for exploiting multiple arithmetic units

TL;DR: In this article, the authors describe the methods employed in the floating-point area of the System/360 Model 91 to exploit the existence of multiple execution units and register tagging schemes.
Journal ArticleDOI

Symbiotic jobscheduling for a simultaneous multithreaded processor

TL;DR: It is demonstrated that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler, and that a small sample of the possible schedules is sufficient to identify a good schedule quickly.
Related Papers (5)