Proceedings ArticleDOI

Memory access scheduling

01 May 2000, Vol. 28, Iss. 2, pp. 128-138
TL;DR: This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.
Abstract: The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40% for traces from five media benchmarks. Aggressive reordering, in which operations are scheduled to optimize memory bandwidth, improves bandwidth by 93% for the same set of applications. Memory access scheduling is particularly important for media processors where it enables the processor to make the most efficient use of scarce memory bandwidth.
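
The reordering idea is compact enough to sketch. Below is a minimal, illustrative model of the conservative "first ready" policy over a single bank with an open-row register; the names and Python-level bookkeeping are assumptions for exposition, not the paper's hardware pipeline.

```python
# Illustrative model of conservative first-ready reordering (not the
# paper's hardware). A reference to the currently open row of a bank is
# "ready" and can be serviced immediately; anything else requires a
# precharge/activate sequence first.

class Reference:
    def __init__(self, bank, row, col):
        self.bank, self.row, self.col = bank, row, col

def first_ready(pending, open_rows):
    """Prefer the oldest row hit; fall back to the oldest request."""
    for ref in pending:                       # pending is in arrival order
        if open_rows.get(ref.bank) == ref.row:
            return ref                        # row hit: column access only
    return pending[0]                         # row miss: precharge + activate

# Usage: reordering services both row-5 references before opening row 9.
pending = [Reference(0, 5, 1), Reference(0, 9, 0), Reference(0, 5, 2)]
open_rows = {0: 5}                            # bank 0 currently has row 5 open
order = []
while pending:
    ref = first_ready(pending, open_rows)
    pending.remove(ref)
    open_rows[ref.bank] = ref.row             # open-page policy: row stays open
    order.append((ref.row, ref.col))
print(order)   # [(5, 1), (5, 2), (9, 0)]
```

The paper's aggressive policies go further, scheduling precharge and activate operations for pending references ahead of need; this sketch only picks the order of whole references.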

Citations
Proceedings ArticleDOI
26 Apr 2009
TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Abstract: Modern Graphics Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight into designing tomorrow's manycore processors, whether those are GPUs or otherwise. The combination of multiple, multithreaded, SIMD cores makes studying these GPUs useful in understanding tradeoffs among memory, data, and thread level parallelism. While modern GPUs offer orders of magnitude more raw computing power than contemporary CPUs, many important applications, even those with abundant data level parallelism, do not achieve peak performance. This paper characterizes several non-graphics applications written in NVIDIA's CUDA programming model by running them on a novel detailed microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set. For this study, we selected twelve non-trivial CUDA applications demonstrating varying levels of performance improvement on GPU hardware (versus a CPU-only sequential version of the application). We study the performance of these applications on our GPU performance simulator with configurations comparable to contemporary high-end graphics cards. We characterize the performance impact of several microarchitecture design choices including choice of interconnect topology, use of caches, design of memory controller, parallel workload distribution mechanisms, and memory request coalescing hardware. Two observations we make are (1) that for the applications we study, performance is more sensitive to interconnect bisection bandwidth rather than latency, and (2) that, for some applications, running fewer threads concurrently than on-chip resources might otherwise allow can improve performance by reducing contention in the memory system.

1,558 citations

Journal ArticleDOI
01 Jun 2008
TL;DR: This work explores more aggressive 3D DRAM organizations that make better use of the additional die-to-die bandwidth provided by 3D stacking, as well as the additional transistor count, to achieve a 1.75x speedup over previously proposed 3D-DRAM approaches on memory-intensive multi-programmed workloads on a quad-core processor.
Abstract: Three-dimensional integration enables stacking memory directly on top of a microprocessor, thereby significantly reducing wire delay between the two. Previous studies have examined the performance benefits of such an approach, but all of these works only consider commodity 2D DRAM organizations. In this work, we explore more aggressive 3D DRAM organizations that make better use of the additional die-to-die bandwidth provided by 3D stacking, as well as the additional transistor count. Our simulation results show that with a few simple changes to the 3D-DRAM organization, we can achieve a 1.75x speedup over previously proposed 3D-DRAM approaches on our memory-intensive multi-programmed workloads on a quad-core processor. The significant increase in memory system performance makes the L2 miss handling architecture (MHA) a new bottleneck, which we address by combining a novel data structure called the Vector Bloom Filter with dynamic MSHR capacity tuning. Our scalable L2 MHA yields an additional 17.8% performance improvement over our 3D-stacked memory architecture.
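
The Vector Bloom Filter is specific to the cited paper, but the generic idea it builds on can be sketched: a Bloom filter gives a cheap "definitely not present" answer, letting most new misses skip a full search of the outstanding-miss (MSHR) state. Everything below (sizes, hash choice, names) is an illustrative assumption, not the paper's design.

```python
# Illustrative sketch (not the paper's Vector Bloom Filter): use a Bloom
# filter as a cheap pre-check before searching outstanding-miss state.
import hashlib

class BloomFilter:
    def __init__(self, bits=256, hashes=3):
        self.bits, self.hashes = bits, hashes
        self.array = [False] * bits

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array[p] = True

    def might_contain(self, key):
        # False means definitely absent; True may be a false positive.
        return all(self.array[p] for p in self._positions(key))

mshr_filter = BloomFilter()
outstanding = {}                 # block address -> list of waiting requests

def handle_l2_miss(block_addr, waiter):
    # The filter lets the common case (no matching entry) skip the full
    # search; a True answer is confirmed against the real table.
    if mshr_filter.might_contain(block_addr) and block_addr in outstanding:
        outstanding[block_addr].append(waiter)    # coalesce with in-flight miss
    else:
        outstanding[block_addr] = [waiter]        # allocate a new entry
        mshr_filter.add(block_addr)
```

A plain Bloom filter cannot delete entries when misses complete, which is one reason a practical design needs something more elaborate; this sketch ignores completion entirely.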

679 citations


Cites background from "Memory access scheduling"

  • ...We assume a memory controller implementation that attempts to schedule accesses to the same row together to increase row buffer hit rates [36]....

  • ...Different MC implementations may have different levels of complexity; some may use simple first-in-first-out (FIFO) processing of memory requests while others may reorder requests to improve access locality [20, 36]....

Book
10 Sep 2007
TL;DR: Is your memory hierarchy stopping your microprocessor from performing at the high level it should be?
Abstract: Is your memory hierarchy stopping your microprocessor from performing at the high level it should be? Memory Systems: Cache, DRAM, Disk shows you how to resolve this problem. The book tells you everything you need to know about the logical design and operation, physical design and operation, performance characteristics and resulting design trade-offs, and the energy consumption of modern memory hierarchies. You learn how to tackle the challenging optimization problems that result from the side-effects that can appear at any point in the entire hierarchy. As a result you will be able to design and emulate the entire memory hierarchy.

  • Understand all levels of the system hierarchy: cache, DRAM, and disk.
  • Evaluate the system-level effects of all design choices.
  • Model performance and energy consumption for each component in the memory hierarchy.

659 citations

Proceedings ArticleDOI
Onur Mutlu, Thomas Moscibroda
01 Dec 2007
TL;DR: This paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides quality of service to different threads sharing the DRAM memory system and shows that STFM significantly reduces the unfairness in theDRAM system while also improving system throughput on a wide variety of workloads and systems.
Abstract: DRAM memory is a major resource shared among cores in a chip multiprocessor (CMP) system. Memory requests from different threads can interfere with each other. Existing memory access scheduling techniques try to optimize the overall data throughput obtained from the DRAM and thus do not take into account inter-thread interference. Therefore, different threads running together on the same chip can experience extremely different memory system performance: one thread can experience a severe slowdown or starvation while another is unfairly prioritized by the memory scheduler. This paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides quality of service to different threads sharing the DRAM memory system. The goal of the proposed scheduler is to "equalize" the DRAM-related slowdown experienced by each thread due to interference from other threads, without hurting overall system performance. As such, STFM takes into account inherent memory characteristics of each thread and does not unfairly penalize threads that use the DRAM system without interfering with other threads. We show that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput (i.e., weighted speedup of threads) on a wide variety of workloads and systems. For example, averaged over 32 different workloads running on an 8-core CMP, the ratio between the highest DRAM-related slowdown and the lowest DRAM-related slowdown reduces from 5.26X to 1.4X, while the average system throughput improves by 7.6%. We qualitatively and quantitatively compare STFM to one new and three previously-proposed memory access scheduling algorithms, including network fair queueing. Our results show that STFM provides the best fairness, system throughput, and scalability.
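
A rough sketch of the core rule from the abstract, assuming the controller can estimate each thread's memory stall time with interference (t_shared) and without it (t_alone); the threshold value and the oldest-first baseline in the usage example are illustrative assumptions, not the paper's parameters.

```python
# Sketch of the STFM idea: if the ratio of the largest to the smallest
# per-thread slowdown exceeds a threshold, prioritize the most
# slowed-down thread; otherwise fall back to the throughput-oriented
# baseline policy. Names and values are illustrative.
from dataclasses import dataclass

@dataclass
class Thread:
    tid: int
    t_shared: float   # memory stall time measured with interference
    t_alone: float    # estimated stall time if the thread ran alone

@dataclass
class Request:
    tid: int
    addr: int

UNFAIRNESS_THRESHOLD = 1.10   # illustrative tolerance

def slowdown(t):
    return t.t_shared / t.t_alone

def pick_next(requests, threads, baseline_pick):
    s = {t.tid: slowdown(t) for t in threads}
    unfairness = max(s.values()) / min(s.values())
    if unfairness > UNFAIRNESS_THRESHOLD:
        victim = max(s, key=s.get)                    # most slowed-down thread
        victim_reqs = [r for r in requests if r.tid == victim]
        if victim_reqs:
            return baseline_pick(victim_reqs)         # serve it first
    return baseline_pick(requests)                    # fair enough: pure throughput

# Usage with a trivial oldest-first baseline:
threads = [Thread(0, t_shared=8.0, t_alone=4.0), Thread(1, t_shared=3.0, t_alone=2.9)]
reqs = [Request(1, 0x10), Request(0, 0x20)]
print(pick_next(reqs, threads, lambda rs: rs[0]).tid)   # 0: thread 0 is 2x slowed down
```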

584 citations


Cites background or methods from "Memory access scheduling"

  • ...Modern high-performance DRAM schedulers are implemented (logically and sometimes physically) as two-level structures [25]....

  • ...Instead, they try to maximize the data throughput obtained from the DRAM using a first-ready first-come-first-serve (FR-FCFS) policy [25, 24]....

  • ...DRAM Access Scheduling: Several works [25, 24, 10, 26] proposed and evaluated access scheduling algorithms to optimize throughput and latency in DRAM....

  • ...The FR-FCFS algorithm [25, 24] has been shown to be the best performing one overall in single-threaded systems and it is used as the baseline in this paper (however, we also evaluate other previously-proposed algorithms such as a simple FCFS algorithm)....

Journal ArticleDOI
Onur Mutlu, Thomas Moscibroda
01 Jun 2008
TL;DR: A parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities, and is also simpler to implement than STFM.
Abstract: In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destroy other threads' DRAM-bank-level parallelism. Requests whose latencies would otherwise have been overlapped could effectively become serialized. As a result both fairness and system throughput degrade, and some threads can starve for long time periods. This paper proposes a fundamentally new approach to designing a shared DRAM controller that provides quality of service to threads, while also improving system throughput. Our parallelism-aware batch scheduler (PAR-BS) design is based on two key ideas. First, PAR-BS processes DRAM requests in batches to provide fairness and to avoid starvation of requests. Second, to optimize system throughput, PAR-BS employs a parallelism-aware DRAM scheduling policy that aims to process requests from a thread in parallel in the DRAM banks, thereby reducing the memory-related stall-time experienced by the thread. PAR-BS seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities. We evaluate the design trade-offs involved in PAR-BS and compare it to four previously proposed DRAM scheduler designs on 4-, 8-, and 16-core systems. Our evaluations show that, averaged over 100 4-core workloads, PAR-BS improves fairness by 1.11X and system throughput by 8.3% compared to the best previous scheduling technique, Stall-Time Fair Memory (STFM) scheduling. Based on simple request prioritization rules, PAR-BS is also simpler to implement than STFM.
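
Both key ideas can be sketched at a high level. Note the ranking heuristic below (fewest outstanding requests first) is only an assumed stand-in; the paper derives its own prioritization rules.

```python
# Sketch of the two PAR-BS ideas from the abstract: (1) freeze pending
# requests into batches so later arrivals cannot starve them, and
# (2) within a batch, serve one thread's requests together so its
# accesses to different banks overlap. Illustrative only.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Request:
    tid: int
    bank: int
    row: int

def form_batch(pending):
    """Idea 1: requests outside the frozen batch wait until it drains."""
    return list(pending)

def rank_threads(batch):
    """Assumed ranking: threads with fewer requests first, so light
    threads finish quickly and each thread's requests stay grouped."""
    counts = defaultdict(int)
    for r in batch:
        counts[r.tid] += 1
    return sorted(counts, key=lambda tid: counts[tid])

def next_request(batch, open_rows):
    """Idea 2: pick from the highest-ranked thread, preserving its
    bank-level parallelism; row hits are still preferred within it."""
    tid = rank_threads(batch)[0]
    mine = [r for r in batch if r.tid == tid]
    hits = [r for r in mine if open_rows.get(r.bank) == r.row]
    return hits[0] if hits else mine[0]

batch = form_batch([Request(0, 0, 3), Request(1, 1, 7), Request(0, 1, 2)])
print(next_request(batch, open_rows={1: 7}))   # thread 1 (one request) goes first
```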

575 citations


Cites background or methods from "Memory access scheduling"

  • ...For single-threaded systems, the FR-FCFS policy was shown to provide the best average performance [33, 32], significantly better than the simpler FCFS policy, which simply schedules all requests according to their arrival order, regardless of the row-buffer state....

  • ...With a conventional parallelism-unaware DRAM scheduler (such as any previously proposed scheduler [44, 33, 32, 28, 25]), the requests can be serviced in their arrival order shown in Figure 2 (top)....

  • ...A modern memory controller employs the FR-FCFS (first-ready first-come-first-serve) scheduling policy [44, 33, 32], which prioritizes ready DRAM commands from 1) row-hit requests over others and 2) row-hit status being equal, older requests over younger ones....

  • ...A DRAM controller consists of a memory request buffer that buffers the memory requests (and their data) while they are waiting to be serviced and a (possibly two-level) scheduler that selects the next request to be serviced [33, 28, 25]....

  • ...For a detailed description, we refer the reader to [33, 4, 25]....

References
Proceedings ArticleDOI
01 May 1990
TL;DR: In this article, a hardware technique to improve the performance of caches is presented, where a small fully-associative cache between a cache and its refill path is used to place prefetched data and not in the cache.
Abstract: Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches. Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches. Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching. Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams. Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.
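
A victim cache is small enough to model end-to-end. The sketch below is illustrative (word-granularity lines, tiny sizes, no timing), but it shows the mechanism: a pair of addresses that conflict in a direct-mapped cache stop costing full misses once the displaced line is parked in the victim buffer.

```python
# Toy victim cache: a small fully-associative buffer that holds lines
# evicted from a direct-mapped L1, turning repeat conflict misses into
# one-extra-cycle victim hits. Illustrative, not any real design.
from collections import OrderedDict

class VictimCache:
    def __init__(self, entries=4):
        self.entries = entries
        self.lines = OrderedDict()          # address -> data, in LRU order

    def insert(self, addr, data):
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.entries:
            self.lines.popitem(last=False)  # evict the LRU victim

    def lookup(self, addr):
        return self.lines.pop(addr, None)   # on a hit, line returns to L1

class DirectMappedL1:
    def __init__(self, sets, victim):
        self.sets = sets
        self.tags = {}                       # set index -> (address, data)
        self.victim = victim

    def access(self, addr):
        idx = addr % self.sets
        if idx in self.tags and self.tags[idx][0] == addr:
            return "L1 hit"
        data = self.victim.lookup(addr)
        status = "victim hit" if data is not None else "miss (fetch from L2)"
        if idx in self.tags:                 # displaced line becomes the victim
            old_addr, old_data = self.tags[idx]
            self.victim.insert(old_addr, old_data)
        self.tags[idx] = (addr, data or f"line@{addr}")
        return status

l1 = DirectMappedL1(sets=64, victim=VictimCache())
print(l1.access(0x100))   # miss (fetch from L2)
print(l1.access(0x500))   # same set as 0x100: miss, 0x100 parked in victim cache
print(l1.access(0x100))   # victim hit instead of a full miss
```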

1,481 citations

Journal ArticleDOI
TL;DR: The state of microprocessors and DRAMs today is reviewed, some of the opportunities and challenges for IRAMs are explored, and performance and energy efficiency of three IRAM designs are estimated.
Abstract: Two trends call into question the current practice of fabricating microprocessors and DRAMs as different chips on different fabrication lines. The gap between processor and DRAM speed is growing at 50% per year; and the size and organization of memory on a single DRAM chip is becoming awkward to use, yet size is growing at 60% per year. Intelligent RAM, or IRAM, merges processing and memory into a single chip to lower memory latency, increase memory bandwidth, and improve energy efficiency. It also allows more flexible selection of memory size and organization, and promises savings in board area. This article reviews the state of microprocessors and DRAMs today, explores some of the opportunities and challenges for IRAMs, and finally estimates performance and energy efficiency of three IRAM designs.
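
To make the quoted rate concrete, a gap that compounds at 50% per year widens as

```latex
\[
  \text{gap}(n) = 1.5^{\,n}, \qquad 1.5^{5} \approx 7.6,
\]
```

so at the cited rate the processor-DRAM speed gap grows roughly 7.6x over five years (illustrative arithmetic, not a figure from the article).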

671 citations


"Memory access scheduling" refers background in this paper

  • ...While processor performance increases at a rate of 60% per year, the bandwidth of a memory chip increases by only 10% per year making it costly to provide the memory bandwidth required to match the processor performance [14] [17]....

Proceedings ArticleDOI
12 May 1981
TL;DR: A cache organization is presented that essentially eliminates a penalty on subsequent cache references following a cache miss and has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped.
Abstract: In the past decade, there has been much literature describing various cache organizations that exploit general programming idiosyncrasies to obtain maximum hit rate (the probability that a requested datum is now resident in the cache). Little, if any, has been presented to exploit: (1) the inherent dual input nature of the cache and (2) the many-datum reference type central processor instructions.No matter how high the cache hit rate is, a cache miss may impose a penalty on subsequent cache references. This penalty is the necessity of waiting until the missed requested datum is received from central memory and, possibly, for cache update. For the two cases above, the cache references following a miss do not require the information of the datum not resident in the cache, and are therefore penalized in this fashion.In this paper, a cache organization is presented that essentially eliminates this penalty. This cache organizational feature has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped. An existing simple instruction set machine has verified the advantage of this feature; future, more extensive and sophisticated instruction set machines may obviously take more advantage. Prior to prototyping, simulations verified the advantage.
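
The property the abstract describes, that references after a miss need not wait for the missed datum, can be sketched with a toy model. The tracking structure below plays the role of what later literature calls miss status holding registers (MSHRs); all names and the string-based interface are illustrative.

```python
# Toy lockup-free cache: a miss allocates a tracking entry and the cache
# keeps servicing later, independent references instead of stalling
# until the missed datum returns. Illustrative only.

class LockupFreeCache:
    def __init__(self):
        self.data = {}          # resident lines: address -> value
        self.pending = {}       # outstanding misses: address -> waiting requests

    def access(self, addr, req_id):
        if addr in self.data:
            return f"hit: serve {req_id} immediately"
        if addr in self.pending:
            self.pending[addr].append(req_id)        # coalesce, still no stall
            return f"secondary miss: {req_id} queued"
        self.pending[addr] = [req_id]                # primary miss: start fetch
        return f"primary miss: fetch {addr:#x}, later refs proceed"

    def fill(self, addr, value):
        self.data[addr] = value                      # memory returned the line
        return [f"wake {r}" for r in self.pending.pop(addr, [])]

c = LockupFreeCache()
c.data[0x80] = 7             # 0x80 is already resident
print(c.access(0x40, "A"))   # primary miss starts the fetch
print(c.access(0x80, "B"))   # hit under miss: B is not penalized by A's miss
print(c.access(0x40, "C"))   # secondary miss coalesces with A's fetch
print(c.fill(0x40, 123))     # missed datum arrives; A and C resume
```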

504 citations


"Memory access scheduling" refers background in this paper

  • ...Despite the fact that there is no cache, a set of registers similar in function to the miss status holding registers (MSHRs) of a non-blocking cache [9] exist to keep track of in-flight references and to do read and write coalescing....

Proceedings ArticleDOI
01 Nov 1998
TL;DR: The Imagine architecture supports the stream programming model by providing a bandwidth hierarchy tailored to the demands of media applications by reducing the global register and memory bandwidth required by typical applications by factors of 13 and 21 respectively.
Abstract: Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are poorly matched to conventional microprocessor architectures, they are a good fit for modern VLSI technology with its high arithmetic capacity but limited global bandwidth. The stream programming model, in which an application is coded as streams of data records passing through computation kernels, exposes both parallelism and locality in media applications that can be exploited by VLSI architectures. The Imagine architecture supports the stream programming model by providing a bandwidth hierarchy tailored to the demands of media applications. Compared to a conventional scalar processor, Imagine reduces the global register and memory bandwidth required by typical applications by factors of 13 and 21 respectively. This bandwidth efficiency enables a single chip Imagine processor to achieve a peak performance of 16.2GFLOPS (single-precision floating point) and sustained performance of up to 8.5GFLOPS on media processing kernels.
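
The stream programming model itself is easy to illustrate: records flow through chained kernels, so producer-consumer traffic stays local instead of making round trips through global memory. The kernels below are toy stand-ins, not Imagine's kernel set or its bandwidth hierarchy.

```python
# Toy stream model: a kernel maps a function over every record of an
# input stream; chaining kernels passes records producer-to-consumer
# (generator to generator) without staging them in a global array.

def kernel(fn):
    def run(stream):
        for record in stream:
            yield fn(record)
    return run

scale = kernel(lambda px: px * 2)       # illustrative media kernels
clamp = kernel(lambda px: min(px, 255))

pixels = [10, 100, 200]
for out in clamp(scale(pixels)):
    print(out)                          # 20, 200, 255
```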

280 citations

Proceedings ArticleDOI
Takeo Kanade, H. Kano, S. Kimura, Atsushi Yoshida, Kazuo Oda
05 Aug 1995
TL;DR: A video-rate stereo machine has been developed at CMU with the capability of generating a dense range map, aligned with an intensity image, at the video rate, with high throughput and high precision.
Abstract: A video-rate stereo machine has been developed at CMU with the capability of generating a dense range map, aligned with an intensity image, at the video rate. The target performance of the CMU video-rate stereo machine is: 1) multi-image input of 6 cameras; 2) high throughput of 30 million point×disparity measurements per second; 3) high frame rate of 30 frame/sec; 4) a dense depth map of 256×240 pixels; 5) disparity search range of up to 60 pixels; 6) high precision of up to 7 bits (with interpolation); and 7) uncertainty estimation available for each pixel.
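
The throughput figure counts one match-cost evaluation per pixel per candidate disparity ("point×disparity measurements"). A toy one-scanline block matcher makes that unit concrete; the SAD cost, window size, and data below are illustrative assumptions, not the machine's algorithm.

```python
# Toy 1-D stereo matcher: for each pixel, evaluate a sum-of-absolute-
# differences cost at every candidate disparity and keep the best.
# Illustrative only; the CMU machine uses 2-D windows and 6 camera inputs.

def disparity_scanline(left, right, max_disp, win=2):
    disp = []
    for x in range(len(left)):
        best_d, best_cost = 0, float("inf")
        for d in range(min(max_disp, x) + 1):       # one "measurement" each
            cost = sum(abs(left[x - k] - right[x - d - k])
                       for k in range(win) if x - d - k >= 0)
            if cost < best_cost:
                best_d, best_cost = d, cost
        disp.append(best_d)
    return disp

left  = [5, 5, 9, 5, 5, 9, 5]    # left scanline: the right one shifted by 1
right = [5, 9, 5, 5, 9, 5, 5]
print(disparity_scanline(left, right, max_disp=3))  # [0, 1, 1, 1, 1, 1, 1]
```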

266 citations