Open Access proceedings article

Memory access scheduling

Abstract
The bandwidth and latency of a memory system depend strongly on how accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within the same row and successive references to different rows within the same bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40% for traces from five media benchmarks. Aggressive reordering, in which operations are scheduled to optimize memory bandwidth, improves bandwidth by 93% for the same set of applications. Memory access scheduling is particularly important for media processors, where it enables the processor to make the most efficient use of scarce memory bandwidth.
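To make the row-locality argument concrete, here is a minimal Python sketch. It is not the paper's conservative or aggressive scheduler. It contrasts servicing references in arrival order with a simple greedy "row-hit-first" reordering over a small window of pending references. The timing constants `T_PRECHARGE`, `T_ACTIVATE` and `T_COLUMN`, the window size, the `Ref` type and all function names are hypothetical and exist only for illustration. The model also serializes every access, so it ignores the overlap across banks and the timing constraints that a real controller must track.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical timing costs in controller cycles, chosen only to make the
# gap between row hits and row conflicts visible (not datasheet values).
T_PRECHARGE = 3  # close the row currently open in a bank
T_ACTIVATE = 3   # open (activate) a row in a bank
T_COLUMN = 1     # read or write one column of the open row


@dataclass(frozen=True)
class Ref:
    """One memory reference, already decoded to bank, row and column."""
    bank: int
    row: int
    col: int


def access_cost(open_rows: dict[int, int], ref: Ref) -> int:
    """Cycles to service `ref`; leaves its row open afterwards (open-row policy)."""
    current = open_rows.get(ref.bank)
    if current == ref.row:
        cost = T_COLUMN                             # row hit
    elif current is None:
        cost = T_ACTIVATE + T_COLUMN                # bank has no open row yet
    else:
        cost = T_PRECHARGE + T_ACTIVATE + T_COLUMN  # row conflict
    open_rows[ref.bank] = ref.row
    return cost


def total_cycles(order: list[Ref]) -> int:
    """Serial cost of servicing references in the given order (no bank overlap)."""
    open_rows: dict[int, int] = {}
    return sum(access_cost(open_rows, ref) for ref in order)


def in_order(refs: list[Ref]) -> list[Ref]:
    """Baseline: service references strictly in arrival order."""
    return list(refs)


def row_hit_first(refs: list[Ref], window: int = 8) -> list[Ref]:
    """Greedy reordering: among the oldest `window` pending references, pick
    the oldest one that hits an open row; if none does, pick the oldest."""
    pending = list(refs)
    open_rows: dict[int, int] = {}
    order: list[Ref] = []
    while pending:
        idx = next(
            (i for i, r in enumerate(pending[:window]) if open_rows.get(r.bank) == r.row),
            0,
        )
        choice = pending.pop(idx)
        open_rows[choice.bank] = choice.row  # mirror the open-row policy above
        order.append(choice)
    return order


def make_trace(n_cols: int = 4) -> list[Ref]:
    """Two streams whose references alternate between rows 0 and 1 of bank 0."""
    trace: list[Ref] = []
    for c in range(n_cols):
        trace.append(Ref(bank=0, row=0, col=c))
        trace.append(Ref(bank=0, row=1, col=c))
    return trace


if __name__ == "__main__":
    trace = make_trace()
    print("in-order:", total_cycles(in_order(trace)), "cycles")            # in-order: 53 cycles
    print("row-hit-first:", total_cycles(row_hit_first(trace)), "cycles")  # row-hit-first: 17 cycles
```

On this interleaved two-row trace, arrival order pays a row conflict on every access after the first (53 cycles under the toy costs). Grouping the references by open row reduces the total to 17 cycles. The greedy policy can keep bypassing an older reference that misses, so a practical scheduler would also need a bound on how long a reference may wait.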

Citations
Proceedings Article

Analyzing CUDA workloads using a detailed GPU simulator

TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Journal Article

3D-Stacked Memory Architectures for Multi-core Processors

TL;DR: This work explores more aggressive 3D DRAM organizations that make better use of the additional die-to-die bandwidth provided by 3D stacking, as well as the additional transistor count, to achieve a 1.75x speedup over previously proposed 3D-DRAM approaches on memory-intensive multi-programmed workloads on a quad-core processor.
Book

Memory Systems: Cache, DRAM, Disk

TL;DR: Is your memory hierarchy stopping your microprocessor from performing at the high level it should be?
Proceedings Article

Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors

TL;DR: This paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides quality of service to different threads sharing the DRAM memory system, and shows that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput on a wide variety of workloads and systems.
Journal Article

Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems

TL;DR: This work proposes a parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities, can provide different service levels, including purely opportunistic service, to threads with different priorities, and is simpler to implement than STFM.