Memory access scheduling
Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, John D. Owens
- Vol. 28, Iss: 2, pp 128-138
TLDR
This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.
Abstract:
The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40% for traces from five media benchmarks. Aggressive reordering, in which operations are scheduled to optimize memory bandwidth, improves bandwidth by 93% for the same set of applications. Memory access scheduling is particularly important for media processors where it enables the processor to make the most efficient use of scarce memory bandwidth.
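The effect of reordering on row locality can be illustrated with a toy model. The sketch below is not the paper's scheduler or either of its policies: it is a simplified row-hit-first policy over a small window of pending references. The `Ref` type, the function names, the window size, and the cycle costs are all assumptions chosen for illustration only.

```python
from collections import namedtuple

# A memory reference addressed by bank, row, and column (the "3-D" structure).
Ref = namedtuple("Ref", ["bank", "row", "col"])

# Illustrative costs in arbitrary cycles; assumptions, not figures from the paper.
ROW_HIT_COST = 1   # column access to the row already open in a bank
ROW_MISS_COST = 6  # precharge + activate a new row, then column access


def access_cost(open_rows, ref):
    """Charge one reference against the bank state and update the open row."""
    hit = open_rows.get(ref.bank) == ref.row
    open_rows[ref.bank] = ref.row
    return ROW_HIT_COST if hit else ROW_MISS_COST


def in_order_cycles(refs):
    """Service references strictly in arrival order."""
    open_rows = {}
    return sum(access_cost(open_rows, r) for r in refs)


def row_hit_first_cycles(refs, window=4):
    """Prefer the oldest pending reference that hits an open row."""
    open_rows = {}
    pending = list(refs)
    cycles = 0
    while pending:
        # Index of the oldest row hit within the window, else the oldest reference.
        idx = next(
            (i for i, r in enumerate(pending[:window])
             if open_rows.get(r.bank) == r.row),
            0,
        )
        cycles += access_cost(open_rows, pending.pop(idx))
    return cycles


# Hypothetical trace: one bank, references alternating between two rows.
trace = [Ref(0, 0, 0), Ref(0, 1, 0), Ref(0, 0, 1), Ref(0, 1, 1)]
print(in_order_cycles(trace))       # 24
print(row_hit_first_cycles(trace))  # 14
```

With this trace, in-order service reopens a row on every access, while the reordered schedule groups the two references to each row and pays the row-miss cost only twice. The size of the gap depends entirely on the assumed costs and trace.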
Citations
Proceedings Article
Analyzing CUDA workloads using a detailed GPU simulator
TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Journal Article
3D-Stacked Memory Architectures for Multi-core Processors
TL;DR: This work explores more aggressive 3D DRAM organizations that make better use of the additional die-to-die bandwidth provided by 3D stacking, as well as the additional transistor count, to achieve a 1.75x speedup over previously proposed 3D-DRAM approaches on memory-intensive multi-programmed workloads on a quad-core processor.
Book
Memory Systems: Cache, DRAM, Disk
TL;DR: Is your memory hierarchy stopping your microprocessor from performing at the high level it should be?
Proceedings Article
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Onur Mutlu, Thomas Moscibroda
TL;DR: This paper proposes a new memory access scheduler, called the Stall-Time Fair Memory scheduler (STFM), that provides quality of service to different threads sharing the DRAM memory system and shows that STFM significantly reduces the unfairness in the DRAM system while also improving system throughput on a wide variety of workloads and systems.
Journal Article
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
Onur Mutlu, Thomas Moscibroda
TL;DR: A parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities, and is also simpler to implement than STFM.