Topic

Multi-channel memory architecture

About: Multi-channel memory architecture is a research topic. Over its lifetime, 329 publications have been published within this topic, receiving 5548 citations. The topic is also known as multi-channel memory and multi-channel RAM.

Papers

Proceedings Article

RowClone: fast and energy-efficient in-DRAM bulk data copy and initialization

Vivek Seshadri¹, Yoongu Kim¹, Chris Fallin¹, Donghyuk Lee¹, Rachata Ausavarungnirun¹, Gennady Pekhimenko¹, Yixin Luo¹, Onur Mutlu¹, Phillip B. Gibbons², Michael Kozuch², Todd C. Mowry¹

Carnegie Mellon University¹, Intel²

07 Dec 2013

TL;DR: RowClone is proposed, a new and simple mechanism to perform bulk copy and initialization completely within DRAM — eliminating the need to transfer any data over the memory channel to perform such operations.

Abstract: Several system-level operations trigger bulk data copy or initialization. Even though these bulk data operations do not require any computation, current systems transfer a large quantity of data back and forth on the memory channel to perform such operations. As a result, bulk data operations consume high latency, bandwidth, and energy — degrading both system performance and energy efficiency. In this work, we propose RowClone, a new and simple mechanism to perform bulk copy and initialization completely within DRAM — eliminating the need to transfer any data over the memory channel to perform such operations. Our key observation is that DRAM can internally and efficiently transfer a large quantity of data (multiple KBs) between a row of DRAM cells and the associated row buffer. Based on this, our primary mechanism can quickly copy an entire row of data from a source row to a destination row by first copying the data from the source row to the row buffer and then from the row buffer to the destination row, via two back-to-back activate commands. This mechanism, which we call the Fast Parallel Mode of RowClone, reduces the latency and energy consumption of a 4KB bulk copy operation by 11.6× and 74.4×, respectively, and a 4KB bulk zeroing operation by 6.0× and 41.5×, respectively. To efficiently copy data between rows that do not share a row buffer, we propose a second mode of RowClone, the Pipelined Serial Mode, which uses the shared internal bus of a DRAM chip to quickly copy data between two banks. RowClone requires only a 0.01% increase in DRAM chip area. We quantitatively evaluate the benefits of RowClone by focusing on fork, one of the frequently invoked system calls, and five other copy and initialization intensive applications. Our results show that RowClone can significantly improve both single-core and multi-core system performance, while also significantly reducing main memory bandwidth and energy consumption.

385 citations
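
The two back-to-back activate commands described in the RowClone abstract can be illustrated with a toy model. This is a minimal sketch, not the authors' implementation: the `Bank` class, its method names, the `channel_bytes` counter, and the 4096-byte row size are all assumptions made for illustration. It shows only the data movement (source row to row buffer, then row buffer to destination row) and contrasts it with a copy that crosses the memory channel; it does not model timing or energy.

```python
class Bank:
    """Toy model of one DRAM bank: rows of bytes plus a single row buffer.

    All names here are hypothetical; real DRAM is controlled through
    commands issued by the memory controller, not a Python API.
    """

    def __init__(self, num_rows: int, row_bytes: int) -> None:
        self.rows = [bytearray(row_bytes) for _ in range(num_rows)]
        self.row_buffer = bytearray(row_bytes)
        self.open_row = None      # row currently connected to the row buffer
        self.channel_bytes = 0    # bytes moved over the memory channel

    def activate(self, row: int) -> None:
        if self.open_row is None:
            # Precharged bank: the activated row loads the row buffer.
            self.row_buffer[:] = self.rows[row]
        else:
            # Back-to-back activate: the row buffer already holds data,
            # so it overwrites the newly connected row.
            self.rows[row][:] = self.row_buffer
        self.open_row = row

    def precharge(self) -> None:
        self.open_row = None


def rowclone_fpm(bank: Bank, src: int, dst: int) -> None:
    """Copy src to dst inside the bank using two back-to-back activates."""
    bank.precharge()
    bank.activate(src)   # source row -> row buffer
    bank.activate(dst)   # row buffer -> destination row
    bank.precharge()


def copy_over_channel(bank: Bank, src: int, dst: int) -> None:
    """Baseline: read the row out over the channel and write it back."""
    data = bytes(bank.rows[src])
    bank.channel_bytes += len(data)   # read traffic
    bank.rows[dst][:] = data
    bank.channel_bytes += len(data)   # write traffic


bank = Bank(num_rows=4, row_bytes=4096)
bank.rows[0][:] = b"\xab" * 4096
rowclone_fpm(bank, src=0, dst=2)
in_dram_traffic = bank.channel_bytes
copy_over_channel(bank, src=0, dst=3)
baseline_traffic = bank.channel_bytes - in_dram_traffic
```

In this model the in-DRAM copy leaves `channel_bytes` unchanged, while the baseline moves each byte across the channel twice. Both rows must live in the same bank, which matches the abstract's note that rows not sharing a row buffer need the separate Pipelined Serial Mode.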

Proceedings Article

Reducing memory interference in multicore systems via application-aware memory channel partitioning

Sai Prashanth Muralidhara¹, Lavanya Subramanian², Onur Mutlu², Mahmut Kandemir¹, Thomas Moscibroda³

Pennsylvania State University¹, Carnegie Mellon University², Microsoft³

03 Dec 2011

TL;DR: In this paper, the authors present an alternative approach to reduce inter-application interference in the memory system: application-aware memory channel partitioning (MCP), which maps the data of applications that are likely to severely interfere with each other to different memory channels.

Abstract: Main memory is a major shared resource among cores in a multicore system. If the interference between different applications' memory requests is not controlled effectively, system performance can degrade significantly. Previous work aimed to mitigate the problem of interference between applications by changing the scheduling policy in the memory controller, i.e., by prioritizing memory requests from applications in a way that benefits system performance. In this paper, we first present an alternative approach to reducing inter-application interference in the memory system: application-aware memory channel partitioning (MCP). The idea is to map the data of applications that are likely to severely interfere with each other to different memory channels. The key principles are to partition onto separate channels 1) the data of light (memory non-intensive) and heavy (memory-intensive) applications, 2) the data of applications with low and high row-buffer locality. Second, we observe that interference can be further reduced with a combination of memory channel partitioning and scheduling, which we call integrated memory partitioning and scheduling (IMPS). The key idea is to 1) always prioritize very light applications in the memory scheduler since such applications cause negligible interference to others, 2) use MCP to reduce interference among the remaining applications. We evaluate MCP and IMPS on a variety of multi-programmed workloads and system configurations and compare them to four previously proposed state-of-the-art memory scheduling policies. Averaged over 240 workloads on a 24-core system with 4 memory channels, MCP improves system throughput by 7.1% over an application-unaware memory scheduler and 1% over the previous best scheduler, while avoiding modifications to existing memory schedulers. IMPS improves system throughput by 11.1% over an application-unaware scheduler and 5% over the previous best scheduler, while incurring much lower hardware complexity than the latter.

281 citations
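
The two partitioning principles in the MCP abstract (separate light from heavy applications, and separate low from high row-buffer locality) can be sketched as a simple classifier. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the `App` fields, the use of misses per kilo-instruction as the intensity metric, both threshold values, and the fixed group-to-channel mapping are hypothetical choices the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    mpki: float          # assumed intensity metric: misses per kilo-instruction
    row_hit_rate: float  # assumed locality metric: fraction of row-buffer hits


def classify(app: App, mpki_threshold: float = 1.0,
             rbl_threshold: float = 0.5) -> str:
    """Place an application in one of three groups (thresholds are made up)."""
    if app.mpki < mpki_threshold:
        return "light"
    if app.row_hit_rate >= rbl_threshold:
        return "heavy_high_rbl"
    return "heavy_low_rbl"


def partition(apps: list[App],
              group_channels: dict[str, list[int]]) -> dict[str, list[int]]:
    """Map each application's data to the channels reserved for its group."""
    return {app.name: group_channels[classify(app)] for app in apps}


# Hypothetical split of a 4-channel system among the three groups.
channels = {"light": [0], "heavy_low_rbl": [1, 2], "heavy_high_rbl": [3]}
apps = [
    App("compute_bound", mpki=0.2, row_hit_rate=0.9),
    App("streaming", mpki=20.0, row_hit_rate=0.8),
    App("pointer_chasing", mpki=15.0, row_hit_rate=0.1),
]
assignment = partition(apps, channels)
```

Because the groups use disjoint channel sets, requests from applications in different groups never contend for the same channel in this sketch. How many channels each group should receive is a separate policy question that the fixed mapping above does not address.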

Patent

General purpose, multiple precision parallel operation, programmable media processor

Craig Hansen¹, John Moussouris¹

MicroUnity¹

22 Nov 1996

TL;DR: A general purpose programmable media processor is presented for processing and transmitting a media data stream of audio, video, radio, graphics, encryption, authentication, and networking information in real time.

Abstract: A general purpose, programmable media processor for processing and transmitting a media data stream of audio, video, radio, graphics, encryption, authentication, and networking information in real-time. The media processor incorporates an execution unit that maintains substantially peak data throughput of media data streams. The execution unit includes a dynamically partitionable multi-precision arithmetic unit, programmable switch and programmable extended mathematical element. A high bandwidth external interface supplies media data streams at substantially peak rates to a general purpose register file and the multi-precision execution unit. A memory management unit, and instruction and data cache/buffers are also provided. High bandwidth memory controllers are linked in series to provide a memory channel to the general purpose, programmable media processor. The general purpose, programmable media processor is disposed in a network fabric consisting of fiber optic cable, coaxial cable and twisted pair wires to transmit, process and receive single or unified media data streams. Parallel general purpose media processors are disposed throughout the network in a distributed virtual manner to allow for multi-processor operations and sharing of resources through the network. A method for receiving, processing and transmitting media data streams over the communications fabric is also provided.

263 citations

Proceedings Article

Mini-rank: Adaptive DRAM architecture for improving memory power efficiency

Hongzhong Zheng¹, Jiang Lin², Zhao Zhang², Eugene Gorbatov³, Howard S. David³, Zhichun Zhu¹

University of Illinois at Chicago¹, Iowa State University², Intel³

08 Nov 2008

TL;DR: A novel idea called mini-rank for DDRx (DDR/DDR2/DDR3) DRAMs is proposed, which uses a small bridge chip on each DRAM DIMM to break a conventional DRAM rank into multiple smaller mini-ranks so as to reduce the number of devices involved in a single memory access.

Abstract: The widespread use of multicore processors has dramatically increased the demand on high memory bandwidth and large memory capacity. As DRAM subsystem designs stretch to meet the demand, memory power consumption is now approaching that of processors. However, the conventional DRAM architecture prevents any meaningful power and performance trade-offs for memory-intensive workloads. We propose a novel idea called mini-rank for DDRx (DDR/DDR2/DDR3) DRAMs, which uses a small bridge chip on each DRAM DIMM to break a conventional DRAM rank into multiple smaller mini-ranks so as to reduce the number of devices involved in a single memory access. The design dramatically reduces the memory power consumption with only a slight increase on the memory idle latency. It does not change the DDRx bus protocol and its configuration can be adapted for the best performance-power trade-offs. Our experimental results using four-core multiprogramming workloads show that using x32 mini-ranks reduces memory power by 27.0% with 2.8% performance penalty and using x16 mini-ranks reduces memory power by 44.1% with 7.4% performance penalty on average for memory-intensive workloads, respectively.

256 citations
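
The device-count reduction behind mini-ranks can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions, not a result from the paper: it assumes x8 DRAM devices, a 64-bit (non-ECC) conventional rank, and 64-byte cache lines, none of which the abstract states. The function name `mini_rank_access` is hypothetical.

```python
def mini_rank_access(mini_rank_bits: int, device_bits: int = 8,
                     line_bytes: int = 64) -> tuple[int, int]:
    """Return (devices activated per access, data beats per cache line).

    device_bits and line_bytes are assumed defaults, not values
    taken from the abstract.
    """
    if mini_rank_bits % device_bits != 0:
        raise ValueError("mini-rank width must be a multiple of device width")
    devices = mini_rank_bits // device_bits
    beats = (line_bytes * 8) // mini_rank_bits
    return devices, beats


conventional = mini_rank_access(64)  # full 64-bit rank
x32 = mini_rank_access(32)           # x32 mini-rank
x16 = mini_rank_access(16)           # x16 mini-rank
```

Under these assumptions, halving the mini-rank width halves the number of devices involved in each access and doubles the number of data beats needed to move one cache line. That is the general shape of the power versus latency trade-off the abstract reports, although the measured percentages depend on factors this arithmetic ignores.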

Journal Article

Memory channel network for PCI

R.B. Gillett

01 Feb 1996, IEEE Micro

TL;DR: MC implements a form of virtual shared memory that permits applications to completely bypass the operating system and perform cluster communication directly from the user level, reducing communication latency by up to two orders of magnitude and overhead by up to three.

Abstract: A memory-based networking approach provides clusters of computers up to 1,000 times the communication performance of conventional networks, with no compromise in cost or reliability. The memory channel for PCI's performance gains are the result of a system design approach that exploits natural cluster constraints to define a memory-based network. MC implements a form of virtual shared memory that permits applications to completely bypass the operating system and perform cluster communication directly from the user level. The hardware's simple and powerful communication model supports error handling at almost no cost or complexity to the application; guaranteed ordering under errors is the key innovation. The end result: Real-world cluster communication latency dropped by up to two orders of magnitude, and overhead by up to three orders of magnitude. These improvements elevate a lowly set of standard PCI computers running Unix into an impressive, highly available, parallel computing system.

155 citations

Performance metrics

Papers: 329

Citations: 5,922

No. of papers in the topic in previous years
Year	Papers
2021	11
2020	18
2019	23
2018	15
2017	17
2016	28
