NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules

doi:10.1109/HPCA.2015.7056040

Proceedings Article

Amin Farmahini-Farahani, +3 more

- pp 283-295

TLDR

This paper proposes near-DRAM acceleration (NDA) architectures, which process data using accelerators 3D-stacked on DRAM devices comprising off-chip main memory modules, substantially reducing energy consumption and improving performance.

Abstract:

Energy consumed for transferring data across the processor memory hierarchy constitutes a large fraction of total system energy consumption, and this fraction has steadily increased with technology scaling. In this paper, we propose near-DRAM acceleration (NDA) architectures, which process data using accelerators 3D-stacked on DRAM devices comprising off-chip main memory modules. NDA transfers most data through high-bandwidth and low-energy 3D interconnects between accelerators and DRAM devices instead of low-bandwidth and high-energy off-chip interconnects between a processor and DRAM devices, substantially reducing energy consumption and improving performance. Unlike previous near-memory processing architectures, NDA is built upon commodity DRAM devices; apart from inserting through-silicon vias (TSVs) to 3D-interconnect DRAM devices and accelerators, NDA requires minimal changes to the commodity DRAM device and standard memory module architectures. This allows NDA to be more easily adopted in both existing and emerging systems. Our experiments demonstrate that, on average, our NDA-based system consumes 46% (68%) lower (data transfer) energy at 1.67× higher performance than a system that integrates the same accelerator logic within the processor itself.
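The headline figures in the abstract can be combined into a rough energy-delay comparison. The sketch below is illustrative arithmetic only and is not taken from the paper. It assumes that "1.67× higher performance" means a 1.67× speedup (runtime divided by 1.67) and that "46% lower energy" means the NDA system uses 54% of the baseline energy. Because the reported values are averages, the derived product is only an approximation. The function and variable names are hypothetical.

```python
def relative_metrics(energy_reduction: float, speedup: float) -> dict:
    """Return metrics normalized to the baseline system (baseline = 1.0).

    energy_reduction: fractional energy saving, e.g. 0.46 for "46% lower".
    speedup: performance ratio over the baseline, e.g. 1.67 (assumed reading).
    """
    energy = 1.0 - energy_reduction   # fraction of baseline energy still consumed
    runtime = 1.0 / speedup           # fraction of baseline runtime
    return {
        "energy": energy,
        "runtime": runtime,
        "energy_delay_product": energy * runtime,
    }


# Figures quoted in the abstract: total energy and data-transfer energy.
total = relative_metrics(energy_reduction=0.46, speedup=1.67)
transfer = relative_metrics(energy_reduction=0.68, speedup=1.67)

for name, metrics in (("total", total), ("data transfer", transfer)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Under these assumptions, the relative energy-delay product for total energy is 0.54 / 1.67, or about 0.32 of the baseline.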

Citations

Proceedings Article

PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning

Linghao Song, +3 more

TL;DR: PipeLayer is presented, a ReRAM-based PIM accelerator for CNNs that supports both training and testing. It proposes a highly parallel design based on the notion of parallelism granularity and weight replication, which enables highly pipelined execution of both training and testing without introducing the potential stalls of previous work.

Proceedings Article

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology

Vivek Seshadri, +9 more

TL;DR: Ambit is proposed, an Accelerator-in-Memory for bulk bitwise operations that largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area).

Proceedings Article

PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture

Junwhan Ahn, +3 more

TL;DR: In this article, the authors propose a new PIM architecture that does not change the existing sequential programming models and automatically decides whether to execute PIM operations in memory or processors depending on the locality of data.

Proceedings Article

DRISA: a DRAM-based Reconfigurable In-Situ Accelerator

Shuangchen Li, +5 more

TL;DR: DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, is proposed to provide both powerful computing capability and large memory capacity/bandwidth to address the memory wall problem in traditional von Neumann architecture.

Proceedings Article

Practical Near-Data Processing for In-Memory Analytics Frameworks

Mingyu Gao, +2 more

TL;DR: This paper develops the hardware and software of an NDP architecture for in-memory analytics frameworks, including MapReduce, graph processing, and deep neural networks, and shows that it is critical to optimize software frameworks for spatial locality, as doing so leads to 2.9x efficiency improvements for NDP.

References

Proceedings Article

The SPLASH-2 programs: characterization and methodological considerations

Steven Cameron Woo, +4 more

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well, including the computational load balance, communication-to-computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.

Proceedings Article

Rodinia: A benchmark suite for heterogeneous computing

Shuai Che, +6 more

TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

Journal Article

Reconfigurable computing: a survey of systems and software

Katherine Compton, +1 more

- 01 Jun 2002 -

ACM Computing Surveys

TL;DR: The hardware aspects of reconfigurable computing machines, from single-chip architectures to multi-chip systems, including internal structures and external coupling, are explored, along with the software that targets these machines.

Proceedings Article

Garp: a MIPS processor with a reconfigurable coprocessor

Jay Hauser, +1 more

TL;DR: Novel aspects of the Garp architecture are presented, as well as a prototype software environment and preliminary performance results, which suggest that a Garp of similar technology could achieve speedups ranging from a factor of 2 to as high as a factor of 24 for some useful applications.

IEEE Micro

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

Ping Chi, +7 more

References

The gem5 simulator

Related Papers (5)

A scalable processing-in-memory accelerator for parallel graph processing

PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture

TOP-PIM: throughput-oriented programmable processing in memory

A case for intelligent RAM

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory