A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing

doi:10.1109/IPDPS47924.2020.00076

Proceedings Article

Yu Huang, +6 more

- pp 684-695

TLDR

There is no absolute winner between analog memristor-based PIM and digital CMOS-based PIM for graph applications, which often exhibit irregular workloads. A new heterogeneous PIM hardware design, called Hetraph, is introduced to facilitate energy-efficient graph processing.

Abstract:

Processing-In-Memory (PIM) is an emerging technology that addresses the memory bottleneck of graph processing. In general, analog memristor-based PIM promises high parallelism, provided that the underlying matrix-structured crossbar can be fully utilized, while digital CMOS-based PIM has faster single-edge execution but its parallelism can be low. In this paper, we observe that there is no absolute winner between these two representative PIM technologies for graph applications, which often exhibit irregular workloads. To reap the best of both worlds, we introduce a new heterogeneous PIM hardware, called Hetraph, to facilitate energy-efficient graph processing. Hetraph incorporates memristor-based analog computation units (for high-parallelism computing) and CMOS-based digital computation cores (for efficient computing) on the same logic layer of a 3D die-stacked memory device. To maximize hardware utilization, our software design offers a hardware heterogeneity-aware execution model and a workload offloading mechanism. For performance speedups, this hardware-software co-design outperforms the state-of-the-art by 7.54× (CPU), 1.56× (GPU), 4.13× (memristor-based PIM) and 3.05× (CMOS-based PIM), on average. For energy savings, Hetraph reduces energy consumption by 57.58× (CPU), 19.93× (GPU), 14.02× (memristor-based PIM) and 10.48× (CMOS-based PIM), on average.
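
The trade-off described in the abstract (crossbars pay off only when well utilized, digital cores are faster per edge) can be illustrated with a minimal sketch. This is not Hetraph's actual offloading mechanism, which the abstract does not specify. The function name `partition_edges`, the tile size `block`, and the density cut-off `threshold` are all hypothetical choices made for illustration. The sketch assumes the edge list has no duplicate edges.

```python
from collections import Counter


def partition_edges(edges, block=4, threshold=0.5):
    """Split edges by the density of the adjacency-matrix tile they fall in.

    Illustrative assumption only: dense tiles go to an analog crossbar
    (high utilization), sparse tiles go to digital cores.
    Assumes `edges` contains no duplicate (src, dst) pairs.
    """
    # Count how many edges land in each (block x block) tile.
    counts = Counter((src // block, dst // block) for src, dst in edges)

    analog, digital = [], []
    for src, dst in edges:
        # Fraction of the tile's cells that hold an edge.
        density = counts[(src // block, dst // block)] / (block * block)
        if density >= threshold:
            analog.append((src, dst))   # tile would use the crossbar well
        else:
            digital.append((src, dst))  # tile too sparse for the crossbar
    return analog, digital


if __name__ == "__main__":
    # One fully populated 2x2 tile plus one isolated edge.
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 5)]
    analog, digital = partition_edges(sample, block=2, threshold=0.5)
    print("analog:", analog)
    print("digital:", digital)
```

A fixed density threshold is the simplest possible policy. A real system would likely also weigh per-edge latency and energy on each unit, which this sketch leaves out.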

Citations

Posted Content

A Modern Primer on Processing in Memory.

Onur Mutlu, +3 more

- 05 Dec 2020 -

arXiv: Hardware Architecture

TL;DR: This chapter discusses recent research that aims to practically enable computation close to data, an approach called processing-in-memory (PIM).

Journal Article

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System

Juan Gómez-Luna, +5 more

IEEE Access

TL;DR: This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture, and presents PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains, which are identified as memory-bound.

Journal Article

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System

- 01 Jan 2022 -

IEEE Access

TL;DR: The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip.

Posted Content

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture.

Juan Gómez-Luna, +5 more

- 09 May 2021 -

arXiv: Hardware Architecture

TL;DR: In this article, the UPMEM-based processing-in-memory (PIM) architecture is evaluated in a real-world PIM system with 16 workloads from different application domains (e.g., linear algebra, databases, graph processing, neural networks, bioinformatics).

Proceedings Article

Polynesia: Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Co-Design

Amir Bahador Boroumand, +3 more

TL;DR: Polynesia is proposed, a hardware-software co-designed system for in-memory HTAP databases that avoids the large throughput losses of traditional HTAP systems and reduces energy consumption by 48% over the prior lowest-energy HTAP system.


References

Proceedings Article

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Sheng Li, +5 more

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.

Proceedings Article

PowerGraph: distributed graph-parallel computation on natural graphs

Joseph E. Gonzalez, +4 more

TL;DR: This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.

Journal Article

ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars

Ali Shafiee, +7 more

TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.

Journal Article

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

Ping Chi, +7 more

TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM based main memory, and distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving.

Journal Article

NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory

Xiangyu Dong, +3 more

- 01 Jul 2012 -

IEEE Transactions on Computer-Aided Desi...

TL;DR: NVSim is developed, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash and is expected to help boost architecture-level NVM-related studies.


Related Papers (5)

ASTRO: Synthesizing application-specific reconfigurable hardware traces to exploit memory-level parallelism

Mingjie Lin, +3 more

- 01 Oct 2015 -

Microprocessors and Microsystems

Adaptive multi-constraints in hardware-software partitioning for embedded multiprocessor FPGA systems

Trong-Yen Lee, +2 more

- 01 Feb 2009 -

WSEAS Transactions on Computers archive

Towards highly parallel event processing through reconfigurable hardware

Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures

Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths