Proceedings Article

A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing

TLDR
Neither analog memristor-based PIM nor digital CMOS-based PIM is an absolute winner for graph applications, which often exhibit irregular workloads; a new heterogeneous PIM hardware, called Hetraph, is introduced to facilitate energy-efficient graph processing.
Abstract
Processing-In-Memory (PIM) is an emerging technology that addresses the memory bottleneck of graph processing. In general, analog memristor-based PIM promises high parallelism, provided that the underlying matrix-structured crossbar can be fully utilized, while digital CMOS-based PIM has faster single-edge execution but its parallelism can be low. In this paper, we observe that there is no absolute winner between these two representative PIM technologies for graph applications, which often exhibit irregular workloads. To reap the best of both worlds, we introduce a new heterogeneous PIM hardware, called Hetraph, to facilitate energy-efficient graph processing. Hetraph incorporates memristor-based analog computation units (for high-parallelism computing) and CMOS-based digital computation cores (for efficient computing) on the same logic layer of a 3D die-stacked memory device. To maximize hardware utilization, our software design offers a hardware heterogeneity-aware execution model and a workload offloading mechanism. In terms of performance, this hardware-software co-design outperforms the state of the art by 7.54× (CPU), 1.56× (GPU), 4.13× (memristor-based PIM), and 3.05× (CMOS-based PIM) on average. In terms of energy, Hetraph reduces energy consumption by 57.58× (CPU), 19.93× (GPU), 14.02× (memristor-based PIM), and 10.48× (CMOS-based PIM) on average.
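The trade-off described in the abstract (a crossbar pays off only when it is well utilized, while a digital core is faster per edge) can be illustrated with a toy offloading heuristic. The sketch below is not Hetraph's actual execution model or offloading mechanism, which the abstract does not detail. The names `CostModel`, `choose_unit`, `partition_blocks`, and `plan`, and both latency values, are hypothetical and chosen only for illustration. It assumes an analog crossbar processes a whole adjacency-matrix tile in a fixed time regardless of how many edges the tile holds, and that a digital core processes edges one at a time.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical latencies in arbitrary units (assumptions, not from the paper)."""
    crossbar_latency: float = 8.0      # one analog pass over a whole tile
    digital_edge_latency: float = 1.0  # one edge on a digital core


def choose_unit(num_edges: int, model: CostModel) -> str:
    """Pick the unit with the lower estimated latency for one non-empty tile."""
    digital_cost = num_edges * model.digital_edge_latency
    return "analog" if digital_cost > model.crossbar_latency else "digital"


def partition_blocks(edges, block_size):
    """Count edges per (row_block, col_block) tile of the adjacency matrix."""
    counts = {}
    for src, dst in edges:
        key = (src // block_size, dst // block_size)
        counts[key] = counts.get(key, 0) + 1
    return counts


def plan(edges, block_size, model):
    """Map every non-empty tile to the unit chosen by the toy cost model."""
    tiles = partition_blocks(edges, block_size)
    return {tile: choose_unit(n, model) for tile, n in tiles.items()}


# A fully dense 4x4 tile plus one isolated edge in another tile.
edges = [(i, j) for i in range(4) for j in range(4)] + [(5, 6)]
assignment = plan(edges, block_size=4, model=CostModel())
print(assignment)
```

Under these assumed latencies, the dense tile goes to the analog unit and the tile holding a single edge goes to a digital core, which mirrors why irregular graphs leave no single technology as the winner.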


Citations
Posted Content

A Modern Primer on Processing in Memory.

TL;DR: This chapter discusses recent research that aims to practically enable computation close to data, an approach called processing-in-memory (PIM).
Journal Article

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System

TL;DR: This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture, and presents PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains, which are identified as memory-bound.
Journal Article

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System

01 Jan 2022
TL;DR: The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip.
Posted Content

Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture.

TL;DR: This article evaluates the UPMEM-based processing-in-memory (PIM) architecture on a real-world PIM system with 16 workloads from different application domains (e.g., linear algebra, databases, graph processing, neural networks, bioinformatics).
Proceedings Article

Polynesia: Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Co-Design

TL;DR: Polynesia is proposed, a hardware-software co-designed system for in-memory HTAP databases that avoids the large throughput losses of traditional HTAP systems and reduces energy consumption by 48% over the prior lowest-energy HTAP system.
References
Proceedings Article

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node, for both common in-order and out-of-order manycore designs, shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
Proceedings Article

PowerGraph: distributed graph-parallel computation on natural graphs

TL;DR: This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.
Journal Article

ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars

TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.
Journal Article

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory; it distinguishes itself from prior work on NN acceleration with significant performance improvement and energy saving.
Journal Article

NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory

TL;DR: NVSim is developed, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash, and is expected to help boost architecture-level NVM-related studies.