Proceedings Article
A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing
Yu Huang, Long Zheng, Pengcheng Yao, Jieshan Zhao, Xiaofei Liao, Hai Jin, Jingling Xue
- pp 684-695
TLDR
There is no absolute winner between analog memristor-based PIM and digital CMOS-based PIM for graph applications, which often exhibit irregular workloads. A new heterogeneous PIM hardware design, called Hetraph, is introduced to facilitate energy-efficient graph processing.
Abstract:
Processing-In-Memory (PIM) is an emerging technology that addresses the memory bottleneck of graph processing. In general, analog memristor-based PIM promises high parallelism provided that the underlying matrix-structured crossbar can be fully utilized, while digital CMOS-based PIM has a faster single-edge execution but its parallelism can be low. In this paper, we observe that there is no absolute winner between these two representative PIM technologies for graph applications, which often exhibit irregular workloads. To reap the best of both worlds, we introduce a new heterogeneous PIM hardware, called Hetraph, to facilitate energy-efficient graph processing. Hetraph incorporates memristor-based analog computation units (for high-parallelism computing) and CMOS-based digital computation cores (for efficient computing) on the same logic layer of a 3D die-stacked memory device. To maximize the hardware utilization, our software design offers a hardware heterogeneity-aware execution model and a workload offloading mechanism. For performance speedups, such a hardware-software co-design outperforms the state-of-the-art by 7.54× (CPU), 1.56× (GPU), 4.13× (memristor-based PIM), and 3.05× (CMOS-based PIM), on average. For energy savings, Hetraph reduces the energy consumption by 57.58× (CPU), 19.93× (GPU), 14.02× (memristor-based PIM), and 10.48× (CMOS-based PIM), on average.
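The abstract does not spell out how the workload offloading mechanism decides where work runs. As a rough illustration of the trade-off it describes (crossbars pay off only when well utilized, digital cores handle individual edges efficiently), the following Python sketch tiles an edge list into adjacency blocks and routes dense blocks to an analog unit and the remaining edges to digital cores. The function names, the block size, and the density threshold are all hypothetical choices made for this sketch; it is not Hetraph's actual policy.

```python
from collections import defaultdict


def partition_blocks(edges, block_size):
    """Group edges into block_size x block_size tiles of the adjacency matrix."""
    blocks = defaultdict(set)
    for src, dst in edges:
        blocks[(src // block_size, dst // block_size)].add((src, dst))
    return blocks


def dispatch(edges, block_size=4, density_threshold=0.5):
    """Hypothetical density-based dispatch (an assumption, not the paper's method).

    Returns (analog_blocks, digital_edges):
      analog_blocks  - tile coordinates dense enough to use a crossbar well
      digital_edges  - individual edges left for the digital cores
    """
    capacity = block_size * block_size  # cells in one crossbar-sized tile
    analog_blocks, digital_edges = [], []
    for tile, tile_edges in partition_blocks(edges, block_size).items():
        density = len(tile_edges) / capacity
        if density >= density_threshold:
            analog_blocks.append(tile)
        else:
            digital_edges.extend(sorted(tile_edges))
    return analog_blocks, digital_edges


if __name__ == "__main__":
    # One fully connected 4x4 tile plus two stray edges in otherwise empty tiles.
    demo_edges = [(s, d) for s in range(4) for d in range(4)] + [(0, 9), (10, 3)]
    analog, digital = dispatch(demo_edges)
    print("analog tiles:", analog)
    print("digital edges:", digital)
```

With a real graph, the threshold would presumably need tuning against the relative cost of an underused crossbar operation versus per-edge digital execution; the values here are placeholders.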
Citations
Posted Content
A Modern Primer on Processing in Memory.
TL;DR: This chapter discusses recent research that aims to practically enable computation close to data, an approach called processing-in-memory (PIM).
Journal Article
Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System
Juan Gómez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu
TL;DR: This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture, and presents PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains, which are identified as memory-bound.
Journal Article
Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System
TL;DR: The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip.
Posted Content
Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture.
Juan Gómez-Luna, Izzat El Hajj, Iván López Fernández, Christina Giannoula, Geraldo F. Oliveira, Onur Mutlu
TL;DR: In this article, the UPMEM-based processing-in-memory (PIM) architecture is evaluated on a real-world PIM system with 16 workloads from different application domains (e.g., linear algebra, databases, graph processing, neural networks, bioinformatics).
Proceedings Article
Polynesia: Enabling High-Performance and Energy-Efficient Hybrid Transactional/Analytical Databases with Hardware/Software Co-Design
TL;DR: Polynesia is proposed, a hardware-software co-designed system for in-memory HTAP databases that avoids the large throughput losses of traditional HTAP systems and reduces energy consumption by 48% over the prior lowest-energy HTAP system.
References
Proceedings Article
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures
TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
Proceedings Article
PowerGraph: distributed graph-parallel computation on natural graphs
TL;DR: This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.
Journal Article
ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars
Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, Vivek Srikumar
TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.
Journal Article
PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory
TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory, and distinguishes itself from prior work on NN acceleration with significant performance improvement and energy saving.
Journal Article
NVSim: A Circuit-Level Performance, Energy, and Area Model for Emerging Nonvolatile Memory
TL;DR: NVSim is developed, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash and is expected to help boost architecture-level NVM-related studies.