Proceedings ArticleDOI
A scalable processing-in-memory accelerator for parallel graph processing
Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, Kiyoung Choi
- Vol. 43, Iss. 3, pp. 105-117
TLDR
This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance, and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Abstract:
The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. In particular, large-scale graph processing is gaining attention due to its broad applicability from social science to machine learning. However, scalable hardware design that can efficiently process large graphs in main memory is still an open problem. Ideally, cost-effective and scalable graph processing systems can be realized by building a system whose performance increases proportionally with the sizes of graphs that can be stored in the system, which is extremely challenging in conventional systems due to severe memory bandwidth limitations. In this work, we argue that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve such an objective. The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept was originally examined. In order to take advantage of such a new technology to enable memory-capacity-proportional performance, we design a programmable PIM accelerator for large-scale graph processing called Tesseract. Tesseract is composed of (1) a new hardware architecture that fully utilizes the available memory bandwidth, (2) an efficient method of communication between different memory partitions, and (3) a programming interface that reflects and exploits the unique hardware design. It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model. 
Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems.
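The abstract describes Tesseract's key programming-model idea: each memory partition has its own in-memory core, and vertex updates that cross partitions are expressed as non-blocking remote function calls that execute at the cube owning the target vertex, with computation advancing in bulk-synchronous steps. The host-side sketch below illustrates that model under stated assumptions: the names `put`, `barrier`, `Cube`, and `owner` are illustrative stand-ins inspired by the paper's description, not the actual hardware interface, and the PageRank-style step is only one example workload.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # one partition per memory cube/vault (illustrative)

class Cube:
    """Models one in-memory core: local vertex state plus a message queue."""
    def __init__(self):
        self.value = {}                  # vertex -> current value
        self.next = defaultdict(float)   # vertex -> accumulated updates
        self.queue = []                  # pending remote function calls

def owner(v):
    # Simple hash partitioning of vertices across cubes (an assumption).
    return v % NUM_PARTITIONS

def put(cubes, v, func, arg):
    # Non-blocking remote call: enqueue work at the cube owning vertex v.
    cubes[owner(v)].queue.append((func, v, arg))

def barrier(cubes):
    # Drain all queues; computation proceeds in bulk-synchronous steps.
    for cube in cubes:
        for func, v, arg in cube.queue:
            func(cube, v, arg)
        cube.queue.clear()

def accumulate(cube, v, delta):
    # Runs locally at the owning cube; no shared-memory traffic needed.
    cube.next[v] += delta

def pagerank_step(cubes, edges, damping=0.85):
    # Each vertex scatters rank/out_degree to its neighbors via remote puts.
    out_deg = defaultdict(int)
    for s, _ in edges:
        out_deg[s] += 1
    for s, d in edges:
        contrib = cubes[owner(s)].value[s] / out_deg[s]
        put(cubes, d, accumulate, contrib)
    barrier(cubes)
    for cube in cubes:
        for v in cube.value:
            cube.value[v] = (1 - damping) + damping * cube.next[v]
        cube.next.clear()
```

Because a `put` carries the function and argument to the data rather than pulling the data to a central processor, each cube touches only its local DRAM, which is how the design keeps aggregate bandwidth proportional to memory capacity; the paper's prefetchers exploit the fact that these queued accesses are known in advance.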
Citations
Journal ArticleDOI
PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory
TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM-based main memory; it distinguishes itself from prior work on NN acceleration with significant performance improvement and energy saving.
Proceedings ArticleDOI
PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
TL;DR: PipeLayer is presented, a ReRAM-based PIM accelerator for CNNs that supports both training and testing; it proposes a highly parallel design based on the notions of parallelism granularity and weight replication, which enables highly pipelined execution of both training and testing without introducing the potential stalls of previous work.
Proceedings ArticleDOI
TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory
TL;DR: The hardware architecture and the software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory, are presented, and it is shown that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations.
Proceedings ArticleDOI
Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology
Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie S. Kim, Michael Kozuch, Onur Mutlu, Phillip B. Gibbons, Todd C. Mowry
TL;DR: Ambit is proposed, an Accelerator-in-Memory for bulk bitwise operations that largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area).
Journal ArticleDOI
Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory
TL;DR: The basic architecture of the Neurocube is presented, along with an analysis of the logic tier synthesized in 28nm and 15nm process technologies; performance is evaluated by mapping a convolutional neural network and estimating the resulting power and performance for both training and inference.
References
Proceedings ArticleDOI
FlexRAM: Toward an advanced Intelligent Memory system
Yi Kang, Wei Huang, Seung-Moon Yoo, Diana Keen, Zhenzhou Ge, Vinh Lam, Pratap Pattnaik, Josep Torrellas
TL;DR: A PIM chip and a PIM-based memory system are proposed to satisfy the requirements of general-purpose use and low programming cost; evaluation of the system through simulations shows that four FlexRAM chips often allow a workstation to run 25-40 times faster.
Journal ArticleDOI
Near-Data Processing: Insights from a MICRO-46 Workshop
Rajeev Balasubramonian, Jichuan Chang, Troy A. Manning, Jaime H. Moreno, Richard C. Murphy, Ravi Nair, Steven Swanson
TL;DR: The many reasons why NDP is compelling today are described and key upcoming challenges in realizing the potential of NDP are identified.
Journal ArticleDOI
Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems
Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu, Stephen W. Keckler
TL;DR: Extensive evaluations across a variety of modern memory-intensive GPU workloads show that TOM significantly improves performance compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.
Proceedings ArticleDOI
Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture
Mary Hall, Peter M. Kogge, Jeff Koller, Pedro C. Diniz, Jacqueline Chame, Jeffrey Draper, Jeff LaCoss, John J. Granacki, Jay B. Brockman, Apoorv Srivastava, William C. Athas, Vincent W. Freeh, Jaewook Shin, Joonseok Park
TL;DR: The potential of PIM-based architectures is demonstrated in accelerating three irregular computations: sparse conjugate gradient, a natural-join database operation, and an object-oriented database query.
Proceedings ArticleDOI
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
Taeho Kgil, Shaun C. D'Souza, Ali G. Saidi, Nathan Binkert, Ronald G. Dreslinski, Trevor Mudge, Steven K. Reinhardt, Krisztian Flautner
TL;DR: It is shown how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing, and that a PicoServer performs comparably to a Pentium 4-class machine while consuming only about 1/10 of the power.