Proceedings ArticleDOI
An Initial Characterization of the Emu Chick
Eric R. Hein, Thomas M. Conte, Jeffrey Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Richard Vuduc, Jason Riedy
- pp 579-588
TLDR
This initial evaluation demonstrates that the Emu Chick uses available memory bandwidth more efficiently than a more traditional, cache-based architecture and provides stable, predictable performance with 80% bandwidth utilization on a random-access pointer chasing benchmark with weak locality.
Abstract:
The Emu Chick is a prototype system designed around the concept of migratory memory-side processing. Rather than transferring large amounts of data across power-hungry, high-latency interconnects, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each memory read. The current prototype hardware uses FPGAs to implement cache-less "Gossamer" cores for doing computational work and a stationary core to run basic operating system functions and migrate threads between nodes. In this initial characterization of the Emu Chick, we study the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiply. We compare the Emu Chick hardware to architectural simulation and Intel Xeon-based platforms. While it is difficult to accurately compare prototype hardware with existing systems, our initial evaluation demonstrates that the Emu Chick uses available memory bandwidth more efficiently than a more traditional, cache-based architecture. Moreover, the Emu Chick provides stable, predictable performance with 80% bandwidth utilization on a random-access pointer chasing benchmark with weak locality.
Citations
Journal ArticleDOI
PASTA : a parallel sparse tensor algorithm benchmark suite
TL;DR: This work presents a sparse tensor algorithm benchmark suite (PASTA) for single- and multi-core CPUs that aims to help application users evaluate different computer systems using its representative computational workloads.
Posted Content
Programming Strategies for Irregular Algorithms on the Emu Chick
Eric R. Hein, Srinivas Eswar, Abdurrahman Yasar, Jiajia Li, Jeffrey Young, Thomas M. Conte, Ümit V. Çatalyürek, Rich Vuduc, Jason Riedy, Bora Uçar
TL;DR: This work evaluates irregular algorithms that could benefit from the lightweight, memory-side processing of the Chick and demonstrates techniques and optimization strategies for achieving performance in sparse matrix-vector multiplication (SpMV), breadth-first search (BFS), and graph alignment across up to eight distributed nodes encompassing 64 nodelets in the Chick system.
Proceedings ArticleDOI
Experimental Insights from the Rogues Gallery
Jeffrey Young, Jason Riedy, Thomas M. Conte, Vivek Sarkar, Prasanth Chatarasi, Sriseshan Srikanth
TL;DR: Highlights of the first one to two years of post-Moore era research with the Rogues Gallery are presented, and an indication is given of where the authors see future growth for this testbed and related efforts.
Proceedings ArticleDOI
A Preliminary Study of Compiler Transformations for Graph Applications on the Emu System
Prasanth Chatarasi, Vivek Sarkar
TL;DR: Two high-level compiler optimizations, i.e., loop fusion and edge flipping, and one low-level compiler transformation leveraging hardware support for remote atomic updates are explored to address overheads arising from thread migration, creation, synchronization, and atomic operations.
Journal ArticleDOI
A Microbenchmark Characterization of the Emu Chick
Jeffrey Young, Eric R. Hein, Srinivas Eswar, Patrick Lavin, Jiajia Li, Jason Riedy, Richard Vuduc, Thomas M. Conte
TL;DR: This multi-node characterization of the Emu Chick extends an earlier single-node investigation of the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiplication and demonstrates that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture.
References
Journal ArticleDOI
A case for intelligent RAM
David A. Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christos Kozyrakis, R. Thomas, Katherine Yelick
TL;DR: The state of microprocessors and DRAMs today is reviewed, some of the opportunities and challenges for IRAMs are explored, and performance and energy efficiency of three IRAM designs are estimated.
Proceedings ArticleDOI
The HPC Challenge (HPCC) benchmark suite
Piotr Luszczek, David H. Bailey, Jack Dongarra, Jeremy Kepner, Robert F. Lucas, Rolf Rabenseifner, Daisuke Takahashi
TL;DR: This tutorial will introduce attendees to HPCC, provide tools to examine differences in HPC architectures, and give hands-on training that will hopefully lead to better understanding of parallel environments.
Proceedings Article
Scalability! but at what cost?
TL;DR: This work surveys measurements of data-parallel systems recently reported in SOSP and OSDI, and finds that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
Proceedings ArticleDOI
Practical Near-Data Processing for In-Memory Analytics Frameworks
TL;DR: This paper develops the hardware and software of an NDP architecture for in-memory analytics frameworks, including MapReduce, graph processing, and deep neural networks, and shows that it is critical to optimize software frameworks for spatial locality as it leads to 2.9x efficiency improvements for NDP.
Proceedings ArticleDOI
Graphicionado: a high-performance and energy-efficient accelerator for graph analytics
TL;DR: Graphicionado augments the vertex programming paradigm, allowing different graph analytics applications to be mapped to the same accelerator framework, while maintaining flexibility through a small set of reconfigurable blocks, for high-performance, energy-efficient processing of graph analytics workloads.