Proceedings ArticleDOI
Implementing the Jaccard Index on the Migratory Memory-Side Processing Emu Architecture
Geraud P. Krawezik,Peter M. Kogge,Timothy J. Dysart,Shannon K. Kuntz,Janice O. McMahon
- pp 1-6
TL;DR: An implementation of the Jaccard Index for graphs on the Migratory Memory-Side Processing Emu architecture is presented. The index was designed to find similarities between different vertices in a graph and is often used to identify communities.
Abstract:
We present an implementation of the Jaccard Index for graphs on the Migratory Memory-Side Processing Emu architecture. This index was designed to find similarities between different vertices in a graph, and is often used to identify communities. The Emu architecture is a parallel system based on a partitioned global address space, with threads automatically migrating inside the memory. We introduce the parallel programming model used to exploit it, detail our implementation of the algorithm, and analyze simulated performance results as well as early hardware tests. We discuss its application to large-scale problems.
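As a rough illustration of the index itself, the sketch below computes the Jaccard Index of two vertices as the size of the intersection of their neighbor sets divided by the size of their union. It is a serial Python sketch, not the Emu implementation the paper describes. The use of open neighborhoods, the `jaccard_index` name, the toy `graph`, and the choice to return 0.0 for two isolated vertices are all assumptions made for the example.

```python
def jaccard_index(adjacency, u, v):
    """Return |N(u) & N(v)| / |N(u) | N(v)| for vertices u and v.

    `adjacency` maps each vertex to the set of its neighbors
    (open neighborhoods; an assumption of this sketch).
    """
    neighbors_u = adjacency[u]
    neighbors_v = adjacency[v]
    union_size = len(neighbors_u | neighbors_v)
    if union_size == 0:
        # Convention chosen for this sketch: two isolated vertices score 0.
        return 0.0
    return len(neighbors_u & neighbors_v) / union_size


# Hypothetical undirected toy graph with edges a-b, a-c, a-d, b-c, c-d.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

if __name__ == "__main__":
    for pair in [("b", "d"), ("a", "c"), ("a", "b")]:
        print(pair, jaccard_index(graph, *pair))
```

In this toy graph, vertices `b` and `d` share exactly the same neighbors, so they receive the highest possible score, which is the kind of signal community detection relies on.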
Citations
Proceedings ArticleDOI
Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons
Maciej Besta,Raghavendra Kanakagiri,Harun Mustafa,Mikhail Karasikov,Gunnar Rätsch,Torsten Hoefler,Edgar Solomonik
TL;DR: SimilarityAtScale is the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets; it efficiently encodes the problem as a multiplication of sparse matrices.
Posted Content
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Maciej Besta,Raghavendra Kanakagiri,Harun Mustafa,Mikhail Karasikov,Gunnar Rätsch,Torsten Hoefler,Edgar Solomonik
TL;DR: This work designs and implements SimilarityAtScale, the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets. The resulting scheme is the first to enable accurate Jaccard distance derivations for massive datasets using large-scale distributed-memory systems.
Proceedings ArticleDOI
GraphChallenge.org Triangle Counting Performance
Siddharth Samsi,Jeremy Kepner,Vijay Gadepally,Michael Hurley,Michael Jones,Edward K. Kao,Sanjeev Mohindra,Albert Reuther,Steven T. Smith,William S. Song,Diane Staheli,Paul Monticciolo
TL;DR: These submissions show that their state-of-the-art triangle counting execution time is a strong function of the number of edges in the graph, which improved significantly from 2017 to 2018 and remained comparable from 2018 to 2019.
Posted Content
Programming Strategies for Irregular Algorithms on the Emu Chick
Eric R. Hein,Srinivas Eswar,Abdurrahman Yasar,Jiajia Li,Jeffrey Young,Thomas M. Conte,Ümit V. Çatalyürek,Rich Vuduc,Jason Riedy,Bora Uçar
TL;DR: This work evaluates irregular algorithms that could benefit from the lightweight, memory-side processing of the Chick and demonstrates techniques and optimization strategies for achieving performance in sparse matrix-vector multiply operation (SpMV), breadth-first search (BFS), and graph alignment across up to eight distributed nodes encompassing 64 nodelets in the Chick system.
Journal ArticleDOI
A Microbenchmark Characterization of the Emu Chick
Jeffrey Young,Eric R. Hein,Srinivas Eswar,Patrick Lavin,Jiajia Li,Jason Riedy,Richard Vuduc,Thomas M. Conte
TL;DR: This multi-node characterization of the Emu Chick extends an earlier single-node investigation of the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiplication, and demonstrates that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture.
References
Journal ArticleDOI
Cilk: An Efficient Multithreaded Runtime System
Robert D. Blumofe,Christopher F. Joerg,Bradley C. Kuszmaul,Charles E. Leiserson,Keith H. Randall,Yuli Zhou
TL;DR: It is shown that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately, and it is proved that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal.
Proceedings Article
R-MAT: A Recursive Model for Graph Mining
TL;DR: A simple, parsimonious model, the “recursive matrix” (R-MAT) model, which can quickly generate realistic graphs, capturing the essence of each graph in only a few parameters is proposed.
Proceedings ArticleDOI
Cilk: an efficient multithreaded runtime system
Robert D. Blumofe,Christopher F. Joerg,Bradley C. Kuszmaul,Charles E. Leiserson,Keith H. Randall,Yuli Zhou
TL;DR: This paper shows that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance, and proves that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time and communication bounds all within a constant factor of optimal.
Book ChapterDOI
A Pragmatic Implementation of Non-blocking Linked-Lists
TL;DR: This work presents a new non-blocking implementation of concurrent linked-lists supporting linearizable insertion and deletion operations, conceptually simpler and substantially faster than previous schemes.