Proceedings Article

Implementing the Jaccard Index on the Migratory Memory-Side Processing Emu Architecture

TLDR
An implementation of the Jaccard Index for graphs on the Migratory Memory-Side Processing Emu architecture is presented. The index was designed to find similarities between different vertices in a graph and is often used to identify communities.
Abstract
We present an implementation of the Jaccard Index for graphs on the Migratory Memory-Side Processing Emu architecture. This index was designed to find similarities between different vertices in a graph, and is often used to identify communities. The Emu architecture is a parallel system based on a partitioned global address space, with threads automatically migrating inside the memory. We introduce the parallel programming model used to exploit it, detail our implementation of the algorithm, and analyze simulated performance results as well as early hardware tests. We discuss its application to large-scale problems.
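For readers unfamiliar with the index, the Jaccard Index of two vertices u and v is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, where N(x) is the neighbor set of x. The sketch below is a serial illustration of that definition, not the Emu/Cilk implementation described in the paper. It assumes an undirected graph given as a dict of neighbor sets (open neighborhoods) with comparable vertex ids, and it scores only pairs that share at least one neighbor, found by enumerating two-hop paths; all other pairs have index 0. The function name `jaccard_all_pairs` is hypothetical.

```python
from itertools import combinations


def jaccard_all_pairs(adj):
    """Return {(u, v): J(u, v)} for every vertex pair sharing a neighbor.

    adj maps each vertex to the set of its neighbors (undirected graph,
    open neighborhoods). Pairs with no common neighbor have J = 0 and
    are omitted.
    """
    # |N(u) & N(v)| equals the number of two-hop paths u - w - v,
    # so count each pair of neighbors of every middle vertex w.
    common = {}
    for w, nbrs in adj.items():
        for u, v in combinations(sorted(nbrs), 2):
            common[(u, v)] = common.get((u, v), 0) + 1

    # |N(u) | N(v)| = |N(u)| + |N(v)| - |N(u) & N(v)|.
    scores = {}
    for (u, v), inter in common.items():
        union = len(adj[u]) + len(adj[v]) - inter
        scores[(u, v)] = inter / union
    return scores


# A triangle 0-1-2 with a pendant vertex 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(jaccard_all_pairs(graph))
```

Here vertices 0 and 3 share neighbor 2 out of the union {1, 2}, giving 0.5. Enumerating two-hop paths keeps the work proportional to the sum of squared degrees rather than to all n² vertex pairs.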


Citations
Proceedings Article

Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons

TL;DR: SimilarityAtScale is the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets; it provides an efficient encoding of this problem into a multiplication of sparse matrices.
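The sparse-matrix encoding can be illustrated on a single machine. If A is the 0/1 indicator matrix with one row per dataset and one column per element, then C = A Aᵀ holds the pairwise intersection sizes, and the row sums of A give the set sizes, so J(i, j) = C[i][j] / (|Sᵢ| + |Sⱼ| − C[i][j]). The sketch below accumulates C as a sum of column outer products using plain dictionaries; it shows only the algebra, not SimilarityAtScale's distributed, communication-efficient scheme. The function name and the k-mer-like example strings are hypothetical.

```python
def jaccard_via_sparse_product(sets):
    """Pairwise Jaccard similarity of a list of sets via C = A @ A.T.

    A is the (len(sets) x |universe|) 0/1 indicator matrix; it is never
    materialized densely. Only pairs with a nonzero intersection appear.
    """
    # Column view of A: element -> indices of the rows (sets) containing it.
    # Rows are appended in increasing order, so each list is sorted.
    cols = {}
    for i, s in enumerate(sets):
        for x in s:
            cols.setdefault(x, []).append(i)

    # C = sum over columns of (column outer product); keep only i < j.
    C = {}
    for rows in cols.values():
        for a in range(len(rows)):
            for b in range(a + 1, len(rows)):
                key = (rows[a], rows[b])
                C[key] = C.get(key, 0) + 1

    sizes = [len(s) for s in sets]  # row sums of A
    return {(i, j): c / (sizes[i] + sizes[j] - c) for (i, j), c in C.items()}


samples = [{"ACG", "CGT", "GTA"}, {"CGT", "GTA", "TAC"}, {"AAA"}]
print(jaccard_via_sparse_product(samples))  # {(0, 1): 0.5}
```

Because only nonzero products are formed, the cost follows the sparsity of A. This is what makes a sparse matrix multiplication a natural kernel for the problem.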
Posted Content

Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons

TL;DR: This work presents the design and implementation of SimilarityAtScale, the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets. The resulting scheme is the first to enable accurate Jaccard distance derivations for massive datasets using large-scale distributed-memory systems.
Proceedings Article

GraphChallenge.org Triangle Counting Performance

TL;DR: These submissions show that state-of-the-art triangle counting execution time is a strong function of the number of edges in the graph; execution time improved significantly from 2017 to 2018 and remained comparable from 2018 to 2019.
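Triangle counting rests on the same neighbor-set intersections that dominate Jaccard computation. The sketch below is one common serial formulation, not any particular GraphChallenge submission. It orients each edge from the lower-ranked to the higher-ranked endpoint, ranking by (degree, id), and then counts common out-neighbors of each oriented edge so that every triangle is counted exactly once. It assumes a simple undirected graph given as an edge list. The function name `count_triangles` is hypothetical.

```python
def count_triangles(edges):
    """Count triangles in a simple undirected graph given as (u, v) pairs."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1

    def rank(x):
        # Ties on degree are broken by vertex id, giving a total order.
        return (deg[x], x)

    # Orient every edge toward the higher-ranked endpoint.
    out = {x: set() for x in deg}
    for u, v in edges:
        if rank(u) < rank(v):
            out[u].add(v)
        else:
            out[v].add(u)

    # A triangle a < b < c (by rank) is found once: on edge a -> b, via w = c.
    return sum(len(out[u] & out[v]) for u in out for v in out[u])


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_triangles(k4))  # 4
```

Ranking by degree keeps the out-neighbor sets of high-degree vertices small, which bounds the cost of each intersection. The total work therefore depends mainly on the number of edges.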
Posted Content

Programming Strategies for Irregular Algorithms on the Emu Chick

TL;DR: This work evaluates irregular algorithms that could benefit from the lightweight, memory-side processing of the Chick, and demonstrates techniques and optimization strategies for achieving high performance in sparse matrix-vector multiplication (SpMV), breadth-first search (BFS), and graph alignment across up to eight distributed nodes encompassing 64 nodelets in the Chick system.
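SpMV is typically written over a compressed sparse row (CSR) matrix. The irregular part is the indirect read `x[col_idx[k]]`. On a migratory architecture where `x` is spread across nodelets, each such remote read may cause the thread to migrate, which is why these kernels are sensitive to data layout. The sketch below is a plain serial CSR SpMV under that framing and does not reproduce the Chick-specific strategies from the paper. The function name `spmv_csr` is hypothetical.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for A stored in CSR form.

    Row i's nonzeros are values[row_ptr[i]:row_ptr[i + 1]], located in
    columns col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect, data-dependent access into x: the irregular step.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
print(spmv_csr([0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0]))
# [3.0, 3.0, 9.0]
```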
Journal Article

A Microbenchmark Characterization of the Emu Chick

TL;DR: This multi-node characterization of the Emu Chick extends an earlier single-node investigation of the memory bandwidth characteristics of the system through benchmarks such as STREAM, pointer chasing, and sparse matrix-vector multiplication, and demonstrates that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture.
References
Journal Article

Cilk: An Efficient Multithreaded Runtime System

TL;DR: It is shown that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately, and it is proved that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal.
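The "work" T₁ is the total cost of a computation and the "critical-path length" (span) T∞ is the cost of its longest chain of dependent steps. No P-processor schedule can finish faster than max(T₁/P, T∞). The classic greedy-scheduling bound is T₁/P + T∞, and Cilk's work-stealing scheduler achieves expected time T₁/P + O(T∞). The sketch below computes work and span for a fork-join divide-and-conquer sum. It assumes a hypothetical cost model of one unit per leaf and one unit per combine step, and the function names are illustrative.

```python
def work_span(n):
    """Return (work, span) of a fork-join divide-and-conquer sum of n items.

    Hypothetical cost model: 1 unit per leaf, 1 unit per combine step.
    """
    if n == 1:
        return 1, 1
    half = n // 2
    w1, s1 = work_span(half)        # spawned half
    w2, s2 = work_span(n - half)    # runs in parallel with the spawn
    # Work adds across parallel branches; span takes the longer branch.
    return w1 + w2 + 1, max(s1, s2) + 1


def time_bounds(work, span, p):
    """Lower bound and greedy-scheduler upper bound on P-processor time."""
    lower = max(work / p, span)
    greedy_upper = work / p + span
    return lower, greedy_upper


w, s = work_span(8)
print(w, s, w / s)              # 15 4 3.75  (work, span, parallelism)
print(time_bounds(w, s, 4))     # (4, 7.75)
```

The ratio T₁/T∞ (the parallelism) indicates how many processors can be used before the span term dominates. In this example it is only 3.75, so more than about four processors would bring little benefit.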
Proceedings Article

R-MAT: A Recursive Model for Graph Mining

TL;DR: The paper proposes a simple, parsimonious model, the “recursive matrix” (R-MAT) model, which can quickly generate realistic graphs and captures the essence of each graph in only a few parameters.
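R-MAT places each edge by recursively choosing one of the four quadrants of the adjacency matrix with probabilities a, b, c, and d = 1 − a − b − c, fixing one bit of the row and column index per level. The sketch below follows that recursion. The default probabilities 0.57/0.19/0.19/0.05 are the values commonly used by the Graph500 benchmark and are an assumption here, not parameters taken from the Emu paper. The function name `rmat_edges` is hypothetical, and duplicate edges and self-loops are left in, as in the raw model.

```python
import random


def rmat_edges(scale, num_edges, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate num_edges directed (u, v) edges on 2**scale vertices via R-MAT.

    At each of `scale` levels one quadrant is chosen: top-left with
    probability a, top-right b, bottom-left c, bottom-right 1 - a - b - c.
    """
    rng = random.Random(seed)
    edges = []
    for _ in range(num_edges):
        u = v = 0
        for level in range(scale):
            bit = 1 << (scale - level - 1)  # most significant bit first
            r = rng.random()
            if r < a:
                pass                         # top-left: no bits set
            elif r < a + b:
                v |= bit                     # top-right: column bit
            elif r < a + b + c:
                u |= bit                     # bottom-left: row bit
            else:
                u |= bit                     # bottom-right: both bits
                v |= bit
        edges.append((u, v))
    return edges


edges = rmat_edges(scale=4, num_edges=1000)
print(edges[:5])
```

With a > d, probability mass concentrates in the top-left quadrant at every level. This yields the skewed, power-law-like degree distributions that make R-MAT graphs a common stress test for irregular graph kernels.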
Proceedings Article

Cilk: An Efficient Multithreaded Runtime System

TL;DR: This paper shows that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance, and proves that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time and communication bounds all within a constant factor of optimal.
Book Chapter

A Pragmatic Implementation of Non-blocking Linked-Lists

TL;DR: This work presents a new non-blocking implementation of concurrent linked lists supporting linearizable insertion and deletion operations; it is conceptually simpler and substantially faster than previous schemes.
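The central idea of this style of list is two-phase deletion. A node is first deleted logically by atomically marking its `next` pointer, so no new node can be linked after it. It is then unlinked physically with a compare-and-swap, and traversals help unlink any marked nodes they encounter. The sketch below illustrates that protocol on a sorted set. Python has no hardware CAS, so the hypothetical `AtomicMarkableRef` class emulates an atomic (reference, mark) pair with a lock. The sketch is therefore not itself lock-free and is not the paper's implementation; all class and method names are illustrative.

```python
import threading


class AtomicMarkableRef:
    """A (reference, mark) pair with CAS; a lock stands in for hardware CAS."""

    def __init__(self, ref, mark=False):
        self._lock = threading.Lock()
        self._ref, self._mark = ref, mark

    def get(self):
        with self._lock:
            return self._ref, self._mark

    def cas(self, exp_ref, new_ref, exp_mark, new_mark):
        with self._lock:
            if self._ref is exp_ref and self._mark == exp_mark:
                self._ref, self._mark = new_ref, new_mark
                return True
            return False


class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = AtomicMarkableRef(nxt)  # mark=True means "this node is deleted"


class HarrisStyleList:
    """Sorted set of numeric keys using mark-then-unlink deletion."""

    def __init__(self):
        self.tail = Node(float("inf"))
        self.head = Node(float("-inf"), self.tail)

    def _search(self, key):
        """Return (pred, curr) with pred.key < key <= curr.key, unlinking marked nodes."""
        while True:  # restart from head whenever a helping CAS fails
            pred = self.head
            curr, _ = pred.next.get()
            retry = False
            while True:
                succ, marked = curr.next.get()
                while marked:  # curr is logically deleted: help unlink it
                    if not pred.next.cas(curr, succ, False, False):
                        retry = True
                        break
                    curr = succ
                    succ, marked = curr.next.get()
                if retry:
                    break
                if curr.key >= key:
                    return pred, curr
                pred, curr = curr, succ

    def insert(self, key):
        while True:
            pred, curr = self._search(key)
            if curr.key == key:
                return False
            if pred.next.cas(curr, Node(key, curr), False, False):
                return True

    def delete(self, key):
        while True:
            pred, curr = self._search(key)
            if curr.key != key:
                return False
            succ, _ = curr.next.get()
            if not curr.next.cas(succ, succ, False, True):  # phase 1: logical
                continue
            pred.next.cas(curr, succ, False, False)  # phase 2: physical (best effort)
            return True

    def contains(self, key):
        return self._search(key)[1].key == key

    def keys(self):
        out, (node, _) = [], self.head.next.get()
        while node is not self.tail:
            nxt, marked = node.next.get()
            if not marked:
                out.append(node.key)
            node = nxt
        return out
```

If the physical unlink in `delete` fails because a neighbor changed concurrently, the marked node is left in place and a later `_search` removes it. This helping step is what lets operations avoid waiting on one another in a true CAS-based version.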