Implementing the Jaccard Index on the Migratory Memory-Side Processing Emu Architecture

doi:10.1109/HPEC.2018.8547631

Proceedings Article, DOI


- pp 1-6

TLDR

An implementation of the Jaccard Index for graphs on the Migratory Memory-Side Processing Emu architecture is presented. The index was designed to find similarities between different vertices in a graph and is often used to identify communities.

Abstract:

We present an implementation of the Jaccard Index for graphs on the Migratory Memory-Side Processing Emu architecture. This index was designed to find similarities between different vertices in a graph, and is often used to identify communities. The Emu architecture is a parallel system based on a partitioned global address space, with threads automatically migrating inside the memory. We introduce the parallel programming model used to exploit it, detail our implementation of the algorithm, and analyze simulated performance results as well as early hardware tests. We discuss its application to large-scale problems.
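As a point of reference, the Jaccard index of two vertices is commonly defined as the size of the intersection of their neighbor sets divided by the size of the union. The sketch below is a serial Python illustration of that standard definition, not the Emu implementation described in the abstract. The function names, the adjacency-dictionary layout, the toy graph, and the choice to return 0.0 for two isolated vertices are all assumptions made for the example.

```python
from itertools import combinations


def jaccard_index(neighbors_u, neighbors_v):
    """Jaccard index of two neighbor sets: |A & B| / |A | B|."""
    union = neighbors_u | neighbors_v
    if not union:
        # Assumed convention: two vertices with no neighbors score 0.0.
        return 0.0
    return len(neighbors_u & neighbors_v) / len(union)


def all_pairs_jaccard(adjacency):
    """Return {(u, v): score} for vertex pairs that share a neighbor."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        score = jaccard_index(adjacency[u], adjacency[v])
        if score > 0.0:
            scores[(u, v)] = score
    return scores


# Hypothetical toy graph with undirected edges 0-1, 0-2, 1-2, 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
pair_scores = all_pairs_jaccard(graph)
```

In this toy graph, vertices 0 and 3 share their only common neighbor (vertex 2) out of a two-vertex union, so their score is 0.5, while vertices 2 and 3 share no neighbor and are omitted. Pairs with high scores are the candidates a community-detection step would group together.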

Citations


Proceedings Article, DOI

Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons

Maciej Besta, +6 more

TL;DR: SimilarityAtScale is the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets; it efficiently encodes the problem as a multiplication of sparse matrices.


Posted Content

Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons

Maciej Besta, +6 more

- 11 Nov 2019 -

arXiv: Computational Engineering, Financ...

TL;DR: This work presents the design and implementation of SimilarityAtScale, the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets; the resulting scheme is the first to enable accurate Jaccard distance derivations for massive datasets using large-scale distributed-memory systems.


Proceedings Article, DOI

GraphChallenge.org Triangle Counting Performance

Siddharth Samsi, +11 more

- 18 Mar 2020 -

arXiv: Distributed, Parallel, and Cluste...

TL;DR: These submissions show that state-of-the-art triangle counting execution time is a strong function of the number of edges in the graph; it improved significantly from 2017 to 2018 and remained comparable from 2018 to 2019.


Posted Content

Programming Strategies for Irregular Algorithms on the Emu Chick

Eric R. Hein, +9 more

- 03 Dec 2018 -

arXiv: Distributed, Parallel, and Cluste...

TL;DR: This work evaluates irregular algorithms that could benefit from the lightweight, memory-side processing of the Chick and demonstrates techniques and optimization strategies for achieving performance in sparse matrix-vector multiplication (SpMV), breadth-first search (BFS), and graph alignment across up to eight distributed nodes encompassing 64 nodelets in the Chick system.


Journal Article, DOI

A Microbenchmark Characterization of the Emu Chick

Jeffrey Young, +7 more

TL;DR: This multi-node characterization of the Emu Chick extends an earlier single-node investigation of the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiplication, and demonstrates that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture.


References


DOI

Étude comparative de la distribution florale dans une portion des Alpes et du Jura

Paul Jaccard

Journal Article, DOI

Cilk: An Efficient Multithreaded Runtime System

Robert D. Blumofe, +5 more

- 25 Aug 1996 -

Journal of Parallel and Distributed Comp...

TL;DR: It is shown that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately, and it is proved that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal.


Proceedings Article

R-MAT: A Recursive Model for Graph Mining

Deepayan Chakrabarti, +2 more

TL;DR: A simple, parsimonious model, the “recursive matrix” (R-MAT) model, is proposed; it can quickly generate realistic graphs, capturing the essence of each graph in only a few parameters.


Proceedings Article, DOI

Cilk: an efficient multithreaded runtime system

Robert D. Blumofe, +5 more

TL;DR: This paper shows that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance, and proves that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time and communication bounds all within a constant factor of optimal.


Book Chapter, DOI

A Pragmatic Implementation of Non-blocking Linked-Lists

Tim Harris

TL;DR: This work presents a new non-blocking implementation of concurrent linked-lists supporting linearizable insertion and deletion operations, conceptually simpler and substantially faster than previous schemes.


Related Papers (5)

An OpenMP algorithm and implementation for clustering biological graphs

On Analyzing Large Graphs Using GPUs

A scalable parallel union-find algorithm for distributed memory computers

Parallel breadth-first search on distributed memory systems

Parallel algorithms for clustering biological graphs on distributed and shared memory architectures