Open Access Journal Article (DOI)

Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures

TL;DR: This paper develops parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures, and develops a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem.
Abstract
Sparse matrix-matrix multiplication is a key kernel with applications in several domains, such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
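The two-phase structure the abstract advocates, a symbolic pass that sizes the output followed by a numeric pass that reuses that structure, can be illustrated with a minimal row-wise SpGEMM sketch over CSR inputs using a dense accumulator. This is only an illustrative sketch, not the paper's kkSpGEMM implementation; all function and variable names below are invented for the example.

```python
def spgemm_symbolic(a_ptr, a_idx, b_ptr, b_idx, n_cols_c):
    """Phase 1: count the nonzeros of each row of C = A * B."""
    c_ptr = [0] * len(a_ptr)
    marker = [-1] * n_cols_c          # dense "seen in row i" flags, reused per row
    for i in range(len(a_ptr) - 1):
        count = 0
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            j = a_idx[jj]             # A(i, j) is nonzero; scan row j of B
            for kk in range(b_ptr[j], b_ptr[j + 1]):
                k = b_idx[kk]
                if marker[k] != i:    # first time column k appears in row i of C
                    marker[k] = i
                    count += 1
        c_ptr[i + 1] = c_ptr[i] + count
    return c_ptr

def spgemm_numeric(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, c_ptr, n_cols_c):
    """Phase 2: fill C's column indices and values, reusing the symbolic sizes."""
    nnz = c_ptr[-1]
    c_idx = [0] * nnz
    c_val = [0.0] * nnz
    acc = [0.0] * n_cols_c            # dense accumulator, reused per row
    marker = [-1] * n_cols_c
    for i in range(len(a_ptr) - 1):
        head = c_ptr[i]
        for jj in range(a_ptr[i], a_ptr[i + 1]):
            j = a_idx[jj]
            v = a_val[jj]
            for kk in range(b_ptr[j], b_ptr[j + 1]):
                k = b_idx[kk]
                if marker[k] != i:    # new column in this row: record its index
                    marker[k] = i
                    c_idx[head] = k
                    head += 1
                    acc[k] = v * b_val[kk]
                else:                 # already seen: accumulate the partial product
                    acc[k] += v * b_val[kk]
        for p in range(c_ptr[i], c_ptr[i + 1]):
            c_val[p] = acc[c_idx[p]]  # gather accumulated values into CSR order
    return c_idx, c_val
```

Because the symbolic phase depends only on sparsity patterns, its output (`c_ptr` and the allocation it implies) can be computed once and reused across repeated multiplications with the same structure, which is the reuse argument the abstract makes.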


Citations
Proceedings ArticleDOI

IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication

TL;DR: IA-SpGEMM is proposed, an input-aware auto-tuning framework for SpGEMM that provides a unified programming interface in the CSR format and automatically determines the best format and algorithm for arbitrary sparse matrices.
Journal ArticleDOI

Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

TL;DR: The key idea of the SpGEMM algorithm, tSparse, is to multiply sparse rectangular blocks using the mixed-precision mode of TCUs; this is the first time TCUs are used in the context of SpGEMM.
Proceedings ArticleDOI

Adaptive sparse matrix-matrix multiplication on the GPU

TL;DR: Evaluation on an extensive sparse matrix benchmark suggests this approach is the fastest SpGEMM implementation for highly sparse matrices (80% of the set), and when bit-stable results are sought, the approach is the fastest across the entire test set.
Proceedings ArticleDOI

Fast Triangle Counting Using Cilk

TL;DR: This paper develops an SpGEMM implementation that relies on a highly efficient, work-stealing, multithreaded runtime, and presents an analysis of how the triangle counting implementation scales as graph sizes increase, using both synthetic and real graphs from the Graph Challenge data set.
Proceedings ArticleDOI

Scalable Inference for Sparse Deep Neural Networks using Kokkos Kernels

TL;DR: This work bases its sparse DNN inference implementation, KK-SpDNN, on the sparse linear algebra kernels within the Kokkos Kernels library, using the sparse matrix-matrix multiplication in Kokkos Kernels to reuse a highly optimized kernel.
References
Journal ArticleDOI

The University of Florida sparse matrix collection

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.
Journal ArticleDOI

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

TL;DR: Kokkos’ abstractions are described, its application programmer interface (API) is summarized, performance results for unit-test kernels and mini-applications are presented, and an incremental strategy for migrating legacy C++ codes to Kokkos is outlined.
Book ChapterDOI

Intel Math Kernel Library

TL;DR: In order to achieve optimal performance on multi-core and multi-processor systems, the library must exploit parallelism and manage the memory hierarchy efficiently.
Journal ArticleDOI

The Combinatorial BLAS: design, implementation, and applications

TL;DR: The parallel Combinatorial BLAS is described, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications, and an extensible library interface and some guiding principles for future development are provided.
Journal ArticleDOI

Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition

TL;DR: An O(M) algorithm is produced to solve Ax = b, where M is the number of multiplications needed to factor A into LU, and the concept of an unordered merge plays a key role in obtaining this algorithm.