Proceedings ArticleDOI

Architecture- and workload- aware heterogeneous algorithms for sparse matrix vector multiplication

TLDR
This paper considers a class of sparse matrices that exhibit a scale-free nature, identifies a work-division scheme that performs well for such matrices, and uses simple and effective mechanisms to determine the appropriate amount of work to allot to the CPU and the GPU.
Abstract
Multiplying a sparse matrix with a vector, denoted spmv, is a fundamental operation in linear algebra with several applications. Hence, efficient and scalable implementation of spmv has been a topic of immense research. Recent efforts are aimed at implementations on GPUs, multicore architectures, and such emerging computational platforms. Owing to the highly irregular nature of spmv, it is observed that GPUs and CPUs can offer comparable performance. In this paper, we propose three heterogeneous algorithms for spmv that simultaneously utilize both the CPU and the GPU. This is shown to lead to better resource utilization apart from performance gains. Our experiments with the work-division schemes on standard datasets indicate that it is not in general possible to choose the most appropriate scheme given a matrix. We therefore consider a class of sparse matrices that exhibit a scale-free nature and identify a scheme that works well for such matrices. Finally, we use simple and effective mechanisms to determine the appropriate amount of work to be allotted to the CPU and the GPU.
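The abstract describes dividing spmv work between the CPU and the GPU. The paper's actual three schemes are not reproduced on this page, so the following is only a minimal host-side sketch of one plausible approach: partitioning the rows of a CSR matrix so that each device receives roughly a chosen fraction of the nonzeros (all names here, such as `split_by_nnz`, are illustrative, not the paper's).

```python
# Illustrative sketch (not the paper's exact scheme): split a CSR matrix's
# rows between two workers by nonzero count. In a real heterogeneous setting
# one partition would run on the CPU and the other on the GPU; here both run
# on the host so the partitioning logic can be shown end to end.

def spmv_rows(indptr, indices, data, x, row_lo, row_hi):
    """CSR sparse matrix-vector product restricted to rows [row_lo, row_hi)."""
    y = [0.0] * (row_hi - row_lo)
    for i in range(row_lo, row_hi):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i - row_lo] = acc
    return y

def split_by_nnz(indptr, cpu_fraction):
    """Pick a row boundary so the 'CPU' part holds ~cpu_fraction of nonzeros."""
    target = cpu_fraction * indptr[-1]
    for i in range(len(indptr) - 1):
        if indptr[i + 1] >= target:
            return i + 1
    return len(indptr) - 1

# 4x4 example matrix in CSR form:
# [[1 0 2 0],
#  [0 3 0 0],
#  [4 0 5 6],
#  [0 0 0 7]]
indptr  = [0, 2, 3, 6, 7]
indices = [0, 2, 1, 0, 2, 3, 3]
data    = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x       = [1.0, 1.0, 1.0, 1.0]

boundary = split_by_nnz(indptr, 0.5)          # rows [0, boundary) -> "CPU"
y_cpu = spmv_rows(indptr, indices, data, x, 0, boundary)
y_gpu = spmv_rows(indptr, indices, data, x, boundary, 4)
y = y_cpu + y_gpu
print(y)  # [3.0, 3.0, 15.0, 7.0]
```

Balancing by nonzeros rather than by rows matters for scale-free matrices, where a few hub rows can hold a large share of the work.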


Citations
Journal ArticleDOI

Sparse Matrix-Vector Multiplication on GPGPUs

TL;DR: This article provides a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years, and discusses the issues and tradeoffs that have been encountered by the various researchers.
Journal ArticleDOI

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

TL;DR: This work focuses on the development of near-bank PIM designs that tightly couple a PIM core with each DRAM bank, exploiting bank-level parallelism to expose high on-chip memory bandwidth of standard DRAM to processors.
Proceedings ArticleDOI

Applications of Ear Decomposition to Efficient Heterogeneous Algorithms for Shortest Path/Cycle Problems

TL;DR: The applicability of an ear decomposition of graphs to problems such as all-pairs shortest paths and minimum cost cycle basis is studied, and it is shown that the resulting solutions are scalable in terms of both memory usage and their speedup over the best known current implementations.
Proceedings ArticleDOI

A Novel Heterogeneous Algorithm for Multiplying Scale-Free Sparse Matrices

TL;DR: This paper proposes an algorithm that multiplies two sparse matrices exhibiting scale-free nature on a CPU+GPU heterogeneous platform and shows that the approach is both architecture-aware, and workload-aware.
Journal Article

Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs

TL;DR: This work discusses implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs with various optimizations, and outlines an algorithm and optimizations that are faster on matrices having many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.
References
Journal ArticleDOI

The university of Florida sparse matrix collection

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.
Book

Applied Numerical Linear Algebra

TL;DR: Covers the symmetric eigenproblem, the singular value decomposition, and iterative methods for linear systems.
Proceedings ArticleDOI

Implementing sparse matrix-vector multiplication on throughput-oriented processors

TL;DR: This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.
Proceedings ArticleDOI

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

TL;DR: This paper discusses optimization techniques for both CPU and GPU, analyzes what architecture features contributed to performance differences between the two architectures, and recommends a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.

Efficient Sparse Matrix-Vector Multiplication on CUDA

TL;DR: Data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU, with methods to exploit several common forms of matrix structure while offering alternatives which accommodate greater irregularity.
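One common structure-exploiting format from this line of GPU work is ELLPACK (ELL), which pads every row to the same width so that a one-thread-per-row kernel reads a regular 2D array. The sketch below is a host-side Python model of the layout, not actual CUDA code, and the helper names (`csr_to_ell`, `ell_spmv`) are illustrative.

```python
# Illustrative sketch of the ELLPACK (ELL) layout used in GPU SpMV work:
# each row is padded to the maximum row length, giving a regular structure
# that maps well to one-thread-per-row GPU kernels. Host-side model only.

def csr_to_ell(indptr, indices, data, n_rows):
    """Convert CSR arrays to padded ELL arrays (column indices and values)."""
    width = max(indptr[i + 1] - indptr[i] for i in range(n_rows))
    ell_idx = [[0] * width for _ in range(n_rows)]     # padded column indices
    ell_val = [[0.0] * width for _ in range(n_rows)]   # padded values (zeros)
    for i in range(n_rows):
        for j, k in enumerate(range(indptr[i], indptr[i + 1])):
            ell_idx[i][j] = indices[k]
            ell_val[i][j] = data[k]
    return ell_idx, ell_val, width

def ell_spmv(ell_idx, ell_val, width, x):
    # Each "thread" (row) walks a fixed-width strip; padded zero values
    # contribute nothing to the dot product.
    return [sum(ell_val[i][j] * x[ell_idx[i][j]] for j in range(width))
            for i in range(len(ell_idx))]

# Same 4x4 matrix as a CSR example:
indptr  = [0, 2, 3, 6, 7]
indices = [0, 2, 1, 0, 2, 3, 3]
data    = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x       = [1.0, 2.0, 3.0, 4.0]

ell_idx, ell_val, width = csr_to_ell(indptr, indices, data, 4)
print(ell_spmv(ell_idx, ell_val, width, x))  # [7.0, 6.0, 43.0, 28.0]
```

ELL wastes space when row lengths vary widely, which is exactly the situation in scale-free matrices; hybrid formats that combine ELL with COO for the overlong rows are one standard answer.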