Proceedings ArticleDOI

Architecture- and workload- aware heterogeneous algorithms for sparse matrix vector multiplication

TLDR
This paper considers a class of sparse matrices that exhibit a scale-free nature, identifies a work-division scheme that performs well for such matrices, and uses simple and effective mechanisms to determine the appropriate amount of work to allot to the CPU and the GPU.
Abstract
Multiplying a sparse matrix with a vector, denoted spmv, is a fundamental operation in linear algebra with several applications. Hence, efficient and scalable implementation of spmv has been a topic of immense research. Recent efforts are aimed at implementations on GPUs, multicore architectures, and such emerging computational platforms. Owing to the highly irregular nature of spmv, it is observed that GPUs and CPUs can offer comparable performance. In this paper, we propose three heterogeneous algorithms for spmv that simultaneously utilize both the CPU and the GPU. This is shown to lead to better resource utilization apart from performance gains. Our experiments with the work-division schemes on standard datasets indicate that it is not in general possible to choose the most appropriate scheme given a matrix. We therefore consider a class of sparse matrices that exhibit a scale-free nature and identify a scheme that works well for such matrices. Finally, we use simple and effective mechanisms to determine the appropriate amount of work to be allotted to the CPU and the GPU.
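The abstract describes dividing spmv work between the CPU and the GPU. The paper's actual three schemes are not reproduced on this page, so the following is only a minimal host-side sketch of one plausible approach: partitioning the rows of a CSR matrix so that each device receives roughly a chosen fraction of the nonzeros (all names here, such as `split_by_nnz`, are illustrative, not the paper's).

```python
# Illustrative sketch (not the paper's exact scheme): split a CSR matrix's
# rows between two workers by nonzero count. In a real heterogeneous setting
# one partition would run on the CPU and the other on the GPU; here both run
# on the host so the partitioning logic can be shown end to end.

def spmv_rows(indptr, indices, data, x, row_lo, row_hi):
    """CSR sparse matrix-vector product restricted to rows [row_lo, row_hi)."""
    y = [0.0] * (row_hi - row_lo)
    for i in range(row_lo, row_hi):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i - row_lo] = acc
    return y

def split_by_nnz(indptr, cpu_fraction):
    """Pick a row boundary so the 'CPU' part holds ~cpu_fraction of nonzeros."""
    target = cpu_fraction * indptr[-1]
    for i in range(len(indptr) - 1):
        if indptr[i + 1] >= target:
            return i + 1
    return len(indptr) - 1

# 4x4 example matrix in CSR form:
# [[1 0 2 0],
#  [0 3 0 0],
#  [4 0 5 6],
#  [0 0 0 7]]
indptr  = [0, 2, 3, 6, 7]
indices = [0, 2, 1, 0, 2, 3, 3]
data    = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x       = [1.0, 1.0, 1.0, 1.0]

boundary = split_by_nnz(indptr, 0.5)          # rows [0, boundary) -> "CPU"
y_cpu = spmv_rows(indptr, indices, data, x, 0, boundary)
y_gpu = spmv_rows(indptr, indices, data, x, boundary, 4)
y = y_cpu + y_gpu
print(y)  # [3.0, 3.0, 15.0, 7.0]
```

Balancing by nonzeros rather than by rows matters for scale-free matrices, where a few hub rows can hold a large share of the work.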


Citations
Journal ArticleDOI

Sparse Matrix-Vector Multiplication on GPGPUs

TL;DR: This article provides a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years, and discusses the issues and tradeoffs that have been encountered by the various researchers.
Journal ArticleDOI

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

TL;DR: This work focuses on the development of near-bank PIM designs that tightly couple a PIM core with each DRAM bank, exploiting bank-level parallelism to expose high on-chip memory bandwidth of standard DRAM to processors.
Proceedings ArticleDOI

Applications of Ear Decomposition to Efficient Heterogeneous Algorithms for Shortest Path/Cycle Problems

TL;DR: The applicability of an ear decomposition of graphs to problems such as all-pairs shortest paths and minimum cost cycle basis is studied, and it is shown that the resulting solutions are scalable in terms of both memory usage and their speedup over the best known current implementations.
Proceedings ArticleDOI

A Novel Heterogeneous Algorithm for Multiplying Scale-Free Sparse Matrices

TL;DR: This paper proposes an algorithm that multiplies two sparse matrices exhibiting scale-free nature on a CPU+GPU heterogeneous platform and shows that the approach is both architecture-aware, and workload-aware.
Journal Article

Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs

TL;DR: This work discusses implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs with various optimizations, and outlines an algorithm and optimizations that are faster on matrices having many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.
References
Journal ArticleDOI

The university of Florida sparse matrix collection

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.
Book

Applied Numerical Linear Algebra

TL;DR: Covers the symmetric eigenproblem, the singular value decomposition, and iterative methods for linear systems.
Proceedings ArticleDOI

Implementing sparse matrix-vector multiplication on throughput-oriented processors

TL;DR: This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.
Proceedings ArticleDOI

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

TL;DR: This paper discusses optimization techniques for both CPU and GPU, analyzes what architecture features contributed to performance differences between the two architectures, and recommends a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.

Efficient Sparse Matrix-Vector Multiplication on CUDA

TL;DR: Data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU, with methods to exploit several common forms of matrix structure while offering alternatives which accommodate greater irregularity.
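One common structure-exploiting format from this line of GPU work is ELLPACK (ELL), which pads every row to the same width so that a one-thread-per-row kernel reads a regular 2D array. The sketch below is a host-side Python model of the layout, not actual CUDA code, and the helper names (`csr_to_ell`, `ell_spmv`) are illustrative.

```python
# Illustrative sketch of the ELLPACK (ELL) layout used in GPU SpMV work:
# each row is padded to the maximum row length, giving a regular structure
# that maps well to one-thread-per-row GPU kernels. Host-side model only.

def csr_to_ell(indptr, indices, data, n_rows):
    """Convert CSR arrays to padded ELL arrays (column indices and values)."""
    width = max(indptr[i + 1] - indptr[i] for i in range(n_rows))
    ell_idx = [[0] * width for _ in range(n_rows)]     # padded column indices
    ell_val = [[0.0] * width for _ in range(n_rows)]   # padded values (zeros)
    for i in range(n_rows):
        for j, k in enumerate(range(indptr[i], indptr[i + 1])):
            ell_idx[i][j] = indices[k]
            ell_val[i][j] = data[k]
    return ell_idx, ell_val, width

def ell_spmv(ell_idx, ell_val, width, x):
    # Each "thread" (row) walks a fixed-width strip; padded zero values
    # contribute nothing to the dot product.
    return [sum(ell_val[i][j] * x[ell_idx[i][j]] for j in range(width))
            for i in range(len(ell_idx))]

# Same 4x4 matrix as a CSR example:
indptr  = [0, 2, 3, 6, 7]
indices = [0, 2, 1, 0, 2, 3, 3]
data    = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x       = [1.0, 2.0, 3.0, 4.0]

ell_idx, ell_val, width = csr_to_ell(indptr, indices, data, 4)
print(ell_spmv(ell_idx, ell_val, width, x))  # [7.0, 6.0, 43.0, 28.0]
```

ELL wastes space when row lengths vary widely, which is exactly the situation in scale-free matrices; hybrid formats that combine ELL with COO for the overlong rows are one standard answer.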