Proceedings ArticleDOI
Architecture- and workload-aware heterogeneous algorithms for sparse matrix vector multiplication
TL;DR: This paper considers a class of sparse matrices that exhibit a scale-free nature, identifies a work-division scheme that works well for such matrices, and uses simple and effective mechanisms to determine the appropriate amount of work to be allotted to the CPU and the GPU.
Abstract: Multiplying a sparse matrix with a vector, denoted spmv, is a fundamental operation in linear algebra with several applications. Hence, efficient and scalable implementation of spmv has been a topic of immense research. Recent efforts are aimed at implementations on GPUs, multicore architectures, and other emerging computational platforms. Owing to the highly irregular nature of spmv, it is observed that GPUs and CPUs can offer comparable performance. In this paper, we propose three heterogeneous algorithms for spmv that simultaneously utilize both the CPU and the GPU. This is shown to lead to better resource utilization in addition to performance gains. Our experiments with the work-division schemes on standard datasets indicate that it is not in general possible to choose the most appropriate scheme for a given matrix. We therefore consider a class of sparse matrices that exhibit a scale-free nature and identify a scheme that works well for such matrices. Finally, we use simple and effective mechanisms to determine the appropriate amount of work to be allotted to the CPU and the GPU.
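As context for the abstract above, here is a minimal sketch of the spmv operation in its standard CSR (compressed sparse row) formulation, together with a simple non-zero-balancing row split of the kind a CPU+GPU work-division scheme might use. This is an illustration only: `split_rows` and its nnz-proportional heuristic are assumptions for exposition, not the paper's actual algorithms or mechanisms.

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        # Non-zeros of row i occupy data[indptr[i]:indptr[i+1]].
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[lo:hi], x[indices[lo:hi]])
    return y

def split_rows(indptr, gpu_fraction):
    """Illustrative heuristic (not the paper's mechanism): pick a row
    boundary k so that roughly gpu_fraction of the non-zeros fall in
    rows [0, k), which a heterogeneous scheme could assign to the GPU,
    leaving rows [k, n) to the CPU."""
    target = gpu_fraction * indptr[-1]
    return int(np.searchsorted(indptr, target))

# 3x3 example:  [[1, 0, 2],
#                [0, 3, 0],
#                [4, 0, 5]]
data    = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 1, 0, 2])
indptr  = np.array([0, 2, 3, 5])
x       = np.array([1.0, 1.0, 1.0])

print(spmv_csr(data, indices, indptr, x))  # [3. 3. 9.]
print(split_rows(indptr, 0.4))             # 1: row 0 alone holds 2/5 of the nnz
```

The irregularity the abstract refers to is visible even here: row lengths (`indptr[i+1] - indptr[i]`) can vary wildly, which is what makes static load balancing across a CPU and a GPU difficult.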
Citations
Journal ArticleDOI
Sparse Matrix-Vector Multiplication on GPGPUs
TL;DR: This article provides a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years, and discusses the issues and tradeoffs that have been encountered by the various researchers.
Journal ArticleDOI
SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems
Christina Giannoula, Ivan Fernandez, Juan Gómez-Luna, Nectarios Koziris, Georgios Goumas, Onur Mutlu +5 more
TL;DR: This work focuses on the development of near-bank PIM designs that tightly couple a PIM core with each DRAM bank, exploiting bank-level parallelism to expose high on-chip memory bandwidth of standard DRAM to processors.
Proceedings ArticleDOI
Applications of Ear Decomposition to Efficient Heterogeneous Algorithms for Shortest Path/Cycle Problems
TL;DR: This work studies the applicability of an ear decomposition of graphs to problems such as all-pairs shortest paths and minimum cost cycle basis, and shows that the resulting solutions are scalable in terms of both memory usage and speedup over the best known current implementations.
Proceedings ArticleDOI
A Novel Heterogeneous Algorithm for Multiplying Scale-Free Sparse Matrices
TL;DR: This paper proposes an algorithm that multiplies two sparse matrices exhibiting scale-free nature on a CPU+GPU heterogeneous platform and shows that the approach is both architecture-aware, and workload-aware.
Journal Article
Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs
TL;DR: This work discusses implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs, outlining an algorithm and various optimizations that are faster on matrices having many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.
References
Journal ArticleDOI
The University of Florida Sparse Matrix Collection
Timothy A. Davis,Yifan Hu +1 more
TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described and a new multilevel coarsening scheme is proposed to facilitate this task.
Book
Applied Numerical Linear Algebra
TL;DR: This book covers topics including the symmetric eigenproblem, the singular value decomposition, and iterative methods for linear systems.
Proceedings ArticleDOI
Implementing sparse matrix-vector multiplication on throughput-oriented processors
Nathan Bell,Michael Garland +1 more
TL;DR: This work explores SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes, including structured grid and unstructured mesh matrices.
Proceedings ArticleDOI
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Victor W. Lee,Changkyu Kim,Jatin Chhugani,Michael E. Deisher,Daehyun Kim,Anthony D. Nguyen,Nadathur Satish,Mikhail Smelyanskiy,Srinivas Chennupaty,Per Hammarlund,Ronak Singhal,Pradeep Dubey +11 more
TL;DR: This paper discusses optimization techniques for both CPU and GPU, analyzes what architecture features contributed to performance differences between the two architectures, and recommends a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.
Efficient Sparse Matrix-Vector Multiplication on CUDA
Nathan Bell,Michael Garland +1 more
TL;DR: This work develops data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU, with methods that exploit several common forms of matrix structure while offering alternatives which accommodate greater irregularity.
Related Papers (5)
Performance Optimization Using Partitioned SpMV on GPUs and Multicore CPUs
Auto-Tuning Strategies for Parallelizing Sparse Matrix-Vector (SpMV) Multiplication on Multi- and Many-Core Processors
Kaixi Hou,Wu-chun Feng,Shuai Che +2 more
Sparse Matrix-Vector Multiplication Optimizations based on Matrix Bandwidth Reduction using NVIDIA CUDA
Shiming Xu,Hai Xiang Lin,Wei Xue +2 more