Journal ArticleDOI
GPU-accelerated preconditioned iterative linear solvers
Ruipeng Li, Yousef Saad +1 more
TL;DR: This work is an overview of the authors' preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors; techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed.
Abstract:
This work is an overview of our preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology to perform sparse linear algebra computations. Techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed. Our experiments with an NVIDIA TESLA M2070 show that for unstructured matrices SpMV kernels can be up to 8 times faster on the GPU than the Intel MKL on the host Intel Xeon X5675 processor. The overall performance of the GPU-accelerated incomplete Cholesky (IC) factorization preconditioned CG method can outperform its CPU counterpart by a smaller factor, up to 3, and the GPU-accelerated incomplete LU (ILU) factorization preconditioned GMRES method can achieve a speed-up nearing 4. However, with preconditioning techniques better suited to GPUs, this performance can be further improved.
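The SpMV kernel discussed in the abstract is, in its simplest CSR (compressed sparse row) form, a short doubly nested loop over rows and their nonzeros. A pure-Python sketch; the 3x3 matrix below is a hypothetical illustration, not data from the paper:

```python
# Sparse matrix-vector product y = A @ x with A stored in CSR
# (compressed sparse row) format: three flat arrays holding the
# row offsets, column indices, and nonzero values.

def spmv_csr(row_ptr, col_idx, vals, x):
    """Multiply a CSR matrix by a dense vector x."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        # nonzeros of row i live in vals[row_ptr[i]:row_ptr[i+1]]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals    = [4.0, 1.0, 3.0, 2.0, 5.0]

print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [5.0, 3.0, 7.0]
```

On a GPU, the irregular inner-loop lengths of this kernel are exactly what makes unstructured matrices hard to load-balance, which is why storage formats and thread-mapping strategies matter so much for SpMV performance.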
Citations
Proceedings ArticleDOI
Multicore bundle adjustment
TL;DR: The design and implementation of new inexact Newton type bundle adjustment algorithms that exploit hardware parallelism to efficiently solve large-scale 3D scene reconstruction problems are presented, showing that overcoming the severe memory and bandwidth limitations of current-generation GPUs not only leads to more space-efficient algorithms but also to surprising savings in runtime.
Proceedings ArticleDOI
CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication
Weifeng Liu, Brian Vinter +1 more
TL;DR: CSR5 (Compressed Sparse Row 5), a new storage format, which offers high-throughput SpMV on various platforms including CPUs, GPUs and Xeon Phi, is proposed for real-world applications such as a solver with only tens of iterations because of its low-overhead for format conversion.
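The throughput claim above rests on expressing SpMV over the flat nonzero array, which balances easily across threads, rather than row by row. A much-simplified serial sketch of the underlying segmented-sum idea follows (this is not the actual CSR5 tile layout, which partitions nonzeros into fixed-size 2D tiles; it assumes no empty rows):

```python
# Segmented-sum style SpMV: elementwise products over all nonzeros,
# then a segmented reduction using head flags that mark where each
# row's run of nonzeros begins. Serial pure-Python illustration.

def spmv_segmented(row_ptr, col_idx, vals, x):
    # products over all nonzeros (trivially parallel on a GPU)
    prods = [vals[k] * x[col_idx[k]] for k in range(len(vals))]
    # head flags: True where a new row's segment starts (no empty rows)
    heads = [False] * len(vals)
    for i in range(len(row_ptr) - 1):
        heads[row_ptr[i]] = True
    # segmented sum: reduce each flagged segment to one row result
    y, row = [], -1
    for k, p in enumerate(prods):
        if heads[k]:
            y.append(0.0)
            row += 1
        y[row] += p
    return y

row_ptr = [0, 2, 3, 5]                 # same 3x3 example matrix as A above
col_idx = [0, 2, 1, 0, 2]
vals    = [4.0, 1.0, 3.0, 2.0, 5.0]
print(spmv_segmented(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [5.0, 3.0, 7.0]
```

Because the work is partitioned by nonzero count rather than by row, long and short rows no longer cause thread-level load imbalance, which is the motivation for formats in this family.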
Journal ArticleDOI
Fine-Grained Parallel Incomplete LU Factorization
Edmond Chow, Aftab Patel +1 more
TL;DR: Numerical tests show that very few sweeps are needed to construct a factorization that is an effective preconditioner, and the amount of parallelism is large irrespective of the ordering of the matrix, and matrix ordering can be used to enhance the accuracy of the factorization rather than to increase parallelism.
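The sweeps described above can be illustrated concretely: each entry of L and U is treated as an unknown constrained by (LU)_ij = a_ij on the sparsity pattern, and all entries are updated simultaneously in a fixed-point iteration. A dense pure-Python toy sketch (real implementations restrict updates to the sparse pattern and run them asynchronously in parallel; the toy matrix assumes a nonzero diagonal):

```python
# Fine-grained parallel ILU as fixed-point sweeps: every nonzero of
# L (unit lower triangular) and U (upper triangular) is an unknown,
# updated from the constraint (L*U)[i][j] = A[i][j]. For a dense
# matrix this converges to the exact LU factorization.

def filu_sweeps(A, nsweeps=5):
    n = len(A)
    # initial guess: L = unit lower part of A, U = upper part of A
    L = [[A[i][j] if i > j else (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    U = [[A[i][j] if i <= j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(nsweeps):
        newL = [row[:] for row in L]
        newU = [row[:] for row in U]
        for i in range(n):          # all (i, j) updates are independent,
            for j in range(n):      # so a sweep parallelizes per nonzero
                s = sum(L[i][k] * U[k][j] for k in range(min(i, j)))
                if i > j:           # unknown l_ij
                    newL[i][j] = (A[i][j] - s) / U[j][j]
                else:               # unknown u_ij (i <= j)
                    newU[i][j] = A[i][j] - s
        L, U = newL, newU
    return L, U

L, U = filu_sweeps([[4.0, 1.0], [1.0, 3.0]])
print(L[1][0], U[1][1])  # converges to the exact LU entries: 0.25 2.75
```

The key property, matching the TL;DR, is that a sweep touches every unknown independently, so the available parallelism is the number of nonzeros regardless of matrix ordering.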
Journal ArticleDOI
Sparse Matrix-Vector Multiplication on GPGPUs
TL;DR: This article provides a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years, and discusses the issues and tradeoffs that have been encountered by the various researchers.
References
Book
Iterative Methods for Sparse Linear Systems
TL;DR: This book presents iterative methods for solving large sparse linear systems, covering Krylov subspace techniques, preconditioning, and methods based on the normal equations.
Journal ArticleDOI
GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems
Youcef Saad, Martin H. Schultz +1 more
TL;DR: An iterative method for solving nonsymmetric linear systems is presented that minimizes at every step the norm of the residual vector over a Krylov subspace.
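The minimal-residual property can be sketched directly: build an orthonormal Krylov basis with the Arnoldi process, then choose the combination of basis vectors that minimizes the residual norm. A pure-Python toy without restarting; the small least-squares problem is solved here via normal equations for brevity, whereas practical codes use Givens rotations:

```python
# Minimal GMRES sketch: Arnoldi builds an orthonormal Krylov basis V
# and a small Hessenberg matrix H with A*V[j] = sum_i H[i][j]*V[i];
# the iterate minimizing ||b - A x|| is recovered from a small
# least-squares problem on H. Assumes x0 = 0 and b != 0.

def gauss_solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    M = [row[:] + [r] for row, r in zip(M, rhs)]   # augmented system
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    y = [0.0] * k
    for c in range(k - 1, -1, -1):
        y[c] = (M[c][k] - sum(M[c][j] * y[j] for j in range(c + 1, k))) / M[c][c]
    return y

def gmres(A, b, m=None, tol=1e-10):
    n = len(b)
    m = m or n
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    norm = lambda v: dot(v, v) ** 0.5

    beta = norm(b)                       # x0 = 0, so r0 = b
    V = [[bi / beta for bi in b]]        # orthonormal Krylov basis
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):           # modified Gram-Schmidt
            H[i][j] = dot(w, V[i])
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = norm(w)
        if H[j + 1][j] < tol:            # happy breakdown: exact solution
            m = j + 1
            break
        V.append([wk / H[j + 1][j] for wk in w])
    # least squares: min || beta*e1 - H y || via normal equations
    k = m
    g = [beta] + [0.0] * k
    N = [[sum(H[r][i] * H[r][j] for r in range(k + 1)) for j in range(k)]
         for i in range(k)]
    rhs = [sum(H[r][i] * g[r] for r in range(k + 1)) for i in range(k)]
    y = gauss_solve(N, rhs)
    return [sum(y[j] * V[j][i] for j in range(k)) for i in range(n)]

x = gmres([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x)   # approximately [1/11, 7/11]
```

Each iteration enlarges the Krylov subspace by one vector, so the residual norm is nonincreasing by construction; this is the property the TL;DR highlights.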
Journal ArticleDOI
An iteration method for the solution of the eigenvalue problem of linear differential and integral operators
TL;DR: A systematic method is proposed for finding the latent roots and principal axes of a matrix without reducing the order of the matrix; it is characterized by a wide field of applicability and great accuracy, since the accumulation of rounding errors is avoided through the process of minimized iterations.
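The process of minimized iterations described above is known today as the Lanczos algorithm: for a symmetric matrix it builds, without reducing or modifying the matrix itself, a small tridiagonal matrix whose eigenvalues approximate those of the original operator. A pure-Python sketch (the 3x3 matrix is illustrative, not from the paper):

```python
# Lanczos process: a three-term recurrence produces an orthonormal
# basis and the entries (alphas on the diagonal, betas off-diagonal)
# of a tridiagonal matrix T similar to A when run to completion.

def lanczos(A, v0, m):
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    norm0 = dot(v0, v0) ** 0.5
    v = [x / norm0 for x in v0]
    v_prev, beta = [0.0] * len(v0), 0.0
    alphas, betas = [], []
    for _ in range(m):
        w = matvec(A, v)
        alpha = dot(w, v)
        # three-term recurrence: orthogonalize only against v and v_prev
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta = dot(w, w) ** 0.5
        alphas.append(alpha)
        betas.append(beta)
        if beta == 0.0:                 # invariant subspace found
            break
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas[:-1]           # diagonal and off-diagonal of T

A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
diag, offdiag = lanczos(A, [1.0, 0.0, 0.0], m=3)
print(diag, offdiag)   # [2.0, 2.0, 2.0] [1.0, 1.0]
```

Because only three vectors are kept at a time and the matrix enters only through matrix-vector products, the method applies to operators far too large to reduce explicitly, which is the "wide field of applicability" the summary refers to.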
Journal ArticleDOI
The University of Florida sparse matrix collection
Timothy A. Davis, Yifan Hu +1 more
TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described, and a new multilevel coarsening scheme is proposed to facilitate visualizing the matrices in the collection.