Journal ArticleDOI

GPU-accelerated preconditioned iterative linear solvers

Ruipeng Li, +1 more
01 Feb 2013
Vol. 63, Iss. 2, pp. 443-466
TLDR
This work gives an overview of the authors' preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors, and discusses techniques for speeding up sparse matrix-vector product (SpMV) kernels and for finding suitable preconditioning methods.
Abstract
This work is an overview of our preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology to perform sparse linear algebra computations. Techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed. Our experiments with an NVIDIA TESLA M2070 show that for unstructured matrices SpMV kernels can be up to 8 times faster on the GPU than the Intel MKL implementation on the host Intel Xeon X5675 processor. Overall, the GPU-accelerated incomplete Cholesky (IC) factorization preconditioned CG method can outperform its CPU counterpart by a smaller factor, up to 3, and the GPU-accelerated incomplete LU (ILU) factorization preconditioned GMRES method can achieve a speed-up nearing 4. However, with preconditioning techniques better suited to GPUs, this performance can be further improved.
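The SpMV kernel at the heart of such solvers is an irregular, memory-bound computation. For orientation, here is a minimal CPU-side sketch of SpMV over the common CSR format in Python with numpy (an illustration only, not the paper's CUDA implementation; the function name csr_spmv is ours): each row's dot product is independent of the others, which is exactly the parallelism a GPU kernel exploits by assigning rows, or groups of rows, to threads or warps.

    import numpy as np

    def csr_spmv(indptr, indices, data, x):
        # Computes y = A @ x for a sparse matrix A stored in CSR form.
        # Each iteration of the outer loop is independent, which is the
        # unit of work a GPU kernel distributes across threads/warps.
        n = len(indptr) - 1
        y = np.zeros(n)
        for i in range(n):
            lo, hi = indptr[i], indptr[i + 1]
            y[i] = np.dot(data[lo:hi], x[indices[lo:hi]])
        return y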


Citations
Proceedings ArticleDOI

Multicore bundle adjustment

TL;DR: This paper presents the design and implementation of new inexact Newton-type bundle adjustment algorithms that exploit hardware parallelism to solve large-scale 3D scene reconstruction problems efficiently, and shows that overcoming the severe memory and bandwidth limitations of current-generation GPUs leads not only to more space-efficient algorithms but also to surprising savings in runtime.
Proceedings ArticleDOI

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

TL;DR: CSR5 (Compressed Sparse Row 5), a new storage format, is proposed; it offers high-throughput SpMV across platforms including CPUs, GPUs, and Xeon Phi, and its low-overhead format conversion makes it practical for real-world applications such as solvers that run only tens of iterations.
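For context, CSR5 builds on the classic CSR layout, which stores a sparse matrix in three flat arrays. A tiny worked example of plain CSR (shown for orientation only; the CSR5 tile layout itself is more involved than this baseline):

    import numpy as np
    from scipy.sparse import csr_matrix

    # A 4x4 sparse matrix:
    # [[4, 0, 1, 0],
    #  [0, 3, 0, 0],
    #  [2, 0, 5, 1],
    #  [0, 0, 0, 6]]
    A = csr_matrix(np.array([[4, 0, 1, 0],
                             [0, 3, 0, 0],
                             [2, 0, 5, 1],
                             [0, 0, 0, 6]]))
    print(A.indptr)   # [0 2 3 6 7]      row pointers
    print(A.indices)  # [0 2 1 0 2 3 3]  column indices of nonzeros
    print(A.data)     # [4 1 3 2 5 1 6]  nonzero values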
Journal ArticleDOI

Fine-Grained Parallel Incomplete LU Factorization

TL;DR: Numerical tests show that very few sweeps are needed to construct a factorization that is an effective preconditioner; the amount of parallelism is large irrespective of the ordering of the matrix, so matrix ordering can be used to enhance the accuracy of the factorization rather than to increase parallelism.
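To make the "sweeps" idea concrete, here is a minimal dense-storage Python sketch of a Jacobi-style fixed-point iteration for ILU(0) in the spirit of this paper (an illustrative toy under our own naming, not the authors' fine-grained implementation): every entry of L and U on the sparsity pattern is updated from the previous sweep's values only, so all updates within a sweep could run in parallel.

    import numpy as np

    def ilu0_fixed_point(A, nsweeps=5):
        # Jacobi-style fixed-point sweeps for ILU(0): each (i, j) update reads
        # only the previous sweep's L and U, so updates are independent.
        n = A.shape[0]
        U = np.triu(A).astype(float)                 # initial guess: upper part of A
        L = np.eye(n) + np.tril(A, -1) / np.diag(A)  # unit lower triangular guess
        S = list(zip(*np.nonzero(A)))                # sparsity pattern of A
        for _ in range(nsweeps):
            Lnew, Unew = L.copy(), U.copy()
            for i, j in S:
                k = min(i, j)
                s = A[i, j] - L[i, :k] @ U[:k, j]
                if i > j:
                    Lnew[i, j] = s / U[j, j]         # strictly lower part: L entry
                else:
                    Unew[i, j] = s                   # upper part (incl. diagonal): U entry
            L, U = Lnew, Unew
        return L, U

On a GPU, the inner loop over the pattern S is what gets mapped to threads; a handful of sweeps typically suffices to obtain a usable preconditioner.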
Posted Content

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication

TL;DR: In this article, the authors propose CSR5 (Compressed Sparse Row 5), a new storage format that offers high-throughput SpMV on various platforms including CPUs, GPUs, and Xeon Phi.
Journal ArticleDOI

Sparse Matrix-Vector Multiplication on GPGPUs

TL;DR: This article provides a review of the techniques for implementing the SpMV kernel on GPGPUs that have appeared in the literature of the last few years, and discusses the issues and tradeoffs that have been encountered by the various researchers.
References
Book

Iterative Methods for Sparse Linear Systems

Yousef Saad
TL;DR: This chapter discusses methods based on the normal equations of linear algebra, building on techniques derived in previous chapters of the book.
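For reference, the normal-equations approach mentioned in this summary replaces a system Ax = b with (standard background, not a quotation from the book)

    A^T A x = A^T b,

whose coefficient matrix is symmetric positive definite whenever A has full column rank, so CG-type methods apply.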
Journal ArticleDOI

GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems

TL;DR: GMRES is an iterative method for solving nonsymmetric linear systems that minimizes, at every step, the norm of the residual vector over a Krylov subspace.
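In symbols, the minimization property reads (standard notation, not quoted from the paper): given an initial guess x_0 with residual r_0 = b - A x_0, the m-th GMRES iterate is

    x_m = x_0 + argmin_{z in K_m} || b - A(x_0 + z) ||_2,
    K_m = span{ r_0, A r_0, ..., A^{m-1} r_0 }.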
Journal ArticleDOI

An iteration method for the solution of the eigenvalue problem of linear differential and integral operators

TL;DR: In this article, a systematic method is proposed for finding the latent roots and principal axes of a matrix without reducing the order of the matrix; the method has a wide field of applicability and great accuracy, since the process of minimized iterations avoids the accumulation of rounding errors.
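The method described here underlies what is now called the Lanczos process. A minimal Python sketch of the resulting three-term recurrence for a symmetric matrix (a modern formulation offered for orientation only; the function name lanczos is ours):

    import numpy as np

    def lanczos(A, v, m):
        # m steps of the Lanczos process: builds an orthonormal Krylov basis V
        # and scalars alpha (diagonal) and beta (off-diagonal) such that
        # V.T @ A @ V is tridiagonal, without ever transforming A itself.
        n = len(v)
        V = np.zeros((n, m + 1))
        alpha, beta = np.zeros(m), np.zeros(m + 1)
        V[:, 0] = v / np.linalg.norm(v)
        for j in range(m):
            w = A @ V[:, j] - beta[j] * V[:, j - 1]  # beta[0] = 0: no-op at j = 0
            alpha[j] = w @ V[:, j]
            w -= alpha[j] * V[:, j]                  # three-term recurrence
            beta[j + 1] = np.linalg.norm(w)
            if beta[j + 1] == 0:                     # invariant subspace found
                break
            V[:, j + 1] = w / beta[j + 1]
        return V[:, :m], alpha, beta[1:m + 1]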
Journal ArticleDOI

The university of Florida sparse matrix collection

TL;DR: The University of Florida Sparse Matrix Collection, a large and actively growing set of sparse matrices that arise in real applications, is described, and a new multilevel coarsening scheme is proposed to facilitate visualizing the matrices in the collection.