QR decomposition

About: QR decomposition is a research topic. Over its lifetime, 3,504 publications have been published on the topic, receiving 100,599 citations. The topic is also known as: QR factorization.


Papers
Journal ArticleDOI
TL;DR: A new way to represent products of Householder matrices is given that makes a typical Householder matrix algorithm rich in matrix-matrix multiplication, which is very desirable in that matrix-matrix...

257 citations
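
For context, here is a minimal unblocked Householder QR in NumPy; this is a sketch for illustration, not the paper's representation. Each step applies one reflector H = I - 2vv^T as a rank-1/rank-2 update, which is exactly the level-2 work that aggregated (WY-style) representations of Householder products turn into matrix-matrix multiplications.

    import numpy as np

    def householder_qr(A):
        """Unblocked Householder QR: returns Q, R with A = Q @ R."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        Q = np.eye(m)
        R = A.copy()
        for k in range(min(m, n)):
            x = R[k:, k]
            v = x.copy()
            # Sign chosen to avoid cancellation when forming the reflector.
            v[0] += np.copysign(np.linalg.norm(x), x[0])
            nv = np.linalg.norm(v)
            if nv == 0.0:  # column already zero below the diagonal
                continue
            v /= nv
            # Apply H = I - 2 v v^T to the trailing submatrix; accumulate Q.
            R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
            Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
        return Q, R

    A = np.random.default_rng(0).standard_normal((5, 3))
    Q, R = householder_qr(A)
    print(np.allclose(A, Q @ R))  # True up to rounding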

Journal ArticleDOI
TL;DR: This work generalizes a lower bound on the amount of communication needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3) algorithm to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, the Gram-Schmidt algorithm, and algorithms for eigenvalues and singular values.
Abstract: In 1981 Hong and Kung [HK81] proved a lower bound on the amount of communication (amount of data moved between a small, fast memory and a large, slow memory) needed to perform dense, n-by-n matrix multiplication using the conventional O(n^3) algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin [ITT04] gave a new proof of this result and extended it to the parallel case (where communication means the amount of data moved between processors). In both cases the lower bound may be expressed as Ω(#arithmetic operations / √M), where M is the size of the fast memory (or local memory in the parallel case). Here we generalize these results to a much wider variety of algorithms, including LU factorization, Cholesky factorization, LDL^T factorization, QR factorization, algorithms for eigenvalues and singular values, i.e., essentially all direct methods of linear algebra. The proof works for dense or sparse matrices, and for sequential or parallel algorithms. In addition to lower bounds on the amount of data moved (bandwidth) we get lower bounds on the number of messages required to move it (latency). We illustrate how to extend our lower bound technique to compositions of linear algebra operations (like computing powers of a matrix), to decide whether it is enough to call a sequence of simpler optimal algorithms (like matrix multiplication) to minimize communication, or if we can do better. We give examples of both. We also show how to extend our lower bounds to certain graph-theoretic problems. We point out recently designed algorithms for dense LU, Cholesky, QR, eigenvalue and SVD problems that attain these lower bounds; implementations of LU and QR show large speedups over conventional linear algebra algorithms in standard libraries like LAPACK and ScaLAPACK. Many open problems remain.

257 citations
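
In standard notation, the bandwidth bound the abstract states is (constants omitted), with classical O(n^3) matrix multiplication as the canonical instance:

    W = \Omega\!\left(\frac{\#\,\text{arithmetic operations}}{\sqrt{M}}\right),
    \qquad \text{e.g.} \qquad
    W_{\mathrm{matmul}} = \Omega\!\left(\frac{n^{3}}{\sqrt{M}}\right),

where W is the number of words moved between the fast memory (of size M) and slow memory. Since a single message can carry at most M words, a latency lower bound of Ω(#arithmetic operations / M^{3/2}) on the number of messages follows by dividing through by M.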

Journal ArticleDOI
TL;DR: An approximation algorithm for finding optimal decompositions which is based on the insight provided by the theorem and significantly outperforms a greedy approximation algorithm for a set-covering problem to which the problem of matrix decomposition is easily shown to be reducible.

254 citations
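
The TL;DR names two ingredients a reader can pin down concretely: a reduction of the decomposition problem to set covering, and a greedy baseline. The paper's own algorithm is not reproduced here; for reference, the classical greedy set-cover heuristic it is compared against (the standard ln-n-approximation) looks like the following sketch, with made-up data for illustration:

    def greedy_set_cover(universe, subsets):
        """Repeatedly pick the subset covering the most still-uncovered elements."""
        uncovered = set(universe)
        cover = []
        while uncovered:
            best = max(subsets, key=lambda s: len(uncovered & s))
            if not uncovered & best:
                raise ValueError("universe is not coverable by the given subsets")
            cover.append(best)
            uncovered -= best
        return cover

    U = range(1, 8)
    S = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5, 6, 7}, {1, 7}]
    print(greedy_set_cover(U, S))  # [{1, 2, 3}, {5, 6, 7}, {2, 4}]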

Journal ArticleDOI
TL;DR: The classical (CGS) and modified (MGS) Gram-Schmidt orthogonalization is one of the fundamental procedures in linear algebra, as mentioned in this paper; it is equivalent to the factorization A = Q1 R, where Q1 ∈ R^(m×n) has orthonormal columns and R is upper triangular.

242 citations
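
A minimal modified Gram-Schmidt sketch in NumPy illustrating the factorization described above; it assumes A has full column rank, and the names are illustrative:

    import numpy as np

    def mgs(A):
        """Modified Gram-Schmidt: A = Q1 @ R, with Q1 (m x n) having
        orthonormal columns and R (n x n) upper triangular."""
        Q = np.array(A, dtype=float)
        n = Q.shape[1]
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(Q[:, k])
            Q[:, k] /= R[k, k]
            # MGS: orthogonalize the *remaining* columns against q_k right
            # away, rather than projecting each new column against all
            # previous q's at once (CGS); numerically this is more stable.
            R[k, k + 1:] = Q[:, k] @ Q[:, k + 1:]
            Q[:, k + 1:] -= np.outer(Q[:, k], R[k, k + 1:])
        return Q, R

    A = np.random.default_rng(1).standard_normal((6, 4))
    Q1, R = mgs(A)
    print(np.allclose(A, Q1 @ R), np.allclose(Q1.T @ Q1, np.eye(4)))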

Journal ArticleDOI
TL;DR: SuiteSparseQR is a sparse QR factorization package based on the multifrontal method that obtains a substantial fraction of the theoretical peak performance of a multicore computer.
Abstract: SuiteSparseQR is a sparse QR factorization package based on the multifrontal method. Within each frontal matrix, LAPACK and the multithreaded BLAS enable the method to obtain high performance on multicore architectures. Parallelism across different frontal matrices is handled with Intel's Threading Building Blocks library. The symbolic analysis and ordering phase pre-eliminates singletons by permuting the input matrix A into the form [R11 R12; 0 A22], where R11 is upper triangular with diagonal entries above a given tolerance. Next, the fill-reducing ordering, column elimination tree, and frontal matrix structures are found without requiring the formation of the pattern of A^T A. Approximate rank-detection is performed within each frontal matrix using Heath's method. While Heath's method is not always exact, it has the advantage of not requiring column pivoting and thus does not interfere with the fill-reducing ordering. For sufficiently large problems, the resulting sparse QR factorization obtains a substantial fraction of the theoretical peak performance of a multicore computer.

241 citations
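
Heath's method mentioned in the abstract estimates rank from the diagonal of R without column pivoting. Below is a dense NumPy illustration of that idea only; SuiteSparseQR applies it within each frontal matrix, and the tolerance here is an arbitrary choice for the sketch:

    import numpy as np

    def heath_rank_estimate(A, tol=1e-12):
        """Count diagonal entries of R (from unpivoted QR) above a relative
        tolerance. Not always exact, but needs no column pivoting."""
        R = np.linalg.qr(A, mode='r')
        d = np.abs(np.diag(R))
        return int(np.sum(d > tol * d.max()))

    A = np.random.default_rng(2).standard_normal((8, 4))
    A[:, 3] = A[:, 0] - 2.0 * A[:, 1]  # make the last column dependent
    print(heath_rank_estimate(A))      # 3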


Network Information
Related Topics (5)
Optimization problem: 96.4K papers, 2.1M citations (85% related)
Network packet: 159.7K papers, 2.2M citations (84% related)
Robustness (computer science): 94.7K papers, 1.6M citations (83% related)
Wireless network: 122.5K papers, 2.1M citations (83% related)
Wireless sensor network: 142K papers, 2.4M citations (82% related)
Performance
Metrics: No. of papers in the topic in previous years

Year    Papers
2023    31
2022    73
2021    90
2020    132
2019    126
2018    139