Topic
QR decomposition
About: QR decomposition is a research topic. Over the lifetime, 3504 publications have been published within this topic receiving 100599 citations. The topic is also known as: QR factorization.
Papers published on a yearly basis
Papers
More filters
••
TL;DR: In this paper, a technique for computing partial rank-revealing factorizations, such as a partial QR factorization or a partial singular value decomposition, is described, which is inspired by the Gram-Schmidt algorithm and has the same asymptotic flop count.
Abstract: This manuscript describes a technique for computing partial rank-revealing factorizations, such as a partial QR factorization or a partial singular value decomposition. The method takes as input a tolerance $\varepsilon$ and an $m\times n$ matrix $\boldsymbol{\mathsf{A}}$ and returns an approximate low-rank factorization of $\boldsymbol{\mathsf{A}}$ that is accurate to within precision $\varepsilon$ in the Frobenius norm (or some other easily computed norm). The rank $k$ of the computed factorization (which is an output of the algorithm) is in all examples we examined very close to the theoretically optimal $\varepsilon$-rank. The proposed method is inspired by the Gram--Schmidt algorithm and has the same $O(mnk)$ asymptotic flop count. However, the method relies on randomized sampling to avoid column pivoting, which allows it to be blocked, and hence accelerates practical computations by reducing communication. Numerical experiments demonstrate that the accuracy of the scheme is for every matrix that was...
97 citations
••
TL;DR: This paper presents architectures and field-programmable gate-array designs of two variants of the DCD algorithm, known as cyclic and leading DCD algorithms, and proposes fixed-point designs that provide an accuracy performance that is very close to the performance of floating-point counterparts and require significantly lower FPGA resources than techniques based on QR decomposition.
Abstract: In the areas of signal processing and communications, such as antenna-array beamforming, adaptive filtering, multiuser and multiple-input-multiple-output (MIMO) detection, channel estimation and equalization, echo and interference cancellation, and others, solving linear systems of equations often provides an optimal performance. However, this is also a very complicated operation that designers try to avoid by proposing different suboptimal techniques. The dichotomous coordinate descent (DCD) algorithm allows linear systems of equations to be solved with high computational efficiency. In this paper, we present architectures and field-programmable gate-array (FPGA) designs of two variants of the DCD algorithm, which are known as cyclic and leading DCD algorithms. For each of these techniques, we present serial designs, group-2 and group-4 designs, as well as a design with parallel update of the residual vector for the cyclic DCD algorithm. These designs have different degrees of parallelism, thus enabling a tradeoff between FPGA resources and computation time. The serial designs require the smallest FPGA resources; they are well suited for applications where many parallel solvers are required, e.g., for detection in MIMO-orthogonal-frequency-division-multiplexing communication systems. The parallelism introduced in the proposed group-2 and group-4 designs allows faster convergence to the true solution at the expense of an increase in FPGA resources. The design with parallel update of the residual vector provides the fastest convergence speed; however, if the system size is high, it may result in a significant increase in FPGA resources. The proposed fixed-point designs provide an accuracy performance that is very close to the performance of floating-point counterparts and require significantly lower FPGA resources than techniques based on QR decomposition.
96 citations
••
27 May 2007TL;DR: The architecture and results of the first VLSI implementation of an iterative sorted QR decomposition preprocessor for MIMO receivers are described, which performs MIMM channel preprocessing using Givens rotations and provides the base for an improved layered stream decoding.
Abstract: The QR decomposition is an important, but often underestimated prerequisite for pseudo- or non-linear detection methods such as successive interference cancellation or sphere decoding for multiple-input multiple-output (MIMO) systems. The ability of concurrent iterative sorting during the QR decomposition introduces a moderate overall latency, but provides the base for an improved layered stream decoding. This paper describes the architecture and results of the first VLSI implementation of an iterative sorted QR decomposition preprocessor for MIMO receivers. The presented architecture performs MIMO channel preprocessing using Givens rotations in order to compute the minimum mean squared error QR decomposition
95 citations
••
05 Sep 2010TL;DR: This work improves on the latest published approaches to bundle adjustment with conjugate gradients by making full use of the least squares nature of the problem and shows how a certain property of the preconditioned system allows us to reduce the work per iteration to roughly half of the standard CG algorithm.
Abstract: Bundle adjustment for multi-view reconstruction is traditionally done using the Levenberg-Marquardt algorithm with a direct linear solver, which is computationally very expensive. An alternative to this approach is to apply the conjugate gradients algorithm in the inner loop. This is appealing since the main computational step of the CG algorithm involves only a simple matrix-vector multiplication with the Jacobian. In this work we improve on the latest published approaches to bundle adjustment with conjugate gradients by making full use of the least squares nature of the problem. We employ an easy-to-compute QR factorization based block preconditioner and show how a certain property of the preconditioned system allows us to reduce the work per iteration to roughly half of the standard CG algorithm.
94 citations
••
TL;DR: In this paper, the generalized lasso dual path algorithm given by Tibshirani and Taylor in 2011 is considered and a generic approach that covers any penalty matrix D and any (full column rank) matrix X of predictor variables is described.
Abstract: We consider efficient implementations of the generalized lasso dual path algorithm given by Tibshirani and Taylor in 2011. We first describe a generic approach that covers any penalty matrix D and any (full column rank) matrix X of predictor variables. We then describe fast implementations for the special cases of trend filtering problems, fused lasso problems, and sparse fused lasso problems, both with X = I and a general matrix X. These specialized implementations offer a considerable improvement over the generic implementation, both in terms of numerical stability and efficiency of the solution path computation. These algorithms are all available for use in the genlasso R package, which can be found in the CRAN repository.
94 citations