QR decomposition

About: QR decomposition is a research topic. Over its lifetime, 3,504 publications have been published within this topic, receiving 100,599 citations. The topic is also known as: QR factorization.


Papers
Proceedings ArticleDOI
21 Jul 2015
TL;DR: The design integrates an SVD module (for precoding) with a QR decomposition (QRD) module (for MIMO signal detection) under a unified hardware framework, and both its throughput rate and gate-count efficiency significantly outperform previous designs.
Abstract: Precoding is an effective scheme in pre-compensating the wireless channel impairments and the singular value decomposition (SVD) scheme is a popular choice. This paper presents a unified high throughput SVD/QRD precoder chip design for MIMO OFDM systems. A hardware-implementation-friendly Givens Rotation (GR) based SVD computing scheme is developed first. It starts with a bi-diagonalization phase followed by an iterative diagonalization phase consisting of successive nullification sweeps. A convergence detection mechanism is employed to terminate the computations once the required precision is achieved. The design successfully integrates the SVD module (for precoding) with the QR decomposition (QRD) module (for MIMO signal detection) under a unified hardware framework. The design features a two-level pipelined, fully parallel architecture, and CORDIC processors are employed to implement the GR modules efficiently. Various design optimization techniques are applied to reduce the circuit complexity and the power consumption. The implementation using TSMC 90nm process technology indicates a throughput rate of 35.75M SVDs per second when operating at 143MHz. Both the throughput rate and the gate-count efficiency of the proposed design significantly outperform previous ones.

14 citations
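The Givens-rotation (GR) kernel at the heart of this design can be illustrated in software. The sketch below is a plain NumPy textbook QR factorization via Givens rotations for real matrices — an illustration of the underlying rotation algebra, not the paper's CORDIC-based hardware implementation.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, -s], [s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, -b / r

def qr_givens(A):
    """QR factorization by successive Givens rotations (real matrices)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):
        # Zero out column j below the diagonal, bottom-up.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, -s], [s, c]])
            R[[i - 1, i], j:] = G @ R[[i - 1, i], j:]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T  # accumulate Q
    return Q, R
```

Each rotation touches only two rows, which is what makes the scheme amenable to the pipelined, CORDIC-based hardware the paper describes.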

Journal ArticleDOI
TL;DR: Proposes algorithms for accurate matrix factorizations, named inverse LU and inverse QR factorizations, for extremely ill-conditioned matrices; the algorithms are based on standard numerical algorithms using pure floating-point arithmetic together with an accurate dot product.
Abstract: In this paper, algorithms for accurate matrix factorizations named inverse LU and inverse QR factorizations for extremely ill-conditioned matrices are proposed. The proposed algorithms are based on standard numerical algorithms using pure floating-point arithmetic and accurate dot product. Detailed analysis of the algorithms is presented. As an application of the proposed algorithms, a method of computing accurate solutions of linear systems is also proposed. Numerical results are presented for illustrating the performance of the proposed algorithms. Computing times for the algorithms adaptively change according to the difficulty of given problems.

14 citations
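One way an accurate dot product pays off is in computing residuals for iterative refinement of linear systems. The sketch below uses Python's math.fsum, which compensates only the summation (a genuine accurate dot product, e.g. via error-free transformations, would also capture the rounding of each individual product), so it is a simplified illustration of the idea rather than the paper's algorithm.

```python
import numpy as np
from math import fsum

def accurate_residual(A, x, b):
    """r = b - A @ x, with each row's dot product accumulated by fsum
    (compensated summation; the products themselves are still rounded)."""
    m, n = A.shape
    return np.array([b[i] - fsum(A[i, k] * x[k] for k in range(n))
                     for i in range(m)])

def refine(A, b, iters=3):
    """Classical iterative refinement with a carefully summed residual."""
    x = np.linalg.solve(A, b)          # working-precision solve
    for _ in range(iters):
        r = accurate_residual(A, x, b)  # residual, summed accurately
        x = x + np.linalg.solve(A, r)   # correct the solution
    return x
```

For mildly ill-conditioned systems a few refinement steps drive the residual down to working precision; the paper's inverse factorizations target far more extreme conditioning than this toy example can handle.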

Journal ArticleDOI
TL;DR: This paper examines how the algorithms arising in dense linear systems and linear least-squares problems map onto a custom Linear Algebra Processor, and exposes the benefits of redesigning floating-point units and their surrounding data-paths to support these complicated operations.
Abstract: This paper examines the mapping of algorithms encountered when solving dense linear systems and linear least-squares problems to a custom Linear Algebra Processor. Specifically, the focus is on Cholesky, LU (with partial pivoting), and QR factorizations and their blocked algorithms. As part of the study, we expose the benefits of redesigning floating point units and their surrounding data-paths to support these complicated operations. We show how adding moderate complexity to the architecture greatly alleviates complexities in the algorithm. We study design tradeoffs and the effectiveness of architectural modifications to demonstrate that we can improve power and performance efficiency to a level that can otherwise only be expected of full-custom ASIC designs. A feasibility study of inner kernels is extended to blocked level and shows that, at block level, the Linear Algebra Core (LAC) can achieve high efficiencies with up to 45 GFLOPS/W for both Cholesky and LU factorization, and over 35 GFLOPS/W for QR factorization. While maintaining such efficiencies, our extensions to the MAC units can achieve up to 10, 12, and 20 percent speedup for the blocked algorithms of Cholesky, LU, and QR factorization, respectively.

14 citations
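The blocked structure that makes these factorizations rich in matrix-matrix products can be sketched as follows. This is a generic panel-plus-trailing-update formulation of blocked QR in NumPy (using a library QR for the panel), not the paper's LAC mapping itself.

```python
import numpy as np

def blocked_qr(A, nb=2):
    """Blocked QR: factor one panel of nb columns at a time, then update
    the trailing matrix with a matrix-matrix product (the operation a
    linear-algebra accelerator is designed to execute efficiently)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(0, n, nb):
        jb = min(nb, n - j)
        # Panel factorization: QR of the current block column.
        Qp, Rp = np.linalg.qr(R[j:, j:j + jb], mode='complete')
        R[j:, j:j + jb] = Rp
        # Trailing update: one large matrix-matrix product.
        R[j:, j + jb:] = Qp.T @ R[j:, j + jb:]
        # Accumulate the orthogonal factor.
        Q[:, j:] = Q[:, j:] @ Qp
    return Q, R
```

The panel step is small and latency-bound, while the trailing update dominates the flop count — which is why blocked algorithms reward hardware optimized for dense matrix products.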

Patent
Kenichi Higuchi
30 Jan 2009
TL;DR: In this article, a transmitting device Fourier-transforms symbols in a transmission symbol sequence, maps the Fourier-transformed symbols to subcarriers, inverse-Fourier-transforms the mapped symbols, and transmits the inverse-Fourier-transformed symbols from multiple transmitting antennas.
Abstract: A transmitting device Fourier-transforms symbols in a transmission symbol sequence, maps the Fourier-transformed symbols to subcarriers, inverse-Fourier-transforms the mapped symbols, and transmits the inverse-Fourier-transformed symbols from multiple transmitting antennas. A receiving device Fourier-transforms received signals, extracts signal components mapped to the subcarriers, and estimates the symbols transmitted via the subcarriers by applying a QR decomposition algorithm to the extracted signal components. The receiving device obtains a unitary matrix Q^H such that the product of the unitary matrix Q^H, a weight matrix W determining a correspondence between the transmission symbol sequence and the subcarriers, and a channel matrix H becomes a triangular matrix R, and estimates candidates of the symbols transmitted from the transmitting antennas based on the unitary matrix Q^H and the triangular matrix R.

14 citations
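The QRD-based detection idea — rotate the received vector by Q^H so the effective channel becomes triangular, then take hard decisions during back-substitution — can be sketched as follows. The constellation, dimensions, and nearest-point decision rule here are illustrative assumptions, not the patent's specific method.

```python
import numpy as np

def qrd_detect(H, y, constellation):
    """Toy QRD detector: with H = Q R, Q^H y = R s + noise, so the symbols
    can be estimated bottom-up by back-substitution, slicing each estimate
    to the nearest constellation point before substituting it upward."""
    Q, R = np.linalg.qr(H)
    z = Q.conj().T @ y
    n = H.shape[1]
    s = np.zeros(n, dtype=complex)
    for i in range(n - 1, -1, -1):
        est = (z[i] - R[i, i + 1:] @ s[i + 1:]) / R[i, i]
        s[i] = min(constellation, key=lambda c: abs(c - est))  # hard slice
    return s
```

Because R is triangular, each layer's decision can reuse the already-detected layers below it — the successive-cancellation structure that makes QRD attractive for MIMO receivers.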

Book ChapterDOI
03 Sep 2007
TL;DR: This paper optimizes two of the major components of rectangular SVD, namely QR decomposition of the input matrix and back-transformation of the left singular vectors by the matrix Q, so that large matrix-matrix multiplications can be used efficiently.
Abstract: We propose an approach to speed up the singular value decomposition (SVD) of very large rectangular matrices using the CSX600 floating point coprocessor. The CSX600-based acceleration board we use offers 50GFLOPS of sustained performance, which is many times greater than that provided by standard microprocessors. However, this performance can be achieved only when a vendor-supplied matrix-matrix multiplication routine is used and the matrix size is sufficiently large. In this paper, we optimize two of the major components of rectangular SVD, namely, QR decomposition of the input matrix and back-transformation of the left singular vectors by the matrix Q, so that large matrix-matrix multiplications can be used efficiently. In addition, we use the Integrable SVD algorithm to compute the SVD of an intermediate bidiagonal matrix. This helps to further speed up the computation and reduce the memory requirements. As a result, we achieved up to a 3.5-fold speedup over the Intel Math Kernel Library running on a 3.2GHz Xeon processor when computing the SVD of a 100,000 × 4000 matrix.

14 citations
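The QR-first strategy for rectangular SVD — factor A = QR, take the SVD of the small triangular factor, then back-transform the left singular vectors by Q — can be sketched with NumPy. The bidiagonalization and Integrable-SVD steps of the paper are replaced here by a library SVD of R; this only illustrates why both highlighted components reduce to large matrix products.

```python
import numpy as np

def rect_svd(A):
    """SVD of a tall matrix via an initial QR reduction:
    A = Q R, R = U_r diag(s) V^H, hence A = (Q U_r) diag(s) V^H.
    For m >> n, the QR step and the back-transformation Q @ U_r
    dominate the cost and are both matrix-product friendly."""
    Q, R = np.linalg.qr(A)           # thin QR of the tall input (m x n -> n x n R)
    U_r, s, Vh = np.linalg.svd(R)    # small n x n SVD
    return Q @ U_r, s, Vh            # back-transform the left singular vectors
```

For a 100,000 × 4000 input as in the paper, R is only 4000 × 4000, so the expensive work is concentrated in the two large products that an accelerator handles well.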


Network Information
Related Topics (5)
Optimization problem: 96.4K papers, 2.1M citations, 85% related
Network packet: 159.7K papers, 2.2M citations, 84% related
Robustness (computer science): 94.7K papers, 1.6M citations, 83% related
Wireless network: 122.5K papers, 2.1M citations, 83% related
Wireless sensor network: 142K papers, 2.4M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    31
2022    73
2021    90
2020    132
2019    126
2018    139