QR decomposition

About: QR decomposition is a research topic. Over its lifetime, 3,504 publications have been published within this topic, receiving 100,599 citations. The topic is also known as: QR factorization.


Papers
Proceedings ArticleDOI
21 Jul 2015
TL;DR: The design integrates an SVD module (for precoding) with a QR decomposition (QRD) module (for MIMO signal detection) under a unified hardware framework, and both its throughput rate and gate-count efficiency significantly outperform previous designs.
Abstract: Precoding is an effective scheme in pre-compensating the wireless channel impairments and the singular value decomposition (SVD) scheme is a popular choice. This paper presents a unified high throughput SVD/QRD precoder chip design for MIMO OFDM systems. A hardware-implementation-friendly Givens Rotation (GR) based SVD computing scheme is developed first. It starts with a bi-diagonalization phase followed by an iterative diagonalization phase consisting of successive nullification sweeps. A convergence detection mechanism is employed to terminate the computations once the required precision is achieved. The design successfully integrates the SVD module (for precoding) with the QR decomposition (QRD) module (for MIMO signal detection) under a unified hardware framework. The design features a two-level pipelined, fully parallel architecture, and CORDIC processors are employed to implement the GR modules efficiently. Various design optimization techniques are applied to reduce the circuit complexity and the power consumption. The implementation using TSMC 90nm process technology indicates a throughput rate of 35.75M SVDs per second when operating at 143MHz. Both the throughput rate and the gate-count efficiency of the proposed design significantly outperform previous ones.

14 citations
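The Givens-rotation (GR) kernel at the heart of this design can be illustrated in software. The sketch below is a plain NumPy textbook QR factorization via Givens rotations for real matrices — an illustration of the underlying rotation algebra, not the paper's CORDIC-based hardware implementation.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) such that [[c, -s], [s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, -b / r

def qr_givens(A):
    """QR factorization by successive Givens rotations (real matrices)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):
        # Zero out column j below the diagonal, bottom-up.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1, j], R[i, j])
            G = np.array([[c, -s], [s, c]])
            R[[i - 1, i], j:] = G @ R[[i - 1, i], j:]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T  # accumulate Q
    return Q, R
```

Each rotation touches only two rows, which is what makes the scheme amenable to the pipelined, CORDIC-based hardware the paper describes.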

Journal ArticleDOI
TL;DR: Proposes algorithms for accurate matrix factorizations, named inverse LU and inverse QR factorizations, for extremely ill-conditioned matrices; the algorithms are based on standard numerical algorithms using pure floating-point arithmetic together with an accurate dot product.
Abstract: In this paper, algorithms for accurate matrix factorizations named inverse LU and inverse QR factorizations for extremely ill-conditioned matrices are proposed. The proposed algorithms are based on standard numerical algorithms using pure floating-point arithmetic and accurate dot product. Detailed analysis of the algorithms is presented. As an application of the proposed algorithms, a method of computing accurate solutions of linear systems is also proposed. Numerical results are presented for illustrating the performance of the proposed algorithms. Computing times for the algorithms adaptively change according to the difficulty of given problems.

14 citations
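One way an accurate dot product pays off is in computing residuals for iterative refinement of linear systems. The sketch below uses Python's math.fsum, which compensates only the summation (a genuine accurate dot product, e.g. via error-free transformations, would also capture the rounding of each individual product), so it is a simplified illustration of the idea rather than the paper's algorithm.

```python
import numpy as np
from math import fsum

def accurate_residual(A, x, b):
    """r = b - A @ x, with each row's dot product accumulated by fsum
    (compensated summation; the products themselves are still rounded)."""
    m, n = A.shape
    return np.array([b[i] - fsum(A[i, k] * x[k] for k in range(n))
                     for i in range(m)])

def refine(A, b, iters=3):
    """Classical iterative refinement with a carefully summed residual."""
    x = np.linalg.solve(A, b)          # working-precision solve
    for _ in range(iters):
        r = accurate_residual(A, x, b)  # residual, summed accurately
        x = x + np.linalg.solve(A, r)   # correct the solution
    return x
```

For mildly ill-conditioned systems a few refinement steps drive the residual down to working precision; the paper's inverse factorizations target far more extreme conditioning than this toy example can handle.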

Journal ArticleDOI
TL;DR: This paper examines how the algorithms arising in dense linear systems and linear least-squares problems map onto a custom Linear Algebra Processor, and exposes the benefits of redesigning floating-point units and their surrounding data-paths to support these complicated operations.
Abstract: This paper examines the mapping of algorithms encountered when solving dense linear systems and linear least-squares problems to a custom Linear Algebra Processor. Specifically, the focus is on Cholesky, LU (with partial pivoting), and QR factorizations and their blocked algorithms. As part of the study, we expose the benefits of redesigning floating point units and their surrounding data-paths to support these complicated operations. We show how adding moderate complexity to the architecture greatly alleviates complexities in the algorithm. We study design tradeoffs and the effectiveness of architectural modifications to demonstrate that we can improve power and performance efficiency to a level that can otherwise only be expected of full-custom ASIC designs. A feasibility study of inner kernels is extended to blocked level and shows that, at block level, the Linear Algebra Core (LAC) can achieve high efficiencies with up to 45 GFLOPS/W for both Cholesky and LU factorization, and over 35 GFLOPS/W for QR factorization. While maintaining such efficiencies, our extensions to the MAC units can achieve up to 10, 12, and 20 percent speedup for the blocked algorithms of Cholesky, LU, and QR factorization, respectively.

14 citations
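The blocked structure that makes these factorizations rich in matrix-matrix products can be sketched as follows. This is a generic panel-plus-trailing-update formulation of blocked QR in NumPy (using a library QR for the panel), not the paper's LAC mapping itself.

```python
import numpy as np

def blocked_qr(A, nb=2):
    """Blocked QR: factor one panel of nb columns at a time, then update
    the trailing matrix with a matrix-matrix product (the operation a
    linear-algebra accelerator is designed to execute efficiently)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(0, n, nb):
        jb = min(nb, n - j)
        # Panel factorization: QR of the current block column.
        Qp, Rp = np.linalg.qr(R[j:, j:j + jb], mode='complete')
        R[j:, j:j + jb] = Rp
        # Trailing update: one large matrix-matrix product.
        R[j:, j + jb:] = Qp.T @ R[j:, j + jb:]
        # Accumulate the orthogonal factor.
        Q[:, j:] = Q[:, j:] @ Qp
    return Q, R
```

The panel step is small and latency-bound, while the trailing update dominates the flop count — which is why blocked algorithms reward hardware optimized for dense matrix products.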

Patent
Kenichi Higuchi
30 Jan 2009
TL;DR: In this article, a transmitting device Fourier-transforms symbols in a transmission symbol sequence, maps the Fourier-transformed symbols to subcarriers, inverse-Fourier-transforms the mapped symbols, and transmits the inverse-Fourier-transformed symbols from multiple transmitting antennas.
Abstract: A transmitting device Fourier-transforms symbols in a transmission symbol sequence, maps the Fourier-transformed symbols to subcarriers, inverse-Fourier-transforms the mapped symbols, and transmits the inverse-Fourier-transformed symbols from multiple transmitting antennas. A receiving device Fourier-transforms received signals, extracts signal components mapped to the subcarriers, and estimates the symbols transmitted via the subcarriers by applying a QR decomposition algorithm to the extracted signal components. The receiving device obtains a unitary matrix Q^H such that the product of the unitary matrix Q^H, a weight matrix W determining a correspondence between the transmission symbol sequence and the subcarriers, and a channel matrix H becomes a triangular matrix R, and estimates candidates of the symbols transmitted from the transmitting antennas based on the unitary matrix Q^H and the triangular matrix R.

14 citations
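The QRD-based detection idea — rotate the received vector by Q^H so the effective channel becomes triangular, then take hard decisions during back-substitution — can be sketched as follows. The constellation, dimensions, and nearest-point decision rule here are illustrative assumptions, not the patent's specific method.

```python
import numpy as np

def qrd_detect(H, y, constellation):
    """Toy QRD detector: with H = Q R, Q^H y = R s + noise, so the symbols
    can be estimated bottom-up by back-substitution, slicing each estimate
    to the nearest constellation point before substituting it upward."""
    Q, R = np.linalg.qr(H)
    z = Q.conj().T @ y
    n = H.shape[1]
    s = np.zeros(n, dtype=complex)
    for i in range(n - 1, -1, -1):
        est = (z[i] - R[i, i + 1:] @ s[i + 1:]) / R[i, i]
        s[i] = min(constellation, key=lambda c: abs(c - est))  # hard slice
    return s
```

Because R is triangular, each layer's decision can reuse the already-detected layers below it — the successive-cancellation structure that makes QRD attractive for MIMO receivers.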

Book ChapterDOI
03 Sep 2007
TL;DR: This paper optimizes two of the major components of rectangular SVD, namely QR decomposition of the input matrix and back-transformation of the left singular vectors by the matrix Q, so that large matrix-matrix multiplications can be used efficiently.
Abstract: We propose an approach to speed up the singular value decomposition (SVD) of very large rectangular matrices using the CSX600 floating point coprocessor. The CSX600-based acceleration board we use offers 50GFLOPS of sustained performance, which is many times greater than that provided by standard microprocessors. However, this performance can be achieved only when a vendor-supplied matrix-matrix multiplication routine is used and the matrix size is sufficiently large. In this paper, we optimize two of the major components of rectangular SVD, namely, QR decomposition of the input matrix and back-transformation of the left singular vectors by the matrix Q, so that large matrix-matrix multiplications can be used efficiently. In addition, we use the Integrable SVD algorithm to compute the SVD of an intermediate bidiagonal matrix. This helps to further speed up the computation and reduce the memory requirements. As a result, we achieved up to a 3.5-fold speedup over the Intel Math Kernel Library running on a 3.2GHz Xeon processor when computing the SVD of a 100,000 × 4000 matrix.

14 citations
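The QR-first strategy for rectangular SVD — factor A = QR, take the SVD of the small triangular factor, then back-transform the left singular vectors by Q — can be sketched with NumPy. The bidiagonalization and Integrable-SVD steps of the paper are replaced here by a library SVD of R; this only illustrates why both highlighted components reduce to large matrix products.

```python
import numpy as np

def rect_svd(A):
    """SVD of a tall matrix via an initial QR reduction:
    A = Q R, R = U_r diag(s) V^H, hence A = (Q U_r) diag(s) V^H.
    For m >> n, the QR step and the back-transformation Q @ U_r
    dominate the cost and are both matrix-product friendly."""
    Q, R = np.linalg.qr(A)           # thin QR of the tall input (m x n -> n x n R)
    U_r, s, Vh = np.linalg.svd(R)    # small n x n SVD
    return Q @ U_r, s, Vh            # back-transform the left singular vectors
```

For a 100,000 × 4000 input as in the paper, R is only 4000 × 4000, so the expensive work is concentrated in the two large products that an accelerator handles well.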


Network Information
Related Topics (5)
Optimization problem: 96.4K papers, 2.1M citations, 85% related
Network packet: 159.7K papers, 2.2M citations, 84% related
Robustness (computer science): 94.7K papers, 1.6M citations, 83% related
Wireless network: 122.5K papers, 2.1M citations, 83% related
Wireless sensor network: 142K papers, 2.4M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    31
2022    73
2021    90
2020    132
2019    126
2018    139