Efficient QR Decomposition Using Low Complexity Column-wise Givens Rotation (CGR)

doi:10.1109/VLSID.2014.51

Proceedings Article

- pp 258-263

TLDR

This paper proposes a novel Givens Rotation (GR) based QRD (GR-QRD) that reduces the computational complexity of GR, and implements the algorithm on REDEFINE, a Coarse Grained run-time Reconfigurable Architecture (CGRA).

Abstract:

QR decomposition (QRD) is a widely used Numerical Linear Algebra (NLA) kernel with applications ranging from SONAR beamforming to wireless MIMO receivers. In this paper, we propose a novel Givens Rotation (GR) based QRD (GR-QRD) in which we reduce the computational complexity of GR and exploit a higher degree of parallelism. This low-complexity Column-wise GR (CGR) can annihilate multiple elements of a matrix column simultaneously. The algorithm is first realized on a Two-Dimensional (2D) systolic array and then implemented on REDEFINE, a Coarse Grained run-time Reconfigurable Architecture (CGRA). We benchmark the proposed implementation against state-of-the-art implementations and report better throughput, convergence and scalability.
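For orientation, the classical element-by-element Givens Rotation QRD that the abstract takes as its starting point can be sketched as follows. This is the textbook baseline, not the paper's CGR: it annihilates one sub-diagonal element per rotation, whereas CGR is described as annihilating several elements of a column at once. The function name `givens_qr` and the list-of-lists matrix representation are assumptions made for illustration only.

```python
import math


def givens_qr(a):
    """Textbook Givens-rotation QR (illustrative baseline, not the paper's CGR).

    `a` is an m x n matrix given as a list of rows. Returns (q, r) such that
    a = q * r, with q orthogonal (m x m) and r upper triangular (m x n).
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]  # working copy, becomes R
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(min(m - 1, n)):        # columns that have sub-diagonal entries
        for i in range(m - 1, j, -1):     # annihilate bottom-up, one entry at a time
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue                  # entry is already zero
            h = math.hypot(x, y)
            c, s = x / h, y / h           # cosine and sine of the rotation angle
            # Rotate rows i-1 and i of R: [c s; -s c] maps (x, y) to (h, 0).
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            r[i][j] = 0.0                 # clear rounding residue
            # Accumulate Q by applying the transposed rotation to its columns.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r
```

Calling `givens_qr` on a 3 x 3 matrix performs three rotations in sequence, one per sub-diagonal element. Within a column each rotation depends on the result of the previous one, which is the serial dependency that a column-wise scheme aims to relax.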

Citations

Proceedings Article

Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations

Farhad Merchant, +10 more

TL;DR: This paper carries out extensive micro-architectural exploration for accelerating core kernels like Matrix Multiplication (MM) (BLAS-3) for LU and QR factorizations and achieves up to 8x speed-up for MM in a CGRA environment.

Proceedings Article

Scalable and energy-efficient reconfigurable accelerator for column-wise givens rotation

Zoltan Endre Rakossy, +4 more

TL;DR: A new layered reconfigurable architecture is proposed that exploits modularity, scalability and flexibility to achieve high energy efficiency and memory bandwidth, with a clean trade-off of execution speed versus area at relatively constant energy.

Proceedings Article

Efficient and scalable CGRA-based implementation of Column-wise Givens Rotation

Zoltan Endre Rakossy, +4 more

TL;DR: These algorithms allow annihilation of multiple elements in a column of the input matrix simultaneously, without a dependency bottleneck, allowing increased parallelism, resource sharing and scalability.

An Efficient Speech Perceptual Hashing Authentication Algorithm Based on Wavelet Packet Decomposition

Zhang Qiuyu, +4 more

TL;DR: The experimental results illustrate that the proposed algorithm is very robust to content-preserving operations, has a very low hash bit rate, and can meet the requirements of real-time speech authentication with high certification efficiency.

Journal Article

Efficient Realization of Householder Transform through Algorithm-Architecture Co-design for Acceleration of QR Factorization

Farhad Merchant, +5 more

- 14 Dec 2016 -

arXiv: Performance

TL;DR: Efficient realization of Householder Transform (HT) based QR factorization through algorithm-architecture co-design is presented, achieving a performance improvement of 3-90x in terms of Gflops/watt over state-of-the-art multicores, General Purpose Graphics Processing Units (GPGPUs), Field Programmable Gate Arrays (FPGAs), and ClearSpeed CSX700.

References

Book

Matrix computations (3rd ed.)

Gene H. Golub, +1 more

Proceedings Article

Matrix Triangularization By Systolic Arrays

W. M. Gentleman, +1 more

TL;DR: In this paper, a unified concept of using systolic arrays to perform real-time triangularization for both general and band matrices is presented, and a framework is presented for the solution of linear systems with pivoting and for least squares computations.

Least squares computation by Givens transformation without square roots

W. E. Gentleman

Journal Article

Least Squares Computations by Givens Transformations Without Square Roots

W. Morven Gentleman

- 01 Dec 1973 -

IMA Journal of Applied Mathematics

Journal Article

On Stable Parallel Linear System Solvers

Ahmed H. Sameh, +1 more

- 01 Jan 1978 -

Journal of the ACM

TL;DR: Three stable parallel algorithms for solving dense and tridiagonal systems of linear equations are discussed and one of the algorithms presented here is superior to the best previous algorithm in that with a modest increase in time.

Related Papers (5)

On systolic arrays for recursive complex Householder transformations with applications to array processing

C.F.T. Tang, +2 more

Virtual Systolic Array for QR Decomposition

Jakub Kurzak, +4 more

An improved hardware design for matrix inverse based on systolic array QR decomposition and piecewise polynomial approximation

L. Canche Santos, +5 more

Dataflow Systolic Array Implementations of Matrix Decomposition Using High Level Synthesis

Jie Liu, +1 more

On a systolic implementation and the numerical properties of a multiple constrained adaptive beamformer

Bin Yang, +1 more