Proceedings Article
Efficient QR Decomposition Using Low Complexity Column-wise Givens Rotation (CGR)
Farhad Merchant, Anupam Chattopadhyay, Ganesh Garga, S. K. Nandy, Ranjani Narayan, Nandhini Gopalan
pp. 258-263
TL;DR
A novel Givens Rotation (GR) based QRD (GR-QRD) is proposed in which the computational complexity of GR is reduced; the algorithm is implemented on REDEFINE, a Coarse Grained run-time Reconfigurable Architecture (CGRA).
Abstract:
QR decomposition (QRD) is a widely used Numerical Linear Algebra (NLA) kernel with applications ranging from SONAR beamforming to wireless MIMO receivers. In this paper, we propose a novel Givens Rotation (GR) based QRD (GR-QRD) in which we reduce the computational complexity of GR and exploit a higher degree of parallelism. This low-complexity Column-wise GR (CGR) can annihilate multiple elements of a column of a matrix simultaneously. The algorithm is first realized on a Two-Dimensional (2D) systolic array and then implemented on REDEFINE, a Coarse Grained run-time Reconfigurable Architecture (CGRA). We benchmark the proposed implementation against state-of-the-art implementations and report better throughput, convergence, and scalability.
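For reference, a minimal plain-Python sketch of classical Givens-rotation QR is given below. It is the baseline GR-QRD whose complexity CGR is said to reduce, not the authors' CGR itself. Each rotation zeroes one sub-diagonal entry by mixing two adjacent rows, and the same rotation is accumulated into Q. The names givens and givens_qr are hypothetical, and matrices are assumed to be lists of row lists of floats.

```python
import math


def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b] = [r, 0], r = hypot(a, b)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to annihilate: use the identity rotation
    return a / r, b / r


def givens_qr(A):
    """Classical Givens-rotation QR of an m x n matrix: returns (Q, R), A = Q R.

    Sub-diagonal entries are annihilated column by column, bottom-up,
    one entry per rotation (the sequential baseline, not CGR).
    """
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]                                   # becomes R
    Q = [[float(i == j) for j in range(m)] for i in range(m)]   # accumulates Q
    for j in range(n):
        for i in range(m - 1, j, -1):          # zero R[i][j] using row i-1
            c, s = givens(R[i - 1][j], R[i][j])
            for k in range(n):                 # R <- G R on rows i-1, i
                x, y = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * x + s * y
                R[i][k] = -s * x + c * y
            for k in range(m):                 # Q <- Q G^T on columns i-1, i
                x, y = Q[k][i - 1], Q[k][i]
                Q[k][i - 1] = c * x + s * y
                Q[k][i] = -s * x + c * y
    return Q, R
```

For example, givens_qr([[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]) returns an upper-triangular R whose first diagonal entry is 14 up to rounding. Because each rotation touches only two rows, rotations on disjoint row pairs are independent of each other, which is the kind of parallelism systolic-array realizations exploit.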
Citations
Proceedings Article
Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations
Farhad Merchant, Arka Maity, Mahesh Mahadurkar, Kapil Vatwani, Ishan Munje, Madhava Krishna, S. Nalesh, Nandhini Gopalan, Soumyendu Raha, S. K. Nandy, Ranjani Narayan
TL;DR: This paper carries out extensive micro-architectural exploration for accelerating core kernels like Matrix Multiplication (MM) (BLAS-3) for LU and QR factorizations and achieves up to 8x speed-up for MM in a CGRA environment.
Proceedings Article
Scalable and energy-efficient reconfigurable accelerator for column-wise Givens rotation
TL;DR: A new layered reconfigurable architecture is proposed that exploits modularity, scalability, and flexibility to achieve high energy efficiency and memory bandwidth; it achieves a clean trade-off of execution speed versus area while keeping energy relatively constant.
Proceedings Article
Efficient and scalable CGRA-based implementation of Column-wise Givens Rotation
TL;DR: These algorithms allow multiple elements in a column of the input matrix to be annihilated simultaneously, without a dependency bottleneck, allowing increased parallelism, resource sharing, and scalability.
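Why several elements of one column can be handled at once can be illustrated with classical bottom-up Givens rotations. When rotation k acts on entries k-1 and k, entry k already holds the running norm of everything below it, so every (c, s) pair depends only on the original column and its suffix norms. Once those norms are known (they form a scan), all pairs are independent. The sketch below shows this for a single column only. It is not the CGR algorithm from these papers, whose reduced-complexity update of the remaining columns is not reproduced here. The names column_rotation_params and apply_rotations are hypothetical.

```python
import math


def column_rotation_params(x):
    """All Givens (c, s) pairs that zero x[1:] bottom-up, computed up front.

    Assumes len(x) >= 1. t[k] is the value entry k holds when rotation k is
    applied: x[n-1] itself (sign kept) for the bottom entry, otherwise the
    suffix norm ||x[k:]||. Rotation k (on entries k-1, k) then uses
    c = x[k-1] / t[k-1] and s = t[k] / t[k-1].
    """
    n = len(x)
    t = [0.0] * n
    t[n - 1] = x[n - 1]                  # bottom entry enters unchanged
    for k in range(n - 2, -1, -1):       # suffix scan of running norms
        t[k] = math.hypot(x[k], t[k + 1])
    params = []
    for k in range(n - 1, 0, -1):        # bottom-up order; pairs are independent
        if t[k - 1] == 0.0:
            params.append((k, 1.0, 0.0))  # both entries zero: identity
        else:
            params.append((k, x[k - 1] / t[k - 1], t[k] / t[k - 1]))
    return params


def apply_rotations(x, params):
    """Apply the precomputed rotations in order to a copy of x (a check)."""
    y = list(x)
    for k, c, s in params:
        a, b = y[k - 1], y[k]
        y[k - 1] = c * a + s * b
        y[k] = -s * a + c * b
    return y
```

For x = [3.0, -4.0, 12.0], applying the precomputed rotations yields approximately [13.0, 0.0, 0.0], i.e. the column norm on top and the rest annihilated.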
An Efficient Speech Perceptual Hashing Authentication Algorithm Based on Wavelet Packet Decomposition
TL;DR: The experimental results show that the proposed algorithm is very robust to content-preserving operations, has a very low hash bit rate, and can meet the requirements of real-time speech authentication with high certification efficiency.
Journal Article
Efficient Realization of Householder Transform through Algorithm-Architecture Co-design for Acceleration of QR Factorization
Farhad Merchant, Tarun Vatwani, Anupam Chattopadhyay, Soumyendu Raha, S. K. Nandy, Ranjani Narayan
TL;DR: Efficient realization of Householder Transform (HT) based QR factorization through algorithm-architecture co-design is presented, achieving a performance improvement of 3-90x in terms of Gflops/watt over state-of-the-art multicore processors, General Purpose Graphics Processing Units (GPGPUs), Field Programmable Gate Arrays (FPGAs), and the ClearSpeed CSX700.
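Householder-based QR zeroes an entire sub-column with a single reflection H = I - 2 v v^T / (v^T v), rather than one entry per rotation. A minimal plain-Python sketch of textbook Householder QR follows. It is not the co-designed realization from the paper above, and householder_qr is a hypothetical name. Matrices are assumed to be lists of row lists of floats.

```python
import math


def householder_qr(A):
    """Textbook Householder QR of an m x n matrix: returns (Q, R), A = Q R."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, n)):
        x = [R[i][j] for i in range(j, m)]           # column j, diagonal down
        # Reflect x onto alpha * e1; the sign opposite to x[0] avoids cancellation.
        alpha = -math.copysign(math.hypot(*x), x[0])
        v = x[:]
        v[0] -= alpha                                 # v = x - alpha * e1
        vnorm2 = sum(vi * vi for vi in v)
        if vnorm2 == 0.0:
            continue                                  # already zero below diagonal
        for k in range(j, n):                         # R <- H R (columns j..n-1)
            f = 2.0 * sum(v[i - j] * R[i][k] for i in range(j, m)) / vnorm2
            for i in range(j, m):
                R[i][k] -= f * v[i - j]
        for k in range(m):                            # Q <- Q H (H is symmetric)
            f = 2.0 * sum(Q[k][i] * v[i - j] for i in range(j, m)) / vnorm2
            for i in range(j, m):
                Q[k][i] -= f * v[i - j]
    return Q, R
```

With this sign convention, householder_qr([[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]) gives R[0][0] = -14 up to rounding. The multi-argument form of math.hypot requires Python 3.8 or later.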
References
Proceedings Article
Matrix Triangularization By Systolic Arrays
TL;DR: In this paper, a unified concept of using systolic arrays to perform real-time triangularization for both general and band matrices is presented, and a framework is presented for the solution of linear systems with pivoting and for least squares computations.
Journal Article
On Stable Parallel Linear System Solvers
Ahmed H. Sameh, David J. Kuck
TL;DR: Three stable parallel algorithms for solving dense and tridiagonal systems of linear equations are discussed, and one of the algorithms presented here is superior to the best previous algorithm in that with a modest increase in time.