Topic

QR decomposition

About: QR decomposition is a research topic. Over its lifetime, 3,504 publications have been published on this topic, receiving 100,599 citations. The topic is also known as QR factorization.


Papers
Proceedings Article
01 Sep 2007
TL;DR: This paper reports on a highly optimized 4×4 MMSE detector implementation that resulted in a real-time FPGA based implementation on a Xilinx Virtex-II 6000 part that delivers over 420 Mbps sustained throughput, with a small 2.77 μs latency.
Abstract: This paper reports on a highly optimized 4×4 MMSE detector implementation. The work resulted in a real-time FPGA based implementation on a Xilinx Virtex-II 6000 part. It utilizes 8,513 logic slices, 64 multipliers, and 23 Block RAMs (less than 30% of the overall resources of this part). The design delivers over 420 Mbps sustained throughput, with a small 2.77 μs latency. Three main techniques are responsible for the improvements over other MIMO detectors reported in the literature. They are: (a) the combination of a modified Gram-Schmidt QR decomposition algorithm with Square-Root linear MMSE detection; (b) a dynamic scaling algorithm that enhances numerical stability; and (c) an aggressive time-shared VLSI architecture. The above techniques are quite general and are readily applicable to any MIMO detector implementation.

15 citations
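
For readers unfamiliar with technique (a) in the abstract above, the following is a minimal floating-point sketch of modified Gram-Schmidt QR decomposition. It is not the paper's FPGA design: the function name `mgs_qr`, the list-of-rows matrix layout, and the assumption of full column rank are illustrative choices, and the dynamic scaling and square-root MMSE steps are not shown.

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Assumes full column rank (hypothetical helper for illustration only).
    Returns (q, r) with q an m x n matrix with orthonormal columns and
    r an n x n upper-triangular matrix such that a = q * r.
    """
    m, n = len(a), len(a[0])
    # v[j] holds column j of a, progressively orthogonalized in place.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalize the current column to obtain q_j.
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        qj = [x / r[j][j] for x in v[j]]
        q_cols.append(qj)
        # "Modified" step: remove the q_j component from every remaining
        # column right away, using the already-updated columns.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * v[k][i] for i in range(m))
            v[k] = [v[k][i] - r[j][k] * qj[i] for i in range(m)]
    # Convert the list of columns back to a list of rows.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


if __name__ == "__main__":
    q, r = mgs_qr([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    print(r)
```

Updating the remaining columns immediately, rather than projecting each original column against all previous q vectors at once, is what distinguishes the modified variant from classical Gram-Schmidt and is the usual reason it is preferred in finite-precision arithmetic.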

Journal Article
01 Jul 1989
TL;DR: The synchronization cost of the algorithm proposed in this article is bounded by O(n²/p) when m ≥ n, which is important for machines where synchronization cost is high and when m ≫ n.
Abstract: A new algorithm for computing an orthogonal decomposition of a rectangular m × n matrix A on a shared-memory parallel computer is described. The algorithm uses Givens rotations, and has the feature that its synchronization cost is low. In particular, for a multiprocessor having p processors, an analysis of the algorithm shows that this cost is O(n²/p) if m/p ≥ n, and O(mn/p²) if m/p < n. Note that in the latter case, the synchronization cost is smaller than O(n²/p). Therefore, the synchronization cost of the algorithm proposed in this article is bounded by O(n²/p) when m ≥ n. This is important for machines where synchronization cost is high, and when m ≫ n. Analysis and experiments show that the algorithm is effective in balancing the load and producing high efficiency (speedup).

15 citations
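
For orientation, the following is a minimal sequential sketch of QR factorization by Givens rotations, the elementary operation the abstract above builds on. It is not the parallel algorithm of the paper and says nothing about synchronization cost; the function name `givens_qr` and the bottom-up elimination order within each column are assumptions chosen for illustration.

```python
import math


def givens_qr(a):
    """QR factorization of an m x n matrix (list of rows) by Givens rotations.

    Hypothetical illustrative helper. Returns (q, r) with q an m x m
    orthogonal matrix and r an m x n upper-triangular matrix, a = q * r.
    """
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]
    # qt accumulates the product of all rotations, i.e. Q transposed.
    qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero the entries below the diagonal of column j, bottom-up,
        # each time rotating two adjacent rows.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue  # entry is already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h
            # Apply the same rotation to rows i-1 and i of both r and qt.
            for mat in (r, qt):
                for k in range(len(mat[0])):
                    top, bot = mat[i - 1][k], mat[i][k]
                    mat[i - 1][k] = c * top + s * bot
                    mat[i][k] = -s * top + c * bot
    q = [[qt[j][i] for j in range(m)] for i in range(m)]
    return q, r


if __name__ == "__main__":
    q, r = givens_qr([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    print(r)
```

Each rotation touches only two rows, which is the property that makes Givens-based schemes attractive for distributing work across processors.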

Journal Article
TL;DR: Simulation results show that the proposed algorithms offer improved performance over the conventional PAST algorithm and a comparable performance to the Kalman filter with variable measurement subspace tracking algorithm, which requires a considerably higher arithmetic complexity.
Abstract: This paper proposes a new local polynomial modeling based variable forgetting factor (VFF) and variable regularized (VR) projection approximation subspace tracking (PAST) algorithm, which is based on a novel VR-VFF recursive least squares (RLS) algorithm with multiple outputs. The subspace to be estimated is modeled as a local polynomial model so that a new locally optimal forgetting factor (LOFF) can be obtained by minimizing the resulting mean square deviation of the RLS algorithm after using the projection approximation. An $l_2$-regularization term is also incorporated into the LOFF-PAST algorithm to reduce the estimation variance of the subspace during signal fading. The proposed LOFF-VR-PAST algorithm can be implemented by the conventional RLS algorithm as well as the numerically more stable QR decomposition. Applications of the proposed algorithms to subspace-based direction-of-arrival estimation under stationary and nonstationary environments are presented to validate their effectiveness. Simulation results show that the proposed algorithms offer improved performance over the conventional PAST algorithm and a comparable performance to the Kalman filter with variable measurement subspace tracking algorithm, which requires a considerably higher arithmetic complexity. The new LOFF-VR-RLS algorithm may also be applicable to other RLS problems involving multiple outputs.

15 citations

Proceedings Article
05 Jan 2014
TL;DR: A novel Givens Rotation (GR) based QRD (GR-QRD) where the computational complexity of GR is reduced and the algorithm is implemented on REDEFINE which is a Coarse Grained run-time Reconfigurable Architecture (CGRA).
Abstract: QR decomposition (QRD) is a widely used Numerical Linear Algebra (NLA) kernel with applications ranging from SONAR beamforming to wireless MIMO receivers. In this paper, we propose a novel Givens Rotation (GR) based QRD (GR-QRD) where we reduce the computational complexity of GR and exploit a higher degree of parallelism. This low complexity Column-wise GR (CGR) can annihilate multiple elements of a column of a matrix simultaneously. The algorithm is first realized on a Two-Dimensional (2D) systolic array and then implemented on REDEFINE which is a Coarse Grained run-time Reconfigurable Architecture (CGRA). We benchmark the proposed implementation against state-of-the-art implementations to report better throughput, convergence and scalability.

15 citations

Journal Article
TL;DR: A novel most significant digit first CORDIC architecture is presented that is suitable for the VLSI design of systolic array processor cells for performing QR decomposition and it is shown that simplifying the calculation of convergence bounds also greatly simplifies the derivation of suitable VLSI architectures.
Abstract: A novel most significant digit first CORDIC architecture is presented that is suitable for the VLSI design of systolic array processor cells for performing QR decomposition. This is based on an online CORDIC algorithm with a constant scale factor and a latency independent of the wordlength. This has been derived through the extension of previously published CORDIC algorithms. It is shown that simplifying the calculation of convergence bounds also greatly simplifies the derivation of suitable VLSI architectures. Design studies, based on a 0.35-μm CMOS standard cell process, indicate that 20 such QR processor cells operating at rates suitable for radar beamforming can be readily accommodated on a single chip.

15 citations
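
As background to the abstract above, the following sketch shows conventional CORDIC in vectoring mode, which rotates a pair (x, y) onto the x-axis using only shift-and-add style updates and so yields the magnitude and angle a Givens rotation needs. It is not the most-significant-digit-first online architecture of the paper: it uses ordinary floating-point arithmetic, and the function name `cordic_vectoring`, the iteration count, and the restriction to x ≥ 0 are assumptions made for illustration.

```python
import math


def cordic_vectoring(x, y, iterations=32):
    """Conventional CORDIC, vectoring mode (illustrative sketch only).

    Assumes x >= 0. Returns (magnitude, angle) of the vector (x, y),
    with the angle in radians.
    """
    gain = 1.0   # accumulated scale factor of the micro-rotations
    angle = 0.0  # accumulated rotation angle
    for i in range(iterations):
        # Choose the rotation direction that drives y toward zero.
        d = -1.0 if y > 0 else 1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        angle -= d * math.atan(shift)  # a table lookup in hardware
        # Every iteration rotates (d is never zero), so the total gain
        # does not depend on the input data.
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, angle


if __name__ == "__main__":
    print(cordic_vectoring(3.0, 4.0))
```

Because a micro-rotation is performed at every step regardless of the data, the scale factor is a constant that can be compensated once, which is the property the abstract refers to as a constant scale factor.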


Network Information
Related Topics (5)
Optimization problem: 96.4K papers, 2.1M citations, 85% related
Network packet: 159.7K papers, 2.2M citations, 84% related
Robustness (computer science): 94.7K papers, 1.6M citations, 83% related
Wireless network: 122.5K papers, 2.1M citations, 83% related
Wireless sensor network: 142K papers, 2.4M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    31
2022    73
2021    90
2020    132
2019    126
2018    139