
Showing papers on "QR decomposition" published in 2008


Journal ArticleDOI
TL;DR: iSAM is efficient even for robot trajectories with many loops as it avoids unnecessary fill-in in the factor matrix by periodic variable reordering and provides efficient algorithms to access the estimation uncertainties of interest based on the factored information matrix.
Abstract: In this paper, we present incremental smoothing and mapping (iSAM), which is a novel approach to the simultaneous localization and mapping problem that is based on fast incremental matrix factorization. iSAM provides an efficient and exact solution by updating a QR factorization of the naturally sparse smoothing information matrix, thereby recalculating only those matrix entries that actually change. iSAM is efficient even for robot trajectories with many loops as it avoids unnecessary fill-in in the factor matrix by periodic variable reordering. Also, to enable data association in real time, we provide efficient algorithms to access the estimation uncertainties of interest based on the factored information matrix. We systematically evaluate the different components of iSAM as well as the overall algorithm using various simulated and real-world datasets for both landmark and pose-only settings.
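The incremental update described above can be illustrated with a minimal sketch, which is not the iSAM implementation. It assumes a small dense upper triangular factor R with rotated right-hand side d, and folds in one new measurement row using Givens rotations, so that only the entries touched by the new row change. The function names (`givens`, `add_row`, `back_substitute`) are hypothetical, and variable reordering and sparse storage are left out.

```python
import math

def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def add_row(R, d, w, y):
    """Fold a new measurement row w (right-hand side y) into the system (R, d), in place.

    R is an n x n upper triangular list of rows and d is the rotated right-hand side.
    Columns where the new row is zero are skipped, so untouched rows of R stay as they are.
    """
    n = len(R)
    w = list(w)
    for k in range(n):
        if w[k] == 0.0:
            continue  # nothing to eliminate in this column
        c, s = givens(R[k][k], w[k])
        for j in range(k, n):
            rkj, wj = R[k][j], w[j]
            R[k][j] = c * rkj + s * wj
            w[j] = -s * rkj + c * wj
        d[k], y = c * d[k] + s * y, -s * d[k] + c * y
    return R, d

def back_substitute(R, d):
    """Solve R x = d for upper triangular R with non-zero diagonal."""
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = d[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x
```

Calling `add_row` once per new measurement and then `back_substitute` gives the least-squares estimate for all measurements seen so far, without refactoring the earlier rows.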

1,091 citations


Journal ArticleDOI
TL;DR: A robust approach is presented which removes the sparsity of the block-structured least-squares equations by a direct application of the QR decomposition and considerable savings in terms of computation time and memory requirements are obtained.
Abstract: Broadband macromodeling of large multiport systems by vector fitting can be time consuming and resource demanding when all elements of the system matrix share a common set of poles. This letter presents a robust approach which removes the sparsity of the block-structured least-squares equations by a direct application of the QR decomposition. A 60-port printed circuit board example illustrates that considerable savings in terms of computation time and memory requirements are obtained.

473 citations


Posted Content
TL;DR: In this article, the authors present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR.
Abstract: We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. We prove optimality by extending known lower bounds on communication bandwidth for sequential and parallel matrix multiplication to provide latency lower bounds, and show these bounds apply to the LU and QR decompositions. We not only show that our QR algorithms attain these lower bounds (up to polylogarithmic factors), but that existing LAPACK and ScaLAPACK algorithms perform asymptotically more communication. We also point out recent LU algorithms in the literature that attain at least some of these lower bounds.

300 citations


01 Jan 2008
TL;DR: The Fortran subroutine BVLS (bounded variable least-squares) solves linear least-squares problems with upper and lower bounds on the variables, using an active set strategy, and is used to solve minimum l1 and l∞ fitting problems.
Abstract: The Fortran subroutine BVLS (bounded variable least-squares) solves linear least-squares problems with upper and lower bounds on the variables, using an active set strategy. The unconstrained least-squares problems for each candidate set of free variables are solved using the QR decomposition. BVLS has a “warm-start” feature permitting some of the variables to be initialized at their upper or lower bounds, which speeds the solution of a sequence of related problems. Such sequences of problems arise, for example, when BVLS is used to find bounds on linear functionals of a model constrained to satisfy, in an approximate lp-norm sense, a set of linear equality constraints in addition to upper and lower bounds. We show how to use BVLS to solve that problem when p = 1, 2, or ∞, and to solve minimum l1 and l∞ fitting problems. FORTRAN 77 code implementing BVLS is available from the statlib gopher at Carnegie Mellon University.

163 citations


Journal IssueDOI
TL;DR: An algorithm for the QR factorization is presented in which the operations are represented as a sequence of small tasks that operate on square blocks of data (referred to as ‘tiles’); its performance is compared with the LAPACK algorithm, where parallelism can be exploited only at the level of the BLAS operations, and with vendor implementations.
Abstract: As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of the architectural features on these new processors. Fine-grain parallelism becomes a major requirement and introduces the necessity of loose synchronization in the parallel execution of an operation. This paper presents an algorithm for the QR factorization where the operations can be represented as a sequence of small tasks that operate on square blocks of data (referred to as ‘tiles’). These tasks can be dynamically scheduled for execution based on the dependencies among them and on the availability of computational resources. This may result in an out-of-order execution of the tasks that will completely hide the presence of intrinsically sequential tasks in the factorization. Performance comparisons are presented with the LAPACK algorithm for QR factorization where parallelism can be exploited only at the level of the BLAS operations and with vendor implementations. Copyright © 2008 John Wiley & Sons, Ltd.

151 citations


Proceedings ArticleDOI
01 Nov 2008
TL;DR: An optimized fixed-point VLSI implementation of the modified Gram-Schmidt (MGS) QRD algorithm that incorporates regularization and additional sorting of the MIMO channel matrix is presented; a comparison with a Givens rotation (GR)-based implementation clearly showed the superiority of the GR-based solution in terms of area, processing cycles, and throughput.
Abstract: The QR decomposition (QRD) is an important prerequisite for many different detection algorithms in multiple-input multiple-output (MIMO) wireless communication systems. This paper presents an optimized fixed-point VLSI implementation of the modified Gram-Schmidt (MGS) QRD algorithm that incorporates regularization and additional sorting of the MIMO channel matrix. Integrated in 0.18 μm CMOS technology, the proposed VLSI architecture processes up to 1.56 million complex-valued 4×4-dimensional matrices per second. The implementation results of this work are extensively compared to the Givens rotation (GR)-based QRD implementation of Luethi et al., ISCAS 2007. In order to ensure a fair comparison, both QRD circuits have been integrated in the same IC manufacturing technology, with equal functionality, and the same numeric precision. The comparison of the implementation results clearly showed superiority of the GR-based VLSI solution in terms of area, processing cycles, and throughput.
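For readers unfamiliar with the MGS recurrence, the following is a minimal floating-point sketch of modified Gram-Schmidt QR for a complex matrix. It is not the fixed-point VLSI design of the paper: regularization and sorting are omitted, full column rank is assumed, and the function name `mgs_qr` is hypothetical.

```python
import math

def mgs_qr(A):
    """Modified Gram-Schmidt QR of a complex matrix given as a list of rows.

    Returns (Q, R) as lists of rows with A = Q R, where Q has orthonormal
    columns and R is upper triangular. Assumes full column rank.
    """
    m, n = len(A), len(A[0])
    # Work on columns, since MGS orthogonalizes one column at a time.
    V = [[complex(A[i][j]) for i in range(m)] for j in range(n)]
    Q_cols = []
    R = [[0j] * n for _ in range(n)]
    for k in range(n):
        norm = math.sqrt(sum(abs(x) ** 2 for x in V[k]))
        R[k][k] = complex(norm)
        q = [x / norm for x in V[k]]
        Q_cols.append(q)
        # Remove the q-component from every remaining column right away;
        # this immediate update is what distinguishes MGS from classical GS.
        for j in range(k + 1, n):
            R[k][j] = sum(qi.conjugate() * vi for qi, vi in zip(q, V[j]))
            V[j] = [vi - R[k][j] * qi for qi, vi in zip(q, V[j])]
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R
```

For a 4×4 channel matrix, `Q, R = mgs_qr(H)` yields the triangular factor that a detector would operate on after multiplying the received vector by the conjugate transpose of Q.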

71 citations


Journal ArticleDOI
TL;DR: It is proven that the throughput of this scheme scales as M log log(K) and asymptotically (K → ∞) tends to the sum-capacity of the multiple-input multiple-output (MIMO) broadcast channel.
Abstract: A simple signaling method for broadcast channels with multiple-transmit multiple-receive antennas is proposed. In this method, for each user, the direction in which the user has the maximum gain is determined. The best user in terms of the largest gain is selected. The corresponding direction is used as the modulation vector (MV) for the data stream transmitted to the selected user. The algorithm proceeds in a recursive manner where in each step, the search for the best direction is performed in the null space of the previously selected MVs. It is demonstrated that with the proposed method, each selected MV has no interference on the previously selected MVs. Dirty-paper coding is used to cancel the remaining interference. For the case that each receiver has one antenna, the presented scheme coincides with the known scheme based on Gram-Schmidt orthogonalization (QR decomposition). To analyze the performance of the scheme, an upper bound on the cumulative distribution function (CDF) of each subchannel is derived which is used to establish the diversity order and the asymptotic sum-rate of the scheme. It is shown that using fixed rate codebooks, the diversity order of the jth data stream, 1 ≤ j ≤ M, is equal to N(M - j + 1)(K - j + 1), where M, N, and K indicate the number of transmit antennas, the number of receive antennas, and the number of users, respectively. Furthermore, it is proven that the throughput of this scheme scales as M log log(K) and asymptotically (K → ∞) tends to the sum-capacity of the multiple-input multiple-output (MIMO) broadcast channel. The simulation results indicate that the achieved sum-rate is close to the sum-capacity of the underlying broadcast channel.

62 citations


Posted Content
12 Jun 2008
TL;DR: Both parallel and sequential performance results show that TSQR outperforms competing methods, and CAQR (Communication-Avoiding QR), which factors general rectangular matrices distributed in a two-dimensional block cyclic layout, removes a latency bottleneck in ScaLAPACK's current parallel approach.
Abstract: We present parallel and sequential dense QR factorization algorithms that are optimized to avoid communication. Some of these are novel, and some extend earlier work. Communication includes both messages between processors (in the parallel case), and data movement between slow and fast memory (in either the sequential or parallel cases). Our first algorithm, Tall Skinny QR (TSQR), factors m × n matrices in a one-dimensional (1-D) block cyclic row layout, storing the Q factor (if desired) implicitly as a tree of blocks of Householder reflectors. TSQR is optimized for matrices with many more rows than columns (hence the name). In the parallel case, TSQR requires no more than the minimum number of messages Θ(log P) between P processors. In the sequential case, TSQR transfers 2mn + o(mn) words between slow and fast memory, which is the theoretical lower bound, and performs Θ(mn/W) block reads and writes (as a function of the fast memory size W), which is within a constant factor of the theoretical lower bound. In contrast, the conventional parallel algorithm as implemented in ScaLAPACK requires Θ(n log P) messages, a factor of n times more, and the analogous sequential algorithm transfers Θ(mn²) words between slow and fast memory, also a factor of n times more. TSQR only uses orthogonal transforms, so it is just as stable as standard Householder QR. Both parallel and sequential performance results show that TSQR outperforms competing methods. Our second algorithm, CAQR (Communication-Avoiding QR), factors general rectangular matrices distributed in a two-dimensional block cyclic layout. It invokes TSQR for each block column factorization, which removes a latency bottleneck in ScaLAPACK’s current parallel approach as well as both bandwidth and latency bottlenecks in ScaLAPACK’s out-of-core QR factorization. CAQR achieves modeled speedups of 2.1× on an IBM POWER5 cluster, 3.0× on a future petascale machine, and 3.8× on the Grid.
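The reduction structure behind TSQR can be sketched in a few lines. The example below is a serial illustration under simplifying assumptions, not the authors' code: it computes only the R factor, splits the rows into contiguous blocks rather than a block cyclic layout, and uses a small dense Householder routine as the local factorization. The names `householder_r` and `tsqr_r` are hypothetical.

```python
import math

def householder_r(A):
    """Return the n x n upper triangular factor R of an m x n real matrix, m >= n."""
    A = [list(map(float, row)) for row in A]
    m, n = len(A), len(A[0])
    for k in range(n):
        x = [A[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # Reflect x onto a multiple of e_1; the sign choice avoids cancellation.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        for j in range(k, n):
            dot = sum(v[i] * A[k + i][j] for i in range(len(v)))
            f = 2.0 * dot / vtv
            for i in range(len(v)):
                A[k + i][j] -= f * v[i]
    return [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]

def tsqr_r(A, num_blocks):
    """R factor of a tall-skinny matrix via local QRs and a binary reduction tree."""
    m, n = len(A), len(A[0])
    size = m // num_blocks
    if size < n:
        raise ValueError("each row block needs at least n rows")
    blocks = [A[b * size:(b + 1) * size] for b in range(num_blocks - 1)]
    blocks.append(A[(num_blocks - 1) * size:])
    # Leaves of the tree: one independent QR per row block.
    rs = [householder_r(block) for block in blocks]
    # Inner nodes: stack pairs of R factors and factor the 2n x n result.
    while len(rs) > 1:
        rs = [householder_r(rs[i] + (rs[i + 1] if i + 1 < len(rs) else []))
              for i in range(0, len(rs), 2)]
    return rs[0]
```

In a parallel setting each leaf would live on its own processor and each tree level would cost one message per pair, which is where the Θ(log P) message count comes from. The resulting R agrees with that of a direct factorization up to the signs of its rows.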

58 citations


Proceedings ArticleDOI
12 May 2008
TL;DR: A processor based complex-valued QR decomposition is presented that is enhanced with complex arithmetic and inverse square root function units and fits well with the real-time requirements of the MIMO receiver.
Abstract: Multiple input multiple output (MIMO) transmission is an emerging technique targeted at 3G long term evolution (LTE) systems. One vital baseband function in MIMO receivers is QR decomposition of the channel matrix. In this paper, a processor based complex-valued QR decomposition is presented. The processor is enhanced with complex arithmetic and inverse square root function units. The proposed processor fits well with the real-time requirements of the MIMO receiver. The computing power is tailored for typical MIMO systems. Due to the generality of the applied computing resources it can also be used for other tasks. Also, the presented principles can be applied on any customizable processor architectures to accelerate QR decomposition.

55 citations


Proceedings ArticleDOI
13 Feb 2008
TL;DR: This paper examines the scalable parallel implementation of the QR factorization of a general matrix, targeting SMP and multi-core architectures, and shows that the implementation effort is greatly simplified by expressing the algorithms in code with the FLAME/FLASH API, which allows matrices stored by blocks to be viewed and managed as matrices of matrix blocks.
Abstract: This paper examines the scalable parallel implementation of the QR factorization of a general matrix, targeting SMP and multi-core architectures. Two implementations of algorithms-by-blocks are presented. Each implementation views a block of a matrix as the fundamental unit of data, and likewise, operations over these blocks as the primary unit of computation. The first is a conventional blocked algorithm similar to those included in libFLAME and LAPACK but expressed in a way that allows operations in the so-called critical path of execution to be computed as soon as their dependencies are satisfied. The second algorithm captures a higher degree of parallelism with an approach based on Givens rotations while preserving the performance benefits of algorithms based on blocked Householder transformations. We show that the implementation effort is greatly simplified by expressing the algorithms in code with the FLAME/FLASH API, which allows matrices stored by blocks to be viewed and managed as matrices of matrix blocks. The SuperMatrix run-time system utilizes FLASH to assemble and represent matrices but also provides out-of-order scheduling of operations that is transparent to the programmer. Scalability of the solution is demonstrated on a ccNUMA platform with 16 processors and an SMP architecture with 16 cores.

53 citations


Patent
26 Nov 2008
TL;DR: In this article, a closed loop MIMO communication utilizing implicit or explicit channel state information (CSI) at the transmitter and the receiver is described, where the receiver mitigates the mutual interference between the streams by performing MMSE processing on the received signals, and the MMSE matrix is computed with respect to the processed channel that may be estimated by the receiver through preprocessed pilot signals.
Abstract: The present invention describes a method of closed loop MIMO communication utilizing implicit or explicit channel state information (CSI) at the transmitter and the receiver. The transmitter performs linear pre-processing (for example, QR decomposition or bi-diagonal decomposition or Jacobi rotations, and/or sporadic SVDs) on a channel matrix, and the receiver mitigates the mutual interference between the streams by performing MMSE processing on the received signals. The MMSE matrix is computed with respect to the processed channel that may be estimated by the receiver through preprocessed pilot signals. The transmitter's preprocessing is of much lesser cost and complexity than full SVD.

14 Nov 2008
TL;DR: The problem of updating the QR factorization is treated, with applications to the least squares problem, and algorithms are presented that compute the factorization A1 = Q1 R1, where A1 is the matrix A = QR after it has had a number of rows or columns added or deleted.
Abstract: In this paper we treat the problem of updating the QR factorization, with applications to the least squares problem. Algorithms are presented that compute the factorization A1 = Q1 R1, where A1 is the matrix A = QR after it has had a number of rows or columns added or deleted. This is achieved by updating the factors Q and R, and we show this can be much faster than computing the factorization of A1 from scratch. We consider algorithms that exploit the Level 3 BLAS where possible and place no restriction on the dimensions of A or the number of rows and columns added or deleted. For some of our algorithms we present Fortran 77 LAPACK-style code and show the backward error of our updated factors is comparable to the error bounds of the QR factorization of A1.

Posted Content
TL;DR: In this article, the authors present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR.
Abstract: We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny QR (TSQR), factors m-by-n matrices in a one-dimensional (1-D) block cyclic row layout, and is optimized for m >> n. Our second algorithm, CAQR (Communication-Avoiding QR), factors general rectangular matrices distributed in a two-dimensional block cyclic layout. It invokes TSQR for each block column factorization.

Patent
23 Jul 2008
TL;DR: In this article, a multi-dimensional detector for a receiver of a MIMO system and a method thereof are presented.
Abstract: Provided are a multi-dimensional detector for a receiver of an MIMO system and a method thereof. The multi-dimensional detector includes a first symbol detecting unit for calculating symbol distance values using an upper triangular matrix (R) obtained from QR decomposition to detect an m-th symbol; a symbol deciding unit for deciding a symbol having a minimum distance value among the calculated symbol distance values from the first symbol detecting unit; and a second symbol detecting unit for calculating symbol distance values using an updated received signal y and the upper triangular matrix R to detect an (m−1)-th symbol.

Journal ArticleDOI
TL;DR: In this article, a linear decomposition algorithm is proposed to solve the homogeneous transformation equation of the form AX=XB in hand-eye calibration, where X represents an unknown transformation from the camera to the robot hand, and A and B denote the known movement transformations associated with the robot hand and the camera, respectively.
Abstract: To solve the homogeneous transformation equation of the form AX=XB in hand-eye calibration, where X represents an unknown transformation from the camera to the robot hand, and A and B denote the known movement transformations associated with the robot hand and the camera, respectively, this paper introduces a new linear decomposition algorithm which consists of singular value decomposition followed by the estimation of the optimal rotation matrix and the least squares equation to solve the rotation matrix of X. Without the requirements of traditional methods that A and B be rigid transformations with the same rotation angle, it enables the extension to non-rigid transformations for A and B. The details of our method are given, together with a short discussion of experimental results, showing that more precision and robustness can be achieved.

Journal ArticleDOI
TL;DR: An unexpected and rather erratic behavior of the LAPACK software implementation of the QR factorization with Businger-Golub column pivoting is reported and a new, equally efficient, and provably numerically safe partial-column norm-updating strategy is provided.
Abstract: This article reports an unexpected and rather erratic behavior of the LAPACK software implementation of the QR factorization with Businger-Golub column pivoting. It is shown that, due to finite precision arithmetic, the software implementation of the factorization can catastrophically fail to produce a properly structured triangular factor, thus leading to a potentially severe underestimate of a matrix's numerical rank. The 30-year-old problem, dating back to LINPACK, has (undetectedly) badly affected many computational routines and software packages, as well as the study of rank-revealing QR factorizations. We combine computer experiments and numerical analysis to isolate, analyze, and fix the problem. Our modification of the current LAPACK xGEQP3 routine is already included in the LAPACK 3.1.0 release. The modified routine is numerically more robust and has negligible overhead. We also provide a new, equally efficient, and provably numerically safe partial-column norm-updating strategy.

Journal ArticleDOI
TL;DR: A block algorithm is derived that can be quite effective for large scale problems, even when the matrix $X$ is rank degenerate, and matrix-vector products are converted into matrix-matrix products, allowing level-3 BLAS cache performance.
Abstract: The classical Gram-Schmidt algorithm for computing the QR factorization of a matrix $X$ requires at least one pass over the current orthogonalized matrix $Q$ as each column of $X$ is added to the factorization. When $Q$ becomes so large that it must be maintained on a backing store, each pass involves the costly transfer of data from the backing store to main memory. However, if one orthogonalizes the columns of $X$ in blocks of $m$ columns, the number of passes is reduced by a factor of $1/m$. Moreover, matrix-vector products are converted into matrix-matrix products, allowing level-3 BLAS cache performance. In this paper we derive such a block algorithm and give some experimental results that suggest it can be quite effective for large scale problems, even when the matrix $X$ is rank degenerate.

Journal ArticleDOI
TL;DR: By using the concept of principal fiber bundles, FastICA is proven to be locally quadratically convergent to a correct separation and the so-called QR FastICA algorithm, which employs the QR decomposition instead of the polar decomposition, is shown to share similar local convergence properties with the original FastICA.
Abstract: The FastICA algorithm is one of the most prominent methods to solve the problem of linear independent component analysis (ICA). Although there have been several attempts to prove local convergence properties of FastICA, rigorous analysis is still missing in the community. The major difficulty of analysis is because of the well-known sign-flipping phenomenon of FastICA, which causes the discontinuity of the corresponding FastICA map on the unit sphere. In this paper, by using the concept of principal fiber bundles, FastICA is proven to be locally quadratically convergent to a correct separation. Higher order local convergence properties of FastICA are also investigated in the framework of a scalar shift strategy. Moreover, as a parallelized version of FastICA, the so-called QR FastICA algorithm, which employs the QR decomposition (Gram-Schmidt orthonormalization process) instead of the polar decomposition, is shown to share similar local convergence properties with the original FastICA.

Proceedings ArticleDOI
18 May 2008
TL;DR: The Givens Rotation based factorization algorithm is revised and an efficient scheme working in the real number domain is developed, which can reduce the computing complexity to almost one half by exploiting the symmetric property.
Abstract: Complex QR factorization is a fundamental operation used in various MIMO signal detection algorithms. In this paper, we revise the Givens Rotation based factorization algorithm and develop an efficient scheme working in the real number domain. The complex matrix is first extended into a block-wise symmetric real number counterpart. The proposed scheme can reduce the computing complexity to almost one half by exploiting the symmetric property. Computing complexity analysis also shows the superiority of our scheme over various factorization schemes. Finally, subject to the EWC 802.11n recommendation, a novel systolic array design featuring fully parallel and deeply pipelined processing is presented. The CORDIC algorithm is employed to implement the required rotation operations with low circuit complexity. Synthesis results in a TSMC 0.18 μm process indicate that the proposed design, with a gate count of merely 17.06 K and a maximum clock rate of 202 MHz, can admit a new 2 × 2 complex matrix for factorization every 8 clock cycles.
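The extension of a complex matrix to a real one can be illustrated with the standard real-valued embedding H → [[Re H, −Im H], [Im H, Re H]]. This is an assumption for illustration: the abstract does not spell out the block arrangement the authors use, and the helper names below are hypothetical.

```python
def real_embedding(H):
    """Map an m x n complex matrix to the 2m x 2n real matrix [[Re, -Im], [Im, Re]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in H]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in H]
    return top + bottom

def real_vector(x):
    """Stack the real parts of a complex vector over its imaginary parts."""
    return [z.real for z in x] + [z.imag for z in x]

def matvec(M, v):
    """Plain real matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]
```

With this embedding, `matvec(real_embedding(H), real_vector(x))` equals `real_vector` applied to the complex product H x, so a real-valued QR of the extended matrix can stand in for the complex one. Column n + j of the extended matrix is column j with its halves swapped and one sign flipped, so the two columns are orthogonal and have equal norm; structure of this kind is what a real-domain scheme can exploit to avoid redundant work.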

Journal ArticleDOI
TL;DR: A parallel detection algorithm using multiple QR decompositions with permuted channel matrices for SDM/OFDM systems is proposed for reducing the system complexity, while maintaining the performance of the system.
Abstract: Space division multiplexing (SDM)/orthogonal frequency division multiplexing (OFDM) systems transmit different data using the same frequency, so it is necessary to separate the simultaneously received signals in the receiver. Previous studies have shown that maximum likelihood detection (MLD) provides the best bit error rate (BER) performance. However, the complexity of MLD exponentially increases with the constellation size and the number of transmit antenna branches. Therefore, it is impractical to use a full MLD without reducing its computational complexity, because it would be prohibitively large for implementation. Recently, the use of QR decomposition with an M-algorithm (QRD-M) has been proposed to reduce the system complexity while maintaining the performance of the system. However, the QRD-M performance depends on the number of surviving symbol replica candidates. When QRD-M is used with a small number of surviving symbol replica candidates, the performance declines, but when there is a large number of surviving symbol replica candidates and the transmitter antenna branches, QRD-M requires a large memory to maintain their branch metrics, and a long latency time is also required. To reduce these problems, in this paper, we propose a parallel detection algorithm using multiple QR decompositions with permuted channel matrices for SDM/OFDM systems.

Journal ArticleDOI
TL;DR: A high-speed space-division multiplexing (SDM) multiple-input multiple-output (MIMO) decoder using efficient candidate searching is proposed by exploiting the characteristics of QR decomposition and sphere decoder for high throughput rate and low hardware-complexity.
Abstract: In this brief, a high-speed space-division multiplexing (SDM) multiple-input multiple-output (MIMO) decoder using efficient candidate searching is proposed by exploiting the characteristics of QR decomposition and sphere decoder for high throughput rate and low hardware-complexity. A process of efficient candidate searching by shifting the center of constellation with scalable radius reduces the processing time and improves the operational frequency. The proposed architecture can operate at a 166-MHz clock frequency, and the core area is smaller than results from using the K-best SD algorithm since large memory is not required to store extreme candidate paths. In our implementation, the core area is 0.675 mm² using TSMC 90-nm technology. The average throughput of the proposed SDM-MIMO decoder is 95 Mbps with 64-QAM modulation at 30-dB signal-to-noise ratio.

Journal ArticleDOI
TL;DR: As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advantage of multicore architectures.
Abstract: As multicore systems continue to gain ground in the high-performance computing world, linear algebra algorithms have to be reformulated or new algorithms have to be developed in order to take advan...

Journal ArticleDOI
Kyeong Jin Kim1, Tony Reid1, R.A. Iltis1
TL;DR: An iterative soft detection algorithm based on the QR decomposition and M-algorithm (soft-QRD-M) for MIMO-OFDM incorporating error correction coding with significant computational savings over the optimal sum-product algorithm.
Abstract: We propose an iterative soft detection algorithm based on the QR decomposition and M-algorithm (soft-QRD-M) for MIMO-OFDM incorporating error correction coding. The soft-QRD-M step generates approximate a posteriori probabilities (APPs) with significant computational savings over the optimal sum-product algorithm. Simulation results show comparable performance of the soft-QRD-M detector and SPA in a Turbo-coded iterative MIMO-OFDM receiver.

01 Jan 2008
TL;DR: In this paper, the authors used the idea of updating of QR factorization, rendering an algorithm which is much more scalable and much more suitable for implementation on a multi-core processor.
Abstract: The QR factorization is one of the most important operations in dense linear algebra, offering a numerically stable method for solving linear systems of equations including overdetermined and underdetermined systems. Classic implementation of the QR factorization suffers from performance limitations due to the use of matrix-vector type operations in the phase of panel factorization. These limitations can be remedied by using the idea of updating of QR factorization, rendering an algorithm, which is much more scalable and much more suitable for implementation on a multi-core processor. It is demonstrated how the potential of the CELL processor can be utilized to the fullest by employing the new algorithmic approach and successfully exploiting the capabilities of the CELL processor in terms of Instruction Level Parallelism and Thread-Level Parallelism.

Patent
12 Jun 2008
TL;DR: In this article, a MIMO receiver is described that decomposes a channel matrix into a matrix Q and a matrix R through a QR decomposition and includes a detector for determining a candidate group of an n-th phase by estimating a plurality of transmit signal vectors.
Abstract: Receiving apparatus and method in a Multiple-Input Multiple-Output (MIMO) wireless communication system are provided. The receiver having N-ary receive antennas includes a decomposer for decomposing a channel matrix to a matrix Q and a matrix R through a QR decomposition; a detector for determining a candidate group of an n-th phase by estimating a plurality of transmit signal vectors by substituting a plurality of transmittable symbols into symbol combinations of a candidate group of a (n−1)-th phase as an n-th symbol and detecting (n+1)-th through N-th symbols using characteristics of the matrix R; a calculator for calculating square Euclidean distance values between the transmit signal vectors and a receive signal vector; and a determiner for determining the candidate group of the n-th phase by selecting transmit signal vectors having the smallest square Euclidean distance value among the transmit signal vectors.

Proceedings ArticleDOI
11 May 2008
TL;DR: A very low complexity QRD-M algorithm for MIMO systems that achieves the detection performance near to that of the MLD with negligibly low complexity.
Abstract: We present a very low complexity QRD-M algorithm for MIMO systems. The original QRD-M algorithm decomposes the MIMO channel matrix into an upper triangular matrix and applies a limited tree search. To achieve near-MLD (maximum likelihood detection) performance with the QRD-M algorithm, the number of search points at each layer must equal the modulation size. In the proposed scheme, each surviving branch is extended only to the corresponding QR decomposition (QRD)-based detection symbol in the next layer and its neighboring symbols in the constellation. Using this approach, we can significantly decrease the complexity of the conventional QRD-M algorithm. Simulation results show that the proposed scheme achieves detection performance close to that of MLD with negligibly low complexity.
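As background for the limited tree search mentioned above, here is a minimal sketch of the conventional QRD-M search, not the reduced-complexity variant proposed in the paper. It assumes a real-valued model in which the QR decomposition has already been applied, so the inputs are the upper triangular R and the rotated received vector z = Qᵀy. The function name `qrd_m_detect` and the real constellation are hypothetical choices for illustration.

```python
def qrd_m_detect(R, z, constellation, M):
    """Breadth-first tree search that keeps the M best partial paths per layer.

    R is n x n upper triangular, z is the rotated received vector, and
    detection runs from the last symbol (row n-1) up to the first (row 0).
    Returns the best symbol vector and its squared Euclidean metric.
    """
    n = len(R)
    # Each survivor is (accumulated metric, {symbol index: symbol value}).
    survivors = [(0.0, {})]
    for k in range(n - 1, -1, -1):
        candidates = []
        for metric, path in survivors:
            # Contribution of the symbols already decided on this path.
            interference = sum(R[k][j] * path[j] for j in range(k + 1, n))
            for c in constellation:
                err = z[k] - interference - R[k][k] * c
                candidates.append((metric + err * err, {**path, k: c}))
        candidates.sort(key=lambda t: t[0])
        survivors = candidates[:M]
    best_metric, best_path = survivors[0]
    return [best_path[i] for i in range(n)], best_metric
```

Every survivor is extended by all constellation points at each layer, which is the cost the proposed scheme reduces by restricting the extension to a detected symbol and its neighbors. Setting M to the total number of candidate vectors turns the search into exhaustive MLD, while M = 1 reduces it to successive interference cancellation.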

Journal ArticleDOI
TL;DR: This paper shows how to compute the QR-factorization of a rank structured matrix in an efficient way by means of the Givens-weight representation, and shows how the QR-factorization can be used as a preprocessing step for the solution of linear systems.
Abstract: In this paper we show how to compute the QR-factorization of a rank structured matrix in an efficient way by means of the Givens-weight representation. We also show how the QR-factorization can be used as a preprocessing step for the solution of linear systems. Provided the representation is chosen in an appropriate manner, the complexity of the QR-factorization is $O((ar^2+brs+cs^2)n)$ operations, where $n$ is the matrix size, $r$ is some measure for the average rank of the rank structure, $s$ is some measure for the bandwidth of the unstructured matrix part around the main diagonal, and $a,b,c\in \mathbb{R}$ are certain weighting parameters. The complexity of the solution of the linear system with given QR-factorization is then only $O((dr+es)n)$ operations for suitable $d,e\in \mathbb{R}$. The performance of this scheme will be demonstrated by some numerical experiments.
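
To make the role of the QR-factorization in solving linear systems concrete, here is a minimal dense sketch using Givens rotations. It does not use the Givens-weight representation or exploit any rank structure, so it costs $O(n^3)$ rather than the linear-in-$n$ complexity quoted in the abstract. The function name `givens_qr_solve` and the test matrix are assumptions made for illustration.

```python
import math

def givens_qr_solve(A, b):
    """Solve A x = b (A square, nonsingular) by rotating [A | b] to triangular form."""
    n = len(A)
    R = [row[:] for row in A]
    c_vec = b[:]
    for j in range(n):
        for i in range(j + 1, n):
            if R[i][j] == 0.0:
                continue  # entry already zero, no rotation needed
            r = math.hypot(R[j][j], R[i][j])
            c, s = R[j][j] / r, R[i][j] / r
            # Apply the rotation to rows j and i of R; this zeroes R[i][j].
            for k in range(j, n):
                top, bottom = R[j][k], R[i][k]
                R[j][k] = c * top + s * bottom
                R[i][k] = -s * top + c * bottom
            # Apply the same rotation to the right-hand side (this forms Q^T b).
            top, bottom = c_vec[j], c_vec[i]
            c_vec[j] = c * top + s * bottom
            c_vec[i] = -s * top + c * bottom
    # Back substitution on the upper triangular system R x = Q^T b.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = c_vec[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = acc / R[i][i]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]  # chosen so that the exact solution is [1, 2, 3]
print(givens_qr_solve(A, b))
```

Once the rotations have been applied, each further right-hand side needs only the stored rotations and a triangular solve, which is the sense in which the factorization acts as a preprocessing step.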

Journal ArticleDOI
TL;DR: Numerical experiments show that the novel algorithm outperforms available implementations of the Hessenberg QR algorithm already for small values of N and is proved to be backward stable.
Abstract: In this paper we address the problem of efficiently computing all the eigenvalues of a large N×N Hermitian matrix modified by a possibly non-Hermitian perturbation of low rank. Previously proposed fast adaptations of the QR algorithm are considerably simplified by performing a preliminary transformation of the matrix by similarity into an upper Hessenberg form. The transformed matrix can be specified by a small set of parameters which are easily updated during the QR process. The resulting structured QR iteration can be carried out in linear time using linear memory storage. Moreover, it is proved to be backward stable. Numerical experiments show that the novel algorithm outperforms available implementations of the Hessenberg QR algorithm already for small values of N.
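
For readers unfamiliar with the QR process mentioned here, the following sketch shows the plain, unshifted, dense QR iteration on a small symmetric tridiagonal (hence Hessenberg) matrix. It is not the structured linear-time algorithm of the paper: each step here costs $O(N^3)$, and no shifts or deflation are used. The helper names and the example matrix are assumptions made for illustration.

```python
import math

def qr_decompose(A):
    """Return (Q, R) with A = Q R via modified Gram-Schmidt (A square, nonsingular)."""
    n = len(A)
    cols = [[A[r][c] for r in range(n)] for c in range(n)]
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i in range(j):
            R[i][j] = sum(q_cols[i][k] * v[k] for k in range(n))
            v = [v[k] - R[i][j] * q_cols[i][k] for k in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[c][r] for c in range(n)] for r in range(n)]
    return Q, R

def matmul(A, B):
    """Plain dense matrix product of two square lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def qr_eigenvalues(A, iterations=200):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k is similar to A_k = Q_k R_k."""
    Ak = [row[:] for row in A]
    for _ in range(iterations):
        Q, R = qr_decompose(Ak)
        Ak = matmul(R, Q)
    return [Ak[i][i] for i in range(len(Ak))]

# Symmetric tridiagonal example; its eigenvalues are 2 - sqrt(2), 2, and 2 + sqrt(2).
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
print(sorted(qr_eigenvalues(A)))
```

Each step replaces the matrix by a similar one, so the eigenvalues are preserved. For this example, whose eigenvalues have distinct magnitudes, the off-diagonal entries decay and the diagonal approaches the eigenvalues.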

Proceedings ArticleDOI
15 Aug 2008
TL;DR: In this article, the authors proposed a low complexity maximum likelihood decoding (MLD) algorithm for orthogonal space-time block codes (OSTBCs) based on the real-valued lattice representation and QR decomposition.
Abstract: In this paper, we discuss three applications of the QR decomposition algorithm to decoding in a number of Multi-Input Multi-Output (MIMO) systems. In the first application, we propose a new structure for MIMO Sphere Decoding (SD). We show that the new approach achieves 80% reduction in the overall complexity compared to conventional SD for a $2 \times 2$ system, and almost 50% reduction for the $4 \times 4$ and $6 \times 6$ cases. In the second application, we propose a low complexity Maximum Likelihood Decoding (MLD) algorithm for quasi-orthogonal space-time block codes (QOSTBCs). We show that for N = 8 transmit antennas and 16-QAM modulation scheme, the new approach achieves > 97% reduction in the overall complexity compared to conventional MLD, and > 89% reduction compared to the most competitive reported algorithms in the literature. This complexity gain becomes greater when the number of transmit antennas (N) or the constellation size (L) becomes larger. In the third application, we propose a low complexity Maximum Likelihood Decoding (MLD) algorithm for orthogonal space-time block codes (OSTBCs) based on the real-valued lattice representation and QR decomposition. For a system employing the well-known Alamouti OSTBC and 16-QAM modulation scheme, the new approach achieves > 87% reduction in the overall complexity compared to conventional MLD. Moreover, we show that for square L-QAM constellations, the proposed algorithm reduces the decoding computational complexity from $O(L^{N/2})$ for conventional MLD to $O(L)$ for systems employing QOSTBCs and from $O(L)$ for conventional MLD to $O(\sqrt{L})$ for those employing OSTBCs without sacrificing the performance.

Proceedings ArticleDOI
19 May 2008
TL;DR: Analytical and simulation results show that the proposed QRD-based precoded MIMO-OFDM system with reduced feedback can achieve impressive BER performance compared to the conventional schemes, yet with considerably reduced feedback.
Abstract: A QRD-based precoded MIMO-OFDM system with reduced feedback is proposed. Unlike the conventional precoding schemes in which the receiver has to feed back information about the channel frequency response of each carrier to the transmitter, the proposed system converts the MIMO-OFDM channel into layered channels by effectively exploiting the QR decomposition of the time-domain channel impulse response matrix. As a result, the receiver in the proposed system only needs to feed back information about one precoding matrix, regardless of the total number of carriers. Furthermore, a computationally efficient implementation scheme is devised for the proposed system. Analytical and simulation results show that the proposed scheme can achieve impressive BER performance compared to the conventional schemes, yet with considerably reduced feedback.