
Showing papers on "QR decomposition published in 2001"


Proceedings Article
03 Jan 2001
TL;DR: It is shown that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and that the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix.
Abstract: The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descent method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and that the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function.
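The eigendecomposition-plus-pivoted-QR pipeline described in the abstract can be sketched in a few lines of NumPy/SciPy. The two-blob data set, seed, and cluster count below are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
# Two well-separated blobs as toy data, k = 2 clusters.
X = np.vstack([rng.normal((5, 0), 0.3, (20, 2)),
               rng.normal((0, 5), 0.3, (20, 2))])
k = 2

G = X @ X.T                       # Gram matrix of the data vectors
w, V = np.linalg.eigh(G)          # eigendecomposition; keep the top k
Vk = V[:, -k:]                    # n x k matrix of leading eigenvectors

# Pivoted QR of the (transposed) eigenvector matrix picks k pivot
# columns; each point is assigned to its dominant pivot direction.
Q, R, piv = qr(Vk.T, pivoting=True)
Rhat = np.linalg.solve(R[:, :k], R)           # k x n, in pivoted order
labels = np.empty(len(X), dtype=int)
labels[piv] = np.argmax(np.abs(Rhat), axis=0)
```

For well-separated clusters the leading eigenvectors of G nearly span the cluster indicator vectors, so the pivoted QR assignment recovers the partition.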

657 citations


Journal ArticleDOI
TL;DR: A new efficient decoding algorithm based on QR decomposition is presented, which requires only a fraction of the computational effort of the standard decoding algorithm, which repeatedly computes the pseudo-inverse of the channel matrix.
Abstract: Layered space-time codes have been designed to exploit the capacity advantage of multiple antenna systems in Rayleigh fading environments. A new efficient decoding algorithm based on QR decomposition is presented; it requires only a fraction of the computational effort of the standard decoding algorithm, which repeatedly computes the pseudo-inverse of the channel matrix.
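A minimal NumPy sketch of this idea: a single QR decomposition of the channel matrix turns layered detection into back-substitution with symbol slicing, instead of repeated pseudo-inverses. The 4×4 BPSK setup, seed, and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4                                          # transmit/receive antennas
H = (rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], size=M) + 0j       # BPSK substream symbols
y = H @ x + 1e-3 * (rng.normal(size=M) + 1j * rng.normal(size=M))

# With H = QR, Q^H y = R x + noise and R is triangular, so substreams
# are detected bottom-up, cancelling already-detected interference.
Q, R = np.linalg.qr(H)
z = Q.conj().T @ y
xhat = np.zeros(M, dtype=complex)
for i in range(M - 1, -1, -1):
    s = (z[i] - R[i, i + 1:] @ xhat[i + 1:]) / R[i, i]
    xhat[i] = 1.0 if s.real >= 0 else -1.0     # hard decision (slicer)
```

At this noise level the hard decisions recover the transmitted symbols; a soft-output variant would pass the unsliced values s to a decoder instead.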

560 citations


Book
26 Feb 2001
TL;DR: In this book, the authors develop a framework for spectral approximation based on the convergence of operators, with matrix computations covering QR factorization, the convergence of a sequence of subspaces, QR methods and inverse iteration, and error analysis.
Abstract: SPECTRAL DECOMPOSITION: General Notions, Decompositions, Spectral Sets of Finite Type, Adjoint and Product Spaces. SPECTRAL APPROXIMATION: Convergence of Operators, Property U, Property L, Error Estimates. IMPROVEMENT OF ACCURACY: Iterative Refinement, Acceleration. FINITE RANK APPROXIMATIONS: Approximations Based on Projection, Approximations of Integral Operators, A Posteriori Error Estimates. MATRIX FORMULATIONS: Finite Rank Operators, Iterative Refinement, Acceleration, Numerical Examples. MATRIX COMPUTATIONS: QR Factorization, Convergence of a Sequence of Subspaces, QR Methods and Inverse Iteration, Error Analysis. REFERENCES. INDEX. Each chapter also includes exercises.

170 citations


Proceedings ArticleDOI
21 May 2001
TL;DR: A numerical method for the determination of the identifiable parameters of parallel robots based on QR decomposition of the observation matrix of the calibration system is presented.
Abstract: Presents a numerical method for the determination of the identifiable parameters of parallel robots. The special case of Stewart-Gough 6 degrees-of-freedom parallel robots is studied for classical and self calibration methods, but this method can be generalized to any kind of parallel robot. The method is based on QR decomposition of the observation matrix of the calibration system. Numerical relations between the parameters which are identified and those which are not identifiable can be obtained for each method.
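The pivoted-QR mechanics behind this kind of identifiability analysis can be sketched with a toy observation matrix (the matrix below is an illustrative stand-in for a robot calibration Jacobian, not from the paper):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(2)
# Toy observation matrix: 5 candidate parameters, but column 3 is a
# fixed combination of columns 0 and 1, so only 4 are identifiable.
W = rng.normal(size=(50, 5))
W[:, 3] = 2.0 * W[:, 0] - W[:, 1]

Q, R, piv = qr(W, mode='economic', pivoting=True)
tol = abs(R[0, 0]) * 1e-10
rank = int(np.sum(np.abs(np.diag(R)) > tol))
identifiable   = sorted(piv[:rank])    # parameters the data determines
unidentifiable = sorted(piv[rank:])    # linearly dependent on the others

# Numerical relations between the non-identifiable parameters and the
# identifiable ones, in the spirit of the paper: solve R11 * rel = R12.
rel = np.linalg.solve(R[:rank, :rank], R[:rank, rank:])
```

The columns of `rel` express each unidentifiable column of W as a combination of the pivot columns, which is exactly the kind of numerical relation the abstract refers to.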

109 citations


Journal ArticleDOI
TL;DR: This paper presents a small-bulge multishift variation of the multishift QR algorithm that avoids the phenomenon of shift blurring, which retards convergence and limits the number of simultaneous shifts.
Abstract: This paper presents a small-bulge multishift variation of the multishift QR algorithm that avoids the phenomenon of shift blurring, which retards convergence and limits the number of simultaneous shifts. It replaces the large diagonal bulge in the multishift QR sweep with a chain of many small bulges. The small-bulge multishift QR sweep admits nearly any number of simultaneous shifts---even hundreds---without adverse effects on the convergence rate. With enough simultaneous shifts, the small-bulge multishift QR algorithm takes advantage of the level 3 BLAS, which is a special advantage for computers with advanced architectures.

107 citations


Journal ArticleDOI
TL;DR: A new deflation strategy that takes advantage of matrix perturbations outside of the subdiagonal entries of the Hessenberg QR iterate and identifies and deflates converged eigenvalues long before the classic small-subdiagonal strategy would.
Abstract: Aggressive early deflation is a QR algorithm deflation strategy that takes advantage of matrix perturbations outside of the subdiagonal entries of the Hessenberg QR iterate. It identifies and deflates converged eigenvalues long before the classic small-subdiagonal strategy would. The new deflation strategy enhances the performance of conventional large-bulge multishift QR algorithms, but it is particularly effective in combination with the small-bulge multishift QR algorithm. The small-bulge multishift QR sweep with aggressive early deflation maintains a high rate of execution of floating point operations while significantly reducing the number of operations required.

106 citations


Journal ArticleDOI
TL;DR: In this paper, an alternative orthonormalization method that computes the orthonormal basis from the right singular vectors of a matrix is proposed; it is typically more stable than classical Gram-Schmidt (GS).
Abstract: First, we consider the problem of orthonormalizing skinny (long) matrices. We propose an alternative orthonormalization method that computes the orthonormal basis from the right singular vectors of a matrix. Its advantages are that (a) all operations are matrix-matrix multiplications and thus cache efficient, (b) only one synchronization point is required in parallel implementations, and (c) it is typically more stable than classical Gram--Schmidt (GS). Second, we consider the problem of orthonormalizing a block of vectors against a previously orthonormal set of vectors and among itself. We solve this problem by alternating iteratively between a phase of GS and a phase of the new method. We provide error analysis and use it to derive bounds on how accurately the two successive orthonormalization phases should be performed to minimize total work performed. Our experiments confirm the favorable numerical behavior of the new method and its effectiveness on modern parallel computers.
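The core step of the SVD-based orthonormalization can be sketched as follows. This is only the single-pass idea for a well-conditioned matrix, not the full two-phase algorithm with its error-analysis-driven tolerances; the function name and test matrix are assumptions:

```python
import numpy as np

def svd_orth(A):
    """One pass of the SVD-based orthonormalization idea: only
    matrix-matrix products plus one small k x k eigenproblem.
    Assumes A has full column rank and is not too ill-conditioned."""
    S = A.T @ A                      # small k x k Gram matrix, cache friendly
    w, V = np.linalg.eigh(S)         # S = V diag(w) V^T
    return A @ (V / np.sqrt(w))      # columns of A V diag(w)^(-1/2)

rng = np.random.default_rng(3)
A = rng.normal(size=(1000, 8))       # skinny (long) matrix
Q = svd_orth(A)
```

All heavy operations are matrix-matrix products (BLAS-3), and a parallel implementation needs only one reduction to form S, which is the synchronization advantage the abstract describes.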

99 citations


Journal ArticleDOI
01 May 2001
TL;DR: It is shown how detection of redundant rules can be introduced in OLS by a simple extension of the algorithm; the performance of rank-revealing reduction methods is discussed, and a less complex method based on the pivoted QR decomposition is advocated.
Abstract: Comments on recent publications about the use of orthogonal transforms to order and select rules in a fuzzy rule base. The techniques are well-known from linear algebra, and we comment on their usefulness in fuzzy modeling. The application of rank-revealing methods based on singular value decomposition (SVD) to rule reduction gives rather conservative results. They are essentially subset selection methods, and we show that such methods do not produce an "importance ordering", contrary to what has been stated in the literature. The orthogonal least-squares (OLS) method, which evaluates the contribution of the rules to the output, is more attractive for systems modeling. However, it has been shown to sometimes assign high importance to rules that are correlated in the premise. This hampers the generalization capabilities of the resulting model. We discuss the performance of rank-revealing reduction methods and advocate the use of a less complex method based on the pivoted QR decomposition. Further, we show how detection of redundant rules can be introduced in OLS by a simple extension of the algorithm. The methods are applied to a problem known from the literature and compared to results reported by other researchers.

77 citations


Journal ArticleDOI
TL;DR: It is claimed that the same techniques can be applied to the pruning problem, and thus they are a useful tool for compaction of information.

57 citations


Journal ArticleDOI
TL;DR: An iterative detection algorithm for an uncoded multi-transmitter multi-receiver system that is 4 to 8 times less complex than V-BLAST optimal-order (OPT) detection, while maintaining comparable performance.
Abstract: We study an iterative detection algorithm for an uncoded multi-transmitter multi-receiver system. The main data stream is demultiplexed into M substreams, and each substream is modulated independently and then transmitted by its dedicated antenna. The receiver is equipped with N ≥ M antennas. At each receive antenna, the signal is a superposition of the M substreams, affected by independent fades and disturbed by AWGN. The detection algorithm is based on the QR decomposition of the channel transfer matrix, which is then used to perform hard or soft inter-substream interference cancellation. Comparisons are made with the V-BLAST optimal order (OPT) detection algorithm. The proposed algorithm is 4 to 8 times less complex than V-BLAST OPT, while maintaining comparable performance.

56 citations


Journal ArticleDOI
TL;DR: A fast algorithm to compute the R factor of the QR factorization of a block-Hankel matrix H, based on the generalized Schur algorithm, which makes it possible to handle the rank-deficient case.

Book
30 Nov 2001
TL;DR: In this book, the authors present data structures for sparse matrix computation, together with sparse symmetric linear system solvers, sparse QR decomposition, and power system applications such as load flow analysis and state estimation.
Abstract: Preface. Acknowledgments. 1. Introduction. 2. Object Orientation for Modeling Computations. 3. Data Structure for Sparse Matrix Computation. 4. Sparse Symmetric Linear System Solver. 5. Sparse QR Decomposition. 6. Optimization Methods. 7. Sparse LP and QP Solvers. 8. Load Flow Analysis. 9. Short Circuit Analysis. 10. Power System State Estimation. 11. Optimal Power Flow. 12. Power System Dynamics. Appendices. References. Index.

Journal ArticleDOI
TL;DR: In this article, poles and zeros are defined for continuous-time, linear, time-varying systems, where a zero is a function of time corresponding to an exponential input whose transmission to the output is blocked.
Abstract: Definitions of poles and zeros are presented for continuous-time, linear, time-varying systems. For a linear, time-varying state equation, a set of time-varying poles defines a stability-preserving variable change relating the original state equation to an upper triangular state equation. A zero is a function of time corresponding to an exponential input whose transmission to the output is blocked. Both definitions are shown to be generalizations of existing definitions of poles and zeros for linear, time-varying systems and are consistent with the definitions for linear, time-invariant systems. A computation procedure is presented using a QR decomposition of the transition matrix for the state equation. A numerical example is given to illustrate this procedure.

Proceedings ArticleDOI
23 Apr 2001
TL;DR: The out-of-core Cholesky factorization implementation is shown to achieve up to 80% of peak performance on a 64 node configuration of the Cray T3E-600 and preliminary results for parallel implementation of the resulting OOC QR factorization algorithm are included.
Abstract: In this paper the parallel implementation of out-of-core Cholesky factorization is used to introduce the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK), a flexible infrastructure for parallel implementation of out-of-core linear algebra operations. POOCLAPACK builds on the Parallel Linear Algebra Package (PLAPACK) for in-core parallel dense linear algebra computation. Despite the extreme simplicity of POOCLAPACK, the out-of-core Cholesky factorization implementation is shown to achieve up to 80% of peak performance on a 64-node configuration of the Cray T3E-600. The insights gained from examining the Cholesky factorization are also applied to the much more difficult and important QR factorization operation. Preliminary results for parallel implementation of the resulting OOC QR factorization algorithm are included.

Journal ArticleDOI
TL;DR: It is shown that for semidefinite matrices the VSV decomposition should be computed via the ULV decomposition, while for indefinite matrices it must be computed through a URV-like decomposition that involves hypernormal rotations.
Abstract: We present a family of algorithms for computing symmetric rank-revealing VSV decompositions based on triangular factorization of the matrix. The VSV decomposition consists of a middle symmetric matrix that reveals the numerical rank in having three blocks with small norm, plus an orthogonal matrix whose columns span approximations to the numerical range and null space. We show that for semidefinite matrices the VSV decomposition should be computed via the ULV decomposition, while for indefinite matrices it must be computed via a URV-like decomposition that involves hypernormal rotations.

Journal ArticleDOI
Frank Uhlig1
TL;DR: In this article, the authors give constructive proofs of a number of results to generate a generalized real (or complex) orthogonal (or unitary) matrix as the product of generalized Householder matrices.

Book ChapterDOI
TL;DR: A highly practical fpa-variant of the new segment LLL-reduction of Koy and Schnorr that performs well beyond dimension 1000 and is much faster than previous codes for LLL-reduction.
Abstract: We associate with an integer lattice basis a scaled basis that has orthogonal vectors of nearly equal length. The orthogonal vectors or the QR-factorization of a scaled basis can be accurately computed up to dimension 216 by Householder reflections in floating point arithmetic (fpa) with 53 precision bits. We develop a highly practical fpa-variant of the new segment LLL-reduction of Koy and Schnorr [KS01]. The LLL-steps are guided in this algorithm by the Gram-Schmidt coefficients of an associated scaled basis. The new reduction algorithm is much faster than previous codes for LLL-reduction and performs well beyond dimension 1000.
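The Gram-Schmidt coefficients that guide the LLL-steps can be read directly off a floating-point QR factorization of the basis matrix. A small sketch (the basis below is arbitrary, chosen only for illustration):

```python
import numpy as np

# If the basis vectors are the columns of B and B = QR, then the
# Gram-Schmidt coefficients are mu[i, j] = R[j, i] / R[j, j],
# regardless of the sign convention used for the diagonal of R.
B = np.array([[1., 0., 3.],
              [2., 5., 1.],
              [0., 1., 4.]])
Q, R = np.linalg.qr(B)
n = B.shape[1]
mu = np.zeros((n, n))
for i in range(n):
    for j in range(i):
        mu[i, j] = R[j, i] / R[j, j]
```

Computing mu this way uses Householder-based QR (as in the paper's fpa setting) rather than explicit Gram-Schmidt, which is the numerically preferable route in floating point.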

Journal ArticleDOI
TL;DR: In this paper, componentwise perturbation analyses are given for Q and R in the QR factorization A = QR, with Q^T Q = I and R upper triangular, for a given real m × n matrix A of rank n.
Abstract: This paper gives componentwise perturbation analyses for Q and R in the QR factorization A=QR, $Q^\mathrm{T}Q=I$ , R upper triangular, for a given real $m\times n$ matrix A of rank n. Such specific analyses are important for example when the columns of A are badly scaled. First order perturbation bounds are given for both Q and R. The analyses more accurately reflect the sensitivity of the problem than previous such results. The condition number for R is bounded for a fixed n when the standard column pivoting strategy is used. This strategy also tends to improve the condition of Q, so usually the computed Q and R will both have higher accuracy when we use the standard column pivoting strategy. Practical condition estimators are derived. The assumptions on the form of the perturbation $\Delta A$ are explained and extended. Weaker rigorous bounds are also given.

Journal ArticleDOI
TL;DR: In this article, a two-step procedure based on QR decompositions is proposed as a solution algorithm for this type of identification problem, which will always deliver the exact solution and is much easier to implement than a Newton-type iteration algorithm.

Proceedings ArticleDOI
01 Jan 2001
TL;DR: The QR decomposition algorithm is used to demonstrate the capability of the tool to quickly generate high performance parallel implementations, and results are presented showing how the control logic complexity and number of clock cycles vary with these transformations.
Abstract: Compaan is a software tool capable of automatically translating nested loop programs, written in Matlab, into parallel Kahn process network descriptions suitable for implementation in hardware. In this paper we present a tool for converting these process networks into FPGA implementations. The QR decomposition algorithm is used to demonstrate the capability of the tool to quickly generate high performance parallel implementations. This allows us to rapidly explore a range of transformations, such as loop unrolling and skewing, to generate a circuit that meets the requirements of a particular application. We present results showing how the control logic complexity and number of clock cycles vary with these transformations.

Journal ArticleDOI
TL;DR: In this paper, the possibility of obtaining certain direct sum decompositions, for a given complete two-dimensional behavior, is investigated and proved to be equivalent to the zero skew-primeness property of suitable matrix pairs.
Abstract: In this paper, the possibility of obtaining certain direct sum decompositions for a given complete two-dimensional behavior is investigated and proved to be equivalent to the zero skew-primeness property of suitable matrix pairs. Some known decomposition theorems for two-dimensional complete behaviors are then obtained as simple corollaries of this general result.

Journal ArticleDOI
TL;DR: These algorithms are based on earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose.
Abstract: We present algorithms to determine the number of nonzeros in each row and column of the factors of a sparse matrix, for both the QR factorization and the LU factorization with partial pivoting. The algorithms use only the nonzero structure of the input matrix, and run in time nearly linear in the number of nonzeros in that matrix. They may be used to set up data structures or schedule parallel operations in advance of the numerical factorization. The row and column counts we compute are upper bounds on the actual counts. If the input matrix is strong Hall and there is no coincidental numerical cancellation, the counts are exact for QR factorization and are the tightest bounds possible for LU factorization. These algorithms are based on our earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose.

Proceedings ArticleDOI
20 Nov 2001
TL;DR: An analysis was performed to determine the required fixed-point precision needed to compute the weights for an adaptive array system operating in the presence of interference; it found that a floating-point computation can be well approximated by a 13-bit to 19-bit word-length fixed-point computation for typical system jammer-to-noise levels.
Abstract: Adaptive array systems require the periodic solution of the well-known w = R^{-1}v equation in order to compute optimum adaptive array weights. The covariance matrix R is estimated by forming a product of noise sample matrices X: R = X^H X. The operations-count cost of performing the required matrix inversion in real time can be prohibitively high for a high-bandwidth system with a large number of sensors. Specialized hardware may be required to execute the requisite computations in real time. The choice of algorithm to perform these computations must be considered in conjunction with the hardware technology used to implement the computation engine. A systolic architecture implementation of the Givens rotation method for matrix inversion was selected to perform adaptive weight computation. The bit-level systolic approach enables a simple ASIC design and a very low power implementation. The bit-level systolic architecture must be implemented with fixed-point arithmetic to simplify the propagation of data through the computation cells. The Givens rotation approach has a highly parallel implementation and is ideally suited for a systolic implementation. Additionally, the adaptive weights are computed directly from the sample matrix X in the voltage domain, thus reducing the required dynamic range needed in carrying out the computations. An analysis was performed to determine the required fixed-point precision needed to compute the weights for an adaptive array system operating in the presence of interference. Based on the analysis results, it was determined that the precision of a floating-point computation can be well approximated with a 13-bit to 19-bit word-length fixed-point computation for typical system jammer-to-noise levels. This property has produced an order-of-magnitude reduction in required hardware complexity. A synthesis-based ASIC design process was used to generate preliminary layouts. These layouts were used to estimate the area and throughput of the VLSI QR decomposition architecture. The results show that this QR decomposition process, when implemented in a full-custom design, provides a computation time that is two orders of magnitude faster than a state-of-the-art microprocessor. © (2001) COPYRIGHT SPIE--The International Society for Optical Engineering.
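The Givens rotation QR that the systolic array implements can be sketched in floating-point NumPy (the hardware uses fixed-point cells; the matrix sizes and data below are illustrative assumptions):

```python
import numpy as np

def givens_qr(A):
    """QR decomposition built from Givens rotations -- the operation the
    paper maps onto a bit-level systolic array (floating point here,
    not the fixed-point arithmetic of the proposed hardware)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):                       # zero column j below the diagonal
        for i in range(m - 1, j, -1):
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])  # 2 x 2 Givens rotation
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

rng = np.random.default_rng(6)
X = rng.normal(size=(6, 3))                  # noise sample matrix
Q, R = givens_qr(X)

# The adaptive weights then follow by back-substitution on the
# triangular system R w = Q^H v (least-squares weight computation).
v = rng.normal(size=6)
w = np.linalg.solve(R[:3, :3], (Q.T @ v)[:3])
```

Each rotation touches only two rows, which is what makes the method pipeline so naturally onto systolic cells.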

Proceedings ArticleDOI
06 May 2001
TL;DR: This paper proposes a method to diagnose multiple faults in linear analog circuits, using QR factorization to identify ambiguity groups in the test verification matrix and the structural incident signal matrix to evaluate faulty parameters.
Abstract: This paper proposes a method to diagnose multiple faults in linear analog circuits. The test equation establishes the relationship between the measured responses and the faulty excitations due to faulty elements. QR factorization is applied to identify ambiguity groups in the test verification matrix. The suspicious faulty excitations of minimum size are determined, and the faulty parameters are evaluated using the structural incident signal matrix. Finally, the method is illustrated with an example circuit.

Journal ArticleDOI
TL;DR: The four different problems of DGELS are essentially reduced to two by explicit transposition of A, and by avoiding redundant computations in the update of B the authors reduce the work needed to compute the minimum norm solution.
Abstract: We present new algorithms for computing the linear least squares solution to overdetermined linear systems and the minimum norm solution to underdetermined linear systems. For both problems, we consider the standard formulation min ||AX − B||_F and the transposed formulation min ||A^T X − B||_F, i.e., four different problems in all. The functionality of our implementation corresponds to that of the LAPACK routine DGELS. The new implementation is significantly faster and simpler. It outperforms the LAPACK DGELS for all matrix sizes tested. The improvement is usually 50-100% and it is as high as 400%. The four different problems of DGELS are essentially reduced to two, by use of explicit transposition of A. By explicit transposition we avoid computing Householder transformations on vectors with large stride. The QR factorization of block columns of A is performed using a recursive level-3 algorithm. By interleaving updates of B with the factorization of A, we reduce the number of floating point operations performed for the linear least squares problem. By avoiding redundant computations in the update of B we reduce the work needed to compute the minimum norm solution. Finally, we outline fully recursive algorithms for the four problems of DGELS as well as for QR factorization.
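Both DGELS-style problems reduce to one QR factorization each; a minimal NumPy sketch (not the paper's recursive blocked implementation, and with arbitrary test matrices):

```python
import numpy as np

rng = np.random.default_rng(7)

# Overdetermined: least squares min ||A x - b|| via A = QR.
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)
Q, R = np.linalg.qr(A)                  # economic: Q is 8x3, R is 3x3
x_ls = np.linalg.solve(R, Q.T @ b)      # back-substitution in practice

# Underdetermined: minimum-norm solution of A x = b via QR of A^T.
A2 = rng.normal(size=(3, 8))
b2 = rng.normal(size=3)
Q2, R2 = np.linalg.qr(A2.T)             # A2 = R2^T Q2^T
x_mn = Q2 @ np.linalg.solve(R2.T, b2)   # forward substitution on R2^T
```

The minimum-norm solution lies in the row space of A2 by construction (it is a combination of the columns of Q2), which is what makes it minimal.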

Patent
04 Oct 2001
TL;DR: In this article, an improved symbol decision is generated of a desired subchannel of the signal vector by first generating a baseline decision for the sub-channel, which is then multiplied by a unitary matrix generated from a QR decomposition of another channel matrix.
Abstract: A system and method for performing extended space-time processing. An improved symbol decision is generated of a desired sub-channel of the signal vector by first generating a baseline decision for the sub-channel. A contribution of a strongest sub-channel is subtracted from the signal vector to generate a modified signal vector. The modified signal vector is multiplied by a unitary matrix generated from a QR decomposition of another channel matrix. Channel interference of the remaining sub-channels of the modified signal vector is cancelled from a remaining sub-channel.

Proceedings ArticleDOI
07 Feb 2001
TL;DR: The fine-grained parallelism of the two-sided block-Jacobi algorithm for the singular value decomposition (SVD) of matrix A/spl isin/R/sup m/spl times/n, shows that all updates can be realized by orthogonal modified Givens rotations.
Abstract: We analyse the fine-grained parallelism of the two-sided block-Jacobi algorithm for the singular value decomposition (SVD) of a matrix A ∈ R^{m×n}, m ≥ n. The algorithm involves the class CO of parallel orderings on the two-dimensional toroidal mesh with p processors. The mathematical background is based on the QR decomposition (QRD) of local data matrices and on the triangular Kogbetliantz algorithm (TKA) for local SVDs in the diagonal mesh processors. Subsequent updates of local matrices in the diagonal as well as nondiagonal mesh processors are required. We show that all updates can be realized by orthogonal modified Givens rotations. These rotations can be efficiently pipelined in parallel in the horizontal and vertical rings of √p processors through the toroidal mesh. For one mesh processor our solution requires O((m+n)^2/p) systolic processing elements (PEs), O(m^2/p) local memory registers and O((m+n)^2/p) additional delay elements. The time complexity of our solution is O((m + n^{3/2}/p^{3/4})Δ) time steps per global iteration, where Δ is the length of the global synchronization time step, given by the evaluation and application of two modified Givens rotations in the TKA.

Journal ArticleDOI
TL;DR: A unified algebraic transformation approach is presented for designing parallel recursive and adaptive digital filters and singular value decomposition (SVD) algorithms, based on the explorations of some algebraic properties of the target algorithms' representations.
Abstract: In this paper, a unified algebraic transformation approach is presented for designing parallel recursive and adaptive digital filters and singular value decomposition (SVD) algorithms. The approach is based on the explorations of some algebraic properties of the target algorithms' representations. Several typical modern digital signal processing examples are presented to illustrate the applications of the technique. They include the cascaded orthogonal recursive digital filter, the Givens rotation-based adaptive inverse QR algorithm for channel equalization, and the QR decomposition-based SVD algorithms. All three examples exhibit similar throughput constraints. There exist long feedback loops in the algorithms' signal flow graph representation, and the critical path is proportional to the size of the problem. Applying the proposed algebraic transformation techniques, parallel architectures are obtained for all three examples. For cascade orthogonal recursive filter, retiming transformation and orthogonal matrix decompositions (or pseudo-commutativity) are applied to obtain parallel filter architectures with critical path of five Givens rotations. For adaptive inverse QR algorithm, the commutativity and associativity of the matrix multiplications are applied to obtain parallel architectures with critical path of either four Givens rotations or three Givens rotations plus two multiply-add operations, whichever turns out to be larger. For SVD algorithms, retiming and associativity of the matrix multiplications are applied to derive parallel architectures with critical path of eight Givens rotations. The critical paths of all parallel architectures are independent of the problem size as compared with being proportional to the problem size in the original sequential algorithms. Parallelism is achieved at the expense of slight increase (or the same for the SVD case) in the algorithms' computational complexity.

Journal ArticleDOI
TL;DR: In this paper, a direct regularization method using QR factorization for linear discrete ill-posed problems is proposed, which requires a parameter which is similar to the regularization parameter of Tikhonov's method.
Abstract: In this paper we propose a direct regularization method using QR factorization for solving linear discrete ill-posed problems. The decomposition of the coefficient matrix requires less computational cost than the singular value decomposition which is usually used for Tikhonov regularization. This method requires a parameter which is similar to the regularization parameter of Tikhonov's method. In order to estimate the optimal parameter, we apply three well-known parameter choice methods for Tikhonov regularization.
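The flavor of QR-based direct regularization can be illustrated with a truncated pivoted QR on a classic ill-conditioned test problem. This is a sketch in the same spirit, not the authors' exact method; the Hilbert matrix, noise level, and truncation parameter k (the analogue of the regularization parameter) are assumptions:

```python
import numpy as np
from scipy.linalg import qr, hilbert

n = 12
A = hilbert(n)                          # severely ill-conditioned test matrix
x_true = np.ones(n)
b = A @ x_true + 1e-8 * np.random.default_rng(8).normal(size=n)

# Truncated pivoted QR: keep only the k dominant triangular directions;
# k plays the role of Tikhonov's regularization parameter.
Q, R, piv = qr(A, pivoting=True)
k = 6
y = np.linalg.solve(R[:k, :k], (Q.T @ b)[:k])
x_reg = np.zeros(n)
x_reg[piv[:k]] = y

x_naive = np.linalg.solve(A, b)         # unregularized: noise dominates
```

The QR factorization costs far less than the SVD used for Tikhonov regularization, which is the trade-off the paper exploits; choosing k well then requires a parameter-choice rule, as discussed in the abstract.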

Journal ArticleDOI
TL;DR: HAT is presented as a solution that breaks the throughput bottleneck introduced by the inherent recursive computation in QRD-based adaptive filters, allowing a linear speedup in throughput rate for a linear increase in hardware complexity.
Abstract: A novel transformation, referred to as hybrid annihilation transformation (HAT), for pipelining the QR decomposition (QRD) based least square adaptive filters has been developed. HAT provides a unified framework for the derivation of high-throughput/low-power VLSI architectures of three kinds of QRD adaptive filters, namely, QRD recursive least-square (LS) adaptive filters, QRD LS lattice adaptive filters, and QRD multichannel LS lattice adaptive filters. In this paper, HAT is presented as a solution to break the bottleneck of a high-throughput implementation introduced by the inherent recursive computation in the QRD based adaptive filters. The most important feature of the proposed solution is that it does not introduce any approximation in the entire filtering process. Therefore, it causes no performance degradation no matter how deep the filter is pipelined. It allows a linear speedup in the throughput rate by a linear increase in hardware complexity. The sampling rate can be traded off for power reduction with lower supply voltage for applications where high-speed is not required. The proposed transformation is addressed both analytically, with mathematical proofs, and experimentally, with computer simulation results on its applications in wireless code division multiple access (CDMA) communications, conventional digital communications and multichannel linear predictions.