
Showing papers on "QR decomposition published in 2001"


Proceedings Article
03 Jan 2001
TL;DR: It is shown that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and that the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix.
Abstract: The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descent method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and that the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function.
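The eigendecomposition-plus-pivoted-QR pipeline described in the abstract can be sketched in a few lines of NumPy/SciPy. The two-blob data set, seed, and cluster count below are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(0)
# Two well-separated blobs as toy data, k = 2 clusters.
X = np.vstack([rng.normal((5, 0), 0.3, (20, 2)),
               rng.normal((0, 5), 0.3, (20, 2))])
k = 2

G = X @ X.T                       # Gram matrix of the data vectors
w, V = np.linalg.eigh(G)          # eigendecomposition; keep the top k
Vk = V[:, -k:]                    # n x k matrix of leading eigenvectors

# Pivoted QR of the (transposed) eigenvector matrix picks k pivot
# columns; each point is assigned to its dominant pivot direction.
Q, R, piv = qr(Vk.T, pivoting=True)
Rhat = np.linalg.solve(R[:, :k], R)           # k x n, in pivoted order
labels = np.empty(len(X), dtype=int)
labels[piv] = np.argmax(np.abs(Rhat), axis=0)
```

For well-separated clusters the leading eigenvectors of G nearly span the cluster indicator vectors, so the pivoted QR assignment recovers the partition.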

657 citations


Journal ArticleDOI
TL;DR: A new efficient decoding algorithm based on QR decomposition is presented, which requires only a fraction of the computational effort of the standard decoding algorithm, which repeatedly computes the pseudo-inverse of the channel matrix.
Abstract: Layered space-time codes have been designed to exploit the capacity advantage of multiple antenna systems in Rayleigh fading environments. A new efficient decoding algorithm based on QR decomposition is presented; it requires only a fraction of the computational effort of the standard decoding algorithm, which repeatedly computes the pseudo-inverse of the channel matrix.
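A minimal NumPy sketch of this idea: a single QR decomposition of the channel matrix turns layered detection into back-substitution with symbol slicing, instead of repeated pseudo-inverses. The 4×4 BPSK setup, seed, and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4                                          # transmit/receive antennas
H = (rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))) / np.sqrt(2)
x = rng.choice([-1.0, 1.0], size=M) + 0j       # BPSK substream symbols
y = H @ x + 1e-3 * (rng.normal(size=M) + 1j * rng.normal(size=M))

# With H = QR, Q^H y = R x + noise and R is triangular, so substreams
# are detected bottom-up, cancelling already-detected interference.
Q, R = np.linalg.qr(H)
z = Q.conj().T @ y
xhat = np.zeros(M, dtype=complex)
for i in range(M - 1, -1, -1):
    s = (z[i] - R[i, i + 1:] @ xhat[i + 1:]) / R[i, i]
    xhat[i] = 1.0 if s.real >= 0 else -1.0     # hard decision (slicer)
```

At this noise level the hard decisions recover the transmitted symbols; a soft-output variant would pass the unsliced values s to a decoder instead.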

560 citations


Book
26 Feb 2001
TL;DR: In this book, the authors develop a framework for spectral approximation based on the convergence of operators, with matrix computations covering QR factorization, the convergence of a sequence of subspaces, QR methods and inverse iteration, and error analysis.
Abstract: SPECTRAL DECOMPOSITION: General Notions, Decompositions, Spectral Sets of Finite Type, Adjoint and Product Spaces. SPECTRAL APPROXIMATION: Convergence of Operators, Property U, Property L, Error Estimates. IMPROVEMENT OF ACCURACY: Iterative Refinement, Acceleration. FINITE RANK APPROXIMATIONS: Approximations Based on Projection, Approximations of Integral Operators, A Posteriori Error Estimates. MATRIX FORMULATIONS: Finite Rank Operators, Iterative Refinement, Acceleration, Numerical Examples. MATRIX COMPUTATIONS: QR Factorization, Convergence of a Sequence of Subspaces, QR Methods and Inverse Iteration, Error Analysis. REFERENCES. INDEX. Each chapter also includes exercises.

170 citations


Proceedings ArticleDOI
21 May 2001
TL;DR: A numerical method for the determination of the identifiable parameters of parallel robots based on QR decomposition of the observation matrix of the calibration system is presented.
Abstract: Presents a numerical method for the determination of the identifiable parameters of parallel robots. The special case of Stewart-Gough 6 degrees-of-freedom parallel robots is studied for classical and self calibration methods, but this method can be generalized to any kind of parallel robot. The method is based on QR decomposition of the observation matrix of the calibration system. Numerical relations between the parameters which are identified and those which are not identifiable can be obtained for each method.
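The pivoted-QR mechanics behind this kind of identifiability analysis can be sketched with a toy observation matrix (the matrix below is an illustrative stand-in for a robot calibration Jacobian, not from the paper):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(2)
# Toy observation matrix: 5 candidate parameters, but column 3 is a
# fixed combination of columns 0 and 1, so only 4 are identifiable.
W = rng.normal(size=(50, 5))
W[:, 3] = 2.0 * W[:, 0] - W[:, 1]

Q, R, piv = qr(W, mode='economic', pivoting=True)
tol = abs(R[0, 0]) * 1e-10
rank = int(np.sum(np.abs(np.diag(R)) > tol))
identifiable   = sorted(piv[:rank])    # parameters the data determines
unidentifiable = sorted(piv[rank:])    # linearly dependent on the others

# Numerical relations between the non-identifiable parameters and the
# identifiable ones, in the spirit of the paper: solve R11 * rel = R12.
rel = np.linalg.solve(R[:rank, :rank], R[:rank, rank:])
```

The columns of `rel` express each unidentifiable column of W as a combination of the pivot columns, which is exactly the kind of numerical relation the abstract refers to.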

109 citations


Journal ArticleDOI
TL;DR: This paper presents a small-bulge multishift variation of the multishift QR algorithm that avoids the phenomenon of shift blurring, which retards convergence and limits the number of simultaneous shifts.
Abstract: This paper presents a small-bulge multishift variation of the multishift QR algorithm that avoids the phenomenon of shift blurring, which retards convergence and limits the number of simultaneous shifts. It replaces the large diagonal bulge in the multishift QR sweep with a chain of many small bulges. The small-bulge multishift QR sweep admits nearly any number of simultaneous shifts---even hundreds---without adverse effects on the convergence rate. With enough simultaneous shifts, the small-bulge multishift QR algorithm takes advantage of the level 3 BLAS, which is a special advantage for computers with advanced architectures.

107 citations


Journal ArticleDOI
TL;DR: A new deflation strategy that takes advantage of matrix perturbations outside of the subdiagonal entries of the Hessenberg QR iterate and identifies and deflates converged eigenvalues long before the classic small-subdiagonal strategy would.
Abstract: Aggressive early deflation is a QR algorithm deflation strategy that takes advantage of matrix perturbations outside of the subdiagonal entries of the Hessenberg QR iterate. It identifies and deflates converged eigenvalues long before the classic small-subdiagonal strategy would. The new deflation strategy enhances the performance of conventional large-bulge multishift QR algorithms, but it is particularly effective in combination with the small-bulge multishift QR algorithm. The small-bulge multishift QR sweep with aggressive early deflation maintains a high rate of execution of floating point operations while significantly reducing the number of operations required.

106 citations


Journal ArticleDOI
TL;DR: In this paper, an alternative orthonormalization method that computes the orthonormal basis from the right singular vectors of a matrix is proposed; it is typically more stable than classical Gram-Schmidt (GS).
Abstract: First, we consider the problem of orthonormalizing skinny (long) matrices. We propose an alternative orthonormalization method that computes the orthonormal basis from the right singular vectors of a matrix. Its advantages are that (a) all operations are matrix-matrix multiplications and thus cache efficient, (b) only one synchronization point is required in parallel implementations, and (c) it is typically more stable than classical Gram--Schmidt (GS). Second, we consider the problem of orthonormalizing a block of vectors against a previously orthonormal set of vectors and among itself. We solve this problem by alternating iteratively between a phase of GS and a phase of the new method. We provide error analysis and use it to derive bounds on how accurately the two successive orthonormalization phases should be performed to minimize total work performed. Our experiments confirm the favorable numerical behavior of the new method and its effectiveness on modern parallel computers.
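The core step of the SVD-based orthonormalization can be sketched as follows. This is only the single-pass idea for a well-conditioned matrix, not the full two-phase algorithm with its error-analysis-driven tolerances; the function name and test matrix are assumptions:

```python
import numpy as np

def svd_orth(A):
    """One pass of the SVD-based orthonormalization idea: only
    matrix-matrix products plus one small k x k eigenproblem.
    Assumes A has full column rank and is not too ill-conditioned."""
    S = A.T @ A                      # small k x k Gram matrix, cache friendly
    w, V = np.linalg.eigh(S)         # S = V diag(w) V^T
    return A @ (V / np.sqrt(w))      # columns of A V diag(w)^(-1/2)

rng = np.random.default_rng(3)
A = rng.normal(size=(1000, 8))       # skinny (long) matrix
Q = svd_orth(A)
```

All heavy operations are matrix-matrix products (BLAS-3), and a parallel implementation needs only one reduction to form S, which is the synchronization advantage the abstract describes.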

99 citations


Journal ArticleDOI
01 May 2001
TL;DR: It is shown how detection of redundant rules can be introduced in OLS by a simple extension of the algorithm; the performance of rank-revealing reduction methods is discussed, and a less complex method based on the pivoted QR decomposition is advocated.
Abstract: Comments on recent publications about the use of orthogonal transforms to order and select rules in a fuzzy rule base. The techniques are well-known from linear algebra, and we comment on their usefulness in fuzzy modeling. The application of rank-revealing methods based on singular value decomposition (SVD) to rule reduction gives rather conservative results. They are essentially subset selection methods, and we show that such methods do not produce an "importance ordering", contrary to what has been stated in the literature. The orthogonal least-squares (OLS) method, which evaluates the contribution of the rules to the output, is more attractive for systems modeling. However, it has been shown to sometimes assign high importance to rules that are correlated in the premise. This hampers the generalization capabilities of the resulting model. We discuss the performance of rank-revealing reduction methods and advocate the use of a less complex method based on the pivoted QR decomposition. Further, we show how detection of redundant rules can be introduced in OLS by a simple extension of the algorithm. The methods are applied to a problem known from the literature and compared to results reported by other researchers.

77 citations


Journal ArticleDOI
TL;DR: It is claimed that the same techniques can be applied to the pruning problem, and thus they are a useful tool for compaction of information.

57 citations


Journal ArticleDOI
TL;DR: An iterative detection algorithm for an uncoded multi-transmitter multi-receiver system that is 4 to 8 times less complex than V-BLAST optimal-order (OPT) detection, while maintaining comparable performance.
Abstract: We study an iterative detection algorithm for an uncoded multi-transmitter multi-receiver system. The main data stream is demultiplexed into M substreams, and each substream is modulated independently and then transmitted by its dedicated antenna. The receiver is equipped with N ≥ M antennas. At each receive antenna, the signal is a superposition of the M substreams, affected by independent fades and disturbed by AWGN. The detection algorithm is based on the QR decomposition of the channel transfer matrix, which is then used to perform hard or soft inter-substream interference cancellation. Comparisons are made with the V-BLAST optimal order (OPT) detection algorithm. The proposed algorithm is 4 to 8 times less complex than V-BLAST OPT, while maintaining comparable performance.

56 citations


Journal ArticleDOI
TL;DR: A fast algorithm to compute the R factor of the QR factorization of a block-Hankel matrix H, based on the generalized Schur algorithm, which makes it possible to handle the rank-deficient case.

Book
30 Nov 2001
TL;DR: In this book, the authors present data structures for sparse matrix computation, together with sparse symmetric linear system solvers, sparse QR decomposition, and power system applications such as load flow analysis and state estimation.
Abstract: Preface. Acknowledgments. 1. Introduction. 2. Object Orientation for Modeling Computations. 3. Data Structure for Sparse Matrix Computation. 4. Sparse Symmetric Linear System Solver. 5. Sparse QR Decomposition. 6. Optimization Methods. 7. Sparse LP and QP Solvers. 8. Load Flow Analysis. 9. Short Circuit Analysis. 10. Power System State Estimation. 11. Optimal Power Flow. 12. Power System Dynamics. Appendices. References. Index.

Journal ArticleDOI
TL;DR: In this article, poles and zeros are defined for continuous-time, linear, time-varying systems, where a zero is a function of time corresponding to an exponential input whose transmission to the output is blocked.
Abstract: Definitions of poles and zeros are presented for continuous-time, linear, time-varying systems. For a linear, time-varying state equation, a set of time-varying poles defines a stability-preserving variable change relating the original state equation to an upper triangular state equation. A zero is a function of time corresponding to an exponential input whose transmission to the output is blocked. Both definitions are shown to be generalizations of existing definitions of poles and zeros for linear, time-varying systems and are consistent with the definitions for linear, time-invariant systems. A computation procedure is presented using a QR decomposition of the transition matrix for the state equation. A numerical example is given to illustrate this procedure.

Proceedings ArticleDOI
23 Apr 2001
TL;DR: The out-of-core Cholesky factorization implementation is shown to achieve up to 80% of peak performance on a 64 node configuration of the Cray T3E-600 and preliminary results for parallel implementation of the resulting OOC QR factorization algorithm are included.
Abstract: In this paper the parallel implementation of out-of-core Cholesky factorization is used to introduce the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK), a flexible infrastructure for parallel implementation of out-of-core linear algebra operations. POOCLAPACK builds on the Parallel Linear Algebra Package (PLAPACK) for in-core parallel dense linear algebra computation. Despite the extreme simplicity of POOCLAPACK, the out-of-core Cholesky factorization implementation is shown to achieve up to 80% of peak performance on a 64-node configuration of the Cray T3E-600. The insights gained from examining the Cholesky factorization are also applied to the much more difficult and important QR factorization operation. Preliminary results for parallel implementation of the resulting OOC QR factorization algorithm are included.

Journal ArticleDOI
TL;DR: It is shown that for semidefinite matrices the VSV decomposition should be computed via the ULV decomposition, while for indefinite matrices it must be computed through a URV-like decomposition that involves hypernormal rotations.
Abstract: We present a family of algorithms for computing symmetric rank-revealing VSV decompositions based on triangular factorization of the matrix. The VSV decomposition consists of a middle symmetric matrix that reveals the numerical rank in having three blocks with small norm, plus an orthogonal matrix whose columns span approximations to the numerical range and null space. We show that for semidefinite matrices the VSV decomposition should be computed via the ULV decomposition, while for indefinite matrices it must be computed via a URV-like decomposition that involves hypernormal rotations.

Journal ArticleDOI
Frank Uhlig1
TL;DR: In this article, the authors give constructive proofs of a number of results to generate a generalized real (or complex) orthogonal (or unitary) matrix as the product of generalized Householder matrices.

Book ChapterDOI
TL;DR: A highly practical fpa-variant of the new segment LLL-reduction of Koy and Schnorr that performs well beyond dimension 1000 and is much faster than previous codes for LLL-reduction.
Abstract: We associate with an integer lattice basis a scaled basis that has orthogonal vectors of nearly equal length. The orthogonal vectors or the QR-factorization of a scaled basis can be accurately computed up to dimension 216 by Householder reflections in floating point arithmetic (fpa) with 53 precision bits. We develop a highly practical fpa-variant of the new segment LLL-reduction of Koy and Schnorr [KS01]. The LLL-steps are guided in this algorithm by the Gram-Schmidt coefficients of an associated scaled basis. The new reduction algorithm is much faster than previous codes for LLL-reduction and performs well beyond dimension 1000.
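The Gram-Schmidt coefficients that guide the LLL-steps can be read directly off a floating-point QR factorization of the basis matrix. A small sketch (the basis below is arbitrary, chosen only for illustration):

```python
import numpy as np

# If the basis vectors are the columns of B and B = QR, then the
# Gram-Schmidt coefficients are mu[i, j] = R[j, i] / R[j, j],
# regardless of the sign convention used for the diagonal of R.
B = np.array([[1., 0., 3.],
              [2., 5., 1.],
              [0., 1., 4.]])
Q, R = np.linalg.qr(B)
n = B.shape[1]
mu = np.zeros((n, n))
for i in range(n):
    for j in range(i):
        mu[i, j] = R[j, i] / R[j, j]
```

Computing mu this way uses Householder-based QR (as in the paper's fpa setting) rather than explicit Gram-Schmidt, which is the numerically preferable route in floating point.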

Journal ArticleDOI
TL;DR: In this paper, componentwise perturbation analyses are given for Q and R in the QR factorization A = QR, with Q^T Q = I and R upper triangular, for a given real m × n matrix A of rank n.
Abstract: This paper gives componentwise perturbation analyses for Q and R in the QR factorization A=QR, $Q^\mathrm{T}Q=I$ , R upper triangular, for a given real $m\times n$ matrix A of rank n. Such specific analyses are important for example when the columns of A are badly scaled. First order perturbation bounds are given for both Q and R. The analyses more accurately reflect the sensitivity of the problem than previous such results. The condition number for R is bounded for a fixed n when the standard column pivoting strategy is used. This strategy also tends to improve the condition of Q, so usually the computed Q and R will both have higher accuracy when we use the standard column pivoting strategy. Practical condition estimators are derived. The assumptions on the form of the perturbation $\Delta A$ are explained and extended. Weaker rigorous bounds are also given.

Journal ArticleDOI
TL;DR: In this article, a two-step procedure based on QR decompositions is proposed as a solution algorithm for this type of identification problem, which will always deliver the exact solution and is much easier to implement than a Newton-type iteration algorithm.

Proceedings ArticleDOI
01 Jan 2001
TL;DR: The QR decomposition algorithm is used to demonstrate the capability of the tool to quickly generate high performance parallel implementations, and results are presented showing how the control logic complexity and number of clock cycles vary with these transformations.
Abstract: Compaan is a software tool capable of automatically translating nested loop programs, written in Matlab, into parallel Kahn process network descriptions suitable for implementation in hardware. In this paper we present a tool for converting these process networks into FPGA implementations. The QR decomposition algorithm is used to demonstrate the capability of the tool to quickly generate high performance parallel implementations. This allows us to rapidly explore a range of transformations, such as loop unrolling and skewing, to generate a circuit that meets the requirements of a particular application. We present results showing how the control logic complexity and number of clock cycles vary with these transformations.

Journal ArticleDOI
TL;DR: In this paper, the possibility of obtaining certain direct sum decompositions, for a given complete two-dimensional behavior, is investigated and proved to be equivalent to the zero skew-primeness property of suitable matrix pairs.
Abstract: In this paper, the possibility of obtaining certain direct sum decompositions for a given complete two-dimensional behavior is investigated and proved to be equivalent to the zero skew-primeness property of suitable matrix pairs. Some known decomposition theorems for two-dimensional complete behaviors are then obtained as simple corollaries of this general result.

Journal ArticleDOI
TL;DR: These algorithms are based on earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose.
Abstract: We present algorithms to determine the number of nonzeros in each row and column of the factors of a sparse matrix, for both the QR factorization and the LU factorization with partial pivoting. The algorithms use only the nonzero structure of the input matrix, and run in time nearly linear in the number of nonzeros in that matrix. They may be used to set up data structures or schedule parallel operations in advance of the numerical factorization. The row and column counts we compute are upper bounds on the actual counts. If the input matrix is strong Hall and there is no coincidental numerical cancellation, the counts are exact for QR factorization and are the tightest bounds possible for LU factorization. These algorithms are based on our earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose.

Proceedings ArticleDOI
20 Nov 2001
TL;DR: An analysis was performed to determine the required fixed-point precision needed to compute the weights for an adaptive array system operating in the presence of interference; it found that a floating-point computation can be well approximated by a 13-bit to 19-bit word-length fixed-point computation for typical system jammer-to-noise levels.
Abstract: Adaptive array systems require the periodic solution of the well-known w = R^{-1}v equation in order to compute optimum adaptive array weights. The covariance matrix R is estimated by forming a product of noise sample matrices X: R = X^H X. The operations-count cost of performing the required matrix inversion in real time can be prohibitively high for a high-bandwidth system with a large number of sensors. Specialized hardware may be required to execute the requisite computations in real time. The choice of algorithm to perform these computations must be considered in conjunction with the hardware technology used to implement the computation engine. A systolic architecture implementation of the Givens rotation method for matrix inversion was selected to perform adaptive weight computation. The bit-level systolic approach enables a simple ASIC design and a very low power implementation. The bit-level systolic architecture must be implemented with fixed-point arithmetic to simplify the propagation of data through the computation cells. The Givens rotation approach has a highly parallel implementation and is ideally suited for a systolic implementation. Additionally, the adaptive weights are computed directly from the sample matrix X in the voltage domain, thus reducing the required dynamic range needed in carrying out the computations. An analysis was performed to determine the required fixed-point precision needed to compute the weights for an adaptive array system operating in the presence of interference. Based on the analysis results, it was determined that the precision of a floating-point computation can be well approximated with a 13-bit to 19-bit word-length fixed-point computation for typical system jammer-to-noise levels. This property has produced an order-of-magnitude reduction in required hardware complexity. A synthesis-based ASIC design process was used to generate preliminary layouts. These layouts were used to estimate the area and throughput of the VLSI QR decomposition architecture. The results show that this QR decomposition process, when implemented in a full-custom design, provides a computation time that is two orders of magnitude faster than a state-of-the-art microprocessor. © (2001) COPYRIGHT SPIE--The International Society for Optical Engineering.
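The Givens rotation QR that the systolic array implements can be sketched in floating-point NumPy (the hardware uses fixed-point cells; the matrix sizes and data below are illustrative assumptions):

```python
import numpy as np

def givens_qr(A):
    """QR decomposition built from Givens rotations -- the operation the
    paper maps onto a bit-level systolic array (floating point here,
    not the fixed-point arithmetic of the proposed hardware)."""
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):                       # zero column j below the diagonal
        for i in range(m - 1, j, -1):
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])  # 2 x 2 Givens rotation
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

rng = np.random.default_rng(6)
X = rng.normal(size=(6, 3))                  # noise sample matrix
Q, R = givens_qr(X)

# The adaptive weights then follow by back-substitution on the
# triangular system R w = Q^H v (least-squares weight computation).
v = rng.normal(size=6)
w = np.linalg.solve(R[:3, :3], (Q.T @ v)[:3])
```

Each rotation touches only two rows, which is what makes the method pipeline so naturally onto systolic cells.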

Proceedings ArticleDOI
06 May 2001
TL;DR: This paper proposes a method to diagnose multiple faults in linear analog circuits, using QR factorization to identify ambiguity groups in the test verification matrix and the structural incident signal matrix to evaluate faulty parameters.
Abstract: This paper proposes a method to diagnose multiple faults in linear analog circuits. The test equation establishes the relationship between the measured responses and the faulty excitations due to faulty elements. QR factorization is applied to identify ambiguity groups in the test verification matrix. The suspicious faulty excitations of minimum size are determined, and the faulty parameters are evaluated using the structural incident signal matrix. Finally, the method is illustrated with an example circuit.

Journal ArticleDOI
TL;DR: The four different problems of DGELS are essentially reduced to two by explicit transposition of A, and by avoiding redundant computations in the update of B the authors reduce the work needed to compute the minimum norm solution.
Abstract: We present new algorithms for computing the linear least squares solution to overdetermined linear systems and the minimum norm solution to underdetermined linear systems. For both problems, we consider the standard formulation min ||AX − B||_F and the transposed formulation min ||A^T X − B||_F, i.e., four different problems in all. The functionality of our implementation corresponds to that of the LAPACK routine DGELS. The new implementation is significantly faster and simpler. It outperforms the LAPACK DGELS for all matrix sizes tested. The improvement is usually 50-100% and it is as high as 400%. The four different problems of DGELS are essentially reduced to two, by use of explicit transposition of A. By explicit transposition we avoid computing Householder transformations on vectors with large stride. The QR factorization of block columns of A is performed using a recursive level-3 algorithm. By interleaving updates of B with the factorization of A, we reduce the number of floating point operations performed for the linear least squares problem. By avoiding redundant computations in the update of B we reduce the work needed to compute the minimum norm solution. Finally, we outline fully recursive algorithms for the four problems of DGELS as well as for QR factorization.
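Both DGELS-style problems reduce to one QR factorization each; a minimal NumPy sketch (not the paper's recursive blocked implementation, and with arbitrary test matrices):

```python
import numpy as np

rng = np.random.default_rng(7)

# Overdetermined: least squares min ||A x - b|| via A = QR.
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)
Q, R = np.linalg.qr(A)                  # economic: Q is 8x3, R is 3x3
x_ls = np.linalg.solve(R, Q.T @ b)      # back-substitution in practice

# Underdetermined: minimum-norm solution of A x = b via QR of A^T.
A2 = rng.normal(size=(3, 8))
b2 = rng.normal(size=3)
Q2, R2 = np.linalg.qr(A2.T)             # A2 = R2^T Q2^T
x_mn = Q2 @ np.linalg.solve(R2.T, b2)   # forward substitution on R2^T
```

The minimum-norm solution lies in the row space of A2 by construction (it is a combination of the columns of Q2), which is what makes it minimal.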

Patent
04 Oct 2001
TL;DR: In this article, an improved symbol decision is generated of a desired subchannel of the signal vector by first generating a baseline decision for the sub-channel, which is then multiplied by a unitary matrix generated from a QR decomposition of another channel matrix.
Abstract: A system and method for performing extended space-time processing. An improved symbol decision is generated of a desired sub-channel of the signal vector by first generating a baseline decision for the sub-channel. A contribution of a strongest sub-channel is subtracted from the signal vector to generate a modified signal vector. The modified signal vector is multiplied by a unitary matrix generated from a QR decomposition of another channel matrix. Channel interference of the remaining sub-channels of the modified signal vector is cancelled from a remaining sub-channel.

Proceedings ArticleDOI
07 Feb 2001
TL;DR: The fine-grained parallelism of the two-sided block-Jacobi algorithm for the singular value decomposition (SVD) of matrix A/spl isin/R/sup m/spl times/n, shows that all updates can be realized by orthogonal modified Givens rotations.
Abstract: We analyse the fine-grained parallelism of the two-sided block-Jacobi algorithm for the singular value decomposition (SVD) of a matrix A ∈ R^{m×n}, m ≥ n. The algorithm involves the class CO of parallel orderings on the two-dimensional toroidal mesh with p processors. The mathematical background is based on the QR decomposition (QRD) of local data matrices and on the triangular Kogbetliantz algorithm (TKA) for local SVDs in the diagonal mesh processors. Subsequent updates of local matrices in the diagonal as well as nondiagonal mesh processors are required. We show that all updates can be realized by orthogonal modified Givens rotations. These rotations can be efficiently pipelined in parallel in the horizontal and vertical rings of √p processors through the toroidal mesh. For one mesh processor our solution requires O((m+n)^2/p) systolic processing elements (PEs), O(m^2/p) local memory registers and O((m+n)^2/p) additional delay elements. The time complexity of our solution is O((m + n^{3/2}/p^{3/4})Δ) time steps per global iteration, where Δ is the length of the global synchronization time step, given by the evaluation and application of two modified Givens rotations in the TKA.

Journal ArticleDOI
TL;DR: A unified algebraic transformation approach is presented for designing parallel recursive and adaptive digital filters and singular value decomposition (SVD) algorithms, based on the explorations of some algebraic properties of the target algorithms' representations.
Abstract: In this paper, a unified algebraic transformation approach is presented for designing parallel recursive and adaptive digital filters and singular value decomposition (SVD) algorithms. The approach is based on the explorations of some algebraic properties of the target algorithms' representations. Several typical modern digital signal processing examples are presented to illustrate the applications of the technique. They include the cascaded orthogonal recursive digital filter, the Givens rotation-based adaptive inverse QR algorithm for channel equalization, and the QR decomposition-based SVD algorithms. All three examples exhibit similar throughput constraints. There exist long feedback loops in the algorithms' signal flow graph representation, and the critical path is proportional to the size of the problem. Applying the proposed algebraic transformation techniques, parallel architectures are obtained for all three examples. For cascade orthogonal recursive filter, retiming transformation and orthogonal matrix decompositions (or pseudo-commutativity) are applied to obtain parallel filter architectures with critical path of five Givens rotations. For adaptive inverse QR algorithm, the commutativity and associativity of the matrix multiplications are applied to obtain parallel architectures with critical path of either four Givens rotations or three Givens rotations plus two multiply-add operations, whichever turns out to be larger. For SVD algorithms, retiming and associativity of the matrix multiplications are applied to derive parallel architectures with critical path of eight Givens rotations. The critical paths of all parallel architectures are independent of the problem size as compared with being proportional to the problem size in the original sequential algorithms. Parallelism is achieved at the expense of slight increase (or the same for the SVD case) in the algorithms' computational complexity.

Journal ArticleDOI
TL;DR: In this paper, a direct regularization method using QR factorization for linear discrete ill-posed problems is proposed, which requires a parameter which is similar to the regularization parameter of Tikhonov's method.
Abstract: In this paper we propose a direct regularization method using QR factorization for solving linear discrete ill-posed problems. The decomposition of the coefficient matrix requires less computational cost than the singular value decomposition which is usually used for Tikhonov regularization. This method requires a parameter which is similar to the regularization parameter of Tikhonov's method. In order to estimate the optimal parameter, we apply three well-known parameter choice methods for Tikhonov regularization.
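The flavor of QR-based direct regularization can be illustrated with a truncated pivoted QR on a classic ill-conditioned test problem. This is a sketch in the same spirit, not the authors' exact method; the Hilbert matrix, noise level, and truncation parameter k (the analogue of the regularization parameter) are assumptions:

```python
import numpy as np
from scipy.linalg import qr, hilbert

n = 12
A = hilbert(n)                          # severely ill-conditioned test matrix
x_true = np.ones(n)
b = A @ x_true + 1e-8 * np.random.default_rng(8).normal(size=n)

# Truncated pivoted QR: keep only the k dominant triangular directions;
# k plays the role of Tikhonov's regularization parameter.
Q, R, piv = qr(A, pivoting=True)
k = 6
y = np.linalg.solve(R[:k, :k], (Q.T @ b)[:k])
x_reg = np.zeros(n)
x_reg[piv[:k]] = y

x_naive = np.linalg.solve(A, b)         # unregularized: noise dominates
```

The QR factorization costs far less than the SVD used for Tikhonov regularization, which is the trade-off the paper exploits; choosing k well then requires a parameter-choice rule, as discussed in the abstract.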

Journal ArticleDOI
TL;DR: HAT is presented as a solution that breaks the throughput bottleneck introduced by the inherent recursive computation in QRD-based adaptive filters, allowing a linear speedup in throughput rate for a linear increase in hardware complexity.
Abstract: A novel transformation, referred to as hybrid annihilation transformation (HAT), for pipelining the QR decomposition (QRD) based least square adaptive filters has been developed. HAT provides a unified framework for the derivation of high-throughput/low-power VLSI architectures of three kinds of QRD adaptive filters, namely, QRD recursive least-square (LS) adaptive filters, QRD LS lattice adaptive filters, and QRD multichannel LS lattice adaptive filters. In this paper, HAT is presented as a solution to break the bottleneck of a high-throughput implementation introduced by the inherent recursive computation in the QRD based adaptive filters. The most important feature of the proposed solution is that it does not introduce any approximation in the entire filtering process. Therefore, it causes no performance degradation no matter how deep the filter is pipelined. It allows a linear speedup in the throughput rate by a linear increase in hardware complexity. The sampling rate can be traded off for power reduction with lower supply voltage for applications where high-speed is not required. The proposed transformation is addressed both analytically, with mathematical proofs, and experimentally, with computer simulation results on its applications in wireless code division multiple access (CDMA) communications, conventional digital communications and multichannel linear predictions.