
Showing papers on "QR decomposition" published in 1996


Journal Article
TL;DR: Two algorithms are presented for computing rank-revealing QR factorizations that are nearly as efficient as QR with column pivoting for most problems and take $O(mn^2)$ floating-point operations in the worst case.
Abstract: Given an $m \times n$ matrix $M$ with $m > n$, it is shown that there exists a permutation $\Pi$ and an integer $k$ such that the QR factorization $M\Pi = Q\begin{pmatrix} A_k & B_k \\ 0 & C_k \end{pmatrix}$ reveals the numerical rank of $M$: the $k \times k$ upper-triangular matrix $A_k$ is well conditioned, $\|C_k\|_2$ is small, and $B_k$ is linearly dependent on $A_k$ with coefficients bounded by a low-degree polynomial in $n$. Existing rank-revealing QR (RRQR) algorithms are related to such factorizations, and two algorithms are presented for computing them. The new algorithms are nearly as efficient as QR with column pivoting for most problems and take $O(mn^2)$ floating-point operations in the worst case.

698 citations
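For orientation, the classical QR with column pivoting that this entry uses as its efficiency baseline can be sketched in pure Python as below. This is an illustrative sketch, not the paper's new algorithms: the function names `qr_column_pivoting` and `numerical_rank` and the relative threshold `tol` are hypothetical, and Q is applied implicitly rather than stored. The numerical rank is read off as the number of diagonal entries of R that stay above `tol` times the largest one.

```python
import math


def qr_column_pivoting(a):
    """Householder QR with column pivoting on an m x n matrix (lists of rows).

    Returns (r, perm): r is the upper-triangular factor of A[:, perm]
    (min(m, n) rows); Q is applied implicitly and not stored.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]  # work on a float copy
    perm = list(range(n))
    for k in range(min(m, n)):
        # Pivot: swap in the remaining column with the largest trailing 2-norm.
        p = max(range(k, n), key=lambda j: sum(r[i][j] ** 2 for i in range(k, m)))
        if p != k:
            for row in r:
                row[k], row[p] = row[p], row[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Householder vector v with (I - 2 v v^T / v^T v) x = alpha * e_1.
        x = [r[i][k] for i in range(k, m)]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            continue  # trailing block is exactly zero
        if x[0] > 0:
            alpha = -alpha  # sign choice avoids cancellation in v[0]
        v = x[:]
        v[0] -= alpha
        vnorm2 = sum(t * t for t in v)
        # Apply the reflector to the trailing columns.
        for j in range(k, n):
            f = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vnorm2
            for i in range(k, m):
                r[i][j] -= f * v[i - k]
        for i in range(k + 1, m):
            r[i][k] = 0.0  # entries annihilated by the reflector
    return r[:min(m, n)], perm


def numerical_rank(r, tol):
    """Count diagonal entries of R above tol * |r_11| (pivoting keeps them non-increasing)."""
    if not r or r[0][0] == 0.0:
        return 0
    return sum(1 for k in range(len(r)) if abs(r[k][k]) > tol * abs(r[0][0]))
```

With pivoting the diagonal magnitudes of R are non-increasing, so a sharp drop marks the numerical rank. The strong factorizations described in the abstract additionally bound the coefficients linking $B_k$ to $A_k$, which plain column pivoting does not guarantee in the worst case.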


Journal Article
TL;DR: The design and implementation of a parallel QR decomposition algorithm for a large sparse matrix A is described, and it is shown that the use of Level 3 BLAS can lead to very significant gains in performance.
Abstract: We describe the design and implementation of a parallel QR decomposition algorithm for a large sparse matrix A. The algorithm is based on the multifrontal approach and makes use of Householder transformations. The tasks are distributed among processors according to an assembly tree which is built from the symbolic factorization of the matrix $A^TA$. We first address uniprocessor issues and then discuss the multiprocessor implementation of the method. We consider the parallelization of both the factorization phase and the solve phase. We use relaxation of the sparsity structure of both the original matrix and the frontal matrices to improve the performance. We show that, in this case, the use of Level 3 BLAS can lead to very significant gains in performance. We use the eight-processor Alliant FX/80 at CERFACS to illustrate our discussion.

56 citations


Proceedings Article
08 Sep 1996
TL;DR: The proposed SVD-QR method selects subsets of independent basis functions which are sufficient to represent a given system, through operations on a nonsingleton fuzzy basis function matrix, and provides an estimate of the number of necessary basis functions.
Abstract: Nonsingleton fuzzy logic systems (NSFLSs) are generalizations of singleton fuzzy logic systems (FLSs), that are capable of handling set-valued input. In this paper, we extend the theory of NSFLSs by presenting an algorithm to design and train such systems. Since they generalize singleton FLSs, the algorithm is equally applicable to both types of systems. The proposed SVD-QR method selects subsets of independent basis functions which are sufficient to represent a given system, through operations on a nonsingleton fuzzy basis function matrix. In addition, it provides an estimate of the number of necessary basis functions. We present examples to illustrate the ability of the SVD-QR method to operate in uncertain environments.

54 citations


Journal Article
TL;DR: The mechanism by which the shifts are transmitted through the matrix in the course of a multishift QR iteration is identified, and numerical evidence is presented showing that the mechanism works well when m is small and poorly when m is large.

47 citations


Journal Article
TL;DR: This paper describes a much simpler generalized Schur-type algorithm to compute low-rank approximants $\hat{H}$ of a matrix $H$, similar to those obtained from a truncated SVD, such that $H - \hat{H}$ has 2-norm less than $\epsilon$.
Abstract: The usual way to compute a low-rank approximant of a matrix $H$ is to take its singular value decomposition (SVD) and truncate it by setting the small singular values equal to 0. However, the SVD is computationally expensive. This paper describes a much simpler generalized Schur-type algorithm to compute similar low-rank approximants. For a given matrix $H$ which has $d$ singular values larger than $\epsilon$, we find all rank-$d$ approximants $\hat{H}$ such that $H - \hat{H}$ has 2-norm less than $\epsilon$. The set of approximants includes the truncated SVD approximation. The advantages of the Schur algorithm are that it has a much lower computational complexity (similar to a QR factorization) and directly produces a description of the column space of the approximants. This column space can be updated and downdated in an on-line scheme, amenable to implementation on a parallel array of processors.

40 citations


Book Chapter
18 Aug 1996
TL;DR: A hierarchical approach to the design of performance models for parallel algorithms in linear algebra, based on a parallel machine model and the hierarchical structure of the ScaLAPACK library, is presented.
Abstract: Performance models are important in the design and analysis of linear algebra software for scalable high-performance computer systems. They can be used to estimate the overhead in a parallel algorithm and to measure the impact of machine characteristics and block sizes on the execution time. We present a hierarchical approach to the design of performance models for parallel algorithms in linear algebra based on a parallel machine model and the hierarchical structure of the ScaLAPACK library. This suggests three levels of performance models corresponding to existing ScaLAPACK routines. As a proof of concept, a performance model of the high-level QR factorization routine pdgeqrf is presented. We also derive performance models of lower-level ScaLAPACK building blocks such as pdgeqr2, pdlarft, pdlarfb, pdlarfg, pdlarf, pdnrm2, and pdscal, which are used in the high-level model for pdgeqrf. Predicted performance results are compared to measurements on an Intel Paragon XP/S system. The accuracy of the top-level model is over 90% for the measured matrix and block sizes and different process grid configurations.

37 citations


Journal Article
TL;DR: A new recursive orthogonal estimation algorithm is derived which updates both the model structure and the parameters of nonlinear models on-line and minimises the loss function at every selection step by selecting significant regression variables.

30 citations


Journal Article
TL;DR: A new scaled tangent rotation (STAR) is used instead of the Givens rotations used in QRD-RLS, designed such that fine-grain pipelining can be accomplished with little hardware overhead.
Abstract: The QR decomposition-based recursive least-squares (RLS) adaptive filtering algorithm (referred to as QRD-RLS) is very popular because it has good numerical properties and can be mapped onto a systolic array. However, in this architecture, pipelining of the operations within the systolic array cells is difficult. Pipelining would be necessary to operate at high speeds or to reduce the power dissipation in a VLSI implementation. Pipelining QRD-RLS using look-ahead techniques leads to a large hardware overhead. The square-root-free forms of QRD-RLS are also difficult to pipeline. In this paper, a new scaled tangent rotation (STAR) is used instead of the Givens rotations used in QRD-RLS. The STAR-based RLS algorithm (referred to as STAR-RLS) is designed such that fine-grain pipelining can be accomplished with little hardware overhead. The scaled tangent rotations are not exactly orthogonal transformations but tend to become orthogonal asymptotically. The STAR-RLS algorithm is square-root free and has less complexity and lower intercell communication than the QRD-RLS algorithm. The properties of the STAR-RLS algorithm, such as stability, numerical properties, and dynamic range, are examined with and without pipelining and compared with those of QRD-RLS. Simulation results are presented to compare the performance of the STAR-RLS and QRD-RLS algorithms.

26 citations
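The STAR-RLS entry above replaces the Givens rotations of QRD-RLS. As a point of reference, a minimal sketch of the Givens-rotation step that conventional QRD-RLS repeats for each new data vector follows: it rotates a new row into the triangular factor $R$ with an exponential forgetting factor. The names `givens` and `qr_update` are hypothetical, and the weight-vector and residual computations of a full QRD-RLS filter are omitted.

```python
import math


def givens(a, b):
    """Return (c, s, rho) such that [[c, s], [-s, c]] @ [a, b] = [rho, 0]."""
    if b == 0.0:
        return 1.0, 0.0, a
    rho = math.hypot(a, b)
    return a / rho, b / rho, rho


def qr_update(r, x, lam=1.0):
    """Rotate a new data row x into the n x n upper-triangular factor R.

    The result R_new satisfies R_new^T R_new = lam * R^T R + x x^T, where
    lam (0 < lam <= 1) is an exponential forgetting factor. Inputs are not modified.
    """
    n = len(r)
    rn = [[math.sqrt(lam) * v for v in row] for row in r]
    x = list(map(float, x))
    for k in range(n):
        # Choose the rotation that zeroes x[k] against the diagonal entry.
        c, s, rn[k][k] = givens(rn[k][k], x[k])
        x[k] = 0.0
        # Apply the same rotation to the rest of row k and of x.
        for j in range(k + 1, n):
            rkj, xj = rn[k][j], x[j]
            rn[k][j] = c * rkj + s * xj
            x[j] = -s * rkj + c * xj
    return rn
```

Each rotation needs a square root (inside `math.hypot`) and divisions, which is part of what makes the cells hard to pipeline and what the square-root-free and STAR variants try to avoid.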


Journal Article
TL;DR: This paper shows that a modification of Han’s algorithm allows the iterates to be computed using QR factorization with column pivoting, which significantly reduces the computational cost and allows efficient updating/downdating techniques to be used.
Abstract: In 1980, S.-P. Han [Least-Squares Solution of Linear Inequalities, Tech. Report TR–2141, Mathematics Research Center, University of Wisconsin-Madison, 1980] described a finitely terminating algorithm for solving a system $Ax \leqslant b$ of linear inequalities in a least squares sense. The algorithm uses a singular value decomposition of a submatrix of A on each iteration, making it impractical for all but the smallest problems. This paper shows that a modification of Han’s algorithm allows the iterates to be computed using QR factorization with column pivoting, which significantly reduces the computational cost and allows efficient updating/downdating techniques to be used. The effectiveness of this modification is demonstrated, implementation details are given, and the behaviour of the algorithm is discussed. Theoretical and numerical results are shown from the application of the algorithm to linear separability problems.

24 citations


Proceedings Article
27 Mar 1996
TL;DR: A technique, based on checksum and reverse computation, that enables high-performance matrix operations to be fault-tolerant with low overhead is presented and analysis of the overhead of checkpointing and recovery confirms that this technique can provide fault tolerance.
Abstract: In this paper, we present a technique, based on checksum and reverse computation, that enables high-performance matrix operations to be fault-tolerant with low overhead. We have implemented this technique on five matrix operations: matrix multiplication, Cholesky factorization, LU factorization, QR factorization and Hessenberg reduction. The overhead of checkpointing and recovery is analyzed both theoretically and experimentally. These analyses confirm that our technique can provide fault tolerance for these high-performance matrix operations with low overhead.

20 citations
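The checksum half of the technique in this entry can be illustrated with the classic row/column checksum encoding for matrix multiplication. This is a minimal sketch under that assumption; it does not model the paper's reverse computation or checkpointing, and `matmul`, `with_column_checksum`, `with_row_checksum`, and `locate_error` are hypothetical helpers.

```python
def matmul(a, b):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def with_column_checksum(a):
    """Append a row containing the column sums of A."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append a column containing the row sums of B."""
    return [row[:] + [sum(row)] for row in b]


def locate_error(c, tol=1e-9):
    """Check the full-checksum product C = [A; e^T A] [B, B e].

    Returns None if every row and column checksum agrees, otherwise the
    (row, col) of a single corrupted entry (index m or n means the error
    sits in a checksum entry itself).
    """
    m, n = len(c) - 1, len(c[0]) - 1
    bad_rows = [i for i in range(m) if abs(sum(c[i][:n]) - c[i][n]) > tol]
    bad_cols = [j for j in range(n)
                if abs(sum(c[i][j] for i in range(m)) - c[m][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    return (bad_rows[0] if bad_rows else m, bad_cols[0] if bad_cols else n)
```

A single corrupted data entry breaks exactly one row check and one column check, which pinpoints it; the entry can then be restored from its row checksum. The overhead is one extra row and column, which is the kind of low cost the abstract refers to.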


Journal Article
TL;DR: Algorithms that apply self-scaling fast plane rotations to the QR decomposition for stiff least squares problems show that both fast and standard Givens-rotation-based algorithms produce accurate results, regardless of row sorting and even with extremely large weights, when equality-constrained least squares problems are solved by the weighting method.

Journal Article
TL;DR: An algorithm is derived that improves the Gram-Schmidt downdating algorithm when the columns in the Q factor are not orthonormal, and it produces far more accurate results than the Gram-Schmidt downdating algorithm for certain ill-conditioned problems.
Abstract: A new algorithm for downdating a QR decomposition is presented. We show that, when the columns in the Q factor from the Modified Gram-Schmidt QR decomposition of a matrix $X$ are exactly orthonormal, the Gram-Schmidt downdating algorithm for the QR decomposition of $X$ is equivalent to downdating the full Householder QR decomposition of the matrix $X$ augmented by an $n \times n$ zero matrix on top. Using this relation, we derive an algorithm that improves the Gram-Schmidt downdating algorithm when the columns in the Q factor are not orthonormal. Numerical test results show that the new algorithm produces far more accurate results than the Gram-Schmidt downdating algorithm for certain ill-conditioned problems.

Proceedings Article
03 Nov 1996
TL;DR: The coordinate rotation digital computer (CORDIC) algorithm is an alternative to the traditional multiplication, division, and square-root version of QR decomposition; the critically damped CORDIC (CD-CORDIC) variant converges faster than the conventional CORDIC algorithm, at the penalty of storing all the scale factors in a ROM.
Abstract: The coordinate rotation digital computer (CORDIC) algorithm is an alternative to the traditional multiplication, division, and square-root version of QR decomposition. This approach is better because it uses only adders and shifters for all the calculations. The area that is saved can be used to speed up the CORDIC algorithm even further. The critically damped CORDIC (CD-CORDIC) algorithm converges faster than the conventional CORDIC algorithm, at the penalty of storing all the scale factors in a ROM. The ROM size is 2[N-1/2]+1 words, where N is the word length of the processor. The CD-CORDIC algorithm is twice as fast when the word length of the processor is 24 bits.
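To make the "adders and shifters" point concrete, here is a sketch of conventional CORDIC in vectoring mode, which computes the magnitude and angle needed for a Givens rotation in QR decomposition. It is written in floating point for readability (multiplying by $2^{-i}$ stands in for a hardware right shift, and `math.atan` stands in for a small ROM table of angles). It is not the CD-CORDIC variant, and `cordic_vectoring` is a hypothetical name.

```python
import math


def cordic_vectoring(x, y, iterations=24):
    """Conventional CORDIC in vectoring mode (assumes x > 0).

    Drives y to zero with micro-rotations by +/- atan(2**-i) and returns
    (magnitude, angle) of the input vector (x, y).
    """
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        p = 2.0 ** -i                   # a right shift by i bits in fixed-point hardware
        step = math.atan(p)             # would come from a small ROM table
        if y > 0:                       # rotate clockwise
            x, y = x + y * p, y - x * p
            angle += step
        else:                           # rotate counter-clockwise
            x, y = x - y * p, y + x * p
            angle -= step
        gain *= math.sqrt(1.0 + p * p)  # each micro-rotation stretches the vector
    # For a fixed iteration count the total gain is a constant that hardware precomputes.
    return x / gain, angle
```

In a CORDIC-based QR array, a boundary cell runs this vectoring mode to annihilate an entry, and the internal cells replay the same sequence of rotation directions on the other column pairs.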

Proceedings Article
08 Oct 1996
TL;DR: This paper describes methodologies for the on-line calculation of the weights to be used in the linear combination of the received radar data by a set of N antennas and M pulse repetition intervals (PRIs) for the derivation of the adapted space-time filter output.
Abstract: This paper describes methodologies for the on-line calculation of the weights to be used in the linear combination of the received radar data by a set of N antennas and M pulse repetition intervals (PRIs) for the derivation of the adapted space-time filter output. The numerically robust and computationally efficient QR decomposition is used to derive the so-called MVDR (minimum variance distortionless response) and lattice algorithms. Both algorithms are represented as a systolic computational flow graph. The MVDR is able to produce more than one adapted beam focused along different DOAs and Doppler frequencies in the radar surveillance volume. The lattice algorithm offers a computational saving; in fact, its computational burden is $O(N^2M)$ in lieu of $O(N^2M^2)$. A comprehensive analysis of the numerical robustness of the algorithms is presented when the CORDIC algorithm is used to compute the QR decomposition (QRD). Benchmarks on general-purpose parallel computers and on a VLSI CORDIC (co-ordinate rotation digital computer) board are presented.

Journal Article
TL;DR: A version of PLS regression is described that intends to combine the computer hardware implementation advantages of the algebraic technique of ‘QR decomposition’ with the statistical, interpretative and computational advantages of PLS regression.
Abstract: A version of PLS regression is described that intends to combine the computer hardware implementation advantages of the algebraic technique of ‘QR decomposition’ with the statistical, interpretative and computational advantages of PLS regression. With a QR decomposition based on Givens rotations, the QR-PLS technique appears to be suited for hardware parallelization without sacrificing the modelling flexibility of PLSR. © 1996 by John Wiley & Sons, Ltd.

Journal Article
TL;DR: This work proposes fast algorithms for direction-of-arrival (DOA) finding without computing (partial) eigendecompositions in large ESPRIT arrays and presents numerical simulation results to illustrate the efficiency of the proposed algorithms.

Proceedings Article
22 Oct 1996
TL;DR: This paper presents an algorithm, based on QR decomposition, that can approximately reveal the rank and signal subspace of a matrix and simultaneously perform a subspace projection and has the potential for very simple parallel implementation.
Abstract: Conventional least squares minimization beamforming algorithms suffer from `weight jitter' when small data sequences are used. One method for overcoming this problem requires that the SVD of the data matrix is calculated and the `signal' and `noise' subspaces identified. A more stable beampattern can then be formed by projecting the least squares weight vector onto the appropriate subspace. The SVD is computationally expensive to perform and difficult to implement in a parallel architecture. Several approximate `rank revealing' algorithms have been presented of late (e.g. URV, RRQR) which have a much reduced computational load. However, being `two-sided' decompositions, they all suffer from implementation difficulties. In this paper we present an algorithm, based on QR decomposition, that can approximately reveal the rank and signal subspace of a matrix and simultaneously perform a subspace projection. The algorithm has the potential for very simple parallel implementation.

Journal Article
01 Mar 1996
TL;DR: The authors present an algorithm-based fault tolerant scheme for recursive least squares, appropriate for applications in adaptive signal processing and extended to a fault-tolerant algorithm for linearly constrained QR decomposition.
Abstract: The authors present an algorithm-based fault tolerant scheme for recursive least squares, appropriate for applications in adaptive signal processing. The technique is closely focused on the Gentleman-Kung-McWhirter triangular systolic array architecture for QR decomposition. Assuming that the array is subject to transient faults, widely separated in time and each affecting a single processor, an algorithm is given that corrects the full triangular array with a computational overhead equivalent, on average, to the interpolation of a single extra vector into the data stream. No output residuals are lost in the fault recovery. The analysis is extended to a fault-tolerant algorithm for linearly constrained QR decomposition.

Journal Article
TL;DR: It is shown that one can always reorder a weak Hall matrix into block upper triangular form so that there is no increase in the fill incurred by the $QR$ factorization.
Abstract: In $QR$ factorization of an $m \times n$ matrix $A$ ($m \geq n$), the orthogonal factor $Q$ is often stored implicitly as an $m \times n$ lower trapezoidal matrix $W$, known as the Householder matrix. When the sparsity of $A$ is to be exploited, the factorization is often preceded by a symbolic factorization step, which computes a data structure in which the nonzero entries of $W$ and $R$ are computed and stored. This is achieved by computing an upper bound on the nonzero structure of these factors, based solely on the nonzero structure of $A$. In this paper we use a well-known upper bound on the nonzero structure of $W$ to obtain an upper bound on the nonzero structure of $Q$. Let $U$ be the matrix consisting of the first $n$ columns of $Q$. One interesting feature of the new bound is that the bound on $W$'s structure is identical to the lower trapezoidal part of the bound on $U$'s structure. We show that if $A$ is strong Hall and has no zero entry on its main diagonal, then the bounds on the nonzero structures of $W$ and $U$ are the smallest possible based solely on the nonzero structure of $A$. We then use this result to obtain corresponding smallest upper bounds in the case where $A$ is weak Hall, is in block upper triangular form, and has no zero entry on its main diagonal. Finally, we show that one can always reorder a weak Hall matrix into block upper triangular form so that there is no increase in the fill incurred by the $QR$ factorization.

Journal Article
TL;DR: An error analysis is given for block downdating using Corrected Seminormal Equations (CSNE), and it is shown that for ill-conditioned downdates this method gives more accurate results than the algorithms based on the LINPACK downdating algorithm or hyperbolic transformations.
Abstract: A new perturbation result is presented for the problem of block downdating a Cholesky decomposition $X^TX = R^TR$. Then, a condition number for block downdating is proposed and compared to other downdating condition numbers presented recently in the literature. This new condition number is shown to give a tighter bound in many cases. Using the perturbation theory, an error analysis is presented for the block downdating algorithms based on the LINPACK downdating algorithm and stabilized hyperbolic transformations. An error analysis is also given for block downdating using Corrected Seminormal Equations (CSNE), and it is shown that for ill-conditioned downdates this method gives more accurate results than the algorithms based on the LINPACK downdating algorithm or hyperbolic transformations. We classify the problems for which the CSNE downdating method produces a downdated upper triangular matrix which is comparable in accuracy to the upper triangular factor obtained from the QR decomposition by Householder transformations on the data matrix with the row block deleted.
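The simplest case of the problem in this entry, removing a single row $x$ so that $R_{\text{new}}^TR_{\text{new}} = R^TR - xx^T$, can be sketched with hyperbolic rotations as below. This is an illustration under stated assumptions, not the block, CSNE, or LINPACK algorithms analyzed in the paper; `chol_downdate` is a hypothetical name, and $R$ is assumed to have a positive diagonal.

```python
import math


def chol_downdate(r, x):
    """Rank-one downdate of an upper-triangular factor R (positive diagonal).

    Returns R_new with R_new^T R_new = R^T R - x x^T, computed row by row with
    hyperbolic rotations. Raises ValueError if the result would not be
    positive definite. Inputs are not modified.
    """
    n = len(r)
    rn = [list(map(float, row)) for row in r]
    x = list(map(float, x))
    for k in range(n):
        d = rn[k][k] ** 2 - x[k] ** 2     # cancellation here signals an ill-conditioned downdate
        if d <= 0.0:
            raise ValueError("downdate would lose positive definiteness")
        rkk = math.sqrt(d)
        c, s = rkk / rn[k][k], x[k] / rn[k][k]
        rn[k][k] = rkk
        for j in range(k + 1, n):
            rn[k][j] = (rn[k][j] - s * x[j]) / c
            x[j] = c * x[j] - s * rn[k][j]  # uses the freshly updated rn[k][j]
    return rn
```

When $r_{kk}^2 - x_k^2$ nearly cancels, small errors in $R$ are strongly amplified, which is the ill-conditioned regime in which the abstract compares the accuracy of the different downdating methods.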

Proceedings Article
27 Mar 1996
TL;DR: A paradigm for the efficient utilization of commercially available processors to implement serial algorithms on a parallel architecture as well as an algorithm for the parallel solution of a nonhomogeneous system of linear equations with constant coefficients is evaluated.
Abstract: The paper evaluates a paradigm for the efficient utilization of commercially available processors to implement serial algorithms on a parallel architecture. We present an architecture based on this paradigm as well as an algorithm for the parallel solution of a nonhomogeneous system of linear equations with constant coefficients. Major advantages stem from its systolic-like array structure and the versatility of fully programmable processor elements. The method uses a Givens rotation implementation of the well known QR factorization. Unlike other direct methods of factorization followed by backsubstitution, this implementation of the algorithm avoids the backsubstitution bottleneck. The computational complexity of this feedforward direct method of solving nonsingular systems of linear equations is similar to that of QR matrix factorization. Due to the programmability of the processor in the array, the mapping of this algorithm extends to an entire family of algorithms. We map this family of algorithms onto the novel architecture and present a comprehensive performance analysis. Performance results identify the algorithm/architecture combination as a cost effective, efficient method which exhibits speedup that is directly proportional to the number of processors used.

Proceedings Article
03 Jun 1996
TL;DR: A neural nonlinear predictor for one dimensional signals is presented, based on a combination of linearization and QR decomposition that allows a fast adapting algorithm.
Abstract: A neural nonlinear predictor for one dimensional signals is presented. It is based on a combination of linearization and QR decomposition that allows a fast adapting algorithm. The predictor is used in a speech compression algorithm that has proven to be superior to linear based models. The compression and training are done simultaneously, allowing the network to continually adapt to the signal. The results presented show that this algorithm outperforms a typical LPC coding algorithm.

Proceedings Article
12 May 1996
TL;DR: The problem of acoustic echo cancelation is addressed using an adaptive IIR filtering algorithm based on a QR decomposition and a pseudo-linear regression that yields a computational complexity of $O(N^2)$ multiply-accumulates.
Abstract: In this paper the problem of acoustic echo cancelation is addressed using an adaptive IIR filtering algorithm based on a QR decomposition and a pseudo-linear regression. The proposed algorithm yields a computational complexity of $O(N^2)$ multiply-accumulates. In echo cancelation simulations it shows fast convergence in single-talk and double-talk periods, and proves to be stable if the near-end signal and received signal are correlated due to a far-end echo path.

Journal Article
TL;DR: A family of algorithms parameterized by the number of processors available P, arithmetic grain aggregation parameters g1, g2, …, gP, and communication grain aggregation parameter h, which compute the QR factorization of a matrix $A \in \mathbb{C}^{m \times n}$ with minimal latency, is presented.
Abstract: Rapid computation of the QR factorization of a matrix is fundamental to many scientific and engineering problems. The paper presents a family of algorithms parameterized by the number of processors available P, arithmetic grain aggregation parameters g1, g2, …, gP, and communication grain aggregation parameter h, which compute the QR factorization of a matrix $A \in \mathbb{C}^{m \times n}$ with minimal latency. The approach is particularly well suited for dedicated distributed memory architectures such as linear arrays of INMOS Transputers, Texas Instruments C40s or Analog Devices 21060s.

Book Chapter
15 Apr 1996
TL;DR: A strategy to reduce fill-in, saving memory and decreasing the computation times of the QR decomposition with column pivoting of a sparse matrix by means of Modified Gram-Schmidt orthogonalization.
Abstract: We present a parallel computational method for the QR decomposition with column pivoting of a sparse matrix by means of Modified Gram-Schmidt orthogonalization. Nonzero elements of the matrix M to be decomposed are stored in a one-dimensional doubly linked list data structure. We discuss a strategy to reduce fill-in in order to save memory and decrease the computation times. As an application of QR decomposition, we describe the least squares problem. This algorithm was designed for a message passing multiprocessor and has been evaluated on a Cray T3D, using the Harwell-Boeing sparse matrix collection.
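A dense, sequential sketch of QR with column pivoting by Modified Gram-Schmidt is shown below. It deliberately omits what the paper is about (the doubly linked sparse storage, fill-in reduction, and message-passing parallelism); `mgs_qr_pivoted` is a hypothetical name, and $m \ge n$ is assumed.

```python
import math


def mgs_qr_pivoted(a):
    """QR with column pivoting by Modified Gram-Schmidt (dense, m >= n).

    Returns (q, r, perm) with A[:, perm] = Q R; q is a list of the n columns
    of Q. If A is exactly rank deficient, trailing columns of q stay zero.
    """
    m, n = len(a), len(a[0])
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]  # column-wise copy
    perm = list(range(n))
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Pivot: move the remaining column with the largest norm to position k.
        p = max(range(k, n), key=lambda j: sum(t * t for t in cols[j]))
        cols[k], cols[p] = cols[p], cols[k]
        perm[k], perm[p] = perm[p], perm[k]
        for row in r:  # keep R entries computed so far attached to their columns
            row[k], row[p] = row[p], row[k]
        r[k][k] = math.sqrt(sum(t * t for t in cols[k]))
        if r[k][k] == 0.0:
            break  # every remaining column is zero
        cols[k] = [t / r[k][k] for t in cols[k]]
        # MGS: orthogonalize each later column against q_k right away.
        for j in range(k + 1, n):
            r[k][j] = sum(qi * aj for qi, aj in zip(cols[k], cols[j]))
            cols[j] = [aj - r[k][j] * qi for qi, aj in zip(cols[k], cols[j])]
    return cols, r, perm
```

Because MGS updates every remaining column immediately after each step, the residual column norms needed for the next pivot choice are available without extra passes, which is one reason MGS pairs naturally with column pivoting.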

Book Chapter
25 Sep 1996
TL;DR: It is proved that the Householder QR factorization is likely to be inherently sequential as well, and the problem of speedup vs. nondegeneracy and accuracy in numerical algorithms is investigated.
Abstract: Gaussian Elimination with Partial Pivoting and Householder QR factorization are two very popular methods to solve linear systems. Implementations of these two methods are provided in state-of-the-art numerical libraries and packages, such as LAPACK and MATLAB. Gaussian Elimination with Partial Pivoting was already known to be P-complete. Here we prove that the Householder QR factorization is likely to be inherently sequential as well. We also investigate the problem of speedup vs. nondegeneracy and accuracy in numerical algorithms.

Journal Article
TL;DR: A simple proof of the transposed QR algorithm which permits the singular value decomposition of a matrix to be introduced to a first course in matrix algebra in the context of iterative procedures is presented.
Abstract: This paper presents a simple proof of the transposed QR algorithm which permits the singular value decomposition of a matrix to be introduced to a first course in matrix algebra in the context of iterative procedures.
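The abstract does not spell out the iteration, so the sketch below assumes one common form of a transposed QR algorithm: factor $A_k = Q_kR_k$ and set $A_{k+1} = R_k^T$. Because $A_{k+1}^TA_{k+1} = R_kR_k^T$ while $A_k^TA_k = R_k^TR_k$, this is the Cholesky LR iteration on $A^TA$, and for a square nonsingular $A$ with distinct singular values the diagonal of $A_k$ tends to the singular values. The function names are hypothetical.

```python
import math


def gram_schmidt_qr(a):
    """QR of a square nonsingular matrix by modified Gram-Schmidt.

    Returns (q_cols, r); r has a positive diagonal.
    """
    n = len(a)
    v = [[float(a[i][j]) for i in range(n)] for j in range(n)]  # columns of A
    q, r = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j):
            r[i][j] = sum(x * y for x, y in zip(q[i], v[j]))
            v[j] = [x - r[i][j] * y for x, y in zip(v[j], q[i])]
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))
        q.append([x / r[j][j] for x in v[j]])
    return q, r


def transposed_qr_singular_values(a, iterations=100):
    """Iterate A_{k+1} = R_k^T with A_k = Q_k R_k; return the diagonal of the last A_k."""
    for _ in range(iterations):
        _, r = gram_schmidt_qr(a)
        a = [list(row) for row in zip(*r)]  # transpose of R (lower triangular)
    return [a[i][i] for i in range(len(a))]
```

For example, $A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}$ has $A^TA$ with eigenvalues 45 and 5, and the iteration drives the diagonal toward $\sqrt{45}$ and $\sqrt{5}$. Only orthogonalization and transposition are involved, which is what makes the procedure approachable in a first course.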


Book Chapter
01 Jan 1996
TL;DR: In this article, a class of ABS methods for solving the KT equations is presented, and several methods in this class are compared with the classical method of Aasen and the method based upon the QR factorization with Householder rotations.
Abstract: In this paper we present a class of ABS methods for solving the KT equations. We compare several methods in this class with the classical method of Aasen and the method based upon the QR factorization with Householder rotations. When the number of degrees of freedom is small, two of the considered ABS methods are faster than the Aasen and the QR-based methods by factors of about 2 and 4, respectively. Moreover, when the first block in the KT equations is diagonal and a sequence of problems has to be solved in which only that block changes, for a small number of degrees of freedom the solution can be updated in order-two operations by the ABS methods, while order-three operations are required by the other methods. Finally, numerical testing over 300 problems has shown that the ABS methods give more accurate results in about 80

Proceedings Article
01 Sep 1996
TL;DR: A new fast multichannel QR decomposition (QRD) least squares (LS) adaptive algorithm is presented in this paper that is based exclusively on numerically robust orthogonal Givens rotations and offers substantially reduced computational complexity compared to previously derived multichannel fast QRD schemes.
Abstract: A new fast multichannel QR decomposition (QRD) least squares (LS) adaptive algorithm is presented in this paper. The algorithm deals with the general case of channels with different number of delay elements and is based exclusively on numerically robust orthogonal Givens rotations. The new scheme processes each channel separately and as a result it comprises scalar operations only. Moreover, the proposed algorithm is implementable on a very regular systolic architecture and offers substantially reduced computational complexity compared to previously derived multichannel fast QRD schemes.