
Showing papers on "QR decomposition published in 1994"


Journal ArticleDOI
TL;DR: This article shows how several different variants of the recursive least-squares algorithm can be directly related to the widely studied Kalman filtering problem of estimation and control.
Abstract: Adaptive filtering algorithms fall into four main groups: recursive least squares (RLS) algorithms and the corresponding fast versions; QR- and inverse QR-least squares algorithms; least squares lattice (LSL) and QR decomposition-based least squares lattice (QRD-LSL) algorithms; and gradient-based algorithms such as the least-mean square (LMS) algorithm. Our purpose in this article is to present yet another approach, for the sake of achieving two important goals. The first one is to show how several different variants of the recursive least-squares algorithm can be directly related to the widely studied Kalman filtering problem of estimation and control. Our second important goal is to present all the different versions of the RLS algorithm in computationally convenient square-root forms: a prearray of numbers has to be triangularized by a rotation, or a sequence of elementary rotations, in order to yield a postarray of numbers. The quantities needed to form the next prearray can then be read off from the entries of the postarray, and the procedure can be repeated; the explicit forms of the rotation matrices are not needed in most cases.
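
As a concrete illustration of the prearray/postarray idea, the following minimal NumPy sketch (our illustration, not the authors' code; the forgetting factor lam and the variable names are assumptions) updates the triangular factor of an exponentially weighted data matrix by QR-triangularizing a prearray built from the scaled old factor and the new data row:

    import numpy as np

    def qrd_rls_update(R, u, lam=0.99):
        """Triangularize the prearray [sqrt(lam)*R; u^T] into the postarray R_new."""
        n = R.shape[0]
        pre = np.vstack([np.sqrt(lam) * R, u.reshape(1, n)])  # (n+1) x n prearray
        # Any sequence of elementary rotations would do; a QR factorization is equivalent.
        return np.linalg.qr(pre, mode='r')

    # Example: maintain R over a stream of data rows.
    rng = np.random.default_rng(0)
    R = np.zeros((4, 4))
    for _ in range(100):
        R = qrd_rls_update(R, rng.standard_normal(4))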

470 citations


Journal ArticleDOI
TL;DR: Gram-Schmidt orthogonalization, in its classical (CGS) and modified (MGS) variants, is one of the fundamental procedures in linear algebra as mentioned in this paper. It is equivalent to the factorization $A = Q_1R$, where $Q_1 \in R^{m \times n}$ has orthonormal columns and $R$ is upper triangular.
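
For reference, here is a minimal NumPy sketch of both variants (an illustration assuming full column rank, not code from the paper). Both compute $A = Q_1R$, but MGS projects against the running remainder rather than the original column, which preserves orthogonality far better in floating point:

    import numpy as np

    def gram_schmidt(A, modified=True):
        """QR factorization by classical (modified=False) or modified Gram-Schmidt."""
        m, n = A.shape
        Q, R = np.zeros((m, n)), np.zeros((n, n))
        for k in range(n):
            v = A[:, k].copy()
            for j in range(k):
                # CGS projects the original column; MGS the current remainder.
                R[j, k] = Q[:, j] @ (v if modified else A[:, k])
                v -= R[j, k] * Q[:, j]
            R[k, k] = np.linalg.norm(v)
            Q[:, k] = v / R[k, k]
        return Q, R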

242 citations


Journal ArticleDOI
TL;DR: In this paper, the use of the torus-wrap mapping in general dense matrix algorithms is studied from both theoretical and practical viewpoints, and it is proved that this assignment scheme leads to dense matrix algorithms that achieve the lower bound on interprocessor communication.
Abstract: Dense linear systems of equations are quite common in science and engineering, arising in boundary element methods, least squares problems, and other settings. Massively parallel computers will be necessary to solve the large systems required by scientists and engineers, and scalable parallel algorithms for the linear algebra applications must be devised for these machines. A critical step in these algorithms is the mapping of matrix elements to processors. In this paper, the use of the torus-wrap mapping in general dense matrix algorithms is studied from both theoretical and practical viewpoints. Under reasonable assumptions, it is proved that this assignment scheme leads to dense matrix algorithms that achieve (to within a constant factor) the lower bound on interprocessor communication. It is also shown that the torus-wrap mapping allows algorithms to exhibit less idle time, better load balancing, and less memory overhead than the more common row and column mappings. Finally, practical implementation issues are discussed.
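
The assignment itself is compact enough to state in code (a sketch; the grid shape prows x pcols is a parameter of a hypothetical machine, not something fixed by the paper):

    def torus_wrap_owner(i, j, prows, pcols):
        """Processor coordinates that own matrix element (i, j) under torus-wrap."""
        return (i % prows, j % pcols)

    # Rows i, i + prows, i + 2*prows, ... share a processor row, so the active
    # submatrix of a factorization stays evenly spread as it shrinks.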

120 citations


Journal ArticleDOI
TL;DR: In this article, the Cholesky factorization of the least squares coefficient matrix is obtained without explicitly forming the normal equations. The method is based on QR factorizations of the original matrices $A$ and $B$.
Abstract: The general problem considered here is the least squares solution of $(A \otimes B)x = t$, where $A$ and $B$ are full rank, rectangular matrices, and $A \otimes B$ is the Kronecker product of $A$ and $B$. Equations of this form arise in areas such as digital image and signal processing, photogrammetry, finite elements, and multidimensional approximation. An efficient method of solution is based on QR factorizations of the original matrices $A$ and $B$. It is demonstrated how these factorizations can be used to obtain the Cholesky factorization of the least squares coefficient matrix without explicitly forming the normal equations. A similar approach based on singular value decomposition (SVD) factorizations also is indicated for the rank-deficient case.
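
A NumPy sketch of this idea (our reconstruction under the column-stacking convention $(A \otimes B)\,vec(X) = vec(BXA^T)$; the function name is ours) solves the full-rank problem from the QR factors of $A$ and $B$ alone, then checks against a dense solve:

    import numpy as np
    from scipy.linalg import solve_triangular

    def kron_lstsq(A, B, t):
        """Least squares solution of kron(A, B) x = t without forming kron(A, B)."""
        mA, mB = A.shape[0], B.shape[0]
        QA, RA = np.linalg.qr(A)           # reduced QR of the small factors
        QB, RB = np.linalg.qr(B)
        T = t.reshape(mB, mA, order='F')   # t = vec(T), column-major
        C = QB.T @ T @ QA                  # projected right-hand side
        W = solve_triangular(RB, C)        # R_B W = C
        X = solve_triangular(RA, W.T).T    # X R_A^T = W
        return X.flatten(order='F')

    rng = np.random.default_rng(1)
    A, B = rng.standard_normal((8, 3)), rng.standard_normal((6, 2))
    t = rng.standard_normal(8 * 6)
    assert np.allclose(kron_lstsq(A, B, t),
                       np.linalg.lstsq(np.kron(A, B), t, rcond=None)[0])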

74 citations



Journal ArticleDOI
TL;DR: A new algorithm for accurate downdating of least squares solutions is described and compared to existing algorithms, and numerical test results are presented using the sliding window method.
Abstract: Solutions to a sequence of modified least squares problems, where either a new observation is added (updating) or an old observation is deleted (downdating), are required in many applications. Stable algorithms for downdating can be constructed if the complete QR factorization of the data matrix is available. Algorithms that only downdate $R$ and do not store $Q$ require fewer operations. However, they do not give good accuracy and may not recover accuracy after an ill-conditioned problem has occurred. The authors describe a new algorithm for accurate downdating of least squares solutions and compare it to existing algorithms. Numerical test results are also presented using the sliding window method, where a number of updatings and downdatings occur repeatedly.
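
SciPy ships routines for exactly these modifications; the snippet below (illustrative only, and it keeps the full $Q$, i.e. the stable-but-expensive option the abstract contrasts with $R$-only downdating) performs one sliding-window step:

    import numpy as np
    from scipy.linalg import qr, qr_insert, qr_delete

    rng = np.random.default_rng(2)
    A = rng.standard_normal((10, 4))
    Q, R = qr(A)                                 # full QR of the current window

    u = rng.standard_normal(4)
    Q, R = qr_insert(Q, R, u, 10, which='row')   # update: append a new row
    Q, R = qr_delete(Q, R, 0, which='row')       # downdate: drop the oldest row

    assert np.allclose(Q @ R, np.vstack([A[1:], u]))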

53 citations


Journal ArticleDOI
TL;DR: A family of square root and division free algorithms is proposed and its relationship with the square root free parametric family is examined; some of the systolic structures described are very promising, since they require less computational complexity than the structures known to date and make the VLSI implementation easier.
Abstract: The least squares (LS) minimization problem constitutes the core of many real-time signal processing problems, such as adaptive filtering, system identification and adaptive beamforming. Recently, efficient implementations of the recursive least squares (RLS) algorithm and the constrained recursive least squares (CRLS) algorithm based on the numerically stable QR decomposition (QRD) have been of great interest. Several papers have proposed modifications to the rotation algorithm that circumvent the square root operations and minimize the number of divisions that are involved in the Givens rotation. It has also been shown that all the known square root free algorithms are instances of one parametric algorithm. Recently, a square root free and division free algorithm has also been proposed. In this paper, we propose a family of square root and division free algorithms and examine its relationship with the square root free parametric family. We choose a specific instance for each one of the two parametric algorithms and make a comparative study of the systolic structures based on these two instances, as well as the standard Givens rotation. We consider the architectures for both the optimal residual computation and the optimal weight vector extraction. The dynamic range of the newly proposed algorithm for QRD-RLS optimal residual computation and the wordlength lower bounds that guarantee no overflow are presented. The numerical stability of the algorithm is also considered. A number of obscure points relevant to the realization of the QRD-RLS and the QRD-CRLS algorithms are clarified. Some systolic structures that are described in this paper are very promising, since they require less computational complexity (in various aspects) than the structures known to date and they make the VLSI implementation easier.
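
For orientation, one square-root-free rotation in the style of Gentleman's algorithm (a sketch under the usual representation $R = D^{1/2}\bar{R}$ with $\bar{R}$ unit upper triangular; the paper's parametric families generalize this) annihilates the leading entry of an incoming row using only multiplications and one reciprocal:

    import numpy as np

    def sqrt_free_givens(d, r, delta, x):
        """Rotate rows sqrt(d)*r and sqrt(delta)*x so that x[0] becomes zero.

        r has a unit leading entry; no square root is ever formed.
        """
        dp = d + delta * x[0] ** 2      # new weight of the stored row
        cbar = d / dp
        sbar = delta * x[0] / dp
        delta_p = d * delta / dp        # new weight of the row passed onward
        r_new = cbar * r + sbar * x     # leading entry stays 1
        x_new = x - x[0] * r            # leading entry becomes exactly 0
        return dp, r_new, delta_p, x_new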

51 citations


Journal ArticleDOI
TL;DR: A multifrontal method for sparse QR factorization and its implementation in MATLAB is described, and it is shown that the QR-based methods normally are much faster and more accurate than the MATLAB implementation of the augmented system method.
Abstract: In the recently presented sparse matrix extension of MATLAB, there is no routine for sparse QR factorization. Sparse linear least-squares problems are instead solved by the augmented system method. The accuracy in computed solutions is strongly dependent on a scaling parameter d. Its optimal value is expensive to compute, and it must therefore be approximated by a simple heuristic. We describe a multifrontal method for sparse QR factorization and its implementation in MATLAB. It is well known that the multifrontal approach is suitable for vector machines. We show that it is also attractive in MATLAB. In both cases, scalar operations are expensive, and the reformulation of the sparse problem into dense subproblems is advantageous. Using the new routine, we implement two methods for the solution of sparse linear least-squares problems and compare these with the built-in MATLAB function. We show that the QR-based methods normally are much faster and more accurate than the MATLAB implementation of the augmented system method. A better choice of the parameter d or iterative refinement must be used to make the augmented system method as accurate as the methods based on QR factorization.
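
To make the role of the scaling parameter d concrete, here is a small SciPy sketch of the augmented system method itself (hypothetical example data, assuming A has full column rank; d = 1 is shown, whereas the literature suggests an optimal d of roughly the smallest singular value of A divided by sqrt(2)):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    rng = np.random.default_rng(3)
    A = sp.random(50, 10, density=0.5, format='csc', random_state=3)
    b = rng.standard_normal(50)

    d = 1.0   # scaling parameter; solution accuracy depends strongly on it
    m, n = A.shape
    K = sp.bmat([[d * sp.eye(m), A], [A.T, None]], format='csc')
    sol = spsolve(K, np.concatenate([b, np.zeros(n)]))
    x, r = sol[m:], d * sol[:m]         # least squares solution and residual

    assert np.allclose(A.T @ (b - A @ x), 0, atol=1e-8)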

46 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the augmented system formulation (ASF) of two standard optimization problems, which include as special cases the minimum 2-norm solution of a linear underdetermined system ($b = 0$) and the linear least squares problem ($c = 0$), as well as more general problems.
Abstract: We consider solving $x + Ay = b$ and $A^Tx = c$ for given $b$, $c$, and $m \times n$ $A$ of rank $n$. This is called the augmented system formulation (ASF) of two standard optimization problems, which include as special cases the minimum 2-norm solution of a linear underdetermined system ($b = 0$) and the linear least squares problem ($c = 0$), as well as more general problems. We examine the numerical stability of methods (for the ASF) based on the QR factorization of $A$, whether by Householder transformations, Givens rotations, or the modified Gram-Schmidt (MGS) algorithm, and consider methods which use $Q$ and $R$, or only $R$. We discuss the meaning of stability of algorithms for the ASF in terms of stability of algorithms for the underlying optimization problems.
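
A minimal NumPy sketch of one of the $Q$-and-$R$ methods discussed (our reconstruction, not necessarily the paper's exact variant): with $A = QR$, the pair $x + Ay = b$, $A^Tx = c$ reduces to two triangular solves.

    import numpy as np
    from scipy.linalg import solve_triangular

    def solve_asf(A, b, c):
        """Solve x + A y = b, A^T x = c for A with full column rank."""
        Q, R = np.linalg.qr(A)                   # reduced QR
        w = solve_triangular(R, c, trans='T')    # R^T w = c
        y = solve_triangular(R, Q.T @ b - w)     # R y = Q^T b - w
        return b - A @ y, y                      # x = b - A y

    rng = np.random.default_rng(4)
    A, b = rng.standard_normal((7, 3)), rng.standard_normal(7)
    x, y = solve_asf(A, b, np.zeros(3))          # c = 0: plain least squares
    assert np.allclose(y, np.linalg.lstsq(A, b, rcond=None)[0])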

39 citations


Journal ArticleDOI
TL;DR: One of the algorithms is a direct extension of the conventional RLS lattice adaptive linear filtering algorithm to the nonlinear case and the other is based on the QR decomposition of the prediction error covariance matrices using orthogonal transformations.
Abstract: This paper presents two computationally efficient recursive least-squares (RLS) lattice algorithms for adaptive nonlinear filtering based on a truncated second-order Volterra system model. The lattice formulation transforms the nonlinear filtering problem into an equivalent multichannel, linear filtering problem and then generalizes the lattice solution to the nonlinear filtering problem. One of the algorithms is a direct extension of the conventional RLS lattice adaptive linear filtering algorithm to the nonlinear case. The other algorithm is based on the QR decomposition of the prediction error covariance matrices using orthogonal transformations. Several experiments demonstrating and comparing the properties of the two algorithms in finite and "infinite" precision environments are included in the paper. The results indicate that both the algorithms retain the fast convergence behavior of the RLS Volterra filters and are numerically stable.
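
The reduction that makes this possible, expanding the input into a second-order Volterra data vector so that multichannel linear RLS machinery applies, can be sketched as follows (hypothetical memory length N; the lattice-order recursions themselves are beyond this snippet):

    import numpy as np

    def volterra2_vector(x, n, N=3):
        """Linear plus quadratic terms of a truncated second-order Volterra model."""
        w = np.array([x[n - i] if n - i >= 0 else 0.0 for i in range(N)])
        quad = np.array([w[i] * w[j] for i in range(N) for j in range(i, N)])
        return np.concatenate([w, quad])   # feed to any linear (multichannel) RLS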

36 citations


Journal ArticleDOI
01 Apr 1994
TL;DR: A Monte Carlo simulation study has been performed to evaluate the performance of the processing scheme in terms of steady-state improvement factor, transient response, 2D filtering capability, and numerical robustness.
Abstract: A numerically robust, computationally efficient processing scheme for adaptive multipulse radar is described. Its purpose is to detect a target embedded in clutter and directive electromagnetic (EM) interference by processing the echoes received by an array of antennas on board an aircraft or satellite. The algorithm, which performs an efficient QR decomposition of the underlying space-time data matrix, may be implemented in parallel either on a triangular array of processors or using a novel lattice-type structure of smaller triangular array processors. A Monte Carlo simulation study has been performed to evaluate the performance of the processing scheme in terms of steady-state improvement factor, transient response, 2D filtering capability, and numerical robustness.

Proceedings ArticleDOI
15 Jun 1994
TL;DR: This paper discusses in detail a fault-tolerant version of a matrix multiplication algorithm, and outlines how two other numerical algorithms, QR factorization and Gaussian Elimination, may be made fault-tolerant using the same approach.
Abstract: Previous algorithm-based methods for developing reliable versions of numerical algorithms have mostly concerned themselves with error detection. A truly fault tolerant algorithm, however, needs to locate errors and recover from them once they are located. In a parallel processing environment, this corresponds to locating the faulty processors and recovering the data corrupted by the faulty processors. In our paper, we discuss in detail a fault-tolerant version of a matrix multiplication algorithm. The ideas developed in the derivation of the fault-tolerant matrix multiplication algorithms may be used to derive fault-tolerant versions of other numerical algorithms. We outline how two other numerical algorithms, QR factorization and Gaussian Elimination, may be made fault-tolerant using our approach. Our fault model assumes that a faulty processor can corrupt all the data it possesses. We present error coverage and overhead results for the single faulty processor case for fault-locating and fault-tolerant versions of three numerical algorithms on an Intel iPSC/2 hypercube multicomputer.
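
The flavor of such schemes can be shown with the classical checksum encoding (a simplified single-error demonstration, not the authors' iPSC/2 implementation): row and column checksum residuals of the product localize a corrupted entry, and the residual itself is the correction.

    import numpy as np

    rng = np.random.default_rng(5)
    A, B = rng.standard_normal((4, 3)), rng.standard_normal((3, 5))
    Ac = np.vstack([A, A.sum(axis=0)])                  # column-checksum encoding
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # row-checksum encoding
    C = Ac @ Br                                         # full checksum product

    C[2, 1] += 7.0                                      # inject a single fault

    row_err = C[:-1, :-1].sum(axis=1) - C[:-1, -1]      # row checksum residuals
    col_err = C[:-1, :-1].sum(axis=0) - C[-1, :-1]      # column checksum residuals
    i, j = np.argmax(np.abs(row_err)), np.argmax(np.abs(col_err))
    C[i, j] -= row_err[i]                               # locate and correct
    assert np.allclose(C[:-1, :-1], A @ B)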

Journal ArticleDOI
TL;DR: A FORTRAN implementation of a divide-and-conquer method for computing the spectral resolution of a unitary upper Hessenberg matrix H is presented, using the Schur parametrization to compute the spectral decomposition of H without explicitly forming the elements of H.
Abstract: We present a FORTRAN implementation of a divide-and-conquer method for computing the spectral resolution of a unitary upper Hessenberg matrix H. Any such matrix H of order n, normalized so that its subdiagonal elements are nonnegative, can be written as a product of n–1 Givens matrices and a diagonal matrix. This representation, which we refer to as the Schur parametric form of H, arises naturally in applications such as in signal processing and in the computation of Gauss-Szegő quadrature rules. Our programs utilize the Schur parametrization to compute the spectral decomposition of H without explicitly forming the elements of H. If only the eigenvalues and first components of the eigenvectors are desired, as in the applications mentioned above, the algorithm requires only O(n^2) arithmetic operations. Experimental results presented indicate that the algorithm is reliable and competitive with the general QR algorithm applied to this problem. Moreover, the algorithm can be easily adapted for parallel implementation.
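
To see the parametrization at work, the sketch below builds a unitary upper Hessenberg matrix from Schur parameters (one common sign convention; conventions differ across the literature) and checks it against the dense eigensolver of the kind the authors compare with:

    import numpy as np

    def unitary_hessenberg(gammas):
        """Product of Givens matrices G_1...G_{n-1} and a final diagonal factor.

        gammas: Schur parameters with |gamma_k| < 1 for k < n and |gamma_n| = 1.
        """
        n = len(gammas)
        H = np.eye(n, dtype=complex)
        for k, g in enumerate(gammas[:-1]):
            s = np.sqrt(1.0 - abs(g) ** 2)          # nonnegative subdiagonal
            G = np.eye(n, dtype=complex)
            G[k:k+2, k:k+2] = [[-g, s], [s, np.conj(g)]]
            H = H @ G
        H[:, -1] *= -gammas[-1]
        return H

    rng = np.random.default_rng(6)
    g = rng.uniform(-0.9, 0.9, 5).astype(complex)
    g[-1] = 1.0
    H = unitary_hessenberg(g)
    assert np.allclose(H @ H.conj().T, np.eye(5))
    print(np.abs(np.linalg.eigvals(H)))             # all on the unit circle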

09 Feb 1994
TL;DR: Two inverse free, highly parallel, spectral divide and conquer algorithms are discussed: one for computing an invariant subspace of a nonsymmetric matrix and another for computing left and right deflating subspaces of a regular matrix pencil A − λB.
Abstract: We discuss two inverse free, highly parallel, spectral divide and conquer algorithms: one for computing an invariant subspace of a nonsymmetric matrix and another one for computing left and right deflating subspaces of a regular matrix pencil A − λB. These two closely related algorithms are based on earlier ones of Bulgakov, Godunov and Malyshev, but improve on them in several ways. These algorithms only use easily parallelizable linear algebra building blocks: matrix multiplication and QR decomposition. The existing parallel algorithms for the nonsymmetric eigenproblem use the matrix sign function, which is faster but can be less stable than the new algorithm.

Patent
02 May 1994
TL;DR: In this paper, the squared norm of each member of a training data set with respect to each member of a set of centers is computed and transformed in accordance with a non-linear function to produce training vectors.
Abstract: A heuristic processor incorporates a digital arithmetic unit arranged to compute the squared norm of each member of a training data set with respect to each member of a set of centers, and to transform the squared norms in accordance with a non-linear function to produce training φ vectors. A systolic array arranged for QR decomposition and least mean squares processing forms combinations of the elements of each φ vector to provide a fit to corresponding training answers. The form of combination is then employed with like-transformed test data to provide estimates of unknown results. The processor is applicable to provide estimated results for problems which are non-linear and for which explicit mathematical formalisms are unknown.
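
In software terms this is radial basis function training; a compact sketch (the Gaussian nonlinearity, the width parameter, and the QR-based least squares solve are assumptions standing in for the patent's systolic hardware, and more training points than centers are assumed):

    import numpy as np
    from scipy.linalg import solve_triangular

    def train_rbf(X, centers, y, width=1.0):
        """Fit training answers y from the phi vectors of training data X."""
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        Phi = np.exp(-sq / (2.0 * width ** 2))    # non-linear transform of norms
        Q, R = np.linalg.qr(Phi)                  # QR decomposition + LS stage
        return solve_triangular(R, Q.T @ y)

    def predict_rbf(X_test, centers, w, width=1.0):
        sq = ((X_test[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2.0 * width ** 2)) @ w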

Journal ArticleDOI
TL;DR: The generalized singular value decomposition is extended to a new decomposition that can be updated at low cost, and a forgetting factor can be incorporated in this decomposition.

Journal ArticleDOI
L. Kaufman1
TL;DR: The implicit QR algorithm as mentioned in this paper is a serial iterative algorithm for determining all the eigenvalues of an n × n symmetric tridiagonal matrix A. In contrast to the original algorithm, which cannot take advantage of the architectures of parallel or vector machines, each iteration of the new algorithm mainly involves synchronous, lock-step operations which can effectively use vector and concurrency capabilities of SIMD machines.
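
For contrast with the parallel version, the serial iteration can be sketched in a few lines (an explicit-shift variant with a Rayleigh-quotient shift, shown for clarity; the production algorithm is the implicit form, usually with Wilkinson's shift):

    import numpy as np

    def qr_eigvals_tridiag(T, tol=1e-12, max_iter=500):
        """Explicit-shift QR iteration for a symmetric tridiagonal matrix."""
        T = T.astype(float).copy()
        n = T.shape[0]
        eigs = []
        while n > 1:
            for _ in range(max_iter):
                if abs(T[n-1, n-2]) < tol * (abs(T[n-1, n-1]) + abs(T[n-2, n-2])):
                    break
                mu = T[n-1, n-1]                    # shift
                Q, R = np.linalg.qr(T[:n, :n] - mu * np.eye(n))
                T[:n, :n] = R @ Q + mu * np.eye(n)  # similarity transform
            eigs.append(T[n-1, n-1])                # deflate converged eigenvalue
            n -= 1
        eigs.append(T[0, 0])
        return np.sort(eigs)

    T = np.diag(np.arange(1.0, 5.0)) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
    assert np.allclose(qr_eigvals_tridiag(T), np.linalg.eigvalsh(T))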

Journal ArticleDOI
TL;DR: In this paper, the derivative of the map that takes an invertible matrix $A$ to the unitary factor $U$ in the polar decomposition $A = UP$ is evaluated, leading to perturbation bounds for this map and for the analogous map onto the factor $Q$ in the QR decomposition $A = QR$.
Abstract: The derivative of the map that takes an invertible matrix $A$ to the unitary factor $U$ in the polar decomposition $A = UP$ is evaluated. The same is done for the map that takes $A$ to the unitary factor $Q$ in the QR decomposition $A = QR$. These results lead to perturbation bounds for these maps. Other applications of the method developed are discussed.
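
These derivatives can be probed numerically; the sketch below (our illustration, using central differences on a sign-normalized QR so that the factor is a smooth function of the matrix) evaluates the directional derivative of the map A → Q and checks the tangency condition:

    import numpy as np

    def qr_pos(A):
        """QR with positive diagonal of R, making Q a smooth function of A."""
        Q, R = np.linalg.qr(A)
        s = np.sign(np.diag(R))
        s[s == 0] = 1.0
        return Q * s, s[:, None] * R

    def dQ(A, E, h=1e-6):
        """Central-difference derivative of A -> Q at A in direction E."""
        return (qr_pos(A + h * E)[0] - qr_pos(A - h * E)[0]) / (2.0 * h)

    rng = np.random.default_rng(7)
    A, E = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
    D, Q = dQ(A, E), qr_pos(A)[0]
    # The derivative is tangent to the orthogonal group: Q^T D is skew-symmetric.
    assert np.allclose(Q.T @ D + D.T @ Q, 0, atol=1e-6)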

Journal ArticleDOI
TL;DR: In this paper, the structure of generalizations of the singular value decomposition and the QR decomposition for any number of matrices is analyzed as a function of the ranks of the matrices or their products and concatenations.
Abstract: This paper analyzes in detail the structure of generalizations of the singular value decomposition and the QR decomposition for any number of matrices. The structure is completely determined as a function of the ranks of the matrices or their products and concatenations.

Journal ArticleDOI
TL;DR: A new row ordering strategy based on pairing rows to minimize local fill-in is presented, and is competitive with the row ordering from nested domain decomposition on a finite element application using nested domain decomposition for the column ordering.
Abstract: A new row ordering strategy based on pairing rows to minimize local fill-in is presented. The row ordering can be combined with most column ordering strategies to reduce computation, maintain sparsity, and solve rank deficient problems. Comparison of the new row pairing algorithm with Duff's fixed pivot row ordering on a collection of sparse matrix test problems shows a median 47-71% reduction, depending on the column ordering, in floating point operations (flops) required for the QR decomposition. On a finite element application using nested domain decomposition for the column ordering, the new row ordering is competitive with the row ordering from nested domain decomposition.

Journal ArticleDOI
TL;DR: In this article, a trust region method for nonlinear optimization problems with equality constraints is proposed, which incorporates quadratic subproblems in which orthogonal projective matrices of the Jacobian of constraint functions are used to replace QR decompositions.

Journal ArticleDOI
01 Feb 1994
TL;DR: Two more challenging examples are presented that illustrate the use of simple diagrammatic transformations to develop novel algorithms and architectures, and demonstrate the potential power of algorithmic engineering as a formal design technique.
Abstract: Algorithmic engineering provides a rigorous framework for describing and manipulating the type of building blocks commonly used to define parallel algorithms and architectures for digital signal processing. So far, the concept has only been illustrated by means of relatively simple examples relating to the use of QR decomposition (QRD) by Givens rotations for the purposes of adaptive filtering and beamforming. Two more challenging examples are presented that illustrate the use of simple diagrammatic transformations to develop novel algorithms and architectures, and demonstrate the potential power of algorithmic engineering as a formal design technique. The first example constitutes the only known derivation of a modular processing architecture for generalised sidelobe cancellation based on QR decomposition. The second provides a simple derivation of the QRD-based lattice algorithm for multichannel least-squares linear prediction.

Proceedings ArticleDOI
J. Malard1, C.C. Paige1
23 May 1994
TL;DR: A general framework for analyzing the scalability of parallel algorithms is presented, and both the Householder QR factorization algorithm and the modified Gram-Schmidt algorithm can be written in terms of matrix-matrix operations using the Compact WY representation.
Abstract: Both the Householder QR factorization algorithm and the modified Gram-Schmidt algorithm can be written in terms of matrix-matrix operations using the Compact WY representation. Parallelizations of the resulting algorithms are reviewed and analyzed. For this purpose a general framework for analyzing the scalability of parallel algorithms is presented.
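
The representation itself is easy to demonstrate: the sketch below (a textbook reconstruction, not the authors' parallel code) accumulates Householder QR into the Compact WY form Q = I − VTV^T using the standard recurrence for T:

    import numpy as np

    def householder_qr_wy(A):
        """Householder QR returning Compact WY factors V, T with Q = I - V T V^T."""
        m, n = A.shape
        R = A.astype(float).copy()
        V, T = np.zeros((m, n)), np.zeros((n, n))
        for j in range(n):
            x = R[j:, j]
            v = x.copy()
            v[0] += (np.sign(x[0]) or 1.0) * np.linalg.norm(x)
            v /= v[0]                               # normalize so v[0] = 1
            tau = 2.0 / (v @ v)
            R[j:, :] -= tau * np.outer(v, v @ R[j:, :])
            V[j:, j] = v
            # Recurrence: T_j = [[T, -tau * T V^T v], [0, tau]]
            T[:j, j] = -tau * (T[:j, :j] @ (V[:, :j].T @ V[:, j]))
            T[j, j] = tau
        return V, T, np.triu(R[:n, :])

    rng = np.random.default_rng(8)
    A = rng.standard_normal((6, 4))
    V, T, R = householder_qr_wy(A)
    Q = np.eye(6) - V @ T @ V.T                     # one BLAS-3 product gives Q
    assert np.allclose((Q.T @ A)[:4], R)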

Patent
Ronald H. Levine1
20 Sep 1994
TL;DR: In this article, a filter synthesizer and a method for generating and outputting a stable filter from frequency response data, including coherence values, is described, and the modal parameters are generated by determining a companion matrix from the first orthogonal polynomial data and the frequency response, and then performing QR decomposition on the companion matrix.
Abstract: A filter synthesizer and method are disclosed for generating and outputting a stable filter from frequency response data, including coherence values. A processing unit has a processor, memory, and stored programs. The processor, operating the modal analysis program, processes the frequency response data, generates orthogonal polynomial data from a cost function of the frequency response data including coherence values, and generates the modal parameters of a transfer function from the orthogonal polynomial data, which represents Forsythe polynomials. The processor generates the orthogonal polynomial data from least squares processing the cost function of the frequency response data, including coherence values. The processor generates the modal parameters, including a pole of the transfer function, and determines an instability condition from the pole. If a pole is unstable, the processor refits the frequency response data to generate a stable transfer function. A least squares optimization procedure is performed on the frequency response data, including the coherence values. A residue is generated corresponding to a pole, and the residue is revised in response to a condition of the pole being unstable. The modal parameters are generated by determining a companion matrix from the first orthogonal polynomial data and the frequency response data; performing QR decomposition on the companion matrix; and generating a pole as an eigenvalue of the companion matrix from the QR decomposition.
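
The last step, obtaining poles as eigenvalues of a companion matrix via QR, is the standard root-finding device; a small sketch (NumPy's eigvals runs QR iteration internally; the monic coefficient layout is one common convention):

    import numpy as np

    def roots_via_companion(c):
        """Roots of the monic polynomial x^n + c[0] x^(n-1) + ... + c[n-1]."""
        n = len(c)
        C = np.zeros((n, n))
        C[1:, :-1] = np.eye(n - 1)              # subdiagonal of ones
        C[:, -1] = -np.asarray(c)[::-1]         # last column holds coefficients
        return np.linalg.eigvals(C)             # eigenvalues via the QR algorithm

    # x^2 - 3x + 2 = (x - 1)(x - 2):
    print(np.sort(roots_via_companion([-3.0, 2.0])))    # -> [1. 2.]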

Journal ArticleDOI
TL;DR: In this paper, the backward errors for the symmetric eigenvalue decomposition and the singular value decomposition in the two-norm and in the Frobenius norm were studied.
Abstract: We present bounds on the backward errors for the symmetric eigenvalue decomposition and the singular value decomposition in the two-norm and in the Frobenius norm. Through different orthogonal decompositions of the computed eigenvectors we can define different symmetric backward errors for the eigenvalue decomposition. When the computed eigenvectors have a small residual and are close to orthonormal then all backward errors tend to be small. Consequently it does not matter how exactly a backward error is defined and how exactly residual and deviation from orthogonality are measured. Analogous results hold for the singular vectors. We indicate the effect of our error bounds on implementations for eigenvector and singular vector computation. In a more general context we prove that the distance of an appropriately scaled matrix to its orthogonal QR factor is not much larger than its distance to the closest orthogonal matrix.
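
The quantities involved are straightforward to measure; a short sketch (our own illustration of the residual, the deviation from orthogonality, and the distance to the closest orthogonal matrix via the polar factor):

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((6, 6))
    A = (A + A.T) / 2                                 # symmetric test matrix
    lam, V = np.linalg.eigh(A)

    residual = np.linalg.norm(A @ V - V * lam)        # || A V - V Lambda ||_F
    orth_loss = np.linalg.norm(V.T @ V - np.eye(6))   # deviation from orthonormality

    # The closest orthogonal matrix to A is its polar factor U Vt from the SVD.
    U, s, Vt = np.linalg.svd(A)
    dist = np.linalg.norm(A - U @ Vt)                 # equals ||diag(s) - I||_F
    print(residual, orth_loss, dist)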

Proceedings ArticleDOI
01 Jan 1994
TL;DR: The proof of a new systolic array for implementing a recursive least squares (RLS) algorithm based on the QR decomposition (QRD) inverse-updates method is presented, showing how to transform the algorithm so as to circumvent the long data feedback path of the basic formulation.
Abstract: We present the proof of a new systolic array for implementing a recursive least squares (RLS) algorithm based on the QR decomposition (QRD) inverse-updates method. In its basic formulation, this algorithm contains a long data feedback path which makes the construction of a systolic architecture very difficult. Here we show how to transform the algorithm so as to circumvent this problem.

Journal ArticleDOI
Hideaki Sakai1
TL;DR: This paper presents some new algorithms for parallel weight extraction in the recursive least-squares (RLS) estimation based on the modified Gram-Schmidt (MGS) method, which do not contain the square root operation.
Abstract: This paper presents some new algorithms for parallel weight extraction in the recursive least-squares (RLS) estimation based on the modified Gram-Schmidt (MGS) method. These are the counterparts of the algorithms using an inverse QR decomposition based on the Givens rotations and do not contain the square root operation. Systolic-array implementations of the algorithms are considered on a 2-D rhombic array. Simulation results are also presented to compare the finite word-length effect of these new algorithms and existing algorithms.

Journal ArticleDOI
TL;DR: A new orthogonal decomposition for square dense matrices is proposed, and it is proved that the backward error depends linearly on the size of the matrix.
Abstract: A new orthogonal decomposition for square dense matrices is proposed. For larger matrices, the computational time for this decomposition and the corresponding solution method is less than that for the usual Givens QR decomposition. The backward stability of the proposed decomposition is studied, and it is proved that the backward error depends linearly on the size of the matrix. Finally, some numerical evidence is given.

Journal ArticleDOI
TL;DR: The authors present a systolic algorithm for computing a rank revealing QR factorization, and consider the performance of the algorithm.
Abstract: The rank revealing QR factorization is a useful tool in many signal processing applications, since it explicitly yields all the necessary information to solve rank deficient least-squares problems and subset selection problems, to compute signal and noise subspaces, etc. The authors present a systolic algorithm for computing a rank revealing QR factorization, and consider the performance of the algorithm.
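
Column-pivoted QR, the usual rank revealing workhorse, is available in SciPy; a sketch of using it to expose numerical rank and form a basic (not minimum-norm) solution of a rank-deficient least-squares problem:

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    rng = np.random.default_rng(10)
    A = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 6))  # rank 3
    b = rng.standard_normal(20)

    Q, R, piv = qr(A, mode='economic', pivoting=True)    # A[:, piv] = Q R
    r = int(np.sum(np.abs(np.diag(R)) > 1e-10 * abs(R[0, 0])))   # numerical rank
    x = np.zeros(A.shape[1])
    x[piv[:r]] = solve_triangular(R[:r, :r], (Q.T @ b)[:r])      # basic solution
    assert np.allclose(A.T @ (A @ x - b), 0, atol=1e-8)  # residual is orthogonal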

Proceedings ArticleDOI
14 Dec 1994
TL;DR: It is demonstrated that the SVD-based approach proposed in this paper not only clearly improves the convergence rate and numerical stability of RLS, but also provides much more precise identification results and greatly enhances the robustness of the system identification.
Abstract: Based on the singular value decomposition (SVD), a new recursive least-squares identification method, which takes into account input excitation, is proposed in this paper. It is demonstrated that the SVD-based approach not only clearly improves the convergence rate and numerical stability of RLS, but also provides much more precise identification results and greatly enhances the robustness of the system identification. Moreover, this algorithm is formulated in the form of vector-matrix and matrix-matrix operations, so it is also well suited to parallel computers.
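
The numerical heart of such an estimator, a truncated pseudo-inverse solve that discards directions the input fails to excite, can be sketched as follows (batch form shown for brevity; the paper's method is recursive):

    import numpy as np

    def svd_solve(Phi, y, rel_tol=1e-8):
        """Least-squares parameter estimate, truncating unexcited directions."""
        U, s, Vt = np.linalg.svd(Phi, full_matrices=False)
        keep = s > rel_tol * s[0]              # drop near-zero singular values
        return Vt[keep].T @ ((U[:, keep].T @ y) / s[keep])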