
Showing papers on "QR decomposition" published in 1988


Journal ArticleDOI
TL;DR: A unified checksum scheme for the LU decomposition, Gaussian elimination with pairwise pivoting, and the QR decomposition is introduced and it is shown how to represent the error as a rank-one perturbation to the original data, so that the authors need not worry when the error occurred.

145 citations


Book
01 Jan 1988
TL;DR: Part 1 covers the fundamentals of parallel computation (general principles of parallel computing, parallel techniques and algorithms, parallel sorting algorithms); Part 2 covers numerical linear algebra, including QR factorization and the singular-value decomposition.
Abstract: Part 1, Fundamentals of parallel computation: general principles of parallel computing; parallel techniques and algorithms; parallel sorting algorithms. Part 2, Numerical linear algebra: solution of a system of linear algebraic equations; the symmetric eigenvalue problem - Jacobi method; QR factorization; singular-value decomposition and related problems; future trends in algorithm development.

126 citations


Journal ArticleDOI
TL;DR: A direct algorithm is suggested for the computation of the linear state feedback for multi-input systems such that the resultant closed-loop system matrix has specified eigenvalues.

83 citations


MonographDOI
TL;DR: This monograph covers a subset of Fortran 77, the BLAS, LINPACK (including the QR factorization and the singular value decomposition), and MATLAB (including working with submatrices and factorizations).
Abstract: 1. A Subset of Fortran 77. Basics, Logical Operations, Loops, Arrays, Subprograms, Input/Output, Complex Arithmetic, Programming Tips, Fortran 77 Built-in Functions 2. The BLAS. Bookkeeping Operations, Vector Operations, Norm Computations, Givens Rotations, Double Precision and Complex Versions 3. LINPACK. Triangular Systems, General Systems, Symmetric Systems, Banded Systems, The QR Factorization, The Singular Value Decomposition, Double Precision and Complex Versions 4. MATLAB. Basics, Loops and Conditionals, Built-in Functions, Working with Submatrices, Functions, Factorizations, Miscellaneous.

60 citations


Journal ArticleDOI
TL;DR: The authors show how to avoid the rollback by using matrix updating techniques, and they introduce new checksum methods for Gaussian elimination with pairwise pivoting and for QR decomposition on systolic arrays.
Abstract: Examines the checksum methods of Abraham et al. for LU decomposition on multiprocessor arrays. Their methods are efficient for detecting a transient error, but expensive for correcting it due to the need for a computation rollback. The authors show how to avoid the rollback by using matrix updating techniques, and they introduce new checksum methods for Gaussian elimination with pairwise pivoting and for QR decomposition on systolic arrays.

51 citations
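
The checksum idea behind this entry can be illustrated with a minimal sketch. It assumes the simplest setting, Gaussian elimination without pivoting on a small dense matrix, and is not the authors' scheme for pairwise pivoting or systolic arrays. A column holding the row sums of A is carried through the elimination; because row operations preserve the row-sum relation, a later mismatch flags a corrupted row. The function names are hypothetical.

```python
def eliminate_with_checksum(a):
    """Gaussian elimination (no pivoting) on [A | A e], where e is all ones.

    Returns the upper triangular factor U and the carried checksum column.
    """
    n = len(a)
    # Augment each row with its row sum; row operations preserve this relation.
    m = [row[:] + [sum(row)] for row in a]
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]  # assumes a nonzero pivot
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    u = [row[:n] for row in m]
    checksum = [row[n] for row in m]
    return u, checksum


def checksum_mismatch(u, checksum, tol=1e-9):
    """Return the indices of rows whose sum disagrees with the carried checksum."""
    return [i for i, row in enumerate(u) if abs(sum(row) - checksum[i]) > tol]


a = [[4.0, 3.0, 2.0], [2.0, 1.0, 3.0], [3.0, 4.0, 1.0]]
u, c = eliminate_with_checksum(a)
clean = checksum_mismatch(u, c)

# Simulate a transient error in one entry of U after the factorization.
u_bad = [row[:] for row in u]
u_bad[1][2] += 0.5
flagged = checksum_mismatch(u_bad, c)
```

Here `clean` is empty and `flagged` identifies the corrupted row. Detection alone is the cheap part; the entry above concerns correcting the error without rolling the computation back, which this sketch does not attempt.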


Journal ArticleDOI
TL;DR: The properties of block reflectors are developed and some algorithms for computing a block reflector that introduces a block of zeros into a matrix are given.
Abstract: A block reflector is an orthogonal, symmetric matrix that reverses a subspace whose dimension may be greater than one. We shall develop the properties of block reflectors and give some algorithms for computing a block reflector that introduces a block of zeros into a matrix. We consider the compact representation of block reflectors, some applications, and their use in parallel computers.

50 citations
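
The defining properties quoted in this abstract can be checked numerically. The sketch below assumes one standard way of writing such a matrix, H = I - 2 V V^T with V having orthonormal columns; that formula and the helper names are assumptions for illustration, not taken from the paper. With a single column it reduces to an ordinary Householder reflector.

```python
def block_reflector(v_cols):
    """Build H = I - 2 V V^T from orthonormal columns given as a list of vectors."""
    n = len(v_cols[0])
    h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for v in v_cols:
        for i in range(n):
            for j in range(n):
                h[i][j] -= 2.0 * v[i] * v[j]
    return h


def matvec(m, x):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def matmul(a, b):
    """Matrix-matrix product for list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Two orthonormal vectors spanning the subspace to be reversed.
v1 = [0.5, 0.5, 0.5, 0.5]
v2 = [0.5, -0.5, 0.5, -0.5]
# A vector orthogonal to both, which the reflector should leave unchanged.
w = [0.5, 0.5, -0.5, -0.5]

h = block_reflector([v1, v2])
hh = matmul(h, h)      # should be the identity (H is orthogonal and symmetric)
hv1 = matvec(h, v1)    # should equal -v1
hw = matvec(h, w)      # should equal w
```

The two-dimensional subspace spanned by `v1` and `v2` is reversed in a single application, which is the sense in which the reflected subspace "may be greater than one" in dimension.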


Journal ArticleDOI
TL;DR: Experimental results are provided to compare the sparse Gaussian elimination using the new storage scheme with that proposed by George and Ng.
Abstract: For a general m by n sparse matrix A, a new scheme is proposed for the structural representation of the factors of its sparse orthogonal decomposition by Householder transformations. The storage scheme is row-oriented and is based on the structure of the upper triangular factor obtained in the decomposition. The storage of the orthogonal matrix factor is particularly efficient in that the overhead required is only $m + n$ items, independent of the actual number of nonzeros in the factor. The same scheme is applicable to sparse orthogonal factorization by Givens rotations, and also to the recent implementation of sparse Gaussian elimination with partial pivoting developed by George and Ng (this Journal, 1987, to appear). Experimental results are provided to compare the sparse Gaussian elimination using the new storage scheme with that proposed by George and Ng.

34 citations


Journal ArticleDOI
TL;DR: Several techniques are evaluated for solving the linear ordinary differential equations arising from compartment models and the computational efficiencies of these techniques are compared for several models arising from radiopharmacokinetic studies.

30 citations


Patent
13 Oct 1988
TL;DR: A processor for constrained least squares computations is provided, which incorporates a systolic array of boundary, internal, constraint and multiplier cells arranged as triangular and rectangular sub-arrays.
Abstract: A processor is provided which is suitable for constrained least squares computations. It incorporates a systolic array of boundary, internal, constraint and multiplier cells arranged as triangular and rectangular sub-arrays. The triangular sub-array contains boundary cells along its major diagonal and connected via delays, together with above-diagonal internal cells. It computes and updates a QR decomposition of a data matrix X incorporating successive data vectors having individual signals as elements. The rectangular sub-array incorporates constraint cell columns each implementing a respective constraint vector and terminating at a respective multiplier cell. The constraint cells store respective conjugate constraint factors obtained by constraint vector transformation in the triangular sub-array. Rotation parameters produced by QR decomposition in the triangular sub-array are employed by the rectangular sub-array to rotate a zero input and update stored constraint factors. Cumulative rotation of this input and summation of squared moduli of constraint factors are carried out in cascade down constraint columns. The boundary cells are arranged for cumulative multiplication of cosine rotation parameters. Multiplier cells multiply cumulatively multiplied cosine parameters by cumulatively rotated constraint column inputs and divide by summed squared moduli of constraint factors to provide residuals. Each constraint column produces respective residuals in succession corresponding to weighting of the data matrix X to produce minimum signals subject to a respective constraint governing the form of weighting, as required to compute minimum variance distortionless response (MVDR) results.

28 citations


Journal ArticleDOI
S. Qiao
TL;DR: New techniques for fast Toeplitz QR decomposition, based on the shift invariance property of a Toeplitz matrix, are presented.
Abstract: New techniques for fast Toeplitz QR decomposition are presented. The methods are based on the shift invariance property of a Toeplitz matrix. The numerical properties of the algorithms are discussed and some comparisons are made with two other fast Toeplitz orthogonalization methods.

21 citations


Proceedings ArticleDOI
11 Apr 1988
TL;DR: A rectangular systolic array of processing units is presented for implementation of the QR adaptive filter, which improves on widely used gradient methods for adaptive filtering, which must insert increasing amounts of performance-degrading delay into the adaptive updating when either the speed of implementation or number of taps increase.
Abstract: A rectangular systolic array of processing units is presented for implementation of the QR adaptive filter. This array requires approximately 8N processing units to exactly solve the least-squares adaptive filtering problem using QR factorization. If a processing unit takes 50 ns to perform a task, the array can be implemented at an adaptive-filter input sampling rate of 20 MHz, with no loss in characteristic high performance (of least squares), numerical stability, or accuracy. This improves on widely used gradient methods for adaptive filtering, which must insert increasing amounts of performance-degrading delay into the adaptive updating when either the speed of implementation or number of taps increase. A discussion of the structure and interconnection of the processing units is included, as well as computer simulations that verify the stability and performance of the adaptive processing array.

Proceedings ArticleDOI
23 Feb 1988
TL;DR: A new projection-based algorithm for estimating the angles of arrival of plane waves incident onto arrays of sensors based on a single QR decomposition of the signal covariance matrix, which is much faster than eigen-based methods which require many QR decompositions.
Abstract: We propose a new projection-based algorithm for estimating the angles of arrival of plane waves incident onto arrays of sensors. The method is based on a single QR decomposition of the signal covariance matrix; hence, it is much faster than eigen-based methods which require many QR decompositions. It is shown that optimum performance is attained only if the columns of the covariance matrix are permuted in a prescribed manner before the QR decomposition proceeds. An adjunct to the angle of arrival estimation process is a new eigenvalue-free technique for estimating the number of incident signals. There is no performance penalty associated with either of these new methods. The real-time performance of this technique is enhanced through the use of systolic arrays. A novel systolic array structure is proposed for extracting both the Q and R matrices generated by the QR decomposition.

Book ChapterDOI
01 Jun 1988
TL;DR: Algorithms for solving nonlinear least-squares problems on a message-passing multiprocessor, including an efficient parallel algorithm for determining the Levenberg-Marquardt parameter and a new row-oriented QR factorization algorithm are described.
Abstract: In this paper we describe algorithms for solving nonlinear least-squares problems on a message-passing multiprocessor. We demonstrate new parallel algorithms, including an efficient parallel algorithm for determining the Levenberg-Marquardt parameter and a new row-oriented QR factorization algorithm. Experimental results obtained on an Intel iPSC hypercube are presented and compared with sequential MINPACK code executed on a single processor. These experimental results show that essentially full efficiency is obtained for problems where the row size is sufficiently larger than the number of processors. These algorithms have the advantage of involving only simple data movements and consequently are not constrained to the hypercube architecture.

Proceedings ArticleDOI
25 May 1988
TL;DR: A one-dimensional systolic array for solving arbitrarily large least-mean-square problems involving QR decomposition and a triangular system of equations, which can also be used in problems such as matrix-by-vector, matrix-by-matrix, and LU decomposition.
Abstract: The design is presented of a one-dimensional systolic array for solving arbitrarily large least-mean-square problems involving QR decomposition and a triangular system of equations. The main characteristics of this array are maximization of array utilization, thus achieving a minimum global computation time, and low complexity of the resulting array, which can also be used in problems such as matrix-by-vector, matrix-by-matrix, and LU decomposition. Two systolic algorithms for QR decomposition have been designed. Their chained execution is shown.

Proceedings ArticleDOI
01 Nov 1988
TL;DR: A parallel version of the Householder algorithm with column pivoting is introduced for computing the QR factorization of a matrix, and experiments show the advantages of incorporating the controlled pivoting strategy into the traditional QR algorithm to guard against the known pathological cases.
Abstract: This paper presents a new parallel version of the Householder algorithm with column pivoting for computing the QR factorization of a matrix. In contrast to the standard algorithm we employ a local pivoting scheme that allows for efficient implementation of the algorithm on a parallel machine, in particular one with a distributed architecture. An inexpensive but reliable incremental condition estimator is used to control the selection of pivot columns by obtaining cheap estimates for the smallest singular value of the currently created upper triangular matrix R. Numerical experiments show that the local pivoting strategy behaves about as well as the traditional global pivoting strategy. They also show the advantages of incorporating the controlled pivoting strategy into the traditional QR algorithm to guard against the known pathological cases.

Journal ArticleDOI
TL;DR: A new algorithm is presented for the stable computation of sample partial correlation coefficients for symmetric positive-definite matrices B by generalizing Bareiss' algorithm for the solution of linear systems of equations with a (non-symmetric) Toeplitz coefficient matrix to matrices that are not Toeplitz, and showing that it computes their LU and UL factorizations.

Journal ArticleDOI
TL;DR: An algorithm intended for software implementation on a programmable systolic/wavefront computer is presented for the computation of a complex-valued frequency-response matrix G, an orthogonal version of an algorithm described previously by A.J. Laub.
Abstract: An algorithm intended for software implementation on a programmable systolic/wavefront computer is presented for the computation of a complex-valued frequency-response matrix G. Typically, real-valued state-space model matrices are given and the calculation of G must be performed for a very large number of values of the scalar frequency parameter. The algorithm is an orthogonal version of an algorithm described previously by A.J. Laub (ibid., vol.26, no.4, p.407-8, 1981). The system matrix A is reduced initially to an upper Hessenberg form which is preserved as the frequency varies subsequently. A systolic QR factorization of a certain complex-valued matrix is then implemented for effecting the necessary linear system solution (inversion). The critical computational component is the back solve. This computational component's process dependency graph is embedded optimally in space and time through the use of a nonlinear spacetime transformation. The computational period of the algorithm is O(n) where n is the order of the matrix A.

Journal ArticleDOI
01 Sep 1988
TL;DR: A basic factorization scheme is introduced which can be readily tailored to the LU, Cholesky and QR factorization provided that the corresponding matrices are distributed according to the column-oriented wrap mapping.
Abstract: This paper is concerned with principal considerations for developing a linear algebra package for the SUPRENUM computer. The design goals, as well as the mapping strategy of the parallelization methodology, are described briefly. Finally, a basic factorization scheme is introduced which can be readily tailored to the LU, Cholesky and QR factorization provided that the corresponding matrices are distributed according to the column-oriented wrap mapping.

DOI
01 Oct 1988
TL;DR: It has been determined that all the AOA estimators considered behave well when applied to the low-angle tracking problem, and the performance of this QR technique is virtually the same as that of MUSIC, yet does not require time-consuming eigendecompositions.
Abstract: This paper examines a new, fast, high-performance angle-of-arrival (AOA) estimator called the QR-based spectrum estimation algorithm, which we apply to the low-angle tracking problem in radar. We compare the performance of this algorithm to the more established modern estimators of its kind; namely, the MUSIC and modified FBLP algorithms. This comparison is based on real data collected from an experimental low-angle tracking radar simulator which was operated over water. All methods considered are adapted, using the spatial smoothing technique, so that accurate results may be obtained in the presence of correlated multipath, or from one single snapshot. It has been determined that the performance of this QR technique is virtually the same as that of MUSIC, yet does not require time-consuming eigendecompositions. Instead, the method is based on the much faster QR decomposition, which may be implemented readily using systolic arrays. Speedup factors on the order of 10 times the dimension of the covariance matrix are expected with the QR method. It has been determined that all the AOA estimators considered behave well when applied to the low-angle tracking problem.

01 Aug 1988
TL;DR: The techniques of adaptive blocking and incremental condition estimation are presented which are useful for the computation of common matrix decompositions in high-performance environments and methods for introducing pivoting into the distributed QR factorization algorithm are developed.
Abstract: We present the techniques of adaptive blocking and incremental condition estimation which we believe to be useful for the computation of common matrix decompositions in high-performance environments. We apply these new techniques to algorithms for computing the Householder QR factorization with and without pivoting on a coarse-grained distributed system. For reasons of portability, we use a pipelined scheme on a ring of processors as the basis of our algorithms. To take advantage of possible floating point hardware on each node we develop a blocked version of the pipelined Householder QR algorithm that employs the compact WY representation for products of Householder matrices. While a strategy involving blocks of fixed width leads to increased floating point utilization per node, it also leads to increased load imbalance. To reconcile this tradeoff we introduce a variable width block strategy based on a model of the critical path of the algorithm. The resulting adaptive blocking strategy provides for good floating point performance per node while maintaining overall load balance. Experimental results on the Intel iPSC hypercube show that the adaptive blocking strategy indeed performs better than any fixed width blocking strategy. In the second part of our thesis we develop methods for introducing pivoting into the distributed QR factorization algorithm. Incorporating the traditional column pivoting strategy in a straightforward manner introduces a global synchronization constraint which results in increased communication overhead. A strictly local pivoting scheme avoids the resulting loss in efficiency, but has to be monitored for reliability. To this end, we introduce an incremental condition estimator which allows us to update the estimate of the smallest singular value of an upper triangular matrix $R$ as new columns are added to $R$. The update requires only $O(n)$ flops and the storage of $O(n)$ words between successive steps.
Experiments indicate that the incremental condition estimator is reliable despite its small computational cost. Using the incremental condition estimator we are then able to guard against the selection of troublesome pivot columns in our local pivoting scheme at little extra cost. Simulation results show that the resulting algorithm is about as reliable as the traditional QR factorization algorithm with column pivoting.

Proceedings ArticleDOI
11 Apr 1988
TL;DR: An algorithm and a set of array architectures that implement a multichannel adaptive least squares lattice filter are presented; the algorithm provides a numerically sound and regularly structured set of recursions.
Abstract: An algorithm and a set of array architectures that implement a multichannel adaptive least squares lattice filter are presented. The algorithm is based on QR decomposition and provides both a numerically sound and regularly structured set of recursions. For m channels and p filter taps, $O(pm^2)$ computations are required at each sample update. Several array architectures are presented to illustrate the space-time tradeoffs available. These range from an array of p processing elements (PEs) that compute a sample update in $O(m^2)$ time to an array of $O(pm^2)$ PEs that computes the update in constant time. The in-between arrays of $O(m)$, $O(m^2)$, or $O(pm)$ PEs are also outlined.

10 Nov 1988
TL;DR: A report prepared for the Naval Postgraduate School and the National Science Foundation, Washington D.C.
Abstract: Prepared for: Naval Postgraduate School and the National Science Foundation, Washington D.C.

Journal ArticleDOI
TL;DR: The implementation of a QR factorization for linear least-squares problems on a hypercube multiprocessor is studied.
Abstract: The implementation of a QR factorization for linear least-squares problems on a multiprocessor with a hypercube configuration is studied.

Proceedings ArticleDOI
25 May 1988
TL;DR: The first step in the development of a chip set to support eigenvalue-eigenvector-based estimation algorithms is presented, based on the assumption that an averaging technique will produce a symmetric covariance matrix.
Abstract: The first step in the development of a chip set to support eigenvalue-eigenvector-based estimation algorithms is presented. It is based on the assumption that an averaging technique will produce a symmetric covariance matrix. Such a matrix can be reduced to a symmetric tridiagonal matrix, and hence the eigenvalues and eigenvectors can be found by successive iterations involving QR decomposition. The architecture is unique in that other architectures either solve only for the eigenvalues or use methods other than QR iteration. It has potential for use in a systolic computer for computer intensive digital signal processing based on modern spectral-analysis techniques.

Journal ArticleDOI
TL;DR: The pipelined version of the new algorithm leads to a systolic implementation whose area-time performances overcome those of the arrays of Bojanczyk, Brent and Kung and Gentleman and Kung.
Abstract: Given an m by n dense matrix A (m ≥ n) we consider parallel algorithms to compute its orthogonal factorization via Givens rotations. First we describe an algorithm which is executed in m + n - 2 steps on a linear array of [m/2] processors, a step being the time necessary to achieve a Givens rotation. The pipelined version of the new algorithm leads to a systolic implementation whose area-time performances overcome those of the arrays of Bojanczyk, Brent and Kung [1] and Gentleman and Kung [5].
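
As background for this entry, a minimal sequential sketch of orthogonal factorization by Givens rotations is given here. It is the textbook procedure, not the parallel schedule or systolic array of the paper; the function name `givens_qr` and the choice to rotate adjacent rows from the bottom of each column upward are assumptions made for illustration.

```python
import math


def givens_qr(a):
    """Reduce an m-by-n matrix (m >= n) to upper triangular R with Givens rotations.

    Returns (q, r) with q orthogonal (m-by-m), r upper triangular, and q r = a.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    # qt accumulates the product of rotations, so that qt * a = r.
    qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):
            # Rotate rows i-1 and i to zero out r[i][j].
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue
            c, s = x / h, y / h
            for mat in (r, qt):
                for k in range(len(mat[0])):
                    top, bot = mat[i - 1][k], mat[i][k]
                    mat[i - 1][k] = c * top + s * bot
                    mat[i][k] = -s * top + c * bot
    q = [list(col) for col in zip(*qt)]  # q is the transpose of qt
    return q, r


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = givens_qr(a)
```

Each rotation touches only two rows, so rotations acting on disjoint row pairs are independent of one another; parallel schemes of the kind described in the abstract exploit that independence.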

Proceedings ArticleDOI
25 May 1988
TL;DR: The use of square-root-free linear systolic array structure to perform the QR decomposition needed in the solution of least-squares (LS) problems is proposed.
Abstract: The use of square-root-free linear systolic array structure to perform the QR decomposition needed in the solution of least-squares (LS) problems is proposed. A form of the Kalman filter algorithm is applied to perform the recursive LS estimation. Compared with the conventional triangular systolic array structure for LS estimation, the linear array has the advantage of requiring less area and being simpler for VLSI implementation.

Proceedings ArticleDOI
21 Jan 1988
TL;DR: By using the matrix decomposition method, the Kalman filter can be formulated as a modified SRIF data processing problem followed by a QR operation, which simplifies the computational structure, and is more reliable when the system has a singular (or near singular) coefficient matrix.
Abstract: Systolic Kalman filtering based on QR decomposition. M. J. Chen and K. Yao, Electrical Engineering Department, University of California, Los Angeles, CA 90024-1600. ABSTRACT: In this paper, by using the matrix decomposition method, the Kalman filter can be formulated as a modified SRIF data processing problem followed by a QR operation. Compared with the conventional SRIF method, this approach simplifies the computational structure, and is more reliable when the system has a singular (or near singular) coefficient matrix. By skewing the order of input matrices, fully pipelined systolic Kalman filtering operation can be achieved. With the number of processing units of $O(n^2)$, the system throughput rate is of $O(n)$. The numerical properties of the systolic Kalman filtering algorithm under finite word length effect are studied via analysis and computer simulations, and are compared with those of conventional approaches.
1 INTRODUCTION: In many practical communication and control problems, we have to deal with time-varying signals as well as signals that are modeled by random processes, as in radio transmission, aircraft tracking, radar signal processing, etc. Kalman proposed the well known Kalman filtering algorithm based on the linear minimum variance criterion and this technique has been successfully applied to many time-varying signal estimation problems. These problems can be modeled by a discrete first order recursive dynamic system with additive colored system and observation noises as given by $x(k+1) = F(k+1)x(k) + w(k+1)$,

Proceedings ArticleDOI
01 Jan 1988
TL;DR: This paper demonstrates how to formulate the recursive solution to the mixed case as a least squares problem, which leads to algorithms based on recursive QR decomposition implemented by either Givens or fast Givens rotations.
Abstract: Most adaptive filters require a desired signal for operation. However, in many applications the a priori knowledge consists of the signal-to-data cross-correlation vector rather than a desired signal. Recursive sample matrix inversion (SMI) algorithms exist for this "mixed" case. These SMI algorithms, which are based on the inversion of a data correlation matrix, have both numerical and structural shortcomings. This paper demonstrates how to formulate the recursive solution to the mixed case as a least squares problem. This formulation leads to algorithms based on recursive QR decomposition implemented by either Givens or fast Givens rotations. Compared to the recursive SMI approach, these QR-based algorithms are more efficient, have better numerical properties, and exhibit greater structural regularity. Because of their structural regularity, the algorithms are easily implemented by either a triangular or linear systolic array.

01 Jan 1988
TL;DR: The application of these algorithms is demonstrated by their employment in the enhancement of single-trial auditory evoked responses in magnetoencephalography by offering an improved method of a posteriori estimation of these signals.
Abstract: Least squares techniques are widely used in adaptive signal processing. While algorithms based on least squares are robust and offer rapid convergence properties, they also tend to be complex and computationally intensive. To enable the use of least squares techniques in real-time applications, it is necessary to develop adaptive algorithms that are (1) efficient and numerically stable, and (2) can be readily implemented in hardware. The first part of this work presents a uniform development of general recursive least squares (RLS) algorithms, and multichannel least squares lattice (LSL) algorithms. RLS algorithms are developed for both direct estimators, in which a desired signal is present, and for mixed estimators, in which no desired signal is available, but the signal-to-data cross-correlation is known. In both the RLS and LSL cases, two types of algorithms are developed and compared. Algorithms of the first type are based on a traditional data correlation matrix approach, while those of the second type are based on the more numerically stable QR decomposition of the data matrix. In the second part of this work, new and more flexible techniques of mapping algorithms to array architectures are presented. These techniques, based on the synthesis and manipulation of locally recursive algorithms (LRAs), have evolved from existing data dependence graph-based approaches, but offer the increased flexibility needed to deal with the structural complexities of the RLS and LSL algorithms. Using these techniques, various array architectures are developed for each of the RLS and LSL algorithms and the associated space/time tradeoffs presented. In the final part of this work, the application of these algorithms is demonstrated by their employment in the enhancement of single-trial auditory evoked responses in magnetoencephalography. In addition to demonstrating the algorithms, this application offers an improved method of a posteriori estimation of these signals.
(Copies available exclusively from Micrographics Department, Doheny Library, USC, Los Angeles, CA 90089-0182.)
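
The QR-based recursive least squares idea for the direct case, where a desired signal is present, can be sketched as follows. This is a minimal illustration under stated assumptions: no forgetting factor, real data, and a plain Givens update of a triangular system R w = z. It does not reproduce the thesis's mixed-estimator or lattice algorithms, and the names `qr_rls_update` and `back_substitute` are hypothetical.

```python
import math


def qr_rls_update(r, z, x, d):
    """Fold one data row x with desired value d into the triangular system R w = z.

    r is n-by-n upper triangular and z has length n; both are updated in place.
    """
    n = len(x)
    x = list(x)
    for i in range(n):
        h = math.hypot(r[i][i], x[i])
        if h == 0.0:
            continue
        c, s = r[i][i] / h, x[i] / h
        # Rotate row i of R against the incoming row to annihilate x[i].
        for k in range(i, n):
            rk, xk = r[i][k], x[k]
            r[i][k] = c * rk + s * xk
            x[k] = -s * rk + c * xk
        z[i], d = c * z[i] + s * d, -s * z[i] + c * d
    return d  # rotated residual term left over after the update


def back_substitute(r, z):
    """Solve the upper triangular system R w = z."""
    n = len(z)
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = z[i] - sum(r[i][k] * w[k] for k in range(i + 1, n))
        w[i] = acc / r[i][i]
    return w


# Noise-free samples of d = 2*x1 - x2, processed one row at a time.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
r = [[0.0, 0.0], [0.0, 0.0]]
z = [0.0, 0.0]
for x, d in samples:
    qr_rls_update(r, z, x, d)
w = back_substitute(r, z)
```

Because the data matrix is never squared into a correlation matrix, the update works directly on the triangular factor, which is the source of the better numerical behavior the abstract attributes to the QR-based algorithms.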

Proceedings ArticleDOI
20 Apr 1988
TL;DR: An algorithm intended for software implementation on a programmable systolic/wavefront computer system is presented for the computation of a complex-valued frequency response matrix $G(j\omega) = C(j\omega I - A)^{-1}B$.
Abstract: A systolic organization is presented for the computation of a complex-valued frequency response matrix $G(j\omega) = C(j\omega I - A)^{-1}B$. By 'systolic organization,' we mean an algorithm intended for software implementation on a programmable systolic/wavefront computer system. Typically, the real-valued state space model matrices $A$, $B$, and $C$ are given and the calculation of $G$ must be performed for a very large number of values of the scalar "frequency" parameter $\omega$. This, and closely related calculations, arise naturally in the analysis and design of control systems. The algorithm which has been chosen for systolic implementation is an orthogonal version of an algorithm appearing previously in the literature. The matrix $A$ is reduced initially to an upper Hessenberg form and this form is preserved as $\omega$ varies subsequently in the matrix $j\omega I - A$. A systolic QR factorization of this latter matrix [$(j\omega I - A) = Q^T R$] is then implemented for effecting the linear system solution (inversion). The critical computational component is $CR^{-1}$. This computational component's process dependency graph is embedded optimally in space and time through the use of a nonlinear spacetime transformation. The computational period of the algorithm is $O(n)$ where $n$ is the order of the matrix $A$.
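
The reason a single Hessenberg reduction can be reused for every frequency, and why the product of C with the inverse triangular factor is the critical step, can be seen from a short derivation. The symbols U (orthogonal) and H (upper Hessenberg) are introduced here for illustration and do not appear in the abstract; the conjugate transpose is written for the unitary factor of the complex matrix, where the abstract writes a plain transpose.

```latex
\begin{aligned}
A &= U H U^{T}, \quad U^{T}U = I, \quad H \text{ upper Hessenberg},\\
G(j\omega) &= C\,(j\omega I - A)^{-1} B = (CU)\,(j\omega I - H)^{-1}\,(U^{T}B),\\
j\omega I - H &= Q^{H} R \;\Longrightarrow\; (j\omega I - H)^{-1} = R^{-1} Q,\\
G(j\omega) &= \bigl[(CU)\,R^{-1}\bigr]\,\bigl[Q\,(U^{T}B)\bigr].
\end{aligned}
```

Shifting H by a multiple of the identity leaves its Hessenberg structure intact, so only the QR factorization and the triangular solve need to be repeated for each new frequency.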