
Showing papers on "QR decomposition" published in 1990


Journal ArticleDOI
TL;DR: A functional-level concurrent error-detection scheme is presented for such VLSI signal processing architectures as those proposed for the FFT and QR factorization, and it is shown that the error coverage is high with large word sizes.
Abstract: The increasing demands for high-performance signal processing along with the availability of inexpensive high-performance processors have resulted in numerous proposals for special-purpose array processors for signal processing applications. A functional-level concurrent error-detection scheme is presented for such VLSI signal processing architectures as those proposed for the FFT and QR factorization. Some basic properties involved in such computations are used to check the correctness of the computed output values. This fault-detection scheme is shown to be applicable to a class of problems rather than a particular problem, unlike the earlier algorithm-based error-detection techniques. The effects of roundoff/truncation errors due to finite-precision arithmetic are evaluated. It is shown that the error coverage is high with large word sizes.

179 citations


Journal ArticleDOI
TL;DR: In this paper, a technique is introduced for estimating the smallest singular value, and hence the condition number, of a dense triangular matrix as it is generated one row or column at a time; the estimator can be interpreted as approximating the secular equation with a simpler rational function.
Abstract: This paper introduces a new technique for estimating the smallest singular value, and hence the condition number, of a dense triangular matrix as it is generated one row or column at a time. It is also shown how this condition estimator can be interpreted as trying to approximate the secular equation with a simpler rational function. While one can construct examples where this estimator fails, numerical experiments demonstrate that despite its small computational cost, it produces reliable estimates. Also given is an example that shows the advantage of incorporating the incremental condition estimation strategy into the QR factorization algorithm with column pivoting to guard against near rank deficiency going unnoticed.

93 citations


Proceedings ArticleDOI
01 Jan 1990
TL;DR: In this article, the singular value decomposition (SVD) is explored as the common structure in the three basic matrix pencil algorithms: direct matrix pencil algorithm, Pro-ESPRIT, and TLS-ESPRIT.
Abstract: Several algorithms for estimating generalized eigenvalues (GEs) of singular matrix pencils perturbed by noise are reviewed. The singular value decomposition (SVD) is explored as the common structure in the three basic algorithms: direct matrix pencil algorithm, Pro-ESPRIT, and TLS-ESPRIT. It is shown that several SVD-based steps inherent in the algorithms are equivalent to the first-order approximation. In particular, the Pro-ESPRIT and its variant TLS-Pro-ESPRIT are shown to be equivalent, and the TLS-ESPRIT and its earlier version LS-ESPRIT are shown to be asymptotically equivalent to the first-order approximation. For the problem of estimating superimposed complex exponential signals, the state-space algorithm is shown to be also equivalent to the previous matrix pencil algorithms to the first-order approximation. The second-order perturbation and the threshold phenomenon are illustrated by simulation results based on a damped sinusoidal signal. An improved state-space algorithm is found to be the most robust to noise.

83 citations


Journal ArticleDOI
01 Nov 1990
TL;DR: Parallel algorithms are proposed for the modified Gram-Schmidt and Householder QR factorizations on message-passing systems in which the matrix is distributed by blocks or rows.
Abstract: In this paper, the parallel implementation of two algorithms for forming a QR factorization of a matrix is studied. We propose parallel algorithms for the modified Gram-Schmidt and the Householder algorithms on message passing systems in which the matrix is distributed by blocks or rows. The models that predict performance of the algorithms are validated by experimental results on several parallel machines.

53 citations
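
The modified Gram-Schmidt procedure named in the entry above can be sketched sequentially as follows. This is only the serial building block, written in plain Python for illustration; the function name `modified_gram_schmidt` and the list-of-rows matrix layout are assumptions, and the block- or row-distributed message-passing versions studied in the paper are not reproduced here.

```python
import math


def modified_gram_schmidt(a):
    """Return (q, r) with a = q r, q having orthonormal columns, r upper triangular.

    `a` is a list of rows (m x n, m >= n) whose columns are assumed
    to be linearly independent.
    """
    m, n = len(a), len(a[0])
    # Column-major working copy: v[j] is column j of a.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        qk = [x / r[k][k] for x in v[k]]
        q_cols.append(qk)
        # "Modified" step: remove the q_k component from every
        # remaining column right away, using the updated columns.
        for j in range(k + 1, n):
            r[k][j] = sum(qk[i] * v[j][i] for i in range(m))
            v[j] = [v[j][i] - r[k][j] * qk[i] for i in range(m)]
    # Convert q back to a list of rows.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

Because each step touches all remaining columns, the inner loop over `j` is the natural place to split work across processors, which is presumably why the data distribution matters in a parallel setting.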


Journal ArticleDOI
TL;DR: A pair of multichannel least-squares lattice filter algorithms is presented; each m-channel filter stage is numerically stable and computationally efficient, with a computational complexity of O(m^2).
Abstract: A pair of multichannel least-squares lattice filter algorithms is presented. Each m-channel filter stage is numerically stable and computationally efficient, with a computational complexity of O(m^2). Both algorithms are based on the recursive QR decomposition of the forward and backward error matrices in each filter stage. The first algorithm uses orthogonal Givens rotations to compute the QR decomposition. The second algorithm uses fast Givens rotations for greater efficiency. Simulation results are presented, as well as an example of the algorithms' application in the enhancement of magnetoencephalographic signals.

44 citations
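
As a rough illustration of the orthogonal Givens rotations mentioned in the entry above, the sketch below reduces a small matrix to upper triangular form by rotating adjacent rows. It is a plain batch reduction in pure Python, not the recursive lattice update or the fast (square-root-free) variant from the paper; the function name `givens_triangularize` is an assumption made for this example.

```python
import math


def givens_triangularize(a):
    """Return the upper triangular factor R of a (list of rows, m x n).

    Each rotation mixes rows i-1 and i so that entry (i, j) becomes zero.
    """
    r = [row[:] for row in a]
    m, n = len(r), len(r[0])
    for j in range(n):
        # Zero column j from the bottom row up to the row below the diagonal.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # nothing to rotate
            c, s = x / h, y / h
            for k in range(j, n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            r[i][j] = 0.0  # exact zero instead of rounding residue
    return r
```

For the 2 x 2 matrix with rows (3, 1) and (4, 2), a single rotation with c = 0.6 and s = 0.8 gives rows (5, 2.2) and (0, 0.4), up to rounding.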


Journal ArticleDOI
TL;DR: This work presents FORTRAN subroutines that update the QR decomposition in a numerically stable manner when A is modified by a matrix of rank one, or when a row or a column is inserted or deleted.
Abstract: Let the matrix A ∈ R^{m×n}, m ≥ n, have a QR decomposition A = QR, where Q ∈ R^{m×n} has orthonormal columns, and R ∈ R^{n×n} is upper triangular. Assume that Q and R are explicitly known. We present FORTRAN subroutines that update the QR decomposition in a numerically stable manner when A is modified by a matrix of rank one, or when a row or a column is inserted or deleted. These subroutines are modifications of the Algol procedures in Daniel et al. [5]. We also present a subroutine that permutes the columns of A and updates the QR decomposition so that the elements in the lower right corner of R will generally be small if the columns of A are nearly linearly dependent. This subroutine is an implementation of the rank-revealing QR decomposition scheme recently proposed by Chan [3].

40 citations


Journal ArticleDOI
TL;DR: In this article, a new algorithm for computing the QR factorization of a rectangular matrix on a hypercube multiprocessor is described, where the hypercube network is configured as a two-dimensional subcube-grid in the proposed scheme.
Abstract: In this article a new algorithm for computing the QR factorization of a rectangular matrix on a hypercube multiprocessor is described. The hypercube network is configured as a two-dimensional subcube-grid in the proposed scheme. A global communication scheme that uses redundant computation to maintain data proximity is employed, and the mapping strategy is such that for a fixed number of processors the processor idle time is small and either constant or grows linearly with the dimension of the matrix. A complexity analysis shows what the aspect ratio of the configured grid should be in terms of the shape of the matrix and the relative speeds of communication and computation. Numerical experiments performed on an Intel Hypercube multiprocessor support the theoretical results.

39 citations


Journal ArticleDOI
TL;DR: An in-depth study of the various issues and tradeoffs available in algorithm-based error detection, as well as a general methodology for evaluating the schemes, shows that, in general, the SOS approach gives much better coverage for QR factorization while maintaining low overheads.
Abstract: The authors provide an in-depth study of the various issues and tradeoffs available in algorithm-based error detection, as well as a general methodology for evaluating the schemes. They illustrate the approach on an extremely useful computation in the field of numerical linear algebra: QR factorization. They have implemented and investigated numerous ways of applying algorithm-based error detection using different system-level encoding strategies for QR factorization. Specifically, schemes based on the checksum and sum-of-squares (SOS) encoding techniques have been developed. The results of studies performed on a 16-processor Intel iPSC-2/D4/MX hypercube multiprocessor are reported. It is shown that, in general, the SOS approach gives much better coverage (85-100%) for QR factorization while maintaining low overheads (below 10%).

27 citations
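
One invariant that a sum-of-squares check for QR factorization could exploit is that orthogonal transformations preserve column norms: if A = QR with Q having orthonormal columns, each column of R has the same sum of squares as the matching column of A. The sketch below tests only that invariant; the entry above does not spell out the paper's actual encoding, so the function `sos_check` and its tolerance are assumptions for illustration.

```python
def sos_check(a, r, rel_tol=1e-9):
    """Return True if every column of r has the same sum of squares as in a.

    `a` and `r` are lists of rows with the same number of columns.
    A mismatch beyond the tolerance signals a possible computation error.
    """
    n = len(a[0])
    for j in range(n):
        sos_a = sum(row[j] ** 2 for row in a)
        sos_r = sum(row[j] ** 2 for row in r)
        # Tolerance is scaled by the column's own magnitude so that
        # ordinary roundoff is not reported as a fault.
        if abs(sos_a - sos_r) > rel_tol * max(sos_a, 1.0):
            return False
    return True
```

The tolerance has to absorb finite-precision roundoff without hiding genuine faults, a tradeoff any check of this kind has to settle.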


01 Jan 1990
TL;DR: Efficient parallel code can be generated and it has been shown that the performance benefits of the systolic communication programming model can be achieved from a high-level algorithm description.
Abstract: Many applications with challenging performance requirements and a high potential for parallelization can be found in the area of numerical computing. These applications, such as matrix computations, signal processing algorithms, and finite-difference methods, consist mainly of computationally intensive nested loops. While many problems in these fields have been successfully implemented on a wide variety of parallel computers, the process of programming these machines still remains tedious and ad hoc. The parallel computation model based on regularly interconnected arrays of processors with high-bandwidth neighbor-to-neighbor communication (systolic arrays) has been shown to be a highly efficient means of parallelization for this application domain. However, this efficiency is accompanied by an increased difficulty in programming arrays of processors. Among the main factors that make the programming of these machines difficult and error prone are: managing multiple threads of control in disjoint address spaces, explicitly coordinating communication between processors, and tailoring the implementation of an algorithm to machine dependent characteristics such as the number of processors and communication channels. The main contribution of this work is the reduction of the difficulty in programming systolic array computers by automatically generating programs from high-level language nested loops for a given programmable systolic array of processors. The techniques used for generating code are based on linear transformations of the same kind used for automatic synthesis of special purpose systolic arrays from algorithms with uniform dependencies. The work has been evaluated and validated on the Warp, a 10-cell programmable linear systolic array. Each Warp cell is a VLIW 10 MFLOPS processor with two communication channels to each of its neighbors. 
Code has been automatically generated and performance measurements have been taken for benchmark programs such as matrix multiplication, LU decomposition, QR decomposition, shortest path, transitive closure, and the Livermore loops. With the ideas and techniques introduced in this thesis, efficient parallel code can be generated and it has been shown that the performance benefits of the systolic communication programming model can be achieved from a high-level algorithm description. For instance, for the matrix computations, speedups from 7.4 to 7.9 have been obtained on 8 cells for matrices of size 120 x 120. Also, very good performance has been achieved for most of the benchmark programs, e.g., on 8 cells: 64 MFLOPS for matrix multiplication, 21 MFLOPS for LU decomposition, and 37 MFLOPS for QR decomposition.

26 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: A fast Householder filter (FHF) QR-RLS algorithm is presented that requires significantly less computation than previous fast QR-RLS adaptive algorithms and replaces the Givens rotations used in these fast QR algorithms by Householder transformations.
Abstract: A fast Householder filter (FHF) QR-RLS algorithm is presented that requires significantly less (by a factor of at least three) computation than previous fast QR-RLS adaptive algorithms. The essential feature of the new method is that it replaces the Givens rotations used in these fast QR algorithms by Householder transformations. A set of filters that characterize the QR factorization of a data matrix is derived, and time updates on this set are determined using a generic Householder updating identity. The FHF requires 7N computations per iteration for the standard prewindowed case, which is the same as the FTF (fast transversal filter) and FAEST fast (non-QR) RLS.

25 citations
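
For readers unfamiliar with the Householder transformations referred to in the entry above, here is a minimal pure-Python sketch of a single reflection that maps a nonzero vector onto a multiple of the first coordinate axis. It shows only the elementary transformation, not the fast Householder filter or its updating identity; the helper names `householder_vector` and `apply_reflection` are assumptions.

```python
import math


def householder_vector(x):
    """Return (v, alpha) with v a unit vector such that
    (I - 2 v v^T) x = (alpha, 0, ..., 0). Assumes x is nonzero."""
    norm = math.sqrt(sum(t * t for t in x))
    # Choose the sign opposite to x[0] to avoid cancellation in v[0].
    alpha = -math.copysign(norm, x[0])
    v = list(x)
    v[0] -= alpha
    v_norm = math.sqrt(sum(t * t for t in v))
    return [t / v_norm for t in v], alpha


def apply_reflection(v, x):
    """Apply H = I - 2 v v^T to x without forming H explicitly."""
    d = 2.0 * sum(vi * xi for vi, xi in zip(v, x))
    return [xi - d * vi for vi, xi in zip(v, x)]
```

One reflection zeroes every entry below the first in a single step, whereas a Givens rotation zeroes one entry at a time.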


Proceedings ArticleDOI
03 Apr 1990
TL;DR: An adaptive algorithm is presented for covariance matrix eigenstructure computation based on the updated computation of the SVD (singular value decomposition) of a data matrix formed with the received data vectors appended as columns, avoiding the need to double the dynamic range necessary for a given numerical accuracy.
Abstract: An adaptive algorithm is presented for covariance matrix eigenstructure computation based on the updated computation of the SVD (singular value decomposition) of a data matrix formed with the received data vectors appended as columns. Simulation results show that the algorithm is successful in tracking the eigenstructure of a time-varying covariance matrix in a nonstationary environment. The advantage of the algorithm is that it uses the data vectors X_i at each iteration to update the eigenstructure instead of a rank one matrix update, thus avoiding the need to double the dynamic range necessary for a given numerical accuracy. The computations for the algorithm are easily mapped on existing systolic arrays with some modifications.

Proceedings ArticleDOI
05 Dec 1990
TL;DR: Three structured networks and their corresponding training algorithms are proposed for matrix QR factorization, eigenvalue and eigenvector determination, and Lyapunov equation solving.
Abstract: Three structured networks and their corresponding training algorithms are proposed for matrix QR factorization, eigenvalue and eigenvector determination, and Lyapunov equation solving. The basic procedure behind these approaches is as follows: represent a given problem by a structured network, train this structured network to match some desired patterns, and obtain the solution to the problem from the weights of the resulting structured network. A general-purpose programmable network architecture is proposed which can be programmed to solve different problems. Simulation results showed that the proposed approaches worked quite well.

Book ChapterDOI
01 Sep 1990
TL;DR: This paper presents an “adaptive blocking” methodology for determining in a systematic manner an optimal blocking strategy for a uniprocessor machine and shows that the resulting blocking strategy is as good as any fixed-width blocking strategy.
Abstract: On most high-performance architectures, data movement is slow compared to floating-point (in particular, vector) performance. On these architectures block algorithms have been successful for matrix computations. By considering a matrix as a collection of submatrices (the so-called blocks) one naturally arrives at algorithms that require little data movement. The optimal blocking strategy, however, depends on the computing environment and on the problem parameters. Current approaches use fixed-width blocking strategies that are not optimal. This paper presents an “adaptive blocking” methodology for determining in a systematic manner an optimal blocking strategy for a uniprocessor machine. We demonstrate this technique on a block QR factorization routine on a uniprocessor. After generating timing models for the high-level kernels of the algorithm we can formulate the optimal blocking strategy in a recurrence relation that we can solve inexpensively with a dynamic programming technique. Experiments on one processor of a CRAY-2 show that in fact the resulting blocking strategy is as good as any fixed-width blocking strategy. So while we do not know the optimum fixed-width blocking strategy unless we re-run the same problem several times, adaptive blocking provides optimum performance in the very first run.

Proceedings ArticleDOI
01 May 1990
TL;DR: The QR-decomposition (QRD)-based least-squares lattice algorithm and its architecture are described and it is confirmed that a square-root-free form of the algorithm is empirically better than the standard form.
Abstract: The QR-decomposition (QRD)-based least-squares lattice algorithm and its architecture are described. This algorithm can be used to solve least-squares minimization problems that involve time-series data. The results of some computer simulation experiments on an adaptive channel equalizer using the QRD-based lattice algorithm are presented. These simulations were performed using limited-precision floating-point arithmetic. The results show that very little penalty is paid in reducing the computational load. The QRD-based lattice algorithm works essentially as well as the QRD-based triangular systolic array but requires only O(p^2 N) operations per time instant as compared with O(p^2 N^2) for the array. The results also confirm that a square-root-free form of the algorithm is empirically better than the standard form.

Journal ArticleDOI
TL;DR: It is shown that highly efficient parallel variants of the QR decomposition and modified Gram-Schmidt procedure can be derived by using a graph theoretic description of system connectivity.
Abstract: It is shown that highly efficient parallel variants of the QR decomposition and modified Gram-Schmidt procedure can be derived by using a graph theoretic description of system connectivity. The methods developed herein regularly order the directed graph of the mechanical system, such that the transpose of the constraint matrix has block upper triangular form, and subsequently assign independent processors tasks based upon the zero fill structure of the constraint matrix.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: The TQR methods offer efficient ways for identifying sinusoids closely clustered in frequency under stationary and nonstationary conditions, and the benefit of truncated singular value decomposition (TSVD) for high-frequency resolution is shown to be achievable at much lower computational cost.
Abstract: Three truncated QR methods are proposed for sinusoidal frequency estimation: (1) truncated QR without column pivoting (TQR); (2) truncated QR with pre-ordered columns (TQRR); and (3) truncated QR with column pivoting (TQRP). It is demonstrated that the benefit of truncated singular value decomposition (TSVD) for high-frequency resolution is achievable under the truncated QR approach with much lower computational cost. Other attractive features of these methods include the ease of updating, which is difficult for the SVD method, and numerical stability. Thus, the TQR methods offer efficient ways for identifying sinusoids closely clustered in frequency under stationary and nonstationary conditions. Based on the forward-backward linear prediction model, computer simulations and comparisons are provided for different truncation methods under various signal-to-noise ratios.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: The performance of a recursive-least-squares (RLS) algorithm based on an inverse QR decomposition is reported, with the performance measure derived in terms of the biases that are present in steady state along the diagonal entries of the matrix used in the approach.
Abstract: The performance of a recursive-least-squares (RLS) algorithm based on an inverse QR decomposition is reported. Theoretical analysis provides performance measures in a finite precision environment. The performance measure is derived in terms of the biases that are present in steady state along the diagonal entries of the matrix used in the approach. An analytical expression has been derived for this bias as a function of wordlength, forgetting factor, and signal statistics. This result is further used to show that the diagonal entries will not reduce to zero or become negative, thereby ensuring stability of the algorithm. All analytical results are verified by corresponding simulation results.

Journal ArticleDOI
TL;DR: Gohberg and Leiterer as mentioned in this paper showed that every subset of C^{m×n} consisting of all the matrices of the same rank is analytically arcwise connected, and the smoothness of the Moore-Penrose inverse of a matrix function of class C^p is smooth.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: It is shown that the Q matrix can be computed easily by using a multiphase systolic algorithm, and thus the eigenvectors can also be computed without any global communication in the array.
Abstract: A multiphase systolic algorithm is proposed to solve the spectral decomposition problem based on the QR algorithm. It is shown that the Q matrix can be computed easily by using a multiphase systolic algorithm, and thus the eigenvectors can also be computed without any global communication in the array. Details on these multiphase operations of the QR algorithm as well as architectural consequences are discussed.

01 Oct 1990
TL;DR: Paradigms are developed for the efficient solution of the "inner loop" of a nonlinear optimization algorithm: the estimation of the Jacobian, its factorization, and the solution of the resulting trust-region problem.
Abstract: In this talk, we present algorithms and experimental results for the estimation and QR factorization of large, sparse Jacobians on a message-passing multiprocessor. The gist of this work is the development of paradigms for the efficient solution of the "inner loop" of a nonlinear optimization algorithm: the estimation of the Jacobian, its factorization, and the solution of the resulting trust-region problem. A parallel sparse QR factorization based on the global row reduction algorithm is introduced. We emphasize the commonality between row partitions that allow for the efficient parallel factorization of the Jacobian and its estimation. We also note that the interprocessor communication structure constructed for the QR factorization can also be used to solve an associated trust-region problem. Finally, experimental results obtained on the Intel iPSC/2 are presented.

Journal ArticleDOI
TL;DR: The authors introduce a family of sliding window techniques into the least-squares theory of linear prediction by using QR factorization of the Toeplitz data matrices that arise in linear prediction.
Abstract: The authors pose a sequence of linear prediction problems that differ a little from those previously posed. The solutions to these problems introduce a family of sliding window techniques into the least-squares theory of linear prediction. By using these techniques it is possible to perform QR factorization of the Toeplitz data matrices that arise in linear prediction. The matrix Q is an orthogonal version of the data matrix, and the matrix R is a Cholesky factor of the experimental correlation matrix. The QR and Cholesky algorithms generate generalized reflection coefficients that may be used in the usual ways for analysis, synthesis, or classification.

Proceedings ArticleDOI
05 Sep 1990
TL;DR: The author gives an overview of the Warp systolic array and describes the AL language and its implementation for the Warp machine, a sequential programming language extended with the DARRAY data structure and DO looping construct to guide the compiler to generate efficient parallel code.
Abstract: The author gives an overview of the Warp systolic array and describes the AL language and its implementation for the Warp machine. AL is a sequential programming language extended with the DARRAY data structure and DO looping construct to guide the compiler to generate efficient parallel code. The author has implemented an AL compiler for the Warp machine and has been using AL to program matrix computation applications. Examples of LU decomposition, QR decomposition, and singular value decomposition (SVD) are given to illustrate the use of AL. More than 27 MFLOPS (out of 100 MFLOPS peak) on matrices of order 300 were achieved for these applications.

Journal ArticleDOI
TL;DR: The parallel algorithm described in this paper is based on the sequential QR factorization algorithm for Toeplitz matrices recently developed by the authors, and the total storage required is O(n), i.e., only a constant per cell.

Proceedings ArticleDOI
08 Apr 1990
TL;DR: This paper describes the AL programming language for the Warp systolic array, a linear array of 11 processing cells, and examples of LU decomposition, QR decomposition, and singular value decomposition are used to illustrate the use of AL.
Abstract: The author gives an overview of the Warp systolic array and describes the AL language and its implementation for the Warp machine. AL is a sequential programming language extended with the DARRAY data structure and DO looping construct to guide the compiler to generate efficient parallel code. The author has implemented an AL compiler for the Warp machine and has been using AL to program matrix computation applications. Examples of LU decomposition, QR decomposition, and singular value decomposition (SVD) are given to illustrate the use of AL. More than 27 MFLOPS (out of 100 MFLOPS peak) on matrices of order 300 were achieved for these applications.

01 Jan 1990
TL;DR: In this paper, it is shown how the theory behind the recent work on QR decomposition (QRD) based lattice filter algorithms can be applied to the wide-band beamforming problem, and the computationally efficient QRD-based algorithm that results is described.
Abstract: By taking advantage of the time-shift properties inherent in the data, the computational load of a least squares wide-band beamforming algorithm may be reduced from O(N^2 p^2) to O(N p^2), assuming a p-channel beamformer with an N-tap filter in each channel. We show how the theory behind the recent work on QR decomposition (QRD) based lattice filter algorithms can be applied to the wide-band beamforming problem and describe the computationally efficient QRD-based algorithm that results. The resulting architecture is essentially the same as the “lattice of triangular arrays” that has been derived, separately, by Yang and Böhme and by Ling. The connection between these different approaches is reviewed. We also describe a simplified derivation of the QRD-based lattice algorithm that is applicable to both the adaptive filtering and the wide-band beamforming problems.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: It is shown how the theory behind the recent work on QR decomposition (QRD)-based lattice filter algorithms can be applied to the wideband beamforming problem, and the computationally efficient QRD-based algorithm that results is described.
Abstract: Taking advantage of the time-shift properties inherent in the data makes it possible to reduce the computational load of a least-squares wideband beamforming algorithm from O(N^2 p^2) to O(N p^2), assuming a p-channel beamformer with an N-tap filter in each channel. It is shown how the theory behind the recent work on QR decomposition (QRD)-based lattice filter algorithms can be applied to the wideband beamforming problem, and the computationally efficient QRD-based algorithm that results is described. The resulting architecture is essentially the same as the lattice of triangular arrays that has been derived, separately, by B. Yang and J.F. Bohme (1989) and by F. Ling (1989). The connection between these different approaches is reviewed. Also described is a simplified derivation of the QRD-based lattice algorithm that is applicable to both the adaptive filtering and the wideband beamforming problems.


Proceedings ArticleDOI
01 Nov 1990
TL;DR: An algorithm for adaptively estimating the noise subspace of a data matrix, as is required in signal processing applications employing the "signal subspace" approach, is developed using a rank-revealing QR factorization instead of the more expensive singular value or eigenvalue decompositions.
Abstract: We develop an algorithm for adaptively estimating the noise subspace of a data matrix as is required in signal processing applications employing the "signal subspace" approach. The noise subspace is estimated using a rank-revealing QR factorization instead of the more expensive singular value or eigenvalue decompositions. Using incremental condition estimation to monitor the smallest singular values of triangular matrices we can update the rank-revealing triangular factorization inexpensively when new rows are added and old rows are deleted. Experiments demonstrate that the new approach usually requires O(n^2) work to update an n x n matrix and accurately tracks the noise subspace. © (1990) COPYRIGHT SPIE--The International Society for Optical Engineering.

Journal ArticleDOI
TL;DR: The commenters address a statement made in the above-titled paper that 'it would be completely impractical to solve this huge system of simultaneous equations by algebraic methods such as matrix manipulation', and point out that the method of Gaussian elimination solves the problem in a low order polynomial time.
Abstract: The commenters address a statement made in the above-titled paper by J.G. Daugman (see ibid., vol.36, no.7, pp.1169-79, July 1988) that 'it would be completely impractical to solve this huge system of simultaneous equations by algebraic methods such as matrix manipulation, since the complexity of such methods grows factorially with the number of simultaneous equations'. They point out that the method of Gaussian elimination solves the problem in a low order polynomial time; specifically, O(N^3) arithmetic operations are needed where N is the number of linear equations and the number of unknowns. Major algorithms include LU decomposition, requiring O(N^3/3) operations; the Householder QR decomposition, requiring O(2N^3/3) operations; and the Givens QR decomposition, requiring O(4N^3) operations.
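
To make the polynomial-versus-factorial contrast in the abstract above concrete, the short sketch below evaluates the leading-order operation counts exactly as quoted there and compares them with N!. The function name `leading_order_counts` and the choice N = 30 are assumptions made for illustration; lower-order terms and constant factors beyond those quoted are ignored.

```python
import math


def leading_order_counts(n):
    """Leading-order operation counts as quoted in the abstract above."""
    return {
        "LU": n**3 / 3,
        "Householder QR": 2 * n**3 / 3,
        "Givens QR": 4 * n**3,
    }


n = 30  # illustrative problem size
counts = leading_order_counts(n)
for name, ops in counts.items():
    print(f"{name}: about {ops:.0f} operations")

# Even the most expensive quoted count is dwarfed by factorial growth.
ratio = math.factorial(n) / counts["Givens QR"]
print(f"N! exceeds the Givens QR count by a factor of about {ratio:.1e}")
```

For N = 30 the quoted counts are 9,000, 18,000, and 108,000 operations, while 30! is larger than 10^32.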

Journal ArticleDOI
TL;DR: Presents a process to linearize formally a constrained nonlinear automatic lens design problem, which is formulated by means of the penalty function method; solutions derived by the QR factorization method are also given.
Abstract: Presents a process to linearize formally a constrained nonlinear automatic lens design problem, which is formulated by means of the penalty function method. Solutions derived by the QR factorization method are also given.