
Showing papers on "Sparse matrix" published in 1993


Journal ArticleDOI
TL;DR: This article surveys iterative domain decomposition techniques that have been developed in recent years for solving several kinds of partial differential equations, including elliptic, parabolic, and differential systems such as the Stokes problem and mixed formulations of elliptic problems.
Abstract: Domain decomposition (DD) has been widely used to design efficient parallel algorithms for solving elliptic problems. In this thesis, we focus on improving the efficiency of DD methods and applying them to more general problems. Specifically, we propose efficient variants of the vertex space DD method and minimize the complexity of general DD methods. In addition, we apply DD algorithms to coupled elliptic systems, singular Neumann boundary problems and linear algebraic systems. We successfully improve the vertex space DD method of Smith by replacing the exact edge and vertex dense matrices with approximate sparse matrices. It is extremely expensive to calculate, invert and store the exact vertex and edge Schur complement dense sub-matrices in the vertex space DD algorithm. We propose several approximations for these dense matrices, by using Fourier approximation and an algebraic probing technique. Our numerical and theoretical results show that these variants retain the fast convergence rate and greatly reduce the computational cost. We develop a simple way to reduce the overall complexity of domain decomposition methods through choosing the coarse grid size. For sub-domain solvers with different complexities, we derive the optimal coarse grid size $H_{opt}$, which asymptotically minimizes the total computational cost of DD methods in sequential and parallel environments. The overall complexity of DD methods is significantly reduced by using this optimal coarse grid size. We apply the additive and multiplicative Schwarz algorithms to solving coupled elliptic systems. Using the Dryja-Widlund framework, we prove that their convergence rates are independent of both the mesh and the coupling parameters. We also construct several approximate interface sparse matrices by using Sobolev inequalities, Fourier analysis and a probing technique. We further discuss the application of DD to singular Neumann boundary value problems. We extend the general framework to these problems and show how to deal with the null space in practice. Numerical and theoretical results show that these modified DD methods still have an optimal convergence rate. By using the DD methodology, we propose algebraic additive and multiplicative Schwarz methods to solve general sparse linear algebraic systems. We analyze the eigenvalue distribution of the iteration matrix of each algebraic DD method to study the convergence behavior.
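
The algebraic probing technique mentioned above admits a compact illustration. The sketch below is a minimal version of the standard probing construction, assuming the interface operator is well approximated by a tridiagonal matrix; the helper names probe_tridiagonal and apply_S are illustrative and not taken from the thesis.

```python
import numpy as np

def probe_tridiagonal(apply_S, n):
    """Recover a tridiagonal approximation T of an operator S from 3 matvecs.

    apply_S(v) returns S @ v.  S is assumed to be close to tridiagonal,
    i.e. its entries decay quickly away from the diagonal, as interface
    Schur complements typically do.  The entries near the diagonal are
    read off from S @ p_k, where p_k has ones at positions congruent to
    k modulo 3.
    """
    probes = [np.array([1.0 if j % 3 == k else 0.0 for j in range(n)])
              for k in range(3)]
    images = [apply_S(p) for p in probes]        # only 3 applications of S
    T = np.zeros((n, n))
    for j in range(n):
        col = images[j % 3]
        for i in range(max(0, j - 1), min(n, j + 2)):
            T[i, j] = col[i]                     # keep the tridiagonal part
    return T

# toy check: probing an exactly tridiagonal operator reproduces it
n = 6
S = np.diag(np.full(n, 4.0)) + np.diag(np.full(n - 1, -1.0), 1) + np.diag(np.full(n - 1, -1.0), -1)
assert np.allclose(probe_tridiagonal(lambda v: S @ v, n), S)
```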

550 citations


Journal ArticleDOI
TL;DR: In this article, a class of vector-space bases is introduced for sparse representation of discretizations of integral operators, where an operator with a smooth, nonoscillatory kernel possessing a finite number of singularities in each row or column is represented in these bases as a sparse matrix, to high precision.
Abstract: A class of vector-space bases is introduced for the sparse representation of discretizations of integral operators. An operator with a smooth, nonoscillatory kernel possessing a finite number of singularities in each row or column is represented in these bases as a sparse matrix, to high precision. A method is presented that employs these bases for the numerical solution of second-kind integral equations in time bounded by $O(n\log ^2 n)$, where n is the number of points in the discretization. Numerical results are given which demonstrate the effectiveness of the approach, and several generalizations and applications of the method are discussed.

378 citations


Journal ArticleDOI
TL;DR: This paper considers the construction and properties of factorized sparse approximate inverse preconditionings that are well suited for implementation on modern parallel computers; they preserve symmetry and/or positive definiteness of the original matrix and lead to convergent splittings.
Abstract: This paper considers construction and properties of factorized sparse approximate inverse preconditionings well suited for implementation on modern parallel computers. In the symmetric case such preconditionings have the form $A \to G_L AG_L^T $, where $G_L $ is a sparse approximation based on minimizing the Frobenius norm $\| I - G_L L_A \|_F $ to the inverse of the lower triangular Cholesky factor $L_A $ of A, which is not assumed to be known explicitly. These preconditionings preserve symmetry and/or positive definiteness of the original matrix and, in the case of M-, H-, or block H-matrices, lead to convergent splittings.
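
As an illustration of the row-oriented construction behind such factorized approximate inverses, the sketch below builds a lower triangular factor on a prescribed sparsity pattern using only small local solves with principal submatrices of A, so the Cholesky factor of A is never formed. It is a dense-arithmetic sketch of one standard way to do this; the function name fsai_lower and the calling convention are illustrative, not the authors' code.

```python
import numpy as np

def fsai_lower(A, pattern):
    """Factorized sparse approximate inverse factor G for SPD A (dense sketch).

    pattern[i] is a sorted list of column indices <= i (containing i) to
    which row i of G is restricted.  Row i solves the small local system
        A[J, J] y = e_i|J,   J = pattern[i],
    and is scaled so that diag(G A G^T) = I.
    """
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        J = np.array(pattern[i])
        pos = int(np.where(J == i)[0][0])
        e = np.zeros(len(J))
        e[pos] = 1.0
        y = np.linalg.solve(A[np.ix_(J, J)], e)   # small dense local solve
        G[i, J] = y / np.sqrt(y[pos])             # scaling of the row
    return G

# toy example: diagonal-plus-left-neighbour pattern on a 1-D Laplacian
n = 5
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
pattern = [[0]] + [[i - 1, i] for i in range(1, n)]
G = fsai_lower(A, pattern)
print(np.diag(G @ A @ G.T))   # ones, up to roundoff, by construction
```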

364 citations


31 Dec 1993
TL;DR: A heuristic is presented that helps to improve the quality of the bisection returned by the Kernighan-Lin and greedy graph bisection algorithms and helps to reduce the amount of fill-in produced by separator-based algorithms that reorder a matrix before factorization.
Abstract: We present a heuristic that helps to improve the quality of the bisection returned by the Kernighan-Lin and greedy graph bisection algorithms. This in turn helps to reduce the amount of fill-in produced by separator-based algorithms that reorder a matrix before factorization. We also describe the performance of our heuristic on graphs from the Harwell-Boeing collection of sparse matrix test problems, and compare it with known results obtained by other methods on the same graphs.

284 citations


Journal ArticleDOI
TL;DR: Two sparse Cholesky factorization algorithms are examined in a systematic and consistent fashion, both to illustrate the strengths of the blocking techniques in general and to obtain a fair evaluation of the two approaches.
Abstract: As with many other linear algebra algorithms, devising a portable implementation of sparse Cholesky factorization that performs well on the broad range of computer architectures currently available is a formidable challenge. Even after limiting the attention to machines with only one processor, as has been done in this paper, there are still several interesting issues to consider. For dense matrices, it is well known that block factorization algorithms are the best means of achieving this goal. This approach is taken for sparse factorization as well. This paper has two primary goals. First, two sparse Cholesky factorization algorithms, the multifrontal method and a blocked left-looking sparse Cholesky method, are examined in a systematic and consistent fashion, both to illustrate the strengths of the blocking techniques in general and to obtain a fair evaluation of the two approaches. Second, the impact of various implementation techniques on time and storage efficiency is assessed, paying particularly clo...

195 citations


BookDOI
01 Jan 1993
TL;DR: The articles in this volume are based on recent research on sparse matrix computations and examine graph theory as it connects to linear algebra, parallel computing, data structures, geometry and both numerical and discrete algorithms.
Abstract: When reality is modelled by computation, matrices are often the connection between the continuous physical world and the finite algorithmic one. Usually, the more detailed the model, the bigger the matrix; however, efficiency demands that every possible advantage be exploited. The articles in this volume are based on recent research on sparse matrix computations. They examine graph theory as it connects to linear algebra, parallel computing, data structures, geometry and both numerical and discrete algorithms. The articles are grouped into three general categories: graph models of symmetric matrices and factorizations; graph models of algorithms on nonsymmetric matrices; and parallel sparse matrix algorithms.

155 citations


Journal ArticleDOI
TL;DR: Three different preconditioners, namely the incomplete LU factorization (ILU), block diagonal factorization, and symmetric successive over-relaxation (SSOR), are investigated; each has been optimized to have good vectorization properties.

152 citations


Proceedings ArticleDOI
01 Dec 1993
TL;DR: Numerical results show that the new reordering algorithm can in some cases reduce the envelope of a sparse matrix by more than a factor of two over the current standard algorithms such as Gibbs-Poole-Stockmeyer or SPARSPAK's reverse Cuthill-McKee.
Abstract: An algorithm for reducing the envelope of a sparse matrix is presented. This algorithm is based on the computation of eigenvectors of the Laplacian matrix associated with the graph of the sparse matrix. A reordering of the sparse matrix is determined based on the numerical values of the entries of an eigenvector of the Laplacian matrix. Numerical results show that the new reordering algorithm can in some cases reduce the envelope by more than a factor of two over the current standard algorithms such as Gibbs-Poole-Stockmeyer or SPARSPAK's reverse Cuthill-McKee.
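
The reordering step can be written down compactly. The sketch below uses a dense eigensolver purely for clarity, whereas envelope-reduction codes apply a Lanczos-type method to the sparse Laplacian; the helper name spectral_ordering is illustrative only, and a connected nonzero pattern is assumed.

```python
import numpy as np

def spectral_ordering(A):
    """Order the rows/columns of a symmetric matrix by its Fiedler vector.

    Treat the nonzero pattern of A as the adjacency of an undirected
    graph, form the graph Laplacian, compute the eigenvector belonging
    to the second-smallest eigenvalue (the Fiedler vector), and sort the
    vertices by the numerical values of its entries.
    """
    adj = (A != 0).astype(float)
    np.fill_diagonal(adj, 0.0)
    lap = np.diag(adj.sum(axis=1)) - adj      # graph Laplacian
    _, vecs = np.linalg.eigh(lap)             # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return np.argsort(fiedler)

# usage: symmetrically permute A with the computed ordering
# p = spectral_ordering(A); A_reordered = A[np.ix_(p, p)]
```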

140 citations


Journal ArticleDOI
F.X. Canning
TL;DR: The impedance matrix localization (IML) method was introduced as a modification to standard moment method calculations to ease the limitations of excessive storage and execution times for even modestly large electromagnetics problems.
Abstract: Moment method calculations have the well-known limitations of requiring excessive storage and execution times for even modestly large electromagnetics problems. The impedance matrix localization (IML) method was introduced as a modification to standard moment method calculations to ease these limitations. It utilizes a matrix transformation which effectively changes the basis (testing) functions into ones resembling traveling waves. An improved method that uses an orthogonal transformation to generate standing-wave-like basis functions is presented here. Remarkable improvements are achieved in the numerical stability of the method and in its compatibility with iterative solvers. Furthermore, the correspondence of the large elements in this matrix to geometrical theory of diffraction (GTD) terms is strengthened, as is the possibility of further increasing the speed of iterative solutions by constructing preconditioners based on the pattern of nonzero matrix elements.

129 citations


01 Aug 1993
TL;DR: A new technique for sparse matrix multiplication on vector multiprocessors is presented, based on the efficient implementation of a segmented sum operation; it is better suited than the Ellpack/Itpack or Jagged Diagonal algorithms for matrices which have a varying number of non-zero elements in each row.
Abstract: In this paper we present a new technique for sparse matrix multiplication on vector multiprocessors based on the efficient implementation of a segmented sum operation. We describe how the segmented sum can be implemented on vector multiprocessors such that it both fully vectorizes within each processor and parallelizes across processors. Because of our method's insensitivity to relative row size, it is better suited than the Ellpack/Itpack or the Jagged Diagonal algorithms for matrices which have a varying number of non-zero elements in each row. Furthermore, our approach requires less preprocessing (no more time than a single sparse matrix-vector multiplication), less auxiliary storage, and uses a more convenient data representation (an augmented form of the standard compressed sparse row format). We have implemented our algorithm (SEGMV) on the Cray Y-MP C90, and have compared its performance with other methods on a variety of sparse matrices from the Harwell-Boeing collection and industrial application codes. Our performance on the test matrices is up to 3 times faster than the Jagged Diagonal algorithm and up to 5 times faster than the Ellpack/Itpack method. Our preprocessing time is an order of magnitude faster than for the Jagged Diagonal algorithm. Also, using an assembly language implementation of SEGMV on a 16 processor C90, the NAS Conjugate Gradient benchmark runs at 3.5 gigaflops.
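
The core idea, a segmented sum over the standard compressed sparse row arrays, fits in a few lines. The numpy sketch below only illustrates the principle (np.add.reduceat standing in for the vectorized segmented sum); it is not the authors' Cray implementation, and the augmented data representation is omitted.

```python
import numpy as np

def segmented_spmv(values, col_idx, row_ptr, x):
    """y = A @ x for A stored in compressed sparse row form, via a segmented sum.

    All element-wise products are formed in one vectorized step and then
    reduced per row with a segmented sum, so the row lengths never enter
    an inner loop; this is what makes the approach insensitive to rows
    with widely varying numbers of nonzeros.
    """
    products = values * x[col_idx]               # fully vectorized
    # np.add.reduceat performs the segmented sum over the row segments;
    # it assumes every row stores at least one entry (empty rows would
    # need the usual reduceat fix-up).
    return np.add.reduceat(products, row_ptr[:-1])

# tiny CSR example for  [[4, 0, 1],
#                        [0, 2, 0],
#                        [3, 0, 5]]
values  = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 1.0, 2.0])
print(segmented_spmv(values, col_idx, row_ptr, x))   # [ 6.  2. 13.]
```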

103 citations


Journal ArticleDOI
TL;DR: The elimination tree is generalized to a structure appropriate for the sparse $LU$ factorization of unsymmetric matrices and a pair of directed acyclic graphs, called elimination dags, are defined and used to characterize the zero-nonzero structures of the lower and upper triangular factors.
Abstract: The elimination tree is central to the study of Cholesky factorization of sparse symmetric positive definite matrices. In this paper, the elimination tree is generalized to a structure appropriate for the sparse $LU$ factorization of unsymmetric matrices. A pair of directed acyclic graphs, called elimination dags, is defined and they are used to characterize the zero-nonzero structures of the lower and upper triangular factors. These elimination structures are applied in a new algorithm to compute fill for sparse $LU$ factorization. Experimental results indicate that the new algorithm is usually faster than earlier methods.

Proceedings ArticleDOI
01 Aug 1993
TL;DR: The method presented in this paper postpones data structure selection until the compile phase, thereby allowing the compiler to combine code optimization with explicit data structure selection, which also greatly reduces the complexity of the programmer's task.
Abstract: The problem of compiler optimization of sparse codes is well known and no satisfactory solutions have been found yet. One of the major obstacles is the fact that sparse programs deal explicitly with the particular data structures selected for storing sparse matrices. This explicit data structure handling obscures the functionality of a code to such a degree that the optimization of the code is prohibited, e.g. by the introduction of indirect addressing. The method presented in this paper postpones data structure selection until the compile phase, thereby allowing the compiler to combine code optimization with explicit data structure selection. Not only does this method enable the compiler to generate efficient code for sparse computations, it also greatly reduces the complexity of the programmer's task.

Journal ArticleDOI
TL;DR: A task-to-processor mapping algorithm is described for computing the parallel multifrontal Cholesky factorization of irregular sparse problems on distributed-memory multiprocessors that is nearly as efficient on a collection of problems with irregular sparsity structure as it is for the regular grid problems.
Abstract: A task-to-processor mapping algorithm is described for computing the parallel multifrontal Cholesky factorization of irregular sparse problems on distributed-memory multiprocessors. The performance of the mapping algorithm is compared with the only general mapping algorithm previously reported. Using this mapping, the distributed multifrontal algorithm is nearly as efficient on a collection of problems with irregular sparsity structure as it is for the regular grid problems.

Journal ArticleDOI
TL;DR: A simple characterization of fundamental supernodes is given in terms of the row subtrees of sparse Cholesky factors in the elimination tree, and an efficient algorithm is presented that determines the set of such supernodes in time proportional to the number of nonzeros and equations in the original matrix.
Abstract: A simple characterization of fundamental supernodes is given in terms of the row subtrees of sparse Cholesky factors in the elimination tree. Using this characterization, an efficient algorithm is presented that determines the set of such supernodes in time proportional to the number of nonzeros and equations in the original matrix. Experimental results verify the practical efficiency of this algorithm.
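
Once the elimination tree and the column counts of the factor are known, the partition into fundamental supernodes is a direct transcription of the usual rule. The unoptimized sketch below assumes those two arrays are given (argument names parent and colcount are hypothetical); the paper's contribution is computing the partition in time proportional to the nonzeros of the original matrix.

```python
def fundamental_supernodes(parent, colcount):
    """Partition columns 0..n-1 of a Cholesky factor into fundamental supernodes.

    parent[j]   : parent of node j in the elimination tree (-1 for a root)
    colcount[j] : number of nonzeros in column j of L, diagonal included

    Column j+1 continues the supernode of column j iff j+1 is the parent
    of j, j is the only child of j+1, and the column structures are
    nested, i.e. colcount[j] == colcount[j+1] + 1.
    """
    n = len(parent)
    if n == 0:
        return []
    nchild = [0] * n
    for j in range(n):
        if parent[j] != -1:
            nchild[parent[j]] += 1
    supernodes, current = [], [0]
    for j in range(n - 1):
        same = (parent[j] == j + 1 and nchild[j + 1] == 1
                and colcount[j] == colcount[j + 1] + 1)
        if same:
            current.append(j + 1)
        else:
            supernodes.append(current)
            current = [j + 1]
    supernodes.append(current)
    return supernodes

# example: a chain elimination tree with nested column structures
print(fundamental_supernodes([1, 2, 3, -1], [4, 3, 2, 1]))   # [[0, 1, 2, 3]]
```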

01 Jan 1993
TL;DR: In this paper, a unified and elementary introduction to the standard characterizations of chordal graphs and clique trees is presented, with detailed proofs of all results.
Abstract: Clique trees and chordal graphs have carved out a niche for themselves in recent work on sparse matrix algorithms, due primarily to research questions associated with advanced computer architectures. This paper is a unified and elementary introduction to the standard characterizations of chordal graphs and clique trees. The pace is leisurely, as detailed proofs of all results are included. We also briefly discuss applications of chordal graphs and clique trees in sparse matrix computations.

Journal ArticleDOI
TL;DR: It is proved that with high probability the algorithms produce well-balanced storage for sufficiently large matrices with bounded number of nonzeros in each row and column, but no other restrictions on structure.
Abstract: This paper investigates the balancing of distributed compressed storage of large sparse matrices on a massively parallel computer. For fast computation of matrix–vector and matrix–matrix products on a rectangular processor array with efficient communications along its rows and columns it is required that the nonzero elements of each matrix row or column be distributed among the processors located within the same array row or column, respectively. Randomized packing algorithms are constructed with such properties, and it is proved that with high probability the algorithms produce well-balanced storage for sufficiently large matrices with bounded number of nonzeros in each row and column, but no other restrictions on structure. Then basic matrix–vector multiplication routines are described with fully parallel interprocessor communications and intraprocessor gather and scatter operations. Their efficiency is demonstrated on the 16,384-processor MasPar computer.

Journal ArticleDOI
TL;DR: A method for computing a few eigenpairs of sparse symmetric matrices is presented and analyzed that combines the power of preconditioning techniques with the efficiency of the Lanczos algorithm.
Abstract: A method for computing a few eigenpairs of sparse symmetric matrices is presented and analyzed that combines the power of preconditioning techniques with the efficiency of the Lanczos algorithm. The method is related to Davidson’s method and its generalizations, but can be less expensive for matrices that are fairly sparse. A double iteration is used. An effective termination criterion is given for the inner iteration. Quadratic convergence with respect to the outer loop is shown.

Journal ArticleDOI
TL;DR: A straightforward parallelization of the left-looking supernodal sparse Cholesky factorization algorithm for shared-memory MIMD multiprocessors is presented, which improves performance by reducing indirect addressing and memory traffic.
Abstract: This paper presents a parallel sparse Cholesky factorization algorithm for shared-memory MIMD multiprocessors. The algorithm is particularly well suited for vector supercomputers with multiple processors, such as the Cray Y-MP. The new algorithm is a straightforward parallelization of the left-looking supernodal sparse Cholesky factorization algorithm. Like its sequential predecessor, it improves performance by reducing indirect addressing and memory traffic. Experimental results on a Cray Y-MP demonstrate the effectiveness of the new algorithm. On eight processors of a Cray Y-MP, the new routine performs the factorization at rates exceeding one Gflop for several test problems from the Harwell–Boeing sparse matrix collection.

Journal ArticleDOI
TL;DR: It is shown that computing the sparse matrix inverse, used for any round of an expectation-maximization algorithm, is only about three times as expensive as computation of the determinant, used for each step of a derivative-free algorithm.

Proceedings ArticleDOI
16 Aug 1993
TL;DR: This paper analyzes the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predicts the conditions under which each formulation is better than the others.
Abstract: A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For an arbitrarily large number of processors, any of these algorithms or their variants can provide near-linear speedup for sufficiently large matrix sizes, and none of the algorithms can be clearly claimed to be superior to the others. In this paper we analyze the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predict the conditions under which each formulation is better than the others.

Journal ArticleDOI
TL;DR: New implementation techniques for a modified Forrest-Tomlin LU update which reduce the time complexity of the update and the solution of the associated sparse linear systems of simplex-based linear programming software are presented.
Abstract: This paper discusses sparse matrix kernels of simplex-based linear programming software. State-of-the-art implementations of the simplex method maintain an LU factorization of the basis matrix which is updated at each iteration. The LU factorization is used to solve two sparse sets of linear equations at each iteration. We present new implementation techniques for a modified Forrest-Tomlin LU update which reduce the time complexity of the update and the solution of the associated sparse linear systems. We present numerical results on Netlib and other real-life LP models.

Journal ArticleDOI
TL;DR: The parallel solution of a sparse triangular system Lx = b, often a performance bottleneck in parallel computation, is considered; the inverse of L is represented as a product of a few sparse factors.
Abstract: This paper considers the parallel solution of a sparse system $Lx = b$ with triangular matrix L, which is often a performance bottleneck in parallel computation. When many systems with the same matrix are to be solved, the parallel efficiency can be improved by representing the inverse of L as a product of a few sparse factors. The factorization with the smallest number of factors is constructed, subject to the requirement that no new nonzero elements are created. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications. Experimental results on the Connection Machine show the method to be highly valuable.
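
The factored-inverse idea can already be seen in the elementary case where every column of L is its own factor: L = L_1 L_2 ... L_n, with L_i equal to the identity except for its i-th column, so x = L_n^{-1} ... L_1^{-1} b. The dense sketch below (hypothetical function name) applies those factor inverses in sequence; the method described in the abstract goes further and groups consecutive factors into as few sparse matrices as possible without creating new nonzeros, so that the solve becomes a short sequence of independent sparse matrix-vector products.

```python
import numpy as np

def apply_factored_inverse(L, b):
    """Solve L x = b by applying the inverses of elementary column factors.

    Writing L = L_1 L_2 ... L_n, where L_i agrees with the identity except
    that its i-th column holds column i of L, each L_i^{-1} is applied as
    a cheap sparse update: scale component i, then an axpy on the entries
    below it.  Dense storage is used here only for brevity.
    """
    n = len(b)
    x = np.asarray(b, dtype=float).copy()
    for i in range(n):
        x[i] /= L[i, i]                   # inverse of the diagonal entry
        x[i + 1:] -= L[i + 1:, i] * x[i]  # eliminate the sub-diagonal entries
    return x

# tiny check against a direct triangular solve
L = np.array([[2.0, 0, 0], [1, 3, 0], [0, 4, 5]])
b = np.array([2.0, 5.0, 14.0])
print(apply_factored_inverse(L, b))       # equals np.linalg.solve(L, b)
```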

Book ChapterDOI
01 Jan 1993
TL;DR: This work studies structure prediction for computations that involve nonsymmetric row and column permutations and nonsymmetric or non-square matrices, using bipartite graphs, matchings, and alternating paths as tools.
Abstract: Many computations on sparse matrices have a phase that predicts the nonzero structure of the output, followed by a phase that actually performs the numerical computation. We study structure prediction for computations that involve nonsymmetric row and column permutations and nonsymmetric or non-square matrices. Our tools are bipartite graphs, matchings, and alternating paths.

Journal ArticleDOI
TL;DR: In this article, relaxation algorithms are used to obtain approximate solutions to the large sparse systems of equations arising in three-dimensional (3-D) magnetotelluric modeling, without resorting to standard matrix inversion routines or sparse matrix solvers.
Abstract: In recent years, there has been a tremendous amount of progress made in three-dimensional (3-D) magnetotelluric modeling algorithms. Much of this work has been devoted to the integral equation technique (e.g., Hohmann, 1975; Weidelt, 1975; Wannamaker et al., 1984; Wannamaker, 1991). This method has contributed significantly to our understanding of electromagnetic field behavior in 3-D models. However, some of the very earliest work in 3-D modeling concentrated on differential methods (e.g., Jones and Pascoe, 1972; Reddy et al., 1977). It is generally recognized that differential methods are better suited than integral equation methods to model arbitrarily complex geometries, and consequently this area has recently been receiving a great deal of attention (e.g., Madden and Mackie, 1989; Xinghua et al., 1991; Mackie et al., 1993; Smith, 1992, personal communication). Differential methods lead to large sparse systems of equations to be solved for the unknown field values. It is possible to use relaxation algorithms to quickly obtain approximate solutions to these systems of equations without resorting to standard matrix inversion routines or sparse matrix solvers.
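
As a concrete example of the kind of relaxation referred to here, the sketch below performs plain point Gauss-Seidel sweeps; dense arithmetic is used only for brevity, and the modeling codes in question of course sweep over the sparse finite-difference stencils directly.

```python
import numpy as np

def gauss_seidel(A, b, sweeps=50):
    """Point Gauss-Seidel relaxation for A x = b (dense sketch).

    Each sweep updates the unknowns in place with the latest values:
        x_i <- (b_i - sum_{j != i} a_ij x_j) / a_ii.
    A few sweeps yield an approximate solution without forming an
    explicit inverse or factoring the matrix.
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in range(n):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# toy diagonally dominant test: the exact solution is [1, 1, 1]
A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
b = np.array([3.0, 2.0, 3.0])
print(gauss_seidel(A, b))
```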


Journal ArticleDOI
TL;DR: In this article, the exact solution of scattering by a two-dimensional random rough surface (three-dimensional scattering problem) with moderate RMS height and slopes is calculated using a new numerical method called the sparse-matrix flat-surface iterative approach.
Abstract: The exact solution of scattering by a two-dimensional random rough surface (three-dimensional scattering problem) with moderate RMS height and slopes is calculated using a new numerical method called the sparse-matrix flat-surface iterative approach. Comparison is also made with the second-order Kirchhoff approximation.

01 Jan 1993
TL;DR: This thesis provides the first thorough analysis of the interaction between sequential sparse Cholesky factorization methods and memory hierarchies, and shows that panel methods are inappropriate for large-scale parallel machines because they do not expose enough concurrency.
Abstract: Cholesky factorization of large sparse matrices is an extremely important computation, arising in a wide range of domains including linear programming, finite element analysis, and circuit simulation. This thesis investigates crucial issues for obtaining high performance for this computation on sequential and parallel machines with hierarchical memory systems. The thesis begins by providing the first thorough analysis of the interaction between sequential sparse Cholesky factorization methods and memory hierarchies. We look at popular existing methods and find that they produce relatively poor memory hierarchy performance. The methods are extended, using blocking techniques, to reuse data in the fast levels of the memory hierarchy. This increased reuse is shown to provide a three-fold speedup over popular existing approaches (e.g., SPARSPAK) on modern workstations. The thesis then considers the use of blocking techniques in parallel sparse factorization. We first describe parallel methods we have developed that are natural extensions of the sequential approach described above. These methods distribute panels (sets of contiguous columns with nearly identical non-zero structures) among the processors. The thesis shows that for small parallel machines, the resulting methods again produce substantial performance improvements over existing methods. A framework is provided for understanding the performance of these methods, and also for understanding the limitations inherent in them. Using this framework, the thesis shows that panel methods are inappropriate for large-scale parallel machines because they do not expose enough concurrency. The thesis then considers rectangular block methods, where the sparse matrix is split both vertically and horizontally. These methods address the concurrency problems of panel methods, but they also introduce a number of complications. Primary among these are issues of choosing blocks that can be manipulated efficiently and structuring a parallel computation in terms of these blocks. The thesis describes solutions to these problems and presents performance results from an efficient block method implementation. The contributions of this work come both from its theoretical foundation for understanding the factors that limit the scalability of panel- and block-oriented methods on hierarchical memory multiprocessors, and from its investigation of practical issues related to the implementation of efficient parallel factorization methods.

Journal ArticleDOI
TL;DR: The findings show that convergence rate degrades significantly with increasing matrix rank and decreasing electrical loss for mesh spacings which adequately resolve the physical wavelengths of the electromagnetic wave propagation, but with proper choice of algorithm and preconditioning, reliable convergence has been achieved.
Abstract: The formulations used center on Helmholtz weak forms which have been shown to be numerically robust and to afford additional sparsity in the resulting system of algebraic equations. Practical solution of these equations depends critically on the realization of an effective sparse matrix solver. Experience with several conjugate gradient-type methods is reported. The findings show that convergence rate (and even convergence in some cases) degrades significantly with increasing matrix rank and decreasing electrical loss for mesh spacings which adequately resolve the physical wavelengths of the electromagnetic wave propagation. However, with proper choice of algorithm and preconditioning, reliable convergence has been achieved for matrix ranks exceeding $2 \times 10^5$ on domains having sizeable volumes of electrically lossless regions. An automatic grid generation scheme for constructing meshes which consist of variable element sizes that conform to a predefined set of boundaries is discussed.
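
For reference, a conjugate gradient-type solver with preconditioning fits in a few lines. The sketch below uses Jacobi (diagonal) scaling purely as a stand-in for the ILU, block diagonal, or SSOR preconditioners the abstract refers to; the function name and arguments are illustrative.

```python
import numpy as np

def pcg(A, b, inv_diag, tol=1e-8, maxit=500):
    """Jacobi-preconditioned conjugate gradients for a symmetric positive
    definite system A x = b (dense sketch).  inv_diag holds 1 / diag(A);
    a stronger preconditioner would replace the element-wise scaling."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = inv_diag * r
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# usage: x = pcg(A, b, 1.0 / np.diag(A))
```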

Journal ArticleDOI
TL;DR: In this paper, the modes of an arbitrarily shaped hollow metallic waveguide are found, using a surface-integral-equation/method-of-moments (MOM) formulation.
Abstract: The use of wavelet-like basis functions for solving electromagnetics problems is demonstrated. In particular, the modes of an arbitrarily shaped hollow metallic waveguide are found, using a surface-integral-equation/method-of-moments (MOM) formulation. A class of wavelet-like basis functions is used to produce a sparse MOM impedance matrix, allowing the use of sparse matrix methods for fast solution of the problem. The same method applies directly to the external scattering problem. For the examples considered, the wavelet-domain impedance matrix has about 20% nonzero elements, and the time required to compute its LU factorization is reduced by approximately a factor of 10 compared to that for the original full matrix.

Proceedings ArticleDOI
Edward Rothberg, Anoop Gupta
01 Dec 1993
TL;DR: The authors propose and evaluate an approach that is simple to implement, provides slightly higher performance than column (and panel) methods on small parallel machines, and has the potential to provide much higher performance on large parallel machines.
Abstract: The authors explore the use of a sub-block decomposition strategy for parallel sparse Cholesky factorization, in which the sparse matrix is decomposed into rectangular blocks. Such a strategy has enormous theoretical scalability advantages over more traditional column-oriented and panel-oriented decompositions. However, little progress has been made in producing a practical sub-block method. The authors propose and evaluate an approach that is simple to implement, provides slightly higher performance than column (and panel) methods on small parallel machines, and has the potential to provide much higher performance on large parallel machines.