
Showing papers on "Sparse approximation" published in 1997


Journal ArticleDOI
TL;DR: A parallel preconditioner is presented for the solution of general sparse linear systems of equations, using a sparse approximate inverse computed explicitly and then applied as a preconditioner to an iterative method.
Abstract: A parallel preconditioner is presented for the solution of general sparse linear systems of equations. A sparse approximate inverse is computed explicitly and then applied as a preconditioner to an iterative method. The computation of the preconditioner is inherently parallel, and its application only requires a matrix-vector product. The sparsity pattern of the approximate inverse is not imposed a priori but captured automatically. This keeps the amount of work and the number of nonzero entries in the preconditioner to a minimum. Rigorous bounds on the clustering of the eigenvalues and the singular values are derived for the preconditioned system, and the proximity of the approximate to the true inverse is estimated. An extensive set of test problems from scientific and industrial applications provides convincing evidence of the effectiveness of this approach.

635 citations
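
The Frobenius-norm approach behind such preconditioners can be sketched in a few lines: each column of the approximate inverse M comes from a small least-squares problem that minimizes ||A m_k - e_k|| over a restricted set of entries. The sketch below (Python with NumPy/SciPy) uses a fixed sparsity pattern for simplicity, whereas the paper captures the pattern adaptively; the function and variable names are illustrative, not taken from the paper.

```python
# Hedged sketch: Frobenius-norm sparse approximate inverse on a *fixed*
# sparsity pattern (the paper grows the pattern adaptively instead).
import numpy as np
import scipy.sparse as sp

def approximate_inverse(A, pattern):
    """Minimize ||A m_k - e_k||_2 for each column k, restricting the
    unknowns m_k to the index set pattern[k]; assemble M from the columns."""
    n = A.shape[0]
    A = sp.csc_matrix(A)
    columns = []
    for k in range(n):
        J = np.asarray(pattern[k])            # allowed nonzero rows of column k of M
        I = np.unique(A[:, J].nonzero()[0])   # rows of A touched by those columns
        rhs = (I == k).astype(float)          # e_k restricted to the rows in I
        sub = A[I, :][:, J].toarray()         # small dense least-squares problem
        m, *_ = np.linalg.lstsq(sub, rhs, rcond=None)
        col = np.zeros(n)
        col[J] = m
        columns.append(col)
    return sp.csc_matrix(np.column_stack(columns))

# Example: use the sparsity pattern of A itself as the prescribed pattern.
A = sp.random(200, 200, density=0.02, format="csc", random_state=0) + 10 * sp.eye(200)
pattern = [A[:, k].nonzero()[0] for k in range(200)]
M = approximate_inverse(A, pattern)
print("residual ||I - AM||_F =", np.linalg.norm(np.eye(200) - (A @ M).toarray()))
```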


Journal ArticleDOI
TL;DR: Given local spatial error dependence, one can construct sparse spatial weight matrices and compute a simultaneous autoregression using 20,640 observations in under 19 minutes, despite needing to compute a 20,640 by 20,640 determinant 10 times.

415 citations


Journal ArticleDOI
TL;DR: The first algorithms to factor a wide class of sparse matrices that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures are presented.
Abstract: In this paper, we describe scalable parallel algorithms for symmetric sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1,024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithms to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that are asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithms incur less communication overhead and are more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of one of our sparse Cholesky factorization algorithms delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, this is the highest performance ever obtained for sparse Cholesky factorization on any supercomputer.

239 citations
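
A small, hedged illustration of why ordering matters in sparse Cholesky factorization (this is not the paper's parallel algorithm): the same SPD matrix is factored under different row/column orderings and the nonzeros of the factor, including fill-in, are counted. The grid size, drop tolerance, and the use of reverse Cuthill-McKee are arbitrary choices made for the demonstration.

```python
# Hedged illustration: fill-in of a Cholesky factor under three orderings.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def cholesky_nnz(A_dense, tol=1e-12):
    """Count the nonzeros of the dense Cholesky factor (fill-in included)."""
    L = np.linalg.cholesky(A_dense)
    return int(np.count_nonzero(np.abs(L) > tol))

# SPD test matrix: 2-D Laplacian on a 15 x 15 grid, shifted to be safely SPD.
n = 15
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (sp.kronsum(T, T) + sp.eye(n * n)).tocsr()

rng = np.random.default_rng(0)
p_rand = rng.permutation(n * n)                        # a deliberately bad ordering
p_rcm = reverse_cuthill_mckee(A, symmetric_mode=True)  # bandwidth-reducing ordering

print("nnz(L), natural ordering:", cholesky_nnz(A.toarray()))
print("nnz(L), random ordering: ", cholesky_nnz(A[p_rand, :][:, p_rand].toarray()))
print("nnz(L), RCM ordering:    ", cholesky_nnz(A[p_rcm, :][:, p_rcm].toarray()))
```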


Proceedings ArticleDOI
18 Dec 1997
TL;DR: The data locality characteristics of the compressed sparse row representation are examined, improvements in locality through matrix permutation are considered, and modified sparse matrix representations are evaluated.
Abstract: We analyze single node performance of sparse matrix vector multiplication by investigating issues of data locality and fine grained parallelism. We examine the data locality characteristics of the compressed sparse row representation and consider improvements in locality through matrix permutation. Motivated by potential improvements in fine grained parallelism, we evaluate modified sparse matrix representations. The results lead to general conclusions about improving single node performance of sparse matrix vector multiplication in parallel libraries of sparse iterative solvers.

168 citations
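
The kernel under study is the compressed sparse row (CSR) matrix-vector product; a minimal version makes the access pattern explicit: sequential passes over the row pointers and values, but indirect (and potentially cache-unfriendly) accesses to the input vector through the column indices. This sketch is illustrative only, not the paper's benchmark code.

```python
# Minimal CSR sparse matrix-vector product: row pointers and values are
# walked sequentially, while accesses to x go through the column indices.
import numpy as np

def csr_matvec(indptr, indices, data, x):
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):                      # one pass over the rows
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]         # indirect, possibly irregular access
        y[i] = s
    return y

# Tiny example: the 3x3 matrix [[4,0,1],[0,3,0],[2,0,5]].
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
data    = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
print(csr_matvec(indptr, indices, data, np.array([1.0, 2.0, 3.0])))  # [7. 6. 17.]
```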


Journal ArticleDOI
TL;DR: In this paper, fast and adaptive algorithms are developed for numerically solving nonlinear partial differential equations of the form u_t = Lu + N f(u), where L and N are linear differential operators and f(u) is a nonlinear function.

151 citations


Patent
31 Oct 1997
TL;DR: In this paper, a composite data construct is created to accommodate sparse dimensions: when the database is defined, sparse dimensions are identified and composites of those sparse dimensions are created. The composites are stored in a linearized array, whose allocated storage space is chosen by the database administrator to satisfy reasonable expected growth of the sparse data.
Abstract: In a multi-dimensional database, a composite data construct is created to accommodate sparse dimensions. When the database is defined, sparse dimensions are identified and composites of those sparse dimensions are created. The composites are stored in a linearized array. The amount of storage space allocated is chosen by the database administrator to satisfy reasonable expected growth of the sparse data.

104 citations
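
A hedged sketch of the composite idea described above: dense dimensions are linearized into a flat block of cells, and only those sparse-dimension member combinations that actually hold data are given a block. The class name, dimension choices, and sizing below are illustrative and not taken from the patent.

```python
# Hedged sketch: dense dimensions linearized into flat blocks; sparse
# dimension combinations (composites) stored only when data exists.
import numpy as np

class SparseCube:
    def __init__(self, dense_shape):
        self.dense_shape = dense_shape                 # e.g. (months, measures)
        self.block_size = int(np.prod(dense_shape))
        self.blocks = {}                               # composite key -> linearized array

    def _offset(self, dense_coords):
        return int(np.ravel_multi_index(dense_coords, self.dense_shape))

    def set(self, sparse_coords, dense_coords, value):
        block = self.blocks.setdefault(tuple(sparse_coords),
                                       np.zeros(self.block_size))
        block[self._offset(dense_coords)] = value

    def get(self, sparse_coords, dense_coords):
        block = self.blocks.get(tuple(sparse_coords))
        return 0.0 if block is None else block[self._offset(dense_coords)]

# Dense dims: 12 months x 4 measures; sparse dims: (product, market).
cube = SparseCube((12, 4))
cube.set(("widget", "east"), (0, 2), 125.0)
print(cube.get(("widget", "east"), (0, 2)), len(cube.blocks))  # 125.0 1
```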


Book ChapterDOI
26 Aug 1997
TL;DR: A relational-algebra-based framework is presented for compiling efficient sparse matrix code from dense DO-ANY loops and a specification of the representation of the sparse matrix.
Abstract: We present a relational algebra based framework for compiling efficient sparse matrix code from dense DO-ANY loops and a specification of the representation of the sparse matrix. We present experimental data that demonstrates that the code generated by our compiler achieves performance competitive with that of hand-written codes for important computational kernels.

75 citations


Proceedings ArticleDOI
05 Jan 1997
TL;DR: Improved upper bounds are shown on the discrepancy of two well-studied families of sparse matrices: ℓ permutations of [n], and rectangles containing n points in R^k; a discrepancy bound of O(√ℓ log n) is shown for the former, improving on the previous best O(ℓ log n) due to Bohus.
Abstract: A powerful technique to approximate certain sparse integer programs, due to Beck & Fiala, shows that matrices A with entries in {-1, 0, 1} and with no column having more than t nonzeroes have discrepancy disc(A) less than 2t. An outstanding conjecture of Beck & Fiala is that disc(A) here is O(√t). This, if true, would be best possible; any bound of o(t) would be very interesting. We make progress on this by showing that certain related discrepancy measures of A that are lower bounds on disc(A) are O(t^{3/4} log t), i.e., o(t). We also show that disc(A) = O(√t log n), improving the Beck-Spencer bound of O(√t log t log n). These results also apply to the lattice approximation problem of Raghavan. We show improved upper bounds on the discrepancy of two well-studied families of sparse matrices: ℓ permutations of [n], and rectangles containing n points in R^k. We show a discrepancy bound of O(√ℓ log n) for the former, improving on the previous best O(ℓ log n) due to Bohus. This improves the bounds for the latter for k = 2, 3, and 4. We also present a simple connection between discrepancy and communication complexity.

67 citations


Journal ArticleDOI
TL;DR: This work shows how to use wavelet compression ideas to improve the performance of approximate inverse preconditioners by first transforming the inverse of the coefficient matrix into a wavelet basis, before applying standard approximate inverse techniques.
Abstract: There is an increasing interest in using sparse approximate inverses as preconditioners for Krylov subspace iterative methods. Recent studies of Grote and Huckle and of Chow and Saad also show that sparse approximate inverse preconditioners can be effective for a variety of matrices, e.g., the Harwell-Boeing collection. Nonetheless, a drawback is that this requires rapid decay of the inverse entries so that a sparse approximate inverse is possible. However, for the class of matrices that come from elliptic PDE problems, this assumption may not necessarily hold. Our main idea is to look for a basis, other than the standard one, such that a sparse representation of the inverse is feasible. A crucial observation is that the kind of matrices we are interested in typically have a piecewise smooth inverse. We exploit this fact by applying wavelet techniques to construct a better sparse approximate inverse in the wavelet basis. We shall justify theoretically and numerically that our approach is effective for matrices with a smooth inverse. We emphasize that in this paper we have only presented the idea of wavelet approximate inverses and demonstrated its potential, but have not yet developed a highly refined and efficient algorithm.

64 citations
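
The key observation, that a piecewise-smooth inverse compresses well in a wavelet basis, can be illustrated directly. The sketch below forms a small dense inverse only to make the effect visible (the paper constructs the approximate inverse in the wavelet basis without ever forming the true inverse), applies an orthonormal Haar transform, and counts how many entries survive a drop tolerance in each basis. The matrix, tolerance, and Haar construction are illustrative choices, not the paper's algorithm.

```python
# Hedged illustration: entries of a smooth inverse above a drop tolerance,
# counted in the standard basis and after an orthonormal Haar change of basis.
import numpy as np

def haar_matrix(n):
    """Orthonormal multi-level Haar transform matrix (n a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    H = haar_matrix(n // 2)
    top = np.kron(H, [1.0, 1.0])                    # averages, transformed recursively
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-level details
    return np.vstack([top, bottom]) / np.sqrt(2.0)

n = 64
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Laplacian: smooth inverse
A_inv = np.linalg.inv(A)
W = haar_matrix(n)
B = W @ A_inv @ W.T                                      # same operator, wavelet basis

tol = 1e-3 * np.abs(A_inv).max()
print("entries above tol, standard basis:", np.count_nonzero(np.abs(A_inv) > tol))
print("entries above tol, wavelet basis: ", np.count_nonzero(np.abs(B) > tol))
```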


Proceedings ArticleDOI
01 Jul 1997
Abstract: We analyse the probability of success of the block algorithm proposed by Coppersmith for solving large sparse systems Aw = 0 of linear equations over a field K. It is based on a modification of a scheme proposed by Wiedemann. An open question was to prove that the block algorithm may produce a solution for small finite fields, e.g., for K = GF(2). Our investigations allow us to answer this question nearly completely. We prove that the input parameters of the algorithm may be tuned such that, for any input system, a solution is computed with high probability for any field. Conversely, for particular input systems, we show that the conditions on the input parameters may be relaxed to ensure success. We also improve the previous probability measurements in the case of large cardinality fields.

61 citations


Journal ArticleDOI
TL;DR: The design, implementation, and use of subprograms for the multiplication of a full matrix by a sparse one and for the solution of sparse triangular systems with one or more (full) right-hand sides are discussed.
Abstract: This article proposes a set of Level 3 Basic Linear Algebra Subprograms and associated kernels for sparse matrices. A major goal is to design and develop a common framework to enable efficient, and portable, implementations of iterative algorithms for sparse matrices on high-performance computers. We have designed the routines to shield the developer of mathematical software from most of the complexities of the various data structures used for sparse matrices. We have kept the interface and suite of codes as simple as possible while at the same time including sufficient functionality to cover most of the requirements of iterative solvers and sufficient flexibility to cover most sparse matrix data structures. An important aspect of our framework is that it can be easily extended to incorporate new kernels if the need arises. We discuss the design, implementation, and use of subprograms for the multiplication of a full matrix by a sparse one and for the solution of sparse triangular systems with one or more (full) right-hand sides. We include a routine for checking the input data, generating a new sparse data structure from the input, and scaling a sparse matrix. The new data structure for the transformation can be specified by the user or can be chosen automatically by vendors to be efficient on their machines. We also include a routine for permuting the columns of a sparse matrix and one for permuting the rows of a full matrix.
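
As a rough illustration of the two kernel classes the article standardizes, the sketch below performs a sparse-matrix-times-full-block product and a sparse triangular solve with several full right-hand sides using scipy.sparse; this is a stand-in for, not an implementation of, the proposed subprogram interface.

```python
# Hedged sketch of the two kernel classes: (1) sparse times a full block of
# vectors, (2) sparse triangular solve with multiple full right-hand sides.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

rng = np.random.default_rng(0)
A = sp.random(500, 500, density=0.01, format="csr", random_state=0) + 5.0 * sp.eye(500)
B = rng.standard_normal((500, 8))        # "full" matrix: a block of 8 dense vectors

C = A @ B                                # sparse-times-full product (dense result)

L = sp.tril(A, format="csr")             # sparse lower-triangular matrix, nonzero diagonal
X = spsolve_triangular(L, B, lower=True) # triangular solve with multiple right-hand sides
print(C.shape, np.allclose(L @ X, B))
```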

Book ChapterDOI
28 Apr 1997
TL;DR: Nested dissection is gaining increasing popularity as an ordering method for the factorization of sparse symmetric matrices, due to its suitability for parallel solving.
Abstract: Finding good orderings is a critical issue for the efficient factorization of sparse symmetric matrices, both in terms of space usage and solution time. Several ordering techniques have been proposed, among which nested dissection is gaining increasing popularity, due to its suitability for parallel solving.
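
The nested dissection idea itself fits in a few lines: pick a separator, order the two disconnected halves recursively, and number the separator vertices last, so that eliminating one half cannot create fill in the other. The sketch below, for a square grid graph, uses only column separators for brevity (a full implementation would alternate directions); it is illustrative and not the chapter's ordering code.

```python
# Minimal nested dissection sketch on an m x m grid graph: split with a
# column separator, order the halves recursively, number the separator last.
def nested_dissection(rows, cols):
    """Return grid points (r, c) of the sub-grid rows x cols in elimination order."""
    if len(cols) <= 2 or len(rows) <= 2:
        return [(r, c) for r in rows for c in cols]        # small block: any order
    mid = cols[len(cols) // 2]
    left  = nested_dissection(rows, [c for c in cols if c < mid])
    right = nested_dissection(rows, [c for c in cols if c > mid])
    separator = [(r, mid) for r in rows]
    return left + right + separator                         # separator numbered last

order = nested_dissection(range(7), range(7))
perm = {p: k for k, p in enumerate(order)}                  # grid point -> new index
print(len(order), order[-7:])                               # 49 points; last 7 = middle column
```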

08 Oct 1997
TL;DR: Spark98 is a collection of sparse matrix kernels for shared memory and message passing systems that are simple, realistic, and portable and notices that efficient parallel programming of sparse codes requires careful partitioning of data references, regardless of the underlying memory system.
Abstract: Spark98 is a collection of sparse matrix kernels for shared memory and message passing systems. Our aim is to provide system builders with a set of example sparse matrix codes that are simple, realistic, and portable. Each kernel performs a sequence of sparse matrix vector product operations using matrices that are derived from a family of three dimensional finite element earthquake applications. We describe the computational structure of the kernels, summarize their performance on a parallel system, and discuss some of the insights that such kernels can provide. In particular, we notice that efficient parallel programming of sparse codes requires careful partitioning of data references, regardless of the underlying memory system. So on one hand, efficient shared memory programs can be just as difficult to write as efficient message passing programs. On the other hand, shared memory programs are not necessarily less efficient than message passing programs.

Book ChapterDOI
01 Jan 1997
TL;DR: This paper reviews recent developments in techniques for representing data in terms of its local scale components that allow data compression through elimination of scale-coefficients that are sufficiently small in the transformed representation.
Abstract: We review recent developments in techniques for representing data in terms of its local scale components. These techniques allow data compression through elimination of scale-coefficients that are sufficiently small in the transformed representation. This capability for data compression can be used to reduce the cost of many numerical solution algorithms either by applying it to the numerical solution operator in order to get an approximate sparse representation, or by applying it to the numerical solution itself in order to reduce the number of quantities that need to be computed.

11 Jul 1997
TL;DR: This thesis presents techniques for automatically generating sparse codes from dense matrix algorithms through a process called sparse compilation, and discusses the Bernoulli Sparse Compiler, which provides a novel mechanism that allows the user to extend its repertoire of sparse matrix storage formats.
Abstract: This thesis presents techniques for automatically generating sparse codes from dense matrix algorithms through a process called sparse compilation. We will start by recognizing that sparse computations are ubiquitous in scientific computation, that these codes are difficult to write by hand, and that they are difficult for conventional compilers to optimize. We will present the sparse compiler as an alternative to writing these codes by hand or using sparse libraries. We will show how many aspects of sparse compilation can be modeled in terms of relational database concepts. These include the following: queries to express sparse computations, relations to model sparse matrices, and the join operation to model simultaneous efficient access of sparse matrices. Using this model, the problem of sparse compilation can be seen as an instance of the query optimization problem. We will discuss two basic strategies for sparse compilation based upon this relational approach. One strategy is targeted towards algorithms that can be described using inner join queries, which include matrix-vector multiplication and matrix-matrix multiplication. This approach is the one that we have currently implemented. The other can handle a larger class of dependence-free matrix algorithms. Although it is more general, the latter approach does not generate code that is as efficient for some problems as the former approach. We will show that these two approaches are grounded in properties of the relational algebra and draw connections with previous work that has been described in the database literature. We also discuss how conventional dense optimizations and fill can be handled within the overall relational framework. We will discuss the Bernoulli Sparse Compiler and use experimental results to show that this system is able to generate sparse implementations from non-trivial dense matrix algorithms that are as efficient as hand-written codes. In addition, this compiler provides a novel mechanism that allows the user to extend its repertoire of sparse matrix storage formats. Thus, the user is not only able to choose the data structures for storing the sparse matrices, but to describe these data structures as well.
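
The thesis's central modeling idea can be made concrete with a toy example: treat a sparse matrix as a relation A(i, j, v) and a vector as a relation x(j, v); then y = Ax is a join on the shared attribute j followed by a sum grouped by i. The sketch below is only a model of that view, not code produced by the Bernoulli compiler.

```python
# Hedged sketch of the relational view: SpMV as join-on-j plus group-by-i sum.
from collections import defaultdict

A = [(0, 0, 4.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 2.0), (2, 2, 5.0)]   # relation A(i, j, v)
x = {0: 1.0, 1: 2.0, 2: 3.0}                                            # relation x(j, v)

y = defaultdict(float)
for i, j, a_ij in A:          # join A and x on the shared attribute j ...
    if j in x:
        y[i] += a_ij * x[j]   # ... then aggregate (sum) grouped by i
print(dict(y))                # {0: 7.0, 1: 6.0, 2: 17.0}
```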

Proceedings ArticleDOI
15 Nov 1997
TL;DR: Experimental data is presented that demonstrates that the code generated by the Bernoulli compiler achieves performance competitive with that of hand-written codes for important computational kernels.
Abstract: We have developed a framework based on relational algebra for compiling efficient sparse matrix code from dense DO-ANY loops and a specification of the representation of the sparse matrix. In this paper, we show how this framework can be used to generate parallel code, and present experimental data that demonstrates that the code generated by our Bernoulli compiler achieves performance competitive with that of hand-written codes for important computational kernels.

Book ChapterDOI
01 Sep 1997
TL;DR: A new invariant, the sparse condition number, is introduced, and sparse polynomial systems are analyzed in terms of this invariant to address whether a sparse system is easier to solve than a non-sparse one.
Abstract: Is a sparse polynomial system easier to solve than a non-sparse one? In this paper we introduce a new invariant, the sparse condition number, and we study sparse polynomial system analysis in terms of this invariant.

01 Jun 1997
TL;DR: This paper describes how various sparse matrix and distribution formats can be handled using a relational algebra approach to sparse matrix code compilation.
Abstract: We describe how various sparse matrix and distribution formats can be handled using the relational approach to sparse matrix code compilation. This approach allows for the development of compilation techniques that are independent of the storage formats by viewing the data structures as relations and abstracting the implementation details as access methods. Introduction: Sparse matrix computations are at the core of many computational science algorithms. A typical application can often be separated into the discretization module, which translates a continuous problem such as a system of differential equations into a sequence of sparse matrix problems, and into the solver module, which solves the matrix problems. Typically the solver is the most time and space intensive part of an application, and quite naturally much effort, both in the numerical analysis and compilers communities, has been devoted to producing efficient parallel and sequential code for sparse matrix solvers. There are two challenges in generating solver code that has to be interfaced with discretization systems. Different discretization systems produce the sparse matrices in many different formats; therefore the compiler should be able to generate solver code for different storage formats. Some discretization systems partition the problem for parallel solution and use various methods for specifying the partitioning (distribution); therefore a compiler should be able to produce parallel code for different distribution formats. In our approach the programmer writes programs as if all matrices were dense and then provides a specification of which matrices are actually sparse and what formats (distributions) are used to represent them. The job of the compiler is the following: given a sequential dense matrix program, descriptions of sparse matrix formats, and data and computation distribution formats, generate parallel sparse SPMD code. We have introduced a relational algebra approach to solving this problem. In this approach we view sparse matrices as database relations, sparse matrix formats as implementations of access methods to the relations, and execution of loop nests as evaluation of certain relational queries. The key operator in these queries turns out to be the relational join. For parallel execution we view loop nests as distributed queries, and the process of generating SPMD node programs as the translation of distributed queries into equivalent local queries and communication statements. In this paper we focus on how our compiler handles user-defined sparse data structures and distribution formats. The architecture of the compiler is illustrated in Figure

01 Jan 1997
TL;DR: An object oriented sparse matrix library in C++ designed for portability and performance across a wide class of machine architectures is described, thus addressing many of the difficulties encountered with the typical approach to sparse matrix libraries.
Abstract: We describe an object oriented sparse matrix library in C++ designed for portability and performance across a wide class of machine architectures. Besides simplifying the subroutine interface, the object oriented design allows the same driving code to be used for various sparse matrix formats, thus addressing many of the difficulties encountered with the typical approach to sparse matrix libraries. We also discuss the design of a C++ library for implementing various iterative methods for solving linear systems of equations. Performance results indicate that the C++ codes are competitive with optimized Fortran.

01 Jan 1997
TL;DR: The overall implementation is discussed with emphasis on the numerical issues encountered in the matrix computations, based on the condition number of a matrix, and the program's capabilities are illustrated by applying it to concrete problems from vision, robotics, and computational biology.
Abstract: Sparse elimination and the sparse resultant exploit the structure of polynomials by measuring their complexity in terms of Newton polytopes instead of total degree. We sketch the sparse resultant constructions of Canny and Emiris and show how they reduce the problem of root-finding to an eigenproblem. A little known method for achieving this reduction is presented which does not increase the dimension of the problem. Together with an implementation of the sparse resultant construction, this provides a general solver for polynomial systems. We discuss the overall implementation and emphasize the numerical issues encountered in the matrix computations, based on the condition number of a matrix. We illustrate the program's capabilities by applying it to concrete problems from vision, robotics and computational biology. The high efficiency and accuracy of the solutions suggest that sparse elimination may be the method of choice for systems of moderate size.

Proceedings ArticleDOI
18 Dec 1997
TL;DR: This work proposes the first known efficient scalable parallel algorithm which uses a two dimensional block cyclic distribution of T and presents the parallel runtime and scalability analyses of the proposed two dimensional algorithm, which is applicable to dense as well as sparse triangular solvers.
Abstract: Solving a system of equations of the form Tx=y, where T is a sparse triangular matrix, is required after the factorization phase in the direct methods of solving systems of linear equations. A few parallel formulations have been proposed recently. The common belief in parallelizing this problem is that the parallel formulation utilizing a two dimensional distribution of T is unscalable. We propose the first known efficient scalable parallel algorithm which uses a two dimensional block cyclic distribution of T. The algorithm is shown to be applicable to dense as well as sparse triangular solvers. Since most of the known highly scalable algorithms employed in the factorization phase yield a two dimensional distribution of T, our algorithm avoids the redistribution cost incurred by the one dimensional algorithms. We present the parallel runtime and scalability analyses of the proposed two dimensional algorithm. The dense triangular solver is shown to be scalable. The sparse triangular solver is shown to be at least as scalable as the dense solver. We also show that it is optimal for one class of sparse systems. The experimental results of the sparse triangular solver show that it has good speedup characteristics and yields high performance for a variety of sparse systems.
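
The computation being parallelized is the sparse triangular solve Tx = y. A sequential forward-substitution baseline in CSR storage, which the paper's algorithm distributes over a two-dimensional block-cyclic processor grid, might look like the following sketch (it assumes a nonzero diagonal; names are illustrative).

```python
# Sequential forward substitution for a sparse lower-triangular Tx = y in CSR.
import numpy as np
import scipy.sparse as sp

def lower_triangular_solve(T, y):
    T = sp.csr_matrix(T)
    x = np.zeros_like(y, dtype=float)
    for i in range(T.shape[0]):
        row_start, row_end = T.indptr[i], T.indptr[i + 1]
        cols = T.indices[row_start:row_end]
        vals = T.data[row_start:row_end]
        diag = vals[cols == i][0]                        # assumes a nonzero diagonal entry
        off = cols != i
        x[i] = (y[i] - vals[off] @ x[cols[off]]) / diag  # uses already-computed x_j, j < i
    return x

T = sp.csr_matrix(np.array([[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.0, 4.0, 5.0]]))
y = np.array([2.0, 5.0, 13.0])
x = lower_triangular_solve(T, y)
print(x, np.allclose(T @ x, y))   # solution followed by a residual check
```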

Journal ArticleDOI
TL;DR: First, the stacked model is decomposed into recursive submodels without destroying its original block pattern; next, it is shown how to efficiently solve the sparse linear system in the Newton algorithm.


Journal ArticleDOI
TL;DR: These results illustrate the beneficial effect of nonparametric smoothing, requiring less stringent conditions and allowing more flexible sparseness conditions than maximum penalized likelihood cell probability estimators.

Journal ArticleDOI
01 Aug 1997
TL;DR: A new scheme called parallel contracted ordering is described, which combines a new parallel nested dissection heuristic, based on parallel graph contraction, with any serial ordering method.
Abstract: Computing a fill-reducing ordering of a sparse matrix is a central problem in the solution of sparse linear systems using direct methods. In recent years, there has been significant research in developing a sparse direct solver suitable for message-passing multiprocessors. However, computing the ordering step in parallel remains a challenge and there are very few methods available. This paper describes a new scheme called parallel contracted ordering which is a combination of a new parallel nested dissection heuristic and any serial ordering method. The new nested dissection heuristic, called Shrink-Split ND (SSND), is based on parallel graph contraction. For a system with N unknowns, the complexity of SSND is O((N/P) log P) using P processors in a hypercube; the overall complexity is O((N/P) log N) when the serial ordering method chosen is graph-exploration-based nested dissection. We provide extensive empirical results on the quality of the ordering. We also report on the parallel performance of a preliminary implementation on three different message passing multiprocessors.

Journal ArticleDOI
TL;DR: This work considers rigorous definitions for dense graphs and sparse matrices, thus quantifying these concepts that have been hitherto used in a qualitative manner.
Abstract: We consider rigorous definitions for dense graphs and sparse matrices, thus quantifying these concepts that have been hitherto used in a qualitative manner. We assign to every graph the compactness...

01 Jan 1997
TL;DR: This thesis proposes several new preconditioning techniques that attempt to extend the range of iterative methods, particularly to solving nonsymmetric and indefinite problems such as those arising from incompressible computational fluid dynamics.
Abstract: This is to certify that I have examined this bound copy of a doctoral thesis by EDMOND TEN-FU CHOW and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made. Acknowledgments: I wish to thank Yousef Saad, without whom this thesis would not be possible. His guidance was perfectly balanced to let me benefit from both his exceptional perception, and a freedom to discover at my own pace. Wei-Pai Tang introduced me to numerical linear algebra and scientific computing when I was an undergraduate at the University of Waterloo. I am grateful for his trust in me at that time, and for his continued advice and interest in my work and career. I also appreciate the encouragement and support of Michael Heroux, who gave me three summers' worth of time and resources at Cray Research (SGI). The BPKIT block preconditioning toolkit was motivated by him. My teachers during the weekends and late-night hours were two of my officemates and friends, Kesheng Wu and Andreas Stathopoulos. Our endless discussions helped refine my understanding of iterative methods. Andrew Chapman and Barry Rackner deserve thanks for providing codes, advice, and support. Sandra Carney "showed me the ropes" when I first arrived in Minnesota. Shang-Hua Teng set an example for me, both as a friend and as a professional, and Tony Chan kept a watchful eye over me from UCLA. I also wish to thank my readers Daniel Boley and Graham Candler for their time and interest in my work. Above all, I wish to express my gratitude to my friends and family for their love and support. Grateful acknowledgment is made to the publishers who provided permission to excerpt. To my parents. Abstract: Preconditioned iterative methods have become standard linear solvers in many applications, but their limited robustness in some cases has hindered the ability to efficiently solve very large problems in some areas. This thesis proposes several new preconditioning techniques that attempt to extend the range of iterative methods, particularly to solving nonsymmetric and indefinite problems such as those arising from incompressible computational fluid dynamics. First, we present an iterative technique to compute sparse approximate inverse preconditioners. This new technique produces approximate inverses comparable in quality with others in the literature, but at a lower computational cost, and with a simpler method …

Book ChapterDOI
01 Jan 1997
TL;DR: An overview of parallel direct methods for solving sparse systems of linear equations, focusing on symmetric positive definite systems, with main emphasis on parallel implementation of the numerically intensive factorization process.
Abstract: We present an overview of parallel direct methods for solving sparse systems of linear equations, focusing on symmetric positive definite systems. We examine the performance implications of the important differences between dense and sparse systems. Our main emphasis is on parallel implementation of the numerically intensive factorization process, but we also briefly consider other major components of direct methods, including parallel ordering.

Journal ArticleDOI
TL;DR: It is shown that, by using the proposed method, a tangible improvement over prior work can be obtained, particularly for very sparse and skewed matrices, and that I/O overhead for this problem can be efficiently amortized through I/O latency hiding and overall load-balancing.

Proceedings Article
01 Mar 1997
TL;DR: A compile-time method to select compression and distribution schemes for sparse matrices which are computed using Fortran 90 array intrinsic operations, guided by cost functions of various sparse routines as measured from the target machine.
Abstract: We present a compile-time method to select compression and distribution schemes for sparse matrices which are computed using Fortran 90 array intrinsic operations. The selection process samples input sparse matrices to determine their sparsity structures. It is also guided by cost functions of various sparse routines as measured from the target machine. The Fortran 90 array expression is then transformed into a sparse array expression that calls the selected compression and distribution routines.
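
The selection idea, probe the sparsity structure and let measured costs of a representative kernel choose among candidate compression schemes, can be mimicked at runtime in a few lines. The sketch below is a Python stand-in for illustration only; the actual system performs this selection at compile time for Fortran 90 array intrinsics, and the function names and candidate formats here are assumptions.

```python
# Hedged runtime stand-in for the selection idea: time a representative
# kernel under each candidate compression scheme and keep the cheapest.
import time
import numpy as np
import scipy.sparse as sp

def time_matvec_once(M, x):
    t0 = time.perf_counter()
    _ = M @ x                                  # representative kernel: matrix-vector product
    return time.perf_counter() - t0

def pick_format(A_coo, x, candidates=("csr", "csc", "coo"), trials=5):
    timings = {}
    for fmt in candidates:
        M = A_coo.asformat(fmt)                # convert to the candidate compression scheme
        timings[fmt] = min(time_matvec_once(M, x) for _ in range(trials))
    return min(timings, key=timings.get), timings

A = sp.random(2000, 2000, density=0.005, format="coo", random_state=1)
x = np.ones(2000)
fmt, timings = pick_format(A, x)
print("selected scheme:", fmt, timings)
```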