
Showing papers on "Sparse matrix" published in 2001


Journal ArticleDOI
TL;DR: The main features and the tuning of the algorithms for the direct solution of sparse linear systems on distributed memory computers developed in the context of a long term European research project are analyzed and discussed.
Abstract: In this paper, we analyze the main features and discuss the tuning of the algorithms for the direct solution of sparse linear systems on distributed memory computers developed in the context of a long term European research project. The algorithms use a multifrontal approach and are especially designed to cover a large class of problems. The problems can be symmetric positive definite, general symmetric, or unsymmetric matrices, both possibly rank deficient, and they can be provided by the user in several formats. The algorithms achieve high performance by exploiting parallelism coming from the sparsity in the problem and that available for dense matrices. The algorithms use a dynamic distributed task scheduling technique to accommodate numerical pivoting and to allow the migration of computational tasks to lightly loaded processors. Large computational tasks are divided into subtasks to enhance parallelism. Asynchronous communication is used throughout the solution process to efficiently overlap communication with computation. We illustrate our design choices by experimental results obtained on an SGI Origin 2000 and an IBM SP2 for test matrices provided by industrial partners in the PARASOL project.

2,066 citations


Journal ArticleDOI
TL;DR: This work surveys the quadratic eigenvalue problem, treating its many applications, its mathematical properties, and a variety of numerical solution techniques.
Abstract: We survey the quadratic eigenvalue problem, treating its many applications, its mathematical properties, and a variety of numerical solution techniques. Emphasis is given to exploiting both the structure of the matrices in the problem (dense, sparse, real, complex, Hermitian, skew-Hermitian) and the spectral properties of the problem. We classify numerical methods and catalogue available software.

1,369 citations


01 Jul 2001
TL;DR: It can be proved that, given an optimal decoder, Gallager's low density parity check codes asymptotically approach the Shannon limit.
Abstract: We report theoretical and empirical properties of Gallager's (1963) low density parity check codes on Gaussian channels. It can be proved that, given an optimal decoder, these codes asymptotically approach the Shannon limit. With a practical 'belief propagation' decoder, performance substantially better than that of standard convolutional and concatenated codes can be achieved; indeed the performance is almost as close to the Shannon limit as that of turbo codes.

1,339 citations


Journal ArticleDOI
TL;DR: A new procedure for allocating transmission losses to generators and loads in the context of pools operated under a single marginal price derived from a merit-order approach. The procedure is based on the network Z-bus matrix, although all required computations exploit the sparse Y-bus matrix.
Abstract: This paper presents a new procedure for allocating transmission losses to generators and loads in the context of pools operated under a single marginal price derived from a merit-order approach. The procedure is based on the network Z-bus matrix, although all required computations exploit the sparse Y-bus matrix. One innovative feature and advantage of this method is that, unlike other proposed approaches, it exploits the full set of network equations and does not require any simplifying assumptions. The method is based on a solved load flow and is easily understood and implemented. The loss allocation process emphasizes current rather than power injections, an approach that is intuitively reasonable and leads to a natural separation of the system losses among the network buses. Results illustrate the consistency of the new allocation process with expected results and with the performance of other methods.

416 citations


Book ChapterDOI
28 May 2001
TL;DR: The experience indicates that for matrices arising in scientific simulations, register level optimizations are critical, and this work focuses here on the optimizations and parameter selection techniques used in Sparsity for register-level optimizations.
Abstract: Sparse matrix-vector multiplication is an important computational kernel that tends to perform poorly on modern processors, largely because of its high ratio of memory operations to arithmetic operations. Optimizing this algorithm is difficult, both because of the complexity of memory systems and because the performance is highly dependent on the nonzero structure of the matrix. The Sparsity system is designed to address these problems by allowing users to automatically build sparse matrix kernels that are tuned to their matrices and machines. The most difficult aspect of optimizing these algorithms is selecting among a large set of possible transformations and choosing parameters, such as block size. In this paper we discuss the optimization of two operations: a sparse matrix times a dense vector and a sparse matrix times a set of dense vectors. Our experience indicates that for matrices arising in scientific simulations, register level optimizations are critical, and we focus here on the optimizations and parameter selection techniques used in Sparsity for register-level optimizations. We demonstrate speedups of up to 2× for the single vector case and 5× for the multiple vector case.
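
The register-level idea in the abstract above can be illustrated with a minimal Python sketch. It is not the Sparsity system's generated code: the function names, the fixed 2×2 block size, and the small example matrix are assumptions made here for illustration. The first routine is a plain compressed sparse row (CSR) product; the second stores the same matrix as dense 2×2 blocks, so each block reuses two entries of `x` and two running sums, at the cost of storing a few explicit zeros.

```python
def spmv_csr(n_rows, row_ptr, col_idx, vals, x):
    """Compute y = A @ x for A stored in compressed sparse row (CSR) form."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # One indirect load of x per stored nonzero.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def spmv_bcsr_2x2(n_block_rows, brow_ptr, bcol_idx, bvals, x):
    """Compute y = A @ x for A stored as dense 2x2 blocks (block CSR).

    bvals holds each block in row-major order: [a00, a01, a10, a11].
    """
    y = [0.0] * (2 * n_block_rows)
    for ib in range(n_block_rows):
        y0 = 0.0  # the two running sums play the role of registers
        y1 = 0.0
        for k in range(brow_ptr[ib], brow_ptr[ib + 1]):
            j = 2 * bcol_idx[k]
            x0, x1 = x[j], x[j + 1]  # loaded once, used twice
            a00, a01, a10, a11 = bvals[4 * k: 4 * k + 4]
            y0 += a00 * x0 + a01 * x1
            y1 += a10 * x0 + a11 * x1
        y[2 * ib] = y0
        y[2 * ib + 1] = y1
    return y


# Hypothetical 4x4 example:
# [[1, 2, 0, 0],
#  [3, 4, 0, 0],
#  [0, 0, 5, 0],
#  [0, 0, 0, 6]]
csr = (4, [0, 2, 4, 5, 6], [0, 1, 0, 1, 2, 3], [1, 2, 3, 4, 5, 6])
# The second 2x2 block stores two explicit zeros (fill) to stay dense.
bcsr = (2, [0, 1, 2], [0, 1], [1, 2, 3, 4, 5, 0, 0, 6])
x = [1.0, 2.0, 3.0, 4.0]
y_csr = spmv_csr(*csr, x)
y_bcsr = spmv_bcsr_2x2(*bcsr, x)
```

Whether blocking pays off depends on how much fill the chosen block size introduces for a given matrix, which is why the abstract treats block-size selection as the hard part.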

164 citations


Journal ArticleDOI
TL;DR: In this article, a fast method for multiple measurement placement for systems that are found to be unobservable is presented, where all observable islands will be merged together and the entire system will be rendered fully observable.
Abstract: This paper presents a fast method for multiple measurement placement for systems that are found to be unobservable. Upon placement of multiple measurements, all observable islands will be merged together and the entire system will be rendered fully observable. The method uses a test matrix whose leading dimension is determined by the rank deficiency of the gain matrix. Therefore, even for very large systems, as long as the number of measurements to be placed is relatively low, the proposed method will maintain its computational advantage. Compared to the existing iterative approaches, this method directly provides the entire set of additional measurements for placement. The method is developed based on the authors' previous work [4] where a direct method for observability analysis was presented.

140 citations


Journal ArticleDOI
TL;DR: It turns out that the new method achieves correctness rates which are competitive with those of the best existing methods, and that it scales only linearly with the number of instances, i.e. the amount of data to be classified.
Abstract: [...] $O(h_n^{-1} n^{d-1})$ instead of $O(h_n^{-d})$ grid points and unknowns are involved. Here $d$ denotes the dimension of the feature space and $h_n = 2^{-n}$ gives the mesh size. To be precise, we suggest to use the sparse grid combination technique [42] where the classification problem is discretized and solved on a certain sequence of conventional grids with uniform mesh sizes in each coordinate direction. The sparse grid solution is then obtained from the solutions on these different grids by linear combination. In contrast to other sparse grid techniques, the combination method is simpler to use and can be parallelized in a natural and straightforward way. We describe the sparse grid combination technique for the classification problem in terms of the regularization network approach. We then give implementational details and discuss the complexity of the algorithm. It turns out that the method scales only linearly with the number of instances, i.e. the amount of data to be classified. Finally we report on the quality of the classifier built by our new method. Here we consider standard test problems from the UCI repository and problems with huge synthetical data sets in up to 9 dimensions. It turns out that our new method achieves correctness rates which are competitive with those of the best existing methods.

135 citations


Journal ArticleDOI
TL;DR: This work presents a new algorithm to compute the Integer Smith normal form of large sparse matrices by reducing the computation of the Smith form to independent, and therefore parallel, computations modulo powers of word-size primes.

108 citations


Journal ArticleDOI
TL;DR: This work designs and tests a special type of candidate list strategy and a move mechanism to be embedded in a tabu search procedure for the bandwidth reduction problem and shows that the proposed procedure outperforms the best-known algorithms in terms of solution quality while consuming a reasonable computational effort.

102 citations


Proceedings ArticleDOI
23 Apr 2001
TL;DR: A new hypergraph model for the decomposition of irregular computational domains by partitioning the fine-grain hypergraph into equally weighted vertex parts so that hyperedges are split among as few processors as possible, which minimizes communication volume while maintaining computational load balance.
Abstract: We propose a new hypergraph model for the decomposition of irregular computational domains. This work focuses on the decomposition of sparse matrices for parallel matrix-vector multiplication. However, the proposed model can also be used to decompose computational domains of other parallel reduction problems. We propose a "fine-grain" hypergraph model for two-dimensional decomposition of sparse matrices. In the proposed fine-grain hypergraph model, vertices represent nonzeros and hyperedges represent sparsity patterns of rows and columns of the matrix. By partitioning the fine-grain hypergraph into equally weighted vertex parts (processors) so that hyperedges are split among as few processors as possible, the model correctly minimizes communication volume while maintaining computational load balance. Experimental results on a wide range of realistic sparse matrices confirm the validity of the proposed model, by achieving up to 50 percent better decompositions than the existing models, in terms of total communication volume.
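
As a rough illustration of the model described above, the following Python sketch evaluates a given assignment of nonzeros to processors. It assumes the usual connectivity-minus-one cost, in which a row or column hyperedge touching λ processors contributes λ − 1 words of communication; the function names and the small example are hypothetical, and no partitioner is included.

```python
from collections import defaultdict


def fine_grain_volume(nonzeros, part):
    """Total communication volume of a nonzero-to-processor assignment.

    nonzeros: list of (i, j) positions, one hypergraph vertex each.
    part:     dict mapping (i, j) -> processor id.
    Each row and each column is a hyperedge over the nonzeros it contains.
    """
    row_parts = defaultdict(set)
    col_parts = defaultdict(set)
    for (i, j) in nonzeros:
        p = part[(i, j)]
        row_parts[i].add(p)
        col_parts[j].add(p)
    # A hyperedge spread over lam processors costs lam - 1 (assumed metric).
    volume = sum(len(s) - 1 for s in row_parts.values())
    volume += sum(len(s) - 1 for s in col_parts.values())
    return volume


def processor_loads(nonzeros, part, n_procs):
    """Number of nonzeros (unit vertex weights) assigned to each processor."""
    loads = [0] * n_procs
    for nz in nonzeros:
        loads[part[nz]] += 1
    return loads


# Hypothetical 3x3 pattern split over two processors.
nonzeros = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]
part = {(0, 0): 0, (0, 1): 0, (1, 1): 0,
        (1, 2): 1, (2, 0): 1, (2, 2): 1}
volume = fine_grain_volume(nonzeros, part)    # row 1 and column 0 are cut
loads = processor_loads(nonzeros, part, 2)    # three nonzeros each
```

In this example only row 1 and column 0 span both processors, so the volume is 2 while the load is balanced at three nonzeros per processor.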

100 citations


Journal ArticleDOI
TL;DR: It is shown that the JDSVD can be seen as an accelerated (inexact) Newton scheme, and the method is compared experimentally with some other iterative SVD methods.
Abstract: We discuss a new method for the iterative computation of a portion of the singular values and vectors of a large sparse matrix. Similar to the Jacobi-Davidson method for the eigenvalue problem, we compute in each step a correction by (approximately) solving a correction equation. We give a few variants of this Jacobi-Davidson SVD (JDSVD) method with their theoretical properties. It is shown that the JDSVD can be seen as an accelerated (inexact) Newton scheme. We experimentally compare the method with some other iterative SVD methods.

Proceedings ArticleDOI
10 Nov 2001
TL;DR: The proposed method explicitly models the minimization of communication volume while enforcing the upper bound of p + q - 2 on the maximum number of messages handled by a single processor, for a parallel system with P = p × q processors.
Abstract: We propose a new two-phase method for the coarse-grain decomposition of irregular computational domains. This work focuses on the 2D partitioning of sparse matrices for parallel matrix-vector multiplication. However, the proposed model can also be used to decompose computational domains of other parallel reduction problems. This work also introduces the use of multi-constraint hypergraph partitioning, for solving the decomposition problem. The proposed method explicitly models the minimization of communication volume while enforcing the upper bound of p + q - 2 on the maximum number of messages handled by a single processor, for a parallel system with P = p × q processors. Experimental results on a wide range of realistic sparse matrices confirm the validity of the proposed methods, by achieving up to 25 percent better partitions than the standard graph model, in terms of total communication volume, and 59 percent better partitions in terms of number of messages, on the overall average.

Journal ArticleDOI
TL;DR: In this article, the authors compared the performance of simulation problem analysis and research kernel (SPARK) and the HVACSIM+ programs by means of benchmark testing and showed that the graph-theoretic techniques employed in SPARK offer significant speed advantages over the other methods for significantly reducible problems and that even problem portions with little reduction potential can be solved efficiently.

Journal ArticleDOI
TL;DR: This work considers the problem of computing low-rank approximations of matrices in a factorized form with sparse factors and presents numerical examples arising from some application areas to illustrate the efficiency and accuracy of the proposed algorithms.
Abstract: We consider the problem of computing low-rank approximations of matrices. The novel aspects of our approach are that we require the low-rank approximations to be written in a factorized form with sparse factors, and the degree of sparsity of the factors can be traded off for reduced reconstruction error by certain user-determined parameters. We give a detailed error analysis of our proposed algorithms and compare the computed sparse low-rank approximations with those obtained from singular value decomposition. We present numerical examples arising from some application areas to illustrate the efficiency and accuracy of our algorithms.

Journal ArticleDOI
TL;DR: An ordering algorithm for sparse matrices that achieves a tighter coupling of bottom-up and top-down methods; experimental results show that the orderings obtained are in general better than those obtained by other ordering codes.
Abstract: Most state-of-the-art ordering schemes for sparse matrices are a hybrid of a bottom-up method such as minimum degree and a top-down scheme such as George's nested dissection. In this paper we present an ordering algorithm that achieves a tighter coupling of bottom-up and top-down methods. In our methodology vertex separators are interpreted as the boundaries of the remaining elements in an unfinished bottom-up ordering. As a consequence, we are using bottom-up techniques such as quotient graphs and special node selection strategies for the construction of vertex separators. Once all separators have been found, we are using them as a skeleton for the computation of several bottom-up orderings. Experimental results show that the orderings obtained by our scheme are in general better than those obtained by other ordering codes.

Journal ArticleDOI
TL;DR: In this article, a fast numerical method called the sparse-matrix/canonical grid (SM/CG) method is employed to analyze densely packed microstrip interconnects that involve a large number of unknowns.
Abstract: In this paper, a fast numerical method called the sparse-matrix/canonical-grid (SM/CG) method is employed to analyze densely packed microstrip interconnects that involve a large number of unknowns. The mixed-potential integral equation is solved by using the method of moments in the spatial domain. The closed-form expressions of the spatial Green's functions of microstrip structures are obtained from the combination of the fast Hankel transform and the matrix pencil method. The Rao-Wilton-Glisson triangular basis functions are used to convert the integral equation into a matrix equation. The matrix equation is then solved by using the SM/CG method, in which the far-interaction portion of the matrix-vector multiplication in the iterative solution is performed by the fast Fourier transforms (FFTs). This is achieved by the Taylor series expansions of the spatial Green's functions about the uniformly spaced canonical grid points overlaying the triangular discretization. Numerical examples are presented to illustrate the accuracy and efficiency of the proposed method. The SM/CG method has computational complexity of O(N log N). Furthermore, being FFT-based facilitates the implementation for parallel computation.

Journal ArticleDOI
TL;DR: A new general algorithm for constructing interpolation weights in algebraic multigrid (AMG) by exploiting a proper extension mapping outside a neighborhood about a fine degree of freedom to be interpolated.
Abstract: We propose a new general algorithm for constructing interpolation weights in algebraic multigrid (AMG). It exploits a proper extension mapping outside a neighborhood about a fine degree of freedom (dof) to be interpolated. The extension mapping provides boundary values (based on the coarse dofs used to perform the interpolation) at the boundary of the neighborhood. The interpolation value is then obtained by matrix dependent harmonic extension of the boundary values into the interior of the neighborhood. We describe the method, present examples of useful extension operators, provide a two-grid analysis of model problems, and, by way of numerical experiments, demonstrate the successful application of the method to discretized elliptic problems.

Journal ArticleDOI
TL;DR: In this article, an incomplete LU decomposition with pivoting is presented that progressively monitors the growth of the inverse factors of L, U and is used as feedback for dropping entries in L and U.

Proceedings ArticleDOI
07 Oct 2001
TL;DR: Non-negative matrix factorization (NMF) is used for dimensionality reduction of the vector space model. Because matrices decomposed by NMF only contain non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors.
Abstract: The vector space model (VSM) is a conventional information retrieval model, which represents a document collection by a term-by-document matrix. Since term-by-document matrices are usually high-dimensional and sparse, they are susceptible to noise and make it difficult to capture the underlying semantic structure. Additionally, the storage and processing of such matrices places great demands on computing resources. Dimensionality reduction is a way to overcome these problems. Principal component analysis (PCA) and singular value decomposition (SVD) are popular techniques for dimensionality reduction based on matrix decomposition; however, they contain both positive and negative values in the decomposed matrices. In the work described here, we use non-negative matrix factorization (NMF) for dimensionality reduction of the vector space model. Since matrices decomposed by NMF only contain non-negative values, the original data are represented by only additive, not subtractive, combinations of the basis vectors. This characteristic of parts-based representation is appealing because it reflects the intuitive notion of combining parts to form a whole. Also, because NMF computation is based on a simple iterative algorithm, it is advantageous for applications involving large matrices. Using the MEDLINE collection, we experimentally showed that NMF offers great improvement over the vector space model.
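
The abstract does not spell out its "simple iterative algorithm". A common choice, assumed here, is the Lee-Seung multiplicative update rule for the Frobenius-norm objective. The sketch below is a pure-Python illustration on a tiny hypothetical term-by-document matrix; the helper names and the `eps` guard against division by zero are additions made for this example, and dense lists stand in for what would normally be sparse storage.

```python
import random


def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh)) ** 0.5


def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * nu / max(de, eps) for h, nu, de in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * nu / max(de, eps) for w, nu, de in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


# Hypothetical 4-term x 3-document count matrix.
V = [[2.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 3.0, 1.0],
     [0.0, 1.0, 0.0]]
W, H = nmf(V, rank=2)
error = frobenius_error(V, W, H)
```

Because every update multiplies a non-negative entry by a non-negative ratio, `W` and `H` stay non-negative throughout, which is the additive, parts-based property the abstract emphasizes.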

Journal ArticleDOI
TL;DR: The SPAI$(\varepsilon)$ smoother provides a natural procedure for improvement where needed, and is shown to satisfy the smoothing property for symmetric positive definite problems.
Abstract: Sparse approximate inverses are considered as smoothers for multigrid. They are based on the SPAI-Algorithm [M. J. Grote and T. Huckle, SIAM J. Sci. Comput., 18 (1997), pp. 838-853], which constructs a sparse approximate inverse M of a matrix A by minimizing I - MA in the Frobenius norm. This yields a new hierarchy of smoothers: SPAI-0, SPAI-1, SPAI$(\varepsilon)$. Advantages of SPAI smoothers over classical smoothers are inherent parallelism, possible local adaptivity, and improved robustness. The simplest smoother, SPAI-0, is based on a diagonal matrix M. It is shown to satisfy the smoothing property for symmetric positive definite problems. Numerical experiments show that SPAI-0 smoothing is usually preferable to damped Jacobi smoothing. For the SPAI-1 smoother the sparsity pattern of M is that of A; its performance is typically comparable to that of Gauss-Seidel smoothing; however, both the computation and the application of the smoother remain inherently parallel. In more difficult situations, where the simpler SPAI-0 and SPAI-1 smoothers are not adequate, the SPAI$(\varepsilon)$ smoother provides a natural procedure for improvement where needed. Numerical examples illustrate the usefulness of SPAI smoothing.
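
For the diagonal case, the minimization of I - MA in the Frobenius norm decouples by rows and has a closed form: with $a_k$ denoting row $k$ of A, the diagonal entry is $m_{kk} = a_{kk} / \|a_k\|_2^2$. The Python sketch below applies that formula and uses the result in the stationary iteration x ← x + M(b − Ax). It is a minimal illustration rather than the authors' implementation: the function names are hypothetical, rows are stored densely for readability, and the 1D Laplacian test matrix is an assumption made here.

```python
def spai0_diagonal(A):
    """Diagonal of the M that minimizes ||I - M A||_F over diagonal matrices.

    Row k of M A is m_kk * a_k, so each entry solves a scalar least-squares
    problem: m_kk = a_kk / ||a_k||_2^2.
    """
    return [row[k] / sum(a * a for a in row) for k, row in enumerate(A)]


def spai0_smooth(A, b, x, m_diag, sweeps=1):
    """Apply `sweeps` steps of x <- x + M (b - A x) with diagonal M."""
    for _ in range(sweeps):
        residual = [bi - sum(a * xj for a, xj in zip(row, x))
                    for row, bi in zip(A, b)]
        # Every component updates independently, hence the parallelism.
        x = [xi + mi * ri for xi, mi, ri in zip(x, m_diag, residual)]
    return x


# Hypothetical test problem: 1D Laplacian tridiag(-1, 2, -1) of order 4,
# with right-hand side chosen so that the exact solution is [1, 2, 3, 4].
A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [0.0, 0.0, 0.0, 5.0]
m_diag = spai0_diagonal(A)          # boundary rows 2/5, interior rows 2/6
x = spai0_smooth(A, b, [0.0] * 4, m_diag, sweeps=200)
```

In a multigrid cycle only a few sweeps would be applied per level; the large sweep count here just shows that the iteration converges as a standalone solver on this small example.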

Journal ArticleDOI
TL;DR: By analyzing the problem of parallelizing the multiplication of sparse matrices with the sparsity pattern required by linear-scaling techniques, it is shown that the management of inter-node communications and the effective use of on-node cache are helped by organizing the atoms into compact groups.

Journal ArticleDOI
TL;DR: Two types of local sparse preconditioners are generalized to solve three-dimensional Helmholtz problems iteratively and can ensure a better eigenvalue clustering for the normal equation matrix and thus a faster convergence of CGN.

Journal ArticleDOI
TL;DR: The resulting method of moments (MoM) impedance matrix shows good conditioning, which allows its safe sparsification and the efficient use of iterative methods for the solution of the linear system.
Abstract: A new approach is presented for the integral equation analysis of multilayer printed circuits, antennas and arrays. It is based on the introduction of new multiresolution vector functions with properties similar to those of the scalar wavelets, and flexible enough to accommodate general shapes. The resulting method of moments (MoM) impedance matrix shows good conditioning, which allows its safe sparsification and the efficient use of iterative methods for the solution of the linear system. The results of the analysis of various real-life structures show the advantage of the method.

Journal ArticleDOI
TL;DR: On a range of examples arising from practical applications, the multilevel algorithm is shown to produce orderings that are better than those produced by the Sloan algorithm and are of comparable quality to those obtained using the hybrid Sloan algorithm.
Abstract: A multilevel algorithm for reordering sparse symmetric matrices to reduce the wavefront and profile is described. The algorithm is a combinatorial algorithm that uses a maximal independent vertex set for coarsening the adjacency graph of the matrix and an enhanced version of the Sloan algorithm on the coarsest graph. On a range of examples arising from practical applications, the multilevel algorithm is shown to produce orderings that are better than those produced by the Sloan algorithm and are of comparable quality to those obtained using the hybrid Sloan algorithm. Advantages over the hybrid Sloan algorithm are that the multilevel approach requires no spectral information and less CPU time.
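
To make the quantity being minimized concrete, the following Python sketch evaluates the profile and bandwidth of a symmetric sparsity pattern under a given ordering. It does not implement the multilevel or Sloan algorithms. The convention assumed here is that row i contributes i minus the column index of its leftmost nonzero, with the diagonal counted as nonzero; other papers use conventions that differ by an additive n. The function name and the path-graph example are hypothetical.

```python
def profile_and_bandwidth(adj, order):
    """Profile and bandwidth of a symmetric pattern under an ordering.

    adj:   dict vertex -> set of neighbouring vertices (no self loops).
    order: list of vertices; order[k] is the vertex placed in row/column k.
    """
    pos = {v: k for k, v in enumerate(order)}
    profile = 0
    bandwidth = 0
    for v, nbrs in adj.items():
        i = pos[v]
        # Leftmost nonzero column in row i, counting the diagonal entry.
        first = min([pos[u] for u in nbrs] + [i])
        profile += i - first
        bandwidth = max(bandwidth, i - first)
    return profile, bandwidth


# Hypothetical example: a path graph 0-1-2-3-4 (a tridiagonal matrix).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
natural = profile_and_bandwidth(adj, [0, 1, 2, 3, 4])
scrambled = profile_and_bandwidth(adj, [0, 2, 4, 1, 3])
```

The natural ordering of the path gives profile 4 and bandwidth 1, while the scrambled ordering gives profile 6 and bandwidth 3, which is the kind of difference a reordering algorithm tries to recover.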

Proceedings ArticleDOI
11 Sep 2001
TL;DR: A more restrictive definition for matrix diagrams is given and an algorithm is presented that builds a canonical matrix diagram representation for an arbitrary non-negative matrix, given encodings for the sets of rows and columns.
Abstract: The solution of a generalized stochastic Petri net (GSPN) is severely restricted by the size of its underlying continuous-time Markov chain. In recent work (G. Ciardo and A.S. Miner, 1999), matrix diagrams built from a Kronecker expression for the transition rate matrix of certain types of GSPNs were shown to allow for more efficient solution; however, the GSPN model requires a special form, so that the transition rate matrix has a Kronecker expression. In this paper, we extend the earlier results to GSPN models with partitioned sets of places. Specifically, we give a more restrictive definition for matrix diagrams and show that the new form is canonical. We then present an algorithm that builds a canonical matrix diagram representation for an arbitrary non-negative matrix, given encodings for the sets of rows and columns. Using this algorithm, a Kronecker expression is not required to construct the matrix diagram. The efficient matrix diagram algorithms for numerical solution presented earlier are still applicable. We apply our technique to several example GSPNs.

Journal ArticleDOI
TL;DR: A multilevel expansion algorithm is presented to overcome the dependence of the sparse-matrix canonical grid method on surface roughness for wave scattering from two-dimensional random rough surfaces, and the trade-off between computer memory requirements and CPU time is discussed.
Abstract: Wave scattering from two-dimensional (2-D) random rough surfaces up to several thousand square wavelengths has been previously analyzed using the sparse-matrix canonical grid (SMCG) method. The success of the SMCG method highly depends on the roughness of the random surface for a given surface area. We present a multilevel expansion algorithm to overcome this limitation. The proposed algorithm entails the use of a three-dimensional (3-D) canonical grid. This grid is generated by a uniform discretization of the vertical displacement along the height (z-axis) of the rough surface in addition to the uniform sampling of the rough surface along the x-y plane. The Green's function is expanded about the 3-D canonical grid for the far interactions. The trade-off in computer memory requirements and CPU time between the neighborhood distance and the number of discretization levels along the z-axis is discussed for both perfectly electric conducting (PEC) and lossy dielectric random rough surfaces. Ocean surfaces of the Durden-Vesecky (1985) spectrum with various bandlimits are also studied.

Patent
12 Jun 2001
TL;DR: In this paper, the authors propose a method, a computer system, and a program product for retrieving and/or ranking documents in a database by computing the scalar product between the dimension reduced document matrix and a query vector.
Abstract: A method, a computer system, and a program product for retrieving and/or ranking documents in a database. The method comprising steps of, providing a document matrix derived from the documents, the matrix including numerical elements derived from the attributes; providing a covariance matrix derived from the document matrix; executing singular value decomposition of the covariance matrix so as to obtain the following formula: $K = V \Sigma V^T$, wherein $K$ represents the covariance matrix, $V$ represents the matrix consisting of eigenvectors, $\Sigma$ represents a diagonal matrix, and $V^T$ represents a transpose of the matrix $V$; reducing a dimension of the matrix $V$ using a predetermined number of eigenvectors included in the matrix $V$, the eigenvectors including an eigenvector corresponding to the largest singular value; reducing a dimension of the document matrix using the dimension reduced matrix $V$; and retrieving and/or ranking the documents in the database by computing the scalar product between the dimension reduced document matrix and a query vector.

Journal ArticleDOI
TL;DR: It is shown that if an $A_{k_1,k_2}$-stable boundary value method is used for an m-by-m system of ODEs, then the authors' preconditioners are invertible and all the eigenvalues of the preconditioned systems are 1 except for at most $2m(k_1 + k_2)$ outliers.
Abstract: We consider the solution of ordinary differential equations (ODEs) using boundary value methods. These methods require the solution of one or more unsymmetric, large and sparse linear systems. The GMRES method with the Strang-type block-circulant preconditioner is proposed for solving these linear systems. We show that if an $A_{k_1,k_2}$-stable boundary value method is used for an m-by-m system of ODEs, then our preconditioners are invertible and all the eigenvalues of the preconditioned systems are 1 except for at most $2m(k_1 + k_2)$ outliers. It follows that when the GMRES method is applied to solving the preconditioned systems, the method will converge in at most $2m(k_1 + k_2) + 1$ iterations. Numerical results are given to illustrate the effectiveness of our methods.

Journal ArticleDOI
TL;DR: A fast algorithm to compute the R factor of the QR factorization of a block-Hankel matrix H, based on the generalized Schur algorithm, which allows the rank-deficient case to be handled.

Journal ArticleDOI
TL;DR: An algorithm that symbolically derives fast versions for a broad class of discrete signal transforms by finding fast sparse matrix factorizations for the matrix representations of these transforms, using the defining matrix as its sole input.
Abstract: This paper presents an algorithm that derives fast versions for a broad class of discrete signal transforms symbolically. The class includes but is not limited to the discrete Fourier and the discrete trigonometric transforms. This is achieved by finding fast sparse matrix factorizations for the matrix representations of these transforms. Unlike previous methods, the algorithm is entirely automatic and uses the defining matrix as its sole input. The sparse matrix factorization algorithm consists of two steps: first, the "symmetry" of the matrix is computed in the form of a pair of group representations; second, the representations are stepwise decomposed, giving rise to a sparse factorization of the original transform matrix. We have successfully demonstrated the method by computing automatically efficient transforms in several important cases: for the DFT, we obtain the Cooley-Tukey (1965) FFT; for a class of transforms including the DCT, type II, the number of arithmetic operations for our fast transforms is the same as for the best-known algorithms. Our approach provides new insights and interpretations for the structure of these signal transforms and the question of why fast algorithms exist. The sparse matrix factorization algorithm is implemented within the software package AREP.