
Showing papers on "Sparse matrix published in 2006"


Book
15 Sep 2006
TL;DR: This book presents direct methods for solving sparse linear systems, covering triangular solves, Cholesky, orthogonal (QR), and LU factorizations, and fill-reducing orderings, together with CSparse, a concise C library implementing the algorithms, and a chapter on sparse matrices in MATLAB.
Abstract: Preface 1. Introduction 2. Basic algorithms 3. Solving triangular systems 4. Cholesky factorization 5. Orthogonal methods 6. LU factorization 7. Fill-reducing orderings 8. Solving sparse linear systems 9. CSparse 10. Sparse matrices in MATLAB Appendix: Basics of the C programming language Bibliography Index.
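
The factor-once, solve-many workflow the book teaches maps directly onto modern sparse libraries. A minimal sketch using SciPy as a stand-in for CSparse (the matrix and ordering choice are illustrative):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# A small sparse system in compressed-column (CSC) form, the format
# CSparse is built around.
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Factor once with a fill-reducing column ordering, then reuse the
# factorization for as many right-hand sides as needed.
lu = spla.splu(A, permc_spec="COLAMD")
x = lu.solve(b)
print(np.linalg.norm(A @ x - b))   # residual should be near machine precision
```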

1,366 citations


Journal ArticleDOI
TL;DR: It is shown that tapering the correct covariance matrix with an appropriate compactly supported positive definite function reduces the computational burden significantly and still leads to an asymptotically optimal mean squared error.
Abstract: Interpolation of a spatially correlated random process is used in many scientific areas. The best unbiased linear predictor, often called a kriging predictor in geostatistical science, requires the solution of a (possibly large) linear system based on the covariance matrix of the observations. In this article, we show that tapering the correct covariance matrix with an appropriate compactly supported positive definite function reduces the computational burden significantly and still leads to an asymptotically optimal mean squared error. The effect of tapering is to create a sparse approximate linear system that can then be solved using sparse matrix algorithms. Monte Carlo simulations support the theoretical results. An application to a large climatological precipitation dataset is presented as a concrete and practical illustration.
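
A minimal sketch of the tapering idea, assuming an exponential covariance and a spherical taper as the compactly supported positive definite function (both choices illustrative, not necessarily those analyzed in the paper):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, size=(2000, 1))       # 1-D observation locations
d = np.abs(pts - pts.T)                         # pairwise distances
C = np.exp(-d / 10.0)                           # exponential covariance

# Spherical taper: compactly supported and positive definite, zero beyond R.
R = 15.0
u = np.clip(1.0 - d / R, 0.0, None)
C_tap = sp.csr_matrix(C * (u**2 * (1.0 + d / (2.0 * R))))  # Schur product
print(f"nonzeros kept: {C_tap.nnz / d.size:.1%}")

# The kriging system is now sparse and can use sparse solvers directly.
y = rng.standard_normal(2000)
w = spla.spsolve(C_tap.tocsc(), y)
```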

757 citations


01 Jan 2006
TL;DR: Three new variations of a direct factorization scheme to tackle the issue of indefiniteness in sparse symmetric linear systems, including a reordering that is based on a symmetric weighted matching of the matrix, which is effective for highly indefinite symmetric systems.
Abstract: This paper discusses new pivoting factorization methods for solving sparse symmetric indefinite systems. As opposed to many existing pivoting methods, our Supernode-Bunch-Kaufman (SBK) pivoting method dynamically selects 1×1 and 2×2 pivots and may be supplemented by pivot perturbation techniques. We demonstrate the effectiveness and the numerical accuracy of this algorithm and also show that a high-performance implementation is feasible. We will also show that symmetric maximum-weighted matching strategies add an additional level of reliability to SBK. These techniques can be seen as a complement to the alternative idea of using more complete pivoting techniques during the numerical factorization. Numerical experiments validate these conclusions. The factorization has the form of a sparse L D L^T decomposition, where D is a block-diagonal matrix with 1×1 and 2×2 pivot blocks, L is a sparse lower triangular matrix, and a symmetric indefinite diagonal matrix reflects small half-machine-precision perturbations, which might be necessary to tackle the problem of tiny pivots. One permutation is a reordering that is based on a symmetric weighted matching of the matrix and tries to move the largest off-diagonal elements directly alongside the diagonal in order to form good initial 1×1 or 2×2 diagonal block pivots; another is a fill-reducing reordering which honors the structure of the matrix. We will present three new variations of a direct factorization scheme to tackle the issue of indefiniteness in sparse symmetric linear systems. These methods restrict the pivoting search, to stay as long as possible within predefined data structures for efficient Level-3 BLAS factorization and parallelization. On the other hand, the imposed pivoting restrictions can be reduced in several steps by taking the matching permutation into account. The first algorithm uses Supernode-Bunch-Kaufman (SBK) pivoting and dynamically selects 1×1 and 2×2 pivots. It is supplemented by pivot perturbation techniques. It uses no more storage than a sparse Cholesky factorization of a positive definite matrix with the same sparsity structure, due to restricting the pivoting to interchanges within the diagonal block associated to a single supernode. The coefficient matrix is perturbed whenever numerically acceptable 1×1 and 2×2 pivots cannot be found within the diagonal block. One or two steps of iterative refinement may be required to correct the effect of the perturbations. We will demonstrate that this restricted notion of pivoting with iterative refinement is effective for highly indefinite symmetric systems. Furthermore, for a large set of matrices from different application areas, the accuracy of this method is comparable to that of a direct factorization method that uses complete sparse pivoting techniques. In addition, we will discuss two preprocessing algorithms to identify large entries in the coefficient matrix that, if permuted close to the diagonal, permit good initial pivots to be formed.
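
The perturb-then-refine loop can be sketched generically. The sketch below substitutes SciPy's sparse LU for the paper's supernodal LDL^T kernel (which SciPy does not expose), so it illustrates only the idea of perturbing tiny pivots and recovering accuracy with iterative refinement:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def perturbed_solve(A, b, eps=1e-8, refine_steps=2):
    """Factor a diagonally perturbed copy of A (tiny pivots bumped to a
    small threshold), then recover accuracy on the original system with
    a few steps of iterative refinement."""
    d = A.diagonal()
    tau = eps * abs(A).max()
    d_new = np.where(np.abs(d) < tau, tau * np.sign(d + (d == 0)), d)
    lu = spla.splu((A + sp.diags(d_new - d)).tocsc())

    x = lu.solve(b)
    for _ in range(refine_steps):          # refinement against the ORIGINAL A
        x += lu.solve(b - A @ x)
    return x

n = 500
A = sp.random(n, n, density=0.01, random_state=1)
A = (A + A.T).tolil()
A.setdiag(np.linspace(-1.0, 1.0, n))       # indefinite, with tiny pivots
A = A.tocsr()
b = np.ones(n)
x = perturbed_solve(A, b)
print(np.linalg.norm(A @ x - b))
```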

474 citations


Journal ArticleDOI
TL;DR: Comparisons to previously published methods show that the new nsNMF method has some advantages in keeping faithfulness to the data while achieving a high degree of sparseness for both the estimated basis and the encoding vectors, and in better interpretability of the factors.
Abstract: We propose a novel nonnegative matrix factorization model that aims at finding localized, part-based, representations of nonnegative multivariate data items. Unlike the classical nonnegative matrix factorization (NMF) technique, this new model, denoted "nonsmooth nonnegative matrix factorization" (nsNMF), corresponds to the optimization of an unambiguous cost function designed to explicitly represent sparseness, in the form of nonsmoothness, which is controlled by a single parameter. In general, this method produces a set of basis and encoding vectors that are not only capable of representing the original data, but they also extract highly focalized patterns, which generally lend themselves to improved interpretability. The properties of this new method are illustrated with several data sets. Comparisons to previously published methods show that the new nsNMF method has some advantages in keeping faithfulness to the data while achieving a high degree of sparseness for both the estimated basis and the encoding vectors, and in better interpretability of the factors.
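
A minimal sketch of the model, assuming a smoothing matrix S = (1 − θ)I + (θ/q)·11^T placed between the factors and standard multiplicative updates applied to V ≈ WSH (a simplified reading of the method, not the authors' reference implementation):

```python
import numpy as np

def nsnmf(V, q, theta=0.5, iters=300, rng=np.random.default_rng(0)):
    """Sketch of nonsmooth NMF: V ~ W S H, with smoothing matrix S.
    theta = 0 recovers classical NMF; larger theta enforces sparser factors."""
    n, m = V.shape
    W, H = rng.random((n, q)), rng.random((q, m))
    S = (1.0 - theta) * np.eye(q) + (theta / q) * np.ones((q, q))
    for _ in range(iters):
        WS = W @ S                    # smoothing absorbed into the other factor
        H *= (WS.T @ V) / (WS.T @ WS @ H + 1e-9)
        SH = S @ H
        W *= (V @ SH.T) / (W @ (SH @ SH.T) + 1e-9)
    return W, H, S

V = np.random.default_rng(1).random((50, 40))
W, H, S = nsnmf(V, q=5)
print(np.linalg.norm(V - W @ S @ H) / np.linalg.norm(V))
```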

405 citations


Journal ArticleDOI
TL;DR: An algorithm for estimating the mixing matrix that can be viewed as an extension of the DUET and the TIFROM methods is first developed and a necessary and sufficient condition for recoverability of a source vector is obtained.
Abstract: This paper discusses underdetermined (i.e., with more sources than sensors) blind source separation (BSS) using a two-stage sparse representation approach. The first challenging task of this approach is to estimate precisely the unknown mixing matrix. In this paper, an algorithm for estimating the mixing matrix that can be viewed as an extension of the DUET and the TIFROM methods is first developed. Standard clustering algorithms (e.g., K-means method) also can be used for estimating the mixing matrix if the sources are sufficiently sparse. Compared with the DUET, the TIFROM methods, and standard clustering algorithms, with the authors' proposed method, a broader class of problems can be solved, because the required key condition on sparsity of the sources can be considerably relaxed. The second task of the two-stage approach is to estimate the source matrix using a standard linear programming algorithm. Another main contribution of the work described in this paper is the development of a recoverability analysis. After extending previously published results, a necessary and sufficient condition for recoverability of a source vector is obtained. Based on this condition and various types of source sparsity, several probability inequalities and probability estimates for the recoverability issue are established. Finally, simulation results that illustrate the effectiveness of the theoretical results are presented.
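
A toy sketch of the two-stage pipeline, using plain K-means on normalized observations for stage one and an l1-minimizing linear program for stage two (the paper's own mixing-matrix estimator is more refined than K-means):

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_src, n_mix, T = 4, 3, 2000
A = rng.standard_normal((n_mix, n_src))
A /= np.linalg.norm(A, axis=0)                    # unit-norm mixing columns
S = rng.standard_normal((n_src, T)) * (rng.random((n_src, T)) < 0.1)
X = A @ S                                         # underdetermined mixtures

# Stage 1: cluster normalized observation columns to estimate mixing columns.
cols = X[:, np.linalg.norm(X, axis=0) > 1e-6]
cols = cols / np.linalg.norm(cols, axis=0)
cols = cols * np.where(cols[0] >= 0, 1.0, -1.0)   # fold antipodal directions
A_est, _ = kmeans2(cols.T, n_src, minit="++")
A_est = A_est.T

# Stage 2: recover each source column by l1 minimization (linear program):
# min ||s||_1 s.t. A_est s = x, written as s = u - v with u, v >= 0.
def l1_solve(A, x):
    k = A.shape[1]
    res = linprog(np.ones(2 * k), A_eq=np.hstack([A, -A]), b_eq=x,
                  bounds=[(0, None)] * (2 * k))
    return res.x[:k] - res.x[k:]

print(l1_solve(A_est, X[:, 0]))
```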

337 citations


Journal ArticleDOI
TL;DR: The novel insight that the simultaneous localization and mapping (SLAM) information matrix is exactly sparse in a delayed-state framework is reported, which means it can produce equivalent results to the full-covariance solution.
Abstract: This paper reports the novel insight that the simultaneous localization and mapping (SLAM) information matrix is exactly sparse in a delayed-state framework. Such a framework is used in view-based representations of the environment that rely upon scan-matching raw sensor data to obtain virtual observations of robot motion with respect to a place it has previously been. The exact sparseness of the delayed-state information matrix is in contrast to other recent feature-based SLAM information algorithms, such as sparse extended information filter or thin junction-tree filter, since these methods have to make approximations in order to force the feature-based SLAM information matrix to be sparse. The benefit of the exact sparsity of the delayed-state framework is that it allows one to take advantage of the information space parameterization without incurring any sparse approximation error. Therefore, it can produce equivalent results to the full-covariance solution. The approach is validated experimentally using monocular imagery for two datasets: a test-tank experiment with ground truth, and a remotely operated vehicle survey of the RMS Titanic.
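
The source of the exact sparsity is that each scan-match constraint couples only the two delayed states it relates. A toy sketch with scalar poses (hypothetical numbers throughout) makes the pattern visible:

```python
import numpy as np

# Each scan-match constraint z = x_j - x_i touches only states i and j, so
# its contribution J^T W J fills only the (i,i), (i,j), (j,i), (j,j) blocks.
n = 8                                    # delayed states (scalar poses here)
Lam = np.zeros((n, n))                   # information matrix
constraints = [(k, k + 1) for k in range(n - 1)] + [(0, 5)]  # odometry + loop
for i, j in constraints:
    J = np.zeros(n)
    J[i], J[j] = -1.0, 1.0               # Jacobian of the relative constraint
    Lam += np.outer(J, J)                # unit information for brevity

print((Lam != 0).astype(int))            # exactly sparse: band plus one loop
```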

320 citations


Proceedings ArticleDOI
25 Apr 2006
TL;DR: This work presents a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs, and presents empirical results that show the parallel implementation achieves good speedup on several large problems.
Abstract: Graph partitioning is often used for load balancing in parallel computing, but it is known that hypergraph partitioning has several advantages. First, hypergraphs more accurately model communication volume, and second, they are more expressive and can better represent nonsymmetric problems. Hypergraph partitioning is particularly suited to parallel sparse matrix-vector multiplication, a common kernel in scientific computing. We present a parallel software package for hypergraph (and sparse matrix) partitioning developed at Sandia National Labs. The algorithm is a variation on multilevel partitioning. Our parallel implementation is novel in that it uses a two-dimensional data distribution among processors. We present empirical results that show our parallel implementation achieves good speedup on several large problems (up to 33 million nonzeros) with up to 64 processors on a Linux cluster.
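
The column-net model behind this connection can be sketched in a few lines: for row-wise SpMV partitioning, each column becomes a hyperedge over the rows with a nonzero in it (a sketch of the standard model, not of the Sandia package itself):

```python
import scipy.sparse as sp

def column_net_hypergraph(A):
    """Column-net model for row-wise SpMV partitioning: one vertex per row,
    one hyperedge (net) per column containing the rows with a nonzero there.
    A net cut between parts corresponds to communicating that x entry."""
    A = A.tocsc()
    return [A.indices[A.indptr[j]:A.indptr[j + 1]].tolist()
            for j in range(A.shape[1])]

A = sp.random(6, 6, density=0.4, random_state=0, format="csc")
for j, net in enumerate(column_net_hypergraph(A)):
    print(f"net {j}: rows {net}")
```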

264 citations


01 Jan 2006
TL;DR: This document describes one specific algebraic multigrid approach: smoothed aggregation, a multilevel and domain decomposition method for symmetric and nonsymmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems).
Abstract: ML is a multigrid preconditioning package intended to solve linear systems of equations Ax = b, where A is a user-supplied n × n sparse matrix, b is a user-supplied vector of length n, and x is a vector of length n to be computed. ML should be used on large sparse linear systems arising from partial differential equation (PDE) discretizations. While technically any linear system can be considered, ML should be used on linear systems that correspond to things that work well with multigrid methods (e.g. elliptic PDEs). ML can be used as a stand-alone package or to generate preconditioners for a traditional iterative solver package (e.g. Krylov methods). We have supplied support for working with the Aztec 2.1 and AztecOO iterative packages [20]. However, other solvers can be used by supplying a few functions. This document describes one specific algebraic multigrid approach: smoothed aggregation. This approach is used within several specialized multigrid methods: one for the eddy current formulation for Maxwell's equations, and a multilevel and domain decomposition method for symmetric and nonsymmetric systems of equations (like elliptic equations, or compressible and incompressible fluid dynamics problems). Other methods exist within ML but are not described in this document. Examples are given illustrating the problem definition and exercising multigrid options.
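
An analogous smoothed-aggregation preconditioner is available in Python via the PyAMG package, which can stand in for ML in a sketch (assuming PyAMG is installed):

```python
import numpy as np
import pyamg   # assumed installed; plays the role ML plays for Trilinos

# A 2-D Poisson problem, the kind of elliptic PDE system multigrid targets.
A = pyamg.gallery.poisson((200, 200), format="csr")
b = np.ones(A.shape[0])

ml = pyamg.smoothed_aggregation_solver(A)   # build the aggregation hierarchy
x = ml.solve(b, tol=1e-10, accel="cg")      # AMG-preconditioned CG
print(np.linalg.norm(b - A @ x))
```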

216 citations


Proceedings ArticleDOI
08 Jul 2006
TL;DR: The covariance matrix adaptation (CMA) with rank-one update is introduced into the (1+1)-evolution strategy, and an incremental Cholesky update for the covariance matrix is developed, replacing the computationally demanding and numerically involved decomposition of the covariance matrix.
Abstract: First, the covariance matrix adaptation (CMA) with rank-one update is introduced into the (1+1)-evolution strategy. An improved implementation of the 1/5-th success rule is proposed for step size adaptation, which replaces cumulative path length control. Second, an incremental Cholesky update for the covariance matrix is developed, replacing the computationally demanding and numerically involved decomposition of the covariance matrix. The Cholesky update can replace the decomposition only for the update without evolution path, and it reduces the computational effort from O(n³) to O(n²). The resulting (1+1)-Cholesky-CMA-ES is an elegant algorithm and perhaps the simplest evolution strategy with covariance matrix and step size adaptation. Simulations compare the introduced algorithms to previously published CMA versions.
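
The key lemma admits a short implementation: if C = AA^T is updated as αC + β(Az)(Az)^T, the Cholesky-like factor A can be updated directly in O(n²). A sketch following the update formula from the paper (notation adapted, correctness checked numerically):

```python
import numpy as np

def chol_rank_one_update(A, z, alpha, beta):
    """Given C = A A^T, return A' with A' A'^T = alpha*C + beta*(Az)(Az)^T.
    Costs O(n^2), avoiding a fresh O(n^3) decomposition of C."""
    z2 = z @ z
    coeff = (np.sqrt(alpha) / z2) * (np.sqrt(1.0 + (beta / alpha) * z2) - 1.0)
    return np.sqrt(alpha) * A + coeff * np.outer(A @ z, z)

rng = np.random.default_rng(0)
n = 6
A = np.linalg.cholesky(np.eye(n))        # start from C = I
z = rng.standard_normal(n)
alpha, beta = 0.9, 0.1

A_new = chol_rank_one_update(A, z, alpha, beta)
C_direct = alpha * (A @ A.T) + beta * np.outer(A @ z, A @ z)
print(np.allclose(A_new @ A_new.T, C_direct))   # True: factor updated directly
```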

185 citations


Proceedings ArticleDOI
22 Mar 2006
TL;DR: The results prove that there exists a single O(k log n) × n measurement matrix such that any such signal can be reconstructed from these measurements, with error at most O(1) times the worst case error for the class of such signals.
Abstract: In sparse approximation theory, the fundamental problem is to reconstruct a signal A ∈ R^n from linear measurements ⟨A, ψ_i⟩ with respect to a dictionary of ψ_i's. Recently, there is focus on the novel direction of Compressed Sensing where the reconstruction can be done with very few (O(k log n)) linear measurements over a modified dictionary if the signal is compressible, that is, its information is concentrated in k coefficients with the original dictionary. In particular, the results prove that there exists a single O(k log n) × n measurement matrix such that any such signal can be reconstructed from these measurements, with error at most O(1) times the worst case error for the class of such signals. Compressed sensing has generated tremendous excitement both because of the sophisticated underlying mathematics and because of its potential applications. In this paper, we address outstanding open problems in Compressed Sensing. Our main result is an explicit construction of a non-adaptive measurement matrix and the corresponding reconstruction algorithm so that with a number of measurements polynomial in k, log n, 1/ε, we can reconstruct compressible signals. This is the first known polynomial time explicit construction of any such measurement matrix. In addition, our result improves the error guarantee from O(1) to 1 + ε and improves the reconstruction time from poly(n) to poly(k log n). Our second result is a randomized construction of O(k polylog(n)) measurements that work for each signal with high probability and gives per-instance approximation guarantees rather than over the class of all signals. Previous work on compressed sensing does not provide such per-instance approximation guarantees; our result improves the best known number of measurements from prior work in other areas including learning theory, streaming algorithms and complexity theory for this case. Our approach is combinatorial. In particular, we use two parallel sets of group tests, one to filter and the other to certify and estimate; the resulting algorithms are quite simple to implement.
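
The headline regime can be illustrated with a generic compressed-sensing experiment: recover a k-sparse signal from O(k log n) random measurements. The sketch below uses random Gaussian measurements and orthogonal matching pursuit, not the paper's explicit combinatorial construction:

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedy recovery of a k-sparse signal."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n, k = 1024, 10
m = int(4 * k * np.log(n))                       # O(k log n) measurements
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random, not explicit
print(np.linalg.norm(omp(Phi, Phi @ x, k) - x))  # near-exact recovery
```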

179 citations


Journal ArticleDOI
TL;DR: A modified real genetic algorithm for the synthesis of sparse linear arrays optimizes the element positions to reduce the peak sidelobe level (PSLL) of the array; simulated results confirm the efficiency and robustness of the algorithm.

Abstract: This paper describes a modified real genetic algorithm (MGA) for the synthesis of sparse linear arrays. The MGA is utilized to optimize the element positions to reduce the peak sidelobe level (PSLL) of the array, under multiple optimization constraints that include the number of elements, the aperture, and the minimum element spacing. Unlike a standard GA, which uses a fixed correspondence between the gene variables and their coding, the MGA resets the coding of the gene variables to avoid infeasible solutions during the optimization process. The proposed approach also reduces the size of the GA's search space by means of an indirect description of the individual. Simulated results confirming the efficiency and robustness of this algorithm are provided in this paper.
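
The fitness function such a GA optimizes is easy to state: the peak sidelobe level of the array factor as a function of element positions. A sketch of that objective (uniform excitation, a crude mainlobe mask, illustrative positions):

```python
import numpy as np

def psll_db(pos_wl, n_angles=4096):
    """Peak sidelobe level (dB) of a uniformly excited linear array whose
    element positions pos_wl are given in wavelengths."""
    u = np.linspace(-1.0, 1.0, n_angles)                 # u = sin(theta)
    af = np.abs(np.exp(2j * np.pi * np.outer(u, pos_wl)).sum(axis=1))
    af_db = 20.0 * np.log10(af / af.max() + 1e-12)
    mainlobe = np.abs(u) < 2.0 / np.ptp(pos_wl)          # crude mainlobe mask
    return af_db[~mainlobe].max()

pos = np.array([0.0, 0.7, 1.9, 3.2, 4.1, 5.5])   # aperiodic element layout
print(f"PSLL = {psll_db(pos):.1f} dB")           # the GA would minimize this
```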

Journal ArticleDOI
TL;DR: A fast direct solver for certain classes of dense structured linear systems that works by first converting the given dense system to a larger system of block sparse equations and then uses standard sparse direct solvers.
Abstract: In this paper we present a fast direct solver for certain classes of dense structured linear systems that works by first converting the given dense system to a larger system of block sparse equations and then uses standard sparse direct solvers. The kind of matrix structures that we consider are induced by numerical low rank in the off-diagonal blocks of the matrix and are related to the structures exploited by the fast multipole method (FMM) of Greengard and Rokhlin. The special structure that we exploit in this paper is captured by what we term the hierarchically semiseparable (HSS) representation of a matrix. Numerical experiments indicate that the method is probably backward stable.
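
The structural premise, numerical low rank in off-diagonal blocks, is easy to observe directly. A sketch with a smooth kernel matrix (the kernel is illustrative; the HSS machinery itself is not reproduced here):

```python
import numpy as np

# A kernel matrix on 1-D points: off-diagonal blocks couple well-separated
# point sets and are therefore numerically low rank.
x = np.linspace(0.0, 1.0, 400)
K = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

B = K[:200, 200:]                               # an off-diagonal block
U, s, Vt = np.linalg.svd(B)
r = int(np.sum(s > 1e-10 * s[0]))
print(f"numerical rank of the 200x200 block: {r}")      # far below 200

B_lr = (U[:, :r] * s[:r]) @ Vt[:r]              # truncated-SVD compression
print(np.linalg.norm(B - B_lr) / np.linalg.norm(B))
```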

Journal ArticleDOI
TL;DR: Three algorithms for the model reduction of large-scale, continuous-time, time-invariant, linear, dynamical systems with a sparse or structured transition matrix and a small number of inputs and outputs are described.

Book ChapterDOI
18 Jun 2006
TL;DR: This work presents the "Iterative Solver Template Library" (ISTL), which applies generic programming in C++ to the domain of iterative solvers of linear systems stemming from finite element discretizations, and presents efficient solvers that use the recursive block structure via template metaprogramming.
Abstract: The numerical solution of partial differential equations frequently requires the solution of large and sparse linear systems. Using generic programming techniques in C++ one can create solver libraries that allow efficient realization of "fine grained interfaces", i.e. with functions consisting only of a few lines, like access to individual matrix entries. This prevents code replication and allows programmers to work more efficiently. We present the "Iterative Solver Template Library" (ISTL) which is part of the "Distributed and Unified Numerics Environment" (DUNE). It applies generic programming in C++ to the domain of iterative solvers of linear systems stemming from finite element discretizations. Those discretizations exhibit a lot of structure. Our matrix and vector interface supports a block recursive structure. Each sparse matrix entry can itself be either a sparse or a small dense matrix. Based on this interface we present efficient solvers that use the recursive block structure via template metaprogramming.
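
The "each sparse entry is itself a small dense matrix" layout corresponds to block compressed sparse row storage. A sketch using SciPy's BSR format as a stand-in for ISTL's C++ template machinery:

```python
import numpy as np
import scipy.sparse as sp

# Block CSR: sparsity over 2x2 blocks, each stored entry itself a small
# dense matrix (e.g. the unknowns coupled at one mesh node).
indptr = np.array([0, 2, 3, 4])                   # block-row pointers
indices = np.array([0, 2, 1, 2])                  # block-column indices
blocks = np.arange(16, dtype=float).reshape(4, 2, 2)

A = sp.bsr_matrix((blocks, indices, indptr), shape=(6, 6))
print(A.toarray())
print(A @ np.ones(6))      # SpMV operates directly on the blocked layout
```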

Journal ArticleDOI
TL;DR: In this paper, a factorization-splitting scheme using two substeps was proposed to decompose the generalized Crank-Nicolson (CN) matrix into two simple matrices with the terms not factored confined to one sub-step.
Abstract: When a finite-difference time-domain (FDTD) method is constructed by applying the Crank-Nicolson (CN) scheme to discretize Maxwell's equations, a huge sparse irreducible matrix results, which cannot be solved efficiently. This paper proposes a factorization-splitting scheme using two substeps to decompose the generalized CN matrix into two simple matrices with the terms not factored confined to one sub-step. Two unconditionally stable methods are developed: one has the same numerical dispersion relation as the alternating-direction implicit FDTD method, and the other has a much more isotropic numerical velocity. The limit on the time-step size to avoid numerical attenuation is investigated, and is shown to be below the Nyquist sampling rate. The intrinsic temporal numerical dispersion is discussed, which is the fundamental accuracy limit of the methods.
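
The algebra behind factorization splitting can be checked in a few lines: replacing I − a(A + B) by (I − aA)(I − aB) introduces exactly the cross term a²AB, while each factor stays simple to invert. A generic numerical sketch (random matrices standing in for the Maxwell operators):

```python
import numpy as np

rng = np.random.default_rng(0)
n, a = 50, 0.01                         # a plays the role of dt/2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

M = np.eye(n) - a * (A + B)             # generalized Crank-Nicolson matrix
M_split = (np.eye(n) - a * A) @ (np.eye(n) - a * B)   # two cheap substeps

# The only difference is the cross term a^2 * A @ B, so the splitting is
# second-order consistent while each factor stays simple to invert.
print(np.linalg.norm(M_split - M))
print(np.linalg.norm(a**2 * (A @ B)))   # identical by construction
```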

Proceedings ArticleDOI
28 Jun 2006
TL;DR: This paper presents an approach to improving the performance of matrix-vector product based on lossless compression of the index information commonly stored in sparse matrix representations, and two compressed formats are given, along with experimental results demonstrating their effectiveness.
Abstract: Sparse matrix computations are important for many scientific computations, with matrix-vector multiplication being a fundamental operation for modern iterative algorithms. For large sparse matrices, the primary performance limitation on matrix-vector product is memory bandwidth, rather than algorithm performance. In fact, the wide disparity between memory bandwidth and CPU performance suggests that one could trade cycles for bandwidth and still improve the time to compute a matrix-vector product. Accordingly, this paper presents an approach to improving the performance of matrix-vector product based on lossless compression of the index information commonly stored in sparse matrix representations. Two compressed formats, and their multiplication algorithms, are given, along with experimental results demonstrating their effectiveness. For an assortment of large sparse matrices, compression ratios and corresponding speedups of up to 30% are achieved. The efficiency of the compression algorithm allows its cost to be easily amortized across repeated matrix-vector products.
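
A toy version of the idea: delta-encode the CSR column indices within each row (the resulting small deltas are what a byte-level code would then pack), and decode them on the fly during the multiply. The encoding below is illustrative, not one of the paper's two formats:

```python
import numpy as np
import scipy.sparse as sp

def delta_encode(A):
    """First-order differences of CSR column indices within each row; the
    small deltas are what a byte-oriented code would then pack tightly."""
    deltas = A.indices.astype(np.int64).copy()
    for i in range(A.shape[0]):
        s, e = A.indptr[i], A.indptr[i + 1]
        deltas[s + 1:e] -= A.indices[s:e - 1]
    return deltas

def spmv_delta(indptr, deltas, data, x):
    y = np.zeros(len(indptr) - 1)
    for i in range(len(indptr) - 1):
        col = 0
        for p in range(indptr[i], indptr[i + 1]):
            col = deltas[p] if p == indptr[i] else col + deltas[p]
            y[i] += data[p] * x[col]            # decode while multiplying
    return y

A = sp.random(200, 200, density=0.05, random_state=0, format="csr")
x = np.ones(200)
print(np.allclose(spmv_delta(A.indptr, delta_encode(A), A.data, x), A @ x))
```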

Book ChapterDOI
28 Aug 2006
TL;DR: A simple random-sampling based procedure for producing sparse matrix approximations that computes the sparse matrix approximation in a single pass over the data, leading to much savings in space.
Abstract: We describe a simple random-sampling based procedure for producing sparse matrix approximations. Our procedure and analysis are extremely simple: the analysis uses nothing more than the Chernoff-Hoeffding bounds. Despite the simplicity, the approximation is comparable and sometimes better than previous work. Our algorithm computes the sparse matrix approximation in a single pass over the data. Further, most of the entries in the output matrix are quantized, and can be succinctly represented by a bit vector, thus leading to much savings in space.
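
The generic keep-with-probability-p recipe behind such schemes fits in a few lines: sample entries independently and rescale by 1/p so the sparse matrix is unbiased (a sketch of the idea; the paper's sampling probabilities and quantization are more refined):

```python
import numpy as np
import scipy.sparse as sp

def sparsify(A, p, rng=np.random.default_rng(0)):
    """Keep each entry independently with probability p, rescaled by 1/p,
    so the sparse matrix is an unbiased estimate of A (single pass)."""
    mask = rng.random(A.shape) < p
    return sp.csr_matrix(np.where(mask, A / p, 0.0))

A = np.random.default_rng(1).standard_normal((500, 500))
A_s = sparsify(A, p=0.1)
print(A_s.nnz / A.size)                                       # about 0.1
# Spectral-norm error stays moderate despite dropping ~90% of entries.
print(np.linalg.norm(A_s.toarray() - A, 2) / np.linalg.norm(A, 2))
```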

Posted Content
TL;DR: This work develops a new collaborative filtering method that combines both previously known users' preferences, i.e. standard CF, and product/user attributes, i.e. classical function approximation, to predict a given user's interest in a particular product.
Abstract: We develop a new collaborative filtering (CF) method that combines both previously known users' preferences, i.e. standard CF, as well as product/user attributes, i.e. classical function approximation, to predict a given user's interest in a particular product. Our method is a generalized low rank matrix completion problem, where we learn a function whose inputs are pairs of vectors, the standard low rank matrix completion problem being a special case where the inputs to the function are the row and column indices of the matrix. We solve this generalized matrix completion problem using tensor product kernels for which we also formally generalize standard kernel properties. Benchmark experiments on movie ratings show the advantages of our generalized matrix completion method over the standard matrix completion one with no information about movies or people, as well as over standard multi-task or single task learning methods.
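
A schematic sketch of the pairwise-kernel view: a tensor product kernel scores (user, item) pairs as the product of a user kernel and an item kernel, and kernel ridge regression on the observed entries then predicts unseen pairs (synthetic attributes and ratings, hyperparameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 30, 20, 5
U = rng.standard_normal((n_users, d))        # user attribute vectors
M = rng.standard_normal((n_items, d))        # item attribute vectors

def rbf(X, Y):
    return np.exp(-0.5 * ((X[:, None] - Y[None]) ** 2).sum(-1))

# Observed (user, item, rating) triples; ratings synthetic for the sketch.
iu = rng.integers(n_users, size=200)
im = rng.integers(n_items, size=200)
y = np.array([U[u] @ M[m] for u, m in zip(iu, im)])

# Tensor product kernel on pairs: k((u,m),(u',m')) = k_U(u,u') * k_M(m,m').
K = rbf(U, U)[np.ix_(iu, iu)] * rbf(M, M)[np.ix_(im, im)]
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(y)), y)    # kernel ridge fit

u0, m0 = 0, 7                                            # an unseen pair
k_star = rbf(U, U)[iu, u0] * rbf(M, M)[im, m0]
print(k_star @ alpha, U[u0] @ M[m0])                     # prediction vs truth
```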

Proceedings ArticleDOI
01 Sep 2006
TL;DR: This work presents an extension to NMF that is convolutive and includes a sparseness constraint, and in combination with a spectral magnitude transform, this method discovers auditory objects and their associated sparse activation patterns.
Abstract: Discovering a representation which allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by non-negative matrix factorisation (NMF), a method for finding parts-based representations of non-negative data. We present an extension to NMF that is convolutive and includes a sparseness constraint. In combination with a spectral magnitude transform, this method discovers auditory objects and their associated sparse activation patterns.

Journal ArticleDOI
TL;DR: The Mad package described here facilitates the evaluation of first derivatives of multidimensional functions that are defined by computer codes written in MATLAB through the separation of the linear combination of derivative vectors into a separate derivative vector class derivvec.
Abstract: The Mad package described here facilitates the evaluation of first derivatives of multidimensional functions that are defined by computer codes written in MATLAB. The underlying algorithm is the well-known forward mode of automatic differentiation implemented via operator overloading on variables of the class fmad. The main distinguishing feature of this MATLAB implementation is the separation of the linear combination of derivative vectors into a separate derivative vector class derivvec. This allows for the straightforward performance optimization of the overall package. Additionally, by internally using a matrix (two-dimensional) representation of arbitrary dimension directional derivatives, we may utilize MATLAB's sparse matrix class to propagate sparse directional derivatives for MATLAB code which uses arbitrary dimension arrays. On several examples, the package is shown to be more efficient than Verma's ADMAT package [Verma 1998a].
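
The underlying mechanism, forward-mode AD by operator overloading, can be miniaturized in Python (a toy stand-in for the fmad/derivvec classes, scalar-valued and supporting only a couple of operations):

```python
import math

class Fmad:
    """Toy forward-mode AD value: carries f(x) and f'(x) together and
    propagates derivatives through overloaded operators."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, o):
        return o if isinstance(o, Fmad) else Fmad(o)
    def __add__(self, o):
        o = self._lift(o)
        return Fmad(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = self._lift(o)
        return Fmad(self.val * o.val,
                    self.der * o.val + self.val * o.der)   # product rule
    __rmul__ = __mul__
    def sin(self):
        return Fmad(math.sin(self.val), math.cos(self.val) * self.der)

x = Fmad(1.5, 1.0)               # seed dx/dx = 1
y = 3 * x * x + x.sin()          # y = 3x^2 + sin(x)
print(y.val, y.der)              # y.der == 6*1.5 + cos(1.5)
```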

Journal ArticleDOI
TL;DR: Analysis using EXIT charts shows that the TDMP algorithm offers a better performance-complexity tradeoff when the number of decoding iterations is small, which is attractive for high-speed applications.
Abstract: A turbo-decoding message-passing (TDMP) algorithm for sparse parity-check matrix (SPCM) codes such as low-density parity-check, repeat-accumulate, and turbo-like codes is presented. The main advantages of the proposed algorithm over the standard decoding algorithm are 1) its faster convergence speed by a factor of two in terms of decoding iterations, 2) improvement in coding gain by an order of magnitude at high signal-to-noise ratio (SNR), 3) reduced memory requirements, and 4) reduced decoder complexity. In addition, an efficient algorithm for message computation using simple "max" operations is also presented. Analysis using EXIT charts shows that the TDMP algorithm offers a better performance-complexity tradeoff when the number of decoding iterations is small, which is attractive for high-speed applications. A parallel version of the TDMP algorithm, in conjunction with architecture-aware (AA) SPCM codes, which have embedded structure that enables efficient high-throughput decoder implementation, is presented. Design examples of AA-SPCM codes based on graphs with large girth demonstrate that AA-SPCM codes have very good error-correcting capability using the TDMP algorithm.

Proceedings ArticleDOI
03 Apr 2006
TL;DR: This paper argues that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields, and shows that the interpreted storage approach dominates in query efficiency and ease-of-use over the current horizontal storage and vertical schema approaches over a wide range of queries.
Abstract: "Sparse" data, in which relations have many attributes that are null for most tuples, presents a challenge for relational database management systems. If one uses the normal "horizontal" schema to store such data sets in any of the three leading commercial RDBMS, the result is tables that occupy vast amounts of storage, most of which is devoted to nulls. If one attempts to avoid this storage blowup by using a "vertical" schema, the storage utilization is indeed better, but query performance is orders of magnitude slower for certain classes of queries. In this paper, we argue that the proper way to handle sparse data is not to use a vertical schema, but rather to extend the RDBMS tuple storage format to allow the representation of sparse attributes as interpreted fields. The addition of interpreted storage allows for efficient and transparent querying of sparse data, uniform access to all attributes, and schema scalability. We show, through an implementation in PostgreSQL, that the interpreted storage approach dominates in query efficiency and ease-of-use over the current horizontal storage and vertical schema approaches over a wide range of queries and sparse data sets.

Proceedings ArticleDOI
01 Oct 2006
TL;DR: The mean squared error of estimating each symbol of the input vector using BP is proved to be equal to the MMSE of estimating the same symbol through a scalar Gaussian channel with some degradation in the signal-to-noise ratio (SNR).
Abstract: This paper studies the estimation of a high-dimensional vector signal where the observation is a known "sparse" linear transformation of the signal corrupted by additive Gaussian noise. A paradigm of such a linear system is code-division multiple access (CDMA) channel with sparse spreading matrix. Assuming a "semi-regular" ensemble of sparse matrix linear transformations, where the bi-partite graph describing the system is asymptotically cycle-free, it is shown that belief propagation (BP) achieves the minimum mean-square error (MMSE) in estimating the transformation of the input vector in the large-system limit. The result holds regardless of the distribution and power of the input symbols. Furthermore, the mean squared error of estimating each symbol of the input vector using BP is proved to be equal to the MMSE of estimating the same symbol through a scalar Gaussian channel with some degradation in the signal-to-noise ratio (SNR). The degradation, called the efficiency, is determined from a fixed-point equation due to Guo and Verdu, which is a generalization of Tanaka's formula to arbitrary prior distributions.

Book ChapterDOI
18 Jun 2006
TL;DR: It is argued that many of the tools of high-performance numerical computing can form the nucleus of a robust infrastructure for parallel computing on graphs, and this with an implementation of a graph analysis benchmark using the sparse matrix infrastructure in Star-P, the authors' parallel dialect of the MATLAB programming language.
Abstract: Large-scale computation on graphs and other discrete structures is becoming increasingly important in many applications, including computational biology, web search, and knowledge discovery. High-performance combinatorial computing is an infant field, in sharp contrast with numerical scientific computing. We argue that many of the tools of high-performance numerical computing - in particular, parallel algorithms and data structures for computation with sparse matrices - can form the nucleus of a robust infrastructure for parallel computing on graphs. We demonstrate this with an implementation of a graph analysis benchmark using the sparse matrix infrastructure in Star-P, our parallel dialect of the MATLAB programming language.
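
A small example of the correspondence: breadth-first search expressed as one sparse matrix-vector product per level, the formulation that lets sparse-matrix infrastructure parallelize graph traversal (plain SciPy here rather than Star-P):

```python
import numpy as np
import scipy.sparse as sp

# Breadth-first search as one sparse matrix-vector product per level.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
r, c = zip(*edges)
n = 5
A = sp.csr_matrix((np.ones(len(edges)), (r, c)), shape=(n, n))
A = A + A.T                                  # undirected adjacency

level = -np.ones(n, dtype=int)
level[0] = 0
frontier = np.zeros(n)
frontier[0] = 1.0                            # BFS from vertex 0
for k in range(1, n):
    reached = (A.T @ frontier > 0) & (level < 0)   # expand, skip visited
    if not reached.any():
        break
    level[reached] = k
    frontier = reached.astype(float)
print(level)                                 # BFS distances from vertex 0
```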

Proceedings ArticleDOI
24 Apr 2006
TL;DR: This work introduces a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network.
Abstract: Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this "memory wall," we introduce a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading-activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order of magnitude speedup per FPGA compared to a state-of-the-art Pentium processor

Journal ArticleDOI
TL;DR: Simulations of polycrystalline grain growth with a conventional phase field method and with sparse data structures are compared, and it is shown that memory usage and simulation time are independent of the number of order parameters when a sparse data structure is used.
Abstract: The concepts of sparse data structures and related algorithms for phase field simulations are discussed. Simulations of polycrystalline grain growth with a conventional phase field method and with sparse data structures are compared. It is shown that memory usage and simulation time scale with the number of nodes but are independent of the number of order parameters when a sparse data structure is used.
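
The bookkeeping can be sketched as a dict of active order parameters per node, so storage grows with the number of nodes rather than with the number of grains (threshold and values hypothetical):

```python
# Per-node sparse storage: keep only the order parameters that are
# nonnegligible at each grid node (values and threshold hypothetical).
EPS = 1e-6

grid = [dict() for _ in range(10_000)]   # node -> {grain_id: value}
grid[42] = {3: 0.7, 17: 0.3}             # a node on a 3/17 grain boundary
grid[43] = {17: 1.0}                     # a node in the interior of grain 17

def prune(node):
    """Drop entries that decayed below threshold, then renormalize."""
    kept = {g: v for g, v in node.items() if v > EPS}
    total = sum(kept.values())
    return {g: v / total for g, v in kept.items()} if total else {}

grid[42] = prune(grid[42])
# Memory scales with nodes times local grains, not with total grain count.
print(grid[42], len(grid[43]))
```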

Journal ArticleDOI
TL;DR: An Arnoldi-based method is proposed for solving large and sparse Sylvester matrix equations with low-rank right-hand sides, along with a way to extract low-rank approximate solutions via a matrix Krylov subspace method.

Journal ArticleDOI
01 Jan 2006
TL;DR: The problem of reducing the bandwidth of sparse matrices by permuting rows and columns is addressed and solved using a hybrid ant system to generate high-quality renumbering which is refined by a hill climbing local search heuristic.
Abstract: In this work, the problem of reducing the bandwidth of sparse matrices by permuting rows and columns is addressed and solved using a hybrid ant system to generate high-quality renumbering which is refined by a hill climbing local search heuristic. Computational experiments compare the algorithm with the well-known GPS algorithm, as well as recently proposed methods. These show the new approach to be as good as current best algorithms. In addition, an algorithm to randomly generate matrices with known optimal bandwidth is developed and used to evaluate results. Comparisons show that the new algorithm was able to find either the optimal solution or a solution very close to the optimal for most instances.
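
The objective being minimized, and a classical baseline to compare against, fit in a few lines: matrix bandwidth before and after a reverse Cuthill-McKee renumbering (RCM is only a baseline here; the paper's hybrid ant system is not reproduced):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """max |i - j| over nonzeros: the quantity a renumbering minimizes."""
    coo = A.tocoo()
    return int(np.abs(coo.row - coo.col).max())

A = sp.random(300, 300, density=0.02, random_state=0, format="csr")
A = (A + A.T).tocsr()                       # symmetric sparsity pattern

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_perm = A[perm][:, perm]                   # apply the renumbering
print(bandwidth(A), "->", bandwidth(A_perm))
```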

Journal ArticleDOI
TL;DR: In this paper, the authors consider two finite element model updating problems, which incorporate the measured modal data into the analytical model, producing an adjusted model of the (mass,) damping and stiffness that closely matches the experimental modal data.
Abstract: We consider two finite element model updating problems, which incorporate the measured modal data into the analytical finite element model, producing an adjusted model of the (mass,) damping and stiffness that closely matches the experimental modal data. We develop two efficient numerical algorithms for solving these problems. The new algorithms are direct methods that require O(nk²) and O(nk² + k⁶) flops, respectively, and employ sparse matrix techniques when the analytic model is sparse. Here n is the dimension of the coefficient matrices defining the analytical model, and k is the number of measured eigenpairs.