
Showing papers on "Sparse matrix published in 2005"


Journal ArticleDOI
TL;DR: The Scalable Library for Eigenvalue Problem Computations (SLEPc) is a software library for computing a few eigenvalues and associated eigenvectors of a large sparse matrix or matrix pencil that has been developed on top of PETSc and enforces the same programming paradigm.
Abstract: The Scalable Library for Eigenvalue Problem Computations (SLEPc) is a software library for computing a few eigenvalues and associated eigenvectors of a large sparse matrix or matrix pencil. It has been developed on top of PETSc and enforces the same programming paradigm. The emphasis of the software is on methods and techniques appropriate for problems in which the associated matrices are sparse, for example, those arising after the discretization of partial differential equations. Therefore, most of the methods offered by the library are projection methods such as Arnoldi or Lanczos, or other methods with similar properties. SLEPc provides basic methods as well as more sophisticated algorithms. It also provides built-in support for spectral transformations such as the shift-and-invert technique. SLEPc is a general library in the sense that it covers standard and generalized eigenvalue problems, both Hermitian and non-Hermitian, with either real or complex arithmetic. SLEPc can be easily applied to real world problems. To illustrate this, several case studies arising from real applications are presented and solved with SLEPc with little programming effort. The addressed problems include a matrix-free standard problem, a complex generalized problem, and a singular value decomposition. The implemented codes exhibit good properties regarding flexibility as well as parallel performance.

974 citations
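SLEPc itself is a C library built on PETSc, but the shift-and-invert spectral transformation it supports can be illustrated with SciPy's ARPACK wrapper. A minimal sketch on a 1D Laplacian (the matrix and shift are illustrative choices, not from the paper):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 1000
# Standard sparse test matrix from a PDE discretization: the 1D Laplacian.
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')

# Shift-and-invert around sigma = 0: the Lanczos iteration converges to the
# eigenvalues nearest the shift, here the four smallest.
vals, vecs = eigsh(A, k=4, sigma=0.0, which='LM')
print(vals)
```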


Journal ArticleDOI
01 Jan 2005
TL;DR: An overview of OSKI is provided, which is based on research on automatically tuned sparse kernels for modern cache-based superscalar machines, and the primary aim of this interface is to hide the complex decision-making process needed to tune the performance of a kernel implementation for a particular user's sparse matrix and machine.
Abstract: The Optimized Sparse Kernel Interface (OSKI) is a collection of low-level primitives that provide automatically tuned computational kernels on sparse matrices, for use by solver libraries and applications. These kernels include sparse matrix-vector multiply and sparse triangular solve, among others. The primary aim of this interface is to hide the complex decision-making process needed to tune the performance of a kernel implementation for a particular user's sparse matrix and machine, while also exposing the steps and potentially non-trivial costs of tuning at run-time. This paper provides an overview of OSKI, which is based on our research on automatically tuned sparse kernels for modern cache-based superscalar machines.

546 citations
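OSKI's C API is not reproduced here, but the core tuning decision it hides - trading extra stored zeros ("fill") against faster register-blocked kernels - can be mimicked with SciPy's BSR format. The block-size candidates and the 1.5x fill threshold below are illustrative assumptions:

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(120, 120, density=0.05, format='csr',
              random_state=np.random.default_rng(0))

def fill_ratio(A, r, c):
    """Stored scalars per true nonzero when A is forced into r-by-c blocks."""
    B = A.tobsr(blocksize=(r, c))
    return B.data.size / A.nnz

# OSKI weighs a machine-specific speed profile against this fill ratio;
# here we simply take the largest block whose overhead stays below 1.5x.
candidates = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 6)]
best = max((rc for rc in candidates if fill_ratio(A, *rc) < 1.5),
           key=lambda rc: rc[0] * rc[1])
A_tuned = A.tobsr(blocksize=best)
```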


Journal ArticleDOI
TL;DR: This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem, and addresses the most basic design problem: constructing tight frames with prescribed vector norms.
Abstract: Tight frames, also known as general Welch-bound-equality sequences, generalize orthonormal systems. Numerous applications - including communications, coding, and sparse approximation - require finite-dimensional tight frames that possess additional structural properties. This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem. To apply this method, one needs only to solve a matrix nearness problem that arises naturally from the design specifications. Therefore, it is fast and easy to develop versions of the algorithm that target new design problems. Alternating projection will often succeed even if algebraic constructions are unavailable. To demonstrate that alternating projection is an effective tool for frame design, the paper studies some important structural properties in detail. First, it addresses the most basic design problem: constructing tight frames with prescribed vector norms. Then, it discusses equiangular tight frames, which are natural dictionaries for sparse approximation. Finally, it examines tight frames whose individual vectors have low peak-to-average-power ratio (PAR), which is a valuable property for code-division multiple-access (CDMA) applications. Numerical experiments show that the proposed algorithm succeeds in each of these three cases. The appendices investigate the convergence properties of the algorithm.

496 citations
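For the most basic case in the paper - unit-norm tight frames - the alternating projection loop is only a few lines. This NumPy sketch (dimensions and iteration count are arbitrary choices) alternates between the prescribed-norm constraint and the nearest tight frame, obtained by flattening the singular values:

```python
import numpy as np

d, N = 4, 7
X = np.random.default_rng(1).standard_normal((d, N))

for _ in range(500):
    X /= np.linalg.norm(X, axis=0)              # project: unit column norms
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    X = np.sqrt(N / d) * U @ Vt                 # project: nearest tight frame

# A unit-norm tight frame satisfies X @ X.T = (N/d) * I.
print(np.linalg.norm(X @ X.T - (N / d) * np.eye(d)))
```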


Journal ArticleDOI
TL;DR: An overview of the algorithms, design philosophy, and implementation techniques in the software SuperLU, for solving sparse unsymmetric linear systems, with examples of how the solver has been used in large-scale scientific applications and the performance achieved.
Abstract: We give an overview of the algorithms, design philosophy, and implementation techniques in the software SuperLU, for solving sparse unsymmetric linear systems. In particular, we highlight the differences between the sequential SuperLU (including its multithreaded extension) and parallel SuperLU_DIST. These include the numerical pivoting strategy, the ordering strategy for preserving sparsity, the ordering in which the updating tasks are performed, the numerical kernel, and the parallelization strategy. Because of the scalability concern, the parallel code is drastically different from the sequential one. We describe the user interfaces of the libraries, and illustrate how to use the libraries most efficiently depending on some matrix characteristics. Finally, we give some examples of how the solver has been used in large-scale scientific applications and report the performance achieved.

371 citations
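The sequential SuperLU described here is, in fact, what SciPy wraps in scipy.sparse.linalg.splu, so the solver and its sparsity-preserving column orderings can be exercised directly. The test matrix below is an arbitrary unsymmetric example:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 500
A = (sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))
     + sp.random(n, n, density=0.001,
                 random_state=np.random.default_rng(2))).tocsc()

lu = splu(A, permc_spec='COLAMD')   # COLAMD column ordering preserves sparsity
x = lu.solve(np.ones(n))
print(np.linalg.norm(A @ x - 1.0))  # residual of A x = [1, ..., 1]
```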


Proceedings ArticleDOI
13 Jun 2005
TL;DR: This paper presents a system for fully automatic recognition and reconstruction of 3D objects in image databases, using invariant local features to find matches between all images, and the RANSAC algorithm to find those that are consistent with the fundamental matrix.
Abstract: This paper presents a system for fully automatic recognition and reconstruction of 3D objects in image databases. We pose the object recognition problem as one of finding consistent matches between all images, subject to the constraint that the images were taken from a perspective camera. We assume that the objects or scenes are rigid. For each image, we associate a camera matrix, which is parameterised by rotation, translation and focal length. We use invariant local features to find matches between all images, and the RANSAC algorithm to find those that are consistent with the fundamental matrix. Objects are recognised as subsets of matching images. We then solve for the structure and motion of each object, using a sparse bundle adjustment algorithm. Our results demonstrate that it is possible to recognise and reconstruct 3D objects from an unordered image database with no user input at all.

304 citations
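The geometric-verification step - keep only feature matches consistent with a fundamental matrix under RANSAC - maps directly onto OpenCV. This sketch assumes OpenCV is available, uses ORB in place of the paper's invariant local features, and reads hypothetical file names:

```python
import cv2
import numpy as np

img1 = cv2.imread('view1.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical images
img2 = cv2.imread('view2.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(d1, d2)
p1 = np.float32([k1[m.queryIdx].pt for m in matches])
p2 = np.float32([k2[m.trainIdx].pt for m in matches])

# RANSAC keeps only matches consistent with a single fundamental matrix.
F, inliers = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, 3.0, 0.99)
print('geometrically consistent matches:', int(inliers.sum()))
```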


Proceedings ArticleDOI
18 Mar 2005
TL;DR: A greedy pursuit algorithm called simultaneous orthogonal matching pursuit is presented, and it is proved that the algorithm calculates simultaneous approximations whose error is within a constant factor of the optimal simultaneous approximation error.

Abstract: A simple sparse approximation problem requests an approximation of a given input signal as a linear combination of T elementary signals drawn from a large, linearly dependent collection. An important generalization is simultaneous sparse approximation. Now one must approximate several input signals at once using different linear combinations of the same T elementary signals. This formulation appears, for example, when analyzing multiple observations of a sparse signal that have been contaminated with noise. A new approach to this problem is presented here: a greedy pursuit algorithm called simultaneous orthogonal matching pursuit. The paper proves that the algorithm calculates simultaneous approximations whose error is within a constant factor of the optimal simultaneous approximation error. This result requires that the collection of elementary signals be weakly correlated, a property that is also known as incoherence. Numerical experiments demonstrate that the algorithm often succeeds, even when the inputs do not meet the hypotheses of the proof.

301 citations
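The algorithm itself is compact. A NumPy sketch, assuming a dictionary with unit-norm columns and aggregating correlations by their absolute sum (one plausible choice):

```python
import numpy as np

def somp(Phi, Y, T):
    """Simultaneous OMP: Phi is (m, n) with unit-norm columns, Y is (m, K)."""
    idx, R = [], Y.copy()
    for _ in range(T):
        scores = np.abs(Phi.T @ R).sum(axis=1)   # aggregate over all channels
        scores[idx] = -np.inf                    # never reselect an atom
        idx.append(int(np.argmax(scores)))
        # Refit all K signals jointly on the atoms selected so far.
        C, *_ = np.linalg.lstsq(Phi[:, idx], Y, rcond=None)
        R = Y - Phi[:, idx] @ C
    return idx, C
```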


Journal ArticleDOI
TL;DR: A unifying framework for the graph models of the variant matrix estimation problems is presented, based upon the viewpoint that a partition of a matrix into structurally orthogonal groups of columns corresponds to a distance-2 coloring of an appropriate graph representation.

Abstract: Graph coloring has been employed since the 1980s to efficiently compute sparse Jacobian and Hessian matrices using either finite differences or automatic differentiation. Several coloring problems occur in this context, depending on whether the matrix is a Jacobian or a Hessian, and on the specifics of the computational techniques employed. We consider eight variant vertex coloring problems here. This article begins with a gentle introduction to the problem of computing a sparse Jacobian, followed by an overview of the historical development of the research area. Then we present a unifying framework for the graph models of the variant matrix estimation problems. The framework is based upon the viewpoint that a partition of a matrix into structurally orthogonal groups of columns corresponds to a distance-2 coloring of an appropriate graph representation. The unified framework helps integrate earlier work and leads to fresh insights; enables the design of more efficient algorithms for many problems; leads to new algorithms for others; and eases the task of building graph models for new problems. We report computational results on two of the coloring problems to support our claims. Most of the methods for these problems treat a column or a row of a matrix as an atomic entity, and partition the columns or rows (or both). A brief review of methods that do not fit these criteria is provided. We also discuss results in discrete mathematics and theoretical computer science that intersect with the topics considered here.

291 citations
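The central notion is easy to state in code: two columns can share a group (color) exactly when they touch no common row, and each group then costs a single function evaluation. A greedy sketch, phrased directly on the sparsity pattern rather than on the graph:

```python
import numpy as np

def structurally_orthogonal_groups(S):
    """S: scipy.sparse bool pattern of the Jacobian; returns a color per column."""
    S = S.tocsc()
    color = -np.ones(S.shape[1], dtype=int)
    covered = []                                  # rows already hit, per group
    for j in range(S.shape[1]):
        rows_j = set(S.indices[S.indptr[j]:S.indptr[j + 1]])
        for c, rows_c in enumerate(covered):
            if not rows_j & rows_c:               # no shared row: group works
                rows_c |= rows_j
                color[j] = c
                break
        else:
            covered.append(set(rows_j))
            color[j] = len(covered) - 1
    return color

# Each group g yields one seed vector (sum of e_j over its columns); a single
# finite-difference product J @ seed then recovers all of that group's columns.
```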


Journal ArticleDOI
TL;DR: It is shown that the singular vectors of the K matrix together with knowledge of the Green function of the background medium in which the targets are embedded lead directly to classical time-reversal based images of the target locations as well as super-resolution images based on a generalized Multiple-Signal-Classification algorithm recently developed for use with the K matrix.
Abstract: The methods employed in time-reversal imaging are applied to radar imaging problems using multistatic data collected from sparse and unstructured phased array antenna systems. The theory is especially suitable to problems involving the detection and tracking (locating) of moving ground targets (MGT) from satellite based phased array antenna systems and locating buried or obscured targets from multistatic data collected from phased array antenna systems mounted on unmanned aerial vehicles (UAV). The theory is based on the singular value decomposition (SVD) of the multistatic data matrix K and applies to general phased array antenna systems whose elements are arbitrarily located in space. It is shown that the singular vectors of the K matrix together with knowledge of the Green function of the background medium in which the targets are embedded lead directly to classical time-reversal based images of the target locations as well as super-resolution images based on a generalized Multiple-Signal-Classification algorithm recently developed for use with the K matrix. The theory is applied in a computer simulation study of the TechSat project whose goal is the location of MGTs from an unstructured and sparse phased array of freely orbiting antennas located above the ionosphere.

269 citations
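A toy version of the imaging chain is straightforward to simulate: build the multistatic matrix K from point targets, take its SVD, and form the MUSIC pseudospectrum from the noise subspace and the background Green's function. Everything below (geometry, wavenumber, free-space Green's function) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2 * np.pi                                   # wavenumber (unit wavelength)
ants = rng.uniform(0, 10, size=(16, 2))         # sparse, unstructured array
targets = np.array([[3.0, 7.0], [6.5, 2.5]])

def green(points):
    """2D free-space Green's function vectors sampled at the antennas."""
    d = np.linalg.norm(ants[:, None, :] - points[None, :, :], axis=2)
    return np.exp(1j * k * d) / np.sqrt(d)

G = green(targets)
K = G @ G.T                                     # idealized multistatic matrix
U, s, _ = np.linalg.svd(K)
noise = U[:, len(targets):]                     # noise subspace of K

xy = np.stack(np.meshgrid(np.linspace(0, 10, 200),
                          np.linspace(0, 10, 200)), -1).reshape(-1, 2)
g = green(xy)
g /= np.linalg.norm(g, axis=0)
music = 1.0 / np.linalg.norm(noise.conj().T @ g, axis=0) ** 2  # peaks at targets
```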


Journal ArticleDOI
TL;DR: Experimental timings of an actual parallel sparse matrix-vector multiplication on an SGI Origin 3800 computer show that a sufficiently large reduction in communication volume leads to savings in execution time.
Abstract: A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimize the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimized. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimizing the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication volume compared to one-dimensional methods, and in general a good balance in the communication work. Experimental timings of an actual parallel sparse matrix-vector multiplication on an SGI Origin 3800 computer show that a sufficiently large reduction in communication volume leads to savings in execution time.

266 citations
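The quantity being minimized is worth pinning down: for a given assignment of nonzeros to processors, each matrix row (and column) whose nonzeros span p distinct processors contributes p - 1 words of communication. A direct sketch of that metric:

```python
import numpy as np

def communication_volume(A, owner):
    """A: scipy.sparse matrix; owner[t] = processor owning the t-th nonzero."""
    A = A.tocoo()
    vol = 0
    for index in (A.row, A.col):       # rows: output vector; columns: input
        for v in np.unique(index):
            vol += len(set(owner[index == v])) - 1
    return vol
```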


Proceedings ArticleDOI
20 Feb 2005
TL;DR: Besides solving the SpMXV problem, the design provides a parameterized and flexible tree-based design for floating-point applications on FPGAs, and demonstrates significant speedup over general-purpose processors, particularly for matrices with very irregular sparsity structure.

Abstract: Floating-point Sparse Matrix-Vector Multiplication (SpMXV) is a key computational kernel in scientific and engineering applications. The poor data locality of sparse matrices significantly reduces the performance of SpMXV on general-purpose processors, which rely heavily on the cache hierarchy to achieve high performance. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of SpMXV. In this paper, we propose an FPGA-based design for SpMXV. Our design accepts sparse matrices in Compressed Row Storage format, and makes no assumptions about the sparsity structure of the input matrix. The design employs IEEE-754 format double-precision floating-point multipliers/adders, and performs multiple floating-point operations as well as I/O operations in parallel. The performance of our design for SpMXV is evaluated using various sparse matrices from the scientific computing community, with the Xilinx Virtex-II Pro XC2VP70 as the target device. The MFLOPS performance increases with the hardware resources on the device as well as the available memory bandwidth. For example, when the memory bandwidth is 8 GB/s, our design achieves over 350 MFLOPS for all the test matrices. It demonstrates significant speedup over general-purpose processors, particularly for matrices with very irregular sparsity structure. Besides solving the SpMXV problem, our design provides a parameterized and flexible tree-based design for floating-point applications on FPGAs.

247 citations
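The reference kernel the design parallelizes is the usual CSR loop; the gather on x is what defeats cache hierarchies on CPUs and what the FPGA's parallel floating-point units absorb instead:

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A in CSR form; no assumption on the sparsity structure."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):                      # one dot product per row
        for t in range(indptr[i], indptr[i + 1]):
            y[i] += data[t] * x[indices[t]]      # irregular gather from x
    return y
```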


Journal ArticleDOI
15 Jan 2005
TL;DR: An efficient method of protein classification using multiple protein networks is proposed, and experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method.
Abstract: Motivation: Support vector machines (SVMs) have been successfully used to classify proteins into functional categories. Recently, to integrate multiple data sources, a semidefinite programming (SDP) based SVM method was introduced. In SDP/SVM, multiple kernel matrices corresponding to each of the data sources are combined with weights obtained by solving an SDP. However, when trying to apply SDP/SVM to large problems, the computational cost can become prohibitive, since both converting the data to a kernel matrix for the SVM and solving the SDP are time and memory demanding. Another application-specific drawback arises when some of the data sources are protein networks. A common method of converting the network to a kernel matrix is the diffusion kernel method, which has time complexity of O(n³), and produces a dense matrix of size n × n. Results: We propose an efficient method of protein classification using multiple protein networks. Available protein networks, such as a physical interaction network or a metabolic network, can be directly incorporated. Vectorial data can also be incorporated after conversion into a network by means of neighbor point connection. Similar to the SDP/SVM method, the combination weights are obtained by convex optimization. Due to the sparsity of network edges, the computation time is nearly linear in the number of edges of the combined network. Additionally, the combination weights provide information useful for discarding noisy or irrelevant networks. Experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method. Availability: Software and data will be available on request. Contact: shin@tuebingen.mpg.de

Journal ArticleDOI
TL;DR: This work describes three possible piecewise multilinear hierarchical interpolation schemes in detail, and documents the features of the sparse grid interpolation software package spinterp for MATLAB.
Abstract: To recover or approximate smooth multivariate functions, sparse grids are superior to full grids due to a significant reduction of the required support nodes. The order of the convergence rate in the maximum norm is preserved up to a logarithmic factor. We describe three possible piecewise multilinear hierarchical interpolation schemes in detail and conduct a numerical comparison. Furthermore, we document the features of our sparse grid interpolation software package spinterp for MATLAB.
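The construction is easy to reproduce for the piecewise-linear case: keep only the hierarchical level combinations whose level sum is small, instead of the full tensor product. A sketch on [0, 1]^d with boundary nodes omitted for brevity (spinterp's own bases differ in detail):

```python
import itertools
import numpy as np

def increment_1d(l):
    """Hierarchical nodes newly added at level l on (0, 1)."""
    return [(2 * i - 1) / 2 ** l for i in range(1, 2 ** (l - 1) + 1)]

def sparse_grid(n, d):
    nodes = []
    for levels in itertools.product(range(1, n + 1), repeat=d):
        if sum(levels) <= n + d - 1:             # the sparse-grid criterion
            nodes += itertools.product(*(increment_1d(l) for l in levels))
    return np.array(nodes)

# 321 nodes versus 3969 interior nodes for the corresponding full 2D grid.
print(len(sparse_grid(6, 2)))
```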

Proceedings ArticleDOI
17 Oct 2005
TL;DR: An algorithm for non-negative 3D tensor factorization, used to establish a local parts feature decomposition from an object class of images, shows a decomposition superior to what NMF can provide on all fronts.

Abstract: We introduce an algorithm for a non-negative 3D tensor factorization for the purpose of establishing a local parts feature decomposition from an object class of images. In the past, such a decomposition was obtained using non-negative matrix factorization (NMF) where images were vectorized before being factored by NMF. A non-negative tensor factorization (NTF), on the other hand, preserves the 2D representations of images and provides a unique factorization (unlike NMF, which is not unique). The resulting "factors" from the NTF factorization are both sparse (as with NMF) and separable, allowing efficient convolution with the test image. Results show a decomposition superior to what NMF can provide on all fronts: degree of sparsity, absence of ghost residue due to invariant parts, and a coding efficiency around an order of magnitude better. Experiments on using the local parts decomposition for face detection using SVM and Adaboost classifiers demonstrate that the recovered features are discriminatory and highly effective for classification.
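For reference, the NMF baseline the paper compares against is a few lines of multiplicative updates (Lee-Seung form, Frobenius objective); the NTF updates are analogous but operate on tensor unfoldings:

```python
import numpy as np

def nmf(X, r, iters=200, eps=1e-9):
    """Factor X >= 0 as W @ H with W, H >= 0 via multiplicative updates."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], r))
    H = rng.random((r, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # updates never leave W,H >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```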

Proceedings ArticleDOI
21 Aug 2005
TL;DR: A new co-clustering framework, block value decomposition (BVD), is presented, which factorizes the dyadic data matrix into three components - the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C - and iteratively computes the three decomposition matrices based on multiplicative updating rules.

Abstract: Dyadic data matrices, such as co-occurrence matrix, rating matrix, and proximity matrix, arise frequently in various important applications. A fundamental problem in dyadic data analysis is to find the hidden block structure of the data matrix. In this paper, we present a new co-clustering framework, block value decomposition (BVD), for dyadic data, which factorizes the dyadic data matrix into three components, the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C. Under this framework, we focus on a special yet very popular case -- non-negative dyadic data, and propose a specific novel co-clustering algorithm that iteratively computes the three decomposition matrices based on the multiplicative updating rules. Extensive experimental evaluations also demonstrate the effectiveness and potential of this framework as well as the specific algorithms for co-clustering, and in particular, for discovering the hidden block structure in the dyadic data.
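The paper's exact updating rules are not reproduced here, but the usual gradient-ratio recipe for a nonnegative tri-factorization X ≈ RBC gives updates of this shape; treat the specific rules as an assumption:

```python
import numpy as np

def bvd(X, k, l, iters=300, eps=1e-9):
    """Nonnegative tri-factorization X ~ R @ B @ C (sketch, not the paper's exact rules)."""
    rng = np.random.default_rng(0)
    R = rng.random((X.shape[0], k))
    B = rng.random((k, l))
    C = rng.random((l, X.shape[1]))
    for _ in range(iters):
        R *= (X @ C.T @ B.T) / (R @ B @ C @ C.T @ B.T + eps)
        B *= (R.T @ X @ C.T) / (R.T @ R @ B @ C @ C.T + eps)
        C *= (B.T @ R.T @ X) / (B.T @ R.T @ R @ B @ C + eps)
    return R, B, C

# Rows cluster by argmax over R, columns by argmax over C; B holds the
# block (co-cluster) values.
```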

Journal ArticleDOI
TL;DR: This paper designs an efficient parallelizable preconditioner that can be naturally implemented in a parallel code that implements the multipole technique for the matrix-vector product calculation and proposes an embedded iterative scheme that combines nested GMRES solvers with different fast multipole computations.
Abstract: The boundary element method has become a popular tool for the solution of Maxwell's equations in electromagnetism. From a linear algebra point of view, this leads to the solution of large dense complex linear systems, where the unknowns are associated with the edges of the mesh defined on the surface of the illuminated object. In this paper, we address the iterative solution of these linear systems via preconditioned Krylov solvers. Our primary focus is on the design of an efficient parallelizable preconditioner. In that respect, we consider an approximate inverse method based on the Frobenius-norm minimization. The preconditioner is constructed from a sparse approximation of the dense coefficient matrix, and the patterns both for the preconditioner and for the coefficient matrix are computed a priori using geometric information from the mesh. We describe how such a preconditioner can be naturally implemented in a parallel code that implements the multipole technique for the matrix-vector product calculation. We investigate the numerical scalability of our preconditioner on realistic industrial test problems and show that it exhibits some limitations on very large problems of size close to one million unknowns. To improve its robustness on those large problems we propose an embedded iterative scheme that combines nested GMRES solvers with different fast multipole computations. We show through extensive numerical experiments that this new scheme is extremely robust at affordable memory and CPU costs for the solution of very large and challenging problems.
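The Frobenius-norm construction decouples into one small least-squares problem per column of the preconditioner, which is what makes it parallelizable. A dense-arithmetic sketch with a caller-supplied a priori pattern (in the paper, the pattern comes from mesh geometry):

```python
import numpy as np
import scipy.sparse as sp

def spai(A, pattern):
    """min ||A M - I||_F columnwise, M restricted to the given sparsity pattern."""
    A, pattern = A.tocsc(), pattern.tocsc()
    M = sp.lil_matrix(A.shape, dtype=A.dtype)
    for j in range(A.shape[1]):
        J = pattern.indices[pattern.indptr[j]:pattern.indptr[j + 1]]
        if len(J) == 0:
            continue
        Asub = A[:, J].toarray()
        I = np.unique(Asub.nonzero()[0])         # rows reached by columns J
        rhs = (I == j).astype(A.dtype)           # e_j restricted to rows I
        m, *_ = np.linalg.lstsq(Asub[I], rhs, rcond=None)
        for t, v in zip(J, m):
            M[t, j] = v
    return M.tocsc()
```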

Book ChapterDOI
21 Sep 2005
TL;DR: This work split the matrix, A, into a sum, A1 + A2 + ... + As, where each term is stored in a new data structure the authors refer to as unaligned block compressed sparse row (UBCSR) format, which improves the performance of sparse matrix-vector multiplication (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks.
Abstract: We improve the performance of sparse matrix-vector multiplication(SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix, A, into a sum, A1 + A2 + ... + As, where each term is stored in a new data structure we refer to as unaligned block compressed sparse row (UBCSR) format. A classical approach which stores A in a BCSR can also reduce execution time, but the improvements may be limited because BCSR imposes an alignment of the matrix non-zeros that leads to extra work from filled-in zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. We show speedups can be as high as 2.1× over no blocking, and as high as 1.8× over BCSR as used in prior work on a set of application matrices. Even when performance does not improve significantly, split UBCSR usually reduces matrix storage.

Journal ArticleDOI
TL;DR: This article uses the ℋ-matrix representation that approximates the dense stiffness matrix in admissible blocks by low-rank matrices, assembled by a new hybrid algorithm that has the same proven convergence as standard interpolation but also the same efficiency as the (heuristic) adaptive cross approximation (ACA).

Abstract: The efficient treatment of dense matrices arising, e.g., from the finite element discretisation of integral operators requires special compression techniques. In this article we use the ℋ-matrix representation that approximates the dense stiffness matrix in admissible blocks (corresponding to subdomains where the underlying kernel function is smooth) by low-rank matrices. The low-rank matrices are assembled by a new hybrid cross approximation algorithm (HCA) that has the same proven convergence as standard interpolation but also the same efficiency as the (heuristic) adaptive cross approximation (ACA).
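Plain ACA, whose speed the hybrid algorithm retains, is short enough to sketch: it builds rank-1 corrections from individual residual rows and columns of an admissible block, never forming the block densely. The kernel and geometry below are illustrative:

```python
import numpy as np

def aca(entry, m, n, tol=1e-8, kmax=40):
    """Low-rank U @ V ~ block, sampling entries via entry(i, j)."""
    U, V, used, i = [], [], set(), 0
    for _ in range(kmax):
        row = np.array([entry(i, j) for j in range(n)])
        row -= sum(u[i] * v for u, v in zip(U, V))       # residual row i
        j = int(np.argmax(np.abs(row)))
        if abs(row[j]) < tol:
            break
        col = np.array([entry(t, j) for t in range(m)])
        col -= sum(u * v[j] for u, v in zip(U, V))       # residual column j
        U.append(col / row[j]); V.append(row); used.add(i)
        nxt = np.abs(U[-1]); nxt[list(used)] = -1.0
        i = int(np.argmax(nxt))                          # next pivot row
    return np.column_stack(U), np.vstack(V)

# Smooth kernel on well-separated clusters: the rank stays very small.
xs, ys = np.linspace(0, 1, 80), np.linspace(3, 4, 60)
U, V = aca(lambda i, j: 1.0 / abs(xs[i] - ys[j]), 80, 60)
```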

Proceedings ArticleDOI
18 Mar 2005
TL;DR: It is shown that it is possible to design an iterative learning algorithm that produces a dictionary with the required structure, and how well the learning algorithm recovers dictionaries that may or may not have the necessary structure is assessed.
Abstract: We propose a new method to learn overcomplete dictionaries for sparse coding structured as unions of orthonormal bases. The interest of such a structure is manifold. Indeed, it seems that many signals or images can be modeled as the superimposition of several layers with sparse decompositions in as many bases. Moreover, in such dictionaries, the efficient block coordinate relaxation (BCR) algorithm can be used to compute sparse decompositions. We show that it is possible to design an iterative learning algorithm that produces a dictionary with the required structure. Each step is based on the coefficients estimation, using a variant of BCR, followed by the update of one chosen basis, using singular value decomposition. We assess experimentally how well the learning algorithm recovers dictionaries that may or may not have the required structure, and to what extent the noise level is a disturbing factor.
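The per-basis update mentioned at the end really is a single SVD, because it is an orthogonal Procrustes problem. A sketch of that step in isolation:

```python
import numpy as np

def update_basis(X, C):
    """argmin over orthonormal B of ||X - B @ C||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X @ C.T)
    return U @ Vt           # nearest orthonormal matrix to X @ C.T
```

In the full learner this alternates with a BCR pass that re-estimates the sparse coefficients C for the union of bases.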

Book ChapterDOI
27 Aug 2005
TL;DR: A novel algorithm of document clustering based on non-negative sparse analysis that can obtain document topics exactly by controlling the sparseness of the topic matrix and the encoding matrix explicitly is proposed.

Abstract: A novel algorithm of document clustering based on non-negative sparse analysis is proposed. In contrast to the algorithm based on non-negative matrix factorization, our algorithm can obtain document topics exactly by controlling the sparseness of the topic matrix and the encoding matrix explicitly. Thus, the clustering accuracy is greatly improved. Finally, simulation results further illustrate the accuracy and efficiency of the algorithm.

Journal ArticleDOI
TL;DR: The LDL software package is a set of short, concise routines for factorizing symmetric positive-definite sparse matrices, with some applicability to symmetric indefinite matrices.
Abstract: The LDL software package is a set of short, concise routines for factorizing symmetric positive-definite sparse matrices, with some applicability to symmetric indefinite matrices. Its primary purpose is to illustrate much of the basic theory of sparse matrix algorithms in as concise a code as possible, including an elegant method of sparse symmetric factorization that computes the factorization row-by-row but stores it column-by-column. The entire symbolic and numeric factorization consists of less than 50 executable lines of code. The package is written in C, and includes a MATLAB interface.
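The row-by-row organization is easy to see in dense arithmetic. A sketch of the same factorization without the sparse machinery (in the package, the nonzero pattern of each row is found first via the elimination tree):

```python
import numpy as np

def ldl(A):
    """A = L @ diag(d) @ L.T for symmetric positive definite A, row by row."""
    n = A.shape[0]
    L, d = np.eye(n), np.zeros(n)
    for i in range(n):
        for j in range(i):       # row i of L from the rows above it
            L[i, j] = (A[i, j] - L[i, :j] @ (d[:j] * L[j, :j])) / d[j]
        d[i] = A[i, i] - L[i, :i] @ (d[:i] * L[i, :i])
    return L, d
```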

Journal ArticleDOI
TL;DR: The goal of this paper is to compare a number of algorithms for computing a large number of eigenvectors of the generalized symmetric eigenvalue problem arising from a modal analysis of elastic structures by considering the use of preconditioned iterative methods.
Abstract: The goal of our paper is to compare a number of algorithms for computing a large number of eigenvectors of the generalized symmetric eigenvalue problem arising from a modal analysis of elastic structures. The shift-invert Lanczos algorithm has emerged as the workhorse for the solution of this generalized eigenvalue problem; however, a sparse direct factorization is required for the resulting set of linear equations. Instead, our paper considers the use of preconditioned iterative methods. We present a brief review of available preconditioned eigensolvers followed by a numerical comparison on three problems using a scalable algebraic multigrid (AMG) preconditioner.
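A modern rendition of this comparison is a few lines with SciPy and PyAMG (assumed installed): LOBPCG preconditioned by smoothed-aggregation AMG, with a model Poisson matrix standing in for the elasticity problems:

```python
import numpy as np
import pyamg
from scipy.sparse.linalg import lobpcg

A = pyamg.gallery.poisson((100, 100), format='csr')   # model SPD operator
M = pyamg.smoothed_aggregation_solver(A).aspreconditioner()

X = np.random.default_rng(0).standard_normal((A.shape[0], 10))
vals, vecs = lobpcg(A, X, M=M, largest=False, tol=1e-6, maxiter=200)
print(vals)          # ten smallest eigenvalues, no sparse factorization needed
```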

Journal ArticleDOI
TL;DR: It is shown how the modification in the Cholesky factorization associated with this rank-2 modification of C can be computed efficiently using a sparse rank-1 technique developed in [T. A. Davis and W. W. Hager, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606--627].

Abstract: Given a sparse, symmetric positive definite matrix C and an associated sparse Cholesky factorization LDLᵀ, we develop sparse techniques for updating the factorization after a symmetric modification of a row and column of C. We show how the modification in the Cholesky factorization associated with this rank-2 modification of C can be computed efficiently using a sparse rank-1 technique developed in [T. A. Davis and W. W. Hager, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606--627]. We also determine how the solution of a linear system Lx = b changes after changing a row and column of C or after a rank-r change in C.
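The structure being exploited can be checked numerically: changing row and column k of C by a vector w is the rank-2 perturbation e_k wᵀ + w e_kᵀ, which splits into one rank-1 update plus one rank-1 downdate, each handled by the earlier sparse rank-1 technique:

```python
import numpy as np

n, k = 6, 2
w = np.random.default_rng(0).standard_normal(n)  # change to row/column k
e = np.zeros(n); e[k] = 1.0                      # (the (k,k) entry gets 2*w[k])

delta = np.outer(e, w) + np.outer(w, e)          # the row/column modification
p, q = (w + e) / np.sqrt(2), (w - e) / np.sqrt(2)
# One rank-1 update (+ p p^T) and one rank-1 downdate (- q q^T):
assert np.allclose(delta, np.outer(p, p) - np.outer(q, q))
```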

Journal ArticleDOI
TL;DR: A detailed analysis of the roundoff errors for the presented DCT algorithms shows their excellent numerical stability, outperforming a real fast DCT algorithm based on polynomial arithmetic.

Journal ArticleDOI
TL;DR: The computational details of a variant of the classical Gram-Schmidt algorithm, called the quasi-Gram-Schmidt algorithm, for obtaining two kinds of low-rank approximations are treated, and a MATLAB implementation is described.

Abstract: In many applications - latent semantic indexing, for example - it is required to obtain a reduced rank approximation to a sparse matrix A. Unfortunately, the approximations based on traditional decompositions, like the singular value and QR decompositions, are not in general sparse. Stewart [(1999), 313--323] has shown how to use a variant of the classical Gram-Schmidt algorithm, called the quasi-Gram-Schmidt algorithm, to obtain two kinds of low-rank approximations. The first, the SPQR approximation, is a pivoted, Q-less QR approximation of the form (XR₁₁⁻¹)(R₁₁R₁₂), where X consists of columns of A. The second, the SCR approximation, is of the form A ≅ XTYᵀ, where X and Y consist of columns and rows of A, and T is small. In this article we treat the computational details of these algorithms and describe a MATLAB implementation.
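A dense stand-in for the SPQR construction makes the algebra concrete: LAPACK's column-pivoted QR plays the role of the quasi-Gram-Schmidt pass, and the approximation is assembled without ever storing Q:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))  # rank ~ 5

f = 5
Q, R, piv = qr(A, mode='economic', pivoting=True)
X = A[:, piv[:f]]                                   # f chosen columns of A
T = solve_triangular(R[:f, :f], R[:f, :])           # R11^{-1} [R11 R12]

A_hat = np.empty_like(A)
A_hat[:, piv] = X @ T                               # undo the pivoting
print(np.linalg.norm(A - A_hat))                    # ~ 0 for rank-f A
```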

Proceedings ArticleDOI
12 Nov 2005
TL;DR: This paper proposes a BLAS (Basic Linear Algebra Subprograms) library for state-of-the-art reconfigurable systems, and proposes a design which employs a linear array of FPGAs for matrix multiply operation.
Abstract: Field-Programmable Gate Arrays (FPGAs) have become an attractive option for scientific computing. Several vendors have developed high performance reconfigurable systems which employ FPGAs for application acceleration. In this paper, we propose a BLAS (Basic Linear Algebra Subprograms) library for state-of-the-art reconfigurable systems. We study three data-intensive operations: dot product, matrix-vector multiply and dense matrix multiply. The first two operations are I/O bound, and our designs efficiently utilize the available memory bandwidth in the systems. As these operations require accumulation of sequentially delivered floating-point values, we develop a high performance reduction circuit. This circuit uses only one floating-point adder and buffers of moderate size. For matrix multiply operation, we propose a design which employs a linear array of FPGAs. This design exploits the memory hierarchy in the reconfigurable systems, and has very low memory bandwidth requirements. To illustrate our ideas, we have implemented our designs for Level 2 and Level 3 BLAS on Cray XD1.

Patent
20 Jul 2005
TL;DR: In this paper, the memory requirements of an Aho-Corasick algorithm are reduced by applying a banded-row sparse matrix technique to the state transition table of the state table.

Abstract: Embodiments of the present invention relate to systems and methods for optimizing and reducing the memory requirements of state machine algorithms in pattern matching applications. Memory requirements of an Aho-Corasick algorithm are reduced in an intrusion detection system by representing the state table as three separate data structures. Memory requirements of an Aho-Corasick algorithm are also reduced by applying a banded-row sparse matrix technique to the state transition table of the state table. The pattern matching performance of the intrusion detection system is improved by performing a case-insensitive search, where the characters of the test sequence are converted to uppercase as the characters are read. Testing reveals that state transition tables with sixteen-bit elements outperform state transition tables with thirty-two-bit elements and do not reduce the functionality of intrusion detection systems using the Aho-Corasick algorithm.
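The banded-row idea is independent of the intrusion-detection context and fits in a dozen lines: per state, store only the span between the first and last non-default transitions, keeping O(1) lookup. A toy sketch:

```python
def band_row(row, default=0):
    """Compress one state's transition row to (offset, band)."""
    nz = [i for i, s in enumerate(row) if s != default]
    return (nz[0], row[nz[0]:nz[-1] + 1]) if nz else (0, [])

def next_state(band, ch, default=0):
    lo, vals = band
    return vals[ch - lo] if lo <= ch < lo + len(vals) else default

row = [0, 0, 0, 5, 0, 7, 0, 0]      # dense transitions for one state
band = band_row(row)                # stores only [5, 0, 7] at offset 3
assert next_state(band, 5) == 7 and next_state(band, 1) == 0
```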

Book ChapterDOI
22 May 2005
TL;DR: A simple vectorizable algorithm for performing sparse matrix vector multiply in compressed sparse row (CSR) storage format that requires no data rearrangement and can be easily adapted to a sophisticated library framework such as PETSc.
Abstract: The innovation of this work is a simple vectorizable algorithm for performing sparse matrix vector multiply in compressed sparse row (CSR) storage format. Unlike the vectorizable jagged diagonal format (JAD), this algorithm requires no data rearrangement and can be easily adapted to a sophisticated library framework such as PETSc. Numerical experiments on the Cray X1 show an order of magnitude improvement over the non-vectorized algorithm.
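In NumPy terms the same trick is a gather followed by a segmented sum, with the CSR arrays left untouched; np.add.reduceat plays the role of the vector hardware (the guard line handles rows with no nonzeros):

```python
import numpy as np
import scipy.sparse as sp

A = (sp.eye(1000) + sp.random(1000, 1000, density=0.01,
                              random_state=np.random.default_rng(0))).tocsr()
x = np.ones(A.shape[1])

prod = A.data * x[A.indices]               # one vectorized gather + multiply
y = np.add.reduceat(prod, A.indptr[:-1])   # segmented sum, one value per row
y[np.diff(A.indptr) == 0] = 0.0            # rows with no nonzeros
assert np.allclose(y, A @ x)
```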

Proceedings Article
01 Sep 2005
TL;DR: It is shown that the employed projection step proposed by Hoyer has a unique solution, and that it indeed finds this solution, both theoretically and experimentally.
Abstract: Sparse non-negative matrix factorization (sNMF) allows for the decomposition of a given data set into a mixing matrix and a feature data set, which are both non-negative and fulfill certain sparsity conditions. In this paper it is shown that the employed projection step proposed by Hoyer has a unique solution, and that it indeed finds this solution. Then indeterminacies of the sNMF model are identified and first uniqueness results are presented, both theoretically and experimentally.
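Hoyer's projection step, whose uniqueness the paper establishes, can be sketched from its published description (closest nonnegative vector with prescribed ℓ1 and ℓ2 norms); treat this as an illustrative transcription rather than a verified reference implementation:

```python
import numpy as np

def hoyer_project(x, L1, L2):
    """Closest s >= 0 to x with sum(s) = L1 and ||s||_2 = L2."""
    n = len(x)
    s = x + (L1 - x.sum()) / n                   # start on the L1 hyperplane
    zero = np.zeros(n, dtype=bool)
    while True:
        m = np.where(zero, 0.0, L1 / (~zero).sum())
        w = s - m                                # direction within hyperplane
        a, b, c = w @ w, 2 * (m @ w), m @ m - L2 ** 2
        alpha = (-b + np.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a)
        s = m + alpha * w                        # step out to the L2 sphere
        if (s >= 0).all():
            return s
        zero |= s < 0                            # clamp negatives, re-project
        s[zero] = 0.0
        s[~zero] += (L1 - s.sum()) / (~zero).sum()
```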

Proceedings ArticleDOI
18 Mar 2005
TL;DR: The theoretical results show the fundamental limitation on when a sparse representation is unique, and the relation between the solutions of ℓ0-norm minimization and the solutions of ℓ1-norm minimization indicates a computationally efficient approach to find a sparse representation.

Abstract: The multiple measurement vector (MMV) problem - a newly emerged problem in sparse representation in an over-complete dictionary, motivated by a neuro-magnetic inverse problem that arises in magnetoencephalography (MEG), a modality for imaging the possible activation regions in the brain - poses new challenges. Efficient methods have been designed to search for sparse representations; however, we have not seen substantial development in the theoretical analysis, considering what has been done in a simpler case - the single measurement vector (SMV) - in which many theoretical results are known. This paper extends the known results of SMV to MMV. Our theoretical results show the fundamental limitation on when a sparse representation is unique. Moreover, the relation between the solutions of ℓ0-norm minimization and the solutions of ℓ1-norm minimization indicates a computationally efficient approach to find a sparse representation. Interestingly, simulations show that the predictions made by these theorems tend to be conservative.

Journal ArticleDOI
TL;DR: This paper presents a formal two-phase decomposition method for complex design problems that are represented in an attribute-component incidence matrix that decouples the overall decomposition process into two separate, autonomous function components: dependency analysis and matrix partitioning, which are algorithmically achieved by an extended Hierarchical Cluster Analysis and a Partition Point Analysis.
Abstract: This paper presents a formal two-phase decomposition method for complex design problems that are represented in an attribute-component incidence matrix. Unlike the conventional approaches, this method decouples the overall decomposition process into two separate, autonomous function components: dependency analysis and matrix partitioning, which are algorithmically achieved by an extended Hierarchical Cluster Analysis (HCA) and a Partition Point Analysis (PPA), respectively. The extended HCA (Phase I) is applied to convert the (input) incidence matrix, which is originally unorganized, into a banded diagonal matrix. The PPA (Phase II) is applied to further transform this matrix into a block-angular matrix according to a given set of decomposition criteria. This method provides both flexibility in the choice of the different settings on the decomposition criteria, and diversity in the generation of the decomposition solutions, both taking place in Phase II without resort to Phase I. These features essentially make this decomposition method effective, especially in its application to re-decomposition. A powertrain design example is employed for illustration and discussion.
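The Phase I goal - reordering an unorganized incidence matrix into banded diagonal form - can be approximated for illustration with reverse Cuthill-McKee on the symmetrized pattern (not the paper's HCA, and on a square toy matrix rather than a real attribute-component matrix):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

M = sp.random(30, 30, density=0.1, format='csr',
              random_state=np.random.default_rng(3))
S = ((M + M.T) != 0).astype(np.int8).tocsr()      # symmetrized pattern

perm = reverse_cuthill_mckee(S, symmetric_mode=True)
B = S[perm][:, perm]                              # reordered, nearly banded

def bandwidth(P):
    P = P.tocoo()
    return int(np.abs(P.row - P.col).max())

print(bandwidth(S), '->', bandwidth(B))
```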