
Showing papers on "Sparse matrix published in 2005"


Journal ArticleDOI
TL;DR: The Scalable Library for Eigenvalue Problem Computations (SLEPc) is a software library for computing a few eigenvalues and associated eigenvectors of a large sparse matrix or matrix pencil that has been developed on top of PETSc and enforces the same programming paradigm.
Abstract: The Scalable Library for Eigenvalue Problem Computations (SLEPc) is a software library for computing a few eigenvalues and associated eigenvectors of a large sparse matrix or matrix pencil. It has been developed on top of PETSc and enforces the same programming paradigm. The emphasis of the software is on methods and techniques appropriate for problems in which the associated matrices are sparse, for example, those arising after the discretization of partial differential equations. Therefore, most of the methods offered by the library are projection methods such as Arnoldi or Lanczos, or other methods with similar properties. SLEPc provides basic methods as well as more sophisticated algorithms. It also provides built-in support for spectral transformations such as the shift-and-invert technique. SLEPc is a general library in the sense that it covers standard and generalized eigenvalue problems, both Hermitian and non-Hermitian, with either real or complex arithmetic. SLEPc can be easily applied to real world problems. To illustrate this, several case studies arising from real applications are presented and solved with SLEPc with little programming effort. The addressed problems include a matrix-free standard problem, a complex generalized problem, and a singular value decomposition. The implemented codes exhibit good properties regarding flexibility as well as parallel performance.

974 citations
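SLEPc itself is a C library built on PETSc, but the shift-and-invert spectral transformation it supports can be illustrated with SciPy's ARPACK wrapper. A minimal sketch on a 1D Laplacian (the matrix and shift are illustrative choices, not from the paper):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 1000
# Standard sparse test matrix from a PDE discretization: the 1D Laplacian.
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')

# Shift-and-invert around sigma = 0: the Lanczos iteration converges to the
# eigenvalues nearest the shift, here the four smallest.
vals, vecs = eigsh(A, k=4, sigma=0.0, which='LM')
print(vals)
```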


Journal ArticleDOI
01 Jan 2005
TL;DR: An overview of OSKI is provided, which is based on research on automatically tuned sparse kernels for modern cache-based superscalar machines, and the primary aim of this interface is to hide the complex decision-making process needed to tune the performance of a kernel implementation for a particular user's sparse matrix and machine.
Abstract: The Optimized Sparse Kernel Interface (OSKI) is a collection of low-level primitives that provide automatically tuned computational kernels on sparse matrices, for use by solver libraries and applications. These kernels include sparse matrix-vector multiply and sparse triangular solve, among others. The primary aim of this interface is to hide the complex decision-making process needed to tune the performance of a kernel implementation for a particular user's sparse matrix and machine, while also exposing the steps and potentially non-trivial costs of tuning at run-time. This paper provides an overview of OSKI, which is based on our research on automatically tuned sparse kernels for modern cache-based superscalar machines.

546 citations
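OSKI's C API is not reproduced here, but the core tuning decision it hides - trading extra stored zeros ("fill") against faster register-blocked kernels - can be mimicked with SciPy's BSR format. The block-size candidates and the 1.5x fill threshold below are illustrative assumptions:

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(120, 120, density=0.05, format='csr',
              random_state=np.random.default_rng(0))

def fill_ratio(A, r, c):
    """Stored scalars per true nonzero when A is forced into r-by-c blocks."""
    B = A.tobsr(blocksize=(r, c))
    return B.data.size / A.nnz

# OSKI weighs a machine-specific speed profile against this fill ratio;
# here we simply take the largest block whose overhead stays below 1.5x.
candidates = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 6)]
best = max((rc for rc in candidates if fill_ratio(A, *rc) < 1.5),
           key=lambda rc: rc[0] * rc[1])
A_tuned = A.tobsr(blocksize=best)
```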


Journal ArticleDOI
TL;DR: This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem, and addresses the most basic design problem: constructing tight frames with prescribed vector norms.
Abstract: Tight frames, also known as general Welch-bound-equality sequences, generalize orthonormal systems. Numerous applications - including communications, coding, and sparse approximation - require finite-dimensional tight frames that possess additional structural properties. This paper proposes an alternating projection method that is versatile enough to solve a huge class of inverse eigenvalue problems (IEPs), which includes the frame design problem. To apply this method, one needs only to solve a matrix nearness problem that arises naturally from the design specifications. Therefore, it is fast and easy to develop versions of the algorithm that target new design problems. Alternating projection will often succeed even if algebraic constructions are unavailable. To demonstrate that alternating projection is an effective tool for frame design, the paper studies some important structural properties in detail. First, it addresses the most basic design problem: constructing tight frames with prescribed vector norms. Then, it discusses equiangular tight frames, which are natural dictionaries for sparse approximation. Finally, it examines tight frames whose individual vectors have low peak-to-average-power ratio (PAR), which is a valuable property for code-division multiple-access (CDMA) applications. Numerical experiments show that the proposed algorithm succeeds in each of these three cases. The appendices investigate the convergence properties of the algorithm.

496 citations
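For the most basic case in the paper - unit-norm tight frames - the alternating projection loop is only a few lines. This NumPy sketch (dimensions and iteration count are arbitrary choices) alternates between the prescribed-norm constraint and the nearest tight frame, obtained by flattening the singular values:

```python
import numpy as np

d, N = 4, 7
X = np.random.default_rng(1).standard_normal((d, N))

for _ in range(500):
    X /= np.linalg.norm(X, axis=0)              # project: unit column norms
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    X = np.sqrt(N / d) * U @ Vt                 # project: nearest tight frame

# A unit-norm tight frame satisfies X @ X.T = (N/d) * I.
print(np.linalg.norm(X @ X.T - (N / d) * np.eye(d)))
```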


Journal ArticleDOI
TL;DR: An overview of the algorithms, design philosophy, and implementation techniques in the software SuperLU, for solving sparse unsymmetric linear systems, with examples of how the solver has been used in large-scale scientific applications and the performance achieved.
Abstract: We give an overview of the algorithms, design philosophy, and implementation techniques in the software SuperLU, for solving sparse unsymmetric linear systems. In particular, we highlight the differences between the sequential SuperLU (including its multithreaded extension) and parallel SuperLU_DIST. These include the numerical pivoting strategy, the ordering strategy for preserving sparsity, the ordering in which the updating tasks are performed, the numerical kernel, and the parallelization strategy. Because of the scalability concern, the parallel code is drastically different from the sequential one. We describe the user interfaces of the libraries, and illustrate how to use the libraries most efficiently depending on some matrix characteristics. Finally, we give some examples of how the solver has been used in large-scale scientific applications and report the performance achieved.

371 citations
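The sequential SuperLU described here is, in fact, what SciPy wraps in scipy.sparse.linalg.splu, so the solver and its sparsity-preserving column orderings can be exercised directly. The test matrix below is an arbitrary unsymmetric example:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 500
A = (sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))
     + sp.random(n, n, density=0.001,
                 random_state=np.random.default_rng(2))).tocsc()

lu = splu(A, permc_spec='COLAMD')   # COLAMD column ordering preserves sparsity
x = lu.solve(np.ones(n))
print(np.linalg.norm(A @ x - 1.0))  # residual of A x = [1, ..., 1]
```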


Proceedings ArticleDOI
13 Jun 2005
TL;DR: This paper presents a system for fully automatic recognition and reconstruction of 3D objects in image databases, using invariant local features to find matches between all images, and the RANSAC algorithm to find those that are consistent with the fundamental matrix.
Abstract: This paper presents a system for fully automatic recognition and reconstruction of 3D objects in image databases. We pose the object recognition problem as one of finding consistent matches between all images, subject to the constraint that the images were taken from a perspective camera. We assume that the objects or scenes are rigid. For each image, we associate a camera matrix, which is parameterised by rotation, translation and focal length. We use invariant local features to find matches between all images, and the RANSAC algorithm to find those that are consistent with the fundamental matrix. Objects are recognised as subsets of matching images. We then solve for the structure and motion of each object, using a sparse bundle adjustment algorithm. Our results demonstrate that it is possible to recognise and reconstruct 3D objects from an unordered image database with no user input at all.

304 citations
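The geometric-verification step - keep only feature matches consistent with a fundamental matrix under RANSAC - maps directly onto OpenCV. This sketch assumes OpenCV is available, uses ORB in place of the paper's invariant local features, and reads hypothetical file names:

```python
import cv2
import numpy as np

img1 = cv2.imread('view1.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical images
img2 = cv2.imread('view2.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(d1, d2)
p1 = np.float32([k1[m.queryIdx].pt for m in matches])
p2 = np.float32([k2[m.trainIdx].pt for m in matches])

# RANSAC keeps only matches consistent with a single fundamental matrix.
F, inliers = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, 3.0, 0.99)
print('geometrically consistent matches:', int(inliers.sum()))
```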


Proceedings ArticleDOI
18 Mar 2005
TL;DR: A greedy pursuit algorithm called simultaneous orthogonal matching pursuit is presented, and it is proved that the algorithm calculates simultaneous approximations whose error is within a constant factor of the optimal simultaneous approximation error.

Abstract: A simple sparse approximation problem requests an approximation of a given input signal as a linear combination of T elementary signals drawn from a large, linearly dependent collection. An important generalization is simultaneous sparse approximation. Now one must approximate several input signals at once using different linear combinations of the same T elementary signals. This formulation appears, for example, when analyzing multiple observations of a sparse signal that have been contaminated with noise. A new approach to this problem is presented here: a greedy pursuit algorithm called simultaneous orthogonal matching pursuit. The paper proves that the algorithm calculates simultaneous approximations whose error is within a constant factor of the optimal simultaneous approximation error. This result requires that the collection of elementary signals be weakly correlated, a property that is also known as incoherence. Numerical experiments demonstrate that the algorithm often succeeds, even when the inputs do not meet the hypotheses of the proof.

301 citations
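The algorithm itself is compact. A NumPy sketch, assuming a dictionary with unit-norm columns and aggregating correlations by their absolute sum (one plausible choice):

```python
import numpy as np

def somp(Phi, Y, T):
    """Simultaneous OMP: Phi is (m, n) with unit-norm columns, Y is (m, K)."""
    idx, R = [], Y.copy()
    for _ in range(T):
        scores = np.abs(Phi.T @ R).sum(axis=1)   # aggregate over all channels
        scores[idx] = -np.inf                    # never reselect an atom
        idx.append(int(np.argmax(scores)))
        # Refit all K signals jointly on the atoms selected so far.
        C, *_ = np.linalg.lstsq(Phi[:, idx], Y, rcond=None)
        R = Y - Phi[:, idx] @ C
    return idx, C
```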


Journal ArticleDOI
TL;DR: A unifying framework for the graph models of the variant matrix estimation problems is presented, based upon the viewpoint that a partition of a matrix into structurally orthogonal groups of columns corresponds to a distance-2 coloring of an appropriate graph representation.

Abstract: Graph coloring has been employed since the 1980s to efficiently compute sparse Jacobian and Hessian matrices using either finite differences or automatic differentiation. Several coloring problems occur in this context, depending on whether the matrix is a Jacobian or a Hessian, and on the specifics of the computational techniques employed. We consider eight variant vertex coloring problems here. This article begins with a gentle introduction to the problem of computing a sparse Jacobian, followed by an overview of the historical development of the research area. Then we present a unifying framework for the graph models of the variant matrix estimation problems. The framework is based upon the viewpoint that a partition of a matrix into structurally orthogonal groups of columns corresponds to a distance-2 coloring of an appropriate graph representation. The unified framework helps integrate earlier work and leads to fresh insights; enables the design of more efficient algorithms for many problems; leads to new algorithms for others; and eases the task of building graph models for new problems. We report computational results on two of the coloring problems to support our claims. Most of the methods for these problems treat a column or a row of a matrix as an atomic entity, and partition the columns or rows (or both). A brief review of methods that do not fit these criteria is provided. We also discuss results in discrete mathematics and theoretical computer science that intersect with the topics considered here.

291 citations
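The central notion is easy to state in code: two columns can share a group (color) exactly when they touch no common row, and each group then costs a single function evaluation. A greedy sketch, phrased directly on the sparsity pattern rather than on the graph:

```python
import numpy as np

def structurally_orthogonal_groups(S):
    """S: scipy.sparse bool pattern of the Jacobian; returns a color per column."""
    S = S.tocsc()
    color = -np.ones(S.shape[1], dtype=int)
    covered = []                                  # rows already hit, per group
    for j in range(S.shape[1]):
        rows_j = set(S.indices[S.indptr[j]:S.indptr[j + 1]])
        for c, rows_c in enumerate(covered):
            if not rows_j & rows_c:               # no shared row: group works
                rows_c |= rows_j
                color[j] = c
                break
        else:
            covered.append(set(rows_j))
            color[j] = len(covered) - 1
    return color

# Each group g yields one seed vector (sum of e_j over its columns); a single
# finite-difference product J @ seed then recovers all of that group's columns.
```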


Journal ArticleDOI
TL;DR: It is shown that the singular vectors of the K matrix together with knowledge of the Green function of the background medium in which the targets are embedded lead directly to classical time-reversal based images of the target locations as well as super-resolution images based on a generalized Multiple-Signal-Classification algorithm recently developed for use with the K matrix.
Abstract: The methods employed in time-reversal imaging are applied to radar imaging problems using multistatic data collected from sparse and unstructured phased array antenna systems. The theory is especially suitable to problems involving the detection and tracking (locating) of moving ground targets (MGT) from satellite based phased array antenna systems and locating buried or obscured targets from multistatic data collected from phased array antenna systems mounted on unmanned aerial vehicles (UAV). The theory is based on the singular value decomposition (SVD) of the multistatic data matrix K and applies to general phased array antenna systems whose elements are arbitrarily located in space. It is shown that the singular vectors of the K matrix together with knowledge of the Green function of the background medium in which the targets are embedded lead directly to classical time-reversal based images of the target locations as well as super-resolution images based on a generalized Multiple-Signal-Classification algorithm recently developed for use with the K matrix. The theory is applied in a computer simulation study of the TechSat project whose goal is the location of MGTs from an unstructured and sparse phased array of freely orbiting antennas located above the ionosphere.

269 citations
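A toy version of the imaging chain is straightforward to simulate: build the multistatic matrix K from point targets, take its SVD, and form the MUSIC pseudospectrum from the noise subspace and the background Green's function. Everything below (geometry, wavenumber, free-space Green's function) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2 * np.pi                                   # wavenumber (unit wavelength)
ants = rng.uniform(0, 10, size=(16, 2))         # sparse, unstructured array
targets = np.array([[3.0, 7.0], [6.5, 2.5]])

def green(points):
    """2D free-space Green's function vectors sampled at the antennas."""
    d = np.linalg.norm(ants[:, None, :] - points[None, :, :], axis=2)
    return np.exp(1j * k * d) / np.sqrt(d)

G = green(targets)
K = G @ G.T                                     # idealized multistatic matrix
U, s, _ = np.linalg.svd(K)
noise = U[:, len(targets):]                     # noise subspace of K

xy = np.stack(np.meshgrid(np.linspace(0, 10, 200),
                          np.linspace(0, 10, 200)), -1).reshape(-1, 2)
g = green(xy)
g /= np.linalg.norm(g, axis=0)
music = 1.0 / np.linalg.norm(noise.conj().T @ g, axis=0) ** 2  # peaks at targets
```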


Journal ArticleDOI
TL;DR: Experimental timings of an actual parallel sparse matrix-vector multiplication on an SGI Origin 3800 computer show that a sufficiently large reduction in communication volume leads to savings in execution time.
Abstract: A new method is presented for distributing data in sparse matrix-vector multiplication. The method is two-dimensional, tries to minimize the true communication volume, and also tries to spread the computation and communication work evenly over the processors. The method starts with a recursive bipartitioning of the sparse matrix, each time splitting a rectangular matrix into two parts with a nearly equal number of nonzeros. The communication volume caused by the split is minimized. After the matrix partitioning, the input and output vectors are partitioned with the objective of minimizing the maximum communication volume per processor. Experimental results of our implementation, Mondriaan, for a set of sparse test matrices show a reduction in communication volume compared to one-dimensional methods, and in general a good balance in the communication work. Experimental timings of an actual parallel sparse matrix-vector multiplication on an SGI Origin 3800 computer show that a sufficiently large reduction in communication volume leads to savings in execution time.

266 citations
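The quantity being minimized is worth pinning down: for a given assignment of nonzeros to processors, each matrix row (and column) whose nonzeros span p distinct processors contributes p - 1 words of communication. A direct sketch of that metric:

```python
import numpy as np

def communication_volume(A, owner):
    """A: scipy.sparse matrix; owner[t] = processor owning the t-th nonzero."""
    A = A.tocoo()
    vol = 0
    for index in (A.row, A.col):       # rows: output vector; columns: input
        for v in np.unique(index):
            vol += len(set(owner[index == v])) - 1
    return vol
```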


Proceedings ArticleDOI
20 Feb 2005
TL;DR: Besides solving the SpMXV problem, the design provides a parameterized and flexible tree-based design for floating-point applications on FPGAs, and demonstrates significant speedup over general-purpose processors, particularly for matrices with very irregular sparsity structure.

Abstract: Floating-point Sparse Matrix-Vector Multiplication (SpMXV) is a key computational kernel in scientific and engineering applications. The poor data locality of sparse matrices significantly reduces the performance of SpMXV on general-purpose processors, which rely heavily on the cache hierarchy to achieve high performance. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of SpMXV. In this paper, we propose an FPGA-based design for SpMXV. Our design accepts sparse matrices in Compressed Row Storage format, and makes no assumptions about the sparsity structure of the input matrix. The design employs IEEE-754 format double-precision floating-point multipliers/adders, and performs multiple floating-point operations as well as I/O operations in parallel. The performance of our design for SpMXV is evaluated using various sparse matrices from the scientific computing community, with the Xilinx Virtex-II Pro XC2VP70 as the target device. The MFLOPS performance increases with the hardware resources on the device as well as the available memory bandwidth. For example, when the memory bandwidth is 8 GB/s, our design achieves over 350 MFLOPS for all the test matrices. It demonstrates significant speedup over general-purpose processors, particularly for matrices with very irregular sparsity structure. Besides solving the SpMXV problem, our design provides a parameterized and flexible tree-based design for floating-point applications on FPGAs.

247 citations
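The reference kernel the design parallelizes is the usual CSR loop; the gather on x is what defeats cache hierarchies on CPUs and what the FPGA's parallel floating-point units absorb instead:

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A in CSR form; no assumption on the sparsity structure."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):                      # one dot product per row
        for t in range(indptr[i], indptr[i + 1]):
            y[i] += data[t] * x[indices[t]]      # irregular gather from x
    return y
```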


Journal ArticleDOI
15 Jan 2005
TL;DR: An efficient method of protein classification using multiple protein networks is proposed, and experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method.
Abstract: Motivation: Support vector machines (SVMs) have been successfully used to classify proteins into functional categories. Recently, to integrate multiple data sources, a semidefinite programming (SDP) based SVM method was introduced. In SDP/SVM, multiple kernel matrices corresponding to each of the data sources are combined with weights obtained by solving an SDP. However, when trying to apply SDP/SVM to large problems, the computational cost can become prohibitive, since both converting the data to a kernel matrix for the SVM and solving the SDP are time and memory demanding. Another application-specific drawback arises when some of the data sources are protein networks. A common method of converting the network to a kernel matrix is the diffusion kernel method, which has time complexity of O(n³), and produces a dense matrix of size n × n. Results: We propose an efficient method of protein classification using multiple protein networks. Available protein networks, such as a physical interaction network or a metabolic network, can be directly incorporated. Vectorial data can also be incorporated after conversion into a network by means of neighbor point connection. Similar to the SDP/SVM method, the combination weights are obtained by convex optimization. Due to the sparsity of network edges, the computation time is nearly linear in the number of edges of the combined network. Additionally, the combination weights provide information useful for discarding noisy or irrelevant networks. Experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method. Availability: Software and data will be available on request. Contact: shin@tuebingen.mpg.de

Journal ArticleDOI
TL;DR: This work describes three possible piecewise multilinear hierarchical interpolation schemes in detail, and documents the features of the sparse grid interpolation software package spinterp for MATLAB.
Abstract: To recover or approximate smooth multivariate functions, sparse grids are superior to full grids due to a significant reduction of the required support nodes. The order of the convergence rate in the maximum norm is preserved up to a logarithmic factor. We describe three possible piecewise multilinear hierarchical interpolation schemes in detail and conduct a numerical comparison. Furthermore, we document the features of our sparse grid interpolation software package spinterp for MATLAB.
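The construction is easy to reproduce for the piecewise-linear case: keep only the hierarchical level combinations whose level sum is small, instead of the full tensor product. A sketch on [0, 1]^d with boundary nodes omitted for brevity (spinterp's own bases differ in detail):

```python
import itertools
import numpy as np

def increment_1d(l):
    """Hierarchical nodes newly added at level l on (0, 1)."""
    return [(2 * i - 1) / 2 ** l for i in range(1, 2 ** (l - 1) + 1)]

def sparse_grid(n, d):
    nodes = []
    for levels in itertools.product(range(1, n + 1), repeat=d):
        if sum(levels) <= n + d - 1:             # the sparse-grid criterion
            nodes += itertools.product(*(increment_1d(l) for l in levels))
    return np.array(nodes)

# 321 nodes versus 3969 interior nodes for the corresponding full 2D grid.
print(len(sparse_grid(6, 2)))
```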

Proceedings ArticleDOI
17 Oct 2005
TL;DR: An algorithm for non-negative 3D tensor factorization, used to establish a local parts feature decomposition from an object class of images, shows a decomposition superior to what NMF can provide on all fronts.

Abstract: We introduce an algorithm for a non-negative 3D tensor factorization for the purpose of establishing a local parts feature decomposition from an object class of images. In the past, such a decomposition was obtained using non-negative matrix factorization (NMF) where images were vectorized before being factored by NMF. A non-negative tensor factorization (NTF), on the other hand, preserves the 2D representations of images and provides a unique factorization (unlike NMF, which is not unique). The resulting "factors" from the NTF factorization are both sparse (as with NMF) and separable, allowing efficient convolution with the test image. Results show a decomposition superior to what NMF can provide on all fronts: degree of sparsity, absence of ghost residue due to invariant parts, and a coding efficiency around an order of magnitude better. Experiments on using the local parts decomposition for face detection using SVM and Adaboost classifiers demonstrate that the recovered features are discriminatory and highly effective for classification.
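For reference, the NMF baseline the paper compares against is a few lines of multiplicative updates (Lee-Seung form, Frobenius objective); the NTF updates are analogous but operate on tensor unfoldings:

```python
import numpy as np

def nmf(X, r, iters=200, eps=1e-9):
    """Factor X >= 0 as W @ H with W, H >= 0 via multiplicative updates."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], r))
    H = rng.random((r, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # updates never leave W,H >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```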

Proceedings ArticleDOI
21 Aug 2005
TL;DR: A new co-clustering framework, block value decomposition (BVD), is presented, which factorizes the dyadic data matrix into three components - the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C - and iteratively computes the three decomposition matrices based on multiplicative updating rules.

Abstract: Dyadic data matrices, such as co-occurrence matrix, rating matrix, and proximity matrix, arise frequently in various important applications. A fundamental problem in dyadic data analysis is to find the hidden block structure of the data matrix. In this paper, we present a new co-clustering framework, block value decomposition (BVD), for dyadic data, which factorizes the dyadic data matrix into three components, the row-coefficient matrix R, the block value matrix B, and the column-coefficient matrix C. Under this framework, we focus on a special yet very popular case -- non-negative dyadic data, and propose a specific novel co-clustering algorithm that iteratively computes the three decomposition matrices based on the multiplicative updating rules. Extensive experimental evaluations also demonstrate the effectiveness and potential of this framework as well as the specific algorithms for co-clustering, and in particular, for discovering the hidden block structure in the dyadic data.
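The paper's exact updating rules are not reproduced here, but the usual gradient-ratio recipe for a nonnegative tri-factorization X ≈ RBC gives updates of this shape; treat the specific rules as an assumption:

```python
import numpy as np

def bvd(X, k, l, iters=300, eps=1e-9):
    """Nonnegative tri-factorization X ~ R @ B @ C (sketch, not the paper's exact rules)."""
    rng = np.random.default_rng(0)
    R = rng.random((X.shape[0], k))
    B = rng.random((k, l))
    C = rng.random((l, X.shape[1]))
    for _ in range(iters):
        R *= (X @ C.T @ B.T) / (R @ B @ C @ C.T @ B.T + eps)
        B *= (R.T @ X @ C.T) / (R.T @ R @ B @ C @ C.T + eps)
        C *= (B.T @ R.T @ X) / (B.T @ R.T @ R @ B @ C + eps)
    return R, B, C

# Rows cluster by argmax over R, columns by argmax over C; B holds the
# block (co-cluster) values.
```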

Journal ArticleDOI
TL;DR: This paper designs an efficient parallelizable preconditioner that can be naturally implemented in a parallel code that implements the multipole technique for the matrix-vector product calculation and proposes an embedded iterative scheme that combines nested GMRES solvers with different fast multipole computations.
Abstract: The boundary element method has become a popular tool for the solution of Maxwell's equations in electromagnetism. From a linear algebra point of view, this leads to the solution of large dense complex linear systems, where the unknowns are associated with the edges of the mesh defined on the surface of the illuminated object. In this paper, we address the iterative solution of these linear systems via preconditioned Krylov solvers. Our primary focus is on the design of an efficient parallelizable preconditioner. In that respect, we consider an approximate inverse method based on the Frobenius-norm minimization. The preconditioner is constructed from a sparse approximation of the dense coefficient matrix, and the patterns both for the preconditioner and for the coefficient matrix are computed a priori using geometric information from the mesh. We describe how such a preconditioner can be naturally implemented in a parallel code that implements the multipole technique for the matrix-vector product calculation. We investigate the numerical scalability of our preconditioner on realistic industrial test problems and show that it exhibits some limitations on very large problems of size close to one million unknowns. To improve its robustness on those large problems we propose an embedded iterative scheme that combines nested GMRES solvers with different fast multipole computations. We show through extensive numerical experiments that this new scheme is extremely robust at affordable memory and CPU costs for the solution of very large and challenging problems.
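The Frobenius-norm construction decouples into one small least-squares problem per column of the preconditioner, which is what makes it parallelizable. A dense-arithmetic sketch with a caller-supplied a priori pattern (in the paper, the pattern comes from mesh geometry):

```python
import numpy as np
import scipy.sparse as sp

def spai(A, pattern):
    """min ||A M - I||_F columnwise, M restricted to the given sparsity pattern."""
    A, pattern = A.tocsc(), pattern.tocsc()
    M = sp.lil_matrix(A.shape, dtype=A.dtype)
    for j in range(A.shape[1]):
        J = pattern.indices[pattern.indptr[j]:pattern.indptr[j + 1]]
        if len(J) == 0:
            continue
        Asub = A[:, J].toarray()
        I = np.unique(Asub.nonzero()[0])         # rows reached by columns J
        rhs = (I == j).astype(A.dtype)           # e_j restricted to rows I
        m, *_ = np.linalg.lstsq(Asub[I], rhs, rcond=None)
        for t, v in zip(J, m):
            M[t, j] = v
    return M.tocsc()
```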

Book ChapterDOI
21 Sep 2005
TL;DR: This work split the matrix, A, into a sum, A1 + A2 + ... + As, where each term is stored in a new data structure the authors refer to as unaligned block compressed sparse row (UBCSR) format, which improves the performance of sparse matrix-vector multiplication (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks.
Abstract: We improve the performance of sparse matrix-vector multiplication(SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix, A, into a sum, A1 + A2 + ... + As, where each term is stored in a new data structure we refer to as unaligned block compressed sparse row (UBCSR) format. A classical approach which stores A in a BCSR can also reduce execution time, but the improvements may be limited because BCSR imposes an alignment of the matrix non-zeros that leads to extra work from filled-in zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. We show speedups can be as high as 2.1× over no blocking, and as high as 1.8× over BCSR as used in prior work on a set of application matrices. Even when performance does not improve significantly, split UBCSR usually reduces matrix storage.

Journal ArticleDOI
TL;DR: This article uses the ℋ-matrix representation that approximates the dense stiffness matrix in admissible blocks by low-rank matrices, assembled by a new hybrid algorithm that has the same proven convergence as standard interpolation but also the same efficiency as the (heuristic) adaptive cross approximation (ACA).

Abstract: The efficient treatment of dense matrices arising, e.g., from the finite element discretisation of integral operators requires special compression techniques. In this article we use the ℋ-matrix representation that approximates the dense stiffness matrix in admissible blocks (corresponding to subdomains where the underlying kernel function is smooth) by low-rank matrices. The low-rank matrices are assembled by a new hybrid cross approximation algorithm (HCA) that has the same proven convergence as standard interpolation but also the same efficiency as the (heuristic) adaptive cross approximation (ACA).
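Plain ACA, whose speed the hybrid algorithm retains, is short enough to sketch: it builds rank-1 corrections from individual residual rows and columns of an admissible block, never forming the block densely. The kernel and geometry below are illustrative:

```python
import numpy as np

def aca(entry, m, n, tol=1e-8, kmax=40):
    """Low-rank U @ V ~ block, sampling entries via entry(i, j)."""
    U, V, used, i = [], [], set(), 0
    for _ in range(kmax):
        row = np.array([entry(i, j) for j in range(n)])
        row -= sum(u[i] * v for u, v in zip(U, V))       # residual row i
        j = int(np.argmax(np.abs(row)))
        if abs(row[j]) < tol:
            break
        col = np.array([entry(t, j) for t in range(m)])
        col -= sum(u * v[j] for u, v in zip(U, V))       # residual column j
        U.append(col / row[j]); V.append(row); used.add(i)
        nxt = np.abs(U[-1]); nxt[list(used)] = -1.0
        i = int(np.argmax(nxt))                          # next pivot row
    return np.column_stack(U), np.vstack(V)

# Smooth kernel on well-separated clusters: the rank stays very small.
xs, ys = np.linspace(0, 1, 80), np.linspace(3, 4, 60)
U, V = aca(lambda i, j: 1.0 / abs(xs[i] - ys[j]), 80, 60)
```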

Proceedings ArticleDOI
18 Mar 2005
TL;DR: It is shown that it is possible to design an iterative learning algorithm that produces a dictionary with the required structure, and how well the learning algorithm recovers dictionaries that may or may not have the necessary structure is assessed.
Abstract: We propose a new method to learn overcomplete dictionaries for sparse coding structured as unions of orthonormal bases. The interest of such a structure is manifold. Indeed, it seems that many signals or images can be modeled as the superimposition of several layers with sparse decompositions in as many bases. Moreover, in such dictionaries, the efficient block coordinate relaxation (BCR) algorithm can be used to compute sparse decompositions. We show that it is possible to design an iterative learning algorithm that produces a dictionary with the required structure. Each step is based on the coefficients estimation, using a variant of BCR, followed by the update of one chosen basis, using singular value decomposition. We assess experimentally how well the learning algorithm recovers dictionaries that may or may not have the required structure, and to what extent the noise level is a disturbing factor.
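The per-basis update mentioned at the end really is a single SVD, because it is an orthogonal Procrustes problem. A sketch of that step in isolation:

```python
import numpy as np

def update_basis(X, C):
    """argmin over orthonormal B of ||X - B @ C||_F (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X @ C.T)
    return U @ Vt           # nearest orthonormal matrix to X @ C.T
```

In the full learner this alternates with a BCR pass that re-estimates the sparse coefficients C for the union of bases.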

Book ChapterDOI
27 Aug 2005
TL;DR: A novel algorithm of document clustering based on non-negative sparse analysis that can obtain document topics exactly by controlling the sparseness of the topic matrix and the encoding matrix explicitly is proposed.

Abstract: A novel algorithm of document clustering based on non-negative sparse analysis is proposed. In contrast to the algorithm based on non-negative matrix factorization, our algorithm can obtain document topics exactly by controlling the sparseness of the topic matrix and the encoding matrix explicitly. Thus, the clustering accuracy is greatly improved. Finally, simulation results further illustrate the accuracy and efficiency of the algorithm.

Journal ArticleDOI
TL;DR: The LDL software package is a set of short, concise routines for factorizing symmetric positive-definite sparse matrices, with some applicability to symmetric indefinite matrices.
Abstract: The LDL software package is a set of short, concise routines for factorizing symmetric positive-definite sparse matrices, with some applicability to symmetric indefinite matrices. Its primary purpose is to illustrate much of the basic theory of sparse matrix algorithms in as concise a code as possible, including an elegant method of sparse symmetric factorization that computes the factorization row-by-row but stores it column-by-column. The entire symbolic and numeric factorization consists of less than 50 executable lines of code. The package is written in C, and includes a MATLAB interface.
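The row-by-row organization is easy to see in dense arithmetic. A sketch of the same factorization without the sparse machinery (in the package, the nonzero pattern of each row is found first via the elimination tree):

```python
import numpy as np

def ldl(A):
    """A = L @ diag(d) @ L.T for symmetric positive definite A, row by row."""
    n = A.shape[0]
    L, d = np.eye(n), np.zeros(n)
    for i in range(n):
        for j in range(i):       # row i of L from the rows above it
            L[i, j] = (A[i, j] - L[i, :j] @ (d[:j] * L[j, :j])) / d[j]
        d[i] = A[i, i] - L[i, :i] @ (d[:i] * L[i, :i])
    return L, d
```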

Journal ArticleDOI
TL;DR: The goal of this paper is to compare a number of algorithms for computing a large number of eigenvectors of the generalized symmetric eigenvalue problem arising from a modal analysis of elastic structures by considering the use of preconditioned iterative methods.
Abstract: The goal of our paper is to compare a number of algorithms for computing a large number of eigenvectors of the generalized symmetric eigenvalue problem arising from a modal analysis of elastic structures. The shift-invert Lanczos algorithm has emerged as the workhorse for the solution of this generalized eigenvalue problem; however, a sparse direct factorization is required for the resulting set of linear equations. Instead, our paper considers the use of preconditioned iterative methods. We present a brief review of available preconditioned eigensolvers followed by a numerical comparison on three problems using a scalable algebraic multigrid (AMG) preconditioner.
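A modern rendition of this comparison is a few lines with SciPy and PyAMG (assumed installed): LOBPCG preconditioned by smoothed-aggregation AMG, with a model Poisson matrix standing in for the elasticity problems:

```python
import numpy as np
import pyamg
from scipy.sparse.linalg import lobpcg

A = pyamg.gallery.poisson((100, 100), format='csr')   # model SPD operator
M = pyamg.smoothed_aggregation_solver(A).aspreconditioner()

X = np.random.default_rng(0).standard_normal((A.shape[0], 10))
vals, vecs = lobpcg(A, X, M=M, largest=False, tol=1e-6, maxiter=200)
print(vals)          # ten smallest eigenvalues, no sparse factorization needed
```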

Journal ArticleDOI
TL;DR: It is shown how the modification in the Cholesky factorization associated with this rank-2 modification of C can be computed efficiently using a sparse rank-1 technique developed in [T. A. Davis and W. W. Hager, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606--627].

Abstract: Given a sparse, symmetric positive definite matrix C and an associated sparse Cholesky factorization LDLᵀ, we develop sparse techniques for updating the factorization after a symmetric modification of a row and column of C. We show how the modification in the Cholesky factorization associated with this rank-2 modification of C can be computed efficiently using a sparse rank-1 technique developed in [T. A. Davis and W. W. Hager, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606--627]. We also determine how the solution of a linear system Lx = b changes after changing a row and column of C or after a rank-r change in C.
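The structure being exploited can be checked numerically: changing row and column k of C by a vector w is the rank-2 perturbation e_k wᵀ + w e_kᵀ, which splits into one rank-1 update plus one rank-1 downdate, each handled by the earlier sparse rank-1 technique:

```python
import numpy as np

n, k = 6, 2
w = np.random.default_rng(0).standard_normal(n)  # change to row/column k
e = np.zeros(n); e[k] = 1.0                      # (the (k,k) entry gets 2*w[k])

delta = np.outer(e, w) + np.outer(w, e)          # the row/column modification
p, q = (w + e) / np.sqrt(2), (w - e) / np.sqrt(2)
# One rank-1 update (+ p p^T) and one rank-1 downdate (- q q^T):
assert np.allclose(delta, np.outer(p, p) - np.outer(q, q))
```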

Journal ArticleDOI
TL;DR: A detailed analysis of the roundoff errors for the presented DCT algorithms shows their excellent numerical stability, outperforming a real fast DCT algorithm based on polynomial arithmetic.

Journal ArticleDOI
TL;DR: The computational details of a variant of the classical Gram-Schmidt algorithm, called the quasi-Gram-Schmidt algorithm, for obtaining two kinds of low-rank approximations are treated, and a MATLAB implementation is described.

Abstract: In many applications - latent semantic indexing, for example - it is required to obtain a reduced rank approximation to a sparse matrix A. Unfortunately, the approximations based on traditional decompositions, like the singular value and QR decompositions, are not in general sparse. Stewart [(1999), 313--323] has shown how to use a variant of the classical Gram-Schmidt algorithm, called the quasi-Gram-Schmidt algorithm, to obtain two kinds of low-rank approximations. The first, the SPQR approximation, is a pivoted, Q-less QR approximation of the form (XR₁₁⁻¹)(R₁₁R₁₂), where X consists of columns of A. The second, the SCR approximation, is of the form A ≅ XTYᵀ, where X and Y consist of columns and rows of A, and T is small. In this article we treat the computational details of these algorithms and describe a MATLAB implementation.
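A dense stand-in for the SPQR construction makes the algebra concrete: LAPACK's column-pivoted QR plays the role of the quasi-Gram-Schmidt pass, and the approximation is assembled without ever storing Q:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))  # rank ~ 5

f = 5
Q, R, piv = qr(A, mode='economic', pivoting=True)
X = A[:, piv[:f]]                                   # f chosen columns of A
T = solve_triangular(R[:f, :f], R[:f, :])           # R11^{-1} [R11 R12]

A_hat = np.empty_like(A)
A_hat[:, piv] = X @ T                               # undo the pivoting
print(np.linalg.norm(A - A_hat))                    # ~ 0 for rank-f A
```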

Proceedings ArticleDOI
12 Nov 2005
TL;DR: This paper proposes a BLAS (Basic Linear Algebra Subprograms) library for state-of-the-art reconfigurable systems, and proposes a design which employs a linear array of FPGAs for matrix multiply operation.
Abstract: Field-Programmable Gate Arrays (FPGAs) have become an attractive option for scientific computing. Several vendors have developed high performance reconfigurable systems which employ FPGAs for application acceleration. In this paper, we propose a BLAS (Basic Linear Algebra Subprograms) library for state-of-the-art reconfigurable systems. We study three data-intensive operations: dot product, matrix-vector multiply and dense matrix multiply. The first two operations are I/O bound, and our designs efficiently utilize the available memory bandwidth in the systems. As these operations require accumulation of sequentially delivered floating-point values, we develop a high performance reduction circuit. This circuit uses only one floating-point adder and buffers of moderate size. For matrix multiply operation, we propose a design which employs a linear array of FPGAs. This design exploits the memory hierarchy in the reconfigurable systems, and has very low memory bandwidth requirements. To illustrate our ideas, we have implemented our designs for Level 2 and Level 3 BLAS on Cray XD1.

Patent
20 Jul 2005
TL;DR: In this paper, the memory requirements of an Aho-Corasick algorithm are reduced by applying a banded-row sparse matrix technique to the state transition table of the state table.

Abstract: Embodiments of the present invention relate to systems and methods for optimizing and reducing the memory requirements of state machine algorithms in pattern matching applications. Memory requirements of an Aho-Corasick algorithm are reduced in an intrusion detection system by representing the state table as three separate data structures. Memory requirements of an Aho-Corasick algorithm are also reduced by applying a banded-row sparse matrix technique to the state transition table of the state table. The pattern matching performance of the intrusion detection system is improved by performing a case-insensitive search, where the characters of the test sequence are converted to uppercase as the characters are read. Testing reveals that state transition tables with sixteen-bit elements outperform state transition tables with thirty-two-bit elements and do not reduce the functionality of intrusion detection systems using the Aho-Corasick algorithm.
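The banded-row idea is independent of the intrusion-detection context and fits in a dozen lines: per state, store only the span between the first and last non-default transitions, keeping O(1) lookup. A toy sketch:

```python
def band_row(row, default=0):
    """Compress one state's transition row to (offset, band)."""
    nz = [i for i, s in enumerate(row) if s != default]
    return (nz[0], row[nz[0]:nz[-1] + 1]) if nz else (0, [])

def next_state(band, ch, default=0):
    lo, vals = band
    return vals[ch - lo] if lo <= ch < lo + len(vals) else default

row = [0, 0, 0, 5, 0, 7, 0, 0]      # dense transitions for one state
band = band_row(row)                # stores only [5, 0, 7] at offset 3
assert next_state(band, 5) == 7 and next_state(band, 1) == 0
```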

Book ChapterDOI
22 May 2005
TL;DR: A simple vectorizable algorithm for performing sparse matrix vector multiply in compressed sparse row (CSR) storage format that requires no data rearrangement and can be easily adapted to a sophisticated library framework such as PETSc.
Abstract: The innovation of this work is a simple vectorizable algorithm for performing sparse matrix vector multiply in compressed sparse row (CSR) storage format. Unlike the vectorizable jagged diagonal format (JAD), this algorithm requires no data rearrangement and can be easily adapted to a sophisticated library framework such as PETSc. Numerical experiments on the Cray X1 show an order of magnitude improvement over the non-vectorized algorithm.
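In NumPy terms the same trick is a gather followed by a segmented sum, with the CSR arrays left untouched; np.add.reduceat plays the role of the vector hardware (the guard line handles rows with no nonzeros):

```python
import numpy as np
import scipy.sparse as sp

A = (sp.eye(1000) + sp.random(1000, 1000, density=0.01,
                              random_state=np.random.default_rng(0))).tocsr()
x = np.ones(A.shape[1])

prod = A.data * x[A.indices]               # one vectorized gather + multiply
y = np.add.reduceat(prod, A.indptr[:-1])   # segmented sum, one value per row
y[np.diff(A.indptr) == 0] = 0.0            # rows with no nonzeros
assert np.allclose(y, A @ x)
```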

Proceedings Article
01 Sep 2005
TL;DR: It is shown that the employed projection step proposed by Hoyer has a unique solution, and that it indeed finds this solution, both theoretically and experimentally.
Abstract: Sparse non-negative matrix factorization (sNMF) allows for the decomposition of a given data set into a mixing matrix and a feature data set, which are both non-negative and fulfill certain sparsity conditions. In this paper it is shown that the employed projection step proposed by Hoyer has a unique solution, and that it indeed finds this solution. Then indeterminacies of the sNMF model are identified and first uniqueness results are presented, both theoretically and experimentally.
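Hoyer's projection step, whose uniqueness the paper establishes, can be sketched from its published description (closest nonnegative vector with prescribed ℓ1 and ℓ2 norms); treat this as an illustrative transcription rather than a verified reference implementation:

```python
import numpy as np

def hoyer_project(x, L1, L2):
    """Closest s >= 0 to x with sum(s) = L1 and ||s||_2 = L2."""
    n = len(x)
    s = x + (L1 - x.sum()) / n                   # start on the L1 hyperplane
    zero = np.zeros(n, dtype=bool)
    while True:
        m = np.where(zero, 0.0, L1 / (~zero).sum())
        w = s - m                                # direction within hyperplane
        a, b, c = w @ w, 2 * (m @ w), m @ m - L2 ** 2
        alpha = (-b + np.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a)
        s = m + alpha * w                        # step out to the L2 sphere
        if (s >= 0).all():
            return s
        zero |= s < 0                            # clamp negatives, re-project
        s[zero] = 0.0
        s[~zero] += (L1 - s.sum()) / (~zero).sum()
```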

Proceedings ArticleDOI
18 Mar 2005
TL;DR: The theoretical results show the fundamental limitation on when a sparse representation is unique, and the relation between the solutions of ℓ0-norm minimization and the solutions of ℓ1-norm minimization indicates a computationally efficient approach to find a sparse representation.

Abstract: The multiple measurement vector (MMV) problem - a newly emerged problem in sparse representation in an over-complete dictionary, motivated by a neuro-magnetic inverse problem that arises in magnetoencephalography (MEG), a modality for imaging the possible activation regions in the brain - poses new challenges. Efficient methods have been designed to search for sparse representations; however, we have not seen substantial development in the theoretical analysis, considering what has been done in a simpler case - the single measurement vector (SMV) - in which many theoretical results are known. This paper extends the known results of SMV to MMV. Our theoretical results show the fundamental limitation on when a sparse representation is unique. Moreover, the relation between the solutions of ℓ0-norm minimization and the solutions of ℓ1-norm minimization indicates a computationally efficient approach to find a sparse representation. Interestingly, simulations show that the predictions made by these theorems tend to be conservative.

Journal ArticleDOI
TL;DR: This paper presents a formal two-phase decomposition method for complex design problems that are represented in an attribute-component incidence matrix that decouples the overall decomposition process into two separate, autonomous function components: dependency analysis and matrix partitioning, which are algorithmically achieved by an extended Hierarchical Cluster Analysis and a Partition Point Analysis.
Abstract: This paper presents a formal two-phase decomposition method for complex design problems that are represented in an attribute-component incidence matrix. Unlike the conventional approaches, this method decouples the overall decomposition process into two separate, autonomous function components: dependency analysis and matrix partitioning, which are algorithmically achieved by an extended Hierarchical Cluster Analysis (HCA) and a Partition Point Analysis (PPA), respectively. The extended HCA (Phase I) is applied to convert the (input) incidence matrix, which is originally unorganized, into a banded diagonal matrix. The PPA (Phase II) is applied to further transform this matrix into a block-angular matrix according to a given set of decomposition criteria. This method provides both flexibility in the choice of the different settings on the decomposition criteria, and diversity in the generation of the decomposition solutions, both taking place in Phase II without resort to Phase I. These features essentially make this decomposition method effective, especially in its application to re-decomposition. A powertrain design example is employed for illustration and discussion.
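The Phase I goal - reordering an unorganized incidence matrix into banded diagonal form - can be approximated for illustration with reverse Cuthill-McKee on the symmetrized pattern (not the paper's HCA, and on a square toy matrix rather than a real attribute-component matrix):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

M = sp.random(30, 30, density=0.1, format='csr',
              random_state=np.random.default_rng(3))
S = ((M + M.T) != 0).astype(np.int8).tocsr()      # symmetrized pattern

perm = reverse_cuthill_mckee(S, symmetric_mode=True)
B = S[perm][:, perm]                              # reordered, nearly banded

def bandwidth(P):
    P = P.tocoo()
    return int(np.abs(P.row - P.col).max())

print(bandwidth(S), '->', bandwidth(B))
```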