
Showing papers on "Sparse matrix published in 2002"


Proceedings ArticleDOI
07 Nov 2002
TL;DR: A simple yet efficient multiplicative algorithm for finding the optimal values of the hidden components in non-negative sparse coding is given, and it is shown how the basis vectors can be learned from the observed data.
Abstract: Non-negative sparse coding is a method for decomposing multivariate data into non-negative sparse components. We briefly describe the motivation behind this type of data representation and its relation to standard sparse coding and non-negative matrix factorization. We then give a simple yet efficient multiplicative algorithm for finding the optimal values of the hidden components. In addition, we show how the basis vectors can be learned from the observed data. Simulations demonstrate the effectiveness of the proposed method.
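A minimal sketch of this kind of algorithm (our own illustrative code, not the authors'; the penalty weight lam and the iteration count are arbitrary): the multiplicative update keeps H non-negative, and the basis W is learned from the data with an NMF-style step.

```python
# Illustrative non-negative sparse coding: minimize
#   0.5*||V - W H||_F^2 + lam * sum(H)   subject to  W, H >= 0.
import numpy as np

def nn_sparse_coding(V, r, lam=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    eps = 1e-9
    for _ in range(iters):
        # Multiplicative update for the hidden components H (stays >= 0).
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # NMF-style update to learn the basis vectors W from the data,
        # then renormalize columns so the sparsity penalty acts on H alone.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        W /= np.linalg.norm(W, axis=0, keepdims=True) + eps
    return W, H
```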

871 citations


Book ChapterDOI
21 Apr 2002
TL;DR: Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and high performance is consistently achieved for large sparse unsymmetric matrices from real world applications.
Abstract: Supernode pivoting for unsymmetric matrices coupled with supernode partitioning and asynchronous computation can achieve high gigaflop rates for parallel sparse LU factorization on shared memory parallel computers. Progress in weighted graph matching algorithms helps to extend these concepts further, and a prepermutation of rows is used to place large matrix entries on the diagonal. Supernode pivoting allows dynamic interchanges of columns and rows during the factorization process, and BLAS-3 level efficiency is retained. An enhanced left-right looking scheduling scheme is unaffected and results in good speedup on SMP machines without increasing the operation count. These algorithms have been integrated into the recent unsymmetric version of the PARDISO solver. Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and high performance is consistently achieved for large sparse unsymmetric matrices from real world applications.
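The row prepermutation step can be pictured with standard tools (a hedged sketch; SciPy's assignment solver stands in for the weighted graph matching codes, and SciPy's splu for PARDISO): a maximum-weight bipartite matching on |A| moves large entries onto the diagonal before factoring.

```python
import numpy as np
import scipy.sparse as sp
from scipy.optimize import linear_sum_assignment
from scipy.sparse.linalg import splu

A = sp.random(200, 200, density=0.05, format="csr", random_state=1) + sp.identity(200)
W = np.abs(A.toarray())
rows, cols = linear_sum_assignment(-W)        # maximize total matched |a_ij|
P = sp.coo_matrix((np.ones(200), (cols, rows)), shape=(200, 200)).tocsr()
lu = splu((P @ A).tocsc())                    # factor with large entries on the diagonal
```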

323 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: A three-phase AC-DC-AC sparse matrix converter (SMC) with no energy storage elements in the DC link, employing only 15 IGBTs, is proposed.
Abstract: A novel three-phase AC-DC-AC sparse matrix converter (SMC) having no energy storage elements in the DC link and employing only 15 IGBTs as opposed to 18 IGBTs of a functionally equivalent conventional AC-AC matrix converter (CMC) is proposed. It is shown that the realization effort could be further reduced to only 9 IGBTs (ultra sparse matrix converter, USMC) in case the phase displacement of the fundamentals of voltage and current at the input and at the output is limited to ±π/6. The dependency of the voltage and current transfer ratios of the systems on the operating parameters is analyzed and a space vector modulation scheme is described in combination with a zero current commutation procedure. Furthermore, a safe multi-step current commutation concept is treated briefly. Conduction and switching losses of the SMC and USMC are calculated in analytically closed form. Finally, the theoretical results are verified in Part II of the paper by digital simulations and results of a first experimental investigation of a 10 kW/400 V SMC prototype are given.

270 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a sparse minimum-variance reconstructor for a conventional natural guide star AO system using a sparse approximation for turbulence statistics and recognizing that the nonsparse matrix terms arising from LGS position uncertainty are low-rank adjustments that can be evaluated by using the matrix inversion lemma.
Abstract: The complexity of computing conventional matrix multiply wave-front reconstructors scales as O(n^3) for most adaptive optical (AO) systems, where n is the number of deformable mirror (DM) actuators. This is impractical for proposed systems with extremely large n. It is known that sparse matrix methods improve this scaling for least-squares reconstructors, but sparse techniques are not immediately applicable to the minimum-variance reconstructors now favored for multiconjugate adaptive optical (MCAO) systems with multiple wave-front sensors (WFSs) and DMs. Complications arise from the nonsparse statistics of atmospheric turbulence, and the global tip/tilt WFS measurement errors associated with laser guide star (LGS) position uncertainty. A description is given of how sparse matrix methods can still be applied by use of a sparse approximation for turbulence statistics and by recognizing that the nonsparse matrix terms arising from LGS position uncertainty are low-rank adjustments that can be evaluated by using the matrix inversion lemma. Sample numerical results for AO and MCAO systems illustrate that the approximation made to turbulence statistics has negligible effect on estimation accuracy, the time to compute the sparse minimum-variance reconstructor for a conventional natural guide star AO system scales as O(n^{3/2}) and is only a few seconds for n = 3500, and sparse techniques reduce the reconstructor computations by a factor of 8 for sample MCAO systems with 2417 DM actuators and 4280 WFS subapertures. With extrapolation to 9700 actuators and 17,120 subapertures, a reduction by a factor of approximately 30 or 40 to 1 is predicted.
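The low-rank step mentioned above is the matrix inversion (Sherman-Morrison-Woodbury) lemma. A generic sketch under our own naming (not the paper's reconstructor): with A the sparse part and U C U^T the low-rank adjustment, only sparse solves with A plus one small dense solve are needed.

```python
import numpy as np
import scipy.sparse.linalg as spla

def woodbury_solve(A, U, C, b):
    """Solve (A + U C U^T) x = b using only sparse solves with sparse A."""
    solve_A = spla.factorized(A.tocsc())       # sparse factorization of A, reused
    Ainv_b = solve_A(b)
    Ainv_U = np.column_stack([solve_A(U[:, j]) for j in range(U.shape[1])])
    S = np.linalg.inv(C) + U.T @ Ainv_U        # small dense capacitance matrix
    return Ainv_b - Ainv_U @ np.linalg.solve(S, U.T @ Ainv_b)
```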

178 citations


Journal ArticleDOI
TL;DR: The solution of the optimal power flow dispatching (OPFD) problem by a primal-dual interior point method is considered; convergence is enhanced through a filter technique for the step length and an altered search direction that avoids convergence to nonminimizing stationary points.
Abstract: In this paper, the solution of the optimal power flow dispatching (OPFD) problem by a primal-dual interior point method is considered. Several primal-dual methods for optimal power flow (OPF) have been suggested, all of which are essentially direct extensions of primal-dual methods for linear programming. The aim of the present work is to enhance convergence through two modifications: a filter technique to guide the choice of the step length and an altered search direction in order to avoid convergence to a nonminimizing stationary point. A reduction in computational time is also gained through solving a positive definite system for the search direction. Numerical tests on standard IEEE systems and on a realistic network are very encouraging and show that the new algorithm converges where other algorithms fail.

143 citations


Proceedings ArticleDOI
16 Nov 2002
TL;DR: Upper and lower bounds on the performance (Mflop/s) of SpM×V when tuned using the previously proposed register blocking optimization are developed and a new heuristic is presented that selects optimal or near-optimal register block sizes more accurately than the previous heuristic.
Abstract: We consider performance tuning, by code and data structure reorganization, of sparse matrix-vector multiply (SpM×V), one of the most important computational kernels in scientific applications. This paper addresses the fundamental questions of what limits exist on such performance tuning, and how closely tuned code approaches these limits. Specifically, we develop upper and lower bounds on the performance (Mflop/s) of SpM×V when tuned using our previously proposed register blocking optimization. These bounds are based on the non-zero pattern in the matrix and the cost of basic memory operations, such as cache hits and misses. We evaluate our tuned implementations with respect to these bounds using hardware counter data on 4 different platforms and on a test set of 44 sparse matrices. We find that we can often get within 20% of the upper bound, particularly on a class of matrices from finite element modeling (FEM) problems; on non-FEM matrices, performance improvements of 2× are still possible. Lastly, we present a new heuristic that selects optimal or near-optimal register block sizes (the key tuning parameters) more accurately than our previous heuristic. Using the new heuristic, we show improvements in SpM×V performance (Mflop/s) by as much as 2.5× over an untuned implementation. Collectively, our results suggest that future performance improvements, beyond those that we have already demonstrated for SpM×V, will come from two sources: (1) consideration of higher-level matrix structures (e.g., exploiting symmetry, matrix reordering, multiple register block sizes), and (2) optimizing kernels with more opportunity for data reuse (e.g., sparse matrix-multiple vector multiply, multiplication of A^T A by a vector).
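The register blocking idea is easy to demonstrate with SciPy's BSR format (a concept sketch only; the paper's tuned kernels and selection heuristic are not reproduced): the matrix is stored in small r-by-c dense blocks so the inner loop can keep operands in registers, and the block size is the tuning parameter.

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.ones(1000)
for r, c in [(1, 1), (2, 2), (4, 4)]:         # candidate register block sizes
    A_bsr = A.tobsr(blocksize=(r, c))         # explicit zeros pad ragged blocks
    y = A_bsr @ x                             # blocked SpMxV
```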

135 citations


Journal ArticleDOI
TL;DR: Discusses the interface design for the Sparse Basic Linear Algebra Subprograms (BLAS), the kernels in the recent standard that are concerned with unstructured sparse matrices, and shows how this interface can shield one from concern over the specific storage scheme for the sparse matrix.
Abstract: We discuss the interface design for the Sparse Basic Linear Algebra Subprograms (BLAS), the kernels in the recent standard from the BLAS Technical Forum that are concerned with unstructured sparse matrices. The motivation for such a standard is to encourage portable programming while allowing for library-specific optimizations. In particular, we show how this interface can shield one from concern over the specific storage scheme for the sparse matrix. This design makes it easy to add further functionality to the sparse BLAS in the future. We illustrate the use of the Sparse BLAS with examples in the three supported programming languages, Fortran 95, Fortran 77, and C.
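In the Sparse BLAS, a matrix is created through a handle (begin, insert entries, end) and then used in kernels such as usmv, so the caller never sees the storage scheme. A rough Python analogue of that call sequence (the C binding names in the comments follow the BLAS Technical Forum standard; consult the spec for exact signatures):

```python
import scipy.sparse as sp

# BLAS_duscr_begin(m, n): open a construction handle.
entries = []
# BLAS_duscr_insert_entry(A, val, i, j): insert entries one at a time.
entries += [(0, 0, 4.0), (1, 2, -1.0), (2, 1, 2.5)]
# BLAS_duscr_end(A): finish construction; the library picks the layout.
rows, cols, vals = zip(*entries)
A = sp.coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()
# BLAS_dusmv: y := alpha * A x + y, independent of how A is stored.
y = A @ [1.0, 2.0, 3.0]
```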

133 citations


Patent
11 Jan 2002
TL;DR: A new data structure and algorithms that offer at least equal performance on common sparse matrix tasks, and improved performance on many, applied to a word-document index to produce fast build and query times for document retrieval.
Abstract: A new data structure and algorithms which offer at least equal performance in common sparse matrix tasks, and improved performance in many. This is applied to a word-document index to produce fast build and query times for document retrieval.

117 citations


Journal ArticleDOI
Anshul Gupta1
TL;DR: The experiments show that the algorithmic choices made in WSMP enable it to run more than twice as fast as the best among similar solvers and that WSMP can factor some of the largest sparse matrices available from real applications in only a few seconds on a 4-CPU workstation.
Abstract: During the past few years, algorithmic improvements alone have reduced the time required for the direct solution of unsymmetric sparse systems of linear equations by almost an order of magnitude. This paper compares the performance of some well-known software packages for solving general sparse systems. In particular, it demonstrates the consistently high level of performance achieved by WSMP---the most recent of such solvers. It compares the various algorithmic components of these solvers and discusses their impact on solver performance. Our experiments show that the algorithmic choices made in WSMP enable it to run more than twice as fast as the best among similar solvers and that WSMP can factor some of the largest sparse matrices available from real applications in only a few seconds on a 4-CPU workstation. Thus, the combination of advances in hardware and algorithms makes it possible to solve those general sparse linear systems quickly and easily that might have been considered too large until recently.

110 citations


Journal ArticleDOI
TL;DR: A new branch-based state-estimation method for radial distribution systems that can handle most kinds of real-time measurements; it decomposes the WLS problem of the whole system into a series of WLS subproblems, each dealing with the state estimation of a single branch.
Abstract: This paper presents a new branch-based state-estimation method for radial distribution systems that can handle most kinds of real-time measurements. In contrast to the traditional weighted-least-square (WLS) method, the idea of this algorithm is to decompose the WLS problem of the whole system into a series of WLS subproblems, each dealing with the state estimation of only a single branch. This approach can be implemented in a forward/backward sweep scheme for radial distribution systems and does not need the sparse matrix technique. Test results of a large-scale practical distribution system in China show that the proposed method is valid and efficient.
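Each single-branch subproblem is an ordinary weighted least-squares fit. Generically, in our notation (placeholder measurement model H, measurements z, weights w):

```python
import numpy as np

def wls(H, z, w):
    """Minimize (z - H x)^T diag(w) (z - H x) over the branch state x."""
    HW = H.T * w                               # scale rows of H by the weights
    return np.linalg.solve(HW @ H, HW @ z)     # normal equations: H^T W H x = H^T W z
```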

101 citations


Journal ArticleDOI
TL;DR: Improvements are offered for the efficiency and applicability of preconditioners on linear algebra problems over finite fields, but most results are valid for entries from arbitrary fields.

Journal ArticleDOI
TL;DR: Compactly supported radial basis functions (CSRBFs) are applied to solving a system of shallow water hydrodynamics equations, and the resulting banded matrix shows improvement in both conditioning and computational efficiency.
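The sparsity mechanism behind CSRBFs: the kernel vanishes beyond its support radius, so the interpolation matrix is banded. A toy 1-D sketch with the Wendland C2 function (the support radius 0.05 is an arbitrary choice of ours):

```python
import numpy as np
import scipy.sparse as sp

def wendland_c2(r):
    """phi(r) = (1 - r)^4 (4 r + 1) for r < 1, else 0 (compact support)."""
    return np.where(r < 1.0, (1.0 - r)**4 * (4.0 * r + 1.0), 0.0)

x = np.linspace(0.0, 1.0, 200)
R = np.abs(x[:, None] - x[None, :]) / 0.05     # pairwise distances / support radius
A = sp.csr_matrix(wendland_c2(R))              # banded, mostly zero interpolation matrix
```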

Proceedings ArticleDOI
23 Oct 2002
TL;DR: An improved version of the BiCGStab (IBiCGStab) method for the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices is proposed, which combines elements of numerical stability and parallel algorithm design without increasing the computational costs.
Abstract: In this paper, an improved version of the BiCGStab (IBiCGStab) method for the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices is proposed. The method combines elements of numerical stability and parallel algorithm design without increasing the computational costs. The algorithm is derived such that all inner products of a single iteration step are independent and communication time required for the inner product can be overlapped efficiently with computation time of vector updates. Therefore, the cost of global communication, which represents the bottleneck of the parallel performance, can be significantly reduced. The resulting IBiCGStab algorithm maintains the favorable properties of the original method while not increasing computational costs. Data distribution suitable for both irregularly and regularly structured matrices, based on the analysis of the nonzero matrix elements, is presented. The communication scheme is supported by overlapping execution of computation and communication to reduce waiting times. The efficiency of this method is demonstrated by numerical experimental results carried out on a massively parallel distributed memory system.
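For context, the unmodified BiCGStab iteration being restructured is available off the shelf (SciPy shown, serial); IBiCGStab's contribution is to make the iteration's inner products independent so they can be overlapped with communication.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab

n = 2000
A = sp.diags([-1.0, 3.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
A = A + sp.diags([0.5], [2], shape=(n, n))     # make the matrix unsymmetric
x, info = bicgstab(A, np.ones(n))              # info == 0 signals convergence
```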


Journal ArticleDOI
TL;DR: The irbleigs code is an implementation of an implicitly restarted block-Lanczos method for computing a few selected eigenvalues and associated eigenvectors of a large, possibly sparse, Hermitian matrix A; its modest storage requirements make it well suited for large-scale problems.
Abstract: The irbleigs code is an implementation of an implicitly restarted block-Lanczos method for computing a few selected nearby eigenvalues and associated eigenvectors of a large, possibly sparse, Hermitian matrix A. The code requires only the evaluation of matrix-vector products with A; in particular, factorization of A is not demanded, nor is the solution of linear systems of equations with the matrix A. This, together with a fairly small storage requirement, makes the irbleigs code well suited for large-scale problems. Applications of the irbleigs code to certain generalized eigenvalue problems and to the computation of a few singular values and associated singular vectors are also discussed. Numerous computed examples illustrate the performance of the method and provide comparisons with other available codes.
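The matrix-free setting is the point: only products v -> A v are required, and A is never factored. Illustrated here with SciPy's Lanczos-type eigsh standing in for irbleigs (which is itself a MATLAB code):

```python
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

n = 5000
A = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr")
op = LinearOperator((n, n), matvec=lambda v: A @ v, dtype=float)  # only access to A
vals, vecs = eigsh(op, k=4, which="SA")        # 4 smallest eigenvalues, no factorization
```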

Journal ArticleDOI
TL;DR: A three-step process is outlined: replace direct inversion techniques with iterative methods such as conjugate gradients, use preconditioning to cluster the eigenvalues of the interpolation matrix and hence speed convergence, and compute the matrix-vector product required at each iteration with a fast multipole or fast moment method.
Abstract: A wide class of interpolation methods, including thin‐plate and tension splines, kriging, sinc functions, equivalent‐source, and radial basis functions, can be encompassed in a common mathematical framework involving continuous global surfaces (CGSs). The difficulty in applying these techniques to geophysical data sets has been the computational and memory requirements involved in solving the large, dense matrix equations that arise. We outline a three‐step process for reducing the computational requirements: (1) replace the direct inversion techniques with iterative methods such as conjugate gradients; (2) use preconditioning to cluster the eigenvalues of the interpolation matrix and hence speed convergence; and (3) compute the matrix–vector product required at each iteration with a fast multipole or fast moment method. We apply the new methodology to a regional gravity compilation with a highly heterogeneous sampling density. The industry standard minimum‐curvature algorithms and several scale‐dependent ...
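Steps (1) and (2) in miniature (a hedged sketch: a Jacobi diagonal preconditioner stands in for the paper's preconditioner, and an explicit sparse matrix stands in for the fast multipole matvec of step (3)):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

n = 3000
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.random.default_rng(0).normal(size=n)
d = A.diagonal()
M = LinearOperator((n, n), matvec=lambda v: v / d)  # step (2): cluster eigenvalues
x, info = cg(A, b, M=M)                             # step (1): iterate, never invert
```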

Journal ArticleDOI
TL;DR: The new feature presented here is to construct the basis in a hierarchical decomposition of the three-space and not, as in previous approaches, in a parameter space of the boundary manifold, which leads to sparse representations of the operator.
Abstract: A multilevel transform is introduced to represent discretizations of integral operators from potential theory by nearly sparse matrices. The new feature presented here is to construct the basis in a hierarchical decomposition of the three-space and not, as in previous approaches, in a parameter space of the boundary manifold. This construction leads to sparse representations of the operator even for geometrically complicated, multiply connected domains. We will demonstrate that the numerical cost to apply a vector to the operator using the nonstandard form is essentially equal to performing the same operation with the fast multipole method. With a second compression scheme the multiscale approach can be further optimized. The diagonal blocks of the transformed matrix can be used as an inexpensive preconditioner which is empirically shown to reduce the condition number of discretizations of the single layer operator so as to be independent of mesh size.

Journal ArticleDOI
TL;DR: The paper shows that certain forms of approximate inverse techniques amount to approximately inverting the triangular factors obtained from some variants of ILU factorization of the original matrix.
Abstract: This paper discusses some relationships between ILU factorization techniques and factored sparse approximate inverse techniques. While ILU factorizations compute approximate LU factors of the coefficient matrix A, approximate inverse techniques aim at building triangular matrices Z and W such that W^T A Z is approximately diagonal. The paper shows that certain forms of approximate inverse techniques amount to approximately inverting the triangular factors obtained from some variants of ILU factorization of the original matrix. A few useful applications of these relationships will be discussed.
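For context, a hedged SciPy sketch of the ILU side of this relationship; the factored-approximate-inverse side would then build sparse approximations to the inverses of these triangular factors.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(500, 500), format="csc")
ilu = spilu(A, drop_tol=1e-3)                  # approximate factors L, U of A
y = ilu.solve(np.ones(500))                    # apply (L U)^{-1}, e.g. as a preconditioner
```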

Journal ArticleDOI
TL;DR: Two algorithms for the symbolic and numerical factorization phases in the direct solution of sparse unsymmetric systems of linear equations have been implemented in WSMP and have enabled WSMP to significantly outperform other similar solvers.
Abstract: We present algorithms for the symbolic and numerical factorization phases in the direct solution of sparse unsymmetric systems of linear equations. We have modified a classical symbolic factorization algorithm for unsymmetric matrices to inexpensively compute minimal elimination structures. We give an efficient algorithm to compute a near-minimal data-dependency graph for unsymmetric multifrontal factorization that is valid irrespective of the amount of dynamic pivoting performed during factorization. Finally, we describe an unsymmetric-pattern multifrontal algorithm for Gaussian elimination with partial pivoting that uses the task- and data-dependency graphs computed during the symbolic phase. These algorithms have been implemented in WSMP---an industrial strength sparse solver package---and have enabled WSMP to significantly outperform other similar solvers. We present experimental results to demonstrate the merits of the new algorithms.
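As a flavor of what a symbolic phase computes, here is a hedged sketch of the classical elimination-tree algorithm (Liu's algorithm for a symmetric pattern, with path compression); the paper's task- and data-dependency graphs generalize such structures to the unsymmetric, pivoting case.

```python
import scipy.sparse as sp

def elimination_tree(A):
    """Parent array of the elimination tree of a symmetric-pattern sparse matrix."""
    L = sp.tril(A.tocsr(), k=-1, format="csr")  # strictly lower-triangular pattern
    n = L.shape[0]
    parent, ancestor = [-1] * n, [-1] * n
    for j in range(n):
        for i in L.indices[L.indptr[j]:L.indptr[j + 1]]:  # nonzeros A[j, i], i < j
            r = i
            while ancestor[r] not in (-1, j):   # climb toward the root
                nxt = ancestor[r]
                ancestor[r] = j                 # path compression
                r = nxt
            if ancestor[r] == -1:
                ancestor[r], parent[r] = j, j   # link subtree root under j
    return parent
```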

Book ChapterDOI
01 Dec 2002
TL;DR: An evaluation of circuit-based implementations of the matrix step of the number field sieve, with an improved mesh-routing-based design, concluding that from a practical standpoint the security of RSA relies exclusively on the hardness of the relation collection step.
Abstract: In [1], Bernstein proposed a circuit-based implementation of the matrix step of the number field sieve factorization algorithm. These circuits offer an asymptotic cost reduction under the measure "construction cost × run time". We evaluate the cost of these circuits, in agreement with [1], but argue that compared to previously known methods these circuits can factor integers that are 1.17 times larger, rather than 3.01 as claimed (and even this, only under the non-standard cost measure). We also propose an improved circuit design based on a new mesh routing algorithm, and show that for factorization of 1024-bit integers the matrix step can, under an optimistic assumption about the matrix size, be completed within a day by a device that costs a few thousand dollars. We conclude that from a practical standpoint, the security of RSA relies exclusively on the hardness of the relation collection step of the number field sieve.

Journal ArticleDOI
TL;DR: The issues of an inertia-controlling factorization are described, and an off-the-shelf sparse factorization routine can be used, with pivot selection based on sparsity and numerical stability.

Journal ArticleDOI
TL;DR: The obtained estimate of the fundamental matrix turns out to be more accurate than the one provided by the linear criterion, where the rank constraint of the matrix is imposed after its computation by setting the smallest singular value to zero.
Abstract: In this paper, a new method for the estimation of the fundamental matrix from point correspondences in stereo vision is presented. The minimization of the algebraic error is performed while taking explicitly into account the rank-two constraint on the fundamental matrix. It is shown how this nonconvex optimization problem can be solved avoiding local minima by using recently developed convexification techniques. The obtained estimate of the fundamental matrix turns out to be more accurate than the one provided by the linear criterion, where the rank constraint of the matrix is imposed after its computation by setting the smallest singular value to zero. This suggests that the proposed estimate can be used to initialize nonlinear criteria, such as the distance to epipolar lines and the gradient criterion, in order to obtain a more accurate estimate of the fundamental matrix.
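The linear-criterion baseline that the paper improves on imposes the rank-two constraint only after the fact, by zeroing the smallest singular value (a quick sketch):

```python
import numpy as np

def enforce_rank2(F):
    """Project a 3x3 fundamental-matrix estimate to the nearest rank-2 matrix."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0                                  # drop the smallest singular value
    return U @ np.diag(s) @ Vt

F2 = enforce_rank2(np.random.default_rng(0).normal(size=(3, 3)))  # stand-in estimate
```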

Journal ArticleDOI
TL;DR: A recipe for finding SDP relaxations based on adding redundant constraints and using Lagrangian relaxation is presented, concluding with a new application of SDP to finding approximate matrix completions for large and sparse instances of Euclidean distance matrices.

Book ChapterDOI
25 Jul 2002
TL;DR: The results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods, which employ only parallelism and intra-iteration locality.
Abstract: Finite Element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration, and inter-iteration data locality) are combined.
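For reference, the smoother in its plain serial form: one forward sweep over a CSR matrix (our sketch). Sparse tiling reschedules exactly these row updates across iterations to gain parallelism and locality.

```python
import scipy.sparse as sp

def gauss_seidel_sweep(A, x, b):
    """One in-place forward Gauss-Seidel sweep; A must store its diagonal."""
    A = A.tocsr()
    for i in range(A.shape[0]):
        start, end = A.indptr[i], A.indptr[i + 1]
        cols, vals = A.indices[start:end], A.data[start:end]
        # x_i <- x_i + (b_i - (A x)_i) / a_ii, using already-updated entries of x
        x[i] += (b[i] - vals @ x[cols]) / vals[cols == i][0]
    return x
```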

01 Jan 2002
TL;DR: This paper addresses the problem of building high-performance uniprocessor implementations of sparse triangular solve (SpTS) automatically and describes fully automatic hybrid off-line/on-line heuristics for selecting the key tuning parameters: the register block size and the point at which to use the dense algorithm.
Abstract: We address the problem of building high-performance uniprocessor implementations of sparse triangular solve (SpTS) automatically. This computational kernel is often the bottleneck in a variety of scientific and engineering applications that require the direct solution of sparse linear systems. Performance tuning of SpTS—and sparse matrix kernels in general—is a tedious and time-consuming task, because performance depends on the complex interaction of many factors: the performance gap between processors and memory, the limits on the scope of compiler analyses and transformations, and the overhead of manipulating sparse data structures. Consequently, it is not unusual to see kernels such as SpTS run at under 10% of peak uniprocessor floating point performance. Our approach to automatic tuning of SpTS builds on prior experience with building tuning systems for sparse matrix-vector multiply (SpM×V) [21, 22, 40], and dense matrix kernels [8, 41]. In particular, we adopt the two-step methodology of previous approaches: (1) we identify and generate a set of reasonable candidate implementations, and (2) search this set for the fastest implementation by some combination of performance modeling and actually executing the implementations. In this paper, we consider the solution of the sparse lower triangular system Lx = y for a single dense vector x, given the lower triangular sparse matrix L and dense vector y. We refer to x as the solution vector and y as the right-hand side (RHS). Many of the lower triangular factors we have observed from sparse LU factorization have a large, dense triangle in the lower right-hand corner of the matrix; this trailing triangle can account for as much as 90% of the matrix non-zeros. Therefore, we consider both algorithmic and data structure reorganizations which partition the solve into a sparse phase and a dense phase. To the sparse phase, we adapt the register blocking optimization, previously proposed for sparse matrix-vector multiply (SpM×V) in the Sparsity system [21, 22], to the SpTS kernel; to the dense phase, we make judicious use of highly tuned BLAS routines by switching to a dense implementation (switch-to-dense optimization). We describe fully automatic hybrid off-line/on-line heuristics for selecting the key tuning parameters: the register block size and the point at which to use the dense algorithm. (See Section 2.) We then evaluate the performance of our optimized implementations relative to the fundamental limits on performance. Specifically, we first derive simple models of the upper bounds on the execution rate (Mflop/s) of our implementations. Using hardware counter data collected with the PAPI library [10], we then verify our models on three hardware platforms (Table 1) and a set of triangular factors from applications (Table 2). We observe that our optimized implementations can achieve 80% or more of these bounds; furthermore, we observe speedups of up to 1.8x when both register blocking and switch-to-dense optimizations are applied. We also present preliminary results confirming that our heuristics choose reasonable values for the tuning parameters. These results support our prior findings with SpM×V [40], suggesting two new directions for performance enhancements: (1) the use of higher-level matrix structures (e.g., matrix reordering and multiple register block sizes), and (2) optimizing kernels with more opportunities for data reuse (e.g., multiplication and solve with multiple vectors, multiplication of A^T A by a vector).
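The sparse/dense split described above is easy to sketch (a toy version under our own naming; the switch point k is the tuning parameter the paper's heuristics select, and SciPy's solve_triangular stands in for the tuned BLAS call):

```python
from scipy.linalg import solve_triangular

def spts_switch_to_dense(L, y, k):
    """Solve L x = y, treating rows >= k as a dense trailing triangle."""
    L, x = L.tocsr(), y.astype(float).copy()
    for i in range(k):                          # sparse phase, row by row
        cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
        vals = L.data[L.indptr[i]:L.indptr[i + 1]]
        off = cols < i                          # strictly-lower entries of row i
        x[i] = (x[i] - vals[off] @ x[cols[off]]) / vals[cols == i][0]
    x[k:] -= L[k:, :k] @ x[:k]                  # couple the two phases
    x[k:] = solve_triangular(L[k:, k:].toarray(), x[k:], lower=True)  # dense phase
    return x
```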

Journal ArticleDOI
TL;DR: This work presents an implicit, edge-based implementation of the semi-discrete SUPG formulation with shock-capturing for the Euler equations in conservative variables that requires less memory and CPU time than element-based implementations.

Proceedings ArticleDOI
04 Mar 2002
TL;DR: This paper demonstrates a localized, window-based extraction and simulation methodology that employs the recently proposed susceptance (inverse of the inductance matrix) concept, provides a qualitative explanation for its efficacy, and shows how it facilitates pre-manufacturing simulations that would otherwise be intractable.
Abstract: Due to the increasing operating frequencies and the manner in which the corresponding integrated circuits and systems must be designed, the extraction, modeling and simulation of the magnetic couplings for final design verification can be a daunting task. In general, when modeling inductance and the associated return paths, one must consider the on-chip conductors as well as the system packaging. This can result in an RLC circuit size that is impractical for traditional simulators. In this paper we demonstrate a localized, window-based extraction and simulation methodology that employs the recently proposed susceptance (the inverse of the inductance matrix) concept. We provide a qualitative explanation for the efficacy of this approach, and demonstrate how it facilitates pre-manufacturing simulations that would otherwise be intractable. A critical aspect of this simulation efficiency is owed to a susceptance-based circuit formulation that we prove to be symmetric positive definite. This property, along with the sparsity of the susceptance matrix, enables the use of some advanced sparse matrix solvers. We demonstrate this extraction and simulation methodology on some industrial examples.

Journal ArticleDOI
TL;DR: A standard "graph compression" algorithm used in direct sparse matrix methods is considered along with two other algorithms which are also capable of unraveling approximate block structures.
Abstract: Sparse matrices which arise in many applications often possess a block structure that can be exploited in iterative and direct solution methods. These block-matrices have as their entries small dense blocks with constant or variable dimensions. Block versions of incomplete LU factorizations which have been developed to take advantage of such structures give rise to a class of preconditioners that are among the most effective available. This paper presents general techniques for automatically determining block structures in sparse matrices. A standard "graph compression" algorithm used in direct sparse matrix methods is considered along with two other algorithms which are also capable of unraveling approximate block structures.
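The exact-match core of graph compression fits in a few lines: rows with identical sparsity patterns are grouped into one block row (our sketch; the paper's other two algorithms relax this to approximate patterns).

```python
from collections import defaultdict
import scipy.sparse as sp

def exact_block_rows(A):
    """Group row indices whose sparsity patterns are identical."""
    A = A.tocsr()
    groups = defaultdict(list)
    for i in range(A.shape[0]):
        pattern = tuple(A.indices[A.indptr[i]:A.indptr[i + 1]])  # hashable column set
        groups[pattern].append(i)
    return list(groups.values())                # each group becomes one block row
```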

Proceedings ArticleDOI
13 May 2002
TL;DR: It is shown that the so-called proportionate normalized least mean squares (PNLMS) algorithm, an adaptive filter that converges quickly for sparse solutions, is in fact an NNG on a certain parameter space warping, and by choosing a warping that favors diverse or dense impulse responses, a new adaptive algorithm is obtained.
Abstract: This paper introduces a class of normalized natural gradient algorithms (NNGs) for adaptive filtering tasks. Natural gradient techniques are useful for generating relatively simple adaptive filtering algorithms where the space of the adaptive coefficients is curved or warped with respect to Euclidean space. The advantage of normalizing gradient adaptive filters is that constant rates of convergence for signals with wide dynamic ranges may be achieved. We show that the so-called proportionate normalized least mean squares (PNLMS) algorithm, an adaptive filter that converges quickly for sparse solutions, is in fact an NNG on a certain parameter space warping. We also show that by choosing a warping that favors diverse or dense impulse responses, we may obtain a new adaptive algorithm, the inverse proportionate NLMS (INLMS) algorithm. This procedure converges quickly to and accurately tracks non-sparse impulse responses.
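A compact PNLMS step in its common textbook form (a hedged sketch; the parameter names mu, rho, and delta are ours): the per-coefficient gains proportional to |w| are precisely the parameter-space warping the paper reinterprets as a normalized natural gradient.

```python
import numpy as np

def pnlms_step(w, u, d, mu=0.5, rho=0.01, delta=1e-4):
    """One PNLMS update: regressor u, desired sample d, weights w (updated in place)."""
    e = d - w @ u                                # a-priori error
    g = np.maximum(rho * max(delta, np.abs(w).max()), np.abs(w))
    g /= g.mean()                                # proportionate gains, mean one
    w += mu * e * (g * u) / (u @ (g * u) + delta)  # normalized, gain-weighted update
    return w, e
```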

Journal ArticleDOI
TL;DR: A class of parallel multistep successive preconditioning strategies to enhance the efficiency and robustness of standard sparse approximate inverse preconditioning techniques is developed.
Abstract: We develop a class of parallel multistep successive preconditioning strategies to enhance efficiency and robustness of standard sparse approximate inverse preconditioning techniques. The key idea is to compute a series of simple sparse matrices to approximate the inverse of the original matrix. Studies are conducted to show the advantages of such an approach in terms of both improving preconditioning accuracy and reducing computational cost, compared to the standard sparse approximate inverse preconditioners. Numerical experiments using one prototype implementation to solve a few sparse matrices on a distributed memory parallel computer are reported.
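A loose sketch of the multistep idea under strong simplifications of ours (each factor here is a first-order Neumann correction and no sparsification is applied, whereas the paper keeps every factor simple and sparse):

```python
import scipy.sparse as sp

def multistep_preconditioner(A, steps=2):
    """Return factors M0, M1, ... whose ordered product approximates inv(A)."""
    n = A.shape[0]
    I = sp.identity(n, format="csr")
    M0 = sp.diags(1.0 / A.diagonal()).tocsr()   # simplest sparse approximate inverse
    Ms, B = [M0], (M0 @ A).tocsr()              # B should already be close to I
    for _ in range(steps):
        M = 2 * I - B                           # first-order Neumann inverse of B
        Ms.append(M)
        B = (M @ B).tocsr()                     # residual I - B is squared each step
    return Ms                                   # apply in order: y = x; for M in Ms: y = M @ y
```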