
Showing papers on "Sparse matrix published in 2002"


Proceedings ArticleDOI
07 Nov 2002
TL;DR: A simple yet efficient multiplicative algorithm for finding the optimal values of the hidden components in non-negative sparse coding is given, and it is shown how the basis vectors can be learned from the observed data.
Abstract: Non-negative sparse coding is a method for decomposing multivariate data into non-negative sparse components. We briefly describe the motivation behind this type of data representation and its relation to standard sparse coding and non-negative matrix factorization. We then give a simple yet efficient multiplicative algorithm for finding the optimal values of the hidden components. In addition, we show how the basis vectors can be learned from the observed data. Simulations demonstrate the effectiveness of the proposed method.
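A minimal sketch of this kind of algorithm (our own illustrative code, not the authors'; the penalty weight lam and the iteration count are arbitrary): the multiplicative update keeps H non-negative, and the basis W is learned from the data with an NMF-style step.

```python
# Illustrative non-negative sparse coding: minimize
#   0.5*||V - W H||_F^2 + lam * sum(H)   subject to  W, H >= 0.
import numpy as np

def nn_sparse_coding(V, r, lam=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, r)), rng.random((r, n))
    eps = 1e-9
    for _ in range(iters):
        # Multiplicative update for the hidden components H (stays >= 0).
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        # NMF-style update to learn the basis vectors W from the data,
        # then renormalize columns so the sparsity penalty acts on H alone.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        W /= np.linalg.norm(W, axis=0, keepdims=True) + eps
    return W, H
```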

871 citations


Book ChapterDOI
21 Apr 2002
TL;DR: Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and high performance is consistently achieved for large sparse unsymmetric matrices from real world applications.
Abstract: Supernode pivoting for unsymmetric matrices coupled with supernode partitioning and asynchronous computation can achieve high gigaflop rates for parallel sparse LU factorization on shared memory parallel computers. Progress in weighted graph matching algorithms helps to extend these concepts further, and a prepermutation of rows is used to place large matrix entries on the diagonal. Supernode pivoting allows dynamic interchanges of columns and rows during the factorization process, and BLAS-3 level efficiency is retained. An enhanced left-right looking scheduling scheme is unaffected and results in good speedup on SMP machines without increasing the operation count. These algorithms have been integrated into the recent unsymmetric version of the PARDISO solver. Experiments demonstrate that a wide set of unsymmetric linear systems can be solved and high performance is consistently achieved for large sparse unsymmetric matrices from real world applications.
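The row prepermutation step can be pictured with standard tools (a hedged sketch; SciPy's assignment solver stands in for the weighted graph matching codes, and SciPy's splu for PARDISO): a maximum-weight bipartite matching on |A| moves large entries onto the diagonal before factoring.

```python
import numpy as np
import scipy.sparse as sp
from scipy.optimize import linear_sum_assignment
from scipy.sparse.linalg import splu

A = sp.random(200, 200, density=0.05, format="csr", random_state=1) + sp.identity(200)
W = np.abs(A.toarray())
rows, cols = linear_sum_assignment(-W)        # maximize total matched |a_ij|
P = sp.coo_matrix((np.ones(200), (cols, rows)), shape=(200, 200)).tocsr()
lu = splu((P @ A).tocsc())                    # factor with large entries on the diagonal
```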

323 citations


Proceedings ArticleDOI
07 Aug 2002
TL;DR: A three-phase AC-DC-AC sparse matrix converter (SMC) with no energy storage elements in the DC link, employing only 15 IGBTs, is proposed.
Abstract: A novel three-phase AC-DC-AC sparse matrix converter (SMC) having no energy storage elements in the DC link and employing only 15 IGBTs as opposed to 18 IGBTs of a functionally equivalent conventional AC-AC matrix converter (CMC) is proposed. It is shown that the realization effort could be further reduced to only 9 IGBTs (ultra sparse matrix converter, USMC) in case the phase displacement of the fundamentals of voltage and current at the input and at the output is limited to ±π/6. The dependency of the voltage and current transfer ratios of the systems on the operating parameters is analyzed and a space vector modulation scheme is described in combination with a zero current commutation procedure. Furthermore, a safe multi-step current commutation concept is treated briefly. Conduction and switching losses of the SMC and USMC are calculated in analytically closed form. Finally, the theoretical results are verified in Part II of the paper by digital simulations and results of a first experimental investigation of a 10 kW/400 V SMC prototype are given.

270 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a sparse minimum-variance reconstructor for a conventional natural guide star AO system using a sparse approximation for turbulence statistics and recognizing that the nonsparse matrix terms arising from LGS position uncertainty are low-rank adjustments that can be evaluated by using the matrix inversion lemma.
Abstract: The complexity of computing conventional matrix multiply wave-front reconstructors scales as O(n^3) for most adaptive optical (AO) systems, where n is the number of deformable mirror (DM) actuators. This is impractical for proposed systems with extremely large n. It is known that sparse matrix methods improve this scaling for least-squares reconstructors, but sparse techniques are not immediately applicable to the minimum-variance reconstructors now favored for multiconjugate adaptive optical (MCAO) systems with multiple wave-front sensors (WFSs) and DMs. Complications arise from the nonsparse statistics of atmospheric turbulence, and the global tip/tilt WFS measurement errors associated with laser guide star (LGS) position uncertainty. A description is given of how sparse matrix methods can still be applied by use of a sparse approximation for turbulence statistics and by recognizing that the nonsparse matrix terms arising from LGS position uncertainty are low-rank adjustments that can be evaluated by using the matrix inversion lemma. Sample numerical results for AO and MCAO systems illustrate that the approximation made to turbulence statistics has negligible effect on estimation accuracy, the time to compute the sparse minimum-variance reconstructor for a conventional natural guide star AO system scales as O(n^{3/2}) and is only a few seconds for n = 3500, and sparse techniques reduce the reconstructor computations by a factor of 8 for sample MCAO systems with 2417 DM actuators and 4280 WFS subapertures. With extrapolation to 9700 actuators and 17,120 subapertures, a reduction by a factor of approximately 30 or 40 to 1 is predicted.
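The low-rank step mentioned above is the matrix inversion (Sherman-Morrison-Woodbury) lemma. A generic sketch under our own naming (not the paper's reconstructor): with A the sparse part and U C U^T the low-rank adjustment, only sparse solves with A plus one small dense solve are needed.

```python
import numpy as np
import scipy.sparse.linalg as spla

def woodbury_solve(A, U, C, b):
    """Solve (A + U C U^T) x = b using only sparse solves with sparse A."""
    solve_A = spla.factorized(A.tocsc())       # sparse factorization of A, reused
    Ainv_b = solve_A(b)
    Ainv_U = np.column_stack([solve_A(U[:, j]) for j in range(U.shape[1])])
    S = np.linalg.inv(C) + U.T @ Ainv_U        # small dense capacitance matrix
    return Ainv_b - Ainv_U @ np.linalg.solve(S, U.T @ Ainv_b)
```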

178 citations


Journal ArticleDOI
TL;DR: The solution of the optimal power flow dispatching (OPFD) problem by a primal-dual interior point method is considered; convergence is enhanced through a filter technique for the step length and an altered search direction that avoids convergence to nonminimizing stationary points.
Abstract: In this paper, the solution of the optimal power flow dispatching (OPFD) problem by a primal-dual interior point method is considered. Several primal-dual methods for optimal power flow (OPF) have been suggested, all of which are essentially direct extensions of primal-dual methods for linear programming. The aim of the present work is to enhance convergence through two modifications: a filter technique to guide the choice of the step length and an altered search direction in order to avoid convergence to a nonminimizing stationary point. A reduction in computational time is also gained through solving a positive definite system for the search direction. Numerical tests on standard IEEE systems and on a realistic network are very encouraging and show that the new algorithm converges where other algorithms fail.

143 citations


Proceedings ArticleDOI
16 Nov 2002
TL;DR: Upper and lower bounds on the performance (Mflop/s) of SpM×V when tuned using the previously proposed register blocking optimization are developed and a new heuristic is presented that selects optimal or near-optimal register block sizes more accurately than the previous heuristic.
Abstract: We consider performance tuning, by code and data structure reorganization, of sparse matrix-vector multiply (SpM×V), one of the most important computational kernels in scientific applications. This paper addresses the fundamental questions of what limits exist on such performance tuning, and how closely tuned code approaches these limits. Specifically, we develop upper and lower bounds on the performance (Mflop/s) of SpM×V when tuned using our previously proposed register blocking optimization. These bounds are based on the non-zero pattern in the matrix and the cost of basic memory operations, such as cache hits and misses. We evaluate our tuned implementations with respect to these bounds using hardware counter data on 4 different platforms and on a test set of 44 sparse matrices. We find that we can often get within 20% of the upper bound, particularly on a class of matrices from finite element modeling (FEM) problems; on non-FEM matrices, performance improvements of 2× are still possible. Lastly, we present a new heuristic that selects optimal or near-optimal register block sizes (the key tuning parameters) more accurately than our previous heuristic. Using the new heuristic, we show improvements in SpM×V performance (Mflop/s) by as much as 2.5× over an untuned implementation. Collectively, our results suggest that future performance improvements, beyond those that we have already demonstrated for SpM×V, will come from two sources: (1) consideration of higher-level matrix structures (e.g., exploiting symmetry, matrix reordering, multiple register block sizes), and (2) optimizing kernels with more opportunity for data reuse (e.g., sparse matrix-multiple vector multiply, multiplication of A^T A by a vector).
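The register blocking idea is easy to demonstrate with SciPy's BSR format (a concept sketch only; the paper's tuned kernels and selection heuristic are not reproduced): the matrix is stored in small r-by-c dense blocks so the inner loop can keep operands in registers, and the block size is the tuning parameter.

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
x = np.ones(1000)
for r, c in [(1, 1), (2, 2), (4, 4)]:         # candidate register block sizes
    A_bsr = A.tobsr(blocksize=(r, c))         # explicit zeros pad ragged blocks
    y = A_bsr @ x                             # blocked SpMxV
```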

135 citations


Journal ArticleDOI
TL;DR: Discusses the interface design for the Sparse Basic Linear Algebra Subprograms (BLAS), the kernels in the recent standard that are concerned with unstructured sparse matrices, and shows how this interface can shield one from concern over the specific storage scheme for the sparse matrix.
Abstract: We discuss the interface design for the Sparse Basic Linear Algebra Subprograms (BLAS), the kernels in the recent standard from the BLAS Technical Forum that are concerned with unstructured sparse matrices. The motivation for such a standard is to encourage portable programming while allowing for library-specific optimizations. In particular, we show how this interface can shield one from concern over the specific storage scheme for the sparse matrix. This design makes it easy to add further functionality to the sparse BLAS in the future. We illustrate the use of the Sparse BLAS with examples in the three supported programming languages, Fortran 95, Fortran 77, and C.
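In the Sparse BLAS, a matrix is created through a handle (begin, insert entries, end) and then used in kernels such as usmv, so the caller never sees the storage scheme. A rough Python analogue of that call sequence (the C binding names in the comments follow the BLAS Technical Forum standard; consult the spec for exact signatures):

```python
import scipy.sparse as sp

# BLAS_duscr_begin(m, n): open a construction handle.
entries = []
# BLAS_duscr_insert_entry(A, val, i, j): insert entries one at a time.
entries += [(0, 0, 4.0), (1, 2, -1.0), (2, 1, 2.5)]
# BLAS_duscr_end(A): finish construction; the library picks the layout.
rows, cols, vals = zip(*entries)
A = sp.coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()
# BLAS_dusmv: y := alpha * A x + y, independent of how A is stored.
y = A @ [1.0, 2.0, 3.0]
```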

133 citations


Patent
11 Jan 2002
TL;DR: A new data structure and algorithms that offer at least equal performance on common sparse matrix tasks, and improved performance on many, applied to a word-document index to produce fast build and query times for document retrieval.
Abstract: A new data structure and algorithms which offer at least equal performance in common sparse matrix tasks, and improved performance in many. This is applied to a word-document index to produce fast build and query times for document retrieval.

117 citations


Journal ArticleDOI
Anshul Gupta1
TL;DR: The experiments show that the algorithmic choices made in WSMP enable it to run more than twice as fast as the best among similar solvers and that WSMP can factor some of the largest sparse matrices available from real applications in only a few seconds on a 4-CPU workstation.
Abstract: During the past few years, algorithmic improvements alone have reduced the time required for the direct solution of unsymmetric sparse systems of linear equations by almost an order of magnitude. This paper compares the performance of some well-known software packages for solving general sparse systems. In particular, it demonstrates the consistently high level of performance achieved by WSMP---the most recent of such solvers. It compares the various algorithmic components of these solvers and discusses their impact on solver performance. Our experiments show that the algorithmic choices made in WSMP enable it to run more than twice as fast as the best among similar solvers and that WSMP can factor some of the largest sparse matrices available from real applications in only a few seconds on a 4-CPU workstation. Thus, the combination of advances in hardware and algorithms makes it possible to solve those general sparse linear systems quickly and easily that might have been considered too large until recently.

110 citations


Journal ArticleDOI
TL;DR: A new branch-based state-estimation method for radial distribution systems that can handle most kinds of real-time measurements; it decomposes the WLS problem of the whole system into a series of WLS subproblems, each dealing with the state estimation of a single branch.
Abstract: This paper presents a new branch-based state-estimation method for radial distribution systems that can handle most kinds of real-time measurements. In contrast to the traditional weighted-least-square (WLS) method, the idea of this algorithm is to decompose the WLS problem of the whole system into a series of WLS subproblems, each dealing with the state estimation of only a single branch. This approach can be implemented in a forward/backward sweep scheme for radial distribution systems and does not need the sparse matrix technique. Test results of a large-scale practical distribution system in China show that the proposed method is valid and efficient.
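Each single-branch subproblem is an ordinary weighted least-squares fit. Generically, in our notation (placeholder measurement model H, measurements z, weights w):

```python
import numpy as np

def wls(H, z, w):
    """Minimize (z - H x)^T diag(w) (z - H x) over the branch state x."""
    HW = H.T * w                               # scale rows of H by the weights
    return np.linalg.solve(HW @ H, HW @ z)     # normal equations: H^T W H x = H^T W z
```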

101 citations


Journal ArticleDOI
TL;DR: Improvements are offered for the efficiency and applicability of preconditioners on linear algebra problems over finite fields, but most results are valid for entries from arbitrary fields.

Journal ArticleDOI
TL;DR: Compactly supported radial basis functions (CSRBFs) are applied to solving a system of shallow water hydrodynamics equations, and the resulting banded matrix shows improvement in both conditioning and computational efficiency.
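The sparsity mechanism behind CSRBFs: the kernel vanishes beyond its support radius, so the interpolation matrix is banded. A toy 1-D sketch with the Wendland C2 function (the support radius 0.05 is an arbitrary choice of ours):

```python
import numpy as np
import scipy.sparse as sp

def wendland_c2(r):
    """phi(r) = (1 - r)^4 (4 r + 1) for r < 1, else 0 (compact support)."""
    return np.where(r < 1.0, (1.0 - r)**4 * (4.0 * r + 1.0), 0.0)

x = np.linspace(0.0, 1.0, 200)
R = np.abs(x[:, None] - x[None, :]) / 0.05     # pairwise distances / support radius
A = sp.csr_matrix(wendland_c2(R))              # banded, mostly zero interpolation matrix
```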

Proceedings ArticleDOI
23 Oct 2002
TL;DR: An improved version of the BiCGStab (IBiCGStab) method for the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices is proposed, which combines elements of numerical stability and parallel algorithm design without increasing the computational costs.
Abstract: In this paper, an improved version of the BiCGStab (IBiCGStab) method for the solutions of large and sparse linear systems of equations with unsymmetric coefficient matrices is proposed. The method combines elements of numerical stability and parallel algorithm design without increasing the computational costs. The algorithm is derived such that all inner products of a single iteration step are independent and communication time required for the inner product can be overlapped efficiently with computation time of vector updates. Therefore, the cost of global communication, which represents the bottleneck of the parallel performance, can be significantly reduced. The resulting IBiCGStab algorithm maintains the favorable properties of the original method while not increasing computational costs. Data distribution suitable for both irregularly and regularly structured matrices, based on the analysis of the nonzero matrix elements, is presented. The communication scheme is supported by overlapping execution of computation and communication to reduce waiting times. The efficiency of this method is demonstrated by numerical experimental results carried out on a massively parallel distributed memory system.
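For context, the unmodified BiCGStab iteration being restructured is available off the shelf (SciPy shown, serial); IBiCGStab's contribution is to make the iteration's inner products independent so they can be overlapped with communication.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab

n = 2000
A = sp.diags([-1.0, 3.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
A = A + sp.diags([0.5], [2], shape=(n, n))     # make the matrix unsymmetric
x, info = bicgstab(A, np.ones(n))              # info == 0 signals convergence
```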


Journal ArticleDOI
TL;DR: The irbleigs code is an implementation of an implicitly restarted block-Lanczos method for computing a few selected eigenvalues and associated eigenvectors of a large, possibly sparse, Hermitian matrix A; its modest storage requirements make it well suited for large-scale problems.
Abstract: The irbleigs code is an implementation of an implicitly restarted block-Lanczos method for computing a few selected nearby eigenvalues and associated eigenvectors of a large, possibly sparse, Hermitian matrix A. The code requires only the evaluation of matrix-vector products with A; in particular, factorization of A is not demanded, nor is the solution of linear systems of equations with the matrix A. This, together with a fairly small storage requirement, makes the irbleigs code well suited for large-scale problems. Applications of the irbleigs code to certain generalized eigenvalue problems and to the computation of a few singular values and associated singular vectors are also discussed. Numerous computed examples illustrate the performance of the method and provide comparisons with other available codes.
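The matrix-free setting is the point: only products v -> A v are required, and A is never factored. Illustrated here with SciPy's Lanczos-type eigsh standing in for irbleigs (which is itself a MATLAB code):

```python
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

n = 5000
A = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr")
op = LinearOperator((n, n), matvec=lambda v: A @ v, dtype=float)  # only access to A
vals, vecs = eigsh(op, k=4, which="SA")        # 4 smallest eigenvalues, no factorization
```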

Journal ArticleDOI
TL;DR: A three-step process is outlined: replace direct inversion techniques with iterative methods such as conjugate gradients, use preconditioning to cluster the eigenvalues of the interpolation matrix and hence speed convergence, and compute the matrix-vector product required at each iteration with a fast multipole or fast moment method.
Abstract: A wide class of interpolation methods, including thin‐plate and tension splines, kriging, sinc functions, equivalent‐source, and radial basis functions, can be encompassed in a common mathematical framework involving continuous global surfaces (CGSs). The difficulty in applying these techniques to geophysical data sets has been the computational and memory requirements involved in solving the large, dense matrix equations that arise. We outline a three‐step process for reducing the computational requirements: (1) replace the direct inversion techniques with iterative methods such as conjugate gradients; (2) use preconditioning to cluster the eigenvalues of the interpolation matrix and hence speed convergence; and (3) compute the matrix–vector product required at each iteration with a fast multipole or fast moment method. We apply the new methodology to a regional gravity compilation with a highly heterogeneous sampling density. The industry standard minimum‐curvature algorithms and several scale‐dependent ...
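Steps (1) and (2) in miniature (a hedged sketch: a Jacobi diagonal preconditioner stands in for the paper's preconditioner, and an explicit sparse matrix stands in for the fast multipole matvec of step (3)):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

n = 3000
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.random.default_rng(0).normal(size=n)
d = A.diagonal()
M = LinearOperator((n, n), matvec=lambda v: v / d)  # step (2): cluster eigenvalues
x, info = cg(A, b, M=M)                             # step (1): iterate, never invert
```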

Journal ArticleDOI
TL;DR: The new feature presented here is to construct the basis in a hierarchical decomposition of the three-space and not, as in previous approaches, in a parameter space of the boundary manifold, which leads to sparse representations of the operator.
Abstract: A multilevel transform is introduced to represent discretizations of integral operators from potential theory by nearly sparse matrices. The new feature presented here is to construct the basis in a hierarchical decomposition of the three-space and not, as in previous approaches, in a parameter space of the boundary manifold. This construction leads to sparse representations of the operator even for geometrically complicated, multiply connected domains. We will demonstrate that the numerical cost to apply a vector to the operator using the nonstandard form is essentially equal to performing the same operation with the fast multipole method. With a second compression scheme the multiscale approach can be further optimized. The diagonal blocks of the transformed matrix can be used as an inexpensive preconditioner which is empirically shown to reduce the condition number of discretizations of the single layer operator so as to be independent of mesh size.

Journal ArticleDOI
TL;DR: The paper shows that certain forms of approximate inverse techniques amount to approximately inverting the triangular factors obtained from some variants of ILU factorization of the original matrix.
Abstract: This paper discusses some relationships between ILU factorization techniques and factored sparse approximate inverse techniques. While ILU factorizations compute approximate LU factors of the coefficient matrix A, approximate inverse techniques aim at building triangular matrices Z and W such that W^T A Z is approximately diagonal. The paper shows that certain forms of approximate inverse techniques amount to approximately inverting the triangular factors obtained from some variants of ILU factorization of the original matrix. A few useful applications of these relationships will be discussed.
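For context, a hedged SciPy sketch of the ILU side of this relationship; the factored-approximate-inverse side would then build sparse approximations to the inverses of these triangular factors.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu

A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(500, 500), format="csc")
ilu = spilu(A, drop_tol=1e-3)                  # approximate factors L, U of A
y = ilu.solve(np.ones(500))                    # apply (L U)^{-1}, e.g. as a preconditioner
```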

Journal ArticleDOI
TL;DR: Two algorithms for the symbolic and numerical factorization phases in the direct solution of sparse unsymmetric systems of linear equations have been implemented in WSMP and have enabled WSMP to significantly outperform other similar solvers.
Abstract: We present algorithms for the symbolic and numerical factorization phases in the direct solution of sparse unsymmetric systems of linear equations. We have modified a classical symbolic factorization algorithm for unsymmetric matrices to inexpensively compute minimal elimination structures. We give an efficient algorithm to compute a near-minimal data-dependency graph for unsymmetric multifrontal factorization that is valid irrespective of the amount of dynamic pivoting performed during factorization. Finally, we describe an unsymmetric-pattern multifrontal algorithm for Gaussian elimination with partial pivoting that uses the task- and data-dependency graphs computed during the symbolic phase. These algorithms have been implemented in WSMP---an industrial strength sparse solver package---and have enabled WSMP to significantly outperform other similar solvers. We present experimental results to demonstrate the merits of the new algorithms.
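As a flavor of what a symbolic phase computes, here is a hedged sketch of the classical elimination-tree algorithm (Liu's algorithm for a symmetric pattern, with path compression); the paper's task- and data-dependency graphs generalize such structures to the unsymmetric, pivoting case.

```python
import scipy.sparse as sp

def elimination_tree(A):
    """Parent array of the elimination tree of a symmetric-pattern sparse matrix."""
    L = sp.tril(A.tocsr(), k=-1, format="csr")  # strictly lower-triangular pattern
    n = L.shape[0]
    parent, ancestor = [-1] * n, [-1] * n
    for j in range(n):
        for i in L.indices[L.indptr[j]:L.indptr[j + 1]]:  # nonzeros A[j, i], i < j
            r = i
            while ancestor[r] not in (-1, j):   # climb toward the root
                nxt = ancestor[r]
                ancestor[r] = j                 # path compression
                r = nxt
            if ancestor[r] == -1:
                ancestor[r], parent[r] = j, j   # link subtree root under j
    return parent
```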

Book ChapterDOI
01 Dec 2002
TL;DR: An evaluation of circuit-based implementations of the matrix step of the number field sieve, with an improved mesh-routing-based design, concluding that from a practical standpoint the security of RSA relies exclusively on the hardness of the relation collection step.
Abstract: In [1], Bernstein proposed a circuit-based implementation of the matrix step of the number field sieve factorization algorithm. These circuits offer an asymptotic cost reduction under the measure "construction cost × run time". We evaluate the cost of these circuits, in agreement with [1], but argue that compared to previously known methods these circuits can factor integers that are 1.17 times larger, rather than 3.01 as claimed (and even this, only under the non-standard cost measure). We also propose an improved circuit design based on a new mesh routing algorithm, and show that for factorization of 1024-bit integers the matrix step can, under an optimistic assumption about the matrix size, be completed within a day by a device that costs a few thousand dollars. We conclude that from a practical standpoint, the security of RSA relies exclusively on the hardness of the relation collection step of the number field sieve.

Journal ArticleDOI
TL;DR: The issues of an inertia-controlling factorization are described, and an off-the-shelf sparse factorization routine can be used, with pivot selection based on sparsity and numerical stability.

Journal ArticleDOI
TL;DR: The obtained estimate of the fundamental matrix turns out to be more accurate than the one provided by the linear criterion, where the rank constraint of the matrix is imposed after its computation by setting the smallest singular value to zero.
Abstract: In this paper, a new method for the estimation of the fundamental matrix from point correspondences in stereo vision is presented. The minimization of the algebraic error is performed while taking explicitly into account the rank-two constraint on the fundamental matrix. It is shown how this nonconvex optimization problem can be solved avoiding local minima by using recently developed convexification techniques. The obtained estimate of the fundamental matrix turns out to be more accurate than the one provided by the linear criterion, where the rank constraint of the matrix is imposed after its computation by setting the smallest singular value to zero. This suggests that the proposed estimate can be used to initialize nonlinear criteria, such as the distance to epipolar lines and the gradient criterion, in order to obtain a more accurate estimate of the fundamental matrix.
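The linear-criterion baseline that the paper improves on imposes the rank-two constraint only after the fact, by zeroing the smallest singular value (a quick sketch):

```python
import numpy as np

def enforce_rank2(F):
    """Project a 3x3 fundamental-matrix estimate to the nearest rank-2 matrix."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0                                  # drop the smallest singular value
    return U @ np.diag(s) @ Vt

F2 = enforce_rank2(np.random.default_rng(0).normal(size=(3, 3)))  # stand-in estimate
```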

Journal ArticleDOI
TL;DR: A recipe for finding SDP relaxations based on adding redundant constraints and using Lagrangian relaxation is presented, concluding with a new application of SDP to finding approximate matrix completions for large and sparse instances of Euclidean distance matrices.

Book ChapterDOI
25 Jul 2002
TL;DR: The results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods, which employ only parallelism and intra-iteration locality.
Abstract: Finite Element problems are often solved using multigrid techniques. The most time consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes based techniques, exploit parallelism and possibly intra-iteration data reuse but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, specifically compared to owner-computes based Gauss-Seidel methods. The latter employ only parallelism and intra-iteration locality. Our results support the premise that better performance occurs when all three performance aspects (parallelism, intra-iteration, and inter-iteration data locality) are combined.
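For reference, the smoother in its plain serial form: one forward sweep over a CSR matrix (our sketch). Sparse tiling reschedules exactly these row updates across iterations to gain parallelism and locality.

```python
import scipy.sparse as sp

def gauss_seidel_sweep(A, x, b):
    """One in-place forward Gauss-Seidel sweep; A must store its diagonal."""
    A = A.tocsr()
    for i in range(A.shape[0]):
        start, end = A.indptr[i], A.indptr[i + 1]
        cols, vals = A.indices[start:end], A.data[start:end]
        # x_i <- x_i + (b_i - (A x)_i) / a_ii, using already-updated entries of x
        x[i] += (b[i] - vals @ x[cols]) / vals[cols == i][0]
    return x
```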

01 Jan 2002
TL;DR: This paper addresses the problem of building high-performance uniprocessor implementations of sparse triangular solve (SpTS) automatically and describes fully automatic hybrid off-line/on-line heuristics for selecting the key tuning parameters: the register block size and the point at which to use the dense algorithm.
Abstract: We address the problem of building high-performance uniprocessor implementations of sparse triangular solve (SpTS) automatically. This computational kernel is often the bottleneck in a variety of scientific and engineering applications that require the direct solution of sparse linear systems. Performance tuning of SpTS—and sparse matrix kernels in general—is a tedious and time-consuming task, because performance depends on the complex interaction of many factors: the performance gap between processors and memory, the limits on the scope of compiler analyses and transformations, and the overhead of manipulating sparse data structures. Consequently, it is not unusual to see kernels such as SpTS run at under 10% of peak uniprocessor floating point performance. Our approach to automatic tuning of SpTS builds on prior experience with building tuning systems for sparse matrix-vector multiply (SpM×V) [21, 22, 40], and dense matrix kernels [8, 41]. In particular, we adopt the two-step methodology of previous approaches: (1) we identify and generate a set of reasonable candidate implementations, and (2) search this set for the fastest implementation by some combination of performance modeling and actually executing the implementations. In this paper, we consider the solution of the sparse lower triangular system Lx = y for a single dense vector x, given the lower triangular sparse matrix L and dense vector y. We refer to x as the solution vector and y as the right-hand side (RHS). Many of the lower triangular factors we have observed from sparse LU factorization have a large, dense triangle in the lower right-hand corner of the matrix; this trailing triangle can account for as much as 90% of the matrix non-zeros. Therefore, we consider both algorithmic and data structure reorganizations which partition the solve into a sparse phase and a dense phase. To the sparse phase, we adapt the register blocking optimization, previously proposed for sparse matrix-vector multiply (SpM×V) in the Sparsity system [21, 22], to the SpTS kernel; to the dense phase, we make judicious use of highly tuned BLAS routines by switching to a dense implementation (switch-to-dense optimization). We describe fully automatic hybrid off-line/on-line heuristics for selecting the key tuning parameters: the register block size and the point at which to use the dense algorithm. (See Section 2.) We then evaluate the performance of our optimized implementations relative to the fundamental limits on performance. Specifically, we first derive simple models of the upper bounds on the execution rate (Mflop/s) of our implementations. Using hardware counter data collected with the PAPI library [10], we then verify our models on three hardware platforms (Table 1) and a set of triangular factors from applications (Table 2). We observe that our optimized implementations can achieve 80% or more of these bounds; furthermore, we observe speedups of up to 1.8x when both register blocking and switch-to-dense optimizations are applied. We also present preliminary results confirming that our heuristics choose reasonable values for the tuning parameters. These results support our prior findings with SpM×V [40], suggesting two new directions for performance enhancements: (1) the use of higher-level matrix structures (e.g., matrix reordering and multiple register block sizes), and (2) optimizing kernels with more opportunities for data reuse (e.g., multiplication and solve with multiple vectors, multiplication of A^T A by a vector).
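The sparse/dense split described above is easy to sketch (a toy version under our own naming; the switch point k is the tuning parameter the paper's heuristics select, and SciPy's solve_triangular stands in for the tuned BLAS call):

```python
from scipy.linalg import solve_triangular

def spts_switch_to_dense(L, y, k):
    """Solve L x = y, treating rows >= k as a dense trailing triangle."""
    L, x = L.tocsr(), y.astype(float).copy()
    for i in range(k):                          # sparse phase, row by row
        cols = L.indices[L.indptr[i]:L.indptr[i + 1]]
        vals = L.data[L.indptr[i]:L.indptr[i + 1]]
        off = cols < i                          # strictly-lower entries of row i
        x[i] = (x[i] - vals[off] @ x[cols[off]]) / vals[cols == i][0]
    x[k:] -= L[k:, :k] @ x[:k]                  # couple the two phases
    x[k:] = solve_triangular(L[k:, k:].toarray(), x[k:], lower=True)  # dense phase
    return x
```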

Journal ArticleDOI
TL;DR: This work presents an implicit, edge-based implementation of the semi-discrete SUPG formulation with shock-capturing for the Euler equations in conservative variables that requires less memory and CPU time than element-based implementations.

Proceedings ArticleDOI
04 Mar 2002
TL;DR: This paper demonstrates a localized, window-based extraction and simulation methodology that employs the recently proposed susceptance (inverse of the inductance matrix) concept, provides a qualitative explanation for its efficacy, and shows how it facilitates pre-manufacturing simulations that would otherwise be intractable.
Abstract: Due to the increasing operating frequencies and the manner in which the corresponding integrated circuits and systems must be designed, the extraction, modeling and simulation of the magnetic couplings for final design verification can be a daunting task. In general, when modeling inductance and the associated return paths, one must consider the on-chip conductors as well as the system packaging. This can result in an RLC circuit size that is impractical for traditional simulators. In this paper we demonstrate a localized, window-based extraction and simulation methodology that employs the recently proposed susceptance (the inverse of the inductance matrix) concept. We provide a qualitative explanation for the efficacy of this approach, and demonstrate how it facilitates pre-manufacturing simulations that would otherwise be intractable. A critical aspect of this simulation efficiency is owed to a susceptance-based circuit formulation that we prove to be symmetric positive definite. This property, along with the sparsity of the susceptance matrix, enables the use of some advanced sparse matrix solvers. We demonstrate this extraction and simulation methodology on some industrial examples.

Journal ArticleDOI
TL;DR: A standard "graph compression" algorithm used in direct sparse matrix methods is considered along with two other algorithms which are also capable of unraveling approximate block structures.
Abstract: Sparse matrices which arise in many applications often possess a block structure that can be exploited in iterative and direct solution methods. These block-matrices have as their entries small dense blocks with constant or variable dimensions. Block versions of incomplete LU factorizations which have been developed to take advantage of such structures give rise to a class of preconditioners that are among the most effective available. This paper presents general techniques for automatically determining block structures in sparse matrices. A standard "graph compression" algorithm used in direct sparse matrix methods is considered along with two other algorithms which are also capable of unraveling approximate block structures.
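The exact-match core of graph compression fits in a few lines: rows with identical sparsity patterns are grouped into one block row (our sketch; the paper's other two algorithms relax this to approximate patterns).

```python
from collections import defaultdict
import scipy.sparse as sp

def exact_block_rows(A):
    """Group row indices whose sparsity patterns are identical."""
    A = A.tocsr()
    groups = defaultdict(list)
    for i in range(A.shape[0]):
        pattern = tuple(A.indices[A.indptr[i]:A.indptr[i + 1]])  # hashable column set
        groups[pattern].append(i)
    return list(groups.values())                # each group becomes one block row
```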

Proceedings ArticleDOI
13 May 2002
TL;DR: It is shown that the so-called proportionate normalized least mean squares (PNLMS) algorithm, an adaptive filter that converges quickly for sparse solutions, is in fact an NNG on a certain parameter space warping, and by choosing a warping that favors diverse or dense impulse responses, a new adaptive algorithm is obtained.
Abstract: This paper introduces a class of normalized natural gradient algorithms (NNGs) for adaptive filtering tasks. Natural gradient techniques are useful for generating relatively simple adaptive filtering algorithms where the space of the adaptive coefficients is curved or warped with respect to Euclidean space. The advantage of normalizing gradient adaptive filters is that constant rates of convergence for signals with wide dynamic ranges may be achieved. We show that the so-called proportionate normalized least mean squares (PNLMS) algorithm, an adaptive filter that converges quickly for sparse solutions, is in fact an NNG on a certain parameter space warping. We also show that by choosing a warping that favors diverse or dense impulse responses, we may obtain a new adaptive algorithm, the inverse proportionate NLMS (INLMS) algorithm. This procedure converges quickly to and accurately tracks non-sparse impulse responses.
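A compact PNLMS step in its common textbook form (a hedged sketch; the parameter names mu, rho, and delta are ours): the per-coefficient gains proportional to |w| are precisely the parameter-space warping the paper reinterprets as a normalized natural gradient.

```python
import numpy as np

def pnlms_step(w, u, d, mu=0.5, rho=0.01, delta=1e-4):
    """One PNLMS update: regressor u, desired sample d, weights w (updated in place)."""
    e = d - w @ u                                # a-priori error
    g = np.maximum(rho * max(delta, np.abs(w).max()), np.abs(w))
    g /= g.mean()                                # proportionate gains, mean one
    w += mu * e * (g * u) / (u @ (g * u) + delta)  # normalized, gain-weighted update
    return w, e
```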

Journal ArticleDOI
TL;DR: A class of parallel multistep successive preconditioning strategies to enhance the efficiency and robustness of standard sparse approximate inverse preconditioning techniques is developed.
Abstract: We develop a class of parallel multistep successive preconditioning strategies to enhance efficiency and robustness of standard sparse approximate inverse preconditioning techniques. The key idea is to compute a series of simple sparse matrices to approximate the inverse of the original matrix. Studies are conducted to show the advantages of such an approach in terms of both improving preconditioning accuracy and reducing computational cost, compared to the standard sparse approximate inverse preconditioners. Numerical experiments using one prototype implementation to solve a few sparse matrices on a distributed memory parallel computer are reported.
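A loose sketch of the multistep idea under strong simplifications of ours (each factor here is a first-order Neumann correction and no sparsification is applied, whereas the paper keeps every factor simple and sparse):

```python
import scipy.sparse as sp

def multistep_preconditioner(A, steps=2):
    """Return factors M0, M1, ... whose ordered product approximates inv(A)."""
    n = A.shape[0]
    I = sp.identity(n, format="csr")
    M0 = sp.diags(1.0 / A.diagonal()).tocsr()   # simplest sparse approximate inverse
    Ms, B = [M0], (M0 @ A).tocsr()              # B should already be close to I
    for _ in range(steps):
        M = 2 * I - B                           # first-order Neumann inverse of B
        Ms.append(M)
        B = (M @ B).tocsr()                     # residual I - B is squared each step
    return Ms                                   # apply in order: y = x; for M in Ms: y = M @ y
```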