Showing papers in "ACM Transactions on Mathematical Software in 2008"

PDF

Open Access

Journal Article•DOI•

Algorithm 887: CHOLMOD, Supernodal Sparse Cholesky Factorization and Update/Downdate

[...]

Yanqing Chen¹, Timothy A. Davis¹, William W. Hager¹, Sivasankaran Rajamanickam¹•Institutions (1)

01 Oct 2008-ACM Transactions on Mathematical Software

TL;DR: CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AAT, updating/downdating a sparse Cholesky factorization, solving linear systems, updating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices.

...read moreread less

Abstract: CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AAT, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with both C and MATLABTM interfaces. It appears in MATLAB 7.2 as x = A\b when A is sparse symmetric positive definite, as well as in several other sparse matrix functions.

...read moreread less

746 citations

Journal Article•DOI•

Anatomy of high-performance matrix multiplication

[...]

Kazushige Goto¹, Robert A. van de Geijn¹•Institutions (1)

University of Texas at Austin¹

16 May 2008-ACM Transactions on Mathematical Software

TL;DR: The basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library are presented.

...read moreread less

Abstract: We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.

...read moreread less

711 citations

Journal Article•DOI•

High-performance implementation of the level-3 BLAS

[...]

Kazushige Goto¹, Robert A. van de Geijn¹•Institutions (1)

University of Texas at Austin¹

25 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-Matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented.

...read moreread less

Abstract: A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented. Exceptional performance is demonstrated on various architectures.

...read moreread less

349 citations

Journal Article•DOI•

SNOBFIT -- Stable Noisy Optimization by Branch and Fit

[...]

Waltraud Huyer¹, Arnold Neumaier¹•Institutions (1)

University of Vienna¹

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: The software package SNOBFIT for bound- Constrained (and soft-constrained) noisy optimization of an expensive objective function is described and is made robust and flexible for practical use by allowing for hidden constraints, batch function evaluations, change of search regions, etc.

...read moreread less

Abstract: The software package SNOBFIT for bound-constrained (and soft-constrained) noisy optimization of an expensive objective function is described. It combines global and local search by branching and local fits. The program is made robust and flexible for practical use by allowing for hidden constraints, batch function evaluations, change of search regions, etc.

...read moreread less

234 citations

Journal Article•DOI•

Algorithm 883: SparsePOP---A Sparse Semidefinite Programming Relaxation of Polynomial Optimization Problems

[...]

Hayato Waki¹, Sunyoung Kim², Masakazu Kojima¹, Masakazu Muramatsu³, Hiroshi Sugimoto¹ - Show less +1 more•Institutions (3)

Tokyo Institute of Technology¹, Ewha Womans University², University of Electro-Communications³

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: The efficiency of SparsePOP to approximate optimal solutions of POPs is increased, and larger-scale POPs can be handled.

...read moreread less

Abstract: SparsePOP is a Matlab implementation of the sparse semidefinite programming (SDP) relaxation method for approximating a global optimal solution of a polynomial optimization problem (POP) proposed by Waki et al. [2006]. The sparse SDP relaxation exploits a sparse structure of polynomials in POPs when applying “a hierarchy of LMI relaxations of increasing dimensions” Lasserre [2006]. The efficiency of SparsePOP to approximate optimal solutions of POPs is thus increased, and larger-scale POPs can be handled.

...read moreread less

208 citations

Journal Article•DOI•

Hybrid differentiation strategies for simulation and analysis of applications in C

[...]

Roscoe A. Bartlett¹, Bart G. van Bloemen Waanders¹, Martin Berggren²•Institutions (2)

Sandia National Laboratories¹, Uppsala University²

25 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: This work demonstrates the use of template overloading forward AD tools in C++ applications that attempts to achieve an optimal balance of implementation and computational efficiency by selecting the appropriate components of the target algorithms for AD and analytical derivation.

...read moreread less

Abstract: Computationally efficient and accurate derivatives are important to the success of many different types of numerical methods. Automatic differentation (AD) approaches compute truncation-free derivatives and can be efficient in many cases. Although present AD tools can provide a convenient implementation mechanism, the computational efficiency rarely compares to analytically derived versions that have been carefully implemented. The focus of this work is to combine the strength of these methods into a hybrid strategy that attempts to achieve an optimal balance of implementation and computational efficiency by selecting the appropriate components of the target algorithms for AD and analytical derivation. Although several AD approaches can be considered, our focus is on the use of template overloading forward AD tools in C++ applications. We demonstrate this hybrid strategy for a system of partial differential equations in gas dynamics. These methods apply however to other systems of differentiable equations, including DAEs and ODEs.

...read moreread less

198 citations

Journal Article•DOI•

OpenAD/F: A Modular Open-Source Tool for Automatic Differentiation of Fortran Codes

[...]

Jean Utke¹, Uwe Naumann², Mike Fagan³, Nathan R. Tallent³, Michelle Mills Strout⁴, Patrick Heimbach⁵, Chris Hill⁵, Carl Wunsch⁵ - Show less +4 more•Institutions (5)

Argonne National Laboratory¹, RWTH Aachen University², Rice University³, Colorado State University⁴, Massachusetts Institute of Technology⁵

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: The Open/ADF tool allows the evaluation of derivatives of functions defined by a Fortran program, and supports various code reversal schemes with hierarchical checkpointing at the subroutine level for the generation of adjoint codes.

...read moreread less

Abstract: The Open/ADF tool allows the evaluation of derivatives of functions defined by a Fortran program. The derivative evaluation is performed by a Fortran code resulting from the analysis and transformation of the original program that defines the function of interest. Open/ADF has been designed with a particular emphasis on modularity, flexibility, and the use of open source components. While the code transformation follows the basic principles of automatic differentiation, the tool implements new algorithmic approaches at various levels, for example, for basic block preaccumulation and call graph reversal. Unlike most other automatic differentiation tools, Open/ADF uses components provided by the Open/AD framework, which supports a comparatively easy extension of the code transformations in a language-independent fashion. It uses code analysis results implemented in the OpenAnalysis component. The interface to the language-independent transformation engine is an XML-based format, specified through an XML schema. The implemented transformation algorithms allow efficient derivative computations using locally optimized cross-country sequences of vertex, edge, and face elimination steps. Specifically, for the generation of adjoint codes, Open/ADF supports various code reversal schemes with hierarchical checkpointing at the subroutine level. As an example from geophysical fluid dynamics, a nonlinear time-dependent scalable, yet simple, barotropic ocean model is considered. OpenAD/F's reverse mode is applied to compute sensitivities of some of the model's transport properties with respect to gridded fields such as bottom topography as independent (control) variables.

...read moreread less

170 citations

Journal Article•DOI•

An event-driven method to simulate Filippov systems with accurate computing of sliding motions

[...]

Petri T. Piiroinen¹, Yuri A. Kuznetsov²•Institutions (2)

National University of Ireland, Galway¹, Utrecht University²

16 May 2008-ACM Transactions on Mathematical Software

TL;DR: The introduced algorithms are based on an extension to Filippov's method to stabilise the sliding flow together with accurate detection of the entrance and exit of sliding regions.

...read moreread less

Abstract: This article describes how to use smooth solvers for simulation of a class of piecewise smooth systems of ordinary differential equations, called Filippov systems, with discontinuous vector fields. In these systems constrained motion along a discontinuity surface (so-called sliding) is possible and requires special treatment numerically. The introduced algorithms are based on an extension to Filippov's method to stabilise the sliding flow together with accurate detection of the entrance and exit of sliding regions. The methods are implemented in a general way in MATLAB and sufficient details are given to enable users to modify the code to run on arbitrary examples. Here, the method is used to compute the dynamics of three example systems, a dry-friction oscillator, a relay feedback system and a model of an oil well drill-string.

...read moreread less

141 citations

Journal Article•DOI•

Block variants of Hammarling's method for solving Lyapunov equations

[...]

Daniel Kressner¹•Institutions (1)

Umeå University¹

19 Jan 2008-ACM Transactions on Mathematical Software

TL;DR: This article discusses the efficient implementation of Hammarling's method and proposes among other algorithmic improvements a block variant, which is demonstrated to perform significantly better than existing implementations.

...read moreread less

Abstract: This article is concerned with the efficient numerical solution of the Lyapunov equation ATX + XA = -C with a stable matrix A and a symmetric positive semidefinite matrix C of possibly small rank. We discuss the efficient implementation of Hammarling's method and propose among other algorithmic improvements a block variant, which is demonstrated to perform significantly better than existing implementations. An extension to the discrete-time Lyapunov equation ATXA - X = -C is also described.

...read moreread less

139 citations

Journal Article•DOI•

Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy

[...]

Alfredo Buttari¹, Jack Dongarra², Jakub Kurzak², Piotr Luszczek³, Stanimir Tomov² - Show less +1 more•Institutions (3)

École normale supérieure de Lyon¹, University of Tennessee², MathWorks³

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: The approach presented here can apply not only to conventional processors but also to exotic technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the Cell BE processor.

...read moreread less

Abstract: By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and sparse iterative techniques such as Krylov subspace methods. The approach presented here can apply not only to conventional processors but also to exotic technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the Cell BE processor.

...read moreread less

100 citations

Journal Article•DOI•

Families of algorithms related to the inversion of a Symmetric Positive Definite matrix

[...]

Paolo Bientinesi¹, Brian Gunter², Robert A. van de Geijn³•Institutions (3)

RWTH Aachen University¹, Delft University of Technology², University of Texas at Austin³

25 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: This work states different algorithms for each of these sweeps of the inversion of a Symmetric Positive Definite matrix as well as algorithms that compute the result in a single sweep and outperforms the current ScaLAPACK implementation by 20-30 percent due to improved load-balance on a distributed memory architecture.

...read moreread less

Abstract: We study the high-performance implementation of the inversion of a Symmetric Positive Definite (SPD) matrix on architectures ranging from sequential processors to Symmetric MultiProcessors to distributed memory parallel computers. This inversion is traditionally accomplished in three “sweeps”: a Cholesky factorization of the SPD matrix, the inversion of the resulting triangular matrix, and finally the multiplication of the inverted triangular matrix by its own transpose. We state different algorithms for each of these sweeps as well as algorithms that compute the result in a single sweep. One algorithm outperforms the current ScaLAPACK implementation by 20-30 percent due to improved load-balance on a distributed memory architecture.

...read moreread less

Journal Article•DOI•

Dense Linear Algebra over Word-Size Prime Fields: the FFLAS and FFPACK Packages

[...]

Jean-Guillaume Dumas¹, Pascal Giorgi², Clément Pernet³•Institutions (3)

Joseph Fourier University¹, University of Perpignan², University of Washington³

01 Oct 2008-ACM Transactions on Mathematical Software

TL;DR: This article studies high performance implementations of basic linear algebra routines over word size prime fields: especially matrix multiplication, with the goal being to provide an exact alternate to the numerical BLAS library.

...read moreread less

Abstract: In the past two decades, some major efforts have been made to reduce exact (e.g. integer, rational, polynomial) linear algebra problems to matrix multiplication in order to provide algorithms with optimal asymptotic complexity. To provide efficient implementations of such algorithms one need to be careful with the underlying arithmetic. It is well known that modular techniques such as the Chinese remainder algorithm or the p-adic lifting allow very good practical performance, especially when word size arithmetic is used. Therefore, finite field arithmetic becomes an important core for efficient exact linear algebra libraries. In this article, we study high performance implementations of basic linear algebra routines over word size prime fields: especially matrix multiplication; our goal being to provide an exact alternate to the numerical BLAS library. We show that this is made possible by a careful combination of numerical computations and asymptotically faster algorithms. Our kernel has several symbolic linear algebra applications enabled by diverse matrix multiplication reductions: symbolic triangularization, system solving, determinant, and matrix inverse implementations are thus studied.

...read moreread less

Journal Article•

DSDP5 : software for semidefinite programming.

[...]

Steven J. Benson, Yinyu Ye

01 Jan 2008-ACM Transactions on Mathematical Software

TL;DR: DSDP implements the dual-scaling algorithm for semidefinite programming, written entirely in ANSI C, and has been used in many applications and tested for efficiency, robustness, and ease of use.

...read moreread less

Abstract: DSDP implements the dual-scaling algorithm for semidefinite programming. The source code for this interior-point algorithm, written entirely in ANSI C, is freely available. The solver can be used as a subroutine library, as a function within the Matlab environment, or as an executable that reads and writes to data files. Initiated in 1997, DSDP has developed into an efficient and robust general-purpose solver for semidefinite programming. Its features include a convergence proof with polynomially bounded worst-case complexity, primal and dual feasible solutions when they exist, certificates of infeasibility when solutions do not exist, initial points that can be feasible or infeasible, relatively low memory requirements for an interior-point method, sparse and lowrank data structures, extensibility that allows applications to customize the solver and improve its performance, a subroutine library that enables it to be linked to larger applications, scalable performance for large problems on parallel architectures, and a well-documented interface and examples of its use. The package has been used in many applications and tested for efficiency, robustness, and ease of use.

...read moreread less

Journal Article•DOI•

Updating an LU Factorization with Pivoting

[...]

Enrique S. Quintana-Ortí¹, Robert A. van de Geijn²•Institutions (2)

James I University¹, University of Texas at Austin²

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: This work shows how to compute an LU factorization of a matrix when the factors of a leading principle submatrix are already known, and incorporates pivoting akin to partial pivoting, a strategy it calls incremental pivoting.

...read moreread less

Abstract: We show how to compute an LU factorization of a matrix when the factors of a leading principle submatrix are already known. The approach incorporates pivoting akin to partial pivoting, a strategy we call incremental pivoting. An implementation using the Formal Linear Algebra Methods Environment (FLAME) application programming interface (API) is described. Experimental results demonstrate practical numerical stability and high performance on an Intel Itanium2 processor-based server.

...read moreread less

Journal Article•DOI•

Algorithm 876: Solving Fredholm Integral Equations of the Second Kind in Matlab

[...]

Kendall Atkinson¹, Lawrence F. Shampine²•Institutions (2)

University of Iowa¹, Southern Methodist University²

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: The algorithms and user interface of a Matlab program, Fie, are presented that solves numerically Fredholm integral equations of the second kind on an interval [a, b] to a specified, modest accuracy.

...read moreread less

Abstract: We present here the algorithms and user interface of a Matlab program, Fie, that solves numerically Fredholm integral equations of the second kind on an interval [a,b] to a specified, modest accuracy. The kernel function K(s,t) is moderately smooth on [a,b] ×[a,b] except possibly across the diagonal s = t. If the interval is finite, provides for kernel functions that behave in a variety of ways across the diagonal, that is, K(s,t) may be smooth, have a discontinuity in a low-order derivative, have a logarithmic singularity, or have an algebraic singularity. Fie also solves a large class of integral equations with moderately smooth kernel function on [0, ∞ ).

...read moreread less

Journal Article•DOI•

Algorithm 873: LSTRS: MATLAB software for large-scale trust-region subproblems and regularization

[...]

Marielba Rojas¹, Sandra A. Santos², Danny C. Sorensen³•Institutions (3)

Technical University of Denmark¹, State University of Campinas², Rice University³

19 Mar 2008-ACM Transactions on Mathematical Software

TL;DR: A MATLAB 6.0 implementation of the LSTRS method, designed for large-scale quadratic problems with one norm constraint, and of the MATLAB software, version 1.2, are presented.

...read moreread less

Abstract: A MATLAB 6.0 implementation of the LSTRS method is presented. LSTRS was described in Rojas et al. l2000r. LSTRS is designed for large-scale quadratic problems with one norm constraint. The method is based on a reformulation of the trust-region subproblem as a parameterized eigenvalue problem, and consists of an iterative procedure that finds the optimal value for the parameter. The adjustment of the parameter requires the solution of a large-scale eigenvalue problem at each step. LSTRS relies on matrix-vector products only and has low and fixed storage requirements, features that make it suitable for large-scale computations. In the MATLAB implementation, the Hessian matrix of the quadratic objective function can be specified either explicitly, or in the form of a matrix-vector multiplication routine. Therefore, the implementation preserves the matrix-free nature of the method. A description of the LSTRS method and of the MATLAB software, version 1.2, is presented. Comparisons with other techniques and applications of the method are also included. A guide for using the software and examples are provided.

...read moreread less

Journal Article•DOI•

Algorithm 888: Spherical Harmonic Transform Algorithms

[...]

John B. Drake¹, Pat Worley¹, Eduardo D'Azevedo¹•Institutions (1)

Oak Ridge National Laboratory¹

01 Oct 2008-ACM Transactions on Mathematical Software

TL;DR: A collection of MATLAB classes for computing and using spherical harmonic transforms is presented and the use of the spectral synthesis and analysis algorithms using fast Fourier transforms and Legendre transforms with the associated Legendre functions is demonstrated.

...read moreread less

Abstract: A collection of MATLAB classes for computing and using spherical harmonic transforms is presented. Methods of these classes compute differential operators on the sphere and are used to solve simple partial differential equations in a spherical geometry. The spectral synthesis and analysis algorithms using fast Fourier transforms and Legendre transforms with the associated Legendre functions are presented in detail. A set of methods associated with a spectral_field class provides spectral approximation to the differential operators ∇ c, ∇ ×, ∇, and ∇2 in spherical geometry. Laplace inversion and Helmholtz equation solvers are also methods for this class. The use of the class and methods in MATLAB is demonstrated by the solution of the barotropic vorticity equation on the sphere. A survey of alternative algorithms is given and implementations for parallel high performance computers are discussed in the context of global climate and weather models.

...read moreread less

Journal Article•DOI•

Optimal vertex elimination in single-expression-use graphs

[...]

Uwe Naumann¹, Yuxiao Hu¹•Institutions (1)

RWTH Aachen University¹

25 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: This article presents an efficient algorithm for solving a relevant subclass of this combinatorial optimization problem that aims to minimize the number of arithmetic operations performed by the elimination algorithm and relies on the ability to compute vertex covers in bipartite graphs in polynomial time.

...read moreread less

Abstract: The source transformation tool for automatic differentiation of Fortran programs ADIFOR uses a preaccumulation technique to speed up tangent-linear codes significantly compared to the standard forward mode. Reverse mode automatic differentiation is applied to all scalar assignments to generate efficient code for the computation of local gradients. It has been well known for some time that reverse mode is not necessarily the optimal choice for the computation of these statement-level gradients as it does not minimize the number of operations required. This article presents an efficient algorithm for the solution of this combinatorial optimization problem. The corresponding software is freely available for downloading on our website. Developers of software for automatic differentiation are invited to integrate the algorithm into their tools.Gradients of scalar multivariate functions can be computed by elimination methods on the linearized computational graph. The combinatorial optimization problem that aims to minimize the number of arithmetic operations performed by the elimination algorithm is known to be NP-complete. In this article we present a polynomial algorithm for solving a relevant subclass of this problem's instances. The proposed method relies on the ability to compute vertex covers in bipartite graphs in polynomial time. A simplified version of this graph algorithm is used in a research prototype of the differentiation-enabled NAGWare Fortran compiler for the preaccumulation of local gradients of scalar assignments in the context of automatic generation of efficient tangent-linear code for numerical programs.

...read moreread less

Journal Article•DOI•

Algorithm 872: Parallel 2D constrained Delaunay mesh generation

[...]

Andrey N. Chernikov¹, Nikos Chrisochoides¹•Institutions (1)

College of William & Mary¹

19 Jan 2008-ACM Transactions on Mathematical Software

TL;DR: An algorithm and a software are presented for the parallel constrained Delaunay mesh generation in two dimensions based on the decomposition of the original mesh generation problem into N smaller subproblems which are meshed in parallel.

...read moreread less

Abstract: Delaunay refinement is a widely used method for the construction of guaranteed quality triangular and tetrahedral meshes We present an algorithm and a software for the parallel constrained Delaunay mesh generation in two dimensions Our approach is based on the decomposition of the original mesh generation problem into N smaller subproblems which are meshed in parallel The parallel algorithm is asynchronous with small messages which can be aggregated and exhibits low communication costs On a heterogeneous cluster of more than 100 processors our implementation can generate over one billion triangles in less than 3 minutes, while the single-node performance is comparable to that of the fastest to our knowledge sequential guaranteed quality Delaunay meshing library (the Triangle)

...read moreread less

Journal Article•DOI•

Algorithm 880: A testing infrastructure for symmetric tridiagonal eigensolvers

[...]

Osni Marques¹, Christof Vömel¹, James Demmel², Beresford N. Parlett²•Institutions (2)

Lawrence Berkeley National Laboratory¹, University of California, Berkeley²

25 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: An infrastructure for exhaustively testing the symmetric tridiagonal eigensolvers implemented in LAPACK is described, which consists of a selection of carefully chosen test matrices with particular idiosyncrasies and a portable testing framework that allows for easy testing and data processing.

...read moreread less

Abstract: LAPACK is often mentioned as a positive example of a software library that encapsulates complex, robust, and widely used numerical algorithms for a wide range of applications. At installation time, the user has the option of running a (limited) number of test cases to verify the integrity of the installation process. On the algorithm developer's side, however, more exhaustive tests are usually performed to study algorithm behavior on a variety of problem settings and also computer architectures. In this process, difficult test cases need to be found that reflect particular challenges of an application or push algorithms to extreme behavior. These tests are then assembled into a comprehensive collection, therefore making it possible for any new or competing algorithm to be stressed in a similar way. This article describes an infrastructure for exhaustively testing the symmetric tridiagonal eigensolvers implemented in LAPACK. It consists of two parts: a selection of carefully chosen test matrices with particular idiosyncrasies and a portable testing framework that allows for easy testing and data processing. The tester facilitates experiments with algorithmic choices, parameter and threshold studies, and performance comparisons on different architectures.

...read moreread less

Journal Article•DOI•

Algorithm 875: DSDP5—software for semidefinite programming

[...]

Steven J. Benson¹, Yinyu Ye²•Institutions (2)

Argonne National Laboratory¹, Stanford University²

16 May 2008-ACM Transactions on Mathematical Software

...read moreread less

Abstract: DSDP implements the dual-scaling algorithm for semidefinite programming. The source code for this interior-point algorithm, written entirely in ANSI C, is freely available under an open source license. The solver can be used as a subroutine library, as a function within the Matlab environment, or as an executable that reads and writes to data files. Initiated in 1997, DSDP has developed into an efficient and robust general-purpose solver for semidefinite programming. Its features include a convergence proof with polynomially bounded worst-case complexity, primal and dual feasible solutions when they exist, certificates of infeasibility when solutions do not exist, initial points that can be feasible or infeasible, relatively low memory requirements for an interior-point method, sparse and low-rank data structures, extensibility that allows applications to customize the solver and improve its performance, a subroutine library that enables it to be linked to larger applications, scalable performance for large problems on parallel architectures, and a well-documented interface and examples of its use. The package has been used in many applications and tested for efficiency, robustness, and ease of use.

...read moreread less

Journal Article•DOI•

On the Failure of Rank-Revealing QR Factorization Software -- A Case Study

[...]

Zlatko Drmač¹, Zvonimir Bujanović¹•Institutions (1)

University of Zagreb¹

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: An unexpected and rather erratic behavior of the LAPACK software implementation of the QR factorization with Businger-Golub column pivoting is reported and a new, equally efficient, and provably numerically safe partial-column norm-updating strategy is provided.

...read moreread less

Abstract: This article reports an unexpected and rather erratic behavior of the LAPACK software implementation of the QR factorization with Businger-Golub column pivoting. It is shown that, due to finite precision arithmetic, the software implementation of the factorization can catastrophically fail to produce a properly structured triangular factor, thus leading to a potentially severe underestimate of a matrix's numerical rank. The 30-year old problem, dating back to LINPACK, has (undetectedly) badly affected many computational routines and software packages, as well as the study of rank-revealing QR factorizations. We combine computer experiments and numerical analysis to isolate, analyze, and fix the problem. Our modification of the current LAPACK xGEQP3 routine is already included in the LAPACK 3.1.0 release. The modified routine is numerically more robust and with a negligible overhead. We also provide a new, equally efficient, and provably numerically safe partial-column norm-updating strategy.

...read moreread less

Journal Article•DOI•

Algorithm 884: A Simple Matlab Implementation of the Argyris Element

[...]

Víctor Domínguez¹, Francisco-Javier Sayas²•Institutions (2)

Universidad Pública de Navarra¹, University of Zaragoza²

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: A new algorithm to evaluate the basis functions of the Argyris finite element and their derivatives is proposed, an efficient way to calculate the matrix which gives the change of coordinates between the bases of theArgyis element for the reference and for an arbitrary triangle is presented.

...read moreread less

Abstract: In this work we propose a new algorithm to evaluate the basis functions of the Argyris finite element and their derivatives. The main novelty here is an efficient way to calculate the matrix which gives the change of coordinates between the bases of the Argyis element for the reference and for an arbitrary triangle. This matrix is factored as the product of two rectangular matrices with a strong block structure which makes their computation very easy. We show and comment on an implementation of this algorithm in Matlab. Two numerical experiments, an interpolation of a smooth function on a triangle and the finite-element solution of the Dirichlet problem for the biLaplacian, are presented in the last section to check the performance of our implementation.

...read moreread less

Journal Article•DOI•

CGMN Revisited: Robust and Efficient Solution of Stiff Linear Systems Derived from Elliptic Partial Differential Equations

[...]

Dan Gordon¹, Rachel Gordon²•Institutions (2)

University of Haifa¹, Technion – Israel Institute of Technology²

01 Oct 2008-ACM Transactions on Mathematical Software

TL;DR: The SSOR algorithm, applied to the normal equations, can be accelerated by the conjugate gradient algorithm (CG), and the resulting algorithm, called CGMN, is error-reducing and in theory it always converges even when the equation system is inconsistent and/or nonsquare.

...read moreread less

Abstract: Given a linear system Ax = b, one can construct a related “normal equations” system AATy = b, x = ATy. Bjorck and Elfving have shown that the SSOR algorithm, applied to the normal equations, can be accelerated by the conjugate gradient algorithm (CG). The resulting algorithm, called CGMN, is error-reducing and in theory it always converges even when the equation system is inconsistent and/or nonsquare. SSOR on the normal equations is equivalent to the Kaczmarz algorithm (KACZ), with a fixed relaxation parameter, run in a double (forward and backward) sweep on the original equations. CGMN was tested on nine well-known large and sparse linear systems obtained by central-difference discretization of elliptic convection-diffusion partial differential equations (PDEs). Eight of the PDEs were strongly convection-dominated, and these are known to produce very stiff systems with large off-diagonal elements. CGMN was compared with some of the foremost state-of-the art Krylov subspace methods: restarted GMRES, Bi-CGSTAB, and CGS. These methods were tested both with and without various preconditioners. CGMN converged in all the cases, while none of the preceding algorithm/preconditioner combinations achieved this level of robustness. Furthermore, on varying grid sizes, there was only a gradual increase in the number of iterations as the grid was refined. On the eight convection-dominated cases, the initial convergence rate of CGMN was better than all the other combinations of algorithms and preconditioners, and the residual decreased monotonically. The CGNR algorithm was also tested, and it was as robust as CGMN, but slower.

...read moreread less

Journal Article•DOI•

Computing sparse Hessians with automatic differentiation

[...]

Andrea Walther¹•Institutions (1)

Dresden University of Technology¹

19 Jan 2008-ACM Transactions on Mathematical Software

TL;DR: A new approach for computing a sparsity pattern for a Hessian is presented: nonlinearity information is propagated through the function evaluation yielding the nonzero structure.

...read moreread less

Abstract: A new approach for computing a sparsity pattern for a Hessian is presented: nonlinearity information is propagated through the function evaluation yielding the nonzero structure. A complexity analysis of the proposed algorithm is given. Once the sparsity pattern is available, coloring algorithms can be applied to compute a seed matrix. To evaluate the product of the Hessian and the seed matrix, a vector version for evaluating second order adjoints is analysed. New drivers of ADOL-C are provided implementing the presented algorithms. Runtime analyses are given for some problems of the CUTE collection.

...read moreread less

Journal Article•DOI•

Cache efficient bidiagonalization using BLAS 2.5 operators

[...]

Gary W. Howell¹, James Demmel², Charles T. Fulton³, Sven Hammarling⁴, Karen Marmol⁵ - Show less +1 more•Institutions (5)

North Carolina State University¹, University of California, Berkeley², Florida Institute of Technology³, University of Manchester⁴, Harris Corporation⁵

16 May 2008-ACM Transactions on Mathematical Software

TL;DR: This paper reorganizes the sequence of operations for Householder bidiagonalization of a general m × n matrix, so that two (_GEMV) vector-matrix multiplications can be done with one pass of the unreduced trailing part of the matrix through cache.

...read moreread less

Abstract: On cache based computer architectures using current standard algorithms, Householder bidiagonalization requires a significant portion of the execution time for computing matrix singular values and vectors. In this paper we reorganize the sequence of operations for Householder bidiagonalization of a general m × n matrix, so that two (_GEMV) vector-matrix multiplications can be done with one pass of the unreduced trailing part of the matrix through cache. Two new BLAS operations approximately cut in half the transfer of data from main memory to cache, reducing execution times by up to 25 per cent. We give detailed algorithm descriptions and compare timings with the current LAPACK bidiagonalization algorithm.

...read moreread less

Journal Article•DOI•

An Implementation and Evaluation of the AMLS Method for Sparse Eigenvalue Problems

[...]

Weiguo Gao¹, Xiaoye S. Li², Chao Yang², Zhaojun Bai³•Institutions (3)

Fudan University¹, Lawrence Berkeley National Laboratory², University of California, Davis³

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: It is shown that the AMLS method, when implemented carefully, outperforms the traditional method in broad application areas when large numbers of eigenvalues are sought, with relatively low accuracy.

...read moreread less

Abstract: We describe an efficient implementation and present a performance study of an automated multi-level substructuring (AMLS) method for sparse eigenvalue problems. We assess the time and memory requirements associated with the key steps of the algorithm, and compare it with the shift-and-invert Lanczos algorithm. Our eigenvalue problems come from two very different application areas: accelerator cavity design and normal-mode vibrational analysis of polyethylene particles. We show that the AMLS method, when implemented carefully, outperforms the traditional method in broad application areas when large numbers of eigenvalues are sought, with relatively low accuracy.

...read moreread less

Journal Article•DOI•

Algorithm 882: Near-Best Fixed Pole Rational Interpolation with Applications in Spectral Methods

[...]

Joris Van Deun¹, Karl Deckers², Adhemar Bultheel², J. A. C. Weideman³•Institutions (3)

University of Antwerp¹, Katholieke Universiteit Leuven², Stellenbosch University³

01 Jul 2008-ACM Transactions on Mathematical Software

TL;DR: A numerical procedure to compute the nodes and weights in rational Gauss-Chebyshev quadrature formulas is presented, finding that under certain conditions on the poles, these nodes are near best for rational interpolation with prescribed poles.

...read moreread less

Abstract: We present a numerical procedure to compute the nodes and weights in rational Gauss-Chebyshev quadrature formulas. Under certain conditions on the poles, these nodes are near best for rational interpolation with prescribed poles (in the same sense that Chebyshev points are near best for polynomial interpolation). As an illustration, we use these interpolation points to solve a differential equation with an interior boundary layer using a rational spectral method.The algorithm to compute the interpolation points (and, if required, the quadrature weights) is implemented as a Matlab program.

...read moreread less

Journal Article•DOI•

On the design of interfaces to sparse direct solvers

[...]

Marzio Sala, Kendall S. Stanley¹, Michael A. Heroux²•Institutions (2)

Oberlin College¹, Sandia National Laboratories²

01 Mar 2008-ACM Transactions on Mathematical Software

TL;DR: The design of general, flexible, consistent, reusable, and efficient interfaces to software libraries for the direct solution of systems of linear equations on both serial and distributed memory architectures is discussed.

...read moreread less

Abstract: We discuss the design of general, flexible, consistent, reusable, and efficient interfaces to software libraries for the direct solution of systems of linear equations on both serial and distributed memory architectures. We introduce a set of abstract classes to access the linear system matrix elements and their distribution, access vector elements, and control the solution of the linear system. We describe a concrete implementation of the proposed interfaces, and report examples of applications and numerical results showing that the overhead induced by the object-oriented design is negligible under typical conditions of usage. We include examples of applications, and we comment on the advantages and limitations of the design.

...read moreread less

Journal Article•DOI•

Algorithm 870: A static geometric Medial Axis domain decomposition in 2D Euclidean space

[...]

Leonidas Linardakis¹, Nikos Chrisochoides¹•Institutions (1)

College of William & Mary¹

19 Jan 2008-ACM Transactions on Mathematical Software

TL;DR: The N-way decomposition is based on the “divide and conquer” algorithmic paradigm and on a smoothing procedure that eliminates the creation of any new artifacts in the subdomains.

...read moreread less

Abstract: We present a geometric domain decomposition method and its implementation, which produces good domain decompositions in terms of three basic criteria: (1) The boundary of the subdomains create good angles, that is, angles no smaller than a given tolerance Φ0, where the value of Φ0 is determined by the application which will use the domain decomposition. (2) The size of the separator should be relatively small compared to the area of the subdomains. (3) The maximum area of the subdomains should be close to the average subdomain area. The domain decomposition method uses an approximation of a Medial Axis as an auxiliary structure for constructing the boundary of the subdomains (separators). The N-way decomposition is based on the “divide and conquer” algorithmic paradigm and on a smoothing procedure that eliminates the creation of any new artifacts in the subdomains. This approach produces well shaped uniform and graded domain decompositions, which are suitable for parallel mesh generation.

...read moreread less