scispace - formally typeset
Search or ask a question

Showing papers in "ACM Transactions on Mathematical Software in 2008"


Journal ArticleDOI
TL;DR: CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AAT, updating/downdating a sparse Cholesky factorization, solving linear systems, updating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices.
Abstract: CHOLMOD is a set of routines for factorizing sparse symmetric positive definite matrices of the form A or AAT, updating/downdating a sparse Cholesky factorization, solving linear systems, updating/downdating the solution to the triangular system Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3 BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with both C and MATLABTM interfaces. It appears in MATLAB 7.2 as x = A\b when A is sparse symmetric positive definite, as well as in several other sparse matrix functions.

746 citations


Journal ArticleDOI
TL;DR: The basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library are presented.
Abstract: We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.

711 citations


Journal ArticleDOI
TL;DR: A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-Matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented.
Abstract: A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented. Exceptional performance is demonstrated on various architectures.

349 citations


Journal ArticleDOI
TL;DR: The software package SNOBFIT for bound- Constrained (and soft-constrained) noisy optimization of an expensive objective function is described and is made robust and flexible for practical use by allowing for hidden constraints, batch function evaluations, change of search regions, etc.
Abstract: The software package SNOBFIT for bound-constrained (and soft-constrained) noisy optimization of an expensive objective function is described. It combines global and local search by branching and local fits. The program is made robust and flexible for practical use by allowing for hidden constraints, batch function evaluations, change of search regions, etc.

234 citations


Journal ArticleDOI
TL;DR: The efficiency of SparsePOP to approximate optimal solutions of POPs is increased, and larger-scale POPs can be handled.
Abstract: SparsePOP is a Matlab implementation of the sparse semidefinite programming (SDP) relaxation method for approximating a global optimal solution of a polynomial optimization problem (POP) proposed by Waki et al. [2006]. The sparse SDP relaxation exploits a sparse structure of polynomials in POPs when applying “a hierarchy of LMI relaxations of increasing dimensions” Lasserre [2006]. The efficiency of SparsePOP to approximate optimal solutions of POPs is thus increased, and larger-scale POPs can be handled.

208 citations


Journal ArticleDOI
TL;DR: This work demonstrates the use of template overloading forward AD tools in C++ applications that attempts to achieve an optimal balance of implementation and computational efficiency by selecting the appropriate components of the target algorithms for AD and analytical derivation.
Abstract: Computationally efficient and accurate derivatives are important to the success of many different types of numerical methods. Automatic differentation (AD) approaches compute truncation-free derivatives and can be efficient in many cases. Although present AD tools can provide a convenient implementation mechanism, the computational efficiency rarely compares to analytically derived versions that have been carefully implemented. The focus of this work is to combine the strength of these methods into a hybrid strategy that attempts to achieve an optimal balance of implementation and computational efficiency by selecting the appropriate components of the target algorithms for AD and analytical derivation. Although several AD approaches can be considered, our focus is on the use of template overloading forward AD tools in C++ applications. We demonstrate this hybrid strategy for a system of partial differential equations in gas dynamics. These methods apply however to other systems of differentiable equations, including DAEs and ODEs.

198 citations


Journal ArticleDOI
TL;DR: The Open/ADF tool allows the evaluation of derivatives of functions defined by a Fortran program, and supports various code reversal schemes with hierarchical checkpointing at the subroutine level for the generation of adjoint codes.
Abstract: The Open/ADF tool allows the evaluation of derivatives of functions defined by a Fortran program. The derivative evaluation is performed by a Fortran code resulting from the analysis and transformation of the original program that defines the function of interest. Open/ADF has been designed with a particular emphasis on modularity, flexibility, and the use of open source components. While the code transformation follows the basic principles of automatic differentiation, the tool implements new algorithmic approaches at various levels, for example, for basic block preaccumulation and call graph reversal. Unlike most other automatic differentiation tools, Open/ADF uses components provided by the Open/AD framework, which supports a comparatively easy extension of the code transformations in a language-independent fashion. It uses code analysis results implemented in the OpenAnalysis component. The interface to the language-independent transformation engine is an XML-based format, specified through an XML schema. The implemented transformation algorithms allow efficient derivative computations using locally optimized cross-country sequences of vertex, edge, and face elimination steps. Specifically, for the generation of adjoint codes, Open/ADF supports various code reversal schemes with hierarchical checkpointing at the subroutine level. As an example from geophysical fluid dynamics, a nonlinear time-dependent scalable, yet simple, barotropic ocean model is considered. OpenAD/F's reverse mode is applied to compute sensitivities of some of the model's transport properties with respect to gridded fields such as bottom topography as independent (control) variables.

170 citations


Journal ArticleDOI
TL;DR: The introduced algorithms are based on an extension to Filippov's method to stabilise the sliding flow together with accurate detection of the entrance and exit of sliding regions.
Abstract: This article describes how to use smooth solvers for simulation of a class of piecewise smooth systems of ordinary differential equations, called Filippov systems, with discontinuous vector fields. In these systems constrained motion along a discontinuity surface (so-called sliding) is possible and requires special treatment numerically. The introduced algorithms are based on an extension to Filippov's method to stabilise the sliding flow together with accurate detection of the entrance and exit of sliding regions. The methods are implemented in a general way in MATLAB and sufficient details are given to enable users to modify the code to run on arbitrary examples. Here, the method is used to compute the dynamics of three example systems, a dry-friction oscillator, a relay feedback system and a model of an oil well drill-string.

141 citations


Journal ArticleDOI
Daniel Kressner1
TL;DR: This article discusses the efficient implementation of Hammarling's method and proposes among other algorithmic improvements a block variant, which is demonstrated to perform significantly better than existing implementations.
Abstract: This article is concerned with the efficient numerical solution of the Lyapunov equation ATX + XA = -C with a stable matrix A and a symmetric positive semidefinite matrix C of possibly small rank. We discuss the efficient implementation of Hammarling's method and propose among other algorithmic improvements a block variant, which is demonstrated to perform significantly better than existing implementations. An extension to the discrete-time Lyapunov equation ATXA - X = -C is also described.

139 citations


Journal ArticleDOI
TL;DR: The approach presented here can apply not only to conventional processors but also to exotic technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the Cell BE processor.
Abstract: By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. These ideas can be applied to sparse multifrontal and supernodal direct techniques and sparse iterative techniques such as Krylov subspace methods. The approach presented here can apply not only to conventional processors but also to exotic technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the Cell BE processor.

100 citations


Journal ArticleDOI
TL;DR: This work states different algorithms for each of these sweeps of the inversion of a Symmetric Positive Definite matrix as well as algorithms that compute the result in a single sweep and outperforms the current ScaLAPACK implementation by 20-30 percent due to improved load-balance on a distributed memory architecture.
Abstract: We study the high-performance implementation of the inversion of a Symmetric Positive Definite (SPD) matrix on architectures ranging from sequential processors to Symmetric MultiProcessors to distributed memory parallel computers. This inversion is traditionally accomplished in three “sweeps”: a Cholesky factorization of the SPD matrix, the inversion of the resulting triangular matrix, and finally the multiplication of the inverted triangular matrix by its own transpose. We state different algorithms for each of these sweeps as well as algorithms that compute the result in a single sweep. One algorithm outperforms the current ScaLAPACK implementation by 20-30 percent due to improved load-balance on a distributed memory architecture.

Journal ArticleDOI
TL;DR: This article studies high performance implementations of basic linear algebra routines over word size prime fields: especially matrix multiplication, with the goal being to provide an exact alternate to the numerical BLAS library.
Abstract: In the past two decades, some major efforts have been made to reduce exact (e.g. integer, rational, polynomial) linear algebra problems to matrix multiplication in order to provide algorithms with optimal asymptotic complexity. To provide efficient implementations of such algorithms one need to be careful with the underlying arithmetic. It is well known that modular techniques such as the Chinese remainder algorithm or the p-adic lifting allow very good practical performance, especially when word size arithmetic is used. Therefore, finite field arithmetic becomes an important core for efficient exact linear algebra libraries. In this article, we study high performance implementations of basic linear algebra routines over word size prime fields: especially matrix multiplication; our goal being to provide an exact alternate to the numerical BLAS library. We show that this is made possible by a careful combination of numerical computations and asymptotically faster algorithms. Our kernel has several symbolic linear algebra applications enabled by diverse matrix multiplication reductions: symbolic triangularization, system solving, determinant, and matrix inverse implementations are thus studied.

Journal Article
TL;DR: DSDP implements the dual-scaling algorithm for semidefinite programming, written entirely in ANSI C, and has been used in many applications and tested for efficiency, robustness, and ease of use.
Abstract: DSDP implements the dual-scaling algorithm for semidefinite programming. The source code for this interior-point algorithm, written entirely in ANSI C, is freely available. The solver can be used as a subroutine library, as a function within the Matlab environment, or as an executable that reads and writes to data files. Initiated in 1997, DSDP has developed into an efficient and robust general-purpose solver for semidefinite programming. Its features include a convergence proof with polynomially bounded worst-case complexity, primal and dual feasible solutions when they exist, certificates of infeasibility when solutions do not exist, initial points that can be feasible or infeasible, relatively low memory requirements for an interior-point method, sparse and lowrank data structures, extensibility that allows applications to customize the solver and improve its performance, a subroutine library that enables it to be linked to larger applications, scalable performance for large problems on parallel architectures, and a well-documented interface and examples of its use. The package has been used in many applications and tested for efficiency, robustness, and ease of use.

Journal ArticleDOI
TL;DR: This work shows how to compute an LU factorization of a matrix when the factors of a leading principle submatrix are already known, and incorporates pivoting akin to partial pivoting, a strategy it calls incremental pivoting.
Abstract: We show how to compute an LU factorization of a matrix when the factors of a leading principle submatrix are already known. The approach incorporates pivoting akin to partial pivoting, a strategy we call incremental pivoting. An implementation using the Formal Linear Algebra Methods Environment (FLAME) application programming interface (API) is described. Experimental results demonstrate practical numerical stability and high performance on an Intel Itanium2 processor-based server.

Journal ArticleDOI
TL;DR: The algorithms and user interface of a Matlab program, Fie, are presented that solves numerically Fredholm integral equations of the second kind on an interval [a, b] to a specified, modest accuracy.
Abstract: We present here the algorithms and user interface of a Matlab program, Fie, that solves numerically Fredholm integral equations of the second kind on an interval [a,b] to a specified, modest accuracy. The kernel function K(s,t) is moderately smooth on [a,b] ×[a,b] except possibly across the diagonal s = t. If the interval is finite, provides for kernel functions that behave in a variety of ways across the diagonal, that is, K(s,t) may be smooth, have a discontinuity in a low-order derivative, have a logarithmic singularity, or have an algebraic singularity. Fie also solves a large class of integral equations with moderately smooth kernel function on [0, ∞ ).

Journal ArticleDOI
TL;DR: A MATLAB 6.0 implementation of the LSTRS method, designed for large-scale quadratic problems with one norm constraint, and of the MATLAB software, version 1.2, are presented.
Abstract: A MATLAB 6.0 implementation of the LSTRS method is presented. LSTRS was described in Rojas et al. l2000r. LSTRS is designed for large-scale quadratic problems with one norm constraint. The method is based on a reformulation of the trust-region subproblem as a parameterized eigenvalue problem, and consists of an iterative procedure that finds the optimal value for the parameter. The adjustment of the parameter requires the solution of a large-scale eigenvalue problem at each step. LSTRS relies on matrix-vector products only and has low and fixed storage requirements, features that make it suitable for large-scale computations. In the MATLAB implementation, the Hessian matrix of the quadratic objective function can be specified either explicitly, or in the form of a matrix-vector multiplication routine. Therefore, the implementation preserves the matrix-free nature of the method. A description of the LSTRS method and of the MATLAB software, version 1.2, is presented. Comparisons with other techniques and applications of the method are also included. A guide for using the software and examples are provided.

Journal ArticleDOI
TL;DR: A collection of MATLAB classes for computing and using spherical harmonic transforms is presented and the use of the spectral synthesis and analysis algorithms using fast Fourier transforms and Legendre transforms with the associated Legendre functions is demonstrated.
Abstract: A collection of MATLAB classes for computing and using spherical harmonic transforms is presented. Methods of these classes compute differential operators on the sphere and are used to solve simple partial differential equations in a spherical geometry. The spectral synthesis and analysis algorithms using fast Fourier transforms and Legendre transforms with the associated Legendre functions are presented in detail. A set of methods associated with a spectral_field class provides spectral approximation to the differential operators ∇ c, ∇ ×, ∇, and ∇2 in spherical geometry. Laplace inversion and Helmholtz equation solvers are also methods for this class. The use of the class and methods in MATLAB is demonstrated by the solution of the barotropic vorticity equation on the sphere. A survey of alternative algorithms is given and implementations for parallel high performance computers are discussed in the context of global climate and weather models.

Journal ArticleDOI
TL;DR: This article presents an efficient algorithm for solving a relevant subclass of this combinatorial optimization problem that aims to minimize the number of arithmetic operations performed by the elimination algorithm and relies on the ability to compute vertex covers in bipartite graphs in polynomial time.
Abstract: The source transformation tool for automatic differentiation of Fortran programs ADIFOR uses a preaccumulation technique to speed up tangent-linear codes significantly compared to the standard forward mode. Reverse mode automatic differentiation is applied to all scalar assignments to generate efficient code for the computation of local gradients. It has been well known for some time that reverse mode is not necessarily the optimal choice for the computation of these statement-level gradients as it does not minimize the number of operations required. This article presents an efficient algorithm for the solution of this combinatorial optimization problem. The corresponding software is freely available for downloading on our website. Developers of software for automatic differentiation are invited to integrate the algorithm into their tools.Gradients of scalar multivariate functions can be computed by elimination methods on the linearized computational graph. The combinatorial optimization problem that aims to minimize the number of arithmetic operations performed by the elimination algorithm is known to be NP-complete. In this article we present a polynomial algorithm for solving a relevant subclass of this problem's instances. The proposed method relies on the ability to compute vertex covers in bipartite graphs in polynomial time. A simplified version of this graph algorithm is used in a research prototype of the differentiation-enabled NAGWare Fortran compiler for the preaccumulation of local gradients of scalar assignments in the context of automatic generation of efficient tangent-linear code for numerical programs.

Journal ArticleDOI
TL;DR: An algorithm and a software are presented for the parallel constrained Delaunay mesh generation in two dimensions based on the decomposition of the original mesh generation problem into N smaller subproblems which are meshed in parallel.
Abstract: Delaunay refinement is a widely used method for the construction of guaranteed quality triangular and tetrahedral meshes We present an algorithm and a software for the parallel constrained Delaunay mesh generation in two dimensions Our approach is based on the decomposition of the original mesh generation problem into N smaller subproblems which are meshed in parallel The parallel algorithm is asynchronous with small messages which can be aggregated and exhibits low communication costs On a heterogeneous cluster of more than 100 processors our implementation can generate over one billion triangles in less than 3 minutes, while the single-node performance is comparable to that of the fastest to our knowledge sequential guaranteed quality Delaunay meshing library (the Triangle)

Journal ArticleDOI
TL;DR: An infrastructure for exhaustively testing the symmetric tridiagonal eigensolvers implemented in LAPACK is described, which consists of a selection of carefully chosen test matrices with particular idiosyncrasies and a portable testing framework that allows for easy testing and data processing.
Abstract: LAPACK is often mentioned as a positive example of a software library that encapsulates complex, robust, and widely used numerical algorithms for a wide range of applications. At installation time, the user has the option of running a (limited) number of test cases to verify the integrity of the installation process. On the algorithm developer's side, however, more exhaustive tests are usually performed to study algorithm behavior on a variety of problem settings and also computer architectures. In this process, difficult test cases need to be found that reflect particular challenges of an application or push algorithms to extreme behavior. These tests are then assembled into a comprehensive collection, therefore making it possible for any new or competing algorithm to be stressed in a similar way. This article describes an infrastructure for exhaustively testing the symmetric tridiagonal eigensolvers implemented in LAPACK. It consists of two parts: a selection of carefully chosen test matrices with particular idiosyncrasies and a portable testing framework that allows for easy testing and data processing. The tester facilitates experiments with algorithmic choices, parameter and threshold studies, and performance comparisons on different architectures.

Journal ArticleDOI
TL;DR: DSDP implements the dual-scaling algorithm for semidefinite programming, written entirely in ANSI C, and has been used in many applications and tested for efficiency, robustness, and ease of use.
Abstract: DSDP implements the dual-scaling algorithm for semidefinite programming. The source code for this interior-point algorithm, written entirely in ANSI C, is freely available under an open source license. The solver can be used as a subroutine library, as a function within the Matlab environment, or as an executable that reads and writes to data files. Initiated in 1997, DSDP has developed into an efficient and robust general-purpose solver for semidefinite programming. Its features include a convergence proof with polynomially bounded worst-case complexity, primal and dual feasible solutions when they exist, certificates of infeasibility when solutions do not exist, initial points that can be feasible or infeasible, relatively low memory requirements for an interior-point method, sparse and low-rank data structures, extensibility that allows applications to customize the solver and improve its performance, a subroutine library that enables it to be linked to larger applications, scalable performance for large problems on parallel architectures, and a well-documented interface and examples of its use. The package has been used in many applications and tested for efficiency, robustness, and ease of use.

Journal ArticleDOI
TL;DR: An unexpected and rather erratic behavior of the LAPACK software implementation of the QR factorization with Businger-Golub column pivoting is reported and a new, equally efficient, and provably numerically safe partial-column norm-updating strategy is provided.
Abstract: This article reports an unexpected and rather erratic behavior of the LAPACK software implementation of the QR factorization with Businger-Golub column pivoting. It is shown that, due to finite precision arithmetic, the software implementation of the factorization can catastrophically fail to produce a properly structured triangular factor, thus leading to a potentially severe underestimate of a matrix's numerical rank. The 30-year old problem, dating back to LINPACK, has (undetectedly) badly affected many computational routines and software packages, as well as the study of rank-revealing QR factorizations. We combine computer experiments and numerical analysis to isolate, analyze, and fix the problem. Our modification of the current LAPACK xGEQP3 routine is already included in the LAPACK 3.1.0 release. The modified routine is numerically more robust and with a negligible overhead. We also provide a new, equally efficient, and provably numerically safe partial-column norm-updating strategy.

Journal ArticleDOI
TL;DR: A new algorithm to evaluate the basis functions of the Argyris finite element and their derivatives is proposed, an efficient way to calculate the matrix which gives the change of coordinates between the bases of theArgyis element for the reference and for an arbitrary triangle is presented.
Abstract: In this work we propose a new algorithm to evaluate the basis functions of the Argyris finite element and their derivatives. The main novelty here is an efficient way to calculate the matrix which gives the change of coordinates between the bases of the Argyis element for the reference and for an arbitrary triangle. This matrix is factored as the product of two rectangular matrices with a strong block structure which makes their computation very easy. We show and comment on an implementation of this algorithm in Matlab. Two numerical experiments, an interpolation of a smooth function on a triangle and the finite-element solution of the Dirichlet problem for the biLaplacian, are presented in the last section to check the performance of our implementation.

Journal ArticleDOI
TL;DR: The SSOR algorithm, applied to the normal equations, can be accelerated by the conjugate gradient algorithm (CG), and the resulting algorithm, called CGMN, is error-reducing and in theory it always converges even when the equation system is inconsistent and/or nonsquare.
Abstract: Given a linear system Ax = b, one can construct a related “normal equations” system AATy = b, x = ATy. Bjorck and Elfving have shown that the SSOR algorithm, applied to the normal equations, can be accelerated by the conjugate gradient algorithm (CG). The resulting algorithm, called CGMN, is error-reducing and in theory it always converges even when the equation system is inconsistent and/or nonsquare. SSOR on the normal equations is equivalent to the Kaczmarz algorithm (KACZ), with a fixed relaxation parameter, run in a double (forward and backward) sweep on the original equations. CGMN was tested on nine well-known large and sparse linear systems obtained by central-difference discretization of elliptic convection-diffusion partial differential equations (PDEs). Eight of the PDEs were strongly convection-dominated, and these are known to produce very stiff systems with large off-diagonal elements. CGMN was compared with some of the foremost state-of-the art Krylov subspace methods: restarted GMRES, Bi-CGSTAB, and CGS. These methods were tested both with and without various preconditioners. CGMN converged in all the cases, while none of the preceding algorithm/preconditioner combinations achieved this level of robustness. Furthermore, on varying grid sizes, there was only a gradual increase in the number of iterations as the grid was refined. On the eight convection-dominated cases, the initial convergence rate of CGMN was better than all the other combinations of algorithms and preconditioners, and the residual decreased monotonically. The CGNR algorithm was also tested, and it was as robust as CGMN, but slower.

Journal ArticleDOI
TL;DR: A new approach for computing a sparsity pattern for a Hessian is presented: nonlinearity information is propagated through the function evaluation yielding the nonzero structure.
Abstract: A new approach for computing a sparsity pattern for a Hessian is presented: nonlinearity information is propagated through the function evaluation yielding the nonzero structure. A complexity analysis of the proposed algorithm is given. Once the sparsity pattern is available, coloring algorithms can be applied to compute a seed matrix. To evaluate the product of the Hessian and the seed matrix, a vector version for evaluating second order adjoints is analysed. New drivers of ADOL-C are provided implementing the presented algorithms. Runtime analyses are given for some problems of the CUTE collection.

Journal ArticleDOI
TL;DR: This paper reorganizes the sequence of operations for Householder bidiagonalization of a general m × n matrix, so that two (_GEMV) vector-matrix multiplications can be done with one pass of the unreduced trailing part of the matrix through cache.
Abstract: On cache based computer architectures using current standard algorithms, Householder bidiagonalization requires a significant portion of the execution time for computing matrix singular values and vectors. In this paper we reorganize the sequence of operations for Householder bidiagonalization of a general m × n matrix, so that two (_GEMV) vector-matrix multiplications can be done with one pass of the unreduced trailing part of the matrix through cache. Two new BLAS operations approximately cut in half the transfer of data from main memory to cache, reducing execution times by up to 25 per cent. We give detailed algorithm descriptions and compare timings with the current LAPACK bidiagonalization algorithm.

Journal ArticleDOI
TL;DR: It is shown that the AMLS method, when implemented carefully, outperforms the traditional method in broad application areas when large numbers of eigenvalues are sought, with relatively low accuracy.
Abstract: We describe an efficient implementation and present a performance study of an automated multi-level substructuring (AMLS) method for sparse eigenvalue problems. We assess the time and memory requirements associated with the key steps of the algorithm, and compare it with the shift-and-invert Lanczos algorithm. Our eigenvalue problems come from two very different application areas: accelerator cavity design and normal-mode vibrational analysis of polyethylene particles. We show that the AMLS method, when implemented carefully, outperforms the traditional method in broad application areas when large numbers of eigenvalues are sought, with relatively low accuracy.

Journal ArticleDOI
TL;DR: A numerical procedure to compute the nodes and weights in rational Gauss-Chebyshev quadrature formulas is presented, finding that under certain conditions on the poles, these nodes are near best for rational interpolation with prescribed poles.
Abstract: We present a numerical procedure to compute the nodes and weights in rational Gauss-Chebyshev quadrature formulas. Under certain conditions on the poles, these nodes are near best for rational interpolation with prescribed poles (in the same sense that Chebyshev points are near best for polynomial interpolation). As an illustration, we use these interpolation points to solve a differential equation with an interior boundary layer using a rational spectral method.The algorithm to compute the interpolation points (and, if required, the quadrature weights) is implemented as a Matlab program.

Journal ArticleDOI
TL;DR: The design of general, flexible, consistent, reusable, and efficient interfaces to software libraries for the direct solution of systems of linear equations on both serial and distributed memory architectures is discussed.
Abstract: We discuss the design of general, flexible, consistent, reusable, and efficient interfaces to software libraries for the direct solution of systems of linear equations on both serial and distributed memory architectures. We introduce a set of abstract classes to access the linear system matrix elements and their distribution, access vector elements, and control the solution of the linear system. We describe a concrete implementation of the proposed interfaces, and report examples of applications and numerical results showing that the overhead induced by the object-oriented design is negligible under typical conditions of usage. We include examples of applications, and we comment on the advantages and limitations of the design.

Journal ArticleDOI
TL;DR: The N-way decomposition is based on the “divide and conquer” algorithmic paradigm and on a smoothing procedure that eliminates the creation of any new artifacts in the subdomains.
Abstract: We present a geometric domain decomposition method and its implementation, which produces good domain decompositions in terms of three basic criteria: (1) The boundary of the subdomains create good angles, that is, angles no smaller than a given tolerance Φ0, where the value of Φ0 is determined by the application which will use the domain decomposition. (2) The size of the separator should be relatively small compared to the area of the subdomains. (3) The maximum area of the subdomains should be close to the average subdomain area. The domain decomposition method uses an approximation of a Medial Axis as an auxiliary structure for constructing the boundary of the subdomains (separators). The N-way decomposition is based on the “divide and conquer” algorithmic paradigm and on a smoothing procedure that eliminates the creation of any new artifacts in the subdomains. This approach produces well shaped uniform and graded domain decompositions, which are suitable for parallel mesh generation.