scispace - formally typeset
Search or ask a question

Showing papers in "Siam Journal on Scientific and Statistical Computing in 1992"


Journal ArticleDOI
TL;DR: Numerical experiments indicate that the new variant of Bi-CG, named Bi- CGSTAB, is often much more efficient than CG-S, so that in some cases rounding errors can even result in severe cancellation effects in the solution.
Abstract: Recently the Conjugate Gradients-Squared (CG-S) method has been proposed as an attractive variant of the Bi-Conjugate Gradients (Bi-CG) method. However, it has been observed that CG-S may lead to a rather irregular convergence behaviour, so that in some cases rounding errors can even result in severe cancellation effects in the solution. In this paper, another variant of Bi-CG is proposed which does not seem to suffer from these negative effects. Numerical experiments indicate also that the new variant, named Bi-CGSTAB, is often much more efficient than CG-S.

4,722 citations


Journal ArticleDOI
TL;DR: In this paper, a branch-and-bound algorithm for linear bilevel programming is proposed, where necessary optimality conditions expressed in terms of tightness of the follower's constraints are used to fathom or simplify subproblems, branch and obtain penalties similar to those used in mixed-integer programming.
Abstract: A new branch-and-bound algorithm for linear bilevel programming is proposed. Necessary optimality conditions expressed in terms of tightness of the follower’s constraints are used to fathom or simplify subproblems, branch and obtain penalties similar to those used in mixed-integer programming. Computational results are reported and compare favorably to those of previous methods. Problems with up to 150 constraints, 250 variables controlled by the leader, and 150 variables controlled by the follower have been solved.

433 citations


Journal ArticleDOI
TL;DR: Conjugate gradient-type iterations which are based on a variant of the nonsymmetric Lanczos algorithm for complex symmetric matrices are investigated and a new approach with iterates defined by a quasi-minimal residual property is proposed, which presents several advantages over the standard biconjugates method.
Abstract: We consider conjugate gradient type methods for the solution of large sparse linear system Ax equals b with complex symmetric coefficient matrices A equals A(T). Such linear systems arise in important applications, such as the numerical solution of the complex Helmholtz equation. Furthermore, most complex non-Hermitian linear systems which occur in practice are actually complex symmetric. We investigate conjugate gradient type iterations which are based on a variant of the nonsymmetric Lanczos algorithm for complex symmetric matrices. We propose a new approach with iterates defined by a quasi-minimal residual property. The resulting algorithm presents several advantages over the standard biconjugate gradient method. We also include some remarks on the obvious approach to general complex linear systems by solving equivalent real linear systems for the real and imaginary parts of x. Finally, numerical experiments for linear systems arising from the complex Helmholtz equation are reported.

384 citations


Journal ArticleDOI
TL;DR: This paper takes a new look at numerical techniques for solving parabolic equations by the method of lines by approximate the action of the evolution operator on a given state vector by means of a projection process onto a Krylov subspace.
Abstract: This paper takes a new look at numerical techniques for solving parabolic equations by the method of lines. The main motivation for the proposed approach is the possibility of exploiting a high degree of parallelism in a simple manner. The basic idea of the method is to approximate the action of the evolution operator on a given state vector by means of a projection process onto a Krylov subspace. Thus the resulting approximation consists of applying an evolution operator of very small dimension to a known vector, which is, in turn, computed accurately by exploiting high-order rational Chebyshev and Pade approximations to the exponential. Because the rational approximation is only applied to a small matrix, the only operations required with the original large matrix are matrix-by-vector multiplications and, as a result, the algorithm can easily be parallelized and vectorized. Further parallelism is introduced by expanding the rational approximations into partial fractions. Some relevant approximation and ...

342 citations


Journal ArticleDOI
TL;DR: An implementation is presented of the fast multipole method, which uses approximations based on Poisson’s formula, and results are given that show the importance of good level selection.
Abstract: An implementation is presented of the fast multipole method, which uses approximations based on Poisson’s formula. Details for the implementation in both two and three dimensions are given. Also discussed is how the multigrid aspect of the fast multipole method can be exploited to yield efficient programming procedures. The issue of the selection of an appropriate refinement level for the method is addressed. Computational results are given that show the importance of good level selection. An efficient technique that can be used to determine an optimal level to choose for the method is presented.

207 citations


Journal ArticleDOI
TL;DR: A domain decomposition algorithm based on a hybrid variational principle is developed for the parallel finite element solution of selfadjoint elliptic partial differential equations, which requires fewer interprocessor communications than conventional Schur methods.
Abstract: A domain decomposition algorithm based on a hybrid variational principle is developed for the parallel finite element solution of selfadjoint elliptic partial differential equations. The spatial domain is partitioned into a set of totally disconnected subdomains, each assigned to an individual processor. Lagrange multipliers are introduced to enforce compatibility at the interface points. Within each subdomain, the singularity due to the disconnection is resolved in a two-step procedure. First, the null space component of each local operator is eliminated from the local problem. Next, its contribution to the local solution is related to the Lagrange multipliers through an orthogonality condition. Finally, a conjugate projected gradient algorithm is developed for the solution of the coupled system of local null space components and Lagrange multipliers. When implemented on local memory multiprocessors, the proposed hybrid method requires fewer interprocessor communications than conventional Schur methods. ...

192 citations


Journal ArticleDOI
TL;DR: This is illustrated by showing how the rank revealing QR factorization can be used to compute solutions to rank deficient least squares problems, to perform subset selection, to compute matrix approximations of given rank, and to solve total least square problems.
Abstract: The rank revealing QR factorization of a rectangular matrix can sometimes be used as a reliable and efficient computational alternative to the singular value decomposition for problems that involve rank determination. This is illustrated by showing how the rank revealing QR factorization can be used to compute solutions to rank deficient least squares problems, to perform subset selection, to compute matrix approximations of given rank, and to solve total least squares problems.

181 citations


Journal ArticleDOI
TL;DR: The results of the above methods at various values of the time-step size are compared in order to explore the numerical stability of the computation.
Abstract: This paper describes thee different numerical methods for the computation of flows with moving immersed elastic boundaries. A two-dimensional incompressible fluid and a boundary in the form of a simple closed curve are considered. The inertia is assumed to be negligible and the Stokes equations are solved. The three methods are explicit, approximate-implicit, and implicit. The first two have been used before, but the implicit method is new in the context of flows with moving immersed boundaries. They differ only with respect to the computation of the boundary force. The results of the above methods at various values of the time-step size are compared in order to explore the numerical stability of the computation.

168 citations


Journal ArticleDOI
TL;DR: A modification of the truncated SVD method is presented, which solves the more general problem, where L is a general matrix with full row rank, and it is shown how this can be accomplished with little extra computational effort.
Abstract: The truncated singular value decomposition (SVD) method is useful for solving the standard-form regularization problem: $\min ||{\bf x}||_2 $ subject to $\min ||A{\bf x} - {\bf b}||_2 $. This paper...

117 citations


Journal ArticleDOI
TL;DR: The solution of block system A_{mn} x = b by the preconditioned conjugate gradient method where A is an m- by-m block matrix with n-by-n Toeplitz blocks is studied.
Abstract: The solution of block system $A_{mn} x = b$ by the preconditioned conjugate gradient method where $A_{mn} $ is an m-by-m block matrix with n-by-n Toeplitz blocks is studied. The preconditioner $c_F...

102 citations


Journal ArticleDOI
TL;DR: A row partitioning approach is described that yields parallel implementations suitable for a wide range of computer architectures, requires only a few vectors of extra storage, and allows computing the necessary projections with small errors.
Abstract: Three conjugate gradient accelerated row projection (RP) methods for nonsymmetric linear systems are presented and their properties described. One method is based on Kaczmarz’s method and has an iteration matrix that is the product of orthogonal projectors; another is based on Cimmino’s method and has an iteration matrix that is the sum of orthogonal projectors. A new RP method, which requires fewer matrix-vector operations, explicitly reduces the problem size, is error reducing in the two-norm, and consistently produces better solutions than other RP algorithms, is also introduced. Using comparisons with the method of conjugate gradient applied to the normal equations, the properties of RP methods are explained.A row partitioning approach is described that yields parallel implementations suitable for a wide range of computer architectures, requires only a few vectors of extra storage, and allows computing the necessary projections with small errors. Numerical testing verifies the robustness of this appro...

Journal ArticleDOI
TL;DR: A more unified approach to the combined process of adaptive refinement and multigrid solution which can be used with high order finite elements and requires only $O(N)$ operations, even for highly nonuniform grids, and is defined for any finite element space.
Abstract: Many elliptic partial differential equations can be solved numerically with near optimal efficiency through the uses of adaptive refinement and multigrid solution techniques. This paper presents a more unified approach to the combined process of adaptive refinement and multigrid solution which can be used with high order finite elements. Refinement is achieved by the bisection of pairs of triangles, corresponding to the addition of one or more basis functions to the approximation space. An approximation of the resulting change in the solution is used as an error indicator. The multigrid iteration uses red-black Gauss-Seidel relaxation with local black relaxations. The grid transfers use the change between the nodal and hierarchical bases. This multigrid iteration requires only $O(N)$ operations, even for highly nonuniform grids, and is defined for any finite element space. The full multigrid method is an optimal blending of the processes of adaptive refinement and multigrid iteration. To minimize the numb...

Journal ArticleDOI
TL;DR: Schwarz Splitting has been proposed as an extension of SAM in numerical linear algebra, and a generalized SS is presented in this paper, which allows utilization of the flexibi...
Abstract: A classic mathematical technique, the Schwarz Alternating Method (SAM), has recently attracted much attention from researchers in the field of parallel computations, as well as theoreticians. Its advantages in parallelism, wide applicability and great flexibility in implementation make SAM a competitive choice in parallel computations. However, the computational performance of the classical SAM and its modern extensions strongly depend on the amount of overlap between the neighboring subregions. Introducing a large overlap has changed the image of SAM from an impractical theoretical technique to a rewarding numerical approach. However, the duplication of work in these overlapped regions is undesirable. Reducing the amount of overlap without affecting the speed of convergence has become an important performance issue.Schwarz Splitting (SS) has been proposed as an extension of SAM in numerical linear algebra, and a generalized SS is presented in this paper. The new approach allows utilization of the flexibi...

Journal ArticleDOI
TL;DR: It is shown how the free parameters of these Runge–Kutta methods can be used either to minimize the continuous truncation error coefficients or to maximize the stability region.
Abstract: Continuous, explicit Runge–Kutta methods with the minimal number of stages are considered. These methods are continuously differentiable if and only if one of the stages is the FSAL evaluation. A characterization of a subclass of these methods is developed for orders 3, 4, and 5. It is shown how the free parameters of these methods can be used either to minimize the continuous truncation error coefficients or to maximize the stability region. As a representative for these methods the fifth-order method with minimized error coefficients is chosen, supplied with an error estimation method, and analysed by using the DETEST software. The results are compared with a similar implementation of the Dormand–Prince 5(4) pair with interpolant, showing a significant advantage in the new method for the chosen problems.

Journal ArticleDOI
TL;DR: It is concluded that a regular data parallel architecture with a supernodal, multifrontal algorithm whose inner loop performs several dense factorizations simultaneously on a two-dimensional grid of processors is the most promising alternative.
Abstract: This paper develops and compares several fine-grained parallel algorithms to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a...

Journal ArticleDOI
TL;DR: It is shown that a triangulation of a set of n points in the plane that minimizes the maximum angle can be computed in time $O (n^2 \log n)$ and space $O(n)$.
Abstract: It is shown that a triangulation of a set of n points in the plane that minimizes the maximum angle can be computed in time $O(n^2 \log n)$ and space $O(n)$. The algorithm is fairly easy to implement and is based on the edge-insertion scheme that iteratively improves an arbitrary initial triangulation. It can be extended to the case where edges are prescribed, and, within the same time- and space-bounds, it can lexicographically minimize the sorted angle vector if the point set is in general position. Experimental results on the efficiency of the algorithm and the quality of the triangulations obtained are included.

Journal ArticleDOI
TL;DR: Three mixed finite-element methods for approximating Maxwell’s equations are compared and it is suggested that all three methods could be the basis of a successful three-dimensional code.
Abstract: Three mixed finite-element methods for approximating Maxwell’s equations are compared. A dispersion analysis provides a Courant–Friedrichs–Lewy (CFL) bound that is necessary for convergence when a uniform mesh is used. The dispersion analysis also allows a comparison of the stability properties of the methods. Superconvergence at the interpolation points is proved for uniform grids, and demonstrated by three numerical examples. All three methods are shown to be able to handle discontinuous media without modification of the finite-element spaces. Since all three methods have three-dimensional counterparts, this study suggests that all three methods could be the basis of a successful three-dimensional code.

Journal ArticleDOI
TL;DR: This paper describes implementations of eight algorithms of Newton and quasi-Newton type for solving large sparse systems of nonlinear equations, using a symbolic manipulation as well as a static data structure introduced recently by George and Ng.
Abstract: This paper describes implementations of eight algorithms of Newton and quasi-Newton type for solving large sparse systems of nonlinear equations. For linear algebra calculations, a symbolic manipulation is used, as well as a static data structure introduced recently by George and Ng, which allows a partial pivoting strategy for solving linear systems. A numerical comparison of the implemented methods is presented.

Journal ArticleDOI
TL;DR: It is shown here that a structured orthogonal factorization technique can be used to solve this system of linear equations, and hence the overall problem, in an efficient, parallel, and stable way.
Abstract: Some of the most widely used algorithms for two-point boundary value ordinary differential equations, namely, finite-difference and collocation methods and standard multiple shooting, proceed by setting up and solving a structured system of linear equations. It is well known that the linear system can be set up efficiently in parallel; we show here that a structured orthogonal factorization technique can be used to solve this system, and hence the overall problem, in an efficient, parallel, and stable way.

Journal ArticleDOI
TL;DR: A block version of Cimmino’s algorithm for solving general sets of consistent sparse linear equations is described and it is shown how the basic method can be accelerated by using the conjugate gradient (CG) algorithm.
Abstract: A block version of Cimmino’s algorithm for solving general sets of consistent sparse linear equations is described. The case of matrices in block tridiagonal form is emphasized because it is assumed that the general case can be reduced to this form by permutations. It is shown how the basic method can be accelerated by using the conjugate gradient (CG) algorithm. This acceleration is very dependent on a partitioning of the original system and several possible partitionings are discussed. Underdetermined systems corresponding to the subproblems of the partitioned system are solved using the Harwell sparse symmetric indefinite solver MA27 on an augmented system. These systems are independent and can be solved in parallel. An analysis of the iteration matrix for the conjugate gradient acceleration leads to the consideration of rather unusual and novel scalings of the matrix that alter the spectrum of the iteration matrix to reduce the number of CG iterations.The various aspects of this algorithm have been te...

Journal ArticleDOI
TL;DR: It is shown that a three-dimensional Delaunay triangulation does not generally produce a discretisation satisfying the condition of positive interior connection condition, and it is generally not possible to produce aThree-dimensional triangulations that satisfies the positive interior connected condition.
Abstract: The second-order diffusion operator given by $\mathcal{L}u = abla \cdot ({\bf K} abla u)$ is studied in terms of finite element numerical solutions. When a standard Galerkin finite element approximation of the operator is arranged in a specific manner, a condition of positive connection values is imposed that is necessary to produce an M-matrix in a linear system when boundary nodes are properly handled. In two dimensions, the condition is achieved when a Delaunay triangulation is imposed. However, it is shown that a three-dimensional Delaunay triangulation does not generally produce a discretisation satisfying the condition. Further, it is generally not possible to produce a three-dimensional triangulation that satisfies the positive interior connection condition.

Journal ArticleDOI
TL;DR: An algorithm that overcomes the limitations of the standard marching schemes by solving the problem at all the time levels simultaneously is discussed and its performance is compared to that of the best standard solvers.
Abstract: The numerical solution of a parabolic partial differential equation is usually calculated by a timestepping method. This precludes the efficient use of vectorization and parallelism if the problem to be solved on each time level is not very large. In this paper an algorithm that overcomes the limitations of the standard marching schemes by solving the problem at all the time levels simultaneously is discussed. The method is applicable to linear and nonlinear problems on arbitrary domains. It can be used to solve initial-boundary value problems as well as time-periodic equations. We have implemented the method on an Intel iPSC/2-VX hypercube. The numerical properties of the method are illustrated by two numerical examples and its performance is compared to that of the best standard solvers.

Journal ArticleDOI
TL;DR: In this paper, a generalized prime factor fast Fourier transform (FFT) algorithm was proposed, which can self-sort and in-place simultaneously and has a lower operation count than conventional FFT algorithms.
Abstract: Prime factor fast Fourier transform (FFT) algorithms have two important advantages: they can be simultaneously self-sorting and in-place, and they have a lower operation count than conventional FFT algorithms. The major disadvantage of the prime factor FFT has been that it was only applicable to a limited set of values of the transform length N. This paper presents a generalized prime factor FFT, which is applicable for any $N = 2^p 3^q 5^r $, while maintaining both the self-sorting in-place capability and the lower operation count. Timing experiments on the Cray Y-MP demonstrate the advantages of the new algorithm.

Journal ArticleDOI
TL;DR: Certain algebraic properties of the corresponding matrices of the derived finite difference schemes are verified, thus allowing the recently proposed algebraic theory for the Bramble–Ewing–Pasciak–Schatz (BEPS) and Fact Adaptive Composite (FAC) two-grid preconditioners to apply.
Abstract: Based on approximation of the balance equation, finite difference schemes on triangular cell-centered grids are derived. A priori estimates and error analyses are provided. For certain regular triangulations, $\mathbb{O}(h^2 )$ convergence in the discrete $\mathbb{H}^1 $-norm is established. Also, finite difference schemes on triangular cell-centered grids with local refinement are derived with accuracy $\mathbb{O}(h^{1/2 + \alpha } )$, where $\alpha = 0$ for a simple symmetric scheme, $\alpha = 1$ for a nonsymmetric and a more accurate symmetric scheme, and $\alpha = \frac{3}{2}$ for a more accurate nonsymmetric scheme.Certain algebraic properties of the corresponding matrices of the derived finite difference schemes are verified, thus allowing the recently proposed algebraic theory for the Bramble–Ewing–Pasciak–Schatz (BEPS) and Fact Adaptive Composite (FAC) two-grid preconditioners to apply.Numerical experiments that demonstrate the accuracy of the difference schemes and the fast convergence of the two...

Journal ArticleDOI
TL;DR: The use of data-dependent triangulations that depend on the given function values at the data points is discussed, and some data- dependent criteria for optimizing a triangulation are presented and compared to the Delaunay criterion.
Abstract: Given a set V of data points in $R^2 $ with corresponding data values, the problem of adaptive piecewise polynomial approximation is to choose a subset of points of V, to create a triangulation of this subset, and to define a piecewise linear surface over the triangulation such that the deviation of this surface from the data set is no more than a prescribed error tolerance. A typical numerical scheme starts with some initial triangulation and adds more points (and triangles) as necessary until the resulting piecewise linear surface satisfies the error bound. In this paper two ingredients of such schemes are discussed. The first problem is that of constructing a suitable triangulation of a subset of points. The use of data-dependent triangulations that depend on the given function values at the data points is discussed, and some data-dependent criteria for optimizing a triangulation are presented and compared to the Delaunay criterion leading to the well-known Delaunay triangulation traditionally used for...

Journal ArticleDOI
TL;DR: A statistical analysis and the results of numerical experiments are presented to show that the numerical accuracy of this new implementation of TINVIT is competitive with that of the implementations of the divide-and-conquer and QL methods.
Abstract: The EISPACK routine TINVIT is an implementation of inverse iteration for computing eigenvectors of real symmetric tridiagonal matrices. Experiments have shown that the eigenvectors computed with TINVIT are numerically less accurate than those from implementations of Cuppen’s divide-and-conquer method (TREEQL) and of the QL method (TQL2). The loss of accuracy can be attributed to TINVIT’s choice of starting vectors and to its iteration stopping criterion.This paper introduces a new implementation of TINVIT that computes each eigenvector from a different random starting vector and performs an additional iteration after the stopping criterion is satisfied. A statistical analysis and the results of numerical experiments with matrices of order up to 525 are presented to show that the numerical accuracy of this new implementation is competitive with that of the implementations of the divide-and-conquer and QL methods. The extension of this implementation to larger order matrices is discussed, albeit in less detail.

Journal ArticleDOI
TL;DR: Schreiber and Van Loan as mentioned in this paper presented a modification of the block Householder method based on the compact WY representation, which is modified in order to introduce more matrix-matrix operations.
Abstract: This paper presents a modification of the block Householder method based on the compact WY representation [R. Schreiber and C. Van Loan, SIAM J. Sci. Statist. Comput., 10 (1989), pp. 52–57]. It is modified in order to introduce more matrix–matrix operations.

Journal ArticleDOI
TL;DR: This paper focuses on the application of the principle of judicious use of relatively few high accuracy computations to iterative linear methods, which can be similarly applied to a number of other “restarted” iterativelinear methods as well.
Abstract: Consideration of an abstract improvement algorithm leads to the following principle, which is similar to that underlying iterative refinement: By making judicious use of relatively few high accuracy computations, high accuracy solutions can be obtained very efficiently by the algorithm. This principle is applied specifically to ${\text{GMRES}}(m)$ here; it can be similarly applied to a number of other “restarted” iterative linear methods as well. Results are given for numerical experiments in solving a discretized linear elliptic boundary value problem and in computing a step of an inexact Newton method using finite differences for a discretized nonlinear elliptic boundary value problem.

Journal ArticleDOI
TL;DR: Various aspects of the moving mesh problem are investigated for the solution of partial differential equations (PDEs) in one space dimension, and it is shown that equidistribution implicitly corresponds to finding a solution to a PDE involving a new set of computational coordinates.
Abstract: Various aspects of the moving mesh problem are investigated for the solution of partial differential equations (PDEs) in one space dimension In particular, methods based (explicitly or implicitly) upon an equidistribution principle are studied It is shown that equidistribution implicitly corresponds to finding a solution to a PDE involving a new set of computational coordinates Implementation of a discrete version of equidistribution to compute a moving mesh corresponds to solving a weak form of the PDE The stability of equidistribution is discussed, and it is argued that stability can be significantly affected by the way in which this solution process is carried out Simple moving mesh methods are constructed using this framework, and numerical examples are given to illustrate their robustness

Journal ArticleDOI
TL;DR: A semicoarsening multigrid algorithm suitable for use on single instruction multiple data (SIMD) architectures has been implemented on the CM-2 and its actual performance is compared with its performance on some other machines, both parallel and nonparallel.
Abstract: A semicoarsening multigrid algorithm suitable for use on single instruction multiple data (SIMD) architectures has been implemented on the CM-2. The method performs well for strongly anisotropic problems and for problems with coefficients jumping by orders of magnitude across internal interfaces. The parallel efficiency of this method is analyzed, and its actual performance is compared with its performance on some other machines, both parallel and nonparallel.