
Showing papers in "SIAM Journal on Scientific Computing in 2011"


Journal ArticleDOI
TL;DR: The new form gives a clear and convenient way to implement all basic operations efficiently, and the efficiency is demonstrated by the computation of the smallest eigenvalue of a 19-dimensional operator.
Abstract: A simple nonrecursive form of the tensor decomposition in $d$ dimensions is presented. It does not inherently suffer from the curse of dimensionality, it has asymptotically the same number of parameters as the canonical decomposition, but it is stable and its computation is based on low-rank approximation of auxiliary unfolding matrices. The new form gives a clear and convenient way to implement all basic operations efficiently. A fast rounding procedure is presented, as well as basic linear algebra operations. Examples showing the benefits of the decomposition are given, and the efficiency is demonstrated by the computation of the smallest eigenvalue of a 19-dimensional operator.
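
For illustration, a minimal NumPy sketch of the core construction — successive truncated SVDs of unfolding matrices (the paper's full algorithm, including its fast rounding procedure, is more elaborate; names and tolerances here are illustrative):

```python
import numpy as np

def tt_svd(a, eps=1e-10):
    # Sweep through the modes of a d-way array, taking a truncated SVD
    # of an auxiliary unfolding matrix at each step; returns cores G_k
    # of shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1.
    shape = a.shape
    cores, r = [], 1
    w = a.reshape(r * shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(w, full_matrices=False)
        rk = max(1, int(np.sum(s > eps * s[0])))   # truncation rank
        cores.append(u[:, :rk].reshape(r, shape[k], rk))
        w = (s[:rk, None] * vt[:rk]).reshape(rk * shape[k + 1], -1)
        r = rk
    cores.append(w.reshape(r, shape[-1], 1))
    return cores

# sanity check: exact reconstruction of a small 4-way tensor
a = np.random.rand(3, 4, 5, 6)
cores = tt_svd(a)
b = cores[0]
for g in cores[1:]:
    b = np.tensordot(b, g, axes=(b.ndim - 1, 0))
assert np.allclose(a, b.reshape(a.shape))
```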

2,127 citations


Journal ArticleDOI
TL;DR: This paper proposes and investigates two classes of algorithms derived from either the primal or the dual form of $\ell_1$-problems, and presents numerical results to emphasize two practically important but perhaps overlooked points: that algorithm speed should be evaluated relative to appropriate solution accuracy; and that when erroneous measurements possibly exist, the $\ell_1$-fidelity should generally be preferable to the $\ell_2$-fidelity.
Abstract: In this paper, we propose and study the use of alternating direction algorithms for several $\ell_1$-norm minimization problems arising from sparse solution recovery in compressive sensing, including the basis pursuit problem, the basis pursuit denoising problems of both unconstrained and constrained forms, and others. We present and investigate two classes of algorithms derived from either the primal or the dual form of $\ell_1$-problems. The construction of the algorithms consists of two main steps: (1) to reformulate an $\ell_1$-problem into one having blockwise separable objective functions by adding new variables and constraints; and (2) to apply an exact or inexact alternating direction method to the augmented Lagrangian function of the resulting problem. The derived alternating direction algorithms can be regarded as first-order primal-dual algorithms because both primal and dual variables are updated at every iteration. Convergence properties of these algorithms are established or restated when they already exist. Extensive numerical experiments are performed, using randomized partial Walsh-Hadamard sensing matrices, to demonstrate the versatility and effectiveness of the proposed approach. Moreover, we present numerical results to emphasize two practically important but perhaps overlooked points: (i) that algorithm speed should be evaluated relative to appropriate solution accuracy; and (ii) that when erroneous measurements possibly exist, the $\ell_1$-fidelity should generally be preferable to the $\ell_2$-fidelity.
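
For orientation, a generic alternating direction sketch for the unconstrained denoising model $\min_x \frac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1$ (a textbook ADMM splitting in NumPy, not the specific primal and dual algorithms developed in the paper; parameter choices are illustrative):

```python
import numpy as np

def soft(v, t):
    # soft-thresholding (shrinkage): closed-form solution of the l1 subproblem
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1(A, b, lam=0.1, rho=1.0, iters=200):
    # Split x = z so the objective becomes blockwise separable, then
    # alternate minimizations of the augmented Lagrangian.
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    Atb = A.T @ b
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))  # factor once, reuse
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft(x + u, lam / rho)   # separable l1 step
        u = u + x - z                # multiplier (dual) update
    return z
```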

983 citations


Journal ArticleDOI
TL;DR: This work presents scalable algorithms for parallel adaptive mesh refinement and coarsening (AMR), partitioning, and 2:1 balancing on computational domains composed of multiple connected two-dimensional quadtrees or three-dimensional octrees, referred to as a forest of octrees.
Abstract: We present scalable algorithms for parallel adaptive mesh refinement and coarsening (AMR), partitioning, and 2:1 balancing on computational domains composed of multiple connected two-dimensional quadtrees or three-dimensional octrees, referred to as a forest of octrees. By distributing the union of octants from all octrees in parallel, we combine the high scalability proven previously for adaptive single-octree algorithms with the geometric flexibility that can be achieved by arbitrarily connected hexahedral macromeshes, in which each macroelement is the root of an adapted octree. A key concept of our approach is an encoding scheme of the interoctree connectivity that permits arbitrary relative orientations between octrees. Based on this encoding we develop interoctree transformations of octants. These form the basis for high-level parallel octree algorithms, which are designed to interact with an application code such as a numerical solver for partial differential equations. We have implemented and tested these algorithms in the p4est software library. We demonstrate the parallel scalability of p4est on its own and in combination with two geophysics codes. Using p4est we generate and adapt multioctree meshes with up to $5.13\times10^{11}$ octants on as many as 220,320 CPU cores and execute the 2:1 balance algorithm in less than 10 seconds per million octants per process.

648 citations


Journal ArticleDOI
TL;DR: It is shown that the sums of the form $\sum_{k=0}^p \varphi_k(A)u_k$ that arise in exponential integrators can be expressed in terms of a single exponential of a matrix of dimension $n+p$ built by augmenting $A$ with additional rows and columns, and the algorithm of this paper can be employed.
Abstract: A new algorithm is developed for computing $e^{tA}B$, where $A$ is an $n\times n$ matrix and $B$ is $n\times n_0$ with $n_0 \ll n$. The algorithm works for any $A$, its computational cost is dominated by the formation of products of $A$ with $n\times n_0$ matrices, and the only input parameter is a backward error tolerance. The algorithm can return a single matrix $e^{tA}B$ or a sequence $e^{t_kA}B$ on an equally spaced grid of points $t_k$. It uses the scaling part of the scaling and squaring method together with a truncated Taylor series approximation to the exponential. It determines the amount of scaling and the Taylor degree using the recent analysis of Al-Mohy and Higham [SIAM J. Matrix Anal. Appl., 31 (2009), pp. 970-989], which provides sharp truncation error bounds expressed in terms of the quantities $\|A^k\|^{1/k}$ for a few values of $k$, where the norms are estimated using a matrix norm estimator. Shifting and balancing are used as preprocessing steps to reduce the cost of the algorithm. Numerical experiments show that the algorithm performs in a numerically stable fashion across a wide range of problems, and analysis of rounding errors and of the conditioning of the problem provides theoretical support. Experimental comparisons with MATLAB codes based on Krylov subspace, Chebyshev polynomial, and Laguerre polynomial methods show the new algorithm to be sometimes much superior in terms of computational cost and accuracy. An important application of the algorithm is to exponential integrators for ordinary differential equations. It is shown that the sums of the form $\sum_{k=0}^p \varphi_k(A)u_k$ that arise in exponential integrators, where the $\varphi_k$ are related to the exponential function, can be expressed in terms of a single exponential of a matrix of dimension $n+p$ built by augmenting $A$ with additional rows and columns, and the algorithm of this paper can therefore be employed.
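
SciPy's scipy.sparse.linalg.expm_multiply is based on this algorithm; a brief usage sketch (matrix and block sizes chosen arbitrarily):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

n = 1000
A = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format='csr')
B = np.random.rand(n, 2)

Y = expm_multiply(A, B)  # e^{A} B without ever forming e^{A}

# sequence e^{t_k A} B on an equally spaced grid t_k in [0, 1]
Ys = expm_multiply(A, B, start=0.0, stop=1.0, num=5, endpoint=True)
print(Ys.shape)  # (5, 1000, 2)
```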

478 citations


Journal ArticleDOI
TL;DR: This paper focuses on an RBF-QR formulation for node sets in one, two, and three dimensions that is stable for arbitrarily small shape parameters.
Abstract: Radial basis function (RBF) approximation is an extremely powerful tool for representing smooth functions in nontrivial geometries since the method is mesh-free and can be spectrally accurate. A perceived practical obstacle is that the interpolation matrix becomes increasingly ill-conditioned as the RBF shape parameter becomes small, corresponding to flat RBFs. Two stable approaches that overcome this problem exist: the Contour-Pade method and the RBF-QR method. However, the former is limited to small node sets, and the latter has until now been formulated only for the surface of the sphere. This paper focuses on an RBF-QR formulation for node sets in one, two, and three dimensions. The algorithm is stable for arbitrarily small shape parameters. It can be used for thousands of node points in two dimensions and still more in three dimensions. A sample MATLAB code for the two-dimensional case is provided.
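
To see the obstacle the method removes, one can watch the condition number of the plain Gaussian-RBF interpolation matrix grow as the shape parameter $\varepsilon$ shrinks (a small illustrative experiment in NumPy, not the RBF-QR algorithm itself):

```python
import numpy as np

# direct RBF interpolation matrix in 1-D: A_ij = exp(-(eps*|x_i - x_j|)^2)
x = np.linspace(0, 1, 30)
r = np.abs(x[:, None] - x[None, :])
for eps in [10.0, 1.0, 0.1, 0.01]:
    A = np.exp(-(eps * r) ** 2)
    print(f"eps = {eps:5.2f}   cond(A) = {np.linalg.cond(A):.2e}")
# cond(A) explodes as eps -> 0, which is exactly what RBF-QR sidesteps
```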

377 citations


Journal ArticleDOI
TL;DR: A novel algorithm for NMF based on the ANLS framework that builds upon the block principal pivoting method for the nonnegativity-constrained least squares problem that overcomes a limitation of the active set method is presented.
Abstract: Nonnegative matrix factorization (NMF) is a dimension reduction method that has been widely used for numerous applications, including text mining, computer vision, pattern discovery, and bioinformatics. A mathematical formulation for NMF appears as a nonconvex optimization problem, and various types of algorithms have been devised to solve the problem. The alternating nonnegative least squares (ANLS) framework is a block coordinate descent approach for solving NMF, which was recently shown to be theoretically sound and empirically efficient. In this paper, we present a novel algorithm for NMF based on the ANLS framework. Our new algorithm builds upon the block principal pivoting method for the nonnegativity-constrained least squares problem that overcomes a limitation of the active set method. We introduce ideas that efficiently extend the block principal pivoting method within the context of NMF computation. Our algorithm inherits the convergence property of the ANLS framework and can easily be extended to other constrained NMF formulations. Extensive computational comparisons using data sets that are from real life applications as well as those artificially generated show that the proposed algorithm provides state-of-the-art performance in terms of computational speed.
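
A minimal sketch of the ANLS outer loop, using SciPy's classical active-set NNLS solver column by column as a stand-in for the paper's block principal pivoting subproblem solver (which handles many right-hand sides together and avoids the active set method's limitation):

```python
import numpy as np
from scipy.optimize import nnls

def anls_nmf(A, k, iters=50, seed=0):
    # Alternate the two convex nonnegativity-constrained least squares
    # subproblems of min ||A - W H||_F with W, H >= 0.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = np.zeros((k, n))
    for _ in range(iters):
        for j in range(n):              # H-step: fix W, solve per column
            H[:, j], _ = nnls(W, A[:, j])
        for i in range(m):              # W-step: fix H, solve per row
            W[i, :], _ = nnls(H.T, A[i, :])
    return W, H
```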

352 citations


Journal ArticleDOI
TL;DR: An iterative method LSMR is presented for solving linear systems $Ax=b$ and least-squares problems $\min \|Ax-b\|_2$ with $A$ being sparse or a fast linear operator, and it is observed in practice that compared to LSQR it is safer to terminate LSMR early.
Abstract: An iterative method LSMR is presented for solving linear systems $Ax=b$ and least-squares problems $\min \|Ax-b\|_2$, with $A$ being sparse or a fast linear operator. LSMR is based on the Golub-Kahan bidiagonalization process. It is analytically equivalent to the MINRES method applied to the normal equation $A^T\! Ax = A^T\! b$, so that the quantities $\|A^T\! r_k\|$ are monotonically decreasing (where $r_k = b - Ax_k$ is the residual for the current iterate $x_k$). We observe in practice that $\|r_k\|$ also decreases monotonically, so that compared to LSQR (for which only $\|r_k\|$ is monotonic) it is safer to terminate LSMR early. We also report some experiments with reorthogonalization.
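
LSMR is available as scipy.sparse.linalg.lsmr; a brief usage sketch on a random sparse least-squares problem:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsmr

A = sp.random(2000, 100, density=0.01, format='csr', random_state=0)
b = np.random.rand(2000)

x, istop, itn, normr, normar = lsmr(A, b, atol=1e-8, btol=1e-8)[:5]
# normar = ||A^T r_k|| is the quantity LSMR drives down monotonically
print(istop, itn, normr, normar)
```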

339 citations


Journal ArticleDOI
TL;DR: The proposed two-step online method for interpolating projection-based linear parametric reduced-order models (ROMs) is illustrated by applications in mechanical and aeronautical engineering and its robustness is demonstrated by its ability to handle the case where the sampled parameter set values exhibit a mode veering phenomenon.
Abstract: A two-step online method is proposed for interpolating projection-based linear parametric reduced-order models (ROMs) in order to construct a new ROM for a new set of parameter values. The first step of this method transforms each precomputed ROM into a consistent set of generalized coordinates. The second step interpolates the associated linear operators on their appropriate matrix manifold. Real-time performance is achieved by precomputing inner products between the reduced-order bases underlying the precomputed ROMs. The proposed method is illustrated by applications in mechanical and aeronautical engineering. In particular, its robustness is demonstrated by its ability to handle the case where the sampled parameter set values exhibit a mode veering phenomenon.

301 citations


Journal ArticleDOI
TL;DR: This work adapts one of these randomized methods for principal component analysis (PCA) for use with data sets that are too large to be stored in random-access memory (RAM), and reports on the performance of the algorithm.
Abstract: Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy—even on parallel processors—unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
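
A minimal in-memory sketch of the randomized method being adapted (the paper's contribution is the out-of-core organization, which this sketch omits; block sizes and iteration counts are illustrative):

```python
import numpy as np

def randomized_pca(X, k, oversample=10, power=2, seed=0):
    # Randomized range finder: multiply by a random sketch, orthonormalize,
    # then take the SVD of the small projected matrix.
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # center the data
    Y = Xc @ rng.standard_normal((Xc.shape[1], k + oversample))
    for _ in range(power):                   # power iterations sharpen accuracy
        Y = Xc @ (Xc.T @ Y)
    Q, _ = np.linalg.qr(Y)                   # orthonormal basis for range(Xc)
    U, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return (Q @ U)[:, :k], s[:k], Vt[:k]     # left factors, spectrum, components
```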

281 citations


Journal ArticleDOI
TL;DR: This work provides a unifying projection-based framework for structure-preserving interpolatory model reduction of parameterized linear dynamical systems, i.e., systems having a structured dependence on parameters that the authors wish to retain in the reduced-order model.
Abstract: We provide a unifying projection-based framework for structure-preserving interpolatory model reduction of parameterized linear dynamical systems, i.e., systems having a structured dependence on parameters that we wish to retain in the reduced-order model. The parameter dependence may be linear or nonlinear and is retained in the reduced-order model. Moreover, we are able to give conditions under which the gradient and Hessian of the system response with respect to the system parameters is matched in the reduced-order model. We provide a systematic approach built on established interpolatory $\mathcal{H}_2$ optimal model reduction methods that will produce parameterized reduced-order models having high fidelity throughout a parameter range of interest. For single input/single output systems with parameters in the input/output maps, we provide reduced-order models that are optimal with respect to an $\mathcal{H}_2\otimes\mathcal{L}_2$ joint error measure. The capabilities of these approaches are illustrated by several numerical examples from technical applications.

218 citations


Journal ArticleDOI
TL;DR: The novelty of the proposed numerical schemes is that, using either the finite difference method or the Laplace transform to handle the Caputo time fractional derivative, the solution of the TSFDE-2D is written in terms of a matrix function vector product $f(\mathbf{A})\mathbf{b}$ at each time step, where $\mathbf{b}$ is a suitably defined vector.
Abstract: In this paper, a time-space fractional diffusion equation in two dimensions (TSFDE-2D) with homogeneous Dirichlet boundary conditions is considered. The TSFDE-2D is obtained from the standard diffusion equation by replacing the first-order time derivative with the Caputo fractional derivative $_{t}D_{*}^{\gamma}$, $\gamma\in(0,1)$, and the second-order space derivatives with the fractional Laplacian $-(-\Delta)^{\alpha/2}$, $\alpha\in(1,2]$. Using the matrix transfer technique proposed by Ilic et al. [M. Ilic, F. Liu, I. Turner, and V. Anh, Fract. Calc. Appl. Anal., 9 (2006), pp. 333-349], the TSFDE-2D is transformed into a time fractional differential system as $_{t}D_{*}^{\gamma}\mathbf{u} = -K_\alpha\mathbf{A}^{\alpha/2}\mathbf{u}$, where $\mathbf{A}$ is the approximate matrix representation of $(-\Delta)$. Traditional approximation of $\mathbf{A}^{\alpha/2}$ requires diagonalization of $\mathbf{A}$, which is very time-consuming for large sparse matrices. The novelty of our proposed numerical schemes is that, using either the finite difference method or the Laplace transform to handle the Caputo time fractional derivative, the solution of the TSFDE-2D is written in terms of a matrix function vector product $f(\mathbf{A})\mathbf{b}$ at each time step, where $\mathbf{b}$ is a suitably defined vector. Depending on the method used to generate the matrix $\mathbf{A}$, the product $f(\mathbf{A})\mathbf{b}$ can be approximated using either the preconditioned Lanczos method when $\mathbf{A}$ is symmetric or the $\mathbf{M}$-Lanczos method when $\mathbf{A}$ is nonsymmetric, which are powerful techniques for solving large linear systems. We give error bounds for the new methods and illustrate their roles in solving the TSFDE-2D. We also derive the analytical solution of the TSFDE-2D in terms of the Mittag-Leffler function. Finally, numerical results are presented to verify the proposed numerical solution strategies.
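
A bare-bones sketch of the Lanczos approach to the matrix function vector product (plain and unpreconditioned, for symmetric $\mathbf{A}$ only; the paper's preconditioned and $\mathbf{M}$-Lanczos variants are more involved). For the time-stepped solution one would take, e.g., $f(\lambda)=e^{-tK_\alpha\lambda^{\alpha/2}}$:

```python
import numpy as np

def lanczos_fAb(A, b, f, m=50):
    # Approximate f(A) b ~ ||b|| V_m f(T_m) e_1 for symmetric A,
    # where T_m is the m x m Lanczos tridiagonal matrix.
    n = b.size
    V = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        if j + 1 < m:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-14:   # lucky breakdown: invariant subspace found
                m = j + 1
                break
            V[:, j + 1] = w / beta[j]
    T = np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    lam, Q = np.linalg.eigh(T)
    fT_e1 = Q @ (f(lam) * Q[0, :])           # f(T_m) e_1 via eigendecomposition
    return np.linalg.norm(b) * (V[:, :m] @ fT_e1)
```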

Journal ArticleDOI
TL;DR: It is demonstrated that the number of forward PDE solves required for an accurate low-rank approximation is independent of the problem dimension, which permits scalable estimation of the uncertainty in large-scale ill-posed linear inverse problems at a small multiple of the cost of solving the forward problem.
Abstract: We consider the problem of estimating the uncertainty in large-scale linear statistical inverse problems with high-dimensional parameter spaces within the framework of Bayesian inference. When the noise and prior probability densities are Gaussian, the solution to the inverse problem is also Gaussian and is thus characterized by the mean and covariance matrix of the posterior probability density. Unfortunately, explicitly computing the posterior covariance matrix requires as many forward solutions as there are parameters and is thus prohibitive when the forward problem is expensive and the parameter dimension is large. However, for many ill-posed inverse problems, the Hessian matrix of the data misfit term has a spectrum that collapses rapidly to zero. We present a fast method for computation of an approximation to the posterior covariance that exploits the low-rank structure of the preconditioned (by the prior covariance) Hessian of the data misfit. Analysis of an infinite-dimensional model convection-diffusion problem, and numerical experiments on large-scale three-dimensional convection-diffusion inverse problems with up to 1.5 million parameters, demonstrate that the number of forward PDE solves required for an accurate low-rank approximation is independent of the problem dimension. This permits scalable estimation of the uncertainty in large-scale ill-posed linear inverse problems at a small multiple (independent of the problem dimension) of the cost of solving the forward problem.
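
The low-rank structure being exploited can be written compactly (a sketch in assumed notation, not quoted from the paper): if the prior-preconditioned data-misfit Hessian has dominant eigenpairs $\Gamma_{\mathrm{prior}}^{1/2} H_{\mathrm{misfit}} \Gamma_{\mathrm{prior}}^{1/2} \approx V_r \Lambda_r V_r^T$ with rapidly decaying $\lambda_i$, then the Sherman-Morrison-Woodbury formula gives

$$\Gamma_{\mathrm{post}} = \bigl(H_{\mathrm{misfit}} + \Gamma_{\mathrm{prior}}^{-1}\bigr)^{-1} \approx \Gamma_{\mathrm{prior}} - \Gamma_{\mathrm{prior}}^{1/2} V_r \,\mathrm{diag}\!\left(\frac{\lambda_i}{\lambda_i+1}\right) V_r^T \,\Gamma_{\mathrm{prior}}^{1/2},$$

so only the $r$ dominant eigenpairs, each obtainable at the cost of a few forward/adjoint PDE solves, are needed rather than one forward solve per parameter.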

Journal ArticleDOI
TL;DR: It is found that the use of time adaptivity can resolve not only the steady-state solutions but also the dynamical changes of the solution accurately and efficiently.
Abstract: This paper is concerned with the numerical simulations for the dynamics of the molecular beam epitaxy (MBE) model. The numerical simulations of the MBE models require long time computations, and therefore large time-stepping methods become necessary. In this work, we consider some unconditionally energy stable finite difference schemes, which will be used in the time adaptivity strategies. It is found that the use of time adaptivity can resolve not only the steady-state solutions but also the dynamical changes of the solution accurately and efficiently. The adaptive time step is selected based on the energy variation or the change of the roughness of the solution. The numerical experiments demonstrate that significant CPU time is saved for long time simulations.
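
One energy-based step selector of the kind described (the form and constants here are a hedged paraphrase, not quoted from the paper) adapts the step to the energy variation,

$$\Delta t_{n+1} = \max\!\left(\Delta t_{\min},\; \frac{\Delta t_{\max}}{\sqrt{1 + \alpha\,|E'(t_n)|^{2}}}\right),$$

so the step shrinks toward $\Delta t_{\min}$ while the energy decays quickly and relaxes toward $\Delta t_{\max}$ near steady state; the change of the roughness can play the role of $E'$ instead.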

Journal ArticleDOI
TL;DR: A novel hybrid sequential design strategy is proposed which uses a Monte Carlo-based approximation of a Voronoi tessellation for exploration and local linear approximations of the simulator for exploitation, and can be used in heterogeneous modeling environments, where multiple model types are used at the same time.
Abstract: Many complex real-world systems can be accurately modeled by simulations. However, high-fidelity simulations may take hours or even days to compute. Because this can be impractical, a surrogate model is often used to approximate the dynamic behavior of the original simulator. This model can then be used as a cheap, drop-in replacement for the simulator. Because simulations can be very expensive, the data points, which are required to build the model, must be chosen as optimally as possible. Sequential design strategies offer a huge advantage over one-shot experimental designs because they can use information gathered from previous data points in order to determine the location of new data points. Each sequential design strategy must perform a trade-off between exploration and exploitation, where the former involves selecting data points in unexplored regions of the design space, while the latter suggests adding data points in regions which were previously identified to be interesting (for example, highly nonlinear regions). In this paper, a novel hybrid sequential design strategy is proposed which uses a Monte Carlo-based approximation of a Voronoi tessellation for exploration and local linear approximations of the simulator for exploitation. The advantage of this method over other sequential design methods is that it is independent of the model type, and can therefore be used in heterogeneous modeling environments, where multiple model types are used at the same time. The new method is demonstrated on a number of test problems, showing that it is a robust, competitive, and efficient sequential design strategy.
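
A compact NumPy sketch of the exploration half of the strategy, estimating each design point's Voronoi cell volume by Monte Carlo (the exploitation half, local linear approximation of the simulator, is omitted):

```python
import numpy as np

def voronoi_volumes(X, n_mc=10000, seed=0):
    # X: (n_points, dim) existing design points in the unit hypercube.
    # Larger estimated cell volume = less explored neighborhood.
    rng = np.random.default_rng(seed)
    S = rng.uniform(size=(n_mc, X.shape[1]))
    # assign each Monte Carlo sample to its nearest design point
    owner = np.argmin(((S[:, None, :] - X[None, :, :]) ** 2).sum(-1), axis=1)
    return np.bincount(owner, minlength=X.shape[0]) / n_mc
```

A new exploration point would then be placed inside the cell with the largest estimated volume.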

Journal ArticleDOI
TL;DR: Numerical illustrations for the $M$-dimensional parametric elliptic PDEs resulting from sPDEs on parameter spaces of dimensions $M\leq100$ indicate the advantages of employing low-rank tensor-structured matrix formats in the numerical solution of such problems.
Abstract: We investigate the convergence rate of approximations by finite sums of rank-1 tensors of solutions of multiparametric elliptic PDEs. Such PDEs arise, for example, in the parametric, deterministic reformulation of elliptic PDEs with random field inputs, based, for example, on the $M$-term truncated Karhunen-Loeve expansion. Our approach could be regarded as either a class of compressed approximations of these solutions or as a new class of iterative elliptic problem solvers for high-dimensional, parametric, elliptic PDEs providing linear scaling complexity in the dimension $M$ of the parameter space. It is based on rank-reduced, tensor-formatted separable approximations of the high-dimensional tensors and matrices involved in the iterative process, combined with the use of spectrally equivalent low-rank tensor-structured preconditioners to the parametric matrices resulting from a finite element discretization of the high-dimensional parametric, deterministic problems. Numerical illustrations for the $M$-dimensional parametric elliptic PDEs resulting from sPDEs on parameter spaces of dimensions $M\leq100$ indicate the advantages of employing low-rank tensor-structured matrix formats in the numerical solution of such problems.

Journal ArticleDOI
TL;DR: The Gauss DGSEM is typically more accurate than the Gauss-Lobatto variant, needing fewer points per wavelength for a given accuracy while on the other hand being more restricted in the explicit time step choice.
Abstract: We examine the dispersion and dissipation properties of the Gauss and Gauss-Lobatto discontinuous Galerkin spectral element methods (DGSEMs), for linear wave propagation problems. We show that the inherent underintegration in the Gauss-Lobatto variant can be interpreted as a modal filtering of the highest polynomial mode. This in turn has a drastic impact on the dispersion and dissipation relations of the Gauss-Lobatto DGSEM compared to the Gauss variant. We show that the Gauss DGSEM is typically more accurate than the Gauss-Lobatto variant, needing fewer points per wavelength for a given accuracy while on the other hand being more restricted in the explicit time step choice. We show that the spectra of the DGSEM operators depend on the boundary conditions applied and that the ratio of the time step restrictions of the two schemes depends on the choice of boundary conditions.

Journal ArticleDOI
TL;DR: In this paper, a general procedure for constructing conservative numerical integrators for time-dependent partial differential equations is presented, which allows for an arbitrary number of dependent and independent variables with derivatives of any order.
Abstract: A general procedure for constructing conservative numerical integrators for time-dependent partial differential equations is presented. In particular, linearly implicit methods preserving a time discretized version of the invariant are developed for systems of partial differential equations with polynomial nonlinearities. The framework is rather general and allows for an arbitrary number of dependent and independent variables with derivatives of any order. It is proved formally that second order convergence is obtained. The procedure is applied to a test case, and numerical experiments are provided.

Journal ArticleDOI
TL;DR: This work introduces a class of parallel preconditioners for the FSI problem obtained by exploiting the block-structure of the linear system and shows that the construction and evaluation of the devised preconditioner is modular.
Abstract: The increasing computational load required by most applications and the limits of hardware performance affecting scientific computing contributed in the last decades to the development of parallel software and architectures. In fluid-structure interaction (FSI) for haemodynamic applications, parallelization and scalability are key issues (see [L. Formaggia, A. Quarteroni, and A. Veneziani, eds., Cardiovascular Mathematics: Modeling and Simulation of the Circulatory System, Modeling, Simulation and Applications 1, Springer, Milan, 2009]). In this work we introduce a class of parallel preconditioners for the FSI problem obtained by exploiting the block-structure of the linear system. We stress the possibility of extending the approach to a general linear system with a block-structure, then we provide a bound on the condition number of the preconditioned system in terms of the conditioning of the preconditioned diagonal blocks, and finally we show that the construction and evaluation of the devised preconditioner is modular. The preconditioners are tested on a benchmark three-dimensional (3D) geometry discretized in both a coarse and a fine mesh, as well as on two physiological aorta geometries. The simulations that we have performed show an advantage in using the block preconditioners introduced and confirm our theoretical results.

Journal ArticleDOI
TL;DR: It is shown, in particular, that the popular hybrid GS algorithm has multigrid smoothing properties which are independent of the number of processors in many practical applications, provided that the problem size per processor is large enough.
Abstract: This paper investigates the properties of smoothers in the context of algebraic multigrid (AMG) running on parallel computers with potentially millions of processors. The development of multigrid smoothers in this case is challenging, because some of the best relaxation schemes, such as the Gauss-Seidel (GS) algorithm, are inherently sequential. Based on the sharp two-grid multigrid theory from [R. D. Falgout and P. S. Vassilevski, SIAM J. Numer. Anal., 42 (2004), pp. 1669-1693] and [R. D. Falgout, P. S. Vassilevski, and L. T. Zikatanov, Numer. Linear Algebra Appl., 12 (2005), pp. 471-494] we characterize the smoothing properties of a number of practical candidates for parallel smoothers, including several $C$-$F$, polynomial, and hybrid schemes. We show, in particular, that the popular hybrid GS algorithm has multigrid smoothing properties which are independent of the number of processors in many practical applications, provided that the problem size per processor is large enough. This is encouraging news for the scalability of AMG on ultraparallel computers. We also introduce the more robust $\ell_1$ smoothers, which are always convergent and have already proven essential for the parallel solution of some electromagnetic problems [T. Kolev and P. Vassilevski, J. Comput. Math., 27 (2009), pp. 604-623].
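
A minimal sketch of the $\ell_1$ idea, here as an $\ell_1$-scaled Jacobi sweep, which is fully parallel and convergent for symmetric positive definite $A$ (the paper analyzes this alongside $C$-$F$, polynomial, and hybrid variants):

```python
import numpy as np
import scipy.sparse as sp

def l1_jacobi(A, b, x, sweeps=3):
    # Damp row i by d_i = a_ii + sum_{j != i} |a_ij| instead of a_ii alone;
    # the extra l1 term is what guarantees convergence.
    A = sp.csr_matrix(A)
    row_abs_sum = np.abs(A) @ np.ones(A.shape[0])
    d = A.diagonal() + (row_abs_sum - np.abs(A.diagonal()))
    for _ in range(sweeps):
        x = x + (b - A @ x) / d
    return x
```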

Journal ArticleDOI
TL;DR: This work develops an algebraic multigrid (AMG) setup scheme based on the bootstrap framework for multiscale scientific computation that uses a weighted least squares definition of interpolation, based on a set of test vectors computed by a bootstrap setup cycle and then improved by a multigrid eigensolver and a local residual-based adaptive relaxation process.
Abstract: We develop an algebraic multigrid (AMG) setup scheme based on the bootstrap framework for multiscale scientific computation. Our approach uses a weighted least squares definition of interpolation, based on a set of test vectors that are computed by a bootstrap setup cycle and then improved by a multigrid eigensolver and a local residual-based adaptive relaxation process. To emphasize the robustness, efficiency, and flexibility of the individual components of the proposed approach, we include extensive numerical results of the method applied to scalar elliptic partial differential equations discretized on structured meshes. As a first test problem, we consider the Laplace equation discretized on a uniform quadrilateral mesh, a problem for which multigrid is well understood. Then, we consider various more challenging variable coefficient systems coming from covariant finite-difference approximations of the two-dimensional gauge Laplacian system, a commonly used model problem in AMG algorithm development for linear systems arising in lattice field theory computations.

Journal ArticleDOI
TL;DR: This paper presents an algorithm to generate, store, and traverse a hierarchy of Cartesian grids represented by a $(k=3)$-spacetree, a generalization of the well-known octree concept, and it also shows the correctness of the approach.
Abstract: Almost all approaches to solving partial differential equations (PDEs) are based upon a spatial discretization of the computational domain—a grid. This paper presents an algorithm to generate, store, and traverse a hierarchy of $d$-dimensional Cartesian grids represented by a $(k=3)$-spacetree, a generalization of the well-known octree concept, and it also shows the correctness of the approach. These grids may change their adaptive structure throughout the traversal. The algorithm uses $2d+4$ stacks as data structures for both cells and vertices, and the storage requirements for the pure grid reduce to one bit per vertex for both the complete grid connectivity structure and the multilevel grid relations. Since the traversal algorithm uses only stacks, the algorithm's cache hit rate is continually higher than 99.9 percent, and the runtime per vertex remains almost constant; i.e., it does not depend on the overall number of vertices or the adaptivity pattern. We use the algorithmic approach as the fundamental concept for a mesh management for $d$-dimensional PDEs and for a matrix-free PDE solver represented by a compact discrete $3^d$-point operator. In the latter case, one can implement a Jacobi smoother, a Krylov solver, or a geometric multigrid scheme within the presented traversal scheme which inherits the low memory requirements and the good memory access characteristics directly.

Journal ArticleDOI
TL;DR: A computational framework is presented for the simulation of eukaryotic cell migration and chemotaxis and the simulated behavior is compared with experimental results of migrating Dictyostelium discoideum cells.
Abstract: A computational framework is presented for the simulation of eukaryotic cell migration and chemotaxis. An empirical pattern formation model, based on a system of nonlinear reaction-diffusion equations, is approximated on an evolving cell boundary using an arbitrary Lagrangian Eulerian surface finite element method (ALE-SFEM). The solution state is used to drive a mechanical model of the protrusive and retractive forces exerted on the cell boundary. Movement of the cell is achieved using a level set method. Results are presented for cell migration with and without chemotaxis. The simulated behavior is compared with experimental results of migrating Dictyostelium discoideum cells.

Journal ArticleDOI
TL;DR: This work derives preconditioned MINRES-QLP, new stopping rules, and better estimates of the solution and residual norms, the matrix norm, and the condition number.
Abstract: CG, SYMMLQ, and MINRES are Krylov subspace methods for solving symmetric systems of linear equations. When these methods are applied to an incompatible system (that is, a singular symmetric least-squares problem), CG could break down and SYMMLQ's solution could explode, while MINRES would give a least-squares solution but not necessarily the minimum-length (pseudoinverse) solution. This understanding motivates us to design a MINRES-like algorithm to compute minimum-length solutions to singular symmetric systems. MINRES uses QR factors of the tridiagonal matrix from the Lanczos process (where $R$ is upper-tridiagonal). MINRES-QLP uses a QLP decomposition (where rotations on the right reduce $R$ to lower-tridiagonal form). On ill-conditioned systems (singular or not), MINRES-QLP can give more accurate solutions than MINRES. We derive preconditioned MINRES-QLP, new stopping rules, and better estimates of the solution and residual norms, the matrix norm, and the condition number.

Journal ArticleDOI
TL;DR: The development and validation of a modified simulation procedure which allows more accurate calculations with a smaller mean number of particles in the grid cells, making the modified DSMC method an effective numerical tool for both steady and unsteady gas flow calculations on fine multidimensional grids.
Abstract: The direct simulation Monte Carlo (DSMC) analysis of two- and three-dimensional rarefied gas flows requires computational resources of very large proportions. One of the major causes for this is that, along with the multidimensional computational mesh, the standard DSMC approach also requires a large number of particles in each cell of the mesh in order to obtain sufficiently accurate results. This paper presents the development and validation of a modified simulation procedure which allows more accurate calculations with a smaller mean number of particles ($\langle N\rangle\sim1$) in the grid cells. In the new algorithm, the standard DSMC collision scheme is replaced by a two-step collision procedure based on “Bernoulli trials” scheme (or its simplified version proposed by the author), which is applied twice to the cells (or subcells) of a dual grid within a time step. The modified algorithm uses a symmetric Strang splitting scheme that improves the accuracy of the splitting scheme to $O(\tau^2)$ with respect to the time step $\tau$, making the modified DSMC method an effective numerical tool for both steady and unsteady gas flow calculations on fine multidimensional grids. The latter is particularly important for simulation of vortical and unstable rarefied gas flows. The modified simulation scheme might also be useful for DSMC calculations within the subcell areas of a multilevel computational grid.

Journal ArticleDOI
TL;DR: The tensor-truncated version of the SCF iteration uses the traditional direct inversion in the iterative subspace scheme, enhanced by multilevel acceleration with grid-dependent termination criteria at each discretization level, so that the overall computational cost scales almost linearly in the univariate problem size $n$.
Abstract: In this paper, we describe a novel method for a robust and accurate iterative solution of the self-consistent Hartree-Fock equation in $\mathbb{R}^3$ based on the idea of tensor-structured computation of the electron density and the nonlinear Hartree and (nonlocal) exchange operators at all steps of the iterative process. We apply the self-consistent field (SCF) iteration to the Galerkin discretization in a set of low separation rank basis functions that are solely specified by the respective values on a three-dimensional Cartesian grid. The approximation error is estimated by $O(h^3)$, where $h=O(n^{-1})$ is the mesh size of an $n\times n\times n$ tensor grid, while the numerical complexity to compute the Galerkin matrices scales linearly in $n\log n$. We propose the tensor-truncated version of the SCF iteration using the traditional direct inversion in the iterative subspace scheme enhanced by the multilevel acceleration with the grid-dependent termination criteria at each discretization level. This implies that the overall computational cost scales almost linearly in the univariate problem size $n$. Numerical illustrations are presented for the all electron case of H$_2$O and the pseudopotential case of CH$_4$ and CH$_3$OH molecules. The proposed scheme is not restricted to a priori given rank-1 basis sets, allowing analytically integrable convolution transform with the Newton kernel that opens further perspectives for promotion of the tensor-structured methods in computational quantum chemistry.

Journal ArticleDOI
TL;DR: The elements are based on Bernstein polynomials and are the first to achieve optimal complexity for the standard finite element spaces on simplicial elements.
Abstract: Algorithms are presented that enable the element matrices for the standard finite element space, consisting of continuous piecewise polynomials of degree $n$ on simplicial elements in $\mathbb{R}^d$, to be computed in optimal complexity $\mathcal{O}(n^{2d})$. The algorithms (i) take into account numerical quadrature; (ii) are applicable to nonlinear problems; and (iii) do not rely on precomputed arrays containing values of one-dimensional basis functions at quadrature points (although these can be used if desired). The elements are based on Bernstein polynomials and are the first to achieve optimal complexity for the standard finite element spaces on simplicial elements.
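
The kind of recursive structure that makes this possible is visible already in one dimension, where all $n+1$ Bernstein basis values at a point cost $O(n)$ via a neighbor ratio (an illustrative fragment, not the paper's element matrix algorithms):

```python
import numpy as np

def bernstein_all(n, x):
    # B_{k,n}(x) = C(n,k) x^k (1-x)^(n-k) for k = 0..n, at x in (0, 1),
    # using the ratio B_{k+1} = B_k * ((n-k)/(k+1)) * x/(1-x).
    b = np.empty(n + 1)
    b[0] = (1.0 - x) ** n
    t = x / (1.0 - x)
    for k in range(n):
        b[k + 1] = b[k] * t * (n - k) / (k + 1)
    return b
```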

Journal ArticleDOI
TL;DR: This work constructs a coarse-grid space using the low-frequency modes of the subdomain Dirichlet-to-Neumann maps and applies the obtained two-level preconditioners to the extended or the original linear system arising from an overlapping domain decomposition.
Abstract: Coarse-grid correction is a key ingredient of scalable domain decomposition methods. In this work we construct coarse-grid space using the low-frequency modes of the subdomain Dirichlet-to-Neumann maps and apply the obtained two-level preconditioners to the extended or the original linear system arising from an overlapping domain decomposition. Our method is suitable for parallel implementation, and its efficiency is demonstrated by numerical examples on problems with large heterogeneities for both manual and automatic partitionings.

Journal ArticleDOI
TL;DR: This paper considers an iterative process that smooths an associated value for nearby vertices, and presents a measure of the local connection strength, called the algebraic distance, which is attractive in that the process is simple, linear, and easily parallelized.
Abstract: Measuring the connection strength between a pair of vertices in a graph is one of the most important concerns in many graph applications. Simple measures such as edge weights may not be sufficient for capturing the effects associated with short paths of lengths greater than one. In this paper, we consider an iterative process that smooths an associated value for nearby vertices, and we present a measure of the local connection strength (called the algebraic distance; see [D. Ron, I. Safro, and A. Brandt, Multiscale Model. Simul., 9 (2011), pp. 407-423]) based on this process. The proposed measure is attractive in that the process is simple, linear, and easily parallelized. An analysis of the convergence property of the process reveals that the local neighborhoods play an important role in determining the connectivity between vertices. We demonstrate the practical effectiveness of the proposed measure through several combinatorial optimization problems on graphs and hypergraphs.
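
A short NumPy sketch following that construction (weighted Jacobi smoothing of random test vectors; parameter values are illustrative):

```python
import numpy as np
import scipy.sparse as sp

def algebraic_distance(A, sweeps=20, n_vecs=10, omega=0.5, seed=0):
    # A: symmetric weighted adjacency matrix with no isolated vertices.
    # Relax random vectors by damped Jacobi, then score each edge by the
    # largest endpoint difference over the relaxed test vectors.
    rng = np.random.default_rng(seed)
    A = sp.csr_matrix(A)
    n = A.shape[0]
    d_inv = 1.0 / np.asarray(A.sum(axis=1)).ravel()   # inverse weighted degree
    X = rng.uniform(-0.5, 0.5, size=(n, n_vecs))
    for _ in range(sweeps):
        X = (1 - omega) * X + omega * (d_inv[:, None] * (A @ X))
    i, j = A.nonzero()
    return i, j, np.max(np.abs(X[i] - X[j]), axis=1)  # small = strongly connected
```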

Journal ArticleDOI
TL;DR: It is shown that superconvergent functional estimates remain viable in the presence of curvilinear multiblock grids with interfaces and that functional estimates can be constructed that are $2s$-order accurate.
Abstract: Diagonal-norm summation-by-parts (SBP) operators can be used to construct time-stable high-order accurate finite-difference schemes. However, to achieve both stability and accuracy, these operators must use $s$-order accurate boundary closures when the interior scheme is $2s$-order accurate. The boundary closure limits the solution to $(s+1)$-order global accuracy. Despite this bound on solution accuracy, we show that functional estimates can be constructed that are $2s$-order accurate. This superconvergence requires dual-consistency, which depends on the SBP operators, the boundary condition implementation, and the discretized functional. The theory is developed for scalar hyperbolic and elliptic partial differential equations in one dimension. In higher dimensions, we show that superconvergent functional estimates remain viable in the presence of curvilinear multiblock grids with interfaces. The generality of the theoretical results is demonstrated using a two-dimensional Poisson problem and a nonlinear hyperbolic system—the Euler equations of fluid mechanics.

Journal ArticleDOI
TL;DR: This paper develops some truly implementable inexact ADMs whose inexactness criteria, controlling the accuracy of the ADM subproblems, are easily implementable, and the convergence of the new inexact ADMs is proved.
Abstract: In the image processing community, there have recently been many restoration and reconstruction problems that can be reformulated into linearly constrained convex programming models whose objective functions have separable structures. These favorable reformulations have promoted impressive applications of the alternating direction method (ADM) in the field of image processing. At each iteration, the computation of ADM is dominated by solving two subproblems exactly. However, in many restoration and reconstruction applications, it is either impossible or extremely expensive to obtain exact solutions of these ADM subproblems. This fact urges the development of inexact versions of ADM, which allow the generated ADM subproblems to be solved approximately subject to certain inexactness criteria. In this paper, we develop some truly implementable inexact ADMs whose inexactness criteria controlling the accuracy of the ADM subproblems are easily implementable. The convergence of the new inexact ADMs will be proved. Numerical results on several image processing problems will be given to illustrate the effectiveness of the proposed inexact ADMs.
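
A skeletal example of what such an inexactness criterion can look like in practice: the quadratic subproblem below is solved only approximately by CG, with a tolerance that tightens as the outer iteration proceeds (a generic sketch for an $\ell_1$-regularized model, not one of the paper's specific criteria; the tolerance keyword is rtol in SciPy >= 1.12, tol in older releases):

```python
import numpy as np
from scipy.sparse.linalg import cg

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inexact_adm(A, b, lam=0.1, rho=1.0, iters=100):
    # minimize 0.5*||Ax-b||^2 + lam*||x||_1; the x-subproblem
    # min_x 0.5*||Ax-b||^2 + (rho/2)*||x - (z-u)||^2 is solved inexactly.
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = A.T @ A + rho * np.eye(n)
    for k in range(1, iters + 1):
        rhs = A.T @ b + rho * (z - u)
        x, _ = cg(M, rhs, x0=x, rtol=min(1e-2, 1.0 / k**2))  # summable tolerance
        z = soft(x + u, lam / rho)
        u = u + x - z
    return z
```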