Journal ArticleDOI

Solving Lattice QCD systems of equations using mixed precision solvers on GPUs

01 Sep 2010-Computer Physics Communications (North-Holland)-Vol. 181, Iss: 9, pp 1517-1528
TL;DR: A new mixed precision approach for Krylov solvers using reliable updates is developed which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation.
About: This article is published in Computer Physics Communications. The article was published on 2010-09-01 and is currently open access. It has received 422 citations to date. The article focuses on the topics: Extended precision & General-purpose computing on graphics processing units.
Citations
Journal ArticleDOI
TL;DR: The rapid evolution of GPU architectures-from graphics processors to massively parallel many-core multiprocessors, recent developments in GPU computing architectures, and how the enthusiastic adoption of CPU+GPU coprocessing is accelerating parallel applications are described.
Abstract: GPU computing is at a tipping point, becoming more widely used in demanding consumer applications and high-performance computing. This article describes the rapid evolution of GPU architectures-from graphics processors to massively parallel many-core multiprocessors, recent developments in GPU computing architectures, and how the enthusiastic adoption of CPU+GPU coprocessing is accelerating parallel applications.

962 citations


Cites methods from "Solving Lattice QCD systems of equa..."

  • ...Table 3 lists some representative applications along with the runtime speedups obtained for the whole application using CPU+GPU coprocessing over CPU alone, as measured by application developers.(12-22) The speedups using GeForce 8800, Tesla T8, GeForce GTX 280, Tesla T10, and GeForce GTX 285 range from 9 to more than 130, with the higher speedups reflecting applications where more of the work ran in parallel on the GPU....

    [...]

Journal ArticleDOI
TL;DR: In this article, a spectrum of highly excited charmonium mesons up to around 4.5 GeV calculated using dynamical lattice QCD is presented; the authors discuss the results in light of experimental observations, identify the lightest "supermultiplet" of hybrid mesons, and comment on the phenomenological implications of the spectrum of exotic mesons.
Abstract: We present a spectrum of highly excited charmonium mesons up to around 4.5 GeV calculated using dynamical lattice QCD. Employing novel computational techniques and the variational method with a large basis of carefully constructed operators, we extract and reliably identify the continuum spin of an extensive set of excited states, states with exotic quantum numbers ($0^{+-}$, $1^{-+}$, $2^{+-}$) and states with high spin. Calculations are performed on two lattice volumes with pion mass ≈ 400 MeV and the mass determinations have high statistical precision even for excited states. We discuss the results in light of experimental observations, identify the lightest ‘supermultiplet’ of hybrid mesons and comment on the phenomenological implications of the spectrum of exotic mesons.

221 citations


Cites methods from "Solving Lattice QCD systems of equa..."

  • ...the single-particle spectrum of charmonium up to 4.5 GeV is a timely contribution to this effort. Acknowledgments We thank our colleagues within the Hadron Spectrum Collaboration. Chroma [46] and QUDA [47,48] were used to perform this work on the Lonsdale cluster maintained by the Trinity Centre for High Performance Computing funded through grants from Science Foundation Ireland (SFI), at the SFI/HEA Iris...

    [...]

Journal ArticleDOI
TL;DR: In this paper, an excited spectrum of isoscalar mesons using lattice QCD is reported, with a range of light quark masses corresponding to pion masses down to $\sim 400\,\mathrm{MeV}$.
Abstract: We report on the extraction of an excited spectrum of isoscalar mesons using lattice QCD. Calculations on several lattice volumes are performed with a range of light quark masses corresponding to pion masses down to $\sim 400\,\mathrm{MeV}$. The distillation method enables us to evaluate the required disconnected contributions with high statistical precision for a large number of meson interpolating fields. We find relatively little mixing between $\frac{1}{\sqrt{2}}(u\overline{u}+d\overline{d})$ and $s\overline{s}$ in most $J^{PC}$ channels; one notable exception is the pseudoscalar sector where the approximate $SU(3)_F$ octet, singlet structure of the $\eta$, $\eta'$ is reproduced. We extract exotic $J^{PC}$ states, identified as hybrid mesons in which an excited gluonic field is coupled to a color-octet $q\overline{q}$ pair, along with nonexotic hybrid mesons embedded in a $q\overline{q}$-like spectrum.

169 citations

Journal ArticleDOI
TL;DR: Using a new quark-field construction algorithm and a large variational basis of operators, a highly excited isovector meson spectrum is extracted on dynamical anisotropic lattices, including, for the first time in lattice QCD, spin-four states.
Abstract: Using a new quark-field construction algorithm and a large variational basis of operators, we extract a highly excited isovector meson spectrum on dynamical anisotropic lattices. We show how carefully constructed operators can be used to reliably identify the continuum spin of extracted states, overcoming the reduced cubic symmetry of the lattice. Using this method we extract, with confidence, excited states, states with exotic quantum numbers ($0^{+-}$, $1^{-+}$, and $2^{+-}$), and states of high spin, including, for the first time in lattice QCD, spin-four states.

146 citations


Cites methods from "Solving Lattice QCD systems of equa..."

  • ...Part of this work used the CUDA GPU implementation of a mixed-precision iterative linear system solver for the Dirac equation by Michael Clark and Ronald Babich [19]....

    [...]

Journal ArticleDOI
TL;DR: In this article, the phase shifts for the spin-triplet and spin-singlet channels were computed in lattice quantum chromodynamics using the Lüscher finite-volume formalism.

109 citations

References
Nathan Bell1, Michael Garland1
01 Jan 2008
TL;DR: Data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU are discussed, and methods are developed to exploit several common forms of matrix structure while offering alternatives which accommodate greater irregularity.
Abstract: The massive parallelism of graphics processing units (GPUs) offers tremendous performance in many high-performance computing applications. While dense linear algebra readily maps to such platforms, harnessing this potential for sparse matrix computations presents additional challenges. Given its role in iterative methods for solving sparse linear systems and eigenvalue problems, sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In this paper we discuss data structures and algorithms for SpMV that are efficiently implemented on the CUDA platform for the fine-grained parallel architecture of the GPU. Given the memory-bound nature of SpMV, we emphasize memory bandwidth efficiency and compact storage formats. We consider a broad spectrum of sparse matrices, from those that are well-structured and regular to highly irregular matrices with large imbalances in the distribution of nonzeros per matrix row. We develop methods to exploit several common forms of matrix structure while offering alternatives which accommodate greater irregularity. On structured, grid-based matrices we achieve performance of 36 GFLOP/s in single precision and 16 GFLOP/s in double precision on a GeForce GTX 280 GPU. For unstructured finite-element matrices, we observe performance in excess of 15 GFLOP/s and 10 GFLOP/s in single and double precision respectively. These results compare favorably to prior state-of-the-art studies of SpMV methods on conventional multicore processors. Our double precision SpMV performance is generally two and a half times that of a Cell BE with 8 SPEs and more than ten times greater than that of a quad-core Intel Clovertown system.

795 citations


"Solving Lattice QCD systems of equa..." refers background or methods in this paper

  • ...On the GTX 280, the library in [8] could achieve up to 30 Gflops of sustained single precision performance for matrices with similar structure as the one discussed here....

    [...]

  • ...) for which sparse matrix–vector GPU libraries are available [8]....

    [...]
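The SpMV abstract above stresses compact storage formats and memory-bound behavior. As a rough illustration only, here is a minimal pure-Python sketch of sparse matrix-vector multiplication in compressed sparse row (CSR) form. CSR is chosen here as one common compact format; the snippets on this page do not say which format the cited library favors, and the helper names `csr_from_dense` and `csr_matvec` are hypothetical.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, col_idx, row_ptr) from a dense list of rows."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in the values array
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[4, 0, 1],
         [0, 3, 0],
         [2, 0, 5]]
    vals, cols, ptr = csr_from_dense(A)
    print(csr_matvec(vals, cols, ptr, [1.0, 2.0, 3.0]))
```

Each stored nonzero costs one multiply-add but requires loading a value, a column index, and an entry of `x`, which is one way to see why SpMV tends to be limited by memory bandwidth rather than arithmetic.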

Journal ArticleDOI
01 Mar 2005
TL;DR: Chroma is an open source C++ based software system developed using the software infrastructure of the US SciDAC initiative that interfaces with output from the BAGEL assembly generator for optimised lattice fermion kernels on some architectures.
Abstract: We describe aspects of the Chroma software for lattice QCD calculations. Chroma is an open source C++ based software system developed using the software infrastructure of the US SciDAC initiative. Chroma interfaces with output from the BAGEL assembly generator for optimised lattice fermion kernels on some architectures. It can be run on workstations, clusters and the QCDOC supercomputer.

597 citations


"Solving Lattice QCD systems of equa..." refers methods in this paper

  • ...[19] R....

    [...]

  • ...The linear solver developed in this work has become the mainstay of our open source QUDA library [18], which we have interfaced to the common lattice QCD packages (Chroma [19, 20], CPS [21], QDP/C [22]) for easy integration with current QCD calculations....

    [...]

Journal ArticleDOI
TL;DR: The AND and NOT operations are transformed to multiplication and subtraction operations as described in (1) and (3).
Abstract: the symbol ∩ denotes an AND operation and the symbol • denotes a multiplication operation. (2) The result of an OR operation with any number of Boolean variables is the same as the (arithmetic) addition of the x, y, z integer variables after the following test is made: (a) If the sum is equal to zero, the result is correct; (b) If the sum is larger than zero, the answer is a 1; i.e. where the symbol ∪ denotes an OR operation and the symbol + denotes an addition operation. (3) The result of a NOT operation with a Boolean variable is the same as subtracting an integer variable x from 1; i.e. Ā = (1 − x) (3) because if A = x = 1, then Ā = 1 − 1 = 0; and if A = x = 0, then Ā = 1 − 0 = 1. The FORTRAN program in Figure 1 illustrates the method presented. It simulates the logic of a full-adder as described by the following two Boolean functions: (5) where K₁, K₂ and K₃ are the two input bits and previous carry to be added, L is the output carry, and M is the output sum. Integer variables were chosen for compatibility with the FORTRAN language. The AND and NOT operations are transformed to multiplication and subtraction operations as described in (1) and (3). The OR operation needs a control IF statement after the arithmetic addition is performed in order to restore the value of the variable to unity. This may be simplified by using a Function subprogram to calculate the result of the OR operation, thus eliminating the need for repetition of the IF statements. It was not done in this example because of the limitation of the FORTRAN compiler in the 1620 Model 1 computer where this program was checked out, and where the use of subprograms is not permitted. 1] proposed the use of accumulators to evaluate a sum of the form S = when N is large and all the y's are of roughly the same magnitude. His intention was to alleviate the accumulation of rounding or truncation errors which otherwise occurs when S is …

457 citations


"Solving Lattice QCD systems of equa..." refers methods in this paper

  • ...On first generation CUDA devices this poses a problem since double precision is not implemented, so schemes such as Kahan summation [13] are required to reduce the accumulation of errors....

    [...]
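The excerpt above mentions Kahan summation as a way to limit error accumulation when only single precision is available. The following is a minimal sketch of compensated summation, not code from the paper or from QUDA. Single precision is emulated in pure Python by rounding each result through `struct`; the helper `f32` and the function names are assumptions made for illustration.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum_f32(values):
    """Plain running sum, rounded to single precision after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def kahan_sum_f32(values):
    """Kahan (compensated) sum with every operation rounded to single precision."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(v - comp)                # re-inject previously lost bits
        t = f32(total + y)               # low-order bits of y may be lost here
        comp = f32(f32(t - total) - y)   # recover what was just lost
        total = t
    return total


if __name__ == "__main__":
    data = [f32(0.1)] * 100000
    exact = math.fsum(data)  # correctly rounded double-precision reference
    print("naive error:", abs(naive_sum_f32(data) - exact))
    print("kahan error:", abs(kahan_sum_f32(data) - exact))
```

The compensation term costs three extra additions per element, which is usually acceptable in a reduction that is already limited by memory traffic.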

Journal ArticleDOI
TL;DR: The architecture and programming model of modern graphics cards for the lattice practitioner with the goal of exploiting these chips for Monte Carlo simulations is outlined.

140 citations

Journal ArticleDOI
TL;DR: This paper proposes a more restrictive strategy for accumulating groups of updates for updating the residual and the approximation, and it is shown that this may improve the accuracy significantly, while maintaining speed of convergence.
Abstract: Many iterative methods for solving linear equationsAx=b aim for accurate approximations tox, and they do so by updating residuals iteratively. In finite precision arithmetic, these computed residuals may be inaccurate, that is, they may differ significantly from the (true) residuals that correspond to the computed approximations. In this paper we will propose variants on Neumaier's strategy, originally proposed for CGS, and explain its success. In particular, we will propose a more restrictive strategy for accumulating groups of updates for updating the residual and the approximation, and we will show that this may improve the accuracy significantly, while maintaining speed of convergence. This approach avoids restarts and allows for more reliable stopping criteria. We will discuss updating conditions and strategies that are efficient, lead to accurate residuals, and are easy to implement. For CGS and Bi-CG these strategies are particularly attractive, but they may also be used to improve Bi-CGSTAB, BiCGstab(l), as well as other methods.

108 citations


"Solving Lattice QCD systems of equa..." refers background or methods in this paper

  • ...The cure advocated in [5] is that of reliable updates: here a parameter δ is introduced, and if the magnitude of the iterated residual decreases by δ compared to the magnitude of all previous residuals, the iterated residual is replaced by the true residual....

    [...]

  • ...9 Here we have simplified the approach given in [5] such that we perform a reliable residual update whenever the norm of the residual decreases by a factor δ relative to the maximum of the residual since the last update....

    [...]

  • ...Residual drift and possible cures have been studied previously in different contexts [5], namely where the drift is caused by the erratic convergence of BiCGstab which induces rounding errors....

    [...]

  • ...In this work we introduce a new method for using mixed precision in the context of Krylov solvers, repurposing the reliable updates scheme of [5]....

    [...]
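The excerpts above describe the simplified reliable-update rule: replace the iterated residual by the true residual whenever its norm falls by a factor δ relative to the maximum residual norm since the last update. The sketch below shows one way this could look inside a conjugate gradient loop. It is an illustration under stated assumptions, not the paper's or QUDA's implementation: the test matrix (a shifted 1D Laplacian), the function names, and the default parameters are hypothetical, and low precision is only emulated by rounding stored vector entries to single precision through `struct`.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def matvec(x, shift, low):
    """y = A x for the SPD matrix A = tridiag(-1, 2 + shift, -1).
    With low=True each entry of y is rounded to single precision."""
    n = len(x)
    y = []
    for i in range(n):
        v = (2.0 + shift) * x[i]
        if i > 0:
            v -= x[i - 1]
        if i < n - 1:
            v -= x[i + 1]
        y.append(f32(v) if low else v)
    return y


def true_residual(b, x, shift):
    """b - A x evaluated entirely in double precision."""
    return [bi - axi for bi, axi in zip(b, matvec(x, shift, low=False))]


def cg_reliable_updates(b, shift=1.0, delta=0.1, tol=1e-10, max_iter=500):
    """CG with single-precision vectors and double-precision reliable updates.
    Returns (x, relative true residual, iterations, number of updates)."""
    n = len(b)
    bnorm = math.sqrt(dot(b, b))
    x_hi = [0.0] * n              # accumulated solution, double precision
    y = [0.0] * n                 # low-precision partial solution since last update
    r = [f32(v) for v in b]       # low-precision iterated residual
    p = list(r)
    rr = dot(r, r)
    max_r = math.sqrt(rr)         # largest residual norm since last update
    updates = 0
    for it in range(1, max_iter + 1):
        ap = matvec(p, shift, low=True)
        alpha = rr / dot(p, ap)
        y = [f32(yi + alpha * pi) for yi, pi in zip(y, p)]
        r = [f32(ri - alpha * api) for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        rnorm = math.sqrt(rr_new)
        max_r = max(max_r, rnorm)
        if rnorm < delta * max_r or rnorm < tol * bnorm:
            # Reliable update: fold y into x_hi, recompute the residual in double.
            x_hi = [xi + yi for xi, yi in zip(x_hi, y)]
            r_true = true_residual(b, x_hi, shift)
            true_norm = math.sqrt(dot(r_true, r_true))
            updates += 1
            if true_norm < tol * bnorm:
                return x_hi, true_norm / bnorm, it, updates
            r = [f32(v) for v in r_true]
            y = [0.0] * n
            rr_new = dot(r, r)
            max_r = math.sqrt(rr_new)
        beta = rr_new / rr
        p = [f32(ri + beta * pi) for ri, pi in zip(r, p)]
        rr = rr_new
    # Not converged within max_iter: report the best available state.
    x_hi = [xi + yi for xi, yi in zip(x_hi, y)]
    r_true = true_residual(b, x_hi, shift)
    return x_hi, math.sqrt(dot(r_true, r_true)) / bnorm, max_iter, updates


if __name__ == "__main__":
    rhs = [1.0 + 0.5 * math.sin(i) for i in range(64)]
    _, rel, iters, n_updates = cg_reliable_updates(rhs)
    print("relative residual:", rel, "iterations:", iters, "updates:", n_updates)
```

In this sketch the search direction is kept across updates and only the residual and the partial solution are reset, so the bulk of the work stays in the low-precision loop while the convergence test relies on the double-precision residual.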