Showing papers by "Garth N. Wells" published in 2022

Open Access

Journal Article

Basix: a runtime finite element basis evaluation library

Matthew W. Scroggs, Igor A. Baratta, Christopher Richardson, Garth N. Wells

25 May 2022 - The Journal of Open Source Software

32 citations

Proceedings Article

Performance analysis of matrix-free conjugate gradient kernels using SYCL

Igor A. Baratta, Christopher Richardson, Garth N. Wells

10 May 2022

TL;DR: SYCL built-in reductions on the GPU, and all operations for one of the SYCL implementations on the CPU, exhibit high latency. This latency limits performance at problem sizes that can in some cases be representative of full application simulations, and it can degrade strong scaling performance.

Abstract: We examine the performance of matrix-free SYCL implementations of the conjugate gradient method for solving sparse linear systems of equations. Performance is tested on an NVIDIA A100-80GB device and a dual socket Intel Ice Lake CPU node using different SYCL implementations, and compared to CUDA BLAS (cuBLAS) implementations on the A100 GPU and MKL implementations on the CPU node. All considered kernels in the matrix-free implementation are memory bandwidth limited, and a simple performance model is applied to estimate the asymptotic memory bandwidth and the latency. Our experiments show that in most cases the considered SYCL implementations match the asymptotic performance of the reference implementations. However, for smaller but practically relevant problem sizes latency is observed to have a significant impact on performance. For some cases the SYCL latency is reasonably close to the reference (cuBLAS/MKL) implementation latency, but in other cases it is more than one order of magnitude greater. In particular, SYCL built-in reductions on the GPU and all operations for one of the SYCL implementations on the CPU exhibit high latency, and this latency limits performance at problem sizes that can in cases be representative of full application simulations, and can degrade strong scaling performance.
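
The abstract mentions a simple performance model that estimates asymptotic memory bandwidth and latency, but does not state its form. As a minimal sketch, assume the common two-parameter model t(n) = latency + bytes / bandwidth and fit it by linear least squares. The function names and the synthetic timings below are hypothetical and are not taken from the paper.

```python
def fit_latency_bandwidth(sizes_bytes, times_s):
    """Least-squares fit of the assumed model t = latency + bytes / bandwidth.

    Returns (latency in seconds, asymptotic bandwidth in bytes per second).
    """
    n = len(sizes_bytes)
    mean_x = sum(sizes_bytes) / n
    mean_t = sum(times_s) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bytes)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(sizes_bytes, times_s))
    slope = sxt / sxx                  # seconds per byte
    latency = mean_t - slope * mean_x  # intercept: time at zero bytes moved
    return latency, 1.0 / slope


def achieved_bandwidth(nbytes, latency, bandwidth):
    """Effective bandwidth predicted by the model for a transfer of nbytes."""
    return nbytes / (latency + nbytes / bandwidth)


# Synthetic timings, made up for illustration (not measurements from the paper):
# a kernel that moves three double-precision vectors of length 10^3 .. 10^8.
true_latency = 10e-6      # assumed 10 microseconds
true_bandwidth = 1.0e12   # assumed 1 TB/s
sizes = [8 * 3 * 10**k for k in range(3, 9)]
times = [true_latency + s / true_bandwidth for s in sizes]

latency, bandwidth = fit_latency_bandwidth(sizes, times)

# Size at which the model predicts half of the asymptotic bandwidth.
half_size = latency * bandwidth
```

Under this model, a kernel reaches half of its asymptotic bandwidth only once it moves `latency * bandwidth` bytes, so a tenfold increase in latency pushes that break-even size up tenfold. This is one way to read the abstract's observation that latency dominates at smaller problem sizes.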
