scispace - formally typeset
Search or ask a question

Showing papers by "Krzysztof Kurowski published in 2015"


Journal ArticleDOI
TL;DR: A method is proposed, which ensures a comprehensive analysis of the resource consumption of the EULAG algorithm, including data transfers between host and global memory, global and shared memories, as well as GPU occupancy, which shows a promising increase in terms of computational efficiency.
Abstract: The goal of this study is to adapt the multiscale fluid solver EULerian or LAGrangian framewrok EULAG to future graphics processing units GPU platforms. The EULAG model has the proven record of successful applications, and excellent efficiency and scalability on conventional supercomputer architectures. Currently, the model is being implemented as the new dynamical core of the COSMO weather prediction framework. Within this study, two main modules of EULAG, namely the multidimensional positive definite advection transport algorithm MPDATA and the variational generalized conjugate residual, elliptic pressure solver Generalized Conjugate Residual GCR are analyzed and optimized. In this paper, a method is proposed, which ensures a comprehensive analysis of the resource consumption including registers, shared, and global memories. This method allows us to identify bottlenecks of the algorithm, including data transfers between host and global memory, global and shared memories, as well as GPU occupancy. We put the emphasis on providing a fixed memory access pattern, padding as well as organizing computation in the MPDATA algorithm. The testing and validation of the new GPU implementation have been carried out based on modeling decaying turbulence of a homogeneous incompressible fluid in a triply-periodic cube. Simulations performed using the standard version of EULAG and its new GPU implementation give similar solutions. Preliminary results show a promising increase in terms of computational efficiency. Copyright © 2014 John Wiley & Sons, Ltd.

25 citations


Journal ArticleDOI
TL;DR: This paper aims to present a high-level stencil framework implemented for the EULerian or LAGrangian model (EULAG) that efficiently utilises multi- and many-cores architectures.
Abstract: The recent advent of novel multi- and many-core architectures forces application programmers to deal with hardware-specific implementation details and to be familiar with software optimisation techniques to benefit from new high-performance computing machines. Extra care must be taken for communication-intensive algorithms, which may be a bottleneck for forthcoming era of exascale computing. This paper aims to present a high-level stencil framework implemented for the EULerian or LAGrangian model (EULAG) that efficiently utilises multi- and many-cores architectures. Only an efficient usage of both many-core processors (CPUs) and graphics processing units (GPUs) with the flexible data decomposition method can lead to the maximum performance that scales the communication-intensive Generalized Conjugate Residual (GCR) elliptic solver with preconditioner.

4 citations