Open Access Journal Article

PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation

Abstract
High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless, it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.
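As a rough illustration of what run-time code generation can look like from the scripting side, the sketch below builds CUDA C source text in Python once the element type and an unroll factor are known. It is not taken from the article: the kernel, the template, and the names `KERNEL_TEMPLATE` and `generate_scale_kernel` are hypothetical, and only the text-generation step is executed here. The commented lines at the end indicate how such a string might be handed to PyCUDA on a machine with a CUDA device.

```python
from string import Template

# Hypothetical kernel template (an assumption for illustration): the element
# type and the unrolled body are filled in at run time by the host program.
KERNEL_TEMPLATE = Template("""
__global__ void scale(${dtype} *x, ${dtype} a, int n)
{
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * ${unroll};
${body}
}
""")


def generate_scale_kernel(dtype="float", unroll=4):
    """Return CUDA C source for an in-place scaling kernel.

    Each thread handles `unroll` consecutive elements; the loop is unrolled
    in the generated text rather than in hand-written source.
    """
    if unroll < 1:
        raise ValueError("unroll must be at least 1")
    lines = [
        f"    if (base + {i} < n) x[base + {i}] *= a;"
        for i in range(unroll)
    ]
    return KERNEL_TEMPLATE.substitute(
        dtype=dtype, unroll=unroll, body="\n".join(lines)
    )


# Generate a double-precision variant with two elements per thread.
source = generate_scale_kernel(dtype="double", unroll=2)
print(source)

# With PyCUDA installed and a CUDA device present, the generated text could
# then be compiled and loaded at run time, for example:
#   import pycuda.autoinit
#   from pycuda.compiler import SourceModule
#   scale = SourceModule(source).get_function("scale")
```

Because the kernel is ordinary text until it is compiled, the host program can choose data types, unroll factors, or whole code variants based on information that only becomes available while the program runs.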


Citations

MEG and EEG data analysis with MNE-Python

TL;DR: MNE-Python is an open-source software package that addresses this challenge by providing state-of-the-art algorithms implemented in Python that cover multiple methods of data preprocessing, source localization, statistical analysis, and estimation of functional connectivity between distributed brain regions.

The fast azimuthal integration Python library: pyFAI

TL;DR: This article details the geometry, peak-picking, calibration and integration procedures on multi- and many-core devices implemented in the Python library for high-performance azimuthal integration.

Nengo: a Python tool for building large-scale functional brain models.

TL;DR: Nengo 2.0 is described, which is implemented in Python and uses simple and extendable syntax, simulates a benchmark model on the scale of Spaun 50 times faster than Nengo 1.4, and has a flexible mechanism for collecting simulation results.

PyFR: An open source framework for solving advection–diffusion type problems on streaming architectures using the flux reconstruction approach

TL;DR: The Flux Reconstruction (FR) approach unifies various high-order schemes for unstructured grids within a single framework, and is thus able to run efficiently on modern streaming architectures, such as Graphical Processing Units (GPUs).

A mixed-scale dense convolutional neural network for image analysis.

TL;DR: A network architecture is introduced that uses dilated convolutions to capture features at different image scales and densely connects all feature maps with each other. It achieves accurate results with relatively few parameters and consists of a single set of operations.
References

Methods of Conjugate Gradients for Solving Linear Systems

TL;DR: An iterative algorithm is given for solving a system Ax=k of n linear equations in n unknowns and it is shown that this method is a special case of a very general method which also includes Gaussian elimination.

The Design and Implementation of FFTW3

TL;DR: It is shown that such an approach can yield an implementation of the discrete Fourier transform that is competitive with hand-optimized libraries, and the software structure that makes the current FFTW3 version flexible and adaptive is described.

A bridging model for parallel computation

TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results are given quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.

A Survey of General-Purpose Computation on Graphics Hardware

TL;DR: This report describes, summarizes, and analyzes the latest research in mapping general‐purpose computation to graphics hardware.