Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster

About
The article was published on 2013-01-01 and is currently open access. It has received 21 citations to date. Its topics are NumPy and GPU clusters.

Citations
Proceedings Article

An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data

TL;DR: This work presents a GPU SpGEMM algorithm that focuses on load balancing, memory pre-allocation for the result matrix, and parallel insertion of the nonzero entries, using what is experimentally found to be the fastest GPU merge approach.
Posted Content

Productivity, Portability, Performance: Data-Centric Python

TL;DR: In this paper, the authors present a workflow that retains Python's high productivity while achieving portable performance across different architectures, including CPU, GPU, FPGA, and the Piz Daint supercomputer.
Proceedings Article

Legate NumPy: accelerated and distributed array computing

TL;DR: This work introduces Legate, a drop-in replacement for NumPy that requires only a single-line code change, can scale to an arbitrary number of GPU-accelerated nodes, and achieves speed-ups of up to 10X on 1280 CPUs and 100X on 256 GPUs.
Journal Article

Veros v0.1 – a fast and versatile ocean simulator in pure Python

TL;DR: A general circulation ocean model is translated from Fortran to Python, and it is found that, even in a realistic setting, the phase speeds of boundary waves match the expectations based on theory and idealized models.
Proceedings Article

Fusion of Parallel Array Operations

TL;DR: This article formulates the problem of fusing array operations, based on shape compatibility and data reuse, as a static weighted graph partitioning problem known as the Weighted Loop Fusion problem.