Bohrium: Unmodified NumPy Code on CPU, GPU, and Cluster
About:
The article was published on 2013-01-01 and is currently open access. It has received 21 citations to date. The article focuses on the topics NumPy and GPU cluster.
Citations
Proceedings Article
An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data
Weifeng Liu, Brian Vinter
TL;DR: This work presents a GPU SpGEMM algorithm that focuses on load balancing, memory pre-allocation for the result matrix, and parallel insert operations of the nonzero entries; for the inserts it uses a merge algorithm that is experimentally found to be the fastest GPU merge approach.
Posted Content
Productivity, Portability, Performance: Data-Centric Python
Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler
TL;DR: In this paper, the authors present a workflow that retains Python's high productivity while achieving portable performance across different architectures, including CPU, GPU, FPGA, and the Piz Daint supercomputer.
Proceedings Article
Legate NumPy: accelerated and distributed array computing
Michael Bauer, Michael Garland
TL;DR: Legate is introduced, a drop-in replacement for NumPy that requires only a single-line code change and can scale up to an arbitrary number of GPU accelerated nodes and achieve speed-ups of up to 10X on 1280 CPUs and 100X on 256 GPUs.
Journal Article
Veros v0.1 – a fast and versatile ocean simulator in pure Python
Dion Häfner, René Løwe Jacobsen, Carsten Eden, Mads Ruben Burgdorff Kristensen, Markus Jochum, Roman Nuterman, Brian Vinter
TL;DR: A general circulation ocean model is translated from Fortran to Python, and it is found that even in a realistic setting the phase speeds of boundary waves matched the expectations based on theory and idealized models.
Proceedings Article
Fusion of Parallel Array Operations
TL;DR: In this article, the problem of fusing array operations based on criteria such as shape compatibility and data reuse is formulated as a static weighted graph partitioning problem, known as the Weighted Loop Fusion problem.