Lena Oden
Researcher at FernUniversität in Hagen
Publications - 26
Citations - 261
Lena Oden is an academic researcher at the FernUniversität in Hagen. She has contributed to research on topics including performance per watt and InfiniBand. She has an h-index of 9 and has co-authored 21 publications receiving 211 citations. Previous affiliations of Lena Oden include Heidelberg University and Forschungszentrum Jülich.
Papers
Proceedings Article
GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters
Lena Oden, Holger Fröning, +1 more
TL;DR: This work introduces global address spaces that enable direct communication between distributed GPUs without CPU involvement, avoiding context switches and unnecessary copies and thereby dramatically reducing communication overhead.
Proceedings Article
Infiniband-Verbs on GPU: A Case Study of Controlling an Infiniband Network Device from the GPU
TL;DR: The results show that complex networking protocols like InfiniBand Verbs (ibverbs) are better handled by CPUs, despite the time penalties of context switching, because the overhead of work-request generation cannot be parallelized and is ill-suited to the highly parallel programming model of GPUs.
Proceedings Article
Why is MPI so slow?: analyzing the fundamental limits in implementing MPI-3.1
Ken Raffenetti, Abdelhalim Amer, Lena Oden, Charles J. Archer, Wesley Bland, Hajime Fujita, Yanfei Guo, Tomislav Janjusic, Dmitry Durnov, Michael Alan Blocksome, Min Si, Sangmin Seo, Akhil Langer, Gengbin Zheng, Masamichi Takagi, Paul Coffman, Jithin Jose, Sayantan Sur, Alexander Sannikov, Sergey Oblomov, Michael Chuvelev, Masayuki Hatanaka, Xin Zhao, Paul Fischer, Thilina Rathnayake, Matthew Otten, Misun Min, Pavan Balaji, +27 more
TL;DR: This paper provides an in-depth analysis of the software overheads in the MPI performance-critical path and exposes mandatory performance overheads that are unavoidable given the MPI-3.1 specification.
Proceedings Article
Energy-efficient collective reduce and allreduce operations on distributed GPUs
TL;DR: Global GPU Address Spaces (GGAS) enable direct GPU-to-GPU communication in heterogeneous clusters that is fully in line with the GPU's thread-collective execution model and requires neither CPU assistance nor staging copies in host memory.
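GGAS itself relies on specialized hardware support, but the collective it accelerates is easy to illustrate in plain code. The following is a conceptual pure-Python sketch of a ring allreduce (sum), the classic algorithm behind distributed reduce/allreduce operations; all names are illustrative, no GPUs or networking are involved, and buffer lengths are assumed divisible by the rank count purely to keep the chunking simple.

```python
def ring_allreduce(bufs):
    """In-place sum-allreduce over `bufs`, one list of numbers per
    simulated rank. Illustrative sketch only: assumes every buffer's
    length is divisible by the number of ranks."""
    n = len(bufs)           # number of simulated ranks in the ring
    m = len(bufs[0]) // n   # chunk size (one chunk per rank)

    # Phase 1: reduce-scatter. In step s, rank r adds its copy of
    # chunk (r - s) into neighbour (r + 1)'s buffer; after n - 1
    # steps, rank r holds the fully reduced chunk (r + 1) mod n.
    for s in range(n - 1):
        for r in range(n):
            c, dst = (r - s) % n, (r + 1) % n
            for i in range(c * m, (c + 1) * m):
                bufs[dst][i] += bufs[r][i]

    # Phase 2: allgather. The finished chunks circulate once more
    # around the ring; receivers simply overwrite their stale copy.
    for s in range(n - 1):
        for r in range(n):
            c, dst = (r + 1 - s) % n, (r + 1) % n
            bufs[dst][c * m:(c + 1) * m] = bufs[r][c * m:(c + 1) * m]
    return bufs
```

For example, `ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40]])` leaves every rank holding `[11, 22, 33, 44]`. Each rank only ever exchanges data with its ring neighbour, which is what makes the pattern attractive for direct device-to-device transfers.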
Proceedings Article
Lessons learned from comparing C-CUDA and Python-Numba for GPU-Computing
TL;DR: This paper compares the performance of Numba-CUDA and C-CUDA for different kinds of applications and finds that C-CUDA applications still outperform the Numba versions, especially for computation-heavy workloads.
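Both languages in the comparison express the same one-thread-per-element kernel structure. The sketch below shows that structure in pure Python so it runs without a GPU; in actual Numba-CUDA code the kernel body would carry an `@cuda.jit` decorator, the index would come from `cuda.grid(1)`, and the launch loop would be a parallel grid of threads. Function names here are illustrative.

```python
def saxpy_kernel(tid, a, x, y, out):
    """One "thread": computes a single element of out = a*x + y,
    mirroring the body of a Numba-CUDA (or C-CUDA) saxpy kernel."""
    if tid < len(out):  # bounds guard for over-provisioned grids
        out[tid] = a * x[tid] + y[tid]

def launch(n_threads, a, x, y, out):
    """Stand-in for a kernel launch: on a GPU these iterations
    would all run concurrently; here we simply loop over them."""
    for tid in range(n_threads):
        saxpy_kernel(tid, a, x, y, out)
    return out
```

For example, `launch(4, 2.0, [1, 2, 3], [10, 20, 30], [0.0] * 3)` fills the output with `[12.0, 24.0, 36.0]`; the fourth thread is masked off by the bounds guard, just as excess threads are in a real CUDA grid.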