Devendar Bureddy
Researcher at Ohio State University
Publications - 14
Citations - 582
Devendar Bureddy is an academic researcher from Ohio State University. The author has contributed to research in the topics of InfiniBand and supercomputing. The author has an h-index of 11 and has co-authored 14 publications receiving 510 citations. Previous affiliations of Devendar Bureddy include Mellanox Technologies.
Papers
Proceedings ArticleDOI
Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs
TL;DR: The proposed designs improve the latency of inter-node GPU-to-GPU communication using MPI_Send/MPI_Recv by 69% and 32% for 4-byte and 128-KByte messages, respectively, and boost the achieved uni-directional bandwidth by 2x and 35%, respectively.
Proceedings ArticleDOI
Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication
Sreeram Potluri, Hao Wang, Devendar Bureddy, Ashish Kumar Singh, Carlos Rosales, Dhabaleswar K. Panda +5 more
TL;DR: This paper proposes efficient designs for intra-node MPI communication on multi-GPU nodes, taking advantage of IPC capabilities provided in CUDA, and is the first paper to provide a comprehensive solution for MPI two-sided and one-sided GPU-to-GPU communication within a node, using CUDA IPC.
Journal ArticleDOI
GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation
TL;DR: The GPU-Aware MPI is proposed to support data communication from GPU to GPU using standard MPI, which unifies the separate memory spaces, and avoids explicit CPU-GPU data movement and CPU/GPU buffer management.
Scalable hierarchical aggregation protocol (SHArP): a hardware architecture for efficient data reduction
Richard L. Graham, Devendar Bureddy, Pak Lui, Hal Rosenstock, Gilad Shainer, Gil Bloch, Dror Goldenberg, Mike Dubman, Sasha Kotchubievsky, Vladimir Koushnir, Lion Levi, Alex Margolin, Tamir Ronen, Alexander Shpiner, Oded Wertheim, Eitan Zahavi +15 more
TL;DR: The SHArP technology, designed to offload collective operation processing to the network, is described. It is implemented in Mellanox's Switch-IB 2 ASIC and uses in-network trees to reduce data from a group of sources and to distribute the result.
Book ChapterDOI
OMB-GPU: a micro-benchmark suite for evaluating MPI libraries on GPU clusters
TL;DR: The widely used OSU Micro-Benchmarks (OMB) suite is extended with benchmarks that evaluate the performance of point-to-point, multi-pair, and collective MPI communication for different GPU cluster configurations.