Devendar Bureddy

Researcher at Ohio State University

Publications - 14
Citations - 582

Devendar Bureddy is an academic researcher from Ohio State University. The author has contributed to research in topics including InfiniBand and supercomputing. The author has an h-index of 11 and has co-authored 14 publications receiving 510 citations. Previous affiliations of Devendar Bureddy include Mellanox Technologies.

Papers
Proceedings Article · DOI

Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs

TL;DR: The proposed designs improve the latency of inter-node GPU-to-GPU communication using MPI_Send/MPI_Recv by 69% and 32% for 4-byte and 128 KByte messages, respectively, and boost the achieved uni-directional bandwidth by 2x and 35%, respectively.
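Below is a minimal sketch of the communication pattern these designs target: with a CUDA-aware MPI build (for example MVAPICH2-GDR, named here only as an assumption), device pointers are passed directly to MPI_Send/MPI_Recv, and GPUDirect RDMA lets the network adapter access GPU memory without staging through the host. The message size and two-rank setup are illustrative, not values from the paper.

/* gpu_pingpong.c: GPU-to-GPU exchange with a CUDA-aware MPI (illustrative sketch).
 * Run with two ranks, e.g.: mpirun -np 2 ./gpu_pingpong */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int len = 128 * 1024;          /* 128 KByte message (assumed size) */
    char *d_buf;
    cudaMalloc((void **)&d_buf, (size_t)len);
    cudaMemset(d_buf, rank, (size_t)len);

    if (rank == 0) {
        /* The device pointer goes straight into MPI; with GPUDirect RDMA the
         * HCA reads/writes GPU memory, avoiding a cudaMemcpy to the host. */
        MPI_Send(d_buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(d_buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(d_buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(d_buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}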
Proceedings Article · DOI

Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication

TL;DR: This paper proposes efficient designs for intra-node MPI communication on multi-GPU nodes that take advantage of the IPC capabilities provided by CUDA, and it is the first to provide a comprehensive solution for MPI two-sided and one-sided GPU-to-GPU communication within a node using CUDA IPC.
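For context, the CUDA IPC mechanism the paper builds on lets one process map another process's device allocation into its own address space. The sketch below shows only the raw handle exchange between two ranks assumed to share a node (buffer size and data movement are illustrative); in the paper this machinery is hidden inside the MPI library rather than exposed to the application.

/* cuda_ipc_demo.c: share a device buffer between two ranks on the same node
 * via CUDA IPC handles exchanged over MPI (illustrative sketch). */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t len = 1 << 20;          /* 1 MiB buffer (assumed size) */
    char *d_src = NULL;

    if (rank == 0) {
        cudaMalloc((void **)&d_src, len);
        cudaMemset(d_src, 0, len);

        /* Export an IPC handle for the allocation and ship it to rank 1. */
        cudaIpcMemHandle_t handle;
        cudaIpcGetMemHandle(&handle, d_src);
        MPI_Send(&handle, (int)sizeof(handle), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        cudaIpcMemHandle_t handle;
        MPI_Recv(&handle, (int)sizeof(handle), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        /* Map rank 0's buffer into this process and copy device-to-device,
         * without the data ever touching host memory. */
        void *d_peer;
        cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);

        char *d_dst;
        cudaMalloc((void **)&d_dst, len);
        cudaMemcpy(d_dst, d_peer, len, cudaMemcpyDeviceToDevice);

        cudaIpcCloseMemHandle(d_peer);
        cudaFree(d_dst);
    }

    MPI_Barrier(MPI_COMM_WORLD);   /* keep rank 0's buffer alive until rank 1 is done */
    if (rank == 0)
        cudaFree(d_src);

    MPI_Finalize();
    return 0;
}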
Journal Article · DOI

GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation

TL;DR: GPU-aware MPI is proposed to support GPU-to-GPU data communication through standard MPI; it unifies the separate memory spaces and avoids explicit CPU-GPU data movement and CPU/GPU buffer management.
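The unification described here is visible at the application level: without GPU awareness the sender has to stage data through a host buffer before calling MPI, while with it the device pointer goes straight into the call. The fragment below is a hedged illustration of that contrast (d_buf, len, and dest are assumed to be a valid device buffer, its size, and a destination rank), not code from the paper.

/* Fragment contrasting the two send paths (illustrative sketch). */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

/* Without GPU-aware MPI: explicit CPU-GPU movement and host buffer management. */
static void send_staged(char *d_buf, int len, int dest)
{
    char *h_buf = (char *)malloc((size_t)len);
    cudaMemcpy(h_buf, d_buf, (size_t)len, cudaMemcpyDeviceToHost);
    MPI_Send(h_buf, len, MPI_CHAR, dest, 0, MPI_COMM_WORLD);
    free(h_buf);
}

/* With GPU-aware MPI: the library recognizes the device pointer and handles
 * pipelining or RDMA internally, so there is no user-level staging. */
static void send_gpu_aware(char *d_buf, int len, int dest)
{
    MPI_Send(d_buf, len, MPI_CHAR, dest, 0, MPI_COMM_WORLD);
}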

Scalable hierarchical aggregation protocol (SHArP): a hardware architecture for efficient data reduction

TL;DR: The SHArP technology, designed to offload collective operation processing to the network, is described; implemented in Mellanox's SwitchIB-2 ASIC, it uses in-network trees to reduce data from a group of sources and to distribute the result.
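Because SHArP sits below the MPI layer, the call it accelerates is an unmodified collective such as MPI_Allreduce: the reduction is performed by the switch hierarchy as the data moves through the network rather than by the hosts. The sketch below shows only that unchanged application view (the reduction operation and data are assumptions), not the protocol or ASIC design described in the paper.

/* allreduce_app_view.c: the application-side call that in-network reduction
 * offload accelerates transparently (illustrative sketch). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes one value; with in-network offload the sum is
     * computed inside the switches instead of on the hosts. */
    double local = (double)rank, global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %.0f\n", size - 1, global);

    MPI_Finalize();
    return 0;
}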
Book Chapter · DOI

OMB-GPU: a micro-benchmark suite for evaluating MPI libraries on GPU clusters

TL;DR: The widely used OSU Micro-Benchmarks (OMB) suite is extended with benchmarks that evaluate the performance of point-to-point, multi-pair, and collective MPI communication for different GPU cluster configurations.
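A stripped-down version of the kind of measurement such benchmarks perform is sketched below: a point-to-point latency loop with the buffers resident on the GPU rather than the host, assuming a CUDA-aware MPI. The message size and iteration counts are illustrative assumptions, not parameters of the actual suite.

/* gpu_latency.c: GPU-to-GPU latency loop in the spirit of OMB-GPU
 * (illustrative sketch). Run with two ranks. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int len = 4;                    /* 4-byte message (assumed size) */
    const int warmup = 100, iters = 1000; /* assumed iteration counts */
    char *d_buf;
    cudaMalloc((void **)&d_buf, (size_t)len);  /* device-resident buffer */

    double start = 0.0;
    for (int i = 0; i < warmup + iters; i++) {
        if (i == warmup)
            start = MPI_Wtime();          /* exclude warm-up from the timing */
        if (rank == 0) {
            MPI_Send(d_buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(d_buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(d_buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(d_buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("one-way latency: %.2f us for %d bytes\n",
               (MPI_Wtime() - start) * 1e6 / (2.0 * iters), len);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}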