Rong Shi
Researcher at Ohio State University
Publications - 9
Citations - 188
Rong Shi is an academic researcher from Ohio State University. The author has contributed to research in topics including InfiniBand and CUDA. The author has an h-index of 8, co-authoring 9 publications that have received 165 citations.
Papers
Proceedings ArticleDOI
Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters
Rong Shi, Sreeram Potluri, Khaled Hamidouche, Jonathan Perkins, Mingzhe Li, Davide Rossetti, Dhabaleswar K. Panda +6 more
TL;DR: This is the first study to propose efficient designs for GPU communication at small message sizes using the eager protocol; experimental results demonstrate up to 59% and 63% reduction in latency for GPU-to-GPU and CPU-to-GPU point-to-point communication, respectively.
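The eager-versus-rendezvous trade-off underlying this design can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the 8 KiB threshold and function names are hypothetical, and the actual work concerns GPU buffers over InfiniBand inside an MPI library.

```python
# Sketch of an eager/rendezvous protocol selector, the mechanism MPI
# libraries use for point-to-point transfers: small messages are copied
# and sent immediately (eager), while large ones first negotiate a
# zero-copy transfer (rendezvous). Threshold value is illustrative only.

EAGER_THRESHOLD = 8 * 1024  # bytes (hypothetical cutoff)

def choose_protocol(message_size: int) -> str:
    """Pick a transfer protocol based on message size."""
    if message_size <= EAGER_THRESHOLD:
        return "eager"       # copy into a pre-registered buffer, send now
    return "rendezvous"      # handshake first, then zero-copy transfer
```

The paper's contribution is making the eager path efficient for GPU-resident buffers, where the naive copy-based eager send is normally expensive.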
Proceedings ArticleDOI
Evaluating Scalability Bottlenecks by Workload Extrapolation
Rong Shi, Yifan Gan, Yang Wang +2 more
TL;DR: This paper extrapolates the workload to a bottleneck node and develops PatternMiner, a semi-automatic tool that identifies how workload patterns change with scale; this approach is able to emulate a cluster of up to 60,000 nodes with only 8 physical machines to evaluate the NameNode and the ResourceManager.
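The core idea of extrapolating workload to a bottleneck node can be sketched as a simple scaling fit. This is a toy illustration under stated assumptions: the linear model, sample numbers, and function names are hypothetical, whereas PatternMiner mines much richer workload patterns semi-automatically.

```python
# Minimal sketch of workload extrapolation: fit how a workload metric
# (e.g., requests per second arriving at a central node) grows with
# cluster size on small deployments, then project it to a much larger
# scale. Model and data are illustrative placeholders.

def fit_linear(sizes, loads):
    """Least-squares fit of load = a * size + b."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(loads) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, loads)) \
        / sum((x - mean_x) ** 2 for x in sizes)
    b = mean_y - a * mean_x
    return a, b

def extrapolate(sizes, loads, target_size):
    """Project the fitted load out to a target cluster size."""
    a, b = fit_linear(sizes, loads)
    return a * target_size + b

# Measurements from small clusters (hypothetical numbers):
sizes = [100, 200, 400, 800]
loads = [1_000, 2_000, 4_000, 8_000]
projected = extrapolate(sizes, loads, 60_000)  # → 600000.0
```

The projected load can then be replayed against a single real bottleneck node, which is how a handful of physical machines can stand in for a very large cluster.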
Proceedings ArticleDOI
High performance MPI library over SR-IOV enabled InfiniBand clusters
TL;DR: This is the first study to offer a high performance MPI library that supports efficient locality aware MPI communication over SR-IOV enabled InfiniBand clusters and can significantly improve the performance for point-to-point and collective operations.
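Locality-aware communication of this kind can be sketched as a channel-selection rule. This is a conceptual sketch only: the host map, rank numbering, and channel names are hypothetical, and the real library makes this decision inside its MPI transport layer.

```python
# Sketch of locality-aware channel selection: a virtualization-aware
# MPI library can route messages between VMs that share a physical
# host through a shared-memory channel instead of the SR-IOV virtual
# function, avoiding an unnecessary trip through the network stack.

def select_channel(src_rank, dst_rank, host_of):
    """Use shared memory when both ranks reside on the same host."""
    if host_of[src_rank] == host_of[dst_rank]:
        return "shared-memory"   # intra-host: bypass the network
    return "sr-iov-vf"           # inter-host: InfiniBand virtual function

# Hypothetical placement of three MPI ranks across two hosts:
host_of = {0: "hostA", 1: "hostA", 2: "hostB"}
```

Ranks 0 and 1 here would communicate over shared memory, while rank 2 is reached through the SR-IOV virtual function.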
Proceedings ArticleDOI
HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters
TL;DR: This is the first attempt to propose a hybrid and adaptive solution to integrate all existing schemes to optimize arbitrary non-contiguous data movement using MPI data types on GPU clusters.
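The basic operation behind non-contiguous MPI datatype movement is packing strided data into a contiguous buffer, as done for `MPI_Type_vector`. The sketch below shows that packing step in plain Python; a hybrid scheme like the one described would choose at runtime among CPU packing, GPU packing kernels, and direct strided copies, which this toy version does not attempt.

```python
# Sketch of packing a strided (non-contiguous) layout, the operation
# underlying MPI_Type_vector: gather `count` blocks of `blocklen`
# elements, with block starts separated by `stride` elements, into
# one contiguous buffer suitable for transfer.

def pack_vector(buf, count, blocklen, stride):
    """Gather a vector datatype from buf into a contiguous list."""
    packed = []
    for i in range(count):
        start = i * stride
        packed.extend(buf[start:start + blocklen])
    return packed

# Example: 3 blocks of 2 elements each, stride 4.
data = list(range(12))
packed = pack_vector(data, count=3, blocklen=2, stride=4)
# → [0, 1, 4, 5, 8, 9]
```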
Proceedings ArticleDOI
A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters
TL;DR: This paper presents a novel two-level workload partitioning approach for HPL that distributes workload based on the compute power of CPU/GPU nodes across the cluster and takes advantage of asynchronous kernel launches and CUDA copies to overlap computation and CPU-GPU data movement.
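The first level of such a partitioning scheme, splitting work in proportion to measured compute power, can be sketched as follows. The function name and GFLOPS figures are hypothetical placeholders; the paper's actual design partitions HPL's workload across heterogeneous nodes and additionally overlaps computation with CPU-GPU data movement.

```python
# Sketch of compute-power-proportional partitioning: split a block of
# work (e.g., matrix columns in an HPL panel) between CPU and GPU
# according to their measured flop rates, so both sides finish at
# roughly the same time. Rates below are illustrative only.

def partition(total_cols, cpu_gflops, gpu_gflops):
    """Return (cpu_cols, gpu_cols) proportional to compute power."""
    gpu_cols = round(total_cols * gpu_gflops / (cpu_gflops + gpu_gflops))
    return total_cols - gpu_cols, gpu_cols

cpu_cols, gpu_cols = partition(1000, cpu_gflops=100, gpu_gflops=900)
# → (100, 900)
```

Balancing the split this way minimizes idle time on the faster device, which is the point of partitioning by compute power rather than evenly.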