Piyush Sao
Researcher at Oak Ridge National Laboratory
Publications - 20
Citations - 288
Piyush Sao is an academic researcher at Oak Ridge National Laboratory. The author has contributed to research in the topics Computer science and Solver. The author has an h-index of 6 and has co-authored 16 publications receiving 213 citations. Previous affiliations of Piyush Sao include Georgia Institute of Technology.
Papers
Proceedings ArticleDOI
A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems
TL;DR: This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPUs and Intel Xeon Phi co-processors, and introduces a novel algorithm, called HALO, which combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing.
Journal ArticleDOI
A distributed kernel summation framework for general-dimension machine learning
TL;DR: This is the first distributed implementation of a kernel summation framework that can use various types of deterministic and probabilistic approximations, which may be suitable for both low- and high‐dimensional problems with a large number of data points; it also includes a dynamic load balancing scheme to adjust work imbalances during the computation.
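For orientation, the quantity that a kernel summation framework approximates can be written as a brute-force double loop. The sketch below is not the paper's method: it is a minimal exact, single-process baseline, and it assumes a Gaussian kernel. The function name `gaussian_kernel_sums` and the `bandwidth` parameter are hypothetical, chosen only for illustration.

```python
import math


def gaussian_kernel_sums(queries, references, bandwidth):
    """Exact kernel summation: for each query q, sum K(q, r) over all references r.

    Assumes a Gaussian kernel K(q, r) = exp(-||q - r||^2 / (2 * bandwidth^2)).
    Cost is O(len(queries) * len(references) * dimension); approximation
    schemes exist precisely to avoid this quadratic cost.
    """
    sums = []
    for q in queries:
        total = 0.0
        for r in references:
            # Squared Euclidean distance between the two points.
            sq_dist = sum((qi - ri) ** 2 for qi, ri in zip(q, r))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        sums.append(total)
    return sums
```

A distributed implementation would partition the query and reference points across processes and replace the inner loop with approximations; none of that is shown here.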
Proceedings ArticleDOI
A communication-avoiding 3D sparse triangular solver
TL;DR: This work presents a novel distributed memory algorithm to improve the strong scalability of the solution of a sparse triangular system, and implements the algorithm for use in SuperLU_DIST3D, using a hybrid MPI+OpenMP programming model.
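As background, the operation being parallelized is the solution of a sparse triangular system. The sketch below is a minimal sequential forward substitution for a lower-triangular system, not the 3D communication-avoiding algorithm or any SuperLU_DIST API. The row-wise dictionary storage and the name `sparse_lower_solve` are assumptions made for illustration.

```python
def sparse_lower_solve(rows, b):
    """Solve L x = b by forward substitution.

    rows[i] is a dict {column_index: value} holding the nonzeros of row i
    of a lower-triangular matrix L (hypothetical storage format).
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for j, value in rows[i].items():
            if j == i:
                diag = value
            elif j < i:
                # x[j] is already known because j < i.
                acc -= value * x[j]
            else:
                raise ValueError("matrix is not lower triangular")
        if diag is None or diag == 0:
            raise ValueError("zero or missing diagonal entry in row %d" % i)
        x[i] = acc / diag
    return x
```

Each `x[i]` depends on earlier entries of `x`, and this chain of dependencies is what makes triangular solves hard to scale across many processes.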
Journal ArticleDOI
A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems
TL;DR: The 3D algorithm for sparse LU uses a three-dimensional MPI process grid, exploits elimination tree parallelism, and trades increased memory for reduced per-process communication, yielding asymptotic improvements for planar graphs and certain non-planar graphs.
Proceedings ArticleDOI
Scalable Knowledge Graph Analytics at 136 Petaflop/s
Ramakrishnan Kannan, Piyush Sao, Hao Lu, Drahomira Herrmannova, Vijay Thakkar, Robert M. Patton, Richard Vuduc, Thomas E. Potok, +7 more
TL;DR: The authors present a new high-performance implementation of the Floyd-Warshall algorithm for distributed-memory parallel computers accelerated by GPUs, which they call DSNAPSHOT (Distributed Accelerated Semiring All-Pairs Shortest Path).
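For reference, the classic sequential Floyd-Warshall algorithm can be viewed as repeated updates in the (min, +) semiring. The sketch below is a minimal single-process version, not the distributed GPU implementation described in the paper. The function name `floyd_warshall` and the dense list-of-lists input are assumptions for illustration.

```python
INF = float("inf")


def floyd_warshall(dist):
    """All-pairs shortest paths on a dense distance matrix.

    dist[i][j] is the weight of edge i -> j, INF if there is no edge,
    and 0 on the diagonal. Returns a new matrix of shortest-path lengths.
    Assumes the graph has no negative-weight cycles.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input is not modified
    for k in range(n):
        for i in range(n):
            dik = d[i][k]
            if dik == INF:
                continue  # no path i -> k, so k cannot improve row i
            for j in range(n):
                # (min, +) semiring update: "multiply" is +, "add" is min.
                candidate = dik + d[k][j]
                if candidate < d[i][j]:
                    d[i][j] = candidate
    return d
```

The triple loop performs O(n^3) semiring operations, which is why the structure resembles dense matrix multiplication and can benefit from accelerators.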