Piyush Sao

Researcher at Oak Ridge National Laboratory

Publications -  20
Citations -  288

Piyush Sao is an academic researcher from Oak Ridge National Laboratory. The author has contributed to research in topics: Computer science & Solver. The author has an h-index of 6 and has co-authored 16 publications receiving 213 citations. Previous affiliations of Piyush Sao include Georgia Institute of Technology.

Papers
Proceedings Article

A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems

TL;DR: This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPUs and Intel Xeon Phi co-processors, and introduces a novel algorithm, called HALO, which combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing.
Journal Article

A distributed kernel summation framework for general-dimension machine learning

TL;DR: This is the first distributed implementation of a kernel summation framework that can utilize various types of deterministic and probabilistic approximations, which may be suitable for both low- and high-dimensional problems with a large number of data points. It uses a dynamic load balancing scheme to adjust work imbalances during the computation.
Proceedings Article

A communication-avoiding 3D sparse triangular solver

TL;DR: This work presents a novel distributed memory algorithm to improve the strong scalability of the solution of a sparse triangular system, and implements the algorithm for use in SuperLU_DIST3D, using a hybrid MPI+OpenMP programming model.
Journal Article

A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems

TL;DR: The 3D algorithm for sparse LU uses a three-dimensional MPI process grid, exploits elimination tree parallelism, and trades off increased memory for reduced per-process communication, with asymptotic improvements for planar graphs and certain non-planar graphs.
Proceedings Article

Scalable Knowledge Graph Analytics at 136 Petaflop/s

TL;DR: The authors present a new high-performance algorithm for, and implementation of, the Floyd-Warshall method on distributed-memory parallel computers accelerated by GPUs, which they call DSNAPSHOT (Distributed Accelerated Semiring All-Pairs Shortest Path).