Yao Zhang

Researcher at University of California, Davis

Publications -  11
Citations -  1500

Yao Zhang is an academic researcher from the University of California, Davis. The author has contributed to research in topics: Solver & CUDA. The author has an h-index of 9 and has co-authored 11 publications receiving 1441 citations.

Papers
Proceedings Article

Scan primitives for GPU computing

TL;DR: Using the scan primitives, this work presents novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyzes the performance of the scan primitives, several sort algorithms that use them, and a graphical shallow-water fluid simulation that uses the scan framework for a tridiagonal matrix solver.
Proceedings Article

A quantitative performance analysis model for GPU architectures

TL;DR: A microbenchmark-based performance model is developed for NVIDIA GeForce 200-series GPUs that identifies GPU program bottlenecks and quantitatively analyzes performance, and thus allows programmers and architects to predict the benefits of potential program optimizations and architectural improvements.
Proceedings Article

Fast tridiagonal solvers on the GPU

TL;DR: To combine the benefits of the basic algorithms, this work proposes hybrid CR+PCR and CR+RD algorithms, which improve the performance of PCR, RD and CR by 21%, 31% and 61% respectively.
Proceedings Article

Parallel lossless data compression on the GPU

TL;DR: To parallelize the bzip2 compression pipeline, this work uses a two-level hierarchical sort for BWT, designs a novel scan-based parallel MTF algorithm, and implements a parallel reduction scheme to build the Huffman tree.
Proceedings Article

An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU

TL;DR: This work shows two ways to effectively prune the tuning space and thus avoid an impractical exhaustive search. It demonstrates that auto-tuning is a powerful tool that improves performance by up to 5x, saves 17% and 32% of execution time on average over static and dynamic tuning, respectively, and enables the multi-stage solver to outperform the Intel MKL tridiagonal solver on many parallel tridiagonal systems by 6-11x.
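
Several of the papers above concern solving tridiagonal systems. As a point of reference only, the following is a minimal sketch of the classic serial Thomas algorithm in Python. It is not the GPU method from any of the listed papers, which use parallel schemes such as CR, PCR, and RD. The function name, argument layout, and the assumption that no pivoting is needed (for example, a diagonally dominant matrix) are illustrative choices, not taken from the source.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    Hypothetical layout (an assumption for this sketch):
      lower[i] is the sub-diagonal entry of row i (lower[0] is ignored),
      diag[i]  is the main-diagonal entry of row i,
      upper[i] is the super-diagonal entry of row i (upper[-1] is ignored).
    Assumes no pivoting is required, e.g. a diagonally dominant matrix.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: solve from the last unknown upward.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Example: rows (2 1 0), (1 2 1), (0 1 2) with right-hand side (4, 8, 8).
solution = solve_tridiagonal([0.0, 1.0, 1.0], [2.0, 2.0, 2.0],
                             [1.0, 1.0, 0.0], [4.0, 8.0, 8.0])
```

Both loops carry a dependency from one row to the next, which is why this serial formulation does not map directly onto a GPU and why parallel reformulations are studied.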