Yunquan Zhang

Researcher at Chinese Academy of Sciences

Publications - 97
Citations - 1190

Yunquan Zhang is an academic researcher at the Chinese Academy of Sciences. The author has contributed to research on topics including Speedup and Computer science. The author has an h-index of 16 and has co-authored 86 publications receiving 989 citations. Previous affiliations of Yunquan Zhang include Fudan University.

Papers
Proceedings Article

AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs

TL;DR: A template-based optimization framework, AUGEM, is presented, which can automatically generate fully optimized assembly code for several dense linear algebra kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual intervention from developers.
Proceedings Article

yaSpMV: yet another SpMV framework on GPUs

TL;DR: A new SpMV format, called blocked compressed common coordinate (BCCOO), is devised; it uses bit flags to store the row indices in a blocked common coordinate format so as to alleviate the bandwidth problem. An auto-tuning framework is also introduced to choose optimization parameters based on the characteristics of the input sparse matrices and the target hardware platforms.
Journal Article

Parallel Processing Systems for Big Data: A Survey

TL;DR: This survey paper gives a high-level overview of existing parallel data processing systems, categorized by data input as batch processing, stream processing, graph processing, and machine learning processing, and introduces representative projects in each category.
Proceedings Article

StreamScan: fast scan algorithms for GPUs without global barrier synchronization

TL;DR: StreamScan is a novel approach that implements scan on GPUs with only one computation phase. The main idea is to restrict synchronization to adjacent workgroups, thereby eliminating global barrier synchronization completely.
Proceedings Article

A Parallel Shortest Path Algorithm Based on Graph-Partitioning and Iterative Correcting

TL;DR: A parallel shortest path algorithm based on graph partitioning and iterative correcting is proposed; it achieves a 15-fold speedup on 16 processors in an IBM cluster over real road networks.