Yunquan Zhang
Researcher at Chinese Academy of Sciences
Publications - 97
Citations - 1190
Yunquan Zhang is an academic researcher at the Chinese Academy of Sciences. The author has contributed to research on topics including Speedup and Computer science. The author has an h-index of 16 and has co-authored 86 publications receiving 989 citations. Previous affiliations of Yunquan Zhang include Fudan University.
Papers
Proceedings Article
AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs
TL;DR: A template-based optimization framework, AUGEM, is presented, which can automatically generate fully optimized assembly code for several dense linear algebra kernels, such as GEMM, GEMV, AXPY and DOT, on varying multi-core CPUs without requiring any manual interference from developers.
Proceedings Article
yaSpMV: yet another SpMV framework on GPUs
TL;DR: A new SpMV format, blocked compressed common coordinate (BCCOO), is devised; it uses bit flags to store the row indices in a blocked common coordinate format to alleviate the bandwidth problem. An auto-tuning framework is also introduced to choose optimization parameters based on the characteristics of the input sparse matrices and the target hardware platforms.
Journal Article
Parallel Processing Systems for Big Data: A Survey
Yunquan Zhang, Ting Cao, Shigang Li, Xinhui Tian, Liang Yuan, Haipeng Jia, Athanasios V. Vasilakos +6 more
TL;DR: This survey gives a high-level overview of existing parallel data processing systems, categorized by data input as batch processing, stream processing, graph processing, and machine learning processing, and introduces representative projects in each category.
Proceedings Article
StreamScan: fast scan algorithms for GPUs without global barrier synchronization
TL;DR: StreamScan is a novel approach to implementing scan on GPUs with only one computation phase. The main idea is to restrict synchronization to adjacent workgroups, thereby eliminating global barrier synchronization completely.
Proceedings Article
A Parallel Shortest Path Algorithm Based on Graph-Partitioning and Iterative Correcting
Yuxin Tang, Yunquan Zhang, Hu Chen +2 more
TL;DR: A parallel shortest path algorithm based on graph partitioning and iterative correcting is proposed in this article, which achieves a 15-fold speedup on 16 processors in an IBM cluster over real road networks.
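For context on the speedup figure quoted above, the standard parallel-efficiency ratio (speedup divided by processor count) can be worked out as follows. This is a minimal sketch using the textbook definition; the function name parallel_efficiency is an illustrative assumption and does not come from the paper.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Textbook parallel efficiency: achieved speedup divided by processor count.

    The name and signature are hypothetical, chosen for illustration only.
    """
    if processors <= 0:
        raise ValueError("processors must be a positive integer")
    return speedup / processors


# Figures quoted in the summary above: 15-fold speedup on 16 processors.
efficiency = parallel_efficiency(15.0, 16)
print(f"parallel efficiency: {efficiency:.4f}")
```

Under this definition, a 15-fold speedup on 16 processors corresponds to an efficiency of 15/16, or roughly 94% of ideal linear scaling.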