Guangming Tan

Researcher at Chinese Academy of Sciences

Publications -  121
Citations -  1529

Guangming Tan is an academic researcher at the Chinese Academy of Sciences. The author has contributed to research on topics including parallel algorithms and speedup. The author has an h-index of 17 and has co-authored 105 publications receiving 1235 citations.

Papers
Proceedings Article

Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform

TL;DR: Implementations of the Smith-Waterman algorithm for both DNA and protein sequences on the XD1000 platform are presented, and a multistage PE (processing element) design is proposed that significantly reduces FPGA resource usage and hence allows more parallelism to be exploited.
Proceedings Article

SMAT: an input adaptive auto-tuner for sparse matrix-vector multiplication

TL;DR: A Sparse Matrix-vector multiplication Auto-Tuning system (SMAT) is presented to bridge the gap between specific optimizations and general-purpose usage; it automatically determines the optimal format and implementation for any input sparse matrix at runtime.
Proceedings Article

Fast implementation of DGEMM on Fermi GPU

TL;DR: This paper presents a thorough account of tuning double-precision matrix-matrix multiplication (DGEMM) on the Fermi GPU architecture and chooses an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy.
Proceedings Article

A parallel dynamic programming algorithm on a multi-core architecture

TL;DR: This paper presents a programming and execution model for multi-core architectures with memory hierarchy, and proposes a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators.
Proceedings Article

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning

TL;DR: The toolchain is an attempt to automatically crack different GPU ISA encodings and build an assembler adaptively for the purpose of performance enhancements to applications on GPUs.