George L. Yuan
Researcher at University of British Columbia
Publications - 7
Citations - 2407
George L. Yuan is an academic researcher at the University of British Columbia. The author has contributed to research in the topics Instruction set and Memory controller. The author has an h-index of 6 and has co-authored 7 publications receiving 2252 citations.
Papers
Proceedings Article
Analyzing CUDA workloads using a detailed GPU simulator
TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Proceedings Article
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
TL;DR: It is shown that a realistic hardware implementation that dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes improves performance by an average of 20.7% for an estimated area increase of 4.7%.
Proceedings Article
Complexity effective memory access scheduling for many-core accelerator architectures
TL;DR: This paper proposes a complexity-effective solution to DRAM request scheduling that recovers most of the performance lost by a naive in-order first-in first-out (FIFO) DRAM scheduler relative to an aggressive out-of-order DRAM scheduler.
Proceedings Article
StoreGPU: exploiting graphics processing units to accelerate distributed storage systems
TL;DR: This paper presents StoreGPU, a library that accelerates a number of hashing-based primitives popular in distributed storage system implementations, enabling up to eight-fold performance gains on synthetic benchmarks as well as on a high-level application: online similarity detection between large data files.
Journal Article
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
TL;DR: This article proposes dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs that dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes.