George L. Yuan

Researcher at University of British Columbia

Publications -  7
Citations -  2407

George L. Yuan is an academic researcher from the University of British Columbia. The author has contributed to research on topics including instruction sets and memory controllers. The author has an h-index of 6 and has co-authored 7 publications receiving 2252 citations.

Papers
Proceedings Article

Analyzing CUDA workloads using a detailed GPU simulator

TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Proceedings Article

Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow

TL;DR: It is shown that a realistic hardware implementation that dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes improves performance by an average of 20.7% for an estimated area increase of 4.7%.
Proceedings Article

Complexity effective memory access scheduling for many-core accelerator architectures

TL;DR: This paper proposes a complexity-effective solution to DRAM request scheduling that recovers most of the performance lost by a naive in-order first-in first-out (FIFO) DRAM scheduler compared to an aggressive out-of-order DRAM scheduler.
Proceedings Article

StoreGPU: exploiting graphics processing units to accelerate distributed storage systems

TL;DR: This paper presents StoreGPU, a library that accelerates a number of hashing-based primitives popular in distributed storage system implementations, enabling up to eight-fold performance gains on synthetic benchmarks as well as on a high-level application: online similarity detection between large data files.
Journal Article

Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware

TL;DR: This article proposes dynamic warp formation and scheduling, a mechanism for more efficient SIMD branch execution on GPUs that dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes.