Wilson W. L. Fung

Researcher at University of British Columbia

Publications -  19
Citations -  2799

Wilson W. L. Fung is an academic researcher at the University of British Columbia. The author has contributed to research on CUDA and SIMD, has an h-index of 12, and has co-authored 19 publications receiving 2607 citations. Previous affiliations of Wilson W. L. Fung include Samsung.

Papers
Proceedings Article

Analyzing CUDA workloads using a detailed GPU simulator

TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Proceedings Article

Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow

TL;DR: It is shown that a realistic hardware implementation that dynamically regroups threads into new warps on the fly following the occurrence of diverging branch outcomes improves performance by an average of 20.7% for an estimated area increase of 4.7%.
Proceedings Article

Thread block compaction for efficient SIMT control flow

TL;DR: This paper proposes and evaluates the benefits of extending the sharing of resources in a block of warps, already used for scratchpad memory, to exploit control flow locality among threads, and shows that this compaction mechanism provides an average speedup of 22% over a baseline per-warp, stack-based reconvergence mechanism.
Proceedings Article

Cache coherence for GPU architectures

TL;DR: This paper describes a time-based coherence framework for GPUs, called Temporal Coherence (TC), that exploits globally synchronized counters in single-chip systems to develop a streamlined GPU coherence protocol, called TC-Weak.
Proceedings Article

Hardware transactional memory for GPU architectures

TL;DR: This paper proposes KILO TM, a novel hardware TM design for GPUs that scales to 1000s of concurrent transactions and uses word-level, value-based conflict detection to avoid broadcast communication and reduce on-chip storage overhead.