
Xulong Tang

Researcher at University of Pittsburgh

Publications - 49
Citations - 666

Xulong Tang is an academic researcher from the University of Pittsburgh. The author has contributed to research in the topics of Computer Science and Compilers. The author has an h-index of 10 and has co-authored 34 publications receiving 451 citations. Previous affiliations of Xulong Tang include Pennsylvania State University and the University of Science and Technology of China.

Papers
Proceedings Article

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities

TL;DR: Two new runtime techniques are developed: a regression-based affinity prediction model and mechanism that accurately identifies which kernels would benefit from PIM and offloads them to the GPU cores in memory, and a concurrent kernel management mechanism that uses the affinity prediction model, a new kernel execution time prediction model, and kernel dependency information to decide which kernels to schedule concurrently on the main GPU cores and the GPU cores in memory.
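The summary above describes a regression model that predicts whether a kernel benefits from running on the GPU cores in memory. The sketch below is a minimal, hypothetical illustration of such an affinity predictor and scheduling decision; the features, weights, and threshold are assumptions, not the model trained in the paper.

```python
# Hypothetical sketch of a regression-based PIM-affinity predictor.
# The features, weights, and threshold are illustrative assumptions,
# not the model from the paper.
from dataclasses import dataclass

@dataclass
class KernelProfile:
    memory_intensity: float   # e.g., bytes accessed per instruction
    compute_intensity: float  # e.g., arithmetic ops per byte
    parallelism: float        # normalized active-thread count

def pim_affinity(k: KernelProfile) -> float:
    """Linear regression score; higher means the kernel is predicted
    to benefit more from executing on the GPU cores in memory."""
    w_mem, w_comp, w_par, bias = 0.7, -0.4, 0.1, 0.0  # assumed weights
    return (w_mem * k.memory_intensity
            + w_comp * k.compute_intensity
            + w_par * k.parallelism
            + bias)

def schedule(kernels: dict, threshold: float = 0.5) -> dict:
    """Map each kernel name to 'PIM' or 'GPU' based on predicted affinity."""
    return {name: ("PIM" if pim_affinity(profile) > threshold else "GPU")
            for name, profile in kernels.items()}
```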
Posted Content

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

TL;DR: This work proposes the YOLObile framework for real-time object detection on mobile devices via compression-compilation co-design, and introduces a novel block-punched pruning scheme applicable to any kernel size.
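As a rough illustration of block-punched pruning under assumed semantics (not the YOLObile implementation): within each block of consecutive filters, the same weight positions are zeroed in every filter, chosen by the smallest aggregate magnitude, so the sparsity pattern stays regular regardless of kernel size.

```python
import numpy as np

def block_punched_prune(weights: np.ndarray, block_size: int = 4,
                        prune_ratio: float = 0.5) -> np.ndarray:
    """Illustrative block-punched pruning (assumed semantics, not the
    YOLObile code). `weights` has shape (num_filters, k*k*channels).
    Within each block of `block_size` filters, the same column positions
    are zeroed in every filter, picked by smallest summed magnitude."""
    pruned = weights.copy()
    num_filters, num_positions = weights.shape
    n_prune = int(num_positions * prune_ratio)
    for start in range(0, num_filters, block_size):
        block = pruned[start:start + block_size]  # view into `pruned`
        # Aggregate importance of each weight position across the block.
        scores = np.abs(block).sum(axis=0)
        # Punch out (zero) the least important positions for the whole block.
        punched = np.argsort(scores)[:n_prune]
        block[:, punched] = 0.0
    return pruned
```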
Proceedings Article

Controlled Kernel Launch for Dynamic Parallelism in GPUs

TL;DR: This work proposes SPAWN, a runtime framework that controls dynamically generated kernels, thereby directly reducing the associated launch overheads and queuing latency; it achieves 69% and 57% speedups over the flat (non-DP) implementation and the baseline DP implementation, respectively.
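A hypothetical sketch of the kind of launch-control decision a SPAWN-style runtime might make follows; the queue-occupancy heuristic, thresholds, and names are assumptions, not the SPAWN implementation.

```python
# Hypothetical sketch of a SPAWN-style launch controller: a parent
# kernel's request to spawn a device-side child kernel is granted only
# when the pending-launch queue is shallow and the child carries enough
# work to amortize launch and queuing overhead. Thresholds are assumed.

class LaunchController:
    def __init__(self, max_pending: int = 16, min_work_items: int = 256):
        self.max_pending = max_pending        # assumed queue-depth limit
        self.min_work_items = min_work_items  # assumed minimum child size
        self.pending = 0

    def should_launch_child(self, child_work_items: int) -> bool:
        """Return True if a dynamic child launch is expected to pay off;
        otherwise the parent should execute the work inline ("flat")."""
        return (self.pending < self.max_pending
                and child_work_items >= self.min_work_items)

    def on_launch(self) -> None:
        self.pending += 1

    def on_complete(self) -> None:
        self.pending -= 1
```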
Proceedings Article

Data movement aware computation partitioning

TL;DR: The potential of compiler support for exploiting NDP in the context of emerging many-core systems is explored, and a novel compiler algorithm is proposed that partitions the computations in a given loop nest into sub-computations and schedules the resulting sub-computations on different cores, with the goal of reducing the distance-to-data on the on-chip network.
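The toy sketch below illustrates the idea of distance-to-data-driven partitioning under assumed inputs (a 2D mesh of cores and a map from each sub-computation to the tile holding most of its data); it is not the compiler algorithm from the paper.

```python
# Toy illustration of data-movement-aware partitioning. Assumptions:
# a 2D mesh of cores, and a precomputed map from each sub-computation
# to the mesh tile holding most of its data.

def manhattan(a: tuple, b: tuple) -> int:
    """Hop count between two mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def partition(sub_computations: dict, cores: list) -> dict:
    """Assign each sub-computation to the core with the smallest
    distance-to-data on the on-chip mesh."""
    return {name: min(cores, key=lambda c: manhattan(c, data_tile))
            for name, data_tile in sub_computations.items()}

# Example: two loop-nest chunks whose data live on different tiles.
cores = [(0, 0), (0, 1), (1, 0), (1, 1)]
subs = {"loop_chunk_0": (0, 1), "loop_chunk_1": (1, 0)}
print(partition(subs, cores))  # each chunk maps to the core on its data's tile
```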
Proceedings Article

Opportunistic computing in GPU architectures

TL;DR: This paper develops two offloading techniques, called LLC-Compute and Omni-Compute, which employ simple bookkeeping hardware to enable GPU cores to compute instructions offloaded by other GPU cores.
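A hedged sketch of the offload decision only: offloading a short computation to the core (or LLC-side unit) holding the data pays off when the saved data movement outweighs the cost of shipping operands, results, and bookkeeping. The cost model and constants below are assumptions, not the LLC-Compute/Omni-Compute hardware logic.

```python
# Hypothetical offload-decision cost model in the spirit of
# LLC-Compute / Omni-Compute; all byte counts and the bookkeeping
# constant are illustrative assumptions.

def should_offload(bytes_moved_if_local: int,
                   operand_bytes: int,
                   result_bytes: int,
                   bookkeeping_cost_bytes: int = 32) -> bool:
    """Offload when moving the computation to the data is cheaper than
    moving the data to the computation."""
    cost_offload = operand_bytes + result_bytes + bookkeeping_cost_bytes
    return cost_offload < bytes_moved_if_local
```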