Xulong Tang
Researcher at University of Pittsburgh
Publications - 49
Citations - 666
Xulong Tang is an academic researcher at the University of Pittsburgh. His research focuses on computer science and compilers. He has an h-index of 10 and has co-authored 34 publications receiving 451 citations. Previous affiliations of Xulong Tang include Pennsylvania State University and the University of Science and Technology of China.
Papers
Proceedings ArticleDOI
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities
Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut Kandemir, Onur Mutlu, Chita R. Das +7 more
TL;DR: Two new runtime techniques are developed: (1) a regression-based affinity prediction model and mechanism that accurately identifies which kernels would benefit from PIM and offloads them to the GPU cores in memory, and (2) a concurrent kernel management mechanism that uses the affinity prediction model, a new kernel execution time prediction model, and kernel dependency information to decide which kernels to schedule concurrently on the main GPU cores and the GPU cores in memory.
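The affinity prediction idea above can be illustrated with a minimal sketch. This is not the paper's actual model: the features (`mem_intensity`, `compute_per_byte`), the training data, and the `should_offload_to_pim` helper are all hypothetical, standing in for whatever kernel characteristics and profiling data the real mechanism uses. The sketch fits a linear model mapping kernel features to an estimated PIM speedup and offloads only kernels predicted to benefit.

```python
# Hypothetical sketch of a regression-based affinity predictor.
# Feature names, training data, and threshold are illustrative only.

def fit_linear(samples, targets, lr=0.1, iters=20000):
    """Least-squares fit of y ~ w0 + w1*x1 + w2*x2 via gradient descent."""
    w = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(samples, targets):
            err = (w[0] + w[1] * x1 + w[2] * x2) - y
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        for i in range(3):
            w[i] -= lr * grad[i] / n
    return w

# Toy profiles: (memory intensity, compute-per-byte) -> observed PIM speedup.
train_x = [(0.9, 0.2), (0.8, 0.5), (0.2, 0.9), (0.3, 0.4), (0.7, 0.1), (0.1, 0.7)]
train_y = [1.4, 1.2, 0.5, 0.7, 1.2, 0.45]

w = fit_linear(train_x, train_y)

def should_offload_to_pim(mem_intensity, compute_per_byte):
    predicted_speedup = w[0] + w[1] * mem_intensity + w[2] * compute_per_byte
    return predicted_speedup > 1.0  # offload only if PIM is predicted to help

print(should_offload_to_pim(0.85, 0.15))  # memory-bound kernel -> True
print(should_offload_to_pim(0.15, 0.9))   # compute-bound kernel -> False
```

The design point the sketch captures is that the scheduler never measures PIM execution directly; it predicts affinity from cheap kernel features and commits each kernel to the main cores or the in-memory cores accordingly.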
Posted Content
YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design
TL;DR: This work proposes the YOLObile framework, a real-time object detection solution for mobile devices built on compression-compilation co-design, and introduces a novel block-punched pruning scheme applicable to any kernel size.
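The block-punched pruning idea can be sketched as follows. This is an illustrative toy, not the YOLObile implementation: the flattened weight layout, block size, and importance metric are assumptions. The key property shown is that within each block the same weight positions are "punched" (zeroed) across every filter, which keeps the sparsity pattern regular enough for hardware and compiler optimization.

```python
# Hypothetical sketch of block-punched pruning on a flattened weight matrix.
# Layout and importance metric are illustrative, not the paper's scheme.

def block_punched_prune(weights, block_rows, prune_ratio):
    """weights: list of rows (filters) x cols (weight positions).
    Filters are grouped into blocks of `block_rows` rows; within each
    block, the columns with the lowest summed magnitude are zeroed in
    every row of the block ("punched" through the whole block)."""
    cols = len(weights[0])
    pruned = [row[:] for row in weights]
    for start in range(0, len(pruned), block_rows):
        block = pruned[start:start + block_rows]
        # Importance of each column = total magnitude across the block.
        importance = [sum(abs(r[c]) for r in block) for c in range(cols)]
        n_prune = int(cols * prune_ratio)
        punch = sorted(range(cols), key=lambda c: importance[c])[:n_prune]
        for r in block:
            for c in punch:
                r[c] = 0.0
    return pruned

w = [[0.9, 0.1, 0.5, 0.05],
     [0.8, 0.2, 0.6, 0.01],
     [0.02, 0.7, 0.1, 0.9],
     [0.03, 0.6, 0.2, 0.8]]
out = block_punched_prune(w, block_rows=2, prune_ratio=0.5)
# Block 0 (rows 0-1) punches columns 1 and 3; block 1 punches columns 0 and 2.
```

Compared with unstructured pruning, punching whole columns per block trades a little flexibility for a pattern that a mobile compiler can exploit; compared with whole-filter pruning, the per-block granularity preserves more accuracy.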
Proceedings ArticleDOI
Controlled Kernel Launch for Dynamic Parallelism in GPUs
Xulong Tang, Ashutosh Pattnaik, Huaipan Jiang, Onur Kayiran, Adwait Jog, Sreepathi Pai, Mohamed Ibrahim, Mahmut Kandemir, Chita R. Das +8 more
TL;DR: This work proposes SPAWN, a runtime framework that controls dynamically-generated kernels, thereby directly reducing the associated launch overheads and queuing latency; SPAWN achieves 69% and 57% speedup over the flat (non-DP) implementation and baseline DP, respectively.
Proceedings ArticleDOI
Data movement aware computation partitioning
TL;DR: The potential of compiler support for exploiting NDP in emerging manycore systems is explored, and a novel compiler algorithm is proposed that partitions the computations in a given loop nest into sub-computations and schedules the resulting sub-computations on different cores, with the goal of reducing the distance-to-data on the on-chip network.
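The distance-to-data objective above can be sketched with a toy placement routine. This is not the paper's compiler algorithm: the mesh layout, the `data_home` map, and the greedy per-sub-computation assignment are all assumptions made for illustration. It only shows the core idea that each sub-computation should run on the core closest (in network hops) to the data it touches.

```python
# Illustrative sketch: on a 2D-mesh manycore, assign each sub-computation
# to the core minimizing total Manhattan distance to its data's home cores.
# All names and the greedy strategy are hypothetical.

def core_coords(core_id, mesh_width):
    """Map a linear core id to (x, y) on the mesh."""
    return core_id % mesh_width, core_id // mesh_width

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def partition(subcomputations, data_home, mesh_width, n_cores):
    """subcomputations: {name: [data items accessed]}.
    data_home: {data item: core id holding that item on chip}.
    Returns {name: core id} minimizing each task's distance-to-data."""
    schedule = {}
    for name, items in subcomputations.items():
        best = min(
            range(n_cores),
            key=lambda c: sum(
                manhattan(core_coords(c, mesh_width),
                          core_coords(data_home[d], mesh_width))
                for d in items))
        schedule[name] = best
    return schedule

subs = {"S0": ["A[0]", "B[0]"], "S1": ["A[3]", "B[3]"]}
home = {"A[0]": 0, "B[0]": 1, "A[3]": 3, "B[3]": 2}
print(partition(subs, home, mesh_width=2, n_cores=4))
```

Note the sketch places each sub-computation independently and ignores load balance; a real compiler pass would also have to weigh core utilization and data reuse across sub-computations.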
Proceedings ArticleDOI
Opportunistic computing in GPU architectures
Ashutosh Pattnaik, Xulong Tang, Onur Kayiran, Adwait Jog, Asit K. Mishra, Mahmut Kandemir, Anand Sivasubramaniam, Chita R. Das +7 more
TL;DR: This paper develops two offloading techniques, called LLC-Compute and Omni-Compute, which employ simple bookkeeping hardware to enable GPU cores to compute instructions offloaded by other GPU cores.