Tarek S. Abdelrahman
Researcher at University of Toronto
Publications - 88
Citations - 1680
Tarek S. Abdelrahman is an academic researcher at the University of Toronto. The author has contributed to research on topics including shared memory and compilers. The author has an h-index of 18 and has co-authored 86 publications receiving 1609 citations. Previous affiliations of Tarek S. Abdelrahman include the University of Iowa and the University of Michigan.
Papers
Journal Article
hiCUDA: High-Level GPGPU Programming
TL;DR: This work presents hiCUDA, a high-level directive-based language for CUDA programming that lets programmers perform tedious tasks in a simpler manner and directly in the sequential code, thus speeding up the porting process.
Proceedings Article
Reducing branch divergence in GPU programs
TL;DR: This work proposes two novel software-based optimizations, called iteration delaying and branch distribution, that aim to reduce branch divergence, and shows that they improve the performance of the synthetic benchmarks and that of the real-world application by 12% and 16%, respectively.
Proceedings Article
hiCUDA: a high-level directive-based language for GPU programming
TL;DR: hiCUDA is a high-level directive-based language for CUDA programming that allows programmers to perform data transfer between the host memory and various components of the GPU memory.
Journal Article
Fusion of loops for parallelism and locality
TL;DR: In this article, the authors present new techniques that allow fusion of loop nests in the presence of fusion-preventing dependences, maintain parallelism, and allow the parallel execution of fused loops with minimal synchronization.
Proceedings Article
The NUMAchine multiprocessor
R. Grindley, Tarek S. Abdelrahman, Stephen J. Brown, S. Caranci, D. DeVries, B. Gamsa, A. Grbic, Mitch Gusat, R. Ho, Orran Krieger, Guy G.F. Lemieux, Kelvin Loveless, N. Manjikian, P. McHardy, Sinisa Srbljic, Michael Stumm, Zvonko G. Vranesic, Zeljko Zilic
TL;DR: The design choices and the resulting performance of the NUMAchine multiprocessor system are documented using both simulation results and measurements on the prototype hardware.