Amir Kavyan Ziabari
Researcher at Northeastern University
Publications - 16
Citations - 380
Amir Kavyan Ziabari is an academic researcher from Northeastern University. The author has contributed to research on the memory hierarchy and networks-on-chip. The author has an h-index of 10 and has co-authored 16 publications receiving 322 citations. Previous affiliations of Amir Kavyan Ziabari include Advanced Micro Devices.
Papers
Proceedings Article
Hetero-mark, a benchmark suite for CPU-GPU collaborative computing
Yifan Sun, Xiang Gong, Amir Kavyan Ziabari, Leiming Yu, Xiangyu Li, Saoni Mukherjee, Carter McCardwell, Alejandro Villegas, David Kaeli
TL;DR: Hetero-Mark is proposed to help heterogeneous-system programmers understand CPU-GPU collaborative computing and to guide computer architects in improving the design of the runtime and the driver.
Proceedings Article
MGPUSim: enabling multi-GPU performance modeling and optimization
Yifan Sun, Trinayan Baruah, Saiful A. Mojumder, Shi Dong, Xiang Gong, Shane Treadway, Yuhui Bao, Spencer Hance, Carter McCardwell, Vincent Zhao, Harrison Barclay, Amir Kavyan Ziabari, Zhongliang Chen, Rafael Ubal, José L. Abellán, John Kim, Ajay Joshi, David Kaeli
TL;DR: This work presents MGPUSim, a cycle-accurate, extensively validated multi-GPU simulator based on AMD's Graphics Core Next 3 (GCN3) instruction set architecture. It also proposes the Locality API, an API extension that lets GPU programmers avoid the complexity of multi-GPU programming while still precisely controlling data placement in multi-GPU memory.
Proceedings Article
Asymmetric NoC Architectures for GPU Systems
TL;DR: This work explores an asymmetric NoC design tailored to a GPU's memory access pattern, providing one network for L1-to-L2 communication and a second for L2-to-L1 traffic, and shows that an asymmetric multi-network Cmesh provides the most energy-efficient communication fabric for the target GPU system.
Proceedings Article
A comprehensive performance analysis of HSA and OpenCL 2.0
TL;DR: This paper provides the first comprehensive study of OpenCL 2.0 and HSA 1.0 execution, using OpenCL 1.2 as the baseline, and finds that using HSA signals removes 92% of the overhead caused by synchronous kernel launches.
Proceedings Article
Profiling DNN Workloads on a Volta-based DGX-1 System
Saiful A. Mojumder, Marcia S. Louis, Yifan Sun, Amir Kavyan Ziabari, José L. Abellán, John Kim, David Kaeli, Ajay Joshi
TL;DR: This work profiles and analyzes the training of five popular DNNs using 1, 2, 4, and 8 GPUs. It breaks down training time into the forward- and backward-propagation (FP + BP) stage and the weight-update (WU) stage, providing insight into the limiting factors of the training algorithm and identifying bottlenecks in the multi-GPU system architecture.