
Trinayan Baruah

Researcher at Northeastern University

Publications: 13
Citations: 186

Trinayan Baruah is an academic researcher from Northeastern University. The author has contributed to research on topics including memory hierarchy and instruction set architecture. The author has an h-index of 5 and has co-authored 10 publications receiving 113 citations.

Papers
Proceedings Article

MGPUSim: enabling multi-GPU performance modeling and optimization

TL;DR: This work presents MGPUSim, a cycle-accurate, extensively validated multi-GPU simulator based on AMD's Graphics Core Next 3 (GCN3) instruction set architecture, and proposes the Locality API, an API extension that allows the GPU programmer to avoid the complexity of multi-GPU programming while precisely controlling data placement in multi-GPU memory.
Proceedings Article

Evaluating Performance Tradeoffs on the Radeon Open Compute Platform

TL;DR: This work compares the performance of different programming frameworks, including OpenCL, HC++, and HIP, on both integrated APUs and discrete GPUs using the Hetero-Mark and DNNMark benchmark suites, and provides guidance on best practices for programmers developing applications that leverage the ROC platform.
Proceedings Article

GNNMark: A Benchmark Suite to Characterize Graph Neural Network Training on GPUs

TL;DR: GNNMark is presented, a feature-rich benchmark suite that covers the diversity present in GNN training workloads, datasets, and GNN frameworks. The suite utilizes a variety of graph-based data structures, including the homogeneous, dynamic, and heterogeneous graphs commonly used across a number of application domains.
Proceedings Article

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems

TL;DR: Griffin introduces programmer-transparent modifications to both the IOMMU and the GPU architecture, supporting efficient runtime page migration based on locality information. It employs a novel mechanism to detect and move pages between GPUs at runtime, increasing the frequency with which accesses are resolved locally and thereby improving performance.
Proceedings Article

Characterizing the Microarchitectural Implications of a Convolutional Neural Network (CNN) Execution on GPUs

TL;DR: This paper analyzes the performance implications of a CNN model on a layer-by-layer basis using microarchitectural details, and characterizes memory access behavior in the context of a typical GPU memory hierarchy, considering the hardware resource utilization associated with each primitive in the CNN model.