
Rangharajan Venkatesan

Researcher at Nvidia

Publications - 54
Citations - 3138

Rangharajan Venkatesan is an academic researcher at Nvidia. He has contributed to research topics including cache design and energy efficiency. He has an h-index of 20 and has co-authored 50 publications receiving 2,127 citations. His previous affiliations include Purdue University.

Papers
Proceedings ArticleDOI

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

TL;DR: The Sparse CNN (SCNN) accelerator as discussed by the authors employs a dataflow that enables maintaining the sparse weights and activations in a compressed encoding, which eliminates unnecessary data transfers and reduces storage requirements.
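A minimal sketch (not the paper's actual data structures) of what a compressed-sparse encoding can look like: only nonzero values and their positions are stored, which shrinks both storage and the data that must be moved. The function names and format are hypothetical.

```python
import numpy as np

def compress(dense):
    """Store only nonzero values and their flat indices
    (hypothetical encoding, loosely inspired by compressed-sparse formats)."""
    flat = dense.ravel()
    nz = np.flatnonzero(flat)
    return nz, flat[nz], dense.shape

def decompress(indices, values, shape):
    """Rebuild the dense tensor from the compressed representation."""
    flat = np.zeros(np.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

# ReLU activations are often mostly zero, so the compressed form is much smaller.
acts = np.maximum(np.random.randn(4, 8), 0)
idx, vals, shape = compress(acts)
assert np.allclose(decompress(idx, vals, shape), acts)
print(f"dense elements: {acts.size}, stored nonzeros: {vals.size}")
```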
Proceedings ArticleDOI

Timeloop: A Systematic Approach to DNN Accelerator Evaluation

TL;DR: The paper describes Timeloop's underlying models and algorithms in detail and presents results from case studies enabled by Timeloop, which reveal that co-design of dataflow and memory hierarchy plays a critical role in optimizing energy efficiency.
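A rough, hypothetical sketch of the kind of analytical cost model such a tool evaluates: given per-access energies for each memory level and the access counts implied by a mapping, total energy is the sum of counts times per-access energy. The energy numbers and mappings below are invented purely for illustration.

```python
# Hypothetical per-access energy (pJ) for a three-level memory hierarchy.
ENERGY_PJ = {"DRAM": 200.0, "global_buffer": 6.0, "register_file": 0.5}

def mapping_energy(access_counts):
    """Total energy = sum over levels of (accesses at level * energy per access).
    The access counts would come from analyzing a specific loop-nest mapping."""
    return sum(access_counts[level] * ENERGY_PJ[level] for level in access_counts)

# Two hypothetical mappings of the same layer: better reuse at inner levels
# shifts accesses away from DRAM and lowers total energy.
mapping_a = {"DRAM": 1e6, "global_buffer": 1e7, "register_file": 1e8}
mapping_b = {"DRAM": 2e5, "global_buffer": 2e7, "register_file": 1e8}
print(mapping_energy(mapping_a), mapping_energy(mapping_b))
```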
Proceedings ArticleDOI

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture

TL;DR: This work investigates and quantifies the costs and benefits of using MCMs with fine-grained chiplets for deep learning inference, an application area with large compute and on-chip storage requirements, and introduces three tiling optimizations that improve data locality.
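An illustrative (not Simba-specific) example of tiling for data locality: a matrix multiply is split into blocks so each block of operands is reused for a full block of output before being evicted from fast local storage. The tile size here is arbitrary.

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked matrix multiply: each (tile x tile) block of A and B is reused
    across a block of C before moving on, improving data locality."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C

A = np.random.randn(64, 96)
B = np.random.randn(96, 128)
assert np.allclose(tiled_matmul(A, B), A @ B)
```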
Proceedings ArticleDOI

MACACO: modeling and analysis of circuits for approximate computing

TL;DR: The results show that MACACO can help a designer to systematically evaluate the impact of approximate circuits, and to choose between different approximate implementations, thereby facilitating the adoption of such circuits for approximate computing.
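A toy sketch of the kind of analysis such a framework automates: exhaustively comparing an approximate adder against exact addition to obtain error statistics. The truncating adder below is a made-up stand-in for a real approximate circuit.

```python
from itertools import product

def approx_add(a, b, drop_bits=2):
    """Hypothetical approximate adder: ignore the low-order bits of both inputs."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) + (b & mask)

def error_stats(width=6, drop_bits=2):
    """Exhaustively compare the approximate adder with exact addition and
    report error rate and mean absolute error over all input pairs."""
    errors = [abs((a + b) - approx_add(a, b, drop_bits))
              for a, b in product(range(1 << width), repeat=2)]
    nonzero = sum(e > 0 for e in errors)
    return nonzero / len(errors), sum(errors) / len(errors)

rate, mae = error_stats()
print(f"error rate: {rate:.2%}, mean absolute error: {mae:.2f}")
```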
Posted Content

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

TL;DR: The Sparse CNN (SCNN) accelerator architecture is introduced, which improves performance and energy efficiency by exploiting the zero-valued weights that stem from network pruning during training and the zero-valued activations that arise from the common ReLU operator.
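A simplified illustration of the zero-skipping idea: only nonzero weights are multiplied with nonzero activations, so work scales with the product of the nonzero counts rather than the dense sizes. This is a sketch, not the accelerator's actual dataflow, and the output-index handling is deliberately simplified.

```python
import numpy as np

def sparse_products(weights, activations):
    """Multiply every nonzero weight by every nonzero activation, skipping zeros.
    Work is proportional to nnz(weights) * nnz(activations); a real accelerator
    would also scatter each partial product to the correct output accumulator."""
    w_nz = [(i, w) for i, w in enumerate(weights) if w != 0]
    a_nz = [(j, a) for j, a in enumerate(activations) if a != 0]
    return [((i, j), w * a) for i, w in w_nz for j, a in a_nz]

weights = np.array([0.0, 0.5, 0.0, -1.2])      # zeros from pruning
activations = np.array([0.0, 0.0, 3.0, 1.0])   # zeros from ReLU
print(sparse_products(weights, activations))   # only 2 * 2 = 4 multiplies
```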