Hardik Sharma

Researcher at Georgia Institute of Technology

Publications: 19
Citations: 1325

Hardik Sharma is an academic researcher from the Georgia Institute of Technology. The author has contributed to research in topics including Speedup and Microarchitecture, has an h-index of 9, and has co-authored 15 publications receiving 962 citations.

Papers
Proceedings Article

Bit fusion: bit-level dynamically composable architecture for accelerating deep neural networks

TL;DR: This work designs Bit Fusion, a bit-flexible accelerator comprising an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers, and compares it to two state-of-the-art DNN accelerators, Eyeriss and Stripes.
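
As a rough illustration of the bit-level composability idea, the Python sketch below assembles a wide unsigned multiply out of 2-bit partial products, mirroring how small bit-level processing elements could fuse into a higher-bitwidth multiplier. The function names and the 2-bit granularity are illustrative assumptions, not the paper's actual microarchitecture.

```python
# Minimal sketch of bit-level composability: a wide unsigned multiply is
# built from 2-bit x 2-bit partial products, the way fused bit-level
# processing elements could compose into a larger multiplier.
# All names here are hypothetical, not from the Bit Fusion design.

def to_2bit_digits(x: int, num_digits: int) -> list[int]:
    """Split x into little-endian base-4 (2-bit) digits."""
    return [(x >> (2 * i)) & 0b11 for i in range(num_digits)]

def fused_multiply(a: int, b: int, bitwidth: int = 8) -> int:
    """Multiply two `bitwidth`-bit unsigned values using only 2-bit multipliers."""
    n = bitwidth // 2
    a_digits = to_2bit_digits(a, n)
    b_digits = to_2bit_digits(b, n)
    acc = 0
    for i, ad in enumerate(a_digits):
        for j, bd in enumerate(b_digits):
            # Each 2-bit x 2-bit product is shifted by its digits' significance
            # and accumulated -- the "fusion" of small PEs into a wide multiplier.
            acc += (ad * bd) << (2 * (i + j))
    return acc

assert fused_multiply(173, 201) == 173 * 201            # 8-bit operands
assert fused_multiply(9, 13, bitwidth=4) == 9 * 13      # a lower-bitwidth layer uses fewer PEs
```

A lower-precision layer simply engages fewer 2-bit elements per product, which is the source of the bitwidth flexibility the summary describes.
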
Proceedings Article

From high-level deep neural models to FPGAs

TL;DR: This work devises DnnWeaver, a framework that automatically generates a synthesizable accelerator for a given (DNN, FPGA) pair from a high-level Caffe specification, tailoring the design to the needs of the DNN while providing high performance and efficiency gains on the target FPGA.
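
To make the high-level-specification-to-accelerator flow concrete, here is a hypothetical sketch of the kind of step such a framework automates: walking a Caffe-like, layer-by-layer DNN description and choosing parameters for an accelerator template under FPGA resource budgets. The spec format, names, and sizing heuristic are invented for illustration and are not DnnWeaver's API.

```python
# Hypothetical sketch of a DnnWeaver-style flow: read a layer-by-layer DNN
# description and pick template parameters for a hand-optimized accelerator
# template, subject to FPGA resource limits. Heuristics are invented.

dnn_spec = [  # a Caffe-like, layer-by-layer description
    {"type": "conv", "in_ch": 3,   "out_ch": 64,  "kernel": 3},
    {"type": "conv", "in_ch": 64,  "out_ch": 128, "kernel": 3},
    {"type": "fc",   "in_ch": 128, "out_ch": 10},
]

def size_accelerator(spec, dsp_budget=220, bram_kb=512):
    """Pick template knobs (PE count, buffer size) under FPGA resource limits."""
    widest = max(layer["out_ch"] for layer in spec)
    num_pes = min(dsp_budget, widest)      # one MAC lane per output channel, capped
    buffer_kb = min(bram_kb, 4 * widest)   # toy on-chip buffer sizing rule
    return {"num_pes": num_pes, "buffer_kb": buffer_kb}

print(size_accelerator(dnn_spec))  # -> {'num_pes': 128, 'buffer_kb': 512}
```
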
Proceedings Article

TABLA: A unified template-based framework for accelerating statistical machine learning

TL;DR: TABLA provides a template-based framework that generates accelerators for the class of statistical machine learning algorithms trainable with stochastic gradient descent, and rigorously compares the benefits of FPGA acceleration to multi-core CPUs and many-core GPUs using real hardware measurements.
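
A minimal sketch of the abstraction this kind of template-based framework relies on: many statistical ML algorithms reduce to stochastic gradient descent, so specifying only the gradient of the objective (here, logistic regression) is enough to drive a shared, fixed training loop. The Python stand-in below is hypothetical and is not TABLA's specification language.

```python
# Illustrative sketch: the SGD loop is the fixed "template" shared across
# algorithms; only the gradient function changes per algorithm.
import numpy as np

def logistic_gradient(w, x, y):
    """Gradient of the logistic loss for one (x, y) sample."""
    return (1.0 / (1.0 + np.exp(-w @ x)) - y) * x

def sgd(gradient, samples, dim, lr=0.1, epochs=20):
    """Generic SGD loop: the reusable template an accelerator would implement."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y in samples:
            w -= lr * gradient(w, x, y)
    return w

rng = np.random.default_rng(0)
xs = rng.normal(size=(100, 2))
ys = (xs @ np.array([1.5, -2.0]) > 0).astype(float)
w = sgd(logistic_gradient, list(zip(xs, ys)), dim=2)
print(w)  # learned weights separating the two classes
```

Swapping in a different gradient (linear regression, SVM, and so on) reuses the same loop, which is what lets one hardware template serve a whole class of learners.
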
Proceedings Article

Neural acceleration for GPU throughput processors

TL;DR: This paper introduces a low-overhead neurally accelerated architecture for GPUs, called NGPU, that enables scalable integration of neural accelerators for a large number of GPU cores, and devises a mechanism that controls the tradeoff between result quality and the benefits of neural acceleration.
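
The following toy sketch conveys the neural-acceleration idea and its quality knob: an expensive code region is replaced by a cheap learned surrogate, and an invocation-rate parameter controls how often the precise original still runs. The names, the polynomial "model", and the software-only setting are illustrative assumptions, not NGPU's hardware mechanism.

```python
# Toy sketch of neural acceleration with a quality knob. A real system would
# use a small trained MLP in hardware; here a crude polynomial stands in.
import math
import random

def precise(x):                      # the original, expensive code region
    return math.sin(x) * math.exp(-x * x)

def approx(x):                       # stand-in for a small trained surrogate
    return x - x ** 3 / 6 if abs(x) < 1 else 0.0

def neurally_accelerated(x, invoke_rate=0.8):
    """Run the surrogate for `invoke_rate` of calls; the rest fall back to
    precise execution, trading result quality for speedup."""
    return approx(x) if random.random() < invoke_rate else precise(x)

random.seed(0)
errs = [abs(neurally_accelerated(x / 50) - precise(x / 50)) for x in range(-50, 51)]
print(f"mean error at 80% invocation rate: {sum(errs) / len(errs):.4f}")
```

Raising the invocation rate increases speedup (more calls take the cheap path) at the cost of output quality, which is the tradeoff the summary's control mechanism manages.
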
Proceedings Article

Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks

TL;DR: This paper defines Planaria, a microarchitectural capability that enables a DNN accelerator to dynamically fission (break) at runtime into multiple smaller yet full-fledged DNN engines, spatially co-locating multiple DNN inference services on the same hardware and offering simultaneous multi-tenant DNN acceleration.
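
As a software analogy for dynamic fission (not Planaria's actual microarchitecture), the sketch below models the accelerator as a pool of identical pods that a runtime partitions into smaller full-fledged engines, one per co-located inference task. The pod count and the proportional allocation policy are invented for illustration.

```python
# Hypothetical sketch of dynamic architecture fission: split a pool of
# identical pods among co-located DNN inference tasks, giving each task
# its own smaller but complete engine. The policy below is invented.

TOTAL_PODS = 16

def fission(tasks):
    """Split the pod pool among tasks roughly in proportion to their demand."""
    demand = sum(t["pods_wanted"] for t in tasks)
    engines, remaining = [], TOTAL_PODS
    for t in sorted(tasks, key=lambda t: -t["pods_wanted"]):
        share = min(remaining, max(1, TOTAL_PODS * t["pods_wanted"] // demand))
        engines.append({"task": t["name"], "pods": share})
        remaining -= share
    return engines

tasks = [
    {"name": "resnet50",  "pods_wanted": 8},
    {"name": "mobilenet", "pods_wanted": 2},
    {"name": "bert-tiny", "pods_wanted": 4},
]
print(fission(tasks))
# -> resnet50 gets 9 pods, bert-tiny 4, mobilenet 2; one pod stays idle
```

Because each partition is a full engine rather than a slice of shared state, tasks run side by side without time-multiplexing, which is the spatial multi-tenancy the summary describes.
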