
Bharat Kaul

Researcher at Intel

Publications - 46
Citations - 1881

Bharat Kaul is an academic researcher from Intel. The author has contributed to research in topics: Deep learning & Floating point. The author has an h-index of 17 and has co-authored 43 publications receiving 1,351 citations.

Papers
Proceedings Article

SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training

TL;DR: The authors propose SIGMA, a flexible and scalable accelerator architecture that achieves high utilization of all its processing elements (PEs) regardless of kernel shape and sparsity, and introduce a novel reduction tree microarchitecture named Forwarding Adder Network (FAN).
Proceedings Article

ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks

TL;DR: ScaleDeep is a dense, scalable server architecture whose processing, memory, and interconnect subsystems are specialized to leverage the compute and communication characteristics of DNNs; it primarily targets DNN training rather than only inference or evaluation.
Book Chapter

Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-Out Classifiers

TL;DR: The authors propose an ensemble of self-supervised leave-out classifiers to detect out-of-distribution (OOD) inputs, trained with a margin-based loss over the softmax output that seeks to maintain at least a margin m between the average entropy of OOD and in-distribution samples, in conjunction with the standard cross-entropy loss.
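A minimal sketch of such a margin-based entropy loss, assuming PyTorch; the tensor names (logits_in, logits_ood) and the margin value are hypothetical placeholders, and the paper's leave-out classifier ensemble and training procedure are not reproduced here.

```python
import torch
import torch.nn.functional as F

def mean_entropy(logits):
    # Average softmax entropy over a batch of logits.
    p = F.softmax(logits, dim=1)
    log_p = F.log_softmax(logits, dim=1)
    return -(p * log_p).sum(dim=1).mean()

def ood_margin_loss(logits_in, labels_in, logits_ood, margin_m=0.4):
    # Standard cross-entropy on in-distribution samples ...
    ce = F.cross_entropy(logits_in, labels_in)
    # ... plus a hinge term that pushes OOD entropy to exceed
    # in-distribution entropy by at least margin_m.
    hinge = F.relu(margin_m + mean_entropy(logits_in) - mean_entropy(logits_ood))
    return ce + hinge
```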
Posted Content

A Study of BFLOAT16 for Deep Learning Training

TL;DR: The results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors, in the same number of iterations and with no changes to hyperparameters.
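BFLOAT16 keeps FP32's 8-bit exponent but truncates the mantissa to 7 bits, so a BF16 value is effectively the upper 16 bits of the corresponding FP32 word. A minimal NumPy sketch of round-to-nearest-even conversion, not the paper's actual implementation:

```python
import numpy as np

def fp32_to_bf16_bits(x):
    # View FP32 values as raw 32-bit integers.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Round to nearest even before dropping the low 16 mantissa bits.
    rounding_bias = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    return ((bits + rounding_bias) >> 16).astype(np.uint16)

def bf16_bits_to_fp32(b):
    # Re-expand BF16 bit patterns to FP32 by zero-filling the low 16 bits.
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159, -0.001, 1e10], dtype=np.float32)
print(bf16_bits_to_fp32(fp32_to_bf16_bits(x)))  # values rounded to BF16 precision
```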
Posted Content

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

TL;DR: A distributed multi-node synchronous SGD algorithm is designed and implemented without altering hyperparameters, compressing data, or changing algorithmic behavior; the generality of the approach is demonstrated via best-in-class 6.5x scaling for a 7-layer DNN on 16 nodes.
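The core of data-parallel synchronous SGD is an all-reduce of gradients across workers every step, so each node applies the identical averaged update. A minimal sketch assuming PyTorch's torch.distributed with an already-initialized process group; model, batch, loss_fn, and optimizer are placeholders, not the paper's implementation:

```python
import torch
import torch.distributed as dist

def synchronous_sgd_step(model, batch, loss_fn, optimizer):
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Sum gradients across all workers, then average, so every node
    # takes the same step (synchronous data-parallel SGD).
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    return loss.item()
```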