
Bharat Kaul

Researcher at Intel

Publications - 46
Citations - 1881

Bharat Kaul is an academic researcher from Intel. The author has contributed to research in topics: Deep learning & Floating point. The author has an h-index of 17 and has co-authored 43 publications receiving 1,351 citations.

Papers
Proceedings Article

SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training

TL;DR: The authors propose SIGMA, a flexible and scalable accelerator architecture that achieves high utilization of all its processing elements (PEs) regardless of kernel shape and sparsity, and introduce a novel reduction tree microarchitecture named Forwarding Adder Network (FAN).
Proceedings Article

ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks

TL;DR: ScaleDeep is a dense, scalable server architecture whose processing, memory, and interconnect subsystems are specialized to leverage the compute and communication characteristics of DNNs; it primarily targets DNN training rather than only inference or evaluation.
Book Chapter

Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-Out Classifiers

TL;DR: The authors propose an ensemble of self-supervised leave-out classifiers to detect out-of-distribution (OOD) inputs, trained with a margin-based loss over the softmax output that seeks to maintain at least a margin m between the average entropy of OOD and in-distribution samples, in conjunction with the standard cross-entropy loss.
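A minimal sketch of such a margin-based entropy loss, assuming PyTorch; the tensor names (logits_in, logits_ood) and the margin value are hypothetical placeholders, and the paper's leave-out classifier ensemble and training procedure are not reproduced here.

```python
import torch
import torch.nn.functional as F

def mean_entropy(logits):
    # Average softmax entropy over a batch of logits.
    p = F.softmax(logits, dim=1)
    log_p = F.log_softmax(logits, dim=1)
    return -(p * log_p).sum(dim=1).mean()

def ood_margin_loss(logits_in, labels_in, logits_ood, margin_m=0.4):
    # Standard cross-entropy on in-distribution samples ...
    ce = F.cross_entropy(logits_in, labels_in)
    # ... plus a hinge term that pushes OOD entropy to exceed
    # in-distribution entropy by at least margin_m.
    hinge = F.relu(margin_m + mean_entropy(logits_in) - mean_entropy(logits_ood))
    return ce + hinge
```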
Posted Content

A Study of BFLOAT16 for Deep Learning Training

TL;DR: The results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors, in the same number of iterations and with no changes to hyperparameters.
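BFLOAT16 keeps FP32's 8-bit exponent but truncates the mantissa to 7 bits, so a BF16 value is effectively the upper 16 bits of the corresponding FP32 word. A minimal NumPy sketch of round-to-nearest-even conversion, not the paper's actual implementation:

```python
import numpy as np

def fp32_to_bf16_bits(x):
    # View FP32 values as raw 32-bit integers.
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Round to nearest even before dropping the low 16 mantissa bits.
    rounding_bias = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    return ((bits + rounding_bias) >> 16).astype(np.uint16)

def bf16_bits_to_fp32(b):
    # Re-expand BF16 bit patterns to FP32 by zero-filling the low 16 bits.
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159, -0.001, 1e10], dtype=np.float32)
print(bf16_bits_to_fp32(fp32_to_bf16_bits(x)))  # values rounded to BF16 precision
```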
Posted Content

Distributed Deep Learning Using Synchronous Stochastic Gradient Descent

TL;DR: A distributed multi-node synchronous SGD algorithm is designed and implemented without altering hyperparameters, compressing data, or changing algorithmic behavior; the generality of the approach is demonstrated via best-in-class 6.5x scaling for a 7-layer DNN on 16 nodes.
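The core of data-parallel synchronous SGD is an all-reduce of gradients across workers every step, so each node applies the identical averaged update. A minimal sketch assuming PyTorch's torch.distributed with an already-initialized process group; model, batch, loss_fn, and optimizer are placeholders, not the paper's implementation:

```python
import torch
import torch.distributed as dist

def synchronous_sgd_step(model, batch, loss_fn, optimizer):
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Sum gradients across all workers, then average, so every node
    # takes the same step (synchronous data-parallel SGD).
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    return loss.item()
```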