Bharat Kaul
Researcher at Intel
Publications: 46
Citations: 1881
Bharat Kaul is an academic researcher at Intel. His work has contributed to research topics including deep learning and floating-point arithmetic. He has an h-index of 17 and has co-authored 43 publications receiving 1,351 citations.
Papers
Proceedings ArticleDOI
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training
Eric Qin, Ananda Samajdar, Hyoukjun Kwon, Vineet Nadella, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul, Tushar Krishna
TL;DR: SIGMA is proposed, a flexible and scalable architecture that offers high utilization of all its processing elements (PEs) regardless of kernel shape and sparsity, and includes a novel reduction tree microarchitecture named Forwarding Adder Network (FAN).
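To make the idea of an irregular, sparsity-aware GEMM concrete, here is a minimal software sketch (not from the paper, and unrelated to SIGMA's actual hardware microarchitecture): it multiplies a sparse matrix, stored as a dictionary of nonzero coordinates, by a dense matrix, so that work is proportional to the number of nonzeros rather than to the full kernel shape. The function name and storage format are illustrative choices, not anything defined by SIGMA.

```python
def sparse_dense_gemm(a_nonzeros, b, m, n):
    """Multiply a sparse m x k matrix by a dense k x n matrix.

    a_nonzeros: dict mapping (row, col) -> value, holding only nonzero entries.
    b: dense matrix as a list of k rows, each a list of n floats.
    Returns the dense m x n product as a list of lists.
    """
    c = [[0.0] * n for _ in range(m)]
    # Iterate only over nonzeros of A, skipping all zero entries entirely.
    for (i, k), v in a_nonzeros.items():
        for j in range(n):
            c[i][j] += v * b[k][j]
    return c
```

A hardware accelerator like SIGMA maps this kind of irregular work onto a fixed array of processing elements via flexible interconnects; the software loop above only illustrates why skipping zeros changes the work distribution.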
Proceedings ArticleDOI
ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks
Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya V. Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, Anand Raghunathan
TL;DR: SCALEDEEP is a dense, scalable server architecture, whose processing, memory and interconnect subsystems are specialized to leverage the compute and communication characteristics of DNNs, and primarily targets DNN training, as opposed to only inference or evaluation.
Book ChapterDOI
Out-of-Distribution Detection Using an Ensemble of Self Supervised Leave-Out Classifiers
TL;DR: The authors propose an ensemble of self-supervised leave-out classifiers to detect out-of-distribution (OOD) inputs, trained with a margin-based loss over the softmax output that seeks to maintain at least a margin m between the average entropies of OOD and in-distribution samples, in conjunction with the standard cross-entropy loss.
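The margin term described above can be sketched in a few lines of plain Python. This is a simplified, assumed reading of the summary (not the paper's exact formulation): the loss is zero once the average softmax entropy of OOD samples exceeds that of in-distribution samples by at least the margin m, and grows linearly otherwise. The function names are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one logit vector.
    mx = max(logits)
    exps = [math.exp(v - mx) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    # Shannon entropy in nats; skip zero-probability terms.
    return -sum(p * math.log(p) for p in probs if p > 0)

def margin_entropy_loss(in_logits_batch, ood_logits_batch, m=0.4):
    # Average softmax entropy over each batch, then apply a hinge:
    # the loss vanishes when H(OOD) - H(in-dist) >= m.
    h_in = sum(entropy(softmax(l)) for l in in_logits_batch) / len(in_logits_batch)
    h_ood = sum(entropy(softmax(l)) for l in ood_logits_batch) / len(ood_logits_batch)
    return max(0.0, m + h_in - h_ood)
```

In training this term would be added to the standard cross-entropy loss; confident in-distribution predictions (low entropy) and near-uniform OOD predictions (high entropy) drive it to zero.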
Posted Content
A Study of BFLOAT16 for Deep Learning Training
Dhiraj D. Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey
TL;DR: The results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors in the same number of iterations and with no changes to hyper-parameters.
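What makes BFLOAT16 attractive for training is that it is simply the top 16 bits of an IEEE 754 binary32 value: the same sign bit and 8-bit exponent (so the same dynamic range as FP32), with the mantissa truncated from 23 bits to 7. A minimal sketch of that relationship, using truncation for simplicity (hardware typically uses round-to-nearest-even, which this sketch does not implement):

```python
import struct

def fp32_to_bfloat16_bits(x: float) -> int:
    # Reinterpret the float as its 32 raw bits, then keep only the top 16.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def bfloat16_to_fp32(bits16: int) -> float:
    # Place the 16 bits back in the high half; the low mantissa bits are zero.
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]
```

Values like 1.0 or 1.5 round-trip exactly; an arbitrary value such as pi loses only its low mantissa bits, giving roughly 2-3 decimal digits of precision while the exponent range matches FP32 — which is why, per the summary, training converges in the same number of iterations without hyper-parameter changes.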
Posted Content
Distributed Deep Learning Using Synchronous Stochastic Gradient Descent
Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, Karthikeyan Vaidynathan, Srinivas Sridharan, Dhiraj D. Kalamkar, Bharat Kaul, Pradeep Dubey
TL;DR: A distributed multi-node synchronous SGD algorithm is designed and implemented without altering hyper-parameters, compressing data, or changing algorithmic behavior; the generality of the approach is demonstrated via best-in-class 6.5X scaling for a 7-layer DNN on 16 nodes.
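The core of synchronous data-parallel SGD is easy to state: each node computes gradients on its own shard of the mini-batch, an all-reduce averages the gradients across nodes, and every node applies the identical averaged update. Here is a minimal single-process simulation of that step (illustrative only — the paper's contribution lies in the efficient multi-node communication, which a real system would perform with an all-reduce collective such as MPI_Allreduce; the function names here are hypothetical):

```python
def allreduce_mean(grads_per_node):
    # Average corresponding gradient entries across all nodes,
    # simulating the result an all-reduce collective would deliver.
    n = len(grads_per_node)
    dim = len(grads_per_node[0])
    return [sum(g[i] for g in grads_per_node) / n for i in range(dim)]

def sync_sgd_step(weights, grads_per_node, lr=0.1):
    # Every node applies the same averaged gradient, so all replicas
    # of the model stay bit-identical after each step.
    g = allreduce_mean(grads_per_node)
    return [w - lr * gi for w, gi in zip(weights, g)]
```

Because the averaged gradient equals the gradient of the full mini-batch, this scheme matches single-node SGD step-for-step, which is why no hyper-parameter or algorithmic changes are needed.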