Sunil Shukla
Researcher at IBM
Publications - 43
Citations - 913
Sunil Shukla is an academic researcher at IBM. The author has contributed to research on field-programmable gate arrays and reconfigurable computing. The author has an h-index of 12 and has co-authored 41 publications receiving 671 citations. Previous affiliations of Sunil Shukla include the University of Queensland and the Karlsruhe Institute of Technology.
Papers
Journal Article
A Scalable Multi-TeraOPS Core for AI Training and Inference
Sunil Shukla, Bruce M. Fleischer, Matthew M. Ziegler, Joel Abraham Silberman, Jinwook Oh, Vijayalakshmi Srinivasan, Jungwook Choi, Silvia Melitta Mueller, Ankur Agrawal, Tina Babinsky, Nianzheng Cao, Chia-Yu Chen, Pierce Chuang, Thomas W. Fox, George D. Gristede, Michael A. Guillorn, Howard M. Haynie, Michael J. Klaiber, Dongsoo Lee, Shih-Hsien Lo, Gary W. Maier, Michael R. Scheuermann, Swagath Venkataramani, Christos Vezyrtzis, Naigang Wang, Fanchieh Yee, Ching Zhou, Pong-Fei Lu, Brian W. Curran, Leland Chang, Kailash Gopalakrishnan
TL;DR: This letter presents a multi-TOPS AI accelerator core for deep learning training and inference that achieves >90% sustained utilization across a range of neural network topologies by employing a dataflow architecture to provide high throughput and an on-chip scratchpad hierarchy to meet the bandwidth demands of the compute units.
Proceedings Article
QUKU: A FPGA Based Flexible Coarse Grain Architecture Design Paradigm using Process Networks
TL;DR: A modified Kahn process network model is applied at the architectural level in QUKU to find an optimal architectural template for a coarse-grained array on a per-application basis.
Proceedings Article
A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference
Jinwook Oh, Sae Kyu Lee, Mingu Kang, Matthew M. Ziegler, Joel Abraham Silberman, Ankur Agrawal, Swagath Venkataramani, Bruce M. Fleischer, Michael A. Guillorn, Jungwook Choi, Wei Wang, Silvia Melitta Mueller, Shimon Ben-Yehuda, James J. Bonanno, Nianzheng Cao, Robert Casatuta, Chia-Yu Chen, Matthew Cohen, Erez Ophir, Thomas W. Fox, George D. Gristede, Howard M. Haynie, Vicktoria Ivanov, Siyu Koswatta, Shih-Hsien Lo, Martin Lutz, Gary W. Maier, Alex Mesh, Yevgeny Nustov, Scot H. Rider, Marcel Schaal, Michael R. Scheuermann, Xiao Sun, Naigang Wang, Fanchieh Yee, Ching Zhou, Vinay Velji Shah, Brian W. Curran, Vijayalakshmi Srinivasan, Pong-Fei Lu, Sunil Shukla, Kailash Gopalakrishnan, Leland Chang
TL;DR: A processor core for AI training and inference products is presented that achieves leading-edge compute efficiency for robust fp16 training via efficient heterogeneous 2-D systolic-array and SIMD compute engines built on compact DLFloat16 FPUs.
Journal Article
QUKU: A dual-layer reconfigurable architecture
TL;DR: The experimental results demonstrate that a dual-layered reconfigurable architecture offers significant potential benefits in flexibility, area, and processing efficiency over existing reconfigurable computing architectures for DSP.
Patent
Tightly coupled processor arrays using coarse grained reconfigurable architecture with iteration level commits
TL;DR: An apparatus and method are presented for supporting simultaneous multiple iterations (SMI) in a coarse-grained reconfigurable architecture (CGRA). The apparatus includes hardware structures that connect all of the multiple processing engines (PEs) to a load-store unit (LSU), which is configured to track which compiled program code iterations have completed and which are in flight, and a control unit with hardware structures that maintain synchronization and initiate and terminate loops within the PEs.