Nianzheng Cao
Researcher at IBM
Publications - 10
Citations - 326
Nianzheng Cao is an academic researcher at IBM whose research topics include low-power electronics and dataflow architecture. The author has an h-index of 5 and has co-authored 10 publications receiving 142 citations.
Papers
Proceedings Article
A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference
Bruce M. Fleischer, Sunil Shukla, Matthew M. Ziegler, Joel Abraham Silberman, Jinwook Oh, Vijayalakshmi Srinivasan, Jungwook Choi, Silvia Melitta Mueller, Ankur Agrawal, Tina Babinsky, Nianzheng Cao, Chia-Yu Chen, Pierce Chuang, Thomas W. Fox, George D. Gristede, Michael A. Guillorn, Howard M. Haynie, Michael J. Klaiber, Dongsoo Lee, Shih-Hsien Lo, Gary W. Maier, Michael R. Scheuermann, Swagath Venkataramani, Christos Vezyrtzis, Naigang Wang, Fanchieh Yee, Ching Zhou, Pong-Fei Lu, Brian W. Curran, Leland Chang, Kailash Gopalakrishnan
TL;DR: A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers by employing a dataflow architecture and an on-chip scratchpad hierarchy.
Proceedings Article
A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling
Ankur Agrawal, Sae Kyu Lee, Joel Abraham Silberman, Matthew M. Ziegler, Mingu Kang, Swagath Venkataramani, Nianzheng Cao, Bruce M. Fleischer, Michael A. Guillorn, Matthew Cohen, Silvia Melitta Mueller, Jinwook Oh, Martin Lutz, Jinwook Jung, Siyu Koswatta, Ching Zhou, Vidhi Zalani, James J. Bonanno, Robert Casatuta, Chia-Yu Chen, Jungwook Choi, Howard M. Haynie, Alyssa Herbert, Radhika Jain, Monodeep Kar, Kyu-hyoun Kim, Li Yulong, Zhibin Ren, Scot H. Rider, Marcel Schaal, Kerstin Schelm, Michael R. Scheuermann, Xiao Sun, Hung Tran, Naigang Wang, Wei Wang, Xin Zhang, Vinay Velji Shah, Brian W. Curran, Vijayalakshmi Srinivasan, Pong-Fei Lu, Sunil Shukla, Leland Chang, Kailash Gopalakrishnan
TL;DR: A 4-core AI chip in 7nm EUV technology is presented that exploits cutting-edge algorithmic advances for iso-accurate models in low-precision training and inference to achieve leading-edge power-performance.
Proceedings Article
RaPiD: AI accelerator for ultra-low precision training and inference
Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Wang, Sanchari Sen, Jintao Zhang, Ankur Agrawal, Monodeep Kar, Shubham Jain, Alberto Mannari, Hoang Tran, Li Yulong, Eri Ogawa, Kazuaki Ishizaki, Hiroshi Inoue, Marcel Schaal, Mauricio J. Serrano, Jungwook Choi, Xiao Sun, Naigang Wang, Chia-Yu Chen, Allison Allain, James Bonanno, Nianzheng Cao, Robert Casatuta, Matthew Cohen, Bruce M. Fleischer, Michael A. Guillorn, Howard M. Haynie, Jinwook Jung, Mingu Kang, Kyu-hyoun Kim, Siyu Koswatta, Sae Kyu Lee, Martin Lutz, Silvia Melitta Mueller, Jinwook Oh, Ashish Ranjan, Zhibin Ren, Scot H. Rider, Kerstin Schelm, Michael R. Scheuermann, Joel Abraham Silberman, Jie Yang, Vidhi Zalani, Xin Zhang, Ching Zhou, Matt Ziegler, Vinay Velji Shah, Moriyoshi Ohara, Pong-Fei Lu, Brian W. Curran, Sunil Shukla, Leland Chang, Kailash Gopalakrishnan
TL;DR: RaPiD is a 4-core AI accelerator chip supporting a spectrum of precisions, namely 16- and 8-bit floating-point and 4- and 2-bit fixed-point.
Proceedings Article
A 45nm SOI embedded DRAM macro for POWER7™ 32MB on-chip L3 cache
John E. Barth, Don Plass, Erik A. Nelson, Charlie Hwang, Gregory J. Fredeman, Michael A. Sperling, Abraham Mathews, William Robert Reohr, Kavita Nair, Nianzheng Cao
TL;DR: This high performance DRAM macro is used to construct a large 32MB L3 cache on-chip, eliminating delay, area and power from the off-chip interface, simultaneously improving system performance, reducing cost, power and soft error vulnerability.
Journal Article
Efficient AI System Design With Cross-Layer Approximate Computing
Swagath Venkataramani, Xiao Sun, Naigang Wang, Chia-Yu Chen, Jungwook Choi, Mingu Kang, Ankur Agrawal, Jinwook Oh, Shubham Jain, Tina Babinsky, Nianzheng Cao, Thomas W. Fox, Bruce M. Fleischer, George D. Gristede, Michael A. Guillorn, Howard M. Haynie, Hiroshi Inoue, Kazuaki Ishizaki, Michael J. Klaiber, Shih-Hsien Lo, Gary W. Maier, Silvia Melitta Mueller, Michael R. Scheuermann, Eri Ogawa, Marcel Schaal, Mauricio J. Serrano, Joel Abraham Silberman, Christos Vezyrtzis, Wei Wang, Fanchieh Yee, Jintao Zhang, Matthew M. Ziegler, Ching Zhou, Moriyoshi Ohara, Pong-Fei Lu, Brian W. Curran, Sunil Shukla, Vijayalakshmi Srinivasan, Leland Chang, Kailash Gopalakrishnan
TL;DR: RaPiD, a multi-tera operations per second (TOPS) AI hardware accelerator core built from the ground up using approximate computing (AxC) techniques across the stack, including algorithms, architecture, programmability, and hardware, is presented.