Nianzheng Cao

Researcher at IBM

Publications - 10

Citations - 326

Nianzheng Cao is an academic researcher from IBM. The author has contributed to research in topics: Low-power electronics & Dataflow architecture. The author has an h-index of 5 and has co-authored 10 publications receiving 142 citations.

Papers

Proceedings Article

A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference

Bruce M. Fleischer, +30 more

TL;DR: A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers by employing a dataflow architecture and an on-chip scratchpad hierarchy.

Proceedings Article

A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling

Ankur Agrawal, +43 more

TL;DR: In this article, a 4-core AI chip in 7nm EUV technology is presented to exploit cutting-edge algorithmic advances for iso-accurate models in low-precision training and inference to achieve leading-edge power-performance.

Proceedings Article

RaPiD: AI accelerator for ultra-low precision training and inference

Swagath Venkataramani, +53 more

TL;DR: RaPiD is a 4-core AI accelerator chip supporting a spectrum of precisions, namely 16- and 8-bit floating-point and 4- and 2-bit fixed-point.

Proceedings Article

A 45nm SOI embedded DRAM macro for POWER7™ 32MB on-chip L3 cache

John E. Barth, +9 more

TL;DR: This high performance DRAM macro is used to construct a large 32MB L3 cache on-chip, eliminating delay, area and power from the off-chip interface, simultaneously improving system performance, reducing cost, power and soft error vulnerability.

Journal Article

Efficient AI System Design With Cross-Layer Approximate Computing

Swagath Venkataramani, +39 more

TL;DR: RaPiD, a multi-tera operations per second (TOPS) AI hardware accelerator core built from the ground up using approximate computing (AxC) techniques across the stack, including algorithms, architecture, programmability, and hardware, is presented.
