Kalin Ovtcharov

Proceedings Article

A cloud-scale acceleration architecture

TL;DR: A new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications, and is much more scalable than prior work, which used secondary rack-scale networks for inter-FPGA communication.


Proceedings Article

A configurable cloud-scale DNN processor for real-time AI

Jeremy Fowers, +19 more

TL;DR: This paper describes the NPU architecture for Project Brainwave, a production-scale system for real-time AI, which achieves more than an order of magnitude improvement in latency and throughput over state-of-the-art GPUs on large RNNs at a batch size of 1.


Accelerating Deep Convolutional Neural Networks Using Specialized Hardware

Kalin Ovtcharov, +5 more

TL;DR: Hardware specialization in the form of GPGPUs, FPGAs, and ASICs offers a promising path towards major leaps in processing capability while achieving high energy efficiency, and combining multiple FPGAs over a low-latency communication fabric offers further opportunity to train and evaluate models of unprecedented size and quality.


Proceedings Article

Azure accelerated networking: SmartNICs in the public cloud

Daniel Firestone, +31 more

TL;DR: The design of AccelNet is presented, including the hardware/software codesign model, performance results on key workloads, and experiences and lessons learned from developing and deploying AccelNet on FPGA-based Azure SmartNICs.


Journal Article

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

Eric S. Chung, +42 more

20 Apr 2018

IEEE Micro

TL;DR: Project Brainwave, Microsoft's principal infrastructure for AI serving in real time, accelerates deep neural network inferencing in major services such as Bing's intelligent search features and Azure by exploiting distributed model parallelism and pinning over low-latency hardware microservices.

