In-Datacenter Performance Analysis of a Tensor Processing Unit

doi:10.1145/3079856.3080246

Open Access, Proceedings Article

- Vol. 45, Iss: 2, pp 1-12

TLDR

The Tensor Processing Unit (TPU) is a custom ASIC, deployed in datacenters since 2015, that accelerates the inference phase of neural networks (NN). Its core is a matrix multiply unit of 65,536 8-bit MACs that offers a peak throughput of 92 TeraOps/second (TOPS).

Abstract:

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X to 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X to 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
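The 92 TOPS peak figure can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the MAC count comes from the abstract, while the 700 MHz clock rate and the convention of counting each multiply-accumulate as two operations are assumptions not stated in the abstract. The function name `peak_tops` is hypothetical.

```python
import math

MAC_UNITS = 65_536   # from the abstract: number of 8-bit MACs
OPS_PER_MAC = 2      # ASSUMPTION: one multiply plus one add per MAC per cycle
CLOCK_HZ = 700e6     # ASSUMPTION: 700 MHz clock, not given in the abstract


def peak_tops(mac_units: int, clock_hz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Return peak throughput in tera-operations per second (TOPS)."""
    return mac_units * ops_per_mac * clock_hz / 1e12


# 65,536 is a perfect square, consistent with a square MAC array.
array_side = math.isqrt(MAC_UNITS)

peak = peak_tops(MAC_UNITS, CLOCK_HZ)
print(f"array: {array_side} x {array_side} MACs")
print(f"peak throughput: {peak:.2f} TOPS")
```

Under these assumptions the result is just under 92 TOPS, which matches the rounded figure quoted in the abstract. Peak throughput is an upper bound; the abstract itself notes low utilization for some applications.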
