TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory
Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, Christos Kozyrakis
Vol. 45, Iss. 1, pp. 751-764
TL;DR
The hardware architecture and software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory, are presented, and it is shown that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations.

Abstract
The high accuracy of deep neural networks (NNs) has led to the development of NN accelerators that improve performance by two orders of magnitude. However, scaling these accelerators for higher performance with increasingly larger NNs exacerbates the cost and energy overheads of their memory systems, including the on-chip SRAM buffers and the off-chip DRAM channels. This paper presents the hardware architecture and software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory. First, we show that the high throughput and low energy characteristics of 3D memory allow us to rebalance the NN accelerator design, using more area for processing elements and less area for SRAM buffers. Second, we move portions of the NN computations close to the DRAM banks to decrease bandwidth pressure and increase performance and energy efficiency. Third, we show that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations. We present an analytical scheduling scheme that matches the efficiency of schedules derived through exhaustive search. Finally, we develop a hybrid partitioning scheme that parallelizes the NN computations over multiple accelerators. Overall, we show that TETRIS improves performance by 4.1x and reduces energy by 1.5x over NN accelerators with conventional, low-power DRAM memory systems.
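To make the scheduling claim concrete, below is a minimal sketch of what an analytical dataflow scheduler evaluates: given a convolutional layer and a candidate loop tiling, it estimates off-chip traffic in closed form so that candidate schedules can be ranked without exhaustive search. The function name, parameters, and the simplified cost model are our illustration, not the paper's implementation.

```python
# Hypothetical sketch: closed-form DRAM-traffic estimate for one conv
# layer under a candidate tiling. The cost model is deliberately simple
# and is NOT the TETRIS scheduler; it only illustrates the approach.

def dram_traffic(N, C, K, H, W, R, tc, tk, th, tw):
    """Bytes moved for a conv layer (batch N, C in / K out channels,
    H x W output, R x R filters) with tile sizes tc/tk/th/tw that
    divide their dimensions evenly."""
    n_c, n_k = C // tc, K // tk
    n_h, n_w = H // th, W // tw
    # Inputs (with halos) are re-fetched once per output-channel tile group.
    ifmap = N * n_k * C * (th + R - 1) * (tw + R - 1) * n_h * n_w
    # Weights are re-fetched once per batch/spatial tile group.
    weights = N * n_h * n_w * C * K * R * R
    # Outputs are written once; partial sums spill per input-channel tile.
    ofmap = N * K * H * W * (2 * n_c - 1)
    return 2 * (ifmap + weights + ofmap)  # 16-bit (2-byte) data

# Rank two candidate tilings for a VGG-like layer analytically.
layer = dict(N=16, C=256, K=256, H=28, W=28, R=3)
print(dram_traffic(**layer, tc=64, tk=64, th=14, tw=14))
print(dram_traffic(**layer, tc=256, tk=32, th=28, tw=28))
```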
Citations
Journal Article
Efficient Processing of Deep Neural Networks: A Tutorial and Survey
TL;DR: The authors provide a comprehensive tutorial and survey of recent advances toward enabling efficient processing of DNNs, discuss the various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of deep neural networks, either solely via hardware design changes or via joint hardware and DNN algorithm changes.
Proceedings Article
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks
Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally
TL;DR: The Sparse CNN (SCNN) accelerator employs a dataflow that maintains the sparse weights and activations in a compressed encoding, which eliminates unnecessary data transfers and reduces storage requirements.
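The core idea is easy to demonstrate in miniature: store only the nonzero operands with their coordinates, form products over pairs of nonzeros, and scatter-accumulate each product into the output location it contributes to. The 1-D toy below is our own illustration of that compressed dataflow, not SCNN's actual microarchitecture.

```python
# Toy compressed-sparse convolution (1-D, stride 1, "valid" padding).
# Illustrative only; SCNN operates on 2-D conv layers in hardware.
import numpy as np

def compress(x):
    """Return (values, coordinates) for the nonzeros of a 1-D array."""
    idx = np.flatnonzero(x)
    return x[idx], idx

def sparse_conv1d(acts, weights, out_len):
    a_val, a_idx = compress(acts)
    w_val, w_idx = compress(weights)
    out = np.zeros(out_len)
    for av, ai in zip(a_val, a_idx):       # all-to-all nonzero pairs
        for wv, wi in zip(w_val, w_idx):
            o = ai - wi                    # output coordinate
            if 0 <= o < out_len:
                out[o] += av * wv          # scatter-accumulate
    return out

acts = np.array([0, 3.0, 0, 0, 2.0, 0, 0, 1.0])
w = np.array([1.0, 0, -1.0])
print(sparse_conv1d(acts, w, len(acts) - len(w) + 1))
# Only 3 x 2 = 6 multiplies instead of 6 x 3 = 18 dense MACs.
```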
Proceedings Article
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Vikas Chandra, Hadi Esmaeilzadeh, et al.
TL;DR: This work designs Bit Fusion, a bit-flexible accelerator composed of an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers, and compares it to two state-of-the-art DNN accelerators, Eyeriss and Stripes.
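The arithmetic being exploited is that a wide multiplication decomposes exactly into 2-bit partial products that small processing elements can compute and then fuse with shifts and adds; narrower operands simply occupy fewer elements. The sketch below (our illustration, unsigned operands only, not the Bit Fusion microarchitecture) verifies that decomposition.

```python
# Decompose a multiply into 2-bit x 2-bit partial products ("bit bricks")
# and fuse them with shift-and-add. Unsigned-only, illustrative sketch.

def split_2bit(x, bits):
    """Split an unsigned value into 2-bit digits, LSB first."""
    return [(x >> s) & 0b11 for s in range(0, bits, 2)]

def fused_mul(a, b, a_bits=8, b_bits=8):
    total = 0
    for i, ad in enumerate(split_2bit(a, a_bits)):
        for j, bd in enumerate(split_2bit(b, b_bits)):
            total += (ad * bd) << (2 * (i + j))  # shift-and-add fusion
    return total

assert fused_mul(173, 91) == 173 * 91             # 8b x 8b uses 16 bricks
assert fused_mul(5, 9, a_bits=4, b_bits=4) == 45  # 4b x 4b uses only 4
print("bit-level decomposition verified")
```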
Journal Article
Scaling for edge inference of deep neural networks
TL;DR: For edge inference, there are growing gaps between the computational complexity and energy efficiency that continued scaling of deep neural networks requires and the hardware capacity actually delivered by current CMOS technology scaling.
Proceedings Article
Timeloop: A Systematic Approach to DNN Accelerator Evaluation
Angshuman Parashar, Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, Stephen W. Keckler, Joel Emer
TL;DR: Timeloop's underlying models and algorithms are described in detail, and case studies enabled by Timeloop show that dataflow and memory-hierarchy co-design plays a critical role in optimizing energy efficiency.
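What a tool like this automates can be shown in a few lines: enumerate candidate mappings of a loop nest onto a memory hierarchy, charge each level per access, and rank the mappings. The per-access energies and the one-level cost model below are made-up placeholders for illustration, not Timeloop's models.

```python
# Toy mapping-space search for an M x K @ K x N matmul with one level of
# tiling. Energy numbers and the cost model are illustrative assumptions.
from itertools import product

ENERGY_PJ = {"mac": 0.1, "sram": 5.0, "dram": 200.0}  # assumed costs

def mapping_energy(M, N, K, tm, tn, tk):
    n_m, n_n, n_k = M // tm, N // tn, K // tk
    macs = M * N * K
    # SRAM: each tile operand is read once per tile instance.
    sram = n_m * n_n * n_k * (tm * tk + tk * tn + tm * tn)
    # DRAM: each operand streams once per reuse-breaking outer loop.
    dram = n_n * M * K + n_m * K * N + n_k * M * N
    return (macs * ENERGY_PJ["mac"] + sram * ENERGY_PJ["sram"]
            + dram * ENERGY_PJ["dram"])

M = N = K = 256
best = min(product((16, 32, 64), repeat=3),
           key=lambda t: mapping_energy(M, N, K, *t))
print("lowest-energy tiling (tm, tn, tk):", best)
```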
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: The authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously and won first place in the ILSVRC 2015 classification task.
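The mechanism is a shortcut connection: each block learns a residual F(x) and emits F(x) + x, so extra depth can at worst approximate an identity mapping instead of degrading accuracy. A minimal numpy sketch follows, with dense layers standing in for the paper's conv layers; the shapes and initialization are our own.

```python
# Residual block in miniature: out = relu(F(x) + x).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    f = relu(x @ w1)    # first transform + nonlinearity
    f = f @ w2          # second transform
    return relu(f + x)  # identity shortcut, then activation

d = 64
x = rng.standard_normal((8, d))
w1 = rng.standard_normal((d, d)) * 0.01  # near-zero weights ...
w2 = rng.standard_normal((d, d)) * 0.01  # ... make F(x) ~ 0
y = residual_block(x, w1, w2)
print(np.allclose(y, relu(x), atol=0.1))  # block starts near identity
```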
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: State-of-the-art ImageNet classification is achieved with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
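The layer dimensions of that stack can be verified with simple output-size arithmetic; the walk-through below uses the commonly cited configuration (a 227x227 input, which resolves the paper's 224 off-by-one) and is a reconstruction from the published description rather than reference code.

```python
# Shape propagation through the AlexNet-style stack described above.
def conv_out(size, kernel, stride=1, pad=0):
    return (size + 2 * pad - kernel) // stride + 1

s = 227                          # input: 227 x 227 x 3
s = conv_out(s, 11, stride=4)    # conv1, 96 maps   -> 55
s = conv_out(s, 3, stride=2)     # max-pool         -> 27
s = conv_out(s, 5, pad=2)        # conv2, 256 maps  -> 27
s = conv_out(s, 3, stride=2)     # max-pool         -> 13
s = conv_out(s, 3, pad=1)        # conv3, 384 maps  -> 13
s = conv_out(s, 3, pad=1)        # conv4, 384 maps  -> 13
s = conv_out(s, 3, pad=1)        # conv5, 256 maps  -> 13
s = conv_out(s, 3, stride=2)     # max-pool         -> 6
print(s * s * 256)               # 9216 inputs to fc6 (4096) -> fc7 (4096)
                                 # -> fc8: 1000-way softmax
```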
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small (3x3) convolution filters, and shows that a significant improvement on prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
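That design point is easy to check with receptive-field and parameter arithmetic: stacking small 3x3 convolutions covers the same receptive field as one larger filter with fewer weights (assuming, as a simplification, an equal channel count C in and out). A quick worked check:

```python
# Stacked 3x3 convolutions vs. one large filter (weights only, C channels).
def receptive_field(num_3x3_layers):
    return 1 + 2 * num_3x3_layers  # each 3x3 layer adds 2

def params(kernel, layers, C):
    return layers * kernel**2 * C * C

C = 256
print(receptive_field(2), params(3, 2, C), params(5, 1, C))  # rf 5: 18C^2 < 25C^2
print(receptive_field(3), params(3, 3, C), params(7, 1, C))  # rf 7: 27C^2 < 49C^2
```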
Journal Article
Deep learning
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Related Papers (5)
In-Datacenter Performance Analysis of a Tensor Processing Unit
Norman P. Jouppi, Cliff Young, Nishant Patil, David A. Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Albert T. Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Christopher Aaron Clark, Jeremy Coriell, Michael J. Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William John Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, D. Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andrew Everett Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Michael Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay K. Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, Doe Hyun Yoon