Open Access · Proceedings Article

Low precision arithmetic for deep learning.

TLDR
The authors simulate the training of a set of state-of-the-art neural networks, the Maxout networks of Goodfellow et al., on three benchmark datasets (MNIST, CIFAR10 and SVHN) with three distinct arithmetics: floating point, fixed point and dynamic fixed point.
Abstract
We simulate the training of a set of state-of-the-art neural networks, the Maxout networks (Goodfellow et al., 2013a), on three benchmark datasets: MNIST, CIFAR10 and SVHN, with three distinct arithmetics: floating point, fixed point and dynamic fixed point. For each of those datasets and for each of those arithmetics, we assess the impact of the precision of the computations on the final error of the training. We find that very low precision computation is sufficient not just for running trained networks but also for training them. For example, almost state-of-the-art results were obtained on most datasets with 10 bits for computing activations and gradients, and 12 bits for storing updated parameters.
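As a rough illustration of the arithmetics the abstract compares, the sketch below quantizes tensors to ordinary fixed point and to dynamic fixed point, where the scaling factor (a shared exponent) is chosen per tensor from the current magnitude of its entries. This is a minimal NumPy sketch under assumed conventions (round-to-nearest, saturation, one exponent per tensor); the function names and the exponent-selection rule are illustrative, not taken from the paper.

```python
import numpy as np

def quantize_fixed_point(x, total_bits, frac_bits):
    """Round x to a signed fixed-point grid with frac_bits fractional bits,
    saturating values outside the representable range."""
    scale = 2.0 ** frac_bits
    max_int = 2 ** (total_bits - 1) - 1
    min_int = -2 ** (total_bits - 1)
    return np.clip(np.round(x * scale), min_int, max_int) / scale

def quantize_dynamic_fixed_point(x, total_bits):
    """Dynamic fixed point: choose the shared exponent per tensor so the
    largest magnitude in x just fits, then quantize as ordinary fixed point."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x)
    int_bits = int(np.floor(np.log2(max_abs))) + 1   # bits left of the point
    frac_bits = total_bits - 1 - int_bits            # 1 bit reserved for sign
    return quantize_fixed_point(x, total_bits, frac_bits)

# Example: 10-bit activations and 12-bit stored parameters,
# mirroring the precisions quoted in the abstract.
activations = np.random.randn(4, 8)
params = 0.1 * np.random.randn(8, 8)
act_q = quantize_dynamic_fixed_point(activations, total_bits=10)
par_q = quantize_dynamic_fixed_point(params, total_bits=12)
```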


Citations
Proceedings Article

Deep Learning with Limited Numerical Precision

TL;DR: This article studies the effect of limited-precision data representation and computation on neural network training, and shows that deep networks can be trained using only a 16-bit wide fixed-point number representation when stochastic rounding is used, with little to no degradation in classification accuracy.
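The stochastic rounding referred to above rounds a value to one of its two neighbouring grid points with probability proportional to its distance from the other, so the quantization is unbiased in expectation, which is what keeps low-precision accumulations from drifting systematically. A minimal NumPy sketch, assuming a signed 16-bit fixed-point grid; the helper name and the choice of 8 fractional bits are illustrative, not from the cited paper.

```python
import numpy as np

def stochastic_round_fixed_point(x, total_bits=16, frac_bits=8, rng=None):
    """Quantize the array x to a signed fixed-point grid with stochastic
    rounding: round up with probability equal to the fractional remainder,
    so E[quantize(x)] == x inside the representable range."""
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    frac = scaled - floor                          # remainder in [0, 1)
    q = floor + (rng.random(x.shape) < frac)       # round up w.p. frac
    max_int = 2 ** (total_bits - 1) - 1
    min_int = -2 ** (total_bits - 1)
    return np.clip(q, min_int, max_int) / scale

rng = np.random.default_rng(0)
x = np.full(10000, 0.30)
q = stochastic_round_fixed_point(x, rng=rng)
# 0.30 is not exactly representable with 8 fractional bits, yet the mean
# of the rounded values approaches 0.30.
print(q.mean())
```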
Proceedings Article · DOI

Fast Algorithms for Convolutional Neural Networks

TL;DR: A new class of fast algorithms for convolutional neural networks is introduced, based on Winograd's minimal filtering algorithms, which compute convolutions over small tiles with minimal arithmetic complexity, making them fast for small filters and small batch sizes.
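Winograd's minimal filtering algorithms trade multiplications for additions on small tiles. The sketch below shows the smallest 1-D case, F(2, 3), which produces two outputs of a 3-tap filter from a 4-element input tile with 4 multiplications instead of 6; the 2-D algorithms used for CNN layers nest this construction. The transform matrices are the standard ones for F(2, 3); the function name is illustrative.

```python
import numpy as np

# Winograd F(2, 3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """1-D Winograd F(2,3): y = A^T [(G g) * (B^T d)], elementwise product,
    i.e. 4 multiplications for 2 filter outputs."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([0.5, -1.0, 2.0])       # 3-tap filter
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```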
Proceedings Article

Gradient-based Hyperparameter Optimization through Reversible Learning

TL;DR: In this article, the authors compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure, allowing them to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures.
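The quantity being computed there is a hypergradient: the training loop is treated as a differentiable function of its hyperparameters. The cited work obtains it in reverse mode by exactly reversing SGD-with-momentum to keep memory manageable; the toy sketch below instead uses forward-mode accumulation on a one-parameter quadratic problem, which yields the same derivative exactly but does not scale to many hyperparameters. The toy objective and all names are illustrative.

```python
def hypergradient_wrt_lr(lr, w0=0.0, t_train=3.0, t_val=2.5, steps=50):
    """Exact d(validation loss)/d(learning rate) for plain SGD on the toy
    training loss 0.5*(w - t_train)^2, obtained by propagating dw/dlr
    forward through every update (forward-mode differentiation of the
    training loop, in place of the paper's reverse-mode scheme)."""
    w, dw_dlr = w0, 0.0
    for _ in range(steps):
        g = w - t_train                # gradient of the training loss at w
        dg_dlr = dw_dlr                # its derivative w.r.t. the learning rate
        w = w - lr * g                 # SGD update
        dw_dlr = dw_dlr - g - lr * dg_dlr
    val_loss = 0.5 * (w - t_val) ** 2
    return val_loss, (w - t_val) * dw_dlr

loss, hypergrad = hypergradient_wrt_lr(lr=0.1)
# The sign of `hypergrad` says whether increasing the learning rate would
# lower the validation loss after 50 steps.
```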
Proceedings Article

Compressing Neural Networks with the Hashing Trick

TL;DR: HashedNets uses a hash function to randomly group connection weights into hash buckets, so that all connections within the same bucket share a single parameter value; the shared parameters are tuned with standard backpropagation during training, adapting to the weight-sharing architecture.
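A minimal sketch of that weight-sharing scheme: each virtual connection (i, j) of a dense layer is mapped to one of a small number of buckets, the forward pass gathers the shared values, and the backward pass sums the virtual-weight gradients per bucket. For simplicity the bucket assignment here is a precomputed random mapping, whereas the real method computes it on the fly with a hash function so the mapping never needs to be stored; the class and method names are illustrative.

```python
import numpy as np

class HashedLayer:
    """Dense layer whose virtual (in_dim x out_dim) weight matrix is backed
    by only n_buckets real parameters: every connection in a bucket shares
    that bucket's value."""

    def __init__(self, in_dim, out_dim, n_buckets, seed=0):
        rng = np.random.default_rng(seed)
        self.params = rng.normal(0.0, 0.1, size=n_buckets)
        # Stand-in for a hash of (i, j): a fixed random bucket per connection.
        self.buckets = rng.integers(0, n_buckets, size=(in_dim, out_dim))

    def weight_matrix(self):
        return self.params[self.buckets]          # gather shared values

    def forward(self, x):
        return x @ self.weight_matrix()

    def backward_params(self, x, grad_out):
        """Gradient of each shared parameter: sum the virtual-weight
        gradients over all connections hashed to its bucket."""
        grad_w = x.T @ grad_out                   # gradient of the virtual matrix
        grad_p = np.zeros_like(self.params)
        np.add.at(grad_p, self.buckets, grad_w)
        return grad_p

layer = HashedLayer(in_dim=64, out_dim=32, n_buckets=128)
x = np.random.randn(8, 64)
y = layer.forward(x)                              # (8, 32)
```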
Proceedings Article · DOI

Stripes: bit-serial deep neural network computing

TL;DR: This work presents Stripes (STR), a hardware accelerator that uses bit-serial computations to improve energy efficiency and performance; its area and power overheads are estimated at 5 percent and 12 percent, respectively.
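The bit-serial idea is that an n-bit activation a = sum_b a_b * 2^b can be multiplied by a weight one bit-plane at a time, so the number of steps tracks the activation precision actually needed rather than a fixed word width. An integer-only NumPy sketch of that decomposition follows; it is not a model of the accelerator's datapath, and the names are illustrative.

```python
import numpy as np

def bit_serial_dot(activations, weights, n_bits=8):
    """Dot product of unsigned n_bits activations with integer weights,
    computed one activation bit-plane per step:
    a . w = sum_b 2^b * (a_b . w). Fewer activation bits => fewer steps."""
    acc = 0
    for b in range(n_bits):
        bit_plane = (activations >> b) & 1        # b-th bit of every activation
        acc += int(bit_plane @ weights) << b      # one bit-serial step
    return acc

acts = np.array([3, 7, 1, 12], dtype=np.int64)    # 4-bit activations
wts = np.array([2, -1, 5, 1], dtype=np.int64)
assert bit_serial_dot(acts, wts, n_bits=4) == int(acts @ wts)
```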