Open Access · Proceedings Article

Low precision arithmetic for deep learning.

TLDR
The authors simulate the training of a set of state-of-the-art neural networks, the Maxout networks of Goodfellow et al., on three benchmark datasets (MNIST, CIFAR10 and SVHN) with three distinct arithmetics: floating point, fixed point and dynamic fixed point.
Abstract
We simulate the training of a set of state-of-the-art neural networks, the Maxout networks (Goodfellow et al., 2013a), on three benchmark datasets: MNIST, CIFAR10 and SVHN, with three distinct arithmetics: floating point, fixed point and dynamic fixed point. For each of those datasets and for each of those arithmetics, we assess the impact of the precision of the computations on the final error of the training. We find that very low precision computation is sufficient not just for running trained networks but also for training them. For example, almost state-of-the-art results were obtained on most datasets with 10 bits for computing activations and gradients, and 12 bits for storing updated parameters.
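As a rough illustration of the arithmetics the abstract compares, the sketch below quantizes tensors to ordinary fixed point and to dynamic fixed point, where the scaling factor (a shared exponent) is chosen per tensor from the current magnitude of its entries. This is a minimal NumPy sketch under assumed conventions (round-to-nearest, saturation, one exponent per tensor); the function names and the exponent-selection rule are illustrative, not taken from the paper.

```python
import numpy as np

def quantize_fixed_point(x, total_bits, frac_bits):
    """Round x to a signed fixed-point grid with frac_bits fractional bits,
    saturating values outside the representable range."""
    scale = 2.0 ** frac_bits
    max_int = 2 ** (total_bits - 1) - 1
    min_int = -2 ** (total_bits - 1)
    return np.clip(np.round(x * scale), min_int, max_int) / scale

def quantize_dynamic_fixed_point(x, total_bits):
    """Dynamic fixed point: choose the shared exponent per tensor so the
    largest magnitude in x just fits, then quantize as ordinary fixed point."""
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x)
    int_bits = int(np.floor(np.log2(max_abs))) + 1   # bits left of the point
    frac_bits = total_bits - 1 - int_bits            # 1 bit reserved for sign
    return quantize_fixed_point(x, total_bits, frac_bits)

# Example: 10-bit activations and 12-bit stored parameters,
# mirroring the precisions quoted in the abstract.
activations = np.random.randn(4, 8)
params = 0.1 * np.random.randn(8, 8)
act_q = quantize_dynamic_fixed_point(activations, total_bits=10)
par_q = quantize_dynamic_fixed_point(params, total_bits=12)
```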


Citations
Proceedings Article

Deep Learning with Limited Numerical Precision

TL;DR: This article studies the effect of limited-precision data representation and computation on neural network training, and shows that deep networks can be trained using only a 16-bit wide fixed-point number representation when stochastic rounding is used, with little to no degradation in classification accuracy.
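The stochastic rounding referred to above rounds a value to one of its two neighbouring grid points with probability proportional to its distance from the other, so the quantization is unbiased in expectation, which is what keeps low-precision accumulations from drifting systematically. A minimal NumPy sketch, assuming a signed 16-bit fixed-point grid; the helper name and the choice of 8 fractional bits are illustrative, not from the cited paper.

```python
import numpy as np

def stochastic_round_fixed_point(x, total_bits=16, frac_bits=8, rng=None):
    """Quantize the array x to a signed fixed-point grid with stochastic
    rounding: round up with probability equal to the fractional remainder,
    so E[quantize(x)] == x inside the representable range."""
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    frac = scaled - floor                          # remainder in [0, 1)
    q = floor + (rng.random(x.shape) < frac)       # round up w.p. frac
    max_int = 2 ** (total_bits - 1) - 1
    min_int = -2 ** (total_bits - 1)
    return np.clip(q, min_int, max_int) / scale

rng = np.random.default_rng(0)
x = np.full(10000, 0.30)
q = stochastic_round_fixed_point(x, rng=rng)
# 0.30 is not exactly representable with 8 fractional bits, yet the mean
# of the rounded values approaches 0.30.
print(q.mean())
```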
Proceedings Article · DOI

Fast Algorithms for Convolutional Neural Networks

TL;DR: A new class of fast algorithms for convolutional neural networks is introduced, based on Winograd's minimal filtering algorithms, which compute convolutions over small tiles with minimal arithmetic complexity, making them fast for small filters and small batch sizes.
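Winograd's minimal filtering algorithms trade multiplications for additions on small tiles. The sketch below shows the smallest 1-D case, F(2, 3), which produces two outputs of a 3-tap filter from a 4-element input tile with 4 multiplications instead of 6; the 2-D algorithms used for CNN layers nest this construction. The transform matrices are the standard ones for F(2, 3); the function name is illustrative.

```python
import numpy as np

# Winograd F(2, 3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """1-D Winograd F(2,3): y = A^T [(G g) * (B^T d)], elementwise product,
    i.e. 4 multiplications for 2 filter outputs."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([0.5, -1.0, 2.0])       # 3-tap filter
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```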
Proceedings Article

Gradient-based Hyperparameter Optimization through Reversible Learning

TL;DR: In this article, the authors compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure, allowing them to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures.
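The quantity being computed there is a hypergradient: the training loop is treated as a differentiable function of its hyperparameters. The cited work obtains it in reverse mode by exactly reversing SGD-with-momentum to keep memory manageable; the toy sketch below instead uses forward-mode accumulation on a one-parameter quadratic problem, which yields the same derivative exactly but does not scale to many hyperparameters. The toy objective and all names are illustrative.

```python
def hypergradient_wrt_lr(lr, w0=0.0, t_train=3.0, t_val=2.5, steps=50):
    """Exact d(validation loss)/d(learning rate) for plain SGD on the toy
    training loss 0.5*(w - t_train)^2, obtained by propagating dw/dlr
    forward through every update (forward-mode differentiation of the
    training loop, in place of the paper's reverse-mode scheme)."""
    w, dw_dlr = w0, 0.0
    for _ in range(steps):
        g = w - t_train                # gradient of the training loss at w
        dg_dlr = dw_dlr                # its derivative w.r.t. the learning rate
        w = w - lr * g                 # SGD update
        dw_dlr = dw_dlr - g - lr * dg_dlr
    val_loss = 0.5 * (w - t_val) ** 2
    return val_loss, (w - t_val) * dw_dlr

loss, hypergrad = hypergradient_wrt_lr(lr=0.1)
# The sign of `hypergrad` says whether increasing the learning rate would
# lower the validation loss after 50 steps.
```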
Proceedings Article

Compressing Neural Networks with the Hashing Trick

TL;DR: HashedNets uses a hash function to randomly group connection weights into hash buckets, so that all connections within the same bucket share a single parameter value; the shared parameters are tuned with standard backpropagation during training, adapting to the weight-sharing architecture.
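A minimal sketch of that weight-sharing scheme: each virtual connection (i, j) of a dense layer is mapped to one of a small number of buckets, the forward pass gathers the shared values, and the backward pass sums the virtual-weight gradients per bucket. For simplicity the bucket assignment here is a precomputed random mapping, whereas the real method computes it on the fly with a hash function so the mapping never needs to be stored; the class and method names are illustrative.

```python
import numpy as np

class HashedLayer:
    """Dense layer whose virtual (in_dim x out_dim) weight matrix is backed
    by only n_buckets real parameters: every connection in a bucket shares
    that bucket's value."""

    def __init__(self, in_dim, out_dim, n_buckets, seed=0):
        rng = np.random.default_rng(seed)
        self.params = rng.normal(0.0, 0.1, size=n_buckets)
        # Stand-in for a hash of (i, j): a fixed random bucket per connection.
        self.buckets = rng.integers(0, n_buckets, size=(in_dim, out_dim))

    def weight_matrix(self):
        return self.params[self.buckets]          # gather shared values

    def forward(self, x):
        return x @ self.weight_matrix()

    def backward_params(self, x, grad_out):
        """Gradient of each shared parameter: sum the virtual-weight
        gradients over all connections hashed to its bucket."""
        grad_w = x.T @ grad_out                   # gradient of the virtual matrix
        grad_p = np.zeros_like(self.params)
        np.add.at(grad_p, self.buckets, grad_w)
        return grad_p

layer = HashedLayer(in_dim=64, out_dim=32, n_buckets=128)
x = np.random.randn(8, 64)
y = layer.forward(x)                              # (8, 32)
```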
Proceedings Article · DOI

Stripes: bit-serial deep neural network computing

TL;DR: This work presents Stripes (STR), a hardware accelerator that uses bit-serial computations to improve energy efficiency and performance; its area and power overheads are estimated at 5 percent and 12 percent, respectively.
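The bit-serial idea is that an n-bit activation a = sum_b a_b * 2^b can be multiplied by a weight one bit-plane at a time, so the number of steps tracks the activation precision actually needed rather than a fixed word width. An integer-only NumPy sketch of that decomposition follows; it is not a model of the accelerator's datapath, and the names are illustrative.

```python
import numpy as np

def bit_serial_dot(activations, weights, n_bits=8):
    """Dot product of unsigned n_bits activations with integer weights,
    computed one activation bit-plane per step:
    a . w = sum_b 2^b * (a_b . w). Fewer activation bits => fewer steps."""
    acc = 0
    for b in range(n_bits):
        bit_plane = (activations >> b) & 1        # b-th bit of every activation
        acc += int(bit_plane @ weights) << b      # one bit-serial step
    return acc

acts = np.array([3, 7, 1, 12], dtype=np.int64)    # 4-bit activations
wts = np.array([2, -1, 5, 1], dtype=np.int64)
assert bit_serial_dot(acts, wts, n_bits=4) == int(acts @ wts)
```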