Open Access · Proceedings Article
Bayesian Compression for Deep Learning
Christos Louizos, Karen Ullrich, Max Welling
Vol. 30, pp. 3288–3298
TLDR
In this article, the authors use hierarchical priors to prune nodes instead of individual weights, and use the posterior uncertainties to determine the optimal fixed-point precision to encode the weights.
Abstract
Compression and computational efficiency in deep learning have become problems of great significance. In this work, we argue that the most principled and effective way to attack this problem is by adopting a Bayesian point of view, where through sparsity-inducing priors we prune large parts of the network. We introduce two novelties in this paper: 1) we use hierarchical priors to prune nodes instead of individual weights, and 2) we use the posterior uncertainties to determine the optimal fixed-point precision to encode the weights. Both factors significantly contribute to achieving the state of the art in terms of compression rates, while still staying competitive with methods designed to optimize for speed or energy efficiency.
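The two ideas in the abstract can be caricatured in a toy NumPy sketch. This is an invented illustration, not the paper's variational method: the function name `prune_and_quantize`, the SNR threshold, and the bit-width heuristic are all assumptions made here. The sketch prunes weight groups whose posterior signal-to-noise ratio is low, and lets larger posterior variance license fewer encoding bits.

```python
import numpy as np

def prune_and_quantize(mu, sigma, snr_threshold=1.0, max_bits=16):
    """Toy heuristic: prune weight groups with a low posterior
    signal-to-noise ratio, then pick a per-group bit width so that
    quantization noise stays below the posterior standard deviation."""
    snr = np.abs(mu) / sigma
    keep = snr >= snr_threshold            # groups worth storing at all
    # more posterior uncertainty -> fewer bits needed to encode the weight
    dynamic_range = np.abs(mu).max()
    bits = np.ceil(np.log2(1.0 + dynamic_range / sigma)).astype(int)
    bits = np.clip(bits, 1, max_bits)
    bits[~keep] = 0                        # pruned groups cost nothing
    return keep, bits
```

For example, a group with posterior mean 0.01 and standard deviation 0.5 has SNR 0.02 and is pruned outright, while confident groups receive just enough bits to resolve them against their own uncertainty.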
Citations
Proceedings Article
Sampling-Free Variational Inference of Bayesian Neural Networks by Variance Backpropagation.
TL;DR: The authors propose a new Bayesian neural network formulation whose evidence lower bound is analytically tractable under a tight approximation, making variational inference possible without the widely applied Monte Carlo sampling or CLT-based techniques.
Posted Content
Indian Buffet Neural Networks for Continual Learning
TL;DR: This work places an Indian Buffet Process (IBP) prior over the neural structure of a Bayesian Neural Network, thus allowing the complexity of the BNN to increase and decrease automatically, and shows empirically that the method offers competitive results compared to Variational Continual Learning (VCL) in some settings.
Posted Content
Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks
Yochai Zur, Chaim Baskin, Evgenii Zheltonozhskii, Brian Chmiel, Itay Evron, Alexander M. Bronstein, Avi Mendelson
TL;DR: This paper formulates optimal arithmetic bit-length allocation and neural network pruning as a NAS problem, searching for configurations that satisfy a computational-complexity budget while maximizing accuracy, using a differentiable search method based on the continuous relaxation of the search space proposed by Liu et al.
Posted Content
Walsh-Hadamard Variational Inference for Bayesian Deep Learning
TL;DR: In this paper, Walsh-Hadamard Variational Inference (WHVI) is proposed to reduce the parameterization and accelerate computations in over-parameterized models.
Posted Content
BiDet: An Efficient Binarized Object Detector
TL;DR: This paper proposes a binarized neural network learning method called BiDet that generalizes the information bottleneck (IB) principle to object detection, where the amount of information in the high-level feature maps is constrained and the mutual information between the feature maps and object detection is maximized.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
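The "adaptive estimates of lower-order moments" in this summary can be sketched in a few lines of NumPy. This is a minimal single-step illustration, not the reference implementation; the hyperparameter names (`lr`, `b1`, `b2`, `eps`) mirror the paper's defaults.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving estimates of the first and
    second moments of the gradient, with bias correction for step t."""
    m = b1 * m + (1 - b1) * grad            # first moment (mean of grads)
    v = b2 * v + (1 - b2) * grad ** 2       # second moment (uncentered)
    m_hat = m / (1 - b1 ** t)               # correct initialization bias
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Iterating this step on a simple quadratic objective drives the parameter toward the minimum with an effective step size close to `lr`, regardless of the gradient's scale.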
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Journal ArticleDOI
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition; it can synthesize a complex decision surface capable of classifying high-dimensional patterns such as handwritten characters.
Journal ArticleDOI
Regression Shrinkage and Selection via the Lasso
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
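The constrained problem in this summary is usually solved in its penalized form, whose coordinate-wise update is a soft-thresholding step. A minimal sketch, assuming the standard Lagrangian formulation `0.5*||y - Xb||^2 + lam*||b||_1` (the function names are invented here):

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, setting small values exactly to zero;
    this is what makes the lasso produce sparse coefficient vectors."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for the penalized lasso objective."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove every feature's contribution but j's
            r = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r, lam) / (X[:, j] @ X[:, j])
    return beta
```

With an orthonormal design the solution reduces to soft-thresholding the least-squares coefficients, which makes the shrinkage-and-selection behavior easy to see.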
Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
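The mechanism behind this summary fits in a few lines. A minimal sketch of inverted dropout (the rescaling convention used by most modern frameworks; the function name and signature here are assumptions, not the paper's code):

```python
import numpy as np

def dropout(x, p_drop, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop during
    training and rescale survivors by 1/(1 - p_drop), so the expected
    activation matches test time, when the layer is the identity."""
    if not training or p_drop == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p_drop   # True where the unit survives
    return x * mask / (1.0 - p_drop)
```

Because the rescaling keeps the expected activation unchanged, no extra scaling is needed at inference: the layer simply passes its input through.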