Open Access Proceedings Article

Bayesian Compression for Deep Learning

TLDR
In this article, the authors use hierarchical priors to prune nodes instead of individual weights, and use the posterior uncertainties to determine the optimal fixed-point precision to encode the weights.
Abstract
Compression and computational efficiency in deep learning have become a problem of great significance. In this work, we argue that the most principled and effective way to attack this problem is by adopting a Bayesian point of view, where through sparsity-inducing priors we prune large parts of the network. We introduce two novelties in this paper: 1) we use hierarchical priors to prune nodes instead of individual weights, and 2) we use the posterior uncertainties to determine the optimal fixed-point precision to encode the weights. Both factors significantly contribute to achieving the state of the art in terms of compression rates, while still staying competitive with methods designed to optimize for speed or energy efficiency.
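To make the two ideas in the abstract concrete, here is a minimal, illustrative sketch (not the authors' code). It assumes a mean-field Gaussian posterior over one layer's weights plus a per-node scale posterior induced by a hierarchical prior, prunes nodes whose scale has a low signal-to-noise ratio, and picks a fixed-point bit width so that quantization noise stays below the smallest posterior standard deviation. All shapes, thresholds, and variable names are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variational posterior statistics for one 512-in, 256-out layer.
w_mean = rng.normal(size=(256, 512))                # posterior means of the weights
w_var = rng.uniform(1e-4, 1e-2, size=(256, 512))    # posterior variances of the weights
z_mean = rng.uniform(0.0, 1.0, size=256)            # per-output-node scale, posterior mean
z_var = rng.uniform(1e-4, 1e-1, size=256)           # per-output-node scale, posterior variance

# 1) Prune whole nodes: drop output units whose scale has a low
#    signal-to-noise ratio, i.e. the hierarchical prior has shrunk the node away.
snr = z_mean**2 / z_var
keep = snr > 1.0                                    # threshold chosen for illustration
w_mean, w_var = w_mean[keep], w_var[keep]
print(f"kept {keep.sum()} of {keep.size} output nodes")

# 2) Choose a fixed-point precision: fewer bits are needed when the posterior is
#    more uncertain, so keep quantization noise below the smallest posterior std.
weight_range = np.abs(w_mean).max()
min_std = np.sqrt(w_var.min())
bits = int(np.ceil(np.log2(weight_range / min_std))) + 1
print(f"suggested fixed-point precision: {bits} bits")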



Citations
Posted Content

DiffPrune: Neural Network Pruning with Deterministic Approximate Binary Gates and L0 Regularization

TL;DR: This work proposes a method for approximating a multivariate Bernoulli random variable by means of a deterministic and differentiable transformation of any real-valued multivariate random variable, and offers a framework for unstructured or flexibly structured model pruning.
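The exact gate construction of DiffPrune is not reproduced here. Purely as a hedged illustration of the general idea — replacing a stochastic Bernoulli gate with a deterministic, differentiable squashing of a real-valued variable whose sum serves as an L0 surrogate — a toy sketch might look like the following; the function name and the sharpness parameter beta are assumptions, not taken from the cited work.

import numpy as np

def approx_binary_gate(u, beta=10.0):
    # Deterministic, differentiable stand-in for a {0, 1} gate:
    # a steep sigmoid pushes real-valued inputs toward 0 or 1.
    return 1.0 / (1.0 + np.exp(-beta * u))

u = np.array([-0.8, -0.05, 0.02, 0.6])   # real-valued gate parameters
g = approx_binary_gate(u)                # approximately binary mask over weights or groups
l0_surrogate = g.sum()                   # differentiable surrogate for the L0 penalty
print(np.round(g, 3), float(l0_surrogate))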
Journal Article

Accelerating Convolutional Neural Network via Structured Gaussian Scale Mixture Models: A Joint Grouping and Pruning Approach

TL;DR: A hybrid network compression technique is proposed that exploits prior knowledge of the network parameters through Gaussian scale mixture (GSM) models, and network pruning is formulated as a maximum a posteriori (MAP) estimation problem with a sparsity prior.
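For context only, a generic MAP estimation problem with a Gaussian scale mixture sparsity prior can be written as below; the specific grouping structure and mixing density used in the cited paper are not shown, so treat this as a schematic rather than that paper's objective.

\hat{W} = \arg\max_{W}\; \log p(\mathcal{D} \mid W) + \sum_{i} \log p(w_i),
\qquad
p(w_i) = \int \mathcal{N}(w_i \mid 0,\, z_i)\, p(z_i)\, dz_i .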
Posted Content

Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization

TL;DR: Two generic methods for improving semi-supervised learning are proposed, including a novel consistency loss called "maximum uncertainty regularization" (MUR), which actively searches for "virtual" points, situated beyond the region where consistency is normally enforced, that cause the most uncertain class predictions.
Posted Content

Plug-in, Trainable Gate for Streamlining Arbitrary Neural Networks

TL;DR: In this paper, a trainable gate function is proposed that makes it possible to directly optimize loss functions involving non-differentiable discrete values such as 0-1 selection, and the approach can be applied to pruning.
Proceedings Article

Overview of deep convolutional neural network pruning

TL;DR: This paper divides the work into six aspects for detailed analysis, reviews the latest progress in deep neural network pruning from the perspective of pruning granularity and weight measurement criteria, points out problems in current research, and analyzes future research directions in the field of pruning.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
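As a quick reference, the standard Adam update can be sketched as follows (hyperparameter defaults are the commonly used values); the toy quadratic objective in the usage example is an illustrative choice, not from the cited paper.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update: adaptive estimates of the first and second moments.
    m = beta1 * m + (1 - beta1) * grad        # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2     # biased second-moment estimate
    m_hat = m / (1 - beta1**t)                # bias-corrected estimates
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage on a toy quadratic objective f(theta) = ||theta||^2 / 2, whose gradient is theta.
theta = np.ones(3)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, theta, m, v, t)
print(np.round(theta, 3))                     # parameters end up close to zero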
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Journal Article

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for document recognition, and gradient-based learning is used to synthesize a complex decision surface that can classify high-dimensional patterns such as handwritten characters.
Journal Article

Regression Shrinkage and Selection via the Lasso

TL;DR: A new method for estimation in linear models called the lasso is proposed, which minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant.
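In symbols, the constrained form described in this summary is the following, with response y, predictors x, and a tuning constant t (the notation here is generic, not copied from the paper):

\hat{\beta} = \arg\min_{\beta_0,\ \beta}\ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t .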
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.