Open Access Proceedings Article

Deep learning and the information bottleneck principle

TLDR
It is argued that the optimal architecture, i.e., the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.
Abstract
Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information-theoretic limits of the DNN and obtain finite-sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that the optimal architecture, i.e., the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer. The hierarchical representations of the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.
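As a rough illustration of the quantities the abstract refers to (not the authors' own derivation), the sketch below estimates the information-plane coordinates I(X;T) and I(T;Y) for a single hidden layer with a simple plug-in (binning) estimator. The function names, the binning scheme, and the use of discrete sample ids for X are assumptions made here for illustration only.

import numpy as np

def mutual_information(x_ids, y_ids):
    # Plug-in (histogram) estimate of I(X;Y) in bits for two aligned
    # arrays of discrete symbol ids.
    joint = np.zeros((x_ids.max() + 1, y_ids.max() + 1))
    np.add.at(joint, (x_ids, y_ids), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def layer_plane_point(x_ids, labels, activations, n_bins=30):
    # Discretize a hidden layer T by binning its activations, then return
    # the information-plane coordinates (I(X;T), I(T;Y)).
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges[1:-1])              # (n_samples, n_units)
    _, t_ids = np.unique(binned, axis=0, return_inverse=True)   # one id per activation pattern
    return mutual_information(x_ids, t_ids), mutual_information(t_ids, labels)

Such estimates make the IB tradeoff concrete for a trained network; the binning resolution strongly affects the numbers, so they should be read as qualitative coordinates rather than exact limits.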


Citations
Posted Content

Opening the Black Box of Deep Neural Networks via Information

TL;DR: This work demonstrates the effectiveness of the Information-Plane visualization of DNNs and shows that the training time is dramatically reduced when adding more hidden layers, and the main advantage of the hidden layers is computational.
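The Information-Plane visualization referenced above is, in essence, a scatter of (I(X;T), I(T;Y)) per layer per training epoch. A minimal matplotlib sketch of such a plot follows, using placeholder data in place of real mutual-information estimates; the array shapes, names, and synthetic data are assumptions for illustration.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical precomputed coordinates: info[e, l] = (I(X;T_l), I(T_l;Y))
# for epoch e and layer l; random placeholder data stands in for real estimates.
n_epochs, n_layers = 50, 5
rng = np.random.default_rng(0)
info = np.cumsum(rng.random((n_epochs, n_layers, 2)) * 0.05, axis=0)

colors = plt.cm.viridis(np.linspace(0, 1, n_epochs))
for e in range(n_epochs):
    plt.plot(info[e, :, 0], info[e, :, 1], "-o", color=colors[e], markersize=3, alpha=0.6)
plt.xlabel("I(X;T) [bits]")
plt.ylabel("I(T;Y) [bits]")
plt.title("Information plane: one curve per epoch, one marker per layer")
plt.show()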
Posted Content

When Does Label Smoothing Help?

TL;DR: It is shown empirically that, in addition to improving generalization, label smoothing improves model calibration, which can significantly improve beam search, and that if a teacher network is trained with label smoothing, knowledge distillation into a student network is much less effective.
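For concreteness, label smoothing itself is a one-line transformation of the targets; a minimal NumPy sketch (names and the epsilon value chosen here for illustration) is:

import numpy as np

def smooth_labels(one_hot, eps=0.1):
    # Mix hard one-hot targets with a uniform distribution over the K classes:
    # y_smooth = (1 - eps) * one_hot + eps / K.
    k = one_hot.shape[-1]
    return (1.0 - eps) * one_hot + eps / k

targets = np.eye(3)[[0, 2, 1]]            # three one-hot examples over 3 classes
print(smooth_labels(targets))             # true class -> ~0.933, others -> ~0.033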
Proceedings Article

Mutual Information Neural Estimation.

TL;DR: A Mutual Information Neural Estimator (MINE) is presented that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent, and applied to improve adversarially trained generative models.
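A minimal PyTorch sketch of the Donsker-Varadhan lower bound that MINE maximizes is given below; the critic architecture, the in-batch shuffle used to approximate the product of marginals, and all names are assumptions made here for illustration (the published method additionally corrects the gradient bias of the second term).

import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    # Small critic T_theta(x, y) used inside the Donsker-Varadhan bound.
    def __init__(self, dim_x, dim_y, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def mine_lower_bound(critic, x, y):
    # I(X;Y) >= E_p(x,y)[T(x,y)] - log E_p(x)p(y)[exp T(x,y')];
    # the product of marginals is approximated by shuffling y within the batch.
    joint_term = critic(x, y).mean()
    y_shuffled = y[torch.randperm(y.size(0))]
    marginal_term = torch.logsumexp(critic(x, y_shuffled), dim=0) - math.log(y.size(0))
    return joint_term - marginal_term

Maximizing this bound with respect to the critic parameters by ordinary backprop yields the mutual-information estimate.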
Journal Article

A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI

TL;DR: A review of the interpretability approaches suggested by different research works is provided and they are categorized, in the hope that insight into interpretability will grow with more consideration of medical practice, and that initiatives to push forward data-based, mathematically grounded, and technically grounded medical education are encouraged.
Posted Content

Deep Variational Information Bottleneck

TL;DR: It is shown that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.
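A minimal sketch of the VIB training objective, assuming a Gaussian encoder with outputs mu and logvar and a classifier head that maps a sampled z to logits (all names and the beta value are illustrative):

import torch
import torch.nn.functional as F

def sample_z(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps keeps gradients flowing
    # through the stochastic encoder p(z|x) = N(mu(x), diag(sigma(x)^2)).
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

def vib_loss(logits, labels, mu, logvar, beta=1e-3):
    # Variational IB objective: prediction term E[-log q(y|z)] plus a
    # beta-weighted KL(N(mu, sigma^2) || N(0, I)) compression term.
    ce = F.cross_entropy(logits, labels)
    kl = -0.5 * (1.0 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return ce + beta * kl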
References

The information bottleneck method

TL;DR: The variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
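For reference, the self-consistent equations behind the IB method can be iterated directly for small discrete problems. The NumPy sketch below alternates the updates of p(t|x), p(t), and p(y|t) at a fixed tradeoff beta; the initialization, clipping constants, and function names are choices made here for illustration.

import numpy as np

def iterative_ib(p_xy, n_clusters, beta, n_iter=200, seed=0):
    # Alternate the IB self-consistent updates for a discrete joint p(x, y).
    rng = np.random.default_rng(seed)
    n_x = p_xy.shape[0]
    p_x = p_xy.sum(axis=1)                                    # p(x)
    p_y_given_x = p_xy / p_x[:, None]                         # p(y|x)
    p_t_given_x = rng.random((n_x, n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p_t = p_x @ p_t_given_x                               # p(t)
        p_y_given_t = (p_t_given_x * p_x[:, None]).T @ p_y_given_x / p_t[:, None]
        pyx = np.maximum(p_y_given_x, 1e-12)
        pyt = np.maximum(p_y_given_t, 1e-12)
        kl = (pyx[:, None, :] * np.log(pyx[:, None, :] / pyt[None, :, :])).sum(axis=2)
        p_t_given_x = p_t[None, :] * np.exp(-beta * kl)       # p(t|x) proportional to p(t) exp(-beta * KL)
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)
    return p_t_given_x, p_t, p_y_given_t

Sweeping beta from small to large traces out the information curve whose bifurcation points the main paper associates with layer structure.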
Journal Article

Deterministic annealing for clustering, compression, classification, regression, and related optimization problems

TL;DR: The deterministic annealing approach to clustering and its extensions have demonstrated substantial performance improvement over standard supervised and unsupervised learning methods in a variety of important applications, including compression, estimation, pattern recognition and classification, and statistical regression.
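A much-simplified sketch of the deterministic annealing idea (random center initialization instead of the paper's mass-constrained splitting schedule; all parameter values are illustrative): soft Gibbs assignments at temperature T, centroid re-estimation, and gradual cooling.

import numpy as np

def deterministic_annealing(points, n_centers, t_start=10.0, t_end=0.01, cooling=0.9):
    # Soft clustering at temperature T: association probabilities follow a Gibbs
    # distribution over squared distances, and assignments harden as T decreases.
    rng = np.random.default_rng(0)
    centers = points[rng.choice(len(points), n_centers, replace=False)].copy()
    temp = t_start
    while temp > t_end:
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (n, k)
        logits = -d2 / temp
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)                    # soft assignments
        centers = (p.T @ points) / p.sum(axis=0)[:, None]    # re-estimate centroids
        temp *= cooling
    return centers, p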
Journal Article

Successive refinement of information

TL;DR: It is shown that the necessary and sufficient condition for achieving optimal successive refinement is that the solutions of the rate-distortion problem can be written as a Markov chain, and that all finite-alphabet signals with Hamming distortion satisfy this requirement.
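For reference, the Markov-chain condition mentioned in that TL;DR can be restated as follows (a standard formulation, with D_2 <= D_1 so that \hat{X}_2 is the finer reproduction):

% A source X is successively refinable from distortion D_1 to D_2 (D_2 <= D_1)
% if and only if there exist rate-distortion-achieving reproductions
% \hat{X}_1 (coarse) and \hat{X}_2 (fine), i.e.
%   R(D_i) = I(X; \hat{X}_i), \qquad \mathbb{E}\, d(X, \hat{X}_i) \le D_i, \quad i = 1, 2,
% such that they form a Markov chain with the source:
\[
  \hat{X}_1 \;\leftrightarrow\; \hat{X}_2 \;\leftrightarrow\; X .
\]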
Journal Article

Statistical mechanics and phase transitions in clustering.

TL;DR: In this paper, a new approach to clustering based on statistical physics is presented, where the problem is formulated as fuzzy clustering and the association probability distribution is obtained by maximizing the entropy at a given average variance.