Open Access Journal Article (DOI)

The Conditional Entropy Bottleneck

Ian Fischer
08 Sep 2020 · Vol. 22, Iss. 9, pp. 999
TLDR
This article proposes the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model and introduces the Conditional Entropy Bottleneck (CEB) objective for training models that satisfy it, arguing that common failures of robust generalization stem from models retaining too much information about their training data.
Abstract
Much of the field of Machine Learning exhibits a prominent set of failure modes, including vulnerability to adversarial examples, poor out-of-distribution (OoD) detection, miscalibration, and willingness to memorize random labelings of datasets. We characterize these as failures of robust generalization, which extends the traditional measure of generalization as accuracy or related metrics on a held-out set. We hypothesize that these failures to robustly generalize are due to the learning systems retaining too much information about the training data. To test this hypothesis, we propose the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model. In order to train models that perform well with respect to the MNI criterion, we present a new objective function, the Conditional Entropy Bottleneck (CEB), which is closely related to the Information Bottleneck (IB). We experimentally test our hypothesis by comparing the performance of CEB models with deterministic models and Variational Information Bottleneck (VIB) models on a variety of different datasets and robustness challenges. We find strong empirical evidence supporting our hypothesis that MNI models improve on these problems of robust generalization.
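For reference, the MNI criterion and the CEB objective described in the abstract can be sketched as follows. This is a paraphrase, not the paper's exact notation: the trade-off hyperparameter (written γ here) and the variational bounds used in training are as given in the paper, and the Markov chain Z ← X ↔ Y (Z is computed from X alone) is the standard assumption in this setting.

```latex
% MNI criterion: the learned representation Z captures exactly the
% task-relevant information, no more and no less:
%   I(X;Y) = I(X;Z) = I(Y;Z)
%
% CEB objective, minimized over the encoder p(z|x):
%   \min_{p(z \mid x)} \; \gamma \, I(X;Z \mid Y) \; - \; I(Y;Z)
%
% Under the Markov chain Z <- X <-> Y,
%   I(X;Z \mid Y) = I(X;Z) - I(Y;Z),
% which makes the close relationship to the Information Bottleneck
% (which instead penalizes I(X;Z)) explicit.
```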


Citations
Proceedings Article

What Makes for Good Views for Contrastive Learning

TL;DR: This paper uses empirical analysis to better understand the importance of view selection, argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devises unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
Journal Article

Action and Perception as Divergence Minimization

TL;DR: A unified objective for the action and perception of intelligent agents is introduced; interpreting the target distribution as a latent variable model suggests powerful world models as a path toward highly adaptive agents that seek out large niches in their environments, rendering task rewards optional.
Journal Article (DOI)

Learnability for the Information Bottleneck

TL;DR: This paper shows that if β is improperly chosen, learning cannot happen—the trivial representation P(Z|X)=P(Z) becomes the global minimum of the IB objective, and proves several sufficient conditions for IB-Learnability, which provides theoretical guidance for choosing a good β.
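For context, the Information Bottleneck objective this learnability result concerns, and the trivial-representation failure mode it analyzes, can be written as follows (standard IB formulation; the sign convention with minimization is one common choice):

```latex
% Information Bottleneck objective, minimized over the encoder p(z|x):
%   \mathcal{L}_{\mathrm{IB}} = I(X;Z) - \beta \, I(Y;Z)
%
% If beta is chosen too small, the trivial encoder
%   P(Z \mid X) = P(Z)
% (for which I(X;Z) = I(Y;Z) = 0) attains the global minimum of 0,
% and no learning happens; the cited paper gives sufficient
% conditions on beta for learnability.
```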
Posted Content

DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation

Alexandre Ramé, +1 more
14 Jan 2021
TL;DR: This paper introduces DICE, a training criterion that increases ensemble diversity by reducing spurious correlations among features and adversarially preventing features from being conditionally predictable from each other, thereby reducing simultaneous errors while protecting class information.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal Article (DOI)

A mathematical theory of communication

TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.
Book

Elements of information theory

TL;DR: The authors examine the role of entropy, information inequalities, and randomness in the design and construction of codes.
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.