Open Access Journal Article (DOI)

The Conditional Entropy Bottleneck

Ian Fischer
08 Sep 2020 · Vol. 22, Iss. 9, pp. 999
TLDR
This article proposes the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model and introduces the Conditional Entropy Bottleneck (CEB) objective for training models that satisfy it, arguing that common failures of robust generalization stem from models retaining too much information about their training data.
Abstract
Much of the field of Machine Learning exhibits a prominent set of failure modes, including vulnerability to adversarial examples, poor out-of-distribution (OoD) detection, miscalibration, and willingness to memorize random labelings of datasets. We characterize these as failures of robust generalization, which extends the traditional measure of generalization as accuracy or related metrics on a held-out set. We hypothesize that these failures to robustly generalize are due to the learning systems retaining too much information about the training data. To test this hypothesis, we propose the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model. In order to train models that perform well with respect to the MNI criterion, we present a new objective function, the Conditional Entropy Bottleneck (CEB), which is closely related to the Information Bottleneck (IB). We experimentally test our hypothesis by comparing the performance of CEB models with deterministic models and Variational Information Bottleneck (VIB) models on a variety of different datasets and robustness challenges. We find strong empirical evidence supporting our hypothesis that MNI models improve on these problems of robust generalization.
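For reference, the MNI criterion and the CEB objective described in the abstract can be sketched as follows. This is a paraphrase, not the paper's exact notation: the trade-off hyperparameter (written γ here) and the variational bounds used in training are as given in the paper, and the Markov chain Z ← X ↔ Y (Z is computed from X alone) is the standard assumption in this setting.

```latex
% MNI criterion: the learned representation Z captures exactly the
% task-relevant information, no more and no less:
%   I(X;Y) = I(X;Z) = I(Y;Z)
%
% CEB objective, minimized over the encoder p(z|x):
%   \min_{p(z \mid x)} \; \gamma \, I(X;Z \mid Y) \; - \; I(Y;Z)
%
% Under the Markov chain Z <- X <-> Y,
%   I(X;Z \mid Y) = I(X;Z) - I(Y;Z),
% which makes the close relationship to the Information Bottleneck
% (which instead penalizes I(X;Z)) explicit.
```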


Citations
Proceedings Article

What Makes for Good Views for Contrastive Learning

TL;DR: This paper uses empirical analysis to better understand the importance of view selection, argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devises unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
Journal Article

Action and Perception as Divergence Minimization

TL;DR: A unified objective for the action and perception of intelligent agents is introduced; interpreting the target distribution as a latent variable model suggests powerful world models as a path toward highly adaptive agents that seek out large niches in their environments, rendering task rewards optional.
Journal Article (DOI)

Learnability for the Information Bottleneck

TL;DR: This paper shows that if β is improperly chosen, learning cannot happen—the trivial representation P(Z|X)=P(Z) becomes the global minimum of the IB objective, and proves several sufficient conditions for IB-Learnability, which provides theoretical guidance for choosing a good β.
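For context, the Information Bottleneck objective this learnability result concerns, and the trivial-representation failure mode it analyzes, can be written as follows (standard IB formulation; the sign convention with minimization is one common choice):

```latex
% Information Bottleneck objective, minimized over the encoder p(z|x):
%   \mathcal{L}_{\mathrm{IB}} = I(X;Z) - \beta \, I(Y;Z)
%
% If beta is chosen too small, the trivial encoder
%   P(Z \mid X) = P(Z)
% (for which I(X;Z) = I(Y;Z) = 0) attains the global minimum of 0,
% and no learning happens; the cited paper gives sufficient
% conditions on beta for learnability.
```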
Posted Content

DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation

Alexandre Ramé, +1 more
14 Jan 2021
TL;DR: This paper introduces DICE, a training criterion that increases ensemble diversity by reducing spurious correlations among features and adversarially preventing features from being conditionally predictable from each other, thereby reducing simultaneous errors while protecting class information.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal Article (DOI)

A mathematical theory of communication

TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.
Book

Elements of information theory

TL;DR: The authors examine the role of entropy, information inequalities, and randomness in the design and construction of codes.
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.