Topic

Information bottleneck method

About: Information bottleneck method is a research topic. Over its lifetime, 770 publications have been published within this topic, receiving 18,168 citations.


Papers
24 Apr 2000
TL;DR: The variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
Abstract: We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal x requires more than just predicting y; it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y. That is, we squeeze the information that X provides about Y through a ‘bottleneck’ formed by a limited set of codewords X̃. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure d(x, x̃) emerges from the joint statistics of X and Y. This approach yields an exact set of self-consistent equations for the coding rules X → X̃ and X̃ → Y. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut–Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.

2,458 citations
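
The self-consistent equations mentioned above lend themselves to a direct implementation. Below is a minimal NumPy sketch of the convergent re-estimation procedure (the generalized Blahut–Arimoto iteration), assuming the joint distribution p(x, y) is given as a matrix; the function name, fixed iteration count, and random initialization are illustrative assumptions, not details from the paper.

```python
import numpy as np

def iterative_ib(p_xy, n_codewords, beta, n_iter=200, seed=0, eps=1e-12):
    """Sketch of the self-consistent IB updates (generalized Blahut-Arimoto).

    p_xy: joint distribution over (x, y), array of shape (|X|, |Y|) summing to 1.
    Returns the soft encoder p(x~|x), the marginal p(x~), and the decoder p(y|x~).
    """
    rng = np.random.default_rng(seed)
    n_x, _ = p_xy.shape
    p_x = p_xy.sum(axis=1)                      # p(x)
    p_y_given_x = p_xy / (p_x[:, None] + eps)   # p(y|x)

    # random soft assignment p(x~|x); rows sum to 1
    enc = rng.random((n_x, n_codewords))
    enc /= enc.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = p_x @ enc                                   # p(x~) = sum_x p(x) p(x~|x)
        dec = (enc.T @ p_xy) / (p_t[:, None] + eps)       # p(y|x~) = sum_x p(y|x) p(x|x~)
        # KL[p(y|x) || p(y|x~)] for every (x, x~) pair
        log_ratio = np.log((p_y_given_x[:, None, :] + eps) / (dec[None, :, :] + eps))
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)
        # p(x~|x) proportional to p(x~) * exp(-beta * KL), renormalized over x~
        enc = p_t[None, :] * np.exp(-beta * kl)
        enc /= enc.sum(axis=1, keepdims=True) + eps

    # recompute the marginal and decoder so they match the final encoder
    p_t = p_x @ enc
    dec = (enc.T @ p_xy) / (p_t[:, None] + eps)
    return enc, p_t, dec
```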

Proceedings ArticleDOI
25 Jun 2015
TL;DR: It is argued that the optimal architecture, i.e., the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.
Abstract: Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information-theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that the optimal architecture, i.e., the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer. The hierarchical representations of the layered network naturally correspond to the structural phase transitions along the information curve. We believe that this new insight can lead to new optimality bounds and deep learning algorithms.

1,187 citations
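
For orientation, the layer-wise quantities this analysis rests on can be written in standard IB notation; the symbols below (hidden layers T_i, tradeoff parameter β) follow the IB literature and are not quoted verbatim from the paper.

```latex
% A feedforward DNN forms a Markov chain  Y -> X -> T_1 -> ... -> T_k -> \hat{Y},
% so the data processing inequality bounds the label information in each layer:
I(X;Y) \;\ge\; I(T_1;Y) \;\ge\; \cdots \;\ge\; I(T_k;Y) \;\ge\; I(\hat{Y};Y).

% Each layer T_i can then be scored by the IB tradeoff between compressing
% the input and preserving information about the output:
\mathcal{L}\bigl[p(t_i \mid x)\bigr] \;=\; I(X;T_i) \;-\; \beta\, I(T_i;Y).
```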

Posted Content
TL;DR: This work demonstrates the effectiveness of the Information-Plane visualization of DNNs and shows that training time is dramatically reduced when more hidden layers are added, so that the main advantage of the hidden layers is computational.
Abstract: Despite their great success, there is still no comprehensive theoretical understanding of learning with Deep Neural Networks (DNNs) or their inner organization. Previous work proposed to analyze DNNs in the Information Plane, i.e., the plane of the mutual information values that each layer preserves on the input and output variables. They suggested that the goal of the network is to optimize the Information Bottleneck (IB) tradeoff between compression and prediction, successively, for each layer. In this work we follow up on this idea and demonstrate the effectiveness of the Information-Plane visualization of DNNs. Our main results are: (i) most of the training epochs in standard DL are spent on compression of the input to an efficient representation and not on fitting the training labels. (ii) The representation compression phase begins when the training error becomes small and the Stochastic Gradient Descent (SGD) epochs change from a fast drift to smaller training error into a stochastic relaxation, or random diffusion, constrained by the training error value. (iii) The converged layers lie on or very close to the Information Bottleneck (IB) theoretical bound, and the maps from the input to any hidden layer and from this hidden layer to the output satisfy the IB self-consistent equations. This generalization-through-noise mechanism is unique to Deep Neural Networks and absent in one-layer networks. (iv) The training time is dramatically reduced when adding more hidden layers. Thus the main advantage of the hidden layers is computational. This can be explained by the reduced relaxation time, as it scales super-linearly (exponentially for simple diffusion) with the information compression from the previous layer.

1,159 citations
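
The Information-Plane coordinates I(X;T) and I(T;Y) are typically estimated by discretizing hidden-layer activations. The sketch below assumes integer symbol ids for the inputs (or labels) and equal-width bins over the activations; the bin count, helper name, and binning scheme are illustrative, not the paper's exact procedure.

```python
import numpy as np

def binned_mutual_info(x_ids, t_activations, n_bins=30):
    """Estimate I(X; T) in bits by discretizing layer activations T.

    x_ids: integer symbol id per sample (e.g., input pattern or label id).
    t_activations: array of shape (n_samples, n_units) with the layer's outputs.
    """
    x_ids = np.asarray(x_ids)
    # discretize each activation into equal-width bins, then treat every
    # distinct vector of bin indices as one discrete symbol t
    lo, hi = t_activations.min(), t_activations.max()
    bins = np.digitize(t_activations, np.linspace(lo, hi, n_bins))
    t_ids = np.unique(bins, axis=0, return_inverse=True)[1].ravel()

    # empirical joint distribution over (x symbol, t symbol)
    joint = np.zeros((x_ids.max() + 1, t_ids.max() + 1))
    for xi, ti in zip(x_ids, t_ids):
        joint[xi, ti] += 1.0
    joint /= joint.sum()
    p_x = joint.sum(axis=1, keepdims=True)
    p_t = joint.sum(axis=0, keepdims=True)

    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (p_x @ p_t)[nz])).sum())
```

Calling the same helper with label ids in place of input ids gives the companion estimate of I(T; Y).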

Posted Content
TL;DR: It is shown that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.
Abstract: We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method "Deep Variational Information Bottleneck", or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.

757 citations
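
A minimal PyTorch sketch of the kind of objective described here: a stochastic encoder trained with the reparameterization trick, a decoder over the labels, and a β-weighted KL term against a standard-normal prior. The layer sizes, prior, and β value are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepVIB(nn.Module):
    """Stochastic encoder q(z|x) plus decoder q(y|z) for a VIB-style classifier."""

    def __init__(self, in_dim=784, z_dim=256, n_classes=10):
        super().__init__()
        self.z_dim = z_dim
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 2 * z_dim),          # outputs (mu, log_var)
        )
        self.decoder = nn.Linear(z_dim, n_classes)

    def forward(self, x):
        mu, log_var = self.encoder(x).split(self.z_dim, dim=1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, log_var

def vib_loss(logits, targets, mu, log_var, beta=1e-3):
    """Prediction term plus beta-weighted rate term (KL to a standard-normal prior)."""
    ce = F.cross_entropy(logits, targets)                              # -E[log q(y|z)]
    kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=1).mean()
    return ce + beta * kl
```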

Proceedings ArticleDOI
23 Apr 2018
TL;DR: In this article, a variational autoencoder (VAE) was extended to collaborative filtering for implicit feedback, and a generative model with multinomial likelihood and Bayesian inference for parameter estimation was proposed.
Abstract: We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm have information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.

637 citations
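
As a sketch of the objective described in the abstract, the per-user loss pairs a multinomial log-likelihood over items with a KL term weighted by the regularization parameter β, which is annealed during training. Tensor names and the closed-form Gaussian KL below are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def multinomial_vae_loss(logits, x, mu, log_var, beta):
    """Beta-regularized ELBO for implicit-feedback collaborative filtering.

    logits: decoder scores over items, shape (batch, n_items).
    x: binary click vector over items, same shape.
    beta: regularization weight, typically annealed upward over training.
    """
    # multinomial log-likelihood: observed clicks weighted by log-softmax over items
    neg_ll = -(x * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    # KL( q(z|x) || N(0, I) ) for a diagonal-Gaussian encoder
    kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=1).mean()
    return neg_ll + beta * kl
```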


Network Information

Related Topics (5)
- Deep learning: 79.8K papers, 2.1M citations (78% related)
- Convolutional neural network: 74.7K papers, 2M citations (77% related)
- Optimization problem: 96.4K papers, 2.1M citations (72% related)
- Feature extraction: 111.8K papers, 2.1M citations (72% related)
- Feature (computer vision): 128.2K papers, 1.7M citations (72% related)
Performance
Metrics: No. of papers in the topic in previous years

Year    Papers
2023    62
2022    144
2021    163
2020    145
2019    97
2018    80