Open Access
Minimizing Description Length in an Unsupervised Neural Network
TLDR
The recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution; this approximation corresponds to using a suboptimal encoding scheme and therefore gives an upper bound on the minimal description length.
Abstract
An autoencoder network uses a set of recognition weights to convert an input vector into a representation vector. It then uses a set of generative weights to convert the representation vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the representation vector and the reconstruction error. This information is minimized by choosing representation vectors stochastically according to a Boltzmann distribution. Unfortunately, if the representation vectors use distributed representations, it is exponentially expensive to compute this Boltzmann distribution because it involves all possible representation vectors. We show that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution. This approximation corresponds to using a suboptimal encoding scheme and therefore gives an upper bound on the minimal description length. Even when this bound is poor, it can be used as a Lyapunov function for learning both the generative and the recognition weights. We demonstrate that this approach can be used to learn distributed representations in which many different hidden causes combine to produce each observed data vector. Such representations can be exponentially more efficient in their use of hardware than standard vector quantization or mixture models.
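The bound described in the abstract can be illustrated on a toy problem small enough to enumerate. In the sketch below (all names, sizes, and the per-unit code cost are illustrative assumptions, not taken from the paper), each binary code has an energy equal to its description length in nats; the Boltzmann distribution over all codes attains the minimal expected description length, while any factorial distribution of the kind a recognition network produces gives a free energy that upper-bounds it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: binary codes h of length K, generative weights G
# reconstruct x, and the description length of a code h (in nats) is
# E(h) = code_cost(h) + reconstruction_cost(x, h).
K, D = 4, 6
G = rng.normal(scale=0.5, size=(K, D))   # generative weights
x = rng.normal(size=D)                   # one observed data vector
sigma2 = 1.0                             # assumed reconstruction noise variance

# All 2^K possible binary representation vectors.
codes = np.array([[(i >> k) & 1 for k in range(K)] for i in range(2 ** K)])

def energy(h):
    # Illustrative costs: a fixed price per active unit plus squared
    # reconstruction error under a Gaussian noise model.
    recon = h @ G
    return 0.7 * h.sum() + 0.5 * np.sum((x - recon) ** 2) / sigma2

E = np.array([energy(h) for h in codes])

# Optimal stochastic coding: the Boltzmann distribution over all codes.
# Its free energy, -log Z, is the minimal expected description length.
Z = np.exp(-E).sum()
F_min = -np.log(Z)

# Factorial approximation, as a recognition net would produce: unit k is
# active independently with probability q[k]. Free energy F(q) = E_q[E] - H(q).
def free_energy(q):
    probs = np.prod(codes * q + (1 - codes) * (1 - q), axis=1)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return probs @ E - entropy

q = np.full(K, 0.3)        # an arbitrary factorial recognition distribution
F_approx = free_energy(q)

print(F_approx >= F_min)   # the suboptimal encoding never beats the bound
```

Minimizing `F_approx` with respect to both the recognition probabilities and the generative weights is what lets the bound serve as a Lyapunov function for learning, even when the factorial family cannot represent the true Boltzmann distribution.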
Citations
Proceedings ArticleDOI
Autoencoder-based feature learning for cyber security applications
TL;DR: It is shown how well the AE is capable of automatically learning a reasonable notion of semantic similarity among input features, and how the scheme can reduce the dimensionality of the features, thereby significantly minimising the memory requirements.
Journal ArticleDOI
Application of deep learning to cybersecurity: A survey
TL;DR: This survey focuses on recent DL approaches that have been proposed in the area of cybersecurity, namely intrusion detection, malware detection, phishing/spam detection, and website defacement detection.
Journal ArticleDOI
Text summarization using unsupervised deep learning
Mahmood Yousefi-Azar, Len Hamey +1 more
TL;DR: Experiments show that the AE using local vocabularies clearly provides a more discriminative feature space and improves recall by 11.2% on average, and that the ENAE can make further improvements, particularly in selecting informative sentences.
Journal ArticleDOI
Nonlinear Information Bottleneck
TL;DR: In this paper, a non-parametric upper bound on mutual information is proposed to find the optimal bottleneck variable for arbitrarily distributed discrete and/or continuous random variables X and Y with a Gaussian joint distribution.
Posted Content
Nonlinear Information Bottleneck.
TL;DR: This work proposes a method for performing IB on arbitrarily-distributed discrete and/or continuous X and Y, while allowing for nonlinear encoding and decoding maps, that achieves better performance than the recently-proposed “variational IB” method on several real-world datasets.
References
Journal Article
A new view of the EM algorithm that justifies incremental and other variants
A minimum description length framework for unsupervised learning
TL;DR: This thesis presents a general framework for describing unsupervised learning procedures based on the Minimum Description Length (MDL) principle, and describes three new learning algorithms derived in this manner from the MDL framework.
Related Papers (5)
Evolutionary Learning Algorithm for Projection Neural Networks
Min Woong Hwang, Jin-Young Choi +1 more