Open Access Proceedings Article

Autoencoders, Minimum Description Length and Helmholtz Free Energy

TL;DR
It is shown that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length.
Abstract
An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distribution, where the generative weights define the energy of each possible code vector given the input vector. Unfortunately, if the code vectors use distributed representations, it is exponentially expensive to compute this Boltzmann distribution because it involves all possible code vectors. We show that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length. Even when this bound is poor, it can be used as a Lyapunov function for learning both the generative and the recognition weights. We demonstrate that this approach can be used to learn factorial codes.
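The bound described in the abstract is easy to check numerically. Below is a minimal sketch, not code from the paper: for a toy set of code-vector energies, the free energy of any approximating distribution q (expected energy minus entropy) upper-bounds the true description length, the negative log of the partition function, with equality exactly when q is the Boltzmann distribution. The energies here are arbitrary illustrative values; in the paper they would be determined by the generative weights and the input vector, and q would come from the recognition weights.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=8)  # toy energies, one per possible code vector (illustrative)

# True minimal description length (in nats): -log of the partition function,
# achieved by choosing code vectors from the Boltzmann distribution.
Z = np.exp(-E).sum()
boltzmann = np.exp(-E) / Z
true_dl = -np.log(Z)

def free_energy(q, E):
    """Expected energy minus entropy: F(q) = sum_j q_j*E_j + sum_j q_j*log(q_j)."""
    return np.dot(q, E) + np.dot(q, np.log(q))

# Any other distribution q (a random one here, standing in for the recognition
# weights' approximation) gives a free energy that upper-bounds true_dl.
q = rng.dirichlet(np.ones(8))
assert free_energy(q, E) >= true_dl
assert np.isclose(free_energy(boltzmann, E), true_dl)
```

The gap between F(q) and the true description length is the Kullback-Leibler divergence from q to the Boltzmann distribution, which is why the bound, even when loose, can serve as a Lyapunov function for learning.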


Citations
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it is used in many applications, such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Journal ArticleDOI

Representation Learning: A Review and New Perspectives

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Book

Information Theory, Inference and Learning Algorithms

TL;DR: A fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.
Book

Learning Deep Architectures for AI

TL;DR: Discusses the motivations and principles behind learning algorithms for deep architectures, in particular those that exploit unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks for constructing deeper models such as Deep Belief Networks.
Journal ArticleDOI

Whatever next? Predictive brains, situated agents, and the future of cognitive science

TL;DR: The target article critically examines the "hierarchical prediction machine" approach, concluding that it offers the best clue yet to the shape of a unified science of mind and action.
References
Journal ArticleDOI

Neural networks and principal component analysis: learning from examples without local minima

TL;DR: The main result is a complete description of the landscape of the error function E in terms of principal component analysis, showing that E has a unique minimum, corresponding to the projection onto the subspace spanned by the first principal vectors of a covariance matrix associated with the training patterns.
Book ChapterDOI

Connectionist learning procedures

TL;DR: These relatively simple, gradient-descent learning procedures work well for small tasks, and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.

A minimum description length framework for unsupervised learning

TL;DR: This thesis presents a general framework for describing unsupervised learning procedures based on the Minimum Description Length (MDL) principle, and describes three new learning algorithms derived from that framework.
Journal ArticleDOI

Developing Population Codes by Minimizing Description Length

TL;DR: The minimum description length principle is used to train the hidden units of a neural network to extract a representation that is cheap to describe but nonetheless allows the input to be reconstructed accurately.