Open Access Proceedings Article

Autoencoders, Minimum Description Length and Helmholtz Free Energy

TL;DR
It is shown that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length.
Abstract
An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distribution, where the generative weights define the energy of each possible code vector given the input vector. Unfortunately, if the code vectors use distributed representations, it is exponentially expensive to compute this Boltzmann distribution because it involves all possible code vectors. We show that the recognition weights of an autoencoder can be used to compute an approximation to the Boltzmann distribution and that this approximation gives an upper bound on the description length. Even when this bound is poor, it can be used as a Lyapunov function for learning both the generative and the recognition weights. We demonstrate that this approach can be used to learn factorial codes.
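The bound described in the abstract is easy to check numerically. Below is a minimal sketch, not code from the paper: for a toy set of code-vector energies, the free energy of any approximating distribution q (expected energy minus entropy) upper-bounds the true description length, the negative log of the partition function, with equality exactly when q is the Boltzmann distribution. The energies here are arbitrary illustrative values; in the paper they would be determined by the generative weights and the input vector, and q would come from the recognition weights.

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=8)  # toy energies, one per possible code vector (illustrative)

# True minimal description length (in nats): -log of the partition function,
# achieved by choosing code vectors from the Boltzmann distribution.
Z = np.exp(-E).sum()
boltzmann = np.exp(-E) / Z
true_dl = -np.log(Z)

def free_energy(q, E):
    """Expected energy minus entropy: F(q) = sum_j q_j*E_j + sum_j q_j*log(q_j)."""
    return np.dot(q, E) + np.dot(q, np.log(q))

# Any other distribution q (a random one here, standing in for the recognition
# weights' approximation) gives a free energy that upper-bounds true_dl.
q = rng.dirichlet(np.ones(8))
assert free_energy(q, E) >= true_dl
assert np.isclose(free_energy(boltzmann, E), true_dl)
```

The gap between F(q) and the true description length is the Kullback-Leibler divergence from q to the Boltzmann distribution, which is why the bound, even when loose, can serve as a Lyapunov function for learning.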


Citations
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it is used in many applications, such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Journal ArticleDOI

Representation Learning: A Review and New Perspectives

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Book

Information Theory, Inference and Learning Algorithms

TL;DR: A fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.
Book

Learning Deep Architectures for AI

TL;DR: Discusses the motivations and principles behind learning algorithms for deep architectures, in particular those that exploit unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks for constructing deeper models such as Deep Belief Networks.
Journal ArticleDOI

Whatever next? Predictive brains, situated agents, and the future of cognitive science

TL;DR: The target article critically examines the "hierarchical prediction machine" approach, concluding that it offers the best clue yet to the shape of a unified science of mind and action.
References
Journal ArticleDOI

Neural networks and principal component analysis: learning from examples without local minima

TL;DR: The main result is a complete description of the landscape of the error function E in terms of principal component analysis, showing that E has a unique minimum, corresponding to the projection onto the subspace spanned by the first principal vectors of a covariance matrix associated with the training patterns.
Book ChapterDOI

Connectionist learning procedures

TL;DR: These relatively simple, gradient-descent learning procedures work well for small tasks, and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.

A minimum description length framework for unsupervised learning

TL;DR: This thesis presents a general framework for describing unsupervised learning procedures based on the Minimum Description Length (MDL) principle, and describes three new learning algorithms derived from that framework.
Journal ArticleDOI

Developing Population Codes by Minimizing Description Length

TL;DR: The minimum description length principle is used to train the hidden units of a neural network to extract a representation that is cheap to describe but nonetheless allows the input to be reconstructed accurately.