Journal ArticleDOI

Training products of experts by minimizing contrastive divergence

Geoffrey E. Hinton
01 Aug 2002 - Neural Computation, Vol. 14, Iss. 8, pp. 1771-1800
TLDR
A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary; because it is hard even to approximate the derivatives of the renormalization term in the combination rule, a PoE is trained instead with a different objective, contrastive divergence, whose derivatives can be approximated accurately and efficiently.
Abstract
It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
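The abstract describes the contrastive divergence learning rule only at a high level. Below is a minimal sketch, assuming a binary restricted Boltzmann machine (the simplest product of experts treated in the paper), of one-step contrastive divergence (CD-1) in Python with NumPy. The function name rbm_cd1, the toy data, and all hyperparameters are illustrative assumptions rather than the author's code; the weight update uses the standard CD-1 approximation, the difference between pairwise statistics measured on the data and on a one-step Gibbs reconstruction.

    # Minimal CD-1 sketch for a binary restricted Boltzmann machine (a simple PoE).
    # All names and settings here are illustrative assumptions.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def rbm_cd1(data, n_hidden=16, lr=0.05, epochs=50, rng=None):
        """Train a binary RBM with CD-1; return (weights, visible_bias, hidden_bias)."""
        rng = np.random.default_rng(rng)
        n_samples, n_visible = data.shape
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        b_v = np.zeros(n_visible)
        b_h = np.zeros(n_hidden)

        for _ in range(epochs):
            # Positive phase: hidden probabilities and a binary sample given the data.
            h_prob = sigmoid(data @ W + b_h)
            h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

            # Negative phase: one step of Gibbs sampling (the "reconstruction").
            v_recon = sigmoid(h_sample @ W.T + b_v)
            h_recon = sigmoid(v_recon @ W + b_h)

            # CD-1 approximation to the gradient: <v h>_data - <v h>_reconstruction.
            W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
            b_v += lr * (data - v_recon).mean(axis=0)
            b_h += lr * (h_prob - h_recon).mean(axis=0)

        return W, b_v, b_h

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        toy = (rng.random((200, 6)) < 0.3).astype(float)  # toy binary data
        W, b_v, b_h = rbm_cd1(toy, n_hidden=4, rng=0)
        print("learned weight matrix shape:", W.shape)

In practice mini-batches, momentum, and weight decay are usually added, but the core approximation to the contrastive divergence derivative is the same one the abstract refers to.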



Citations
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Journal ArticleDOI

A fast learning algorithm for deep belief nets

TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

TL;DR: Replacing the binary stochastic hidden units of restricted Boltzmann machines with rectified linear units yields features that are better for object recognition on the NORB dataset and for face verification on the Labeled Faces in the Wild dataset.
Journal ArticleDOI

Deep learning in neural networks

TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning, and evolutionary computation, as well as indirect search for short programs encoding deep and large networks.
Journal ArticleDOI

Representation Learning: A Review and New Perspectives

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
References
Journal ArticleDOI

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

TL;DR: An analogy between images and statistical mechanics systems is drawn; annealing under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations and leads to a highly parallel "relaxation" algorithm for MAP estimation.
Journal ArticleDOI

A maximum entropy approach to natural language processing

TL;DR: A maximum-likelihood approach for automatically constructing maximum entropy models is presented, together with an efficient implementation, illustrated on several problems in natural language processing.
Book

Information processing in dynamical systems: foundations of harmony theory

TL;DR: The work reported in this chapter rests on the conviction that mathematical analysis is a methodology with a crucial role to play in the development of cognitive science.
BookDOI

Learning in graphical models

TL;DR: This collection presents an introduction to inference for Bayesian networks, a view of the EM algorithm that justifies incremental, sparse, and other variants, and an information-theoretic analysis of hard and soft assignment methods for clustering.
Trending Questions (1)
What are the benefits of using a mixture of experts model?

The benefits of using a mixture of experts model include the ability to compute coefficients on each basis function separately and the flexibility to incorporate nonorthogonal experts and a nonlinear generative model.