Journal ArticleDOI

Training products of experts by minimizing contrastive divergence

Geoffrey E. Hinton
01 Aug 2002 - Neural Computation, Vol. 14, Iss. 8, pp. 1771-1800
TLDR
A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary; because it is hard even to approximate the derivatives of the renormalization term in the combination rule, a PoE is trained instead with a different objective, contrastive divergence, whose derivatives can be approximated accurately and efficiently.
Abstract
It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
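The abstract describes the contrastive divergence learning rule only at a high level. Below is a minimal sketch, assuming a binary restricted Boltzmann machine (the simplest product of experts treated in the paper), of one-step contrastive divergence (CD-1) in Python with NumPy. The function name rbm_cd1, the toy data, and all hyperparameters are illustrative assumptions rather than the author's code; the weight update uses the standard CD-1 approximation, the difference between pairwise statistics measured on the data and on a one-step Gibbs reconstruction.

    # Minimal CD-1 sketch for a binary restricted Boltzmann machine (a simple PoE).
    # All names and settings here are illustrative assumptions.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def rbm_cd1(data, n_hidden=16, lr=0.05, epochs=50, rng=None):
        """Train a binary RBM with CD-1; return (weights, visible_bias, hidden_bias)."""
        rng = np.random.default_rng(rng)
        n_samples, n_visible = data.shape
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        b_v = np.zeros(n_visible)
        b_h = np.zeros(n_hidden)

        for _ in range(epochs):
            # Positive phase: hidden probabilities and a binary sample given the data.
            h_prob = sigmoid(data @ W + b_h)
            h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)

            # Negative phase: one step of Gibbs sampling (the "reconstruction").
            v_recon = sigmoid(h_sample @ W.T + b_v)
            h_recon = sigmoid(v_recon @ W + b_h)

            # CD-1 approximation to the gradient: <v h>_data - <v h>_reconstruction.
            W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
            b_v += lr * (data - v_recon).mean(axis=0)
            b_h += lr * (h_prob - h_recon).mean(axis=0)

        return W, b_v, b_h

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        toy = (rng.random((200, 6)) < 0.3).astype(float)  # toy binary data
        W, b_v, b_h = rbm_cd1(toy, n_hidden=4, rng=0)
        print("learned weight matrix shape:", W.shape)

In practice mini-batches, momentum, and weight decay are usually added, but the core approximation to the contrastive divergence derivative is the same one the abstract refers to.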



Citations
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Journal ArticleDOI

A fast learning algorithm for deep belief nets

TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

TL;DR: Replacing the binary stochastic hidden units of restricted Boltzmann machines with rectified linear units yields features that are better for object recognition on the NORB dataset and for face verification on the Labeled Faces in the Wild dataset.
Journal ArticleDOI

Deep learning in neural networks

TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning, and evolutionary computation, as well as indirect search for short programs encoding deep and large networks.
Journal ArticleDOI

Representation Learning: A Review and New Perspectives

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
References
Journal ArticleDOI

Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images

TL;DR: An analogy between images and statistical mechanics systems is drawn; annealing under the posterior distribution yields the maximum a posteriori (MAP) estimate of the image given the degraded observations and leads to a highly parallel "relaxation" algorithm for MAP estimation.
Journal ArticleDOI

A maximum entropy approach to natural language processing

TL;DR: A maximum-likelihood approach for automatically constructing maximum entropy models is presented, together with an efficient implementation, illustrated on several problems in natural language processing.
Book

Information processing in dynamical systems: foundations of harmony theory

TL;DR: The work reported in this chapter rests on the conviction that mathematical analysis is a methodology with a crucial role to play in the development of cognitive science.
BookDOI

Learning in graphical models

TL;DR: This collection presents an introduction to inference for Bayesian networks, a view of the EM algorithm that justifies incremental, sparse, and other variants, and an information-theoretic analysis of hard and soft assignment methods for clustering.
Trending Questions (1)
What are the benefits of using a mixture of experts model?

The benefits of using a mixture of experts model include the ability to compute coefficients on each basis function separately and the flexibility to incorporate nonorthogonal experts and a nonlinear generative model.