
Showing papers by "Geoffrey E. Hinton published in 2005"


Proceedings Article
06 Jan 2005
TL;DR: The properties of CD learning are studied and it is shown that it provides biased estimates in general, but that the bias is typically very small.
Abstract: Maximum-likelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov chain is only run for a few steps, the learning can still work well and it approximately minimizes a different function called “contrastive divergence” (CD). CD learning has been successfully applied to various types of random fields. Here, we study the properties of CD learning and show that it provides biased estimates in general, but that the bias is typically very small. Fast CD learning can therefore be used to get close to an ML solution and slow ML learning can then be used to fine-tune the CD solution. Consider a probability distribution over a vector x (assumed discrete w.l.o.g.) with parameters W, p(x; W) = e^(−E(x;W)) / Z(W)  (1), where Z(W) = ∑_x e^(−E(x;W)) is a normalisation constant and E(x;W) is an energy function. This class of random-field distributions has found many practical applications (Li, 2001; Winkler, 2002; Teh et al., 2003; He et al., 2004). Maximum-likelihood (ML) learning of the parameters W given an iid sample X = {x_n}_{n=1}^N can be done by gradient ascent: W^(τ+1) = W^(τ) + η ∂L(W; X)/∂W evaluated at W^(τ), where η is a learning rate.

751 citations
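To make the contrastive-divergence update above concrete, here is a minimal NumPy sketch of CD-1 for a binary restricted Boltzmann machine, one common random field that CD is applied to. The RBM energy, the single Gibbs step, and the hyperparameters are illustrative choices for this sketch, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v_data, W, b_v, b_h, lr=0.01):
    """One CD-1 update for a binary RBM with energy
    E(v, h) = -v @ W @ h - b_v @ v - b_h @ h."""
    # Positive phase: hidden probabilities given the data.
    p_h_data = sigmoid(v_data @ W + b_h)
    h_data = (rng.random(p_h_data.shape) < p_h_data).astype(float)

    # One Gibbs step: reconstruct the visibles, then recompute hidden probabilities.
    p_v_recon = sigmoid(h_data @ W.T + b_v)
    v_recon = (rng.random(p_v_recon.shape) < p_v_recon).astype(float)
    p_h_recon = sigmoid(v_recon @ W + b_h)

    n = v_data.shape[0]
    # CD-1 approximates the ML gradient by the difference between statistics
    # under the data and under the one-step reconstructions.
    W += lr * (v_data.T @ p_h_data - v_recon.T @ p_h_recon) / n
    b_v += lr * (v_data - v_recon).mean(axis=0)
    b_h += lr * (p_h_data - p_h_recon).mean(axis=0)
    return W, b_v, b_h

# Toy usage on random binary data.
n_vis, n_hid = 20, 8
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
batch = (rng.random((32, n_vis)) < 0.5).astype(float)
for _ in range(100):
    W, b_v, b_h = cd1_step(batch, W, b_v, b_h)
```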


Proceedings Article
30 Jul 2005
TL;DR: This work describes a series of progressively better learning algorithms, all of which are designed to run on neuron-like hardware, and turns a generic network with three hidden layers and 1.7 million connections into a very good generative model of handwritten digits.
Abstract: If neurons are treated as latent variables, our visual systems are non-linear, densely-connected graphical models containing billions of variables and thousands of billions of parameters. Current algorithms would have difficulty learning a graphical model of this scale. Starting with an algorithm that has difficulty learning more than a few thousand parameters, I describe a series of progressively better learning algorithms, all of which are designed to run on neuron-like hardware. The latest member of this series can learn deep, multi-layer belief nets quite rapidly. It turns a generic network with three hidden layers and 1.7 million connections into a very good generative model of handwritten digits. After learning, the model gives classification performance that is comparable to the best discriminative methods.

58 citations
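The abstract above summarizes the result rather than the procedure. The sketch below shows the greedy, one-layer-at-a-time idea behind learning such deep belief nets, assuming binary RBMs trained with CD-1 as the building blocks; the layer sizes and hyperparameters are illustrative assumptions of this sketch, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hid, n_epochs=5, lr=0.01):
    """Train one binary RBM with CD-1 (mean-field reconstruction) and
    return its weights and biases."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hid))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(n_epochs):
        p_h = sigmoid(data @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        v_rec = sigmoid(h @ W.T + b_v)          # mean-field reconstruction
        p_h_rec = sigmoid(v_rec @ W + b_h)
        n = data.shape[0]
        W += lr * (data.T @ p_h - v_rec.T @ p_h_rec) / n
        b_v += lr * (data - v_rec).mean(axis=0)
        b_h += lr * (p_h - p_h_rec).mean(axis=0)
    return W, b_v, b_h

def train_deep_belief_net(data, layer_sizes):
    """Greedy layer-wise training: each RBM models the hidden
    activities produced by the layer below it."""
    layers, x = [], data
    for n_hid in layer_sizes:
        W, b_v, b_h = train_rbm(x, n_hid)
        layers.append((W, b_v, b_h))
        x = sigmoid(x @ W + b_h)    # propagate data upward to train the next layer
    return layers

# Toy usage: three hidden layers on random "images"; sizes are illustrative.
images = (rng.random((256, 784)) < 0.3).astype(float)
dbn = train_deep_belief_net(images, layer_sizes=[500, 500, 2000])
```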


Proceedings Article
05 Dec 2005
TL;DR: A generative model for handwritten digits is described that uses two pairs of opposing springs whose stiffnesses are controlled by a motor program; the inferred motor programs can be used directly for digit classification or as additional, highly informative outputs when training a feed-forward classifier.
Abstract: We describe a generative model for handwritten digits that uses two pairs of opposing springs whose stiffnesses are controlled by a motor program. We show how neural networks can be trained to infer the motor programs required to accurately reconstruct the MNIST digits. The inferred motor programs can be used directly for digit classification, but they can also be used in other ways. By adding noise to the motor program inferred from an MNIST image we can generate a large set of very different images of the same class, thus enlarging the training set available to other methods. We can also use the motor programs as additional, highly informative outputs which reduce overfitting when training a feed-forward classifier.

43 citations
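The augmentation idea in the abstract (perturb an inferred motor program, re-render it, and obtain new images of the same class) can be sketched generically. The `encode` and `decode` callables below are hypothetical stand-ins for the paper's recognition network and spring-based renderer, which are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_with_code_noise(images, labels, encode, decode,
                            copies=10, noise_std=0.1):
    """Enlarge a training set by perturbing inferred latent codes.

    `encode` stands in for a recognition network that infers a motor
    program from an image, and `decode` for a generative model that
    renders a motor program back into an image; both are assumptions
    of this sketch, not the paper's actual components."""
    new_images, new_labels = [], []
    for img, lab in zip(images, labels):
        code = encode(img)                       # inferred "motor program"
        for _ in range(copies):
            noisy = code + noise_std * rng.standard_normal(code.shape)
            new_images.append(decode(noisy))     # different image, same class
            new_labels.append(lab)
    return np.stack(new_images), np.array(new_labels)

# Toy usage with identity maps standing in for encoder and decoder.
imgs = rng.standard_normal((5, 8))
labs = np.arange(5)
aug_imgs, aug_labs = augment_with_code_noise(imgs, labs,
                                             encode=lambda x: x,
                                             decode=lambda z: z)
```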


Proceedings Article
01 Jan 2005
TL;DR: A learning procedure is described for a generative model in which a hidden Markov Random Field has directed connections to the observable variables; the hybrid model simultaneously learns parts of objects and their inter-relationships from intensity images.
Abstract: We describe a learning procedure for a generative model that contains a hidden Markov Random Field (MRF) which has directed connections to the observable variables. The learning procedure uses a variational approximation for the posterior distribution over the hidden variables. Despite the intractable partition function of the MRF, the weights on the directed connections and the variational approximation itself can be learned by maximizing a lower bound on the log probability of the observed data. The parameters of the MRF are learned by using the mean field version of contrastive divergence [1]. We show that this hybrid model simultaneously learns parts of objects and their inter-relationships from intensity images. We discuss the extension to multiple MRFs linked into a chain graph by directed connections.

33 citations
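The "lower bound on the log probability" mentioned above is the standard variational bound obtained from Jensen's inequality; writing it out (in generic notation with observed v, hidden h, and mean-field approximation q, not the paper's own symbols) shows why the MRF's partition function gets special treatment:

```latex
\log p(v) \;=\; \log \sum_{h} p(v \mid h)\, p(h)
\;\geq\; \mathbb{E}_{q(h)}\!\big[\log p(v \mid h)\big]
       + \mathbb{E}_{q(h)}\!\big[\log p(h)\big]
       + \mathcal{H}\!\big[q(h)\big],
\qquad \log p(h) = -E(h) - \log Z .
```

Only the middle term contains the intractable −log Z, and that constant does not depend on q(h) or on the directed weights, so both can be learned by maximizing the bound directly; the MRF parameters, whose gradient does involve log Z, are instead updated with the mean-field version of contrastive divergence.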


Proceedings ArticleDOI
27 Dec 2005
TL;DR: The feasibility of this approach is demonstrated by training an EBM using contrastive backpropagation on a dataset of idealized trajectories of two balls bouncing in a box and showing that the model learns an accurate and efficient representation of the dataset, taking advantage of the approximate independence between subsets of variables.
Abstract: Certain datasets can be efficiently modelled in terms of constraints that are usually satisfied but sometimes are strongly violated. We propose using energy-based density models (EBMs) implementing products of frequently approximately satisfied nonlinear constraints for modelling such datasets. We demonstrate the feasibility of this approach by training an EBM using contrastive backpropagation on a dataset of idealized trajectories of two balls bouncing in a box and showing that the model learns an accurate and efficient representation of the dataset, taking advantage of the approximate independence between subsets of variables.

29 citations
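A minimal sketch of the kind of energy-based density model described above: the energy is a sum of heavy-tailed penalties on constraint violations, negative samples come from a few noisy gradient (Langevin) steps started at the data, and the parameters move in the contrastive direction. For brevity the constraints here are linear filters with a nonlinear penalty, whereas the paper uses nonlinear constraints; the sampler and all hyperparameters are likewise assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_and_grads(X, W):
    """Energy = sum_i rho(w_i . x) with the heavy-tailed penalty
    rho(u) = log(1 + u^2), so each constraint w_i . x ~ 0 is usually
    satisfied but can occasionally be strongly violated at modest cost."""
    U = X @ W.T                        # (batch, n_constraints) violations
    E = np.log1p(U ** 2).sum(axis=1)   # per-example energy
    dE_dU = 2 * U / (1 + U ** 2)       # d rho / d u
    dE_dX = dE_dU @ W                  # gradient w.r.t. the data vector
    dE_dW = dE_dU.T @ X                # gradient w.r.t. the parameters
    return E, dE_dX, dE_dW

def contrastive_update(X_data, W, lr=1e-3, step=1e-2, n_steps=5):
    """Lower the energy of the data and raise it on samples obtained by a
    few noisy gradient-descent (Langevin) steps started at the data."""
    X_neg = X_data.copy()
    for _ in range(n_steps):
        _, dX, _ = energy_and_grads(X_neg, W)
        X_neg += -step * dX + np.sqrt(2 * step) * rng.standard_normal(X_neg.shape)
    _, _, dW_data = energy_and_grads(X_data, W)
    _, _, dW_neg = energy_and_grads(X_neg, W)
    # Gradient ascent on log-likelihood: pull down data energy, push up sample energy.
    return W + lr * (dW_neg - dW_data) / X_data.shape[0]

# Toy usage on random "trajectory" vectors.
X = rng.standard_normal((128, 16))
W = 0.1 * rng.standard_normal((32, 16))   # 32 soft constraints on 16 variables
for _ in range(200):
    W = contrastive_update(X, W)
```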


Proceedings Article
01 Jan 2005
TL;DR: In this article, spectral gradient descent (SGD) is proposed to improve gradient-based dimensionality reduction methods by using information contained in the leading eigenvalues of a data affinity matrix to modify the steps taken during a gradient-based optimization procedure.
Abstract: We introduce spectral gradient descent, a way of improving iterative dimensionality reduction techniques. The method uses information contained in the leading eigenvalues of a data affinity matrix to modify the steps taken during a gradient-based optimization procedure. We show that the approach is able to speed up the optimization and to help dimensionality reduction methods find better local minima of their objective functions. We also provide an interpretation of our approach in terms of the power method for finding the leading eigenvalues of a symmetric matrix and verify the usefulness of the approach in some simple experiments.

11 citations
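One plausible reading of the gradient modification, sketched below under assumptions that are ours rather than the paper's: approximate the leading eigenvectors of the affinity matrix with a few power-iteration steps (echoing the power-method interpretation mentioned in the abstract) and blend the raw gradient with its projection onto that subspace, so that points the spectrum groups together receive similar update directions.

```python
import numpy as np

rng = np.random.default_rng(0)

def leading_eigenvectors(A, k, n_iter=50):
    """Approximate the k leading eigenvectors of a symmetric affinity
    matrix with orthogonalized power iterations (subspace iteration)."""
    V = rng.standard_normal((A.shape[0], k))
    for _ in range(n_iter):
        V, _ = np.linalg.qr(A @ V)
    return V

def spectrally_modified_gradient(grad, A, k=5, alpha=1.0):
    """Blend the raw gradient of a dimensionality-reduction objective
    (one row per data point) with its projection onto the span of the
    leading eigenvectors of the affinity matrix, so that points grouped
    together by the spectrum move in similar directions.  The additive
    blend and the choice of k are illustrative assumptions."""
    V = leading_eigenvectors(A, k)
    return grad + alpha * V @ (V.T @ grad)

# Toy usage: n points embedded in 2-D, with a random symmetric affinity.
n = 100
grad = rng.standard_normal((n, 2))            # gradient from e.g. an SNE-style objective
A = rng.random((n, n)); A = 0.5 * (A + A.T)   # stand-in affinity matrix
step = spectrally_modified_gradient(grad, A)
```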


Journal ArticleDOI
TL;DR: The spectral gradient descent method presented in this paper uses information contained in the leading eigenvalues of a data affinity matrix to modify the steps taken during a gradient-based optimization procedure; it speeds up the optimization and helps dimensionality reduction methods find better local minima of their objective functions.

7 citations


Proceedings ArticleDOI
27 Dec 2005
TL;DR: An approach is described that improves iterative dimensionality reduction methods by using information contained in the leading eigenvectors of a data affinity matrix: the gradient of an iterative method is modified so that latent space elements belonging to the same cluster are encouraged to move in similar directions during optimization.
Abstract: We describe an approach to improve iterative dimensionality reduction methods by using information contained in the leading eigenvectors of a data affinity matrix. Using an insight from the area of spectral clustering, we suggest modifying the gradient of an iterative method, so that latent space elements belonging to the same cluster are encouraged to move in similar directions during optimization. We also describe a way to achieve this without actually having to explicitly perform an eigendecomposition. Preliminary experiments show that our approach makes it possible to speed up iterative methods and helps them to find better local minima of their objective function.

4 citations


Reference EntryDOI
14 Oct 2005
TL;DR: The results show clear trends in the development of neural networks in both the action and perceptual system as well as in the models used for decision-making.
Abstract: First page of article. Keywords: artificial intelligence: neural networks; action and perceptual system

4 citations


01 Jan 2005
TL;DR: An approach to improve iterative dimensionality reduction methods by using information contained in the leading eigenvectors of a data affinity matrix is described, making it possible to speed up iterative methods and helping them find better local minima of their objective function.
Abstract: We describe an approach to improve iterative dimensionality reduction methods by using information contained in the leading eigenvectors of a data affinity matrix. Using an insight from the area of spectral clustering, we suggest modifying the gradient of an iterative method, so that latent space elements belonging to the same cluster are encouraged to move in similar directions during optimization. We also describe a way to achieve this without actually having to explicitly perform an eigendecomposition. Preliminary experiments show that our approach makes it possible to speed up iterative methods and helps them to find better local minima of their objective function.