
Showing papers by "Geoffrey E. Hinton published in 2005"


Proceedings Article
06 Jan 2005
TL;DR: The properties of CD learning are studied and it is shown that it provides biased estimates in general, but that the bias is typically very small.
Abstract: Maximum-likelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov chain is only run for a few steps, the learning can still work well and it approximately minimizes a different function called “contrastive divergence” (CD). CD learning has been successfully applied to various types of random fields. Here, we study the properties of CD learning and show that it provides biased estimates in general, but that the bias is typically very small. Fast CD learning can therefore be used to get close to an ML solution and slow ML learning can then be used to fine-tune the CD solution. Consider a probability distribution over a vector x (assumed discrete w.l.o.g.) with parameters W, p(x; W) = e^(−E(x;W)) / Z(W)  (1), where Z(W) = ∑_x e^(−E(x;W)) is a normalisation constant and E(x;W) is an energy function. This class of random-field distributions has found many practical applications (Li, 2001; Winkler, 2002; Teh et al., 2003; He et al., 2004). Maximum-likelihood (ML) learning of the parameters W given an iid sample X = {x_n}_{n=1}^N can be done by gradient ascent: W^(τ+1) = W^(τ) + η ∂L(W; X)/∂W evaluated at W^(τ), where η is a learning rate.

751 citations
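To make the contrastive-divergence update above concrete, here is a minimal NumPy sketch of CD-1 for a binary restricted Boltzmann machine, one common random field that CD is applied to. The RBM energy, the single Gibbs step, and the hyperparameters are illustrative choices for this sketch, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v_data, W, b_v, b_h, lr=0.01):
    """One CD-1 update for a binary RBM with energy
    E(v, h) = -v @ W @ h - b_v @ v - b_h @ h."""
    # Positive phase: hidden probabilities given the data.
    p_h_data = sigmoid(v_data @ W + b_h)
    h_data = (rng.random(p_h_data.shape) < p_h_data).astype(float)

    # One Gibbs step: reconstruct the visibles, then recompute hidden probabilities.
    p_v_recon = sigmoid(h_data @ W.T + b_v)
    v_recon = (rng.random(p_v_recon.shape) < p_v_recon).astype(float)
    p_h_recon = sigmoid(v_recon @ W + b_h)

    n = v_data.shape[0]
    # CD-1 approximates the ML gradient by the difference between statistics
    # under the data and under the one-step reconstructions.
    W += lr * (v_data.T @ p_h_data - v_recon.T @ p_h_recon) / n
    b_v += lr * (v_data - v_recon).mean(axis=0)
    b_h += lr * (p_h_data - p_h_recon).mean(axis=0)
    return W, b_v, b_h

# Toy usage on random binary data.
n_vis, n_hid = 20, 8
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
batch = (rng.random((32, n_vis)) < 0.5).astype(float)
for _ in range(100):
    W, b_v, b_h = cd1_step(batch, W, b_v, b_h)
```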


Proceedings Article
30 Jul 2005
TL;DR: This work describes a series of progressively better learning algorithms, all of which are designed to run on neuron-like hardware, and turns a generic network with three hidden layers and 1.7 million connections into a very good generative model of handwritten digits.
Abstract: If neurons are treated as latent variables, our visual systems are non-linear, densely-connected graphical models containing billions of variables and thousands of billions of parameters. Current algorithms would have difficulty learning a graphical model of this scale. Starting with an algorithm that has difficulty learning more than a few thousand parameters, I describe a series of progressively better learning algorithms, all of which are designed to run on neuron-like hardware. The latest member of this series can learn deep, multi-layer belief nets quite rapidly. It turns a generic network with three hidden layers and 1.7 million connections into a very good generative model of handwritten digits. After learning, the model gives classification performance that is comparable to the best discriminative methods.

58 citations
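The abstract above summarizes the result rather than the procedure. The sketch below shows the greedy, one-layer-at-a-time idea behind learning such deep belief nets, assuming binary RBMs trained with CD-1 as the building blocks; the layer sizes and hyperparameters are illustrative assumptions of this sketch, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hid, n_epochs=5, lr=0.01):
    """Train one binary RBM with CD-1 (mean-field reconstruction) and
    return its weights and biases."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hid))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(n_epochs):
        p_h = sigmoid(data @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        v_rec = sigmoid(h @ W.T + b_v)          # mean-field reconstruction
        p_h_rec = sigmoid(v_rec @ W + b_h)
        n = data.shape[0]
        W += lr * (data.T @ p_h - v_rec.T @ p_h_rec) / n
        b_v += lr * (data - v_rec).mean(axis=0)
        b_h += lr * (p_h - p_h_rec).mean(axis=0)
    return W, b_v, b_h

def train_deep_belief_net(data, layer_sizes):
    """Greedy layer-wise training: each RBM models the hidden
    activities produced by the layer below it."""
    layers, x = [], data
    for n_hid in layer_sizes:
        W, b_v, b_h = train_rbm(x, n_hid)
        layers.append((W, b_v, b_h))
        x = sigmoid(x @ W + b_h)    # propagate data upward to train the next layer
    return layers

# Toy usage: three hidden layers on random "images"; sizes are illustrative.
images = (rng.random((256, 784)) < 0.3).astype(float)
dbn = train_deep_belief_net(images, layer_sizes=[500, 500, 2000])
```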


Proceedings Article
05 Dec 2005
TL;DR: A generative model for handwritten digits is described that uses two pairs of opposing springs whose stiffnesses are controlled by a motor program; the inferred motor programs can be used directly for digit classification or as additional, highly informative outputs when training a feed-forward classifier.
Abstract: We describe a generative model for handwritten digits that uses two pairs of opposing springs whose stiffnesses are controlled by a motor program. We show how neural networks can be trained to infer the motor programs required to accurately reconstruct the MNIST digits. The inferred motor programs can be used directly for digit classification, but they can also be used in other ways. By adding noise to the motor program inferred from an MNIST image we can generate a large set of very different images of the same class, thus enlarging the training set available to other methods. We can also use the motor programs as additional, highly informative outputs which reduce overfitting when training a feed-forward classifier.

43 citations
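The augmentation idea in the abstract (perturb an inferred motor program, re-render it, and obtain new images of the same class) can be sketched generically. The `encode` and `decode` callables below are hypothetical stand-ins for the paper's recognition network and spring-based renderer, which are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_with_code_noise(images, labels, encode, decode,
                            copies=10, noise_std=0.1):
    """Enlarge a training set by perturbing inferred latent codes.

    `encode` stands in for a recognition network that infers a motor
    program from an image, and `decode` for a generative model that
    renders a motor program back into an image; both are assumptions
    of this sketch, not the paper's actual components."""
    new_images, new_labels = [], []
    for img, lab in zip(images, labels):
        code = encode(img)                       # inferred "motor program"
        for _ in range(copies):
            noisy = code + noise_std * rng.standard_normal(code.shape)
            new_images.append(decode(noisy))     # different image, same class
            new_labels.append(lab)
    return np.stack(new_images), np.array(new_labels)

# Toy usage with identity maps standing in for encoder and decoder.
imgs = rng.standard_normal((5, 8))
labs = np.arange(5)
aug_imgs, aug_labs = augment_with_code_noise(imgs, labs,
                                             encode=lambda x: x,
                                             decode=lambda z: z)
```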


Proceedings Article
01 Jan 2005
TL;DR: A learning procedure is described for a generative model in which a hidden Markov Random Field has directed connections to the observable variables; the hybrid model simultaneously learns parts of objects and their inter-relationships from intensity images.
Abstract: We describe a learning procedure for a generative model that contains a hidden Markov Random Field (MRF) which has directed connections to the observable variables. The learning procedure uses a variational approximation for the posterior distribution over the hidden variables. Despite the intractable partition function of the MRF, the weights on the directed connections and the variational approximation itself can be learned by maximizing a lower bound on the log probability of the observed data. The parameters of the MRF are learned by using the mean field version of contrastive divergence [1]. We show that this hybrid model simultaneously learns parts of objects and their inter-relationships from intensity images. We discuss the extension to multiple MRFs linked into a chain graph by directed connections.

33 citations
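The "lower bound on the log probability" mentioned above is the standard variational bound obtained from Jensen's inequality; writing it out (in generic notation with observed v, hidden h, and mean-field approximation q, not the paper's own symbols) shows why the MRF's partition function gets special treatment:

```latex
\log p(v) \;=\; \log \sum_{h} p(v \mid h)\, p(h)
\;\geq\; \mathbb{E}_{q(h)}\!\big[\log p(v \mid h)\big]
       + \mathbb{E}_{q(h)}\!\big[\log p(h)\big]
       + \mathcal{H}\!\big[q(h)\big],
\qquad \log p(h) = -E(h) - \log Z .
```

Only the middle term contains the intractable −log Z, and that constant does not depend on q(h) or on the directed weights, so both can be learned by maximizing the bound directly; the MRF parameters, whose gradient does involve log Z, are instead updated with the mean-field version of contrastive divergence.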


Proceedings ArticleDOI
27 Dec 2005
TL;DR: The feasibility of this approach is demonstrated by training an EBM using contrastive backpropagation on a dataset of idealized trajectories of two balls bouncing in a box and showing that the model learns an accurate and efficient representation of the dataset, taking advantage of the approximate independence between subsets of variables.
Abstract: Certain datasets can be efficiently modelled in terms of constraints that are usually satisfied but sometimes are strongly violated. We propose using energy-based density models (EBMs) implementing products of frequently approximately satisfied nonlinear constraints for modelling such datasets. We demonstrate the feasibility of this approach by training an EBM using contrastive backpropagation on a dataset of idealized trajectories of two balls bouncing in a box and showing that the model learns an accurate and efficient representation of the dataset, taking advantage of the approximate independence between subsets of variables.

29 citations
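A minimal sketch of the kind of energy-based density model described above: the energy is a sum of heavy-tailed penalties on constraint violations, negative samples come from a few noisy gradient (Langevin) steps started at the data, and the parameters move in the contrastive direction. For brevity the constraints here are linear filters with a nonlinear penalty, whereas the paper uses nonlinear constraints; the sampler and all hyperparameters are likewise assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_and_grads(X, W):
    """Energy = sum_i rho(w_i . x) with the heavy-tailed penalty
    rho(u) = log(1 + u^2), so each constraint w_i . x ~ 0 is usually
    satisfied but can occasionally be strongly violated at modest cost."""
    U = X @ W.T                        # (batch, n_constraints) violations
    E = np.log1p(U ** 2).sum(axis=1)   # per-example energy
    dE_dU = 2 * U / (1 + U ** 2)       # d rho / d u
    dE_dX = dE_dU @ W                  # gradient w.r.t. the data vector
    dE_dW = dE_dU.T @ X                # gradient w.r.t. the parameters
    return E, dE_dX, dE_dW

def contrastive_update(X_data, W, lr=1e-3, step=1e-2, n_steps=5):
    """Lower the energy of the data and raise it on samples obtained by a
    few noisy gradient-descent (Langevin) steps started at the data."""
    X_neg = X_data.copy()
    for _ in range(n_steps):
        _, dX, _ = energy_and_grads(X_neg, W)
        X_neg += -step * dX + np.sqrt(2 * step) * rng.standard_normal(X_neg.shape)
    _, _, dW_data = energy_and_grads(X_data, W)
    _, _, dW_neg = energy_and_grads(X_neg, W)
    # Gradient ascent on log-likelihood: pull down data energy, push up sample energy.
    return W + lr * (dW_neg - dW_data) / X_data.shape[0]

# Toy usage on random "trajectory" vectors.
X = rng.standard_normal((128, 16))
W = 0.1 * rng.standard_normal((32, 16))   # 32 soft constraints on 16 variables
for _ in range(200):
    W = contrastive_update(X, W)
```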


Proceedings Article
01 Jan 2005
TL;DR: In this article, spectral gradient descent (SGD) is proposed to improve gradient-based dimensionality reduction methods by using information contained in the leading eigenvalues of a data affinity matrix to modify the steps taken during a gradient-based optimization procedure.
Abstract: We introduce spectral gradient descent, a way of improving iterative dimensionality reduction techniques. The method uses information contained in the leading eigenvalues of a data affinity matrix to modify the steps taken during a gradient-based optimization procedure. We show that the approach is able to speed up the optimization and to help dimensionality reduction methods find better local minima of their objective functions. We also provide an interpretation of our approach in terms of the power method for finding the leading eigenvalues of a symmetric matrix and verify the usefulness of the approach in some simple experiments.

11 citations
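One plausible reading of the gradient modification, sketched below under assumptions that are ours rather than the paper's: approximate the leading eigenvectors of the affinity matrix with a few power-iteration steps (echoing the power-method interpretation mentioned in the abstract) and blend the raw gradient with its projection onto that subspace, so that points the spectrum groups together receive similar update directions.

```python
import numpy as np

rng = np.random.default_rng(0)

def leading_eigenvectors(A, k, n_iter=50):
    """Approximate the k leading eigenvectors of a symmetric affinity
    matrix with orthogonalized power iterations (subspace iteration)."""
    V = rng.standard_normal((A.shape[0], k))
    for _ in range(n_iter):
        V, _ = np.linalg.qr(A @ V)
    return V

def spectrally_modified_gradient(grad, A, k=5, alpha=1.0):
    """Blend the raw gradient of a dimensionality-reduction objective
    (one row per data point) with its projection onto the span of the
    leading eigenvectors of the affinity matrix, so that points grouped
    together by the spectrum move in similar directions.  The additive
    blend and the choice of k are illustrative assumptions."""
    V = leading_eigenvectors(A, k)
    return grad + alpha * V @ (V.T @ grad)

# Toy usage: n points embedded in 2-D, with a random symmetric affinity.
n = 100
grad = rng.standard_normal((n, 2))            # gradient from e.g. an SNE-style objective
A = rng.random((n, n)); A = 0.5 * (A + A.T)   # stand-in affinity matrix
step = spectrally_modified_gradient(grad, A)
```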


Journal ArticleDOI
TL;DR: The spectral gradient descent method presented in this paper uses information contained in the leading eigenvalues of a data affinity matrix to modify the steps taken during a gradient-based optimization procedure; it speeds up the optimization and helps dimensionality reduction methods find better local minima of their objective functions.

7 citations


Proceedings ArticleDOI
27 Dec 2005
TL;DR: An approach is described that improves iterative dimensionality reduction methods by using information contained in the leading eigenvectors of a data affinity matrix: the gradient of an iterative method is modified so that latent space elements belonging to the same cluster are encouraged to move in similar directions during optimization.
Abstract: We describe an approach to improve iterative dimensionality reduction methods by using information contained in the leading eigenvectors of a data affinity matrix. Using an insight from the area of spectral clustering, we suggest modifying the gradient of an iterative method, so that latent space elements belonging to the same cluster are encouraged to move in similar directions during optimization. We also describe a way to achieve this without actually having to explicitly perform an eigendecomposition. Preliminary experiments show that our approach makes it possible to speed up iterative methods and helps them to find better local minima of their objective function.

4 citations


Reference EntryDOI
14 Oct 2005
TL;DR: The results show clear trends in the development of neural networks in both the action and perceptual system as well as in the models used for decision-making.
Abstract: First page of article. Keywords: artificial intelligence: neural networks; action and perceptual system

4 citations


01 Jan 2005
TL;DR: An approach to improve iterative dimensionality reduction methods by using information contained in the leading eigenvectors of a data affinity matrix is described, making it possible to speed up iterative methods and helping them find better local minima of their objective function.
Abstract: We describe an approach to improve iterative dimensionality reduction methods by using information contained in the leading eigenvectors of a data affinity matrix. Using an insight from the area of spectral clustering, we suggest modifying the gradient of an iterative method, so that latent space elements belonging to the same cluster are encouraged to move in similar directions during optimization. We also describe a way to achieve this without actually having to explicitly perform an eigendecomposition. Preliminary experiments show that our approach makes it possible to speed up iterative methods and helps them to find better local minima of their objective function.