Showing papers by "Geoffrey E. Hinton" published in 2006

Open Access

Journal Article

Reducing the Dimensionality of Data with Neural Networks

Geoffrey E. Hinton¹, Ruslan Salakhutdinov¹

28 Jul 2006 - Science

TL;DR: Describes an effective way of initializing the weights of deep autoencoder networks so that they learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

Abstract: High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

16,717 citations
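
As a rough illustration of the idea in the abstract above (a network with a small central layer trained to reconstruct its input), here is a minimal NumPy sketch. It is not the authors' method: it uses a single tanh code layer, random initial weights, and plain full-batch gradient descent, and it omits the weight-initialization procedure that the paper is actually about. The function name `train_autoencoder` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def train_autoencoder(X, code_dim=2, steps=2000, lr=0.01, seed=0):
    """Train a one-hidden-layer autoencoder by full-batch gradient descent.

    Assumption of this sketch: random small initial weights, tanh code layer,
    linear decoder, mean squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, code_dim))   # input -> code
    W_dec = rng.normal(0.0, 0.1, (code_dim, d))   # code -> reconstruction
    losses = []
    for _ in range(steps):
        code = np.tanh(X @ W_enc)                 # low-dimensional code, shape (n, code_dim)
        recon = code @ W_dec                      # reconstruction, shape (n, d)
        err = recon - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through decoder and encoder.
        g_recon = 2.0 * err / err.size
        g_W_dec = code.T @ g_recon
        g_code = g_recon @ W_dec.T
        g_pre = g_code * (1.0 - code ** 2)        # derivative of tanh
        g_W_enc = X.T @ g_pre
        W_enc -= lr * g_W_enc
        W_dec -= lr * g_W_dec
    return W_enc, W_dec, losses


if __name__ == "__main__":
    # Synthetic 10-dimensional data that lies on a 2-dimensional subspace.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
    X = X / X.std()
    _, _, losses = train_autoencoder(X)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

With several hidden layers, the abstract notes that gradient descent of this kind works well only when the initial weights are already close to a good solution, which is the gap the paper's initialization scheme addresses.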

Journal Article

A fast learning algorithm for deep belief nets

Geoffrey E. Hinton¹, Simon Osindero¹, Yee Whye Teh²

University of Toronto¹, National University of Singapore²

01 Jul 2006 - Neural Computation

TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

Abstract: We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.

15,055 citations
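
The "one layer at a time" idea in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's full algorithm: each layer is treated as a restricted Boltzmann machine trained with one step of contrastive divergence (CD-1), hidden probabilities are passed upward as data for the next layer, and the complementary-prior derivation, the label units, and the contrastive wake-sleep fine-tuning are all left out. The names `train_rbm`, `greedy_pretrain`, and `up_pass` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(V, n_hidden, epochs=50, lr=0.1, seed=0):
    """Train one binary RBM with CD-1 on data V of shape (n, n_visible)."""
    rng = np.random.default_rng(seed)
    n, n_visible = V.shape
    W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities and samples given the data.
        p_h = sigmoid(V @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Negative phase: one reconstruction step from the sampled hiddens.
        p_v = sigmoid(h @ W.T + b_v)
        p_h_neg = sigmoid(p_v @ W + b_h)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (V.T @ p_h - p_v.T @ p_h_neg) / n
        b_v += lr * (V - p_v).mean(axis=0)
        b_h += lr * (p_h - p_h_neg).mean(axis=0)
    return W, b_v, b_h


def greedy_pretrain(V, layer_sizes, **kw):
    """Stack RBMs greedily: each layer is trained on the layer below's output."""
    layers = []
    data = V
    for size in layer_sizes:
        W, b_v, b_h = train_rbm(data, size, **kw)
        layers.append((W, b_v, b_h))
        # Assumption: hidden probabilities (not samples) feed the next layer.
        data = sigmoid(data @ W + b_h)
    return layers


def up_pass(V, layers):
    """Propagate data through the stack and return top-layer probabilities."""
    data = V
    for W, _, b_h in layers:
        data = sigmoid(data @ W + b_h)
    return data
```

A call such as `greedy_pretrain(V, [4, 3], epochs=5)` on a binary matrix `V` returns one `(W, b_v, b_h)` tuple per layer; in the paper these greedily learned weights only initialize a slower fine-tuning procedure.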

Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks

Geoffrey E. Hinton, Ruslan Salakhutdinov

01 Jan 2006

TL;DR: This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

Abstract: High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such “autoencoder” networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

2,842 citations

Proceedings Article

Modeling Human Motion Using Binary Latent Variables

Graham W. Taylor¹, Geoffrey E. Hinton¹, Sam T. Roweis¹

University of Toronto¹

04 Dec 2006

TL;DR: A non-linear generative model for human motion data that uses an undirected model with binary latent variables and real-valued "visible" variables representing joint angles; the architecture makes on-line inference efficient and allows a simple approximate learning procedure.

Abstract: We propose a non-linear generative model for human motion data that uses an undirected model with binary latent variables and real-valued "visible" variables that represent joint angles. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. Such an architecture makes on-line inference efficient and allows us to use a simple approximate learning procedure. After training, the model finds a single set of parameters that simultaneously capture several different kinds of motion. We demonstrate the power of our approach by synthesizing various motion sequences and by performing on-line filling in of data lost during motion capture.

728 citations

Journal Article

Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation

Geoffrey E. Hinton¹, Simon Osindero¹, Max Welling¹, Yee Whye Teh¹

University of Toronto¹

08 Jul 2006 - Cognitive Science

TL;DR: A way of modeling high-dimensional data vectors by using an unsupervised, nonlinear, multilayer neural network in which the activity of each neuron-like unit makes an additive contribution to a global energy score that indicates how surprised the network is by the data vector.

141 citations

Journal Article

Topographic Product Models Applied to Natural Scene Statistics

Simon Osindero¹, Max Welling², Geoffrey E. Hinton³

University of Toronto¹, University of California, Irvine², Canadian Institute for Advanced Research³

01 Feb 2006 - Neural Computation

TL;DR: An energy-based model is presented that uses a product of generalized Student-t distributions to capture the statistical structure in data sets; constraining the interactions within the model makes it possible to study the topographic organization of the Gabor-like receptive fields it learns.

Abstract: We present an energy-based model that uses a product of generalized Student-t distributions to capture the statistical structure in data sets. This model is inspired by and particularly applicable to "natural" data sets such as images. We begin by providing the mathematical framework, where we discuss complete and overcomplete models and provide algorithms for training these models from data. Using patches of natural scenes, we demonstrate that our approach represents a viable alternative to independent component analysis as an interpretive model of biological visual systems. Although the two approaches are similar in flavor, there are also important differences, particularly when the representations are overcomplete. By constraining the interactions within our model, we are also able to study the topographic organization of Gabor-like receptive fields that our model learns. Finally, we discuss the relation of our new approach to previous work, in particular Gaussian scale mixture models and variants of independent component analysis.

124 citations
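
For orientation, one common way to write a product of Student-t experts as an energy-based model is shown below. This is the basic form and is offered only as an assumption; the abstract above refers to "generalized" Student-t distributions, and the paper's exact parameterization may differ. The symbols $\mathbf{w}_i$ (filter for expert $i$), $\alpha_i > 0$ (its shape parameter), and $Z$ (normalizing constant) are notation introduced here.

```latex
p(\mathbf{x}) = \frac{1}{Z} \prod_{i} \left( 1 + \tfrac{1}{2} \left( \mathbf{w}_i^{\top} \mathbf{x} \right)^{2} \right)^{-\alpha_i},
\qquad
E(\mathbf{x}) = \sum_{i} \alpha_i \log\!\left( 1 + \tfrac{1}{2} \left( \mathbf{w}_i^{\top} \mathbf{x} \right)^{2} \right),
\qquad
p(\mathbf{x}) = \frac{e^{-E(\mathbf{x})}}{Z}.
```

In this form each expert contributes additively to the energy, and a model is overcomplete when it has more experts than input dimensions.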