
Showing papers by "Geoffrey E. Hinton" published in 2007


Proceedings ArticleDOI
20 Jun 2007
TL;DR: This paper shows how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM's), can be used to model tabular data, such as users' ratings of movies, and demonstrates that RBM's can be successfully applied to the Netflix data set.
Abstract: Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM's), can be used to model tabular data, such as users' ratings of movies. We present efficient learning and inference procedures for this class of models and demonstrate that RBM's can be successfully applied to the Netflix data set, containing over 100 million user/movie ratings. We also show that RBM's slightly outperform carefully-tuned SVD models. When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix's own system.

1,960 citations
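
The core of the approach above is an RBM trained with contrastive divergence. Below is a minimal, hedged sketch of CD-1 training for a binary RBM on toy movie data; the paper itself uses softmax visible units over five rating values, ties weights across users, and handles missing ratings, none of which is shown here. All sizes and the data are made up for illustration.

```python
# Minimal sketch: contrastive-divergence (CD-1) training of a binary RBM.
# The paper uses softmax visible units over K=5 rating values and handles
# missing ratings; here visibles are simplified to binary "liked" flags.
# All sizes and the toy data are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_movies, n_hidden, n_users = 20, 8, 100
W = 0.01 * rng.standard_normal((n_movies, n_hidden))
b_v = np.zeros(n_movies)          # visible biases
b_h = np.zeros(n_hidden)          # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V = (rng.random((n_users, n_movies)) > 0.5).astype(float)  # toy rating data

lr = 0.1
for epoch in range(50):
    # Positive phase: hidden probabilities given the data.
    ph = sigmoid(V @ W + b_h)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: one step of Gibbs sampling (reconstruction).
    pv = sigmoid(h @ W.T + b_v)
    ph_neg = sigmoid(pv @ W + b_h)
    # CD-1 parameter updates.
    W += lr * (V.T @ ph - pv.T @ ph_neg) / n_users
    b_v += lr * (V - pv).mean(axis=0)
    b_h += lr * (ph - ph_neg).mean(axis=0)

# Predict unseen ratings for one user via the reconstruction probabilities.
print(sigmoid(sigmoid(V[:1] @ W + b_h) @ W.T + b_v)[0, :5])
```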


Journal ArticleDOI
TL;DR: The limitations of backpropagation learning can now be overcome by using multilayer neural networks that contain top-down connections and training them to generate sensory data rather than to classify it.

960 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: It is shown how real-valued distributed representations for words can be learned at the same time as learning a large set of stochastic binary hidden features that are used to predict the distributed representation of the next word from previous distributed representations.
Abstract: The supremacy of n-gram models in statistical language modelling has recently been challenged by parametric models that use distributed representations to counteract the difficulties caused by data sparsity. We propose three new probabilistic language models that define the distribution of the next word in a sequence given several preceding words by using distributed representations of those words. We show how real-valued distributed representations for words can be learned at the same time as learning a large set of stochastic binary hidden features that are used to predict the distributed representation of the next word from previous distributed representations. Adding connections from the previous states of the binary hidden features improves performance as does adding direct connections between the real-valued distributed representations. One of our models significantly outperforms the very best n-gram models.

653 citations
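
As a rough illustration of predicting the next word through distributed representations, here is a hedged sketch of a log-bilinear-style scorer: the predicted representation of the next word is a learned linear function of the previous words' representations. The vocabulary, dimensions, and context size are made up, and the paper's models additionally use stochastic binary hidden features, which are omitted.

```python
# Minimal sketch of a log-bilinear style next-word model: the predicted
# representation of the next word is a linear function of the previous
# words' representations, and candidate words are scored by dot product.
# Vocabulary size, context size, and parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, context = 50, 16, 2

R = 0.01 * rng.standard_normal((vocab, dim))      # word representations
C = [0.01 * rng.standard_normal((dim, dim)) for _ in range(context)]
b = np.zeros(vocab)

def predict_logits(prev_words):
    # Combine the previous words' vectors into a predicted next-word vector.
    r_hat = sum(R[w] @ C[i] for i, w in enumerate(prev_words))
    return R @ r_hat + b                           # score every word

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Score the next word for a toy trigram context (w0, w1 -> ?).
w0, w1, w2 = 3, 7, 11
p = softmax(predict_logits([w0, w1]))
print("p(next = w2):", p[w2])
```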


Proceedings Article
11 Mar 2007
TL;DR: This work shows how to pretrain and fine-tune a multilayer neural network to learn a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour classification performs well.
Abstract: We show how to pretrain and fine-tune a multilayer neural network to learn a nonlinear transformation from the input space to a low-dimensional feature space in which K-nearest neighbour classification performs well. We also show how the non-linear transformation can be improved using unlabeled data. Our method achieves a much lower error rate than Support Vector Machines or standard backpropagation on a widely used version of the MNIST handwritten digit recognition task. If some of the dimensions of the low-dimensional feature space are not used for nearest neighbor classification, our method uses these dimensions to explicitly represent transformations of the digits that do not affect their identity.

531 citations
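
The fine-tuning stage optimizes a neighbourhood-component-analysis style criterion in the learned feature space. The sketch below computes that criterion, the expected fraction of same-class neighbour picks, with a simple linear map standing in for the pretrained multilayer network; the data and dimensions are invented for illustration.

```python
# Minimal sketch of an NCA-style objective for fine-tuning: the probability
# that each point selects a same-class neighbour in the learned feature
# space. A linear map stands in for the pretrained deep network.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 10))                 # toy inputs
y = rng.integers(0, 3, size=30)                   # toy labels
A = 0.1 * rng.standard_normal((10, 2))            # "encoder" to 2-D features

def nca_objective(A):
    Z = X @ A                                      # low-dimensional codes
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                   # a point never picks itself
    P = np.exp(-d2)
    P /= P.sum(axis=1, keepdims=True)              # neighbour-pick probabilities
    same = (y[:, None] == y[None, :]).astype(float)
    return (P * same).sum(axis=1).mean()           # expected same-class picks

print("NCA objective:", nca_objective(A))
```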


Book ChapterDOI
TL;DR: This chapter describes several of the proposed algorithms and shows how they can be combined to produce hybrid methods that work efficiently in networks with many layers and millions of adaptive connections.
Abstract: The uniformity of the cortical architecture and the ability of functions to move to different areas of cortex following early damage strongly suggest that there is a single basic learning algorithm for extracting underlying structure from richly structured, high-dimensional sensory data. There have been many attempts to design such an algorithm, but until recently they all suffered from serious computational weaknesses. This chapter describes several of the proposed algorithms and shows how they can be combined to produce hybrid methods that work efficiently in networks with many layers and millions of adaptive connections.

336 citations
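
One concrete form such a hybrid can take is greedy layer-wise stacking: each layer is trained as a simple generative module on the activities of the layer below. The sketch below stacks toy RBMs trained with a bare-bones CD-1 step (biases omitted); sizes and data are illustrative, and this is not the chapter's exact procedure.

```python
# Hedged sketch of greedy layer-wise stacking: each layer is an RBM trained
# (with a trivial CD-1 step, biases omitted) on the hidden activities of the
# layer below. Sizes and data are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=20, lr=0.1):
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    for _ in range(epochs):
        ph = sigmoid(data @ W)                 # positive hidden probabilities
        pv = sigmoid(ph @ W.T)                 # one-step reconstruction
        ph2 = sigmoid(pv @ W)
        W += lr * (data.T @ ph - pv.T @ ph2) / len(data)
    return W

X = (rng.random((200, 50)) > 0.5).astype(float)   # toy binary "sensory" data
layer_sizes = [30, 20, 10]

weights, acts = [], X
for n_hid in layer_sizes:
    W = train_rbm(acts, n_hid)
    weights.append(W)
    acts = sigmoid(acts @ W)                   # feed activities to next layer

print([W.shape for W in weights])
```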


Proceedings Article
11 Mar 2007
TL;DR: A new family of non-linear sequence models that are substantially more powerful than hidden Markov models or linear dynamical systems is described, and their performance is demonstrated using synthetic video sequences of two balls bouncing in a box.
Abstract: We describe a new family of non-linear sequence models that are substantially more powerful than hidden Markov models or linear dynamical systems. Our models have simple approximate inference and learning procedures that work well in practice. Multilevel representations of sequential data can be learned one hidden layer at a time, and adding extra hidden layers improves the resulting generative models. The models can be trained with very high-dimensional, very non-linear data such as raw pixel sequences. Their performance is demonstrated using synthetic video sequences of two balls bouncing in a box.

239 citations
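
A hedged sketch of one such conditional sequence model is given below: the visible and hidden biases of an RBM at time t are linear functions of the previous frame, and the within-frame weights are trained with CD-1. The dimensions, data, and exact conditioning structure are illustrative rather than the paper's specification.

```python
# Hedged sketch of a conditional/temporal RBM idea: dynamic biases for the
# current frame come from the previous frame, and the within-frame weights
# are trained with CD-1. Everything here is a toy stand-in.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_vis, n_hid, T = 16, 8, 100
W = 0.01 * rng.standard_normal((n_vis, n_hid))    # frame-to-hidden weights
A = 0.01 * rng.standard_normal((n_vis, n_vis))    # past frame -> visible bias
B = 0.01 * rng.standard_normal((n_vis, n_hid))    # past frame -> hidden bias

frames = (rng.random((T, n_vis)) > 0.5).astype(float)  # toy pixel sequence

lr = 0.05
for t in range(1, T):
    v_prev, v = frames[t - 1], frames[t]
    bv, bh = v_prev @ A, v_prev @ B                # dynamic biases
    ph = sigmoid(v @ W + bh)                       # positive phase
    pv = sigmoid(ph @ W.T + bv)                    # one-step reconstruction
    ph2 = sigmoid(pv @ W + bh)
    W += lr * (np.outer(v, ph) - np.outer(pv, ph2))
    A += lr * np.outer(v_prev, v - pv)
    B += lr * np.outer(v_prev, ph - ph2)

print(W.shape, A.shape, B.shape)
```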


Proceedings Article
03 Dec 2007
TL;DR: This work shows how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process.
Abstract: We show how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process. We first learn a deep generative model of the unlabeled data using the fast, greedy algorithm introduced by [7]. If the data is high-dimensional and highly-structured, a Gaussian kernel applied to the top layer of features in the DBN works much better than a similar kernel applied to the raw input. Performance at both regression and classification can then be further improved by using backpropagation through the DBN to discriminatively fine-tune the covariance kernel.

227 citations
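
The final step of the method amounts to putting a standard covariance function on learned features instead of raw inputs. In the hedged sketch below, a fixed random projection followed by a sigmoid stands in for the DBN's top-layer features, and a Gaussian-process predictive mean is computed with an RBF kernel on those features; data, sizes, and hyperparameters are made up.

```python
# Hedged sketch: Gaussian-process regression whose RBF covariance is computed
# on learned features rather than raw inputs. A fixed random projection plus
# sigmoid stands in for the DBN's top-layer features.
import numpy as np

rng = np.random.default_rng(0)

def features(X, Wf):
    return 1.0 / (1.0 + np.exp(-(X @ Wf)))        # stand-in for DBN features

def rbf_kernel(A, B, length_scale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

X_train = rng.standard_normal((40, 20))
y_train = np.sin(X_train[:, 0])                   # toy regression target
X_test = rng.standard_normal((5, 20))
Wf = rng.standard_normal((20, 10))

F_tr, F_te = features(X_train, Wf), features(X_test, Wf)
K = rbf_kernel(F_tr, F_tr) + 1e-2 * np.eye(len(F_tr))   # noisy kernel matrix
K_star = rbf_kernel(F_te, F_tr)
mean = K_star @ np.linalg.solve(K, y_train)       # GP predictive mean
print(mean)
```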


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A probabilistic model for learning rich, distributed representations of image transformations is described; when trained on natural videos it develops domain-specific motion features, in the form of fields of locally transformed edge filters, and it can fantasize new transformations on previously unseen images.
Abstract: We describe a probabilistic model for learning rich, distributed representations of image transformations. The basic model is defined as a gated conditional random field that is trained to predict transformations of its inputs using a factorial set of latent variables. Inference in the model consists in extracting the transformation, given a pair of images, and can be performed exactly and efficiently. We show that, when trained on natural videos, the model develops domain specific motion features, in the form of fields of locally transformed edge filters. When trained on affine, or more general, transformations of still images, the model develops codes for these transformations, and can subsequently perform recognition tasks that are invariant under these transformations. It can also fantasize new transformations on previously unseen images. We describe several variations of the basic model and provide experimental results that demonstrate its applicability to a variety of tasks.

220 citations
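
The heart of the model is a three-way multiplicative interaction between input pixels, output pixels, and latent variables. The sketch below shows that interaction in its simplest, unfactored form: inferring a transformation code from an image pair is a single sigmoid over the three-way products. Sizes, data, and parameters are illustrative, and the training procedure is not shown.

```python
# Hedged sketch of a three-way gated interaction: a binary latent unit k looks
# at products of input pixels x_i and output pixels y_j through a tensor
# W[i, j, k], so inference of the transformation code is a single sigmoid.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_in, n_out, n_maps = 25, 25, 10
W = 0.01 * rng.standard_normal((n_in, n_out, n_maps))
b = np.zeros(n_maps)

x = rng.random(n_in)                               # input image (flattened)
y = np.roll(x, 1)                                  # toy "transformed" image

# Infer the transformation code by conditioning on both images.
h = sigmoid(np.einsum('i,j,ijk->k', x, y, W) + b)

# Given x and a code h, the model's total input to each output pixel j.
y_pred_input = np.einsum('i,k,ijk->j', x, h, W)
print(h.round(2), y_pred_input.shape)
```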


Proceedings Article
03 Dec 2007
TL;DR: An efficient learning procedure for multilayer generative models that combine the best aspects of Markov random fields and deep, directed belief nets is described, and this type of model is shown to be good at capturing the statistics of patches of natural images.
Abstract: We describe an efficient learning procedure for multilayer generative models that combine the best aspects of Markov random fields and deep, directed belief nets. The generative models can be learned one layer at a time and when learning is complete they have a very fast inference procedure for computing a good approximation to the posterior distribution in all of the hidden layers. Each hidden layer has its own MRF whose energy function is modulated by the top-down directed connections from the layer above. To generate from the model, each layer in turn must settle to equilibrium given its top-down input. We show that this type of model is good at capturing the statistics of patches of natural images.

145 citations
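
A hedged sketch of the "settle to equilibrium given top-down input" step: units in one layer are updated by damped mean-field sweeps that combine lateral (within-layer MRF) input with a fixed top-down bias from the layer above. The lateral weights, the top-down input, and the sizes are invented for illustration.

```python
# Hedged sketch of mean-field settling in one layer: activities are updated
# from symmetric lateral connections plus a fixed top-down bias. All weights
# and sizes are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_units = 30
L = 0.1 * rng.standard_normal((n_units, n_units))
L = (L + L.T) / 2.0                                # symmetric lateral weights
np.fill_diagonal(L, 0.0)                           # no self-connections
top_down = rng.standard_normal(n_units)            # bias from the layer above

m = np.full(n_units, 0.5)                          # mean-field activities
for _ in range(20):                                # damped mean-field sweeps
    m = 0.5 * m + 0.5 * sigmoid(L @ m + top_down)

print(m.round(2)[:10])
```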


Proceedings Article
11 Mar 2007
TL;DR: This work shows how to visualize a set of pairwise similarities between objects by using several different two-dimensional maps, each of which captures different aspects of the similarity structure.
Abstract: We show how to visualize a set of pairwise similarities between objects by using several different two-dimensional maps, each of which captures different aspects of the similarity structure. When the objects are ambiguous words, for example, different senses of a word occur in different maps, so “river” and “loan” can both be close to “bank” without being at all close to each other. Aspect maps resemble clustering because they model pairwise similarities as a mixture of different types of similarity, but they also resemble local multi-dimensional scaling because they model each type of similarity by a two-dimensional map. We demonstrate our method on a toy example, a database of human word-association data, a large set of images of handwritten digits, and a set of feature vectors that represent words.

99 citations
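
As a rough illustration of how an aspect-map model scores a pair of objects: each object has a position in every two-dimensional map and a set of mixing weights, and the modelled similarity of a pair sums a Gaussian-style affinity over maps weighted by both objects' mixing weights. The object count, map count, and normalization below are illustrative, not the paper's exact formulation.

```python
# Hedged sketch of pairwise similarity under a mixture of 2-D maps: each
# object has a location in every map and mixing weights over maps; a pair's
# modelled similarity sums exp(-distance^2) over maps, weighted by both
# objects' mixing weights. Everything here is a toy stand-in.
import numpy as np

rng = np.random.default_rng(0)
n_obj, n_maps = 12, 3

Y = rng.standard_normal((n_maps, n_obj, 2))        # 2-D position in each map
logits = rng.standard_normal((n_obj, n_maps))
pi = np.exp(logits) / np.exp(logits).sum(1, keepdims=True)  # mixing weights

def pair_similarity(i, j):
    d2 = ((Y[:, i] - Y[:, j]) ** 2).sum(-1)        # squared distance per map
    return float((pi[i] * pi[j] * np.exp(-d2)).sum())

Q = np.array([[pair_similarity(i, j) for j in range(n_obj)]
              for i in range(n_obj)])
np.fill_diagonal(Q, 0.0)
Q /= Q.sum()                                       # normalized pairwise model
print(Q[0].round(3))
```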


01 Jan 2007
TL;DR: In this article, nonlinear units are obtained by passing the outputs of linear Gaussian units through various nonlinearities, and a general variational method that maximizes a lower bound on the likelihood of a training set is presented.
Abstract: We view perceptual tasks such as vision and speech recognition as inference problems where the goal is to estimate the posterior distribution over latent variables (e.g., depth in stereo vision) given the sensory input. The recent flurry of research in independent component analysis exemplifies the importance of inferring the continuous-valued latent variables of input data. The latent variables found by this method are linearly related to the input, but perception requires nonlinear inferences such as decision-making. Even continuous latent variables such as depth are nonlinearly related to the input. In this paper, we present a unifying framework for stochastic neural networks with nonlinear latent variables. Nonlinear units are obtained by passing the outputs of linear Gaussian units through various nonlinearities. We present a general variational method that maximizes a lower bound on the likelihood of a training set, and give results on two visual feature extraction problems.
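
A hedged sketch of the generative unit described above: each unit draws from a Gaussian whose mean is a linear function of its parents and passes the sample through a nonlinearity (rectification is used here as one example). The two-layer shape, sizes, and noise levels are invented, and the variational fitting procedure itself is not shown.

```python
# Hedged sketch: ancestral sampling through two layers of nonlinear units,
# each obtained by passing a linear Gaussian unit's output through a
# nonlinearity (here a rectification). Sizes and noise levels are toy values.
import numpy as np

rng = np.random.default_rng(0)

n_top, n_bottom = 4, 9
W = rng.standard_normal((n_top, n_bottom))

def nonlinear_gaussian_layer(mean, sigma):
    pre = rng.normal(mean, sigma)                  # linear Gaussian output
    return np.maximum(pre, 0.0)                    # nonlinearity (rectify)

z = nonlinear_gaussian_layer(np.zeros(n_top), 1.0)     # top-level latents
x = nonlinear_gaussian_layer(z @ W, 0.1)                # observed variables
print(z.round(2), x.round(2))
```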