
Showing papers by "Geoffrey E. Hinton" published in 1999


Book
01 Jun 1999
TL;DR: With no explicit target outputs or environmental evaluations, the unsupervised learner brings to bear prior biases as to what aspects of the structure of the input should be captured in the output.
Abstract: Unsupervised learning studies how systems can learn to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns. By contrast with SUPERVISED LEARNING or REINFORCEMENT LEARNING, there are no explicit target outputs or environmental evaluations associated with each input; rather the unsupervised learner brings to bear prior biases as to what aspects of the structure of the input should be captured in the output.

1,290 citations


Book
01 Jun 1999
TL;DR: A way of finessing the combinatorial explosion of ways each pattern can be generated, by maximizing an easily computed lower bound on the probability of the observations, is described; the method can be viewed as a form of hierarchical self-supervised learning that may relate to the function of bottom-up and top-down cortical processing pathways.
Abstract: Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical self-supervised learning that may relate to the function of bottom-up and top-down cortical processing pathways.
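
A minimal statement of the kind of bound referred to here, under the generic assumption of a generative model with hidden configurations h and a tractable recognition distribution q(h | x) (this is the standard Jensen-inequality bound, not necessarily the exact bound used in the paper):

    \log p(x) \;=\; \log \sum_h q(h \mid x)\,\frac{p(x,h)}{q(h \mid x)}
    \;\ge\; \sum_h q(h \mid x)\,\log \frac{p(x,h)}{q(h \mid x)}
    \;=\; \mathbb{E}_{q}\bigl[\log p(x,h)\bigr] + H\bigl[q(h \mid x)\bigr].

The right-hand side is easy to compute for a suitably factorised q even when the exact posterior p(h | x) is intractable; the gap is \mathrm{KL}\bigl(q(h \mid x)\,\|\,p(h \mid x)\bigr), so improving q tightens the bound.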

1,018 citations


Proceedings ArticleDOI
01 Jan 1999
TL;DR: If the individual models are tractable there is a fairly efficient way to train a product of models, and this training algorithm suggests a biologically plausible way of learning neural population codes.
Abstract: It is possible to combine multiple probabilistic models of the same data by multiplying the probabilities together and then renormalizing. This is a very efficient way to model high-dimensional data which simultaneously satisfies many different low dimensional constraints. Each individual expert model can focus on giving high probability to data vectors that satisfy just one of the constraints. Data vectors that satisfy this one constraint but violate other constraints will be ruled out by their low probability under the other expert models. Training a product of models appears difficult because, in addition to maximizing the probabilities that the individual models assign to the observed data, it is necessary to make the models disagree on unobserved regions of the data space. However, if the individual models are tractable there is a fairly efficient way to train a product of models. This training algorithm suggests a biologically plausible way of learning neural population codes.
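
A minimal numerical sketch (illustrative values, not the paper's experiments) of combining experts by multiplying and renormalizing: each expert is a broad distribution over a discrete grid enforcing one weak constraint, and their normalised product is sharper than any individual expert.

import numpy as np

# Discrete grid over a one-dimensional data space.
x = np.linspace(-5.0, 5.0, 401)

def expert(x, mu, sigma):
    # Unnormalised Gaussian "expert" enforcing one weak constraint.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Three broadly tuned experts with different preferred values.
experts = [expert(x, mu, 2.0) for mu in (-1.0, 0.0, 1.5)]

# Product of experts: multiply pointwise, then renormalise over the grid.
product = np.prod(experts, axis=0)
product /= product.sum()

def std(p):
    p = p / p.sum()
    mean = np.sum(p * x)
    return np.sqrt(np.sum(p * (x - mean) ** 2))

for i, e in enumerate(experts):
    print("expert", i, "std:", round(std(e), 3))
print("product  std:", round(std(product), 3))

Evaluating the product is easy; as the abstract notes, the difficulty is in training, because the renormalisation couples all the experts and the models must also be made to disagree on unobserved regions of the data space.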

377 citations


Book
01 Jan 1999
TL;DR: This volume of Foundations of Neural Computation, on unsupervised learning algorithms, focuses on neural network learning algorithms that do not require an explicit teacher to extract an efficient internal representation of the statistical structure implicit in the inputs.
Abstract: Since its founding in 1989 by Terrence Sejnowski, Neural Computation has become the leading journal in the field. Foundations of Neural Computation collects, by topic, the most significant papers that have appeared in the journal over the past nine years. This volume of Foundations of Neural Computation, on unsupervised learning algorithms, focuses on neural network learning algorithms that do not require an explicit teacher. The goal of unsupervised learning is to extract an efficient internal representation of the statistical structure implicit in the inputs. These algorithms provide insights into the development of the cerebral cortex and implicit learning in humans. They are also of interest to engineers working in areas such as computer vision and speech recognition who seek efficient representations of raw input data.

303 citations


Journal ArticleDOI
TL;DR: This article presents a general variational method that maximizes a lower bound on the likelihood of a training set and gives results on two visual feature extraction problems.
Abstract: This chapter contains sections titled: Introduction, Variational Expectation Maximization, Visual Feature Extraction, Handwriting Recognition, Conclusions, Appendix A: The E-Step, Appendix B: The M-Step, Appendix C: M(μ, σ), V(μ, σ) and Their Derivatives for Interesting Nonlinear Functions, Acknowledgments, References
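
For reference, the generic variational EM scheme that the TL;DR alludes to (a standard formulation stated as background, not the chapter's specific E- and M-step equations): both steps perform coordinate ascent on a single lower bound,

    \mathcal{F}(q, \theta) \;=\; \mathbb{E}_{q(h)}\bigl[\log p(x, h \mid \theta)\bigr] + H[q] \;\le\; \log p(x \mid \theta),

with the E-step maximising \mathcal{F} over q within a tractable family (holding \theta fixed) and the M-step maximising \mathcal{F} over \theta (holding q fixed).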

78 citations


Proceedings Article
29 Nov 1999
TL;DR: Using parse trees as internal representations of images, credibility networks are able to perform segmentation and recognition simultaneously, removing the need for ad hoc segmentation heuristics.
Abstract: We describe a class of probabilistic models that we call credibility networks. Using parse trees as internal representations of images, credibility networks are able to perform segmentation and recognition simultaneously, removing the need for ad hoc segmentation heuristics. Promising results in the problem of segmenting handwritten digits were obtained.

57 citations


Proceedings Article
29 Nov 1999
TL;DR: This work shows how to represent sharp posterior probability distributions using real valued coefficients on broadly-tuned basis functions and describes a simple simulation in which spiking neurons learn to model an image sequence by fitting a dynamic generative model.
Abstract: We first show how to represent sharp posterior probability distributions using real valued coefficients on broadly-tuned basis functions. Then we show how the precise times of spikes can be used to convey the real-valued coefficients on the basis functions quickly and accurately. Finally we describe a simple simulation in which spiking neurons learn to model an image sequence by fitting a dynamic generative model.
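
A minimal sketch of the first idea in the abstract, under the illustrative assumption that the real-valued coefficients act on the basis functions in the log-probability domain (the centres, widths and coefficient values below are made up): a handful of broadly tuned basis functions with large coefficients can encode a distribution far sharper than any single basis function.

import numpy as np

x = np.linspace(0.0, 10.0, 1001)

# Broadly tuned basis functions (log-domain tuning curves), centres spread over the range.
centres = np.linspace(0.0, 10.0, 11)
width = 2.5
basis = np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)   # shape (1001, 11)

def std(p):
    p = p / p.sum()
    mean = np.sum(p * x)
    return np.sqrt(np.sum(p * (x - mean) ** 2))

# A single basis function, used directly as a distribution, is broad.
print("one basis function as a distribution, std:", round(std(basis[:, 6]), 3))

# Real-valued coefficients weight the basis functions in the log domain,
# p(x) proportional to exp(sum_i c_i phi_i(x)); large coefficients give a sharp result.
coeffs = np.zeros(len(centres))
coeffs[6], coeffs[7] = 40.0, 20.0
log_p = basis @ coeffs
p = np.exp(log_p - log_p.max())
print("encoded distribution, std:", round(std(p), 3))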

46 citations


Book ChapterDOI
01 Feb 1999
TL;DR: In this article, the authors describe a directed acyclic graphical model that contains a hierarchy of linear units and a mechanism for dynamically selecting an appropriate subset of these units to model each observation.
Abstract: We describe a directed acyclic graphical model that contains a hierarchy of linear units and a mechanism for dynamically selecting an appropriate subset of these units to model each observation. The non-linear selection mechanism is a hierarchy of binary units each of which gates the output of one of the linear units. There are no connections from linear units to binary units, so the generative model can be viewed as a logistic belief net (Neal 1992) which selects a skeleton linear model from among the available linear units. We show that Gibbs sampling can be used to learn the parameters of the linear and binary units even when the sampling is so brief that the Markov chain is far from equilibrium.
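
A minimal sketch of the generative process described above, with illustrative dimensions, priors and noise levels, and with the binary gates drawn from a single flat layer rather than the hierarchical logistic belief net of the chapter: binary units gate the linear units, so each sample is generated by a selected "skeleton" subset of them.

import numpy as np

rng = np.random.default_rng(0)

n_linear = 6        # number of available linear units
data_dim = 4        # dimensionality of an observation

# Parameters of the linear units: each maps a scalar latent value into data space.
weights = rng.normal(size=(n_linear, data_dim))
biases = rng.normal(scale=0.1, size=data_dim)

# Logistic priors for the binary gating units (flat here; hierarchical in the chapter).
gate_logits = rng.normal(size=n_linear)

def generate():
    # Ancestral sample: the gates pick a skeleton linear model, which generates x.
    gates = rng.random(n_linear) < 1.0 / (1.0 + np.exp(-gate_logits))   # binary units
    latents = rng.normal(size=n_linear)                                  # linear unit activities
    mean = biases + (gates * latents) @ weights                          # only gated units contribute
    return mean + rng.normal(scale=0.05, size=data_dim)                  # observation noise

samples = np.array([generate() for _ in range(5)])
print(samples.round(2))

In the chapter the parameters of both the linear and binary units are then learned with brief Gibbs sampling; this sketch only shows the forward generative pass.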

40 citations


Book
01 Jun 1999
TL;DR: It is shown how minimum description length (MDL) can be used to develop highly redundant population codes; the network develops a discontinuous topography when presented with different input classes, allowing flexibility.

20 citations


Proceedings ArticleDOI
01 Jan 1999
TL;DR: A novel hierarchical generative model, which can be viewed as a nonlinear generalisation of factor analysis and implemented in a neural network, is shown to perform perceptual inference in a probabilistically consistent manner using top-down, bottom-up and lateral connections.
Abstract: A persistent worry with computational models of unsupervised learning is that learning will become more difficult as the problem is scaled. We examine this issue in the context of a novel hierarchical, generative model that can be viewed as a nonlinear generalisation of factor analysis and can be implemented in a neural network. The model performs perceptual inference in a probabilistically consistent manner by using top-down, bottom-up and lateral connections. These connections can be learned using simple rules that require only locally available information. We first demonstrate that the model can extract a sparse, distributed, hierarchical representation of global disparity from simplified random-dot stereograms. We then investigate some of the scaling properties of the algorithm on this problem and find that: 1) increasing the image size leads to faster and more reliable learning; 2) increasing the depth of the network from one to two hidden layers leads to better representations at the first hidden layer; and 3) once one part of the network has discovered how to represent disparity, it "supervises" other parts of the network, greatly speeding up their learning.

3 citations


Proceedings ArticleDOI
23 Aug 1999
TL;DR: Experimental results show that the MFA-based approach can obtain better classification performance than the conventional subspace methods.
Abstract: This paper describes a practical application of a mixture of factor analyzers (MFA) to pattern recognition. The MFA extracts locally linear manifolds underlying given high dimensional data. In this respect, the MFA-based approach is similar to the conventional subspace methods that approximate the data space with low dimensional linear subspaces. However, the MFA-based classifier, unlike the conventional subspace methods, can perform classification based on the Bayes decision rule due to its probabilistic formulation. Experimental results show that the MFA-based approach can obtain better classification performance than the conventional subspace methods.
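
A simplified sketch of Bayes-rule classification with per-class density models. It uses a single factor analyser per class via scikit-learn rather than a full mixture of factor analyzers, so it only approximates the setup in the abstract; the data, dimensions and priors are illustrative.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

def sample_class(n, loading, seed):
    # Points near a 2-D linear manifold embedded in 20-D, plus isotropic noise.
    r = np.random.default_rng(seed)
    return r.normal(size=(n, 2)) @ loading + 0.1 * r.normal(size=(n, 20))

# Two classes, each living near its own (randomly chosen) 2-D manifold.
loadings = [rng.normal(size=(2, 20)), rng.normal(size=(2, 20))]
X_train = [sample_class(200, L, s) for L, s in zip(loadings, (1, 2))]

# One factor analyser per class; a mixture of factor analysers would capture
# locally linear manifolds, a single FA is the simplest probabilistic stand-in.
models = [FactorAnalysis(n_components=2).fit(X) for X in X_train]
log_prior = np.log(np.array([0.5, 0.5]))

def classify(X):
    # Bayes decision rule: argmax over classes of log p(x | class) + log prior.
    log_lik = np.column_stack([m.score_samples(X) for m in models])
    return np.argmax(log_lik + log_prior, axis=1)

X_test = np.vstack([sample_class(50, L, s) for L, s in zip(loadings, (3, 4))])
y_test = np.repeat([0, 1], 50)
print("test accuracy:", (classify(X_test) == y_test).mean())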

01 Jan 1999
TL;DR: This paper demonstrates the possibility of replacing the numerical simulation of nontrivial dynamic models with a dramatically more efficient "NeuroAnimator" that exploits neural networks, and it introduces a remarkably fast algorithm for learning controllers that enables either complex physics-based models or their neural network emulators to synthesize motions satisfying prescribed animation goals.
Abstract: Computer animation through the numerical simulation of physics-based graphics models offers unsurpassed realism, but it can be computationally demanding. This paper demonstrates the possibility of replacing the numerical simulation of nontrivial dynamic models with a dramatically more efficient "NeuroAnimator" that exploits neural networks. NeuroAnimators are automatically trained off-line to emulate physical dynamics through the observation of physics-based models in action. Depending on the model, its neural network emulator can yield physically realistic animation one or two orders of magnitude faster than conventional numerical simulation. We demonstrate NeuroAnimators for a variety of physics-based models. By exploiting the network structure of the NeuroAnimator, we also introduce a remarkably fast algorithm for learning controllers that enables either complex physics-based models or their neural network emulators to synthesize motions satisfying prescribed animation goals.
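
A minimal sketch of the emulation idea (not the paper's network architecture or training scheme): collect (state, control) -> next-state transitions from a numerical simulator, fit a neural network regressor off-line, and then step the network instead of the simulator. The pendulum dynamics and all hyperparameters here are illustrative assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
dt = 0.02

def simulate_step(state, torque):
    # One Euler step of a damped pendulum: the "expensive" physics simulator.
    theta, omega = state
    omega_dot = -9.81 * np.sin(theta) - 0.2 * omega + torque
    return np.array([theta + dt * omega, omega + dt * omega_dot])

# Observe the physics-based model in action to build a training set of transitions.
states = rng.uniform([-np.pi, -4.0], [np.pi, 4.0], size=(5000, 2))
torques = rng.uniform(-2.0, 2.0, size=(5000, 1))
inputs = np.hstack([states, torques])
targets = np.array([simulate_step(s, t[0]) for s, t in zip(states, torques)])

# Train the emulator off-line on the observed transitions.
emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
emulator.fit(inputs, targets)

# Roll the emulator forward instead of the simulator.
state = np.array([0.5, 0.0])
for _ in range(10):
    state = emulator.predict(np.hstack([state, [0.0]]).reshape(1, -1))[0]
print("emulated state after 10 steps:", state.round(3))

The paper's controller-learning algorithm additionally backpropagates gradients through the trained emulator to satisfy animation goals; that step requires a differentiable network and is not shown in this sketch.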