
Showing papers by "Geoffrey E. Hinton published in 2000"


Journal ArticleDOI
TL;DR: A new statistical model for time series is introduced that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each linear regime; the results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
Abstract: We introduce a new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time-series models, hidden Markov models and linear dynamical systems, and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs, Jordan, Nowlan, & Hinton, 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact expectation maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log-likelihood and makes use of both the forward and backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems. We tested the algorithm on artificial data sets and a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
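For intuition, here is a minimal sketch of the generative process such a model assumes: a hidden Markov chain selects which of several linear dynamical systems drives the continuous state at each time step. This is not the paper's code, and all parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

M, T = 2, 200                                  # number of regimes, sequence length
P = np.array([[0.98, 0.02],                    # regime transition probabilities
              [0.03, 0.97]])
A = [np.array([[0.99]]), np.array([[0.70]])]   # per-regime linear dynamics
Q = [0.01, 0.5]                                # per-regime process noise variances
R = 0.1                                        # observation noise variance

s, x = 0, np.zeros(1)                          # discrete regime, continuous state
regimes, obs = [], []
for t in range(T):
    s = rng.choice(M, p=P[s])                              # Markov switch between regimes
    x = A[s] @ x + rng.normal(0.0, np.sqrt(Q[s]), size=1)  # approximately linear dynamics
    y = x[0] + rng.normal(0.0, np.sqrt(R))                 # noisy observation
    regimes.append(s)
    obs.append(y)
```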

478 citations


Proceedings Article
01 Jan 2000
TL;DR: A neurally-inspired, unsupervised learning algorithm builds a non-linear generative model for pairs of face images from the same individual; individuals are then recognized by finding the highest relative probability pair among all pairs that consist of a test image and an image whose identity is known.
Abstract: We describe a neurally-inspired, unsupervised learning algorithm that builds a non-linear generative model for pairs of face images from the same individual. Individuals are then recognized by finding the highest relative probability pair among all pairs that consist of a test image and an image whose identity is known. Our method compares favorably with other methods in the literature. The generative model consists of a single layer of rate-coded, non-linear feature detectors and it has the property that, given a data vector, the true posterior probability distribution over the feature detector activities can be inferred rapidly without iteration or approximation. The weights of the feature detectors are learned by comparing the correlations of pixel intensities and feature activations in two phases: When the network is observing real data and when it is observing reconstructions of real data generated from the feature activations.
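The two-phase learning rule can be sketched roughly as follows, simplified to binary mean-field feature detectors rather than the paper's rate-coded units; the function names and the learning rate are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_phase_update(v0, W, lr=0.01):
    """One update of the feature-detector weights W (shape D x H) from a
    batch of data vectors v0 (shape N x D)."""
    h0 = sigmoid(v0 @ W)          # feature activations while observing real data
    positive = v0.T @ h0          # pixel-feature correlations, data phase
    v1 = sigmoid(h0 @ W.T)        # reconstruction generated from the features
    h1 = sigmoid(v1 @ W)          # feature activations on the reconstruction
    negative = v1.T @ h1          # correlations, reconstruction phase
    return W + lr * (positive - negative) / len(v0)
```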

157 citations


Journal ArticleDOI
01 Aug 2000
TL;DR: The EM algorithm for Gaussian mixture models often gets caught in local maxima of the likelihood; this work presents a new EM algorithm that performs split and merge operations on the Gaussians to escape from these configurations.
Abstract: The EM algorithm for Gaussian mixture models often gets caught in local maxima of the likelihood which involve having too many Gaussians in one part of the space and too few in another, widely separated part of the space. We present a new EM algorithm which performs split and merge operations on the Gaussians to escape from these configurations. This algorithm uses two novel criteria for efficiently selecting the split and merge candidates. Experimental results on synthetic and real data show the effectiveness of using the split and merge operations to improve the likelihood of both the training data and of held-out test data.
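One plausible reading of the merge criterion is sketched below, under the assumption that candidate pairs are ranked by the overlap of their posterior responsibility vectors; names and shapes are illustrative, not the paper's code.

```python
import numpy as np

def merge_candidates(resp):
    """resp: (N, K) posterior responsibilities of K Gaussians for N points,
    as computed in an E-step. Returns component pairs ranked so that pairs
    claiming the same data (good merge candidates) come first."""
    K = resp.shape[1]
    scores = [(resp[:, i] @ resp[:, j], i, j)
              for i in range(K) for j in range(i + 1, K)]
    return sorted(scores, reverse=True)   # most overlapping pairs first
```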

80 citations


Proceedings Article
01 Jan 2000
TL;DR: The product of experts model is found to perform comparably to table-based Q-learning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation.
Abstract: The problem of reinforcement learning in large factored Markov decision processes is explored. The Q-value of a state-action pair is approximated by the free energy of a product of experts network. Network parameters are learned on-line using a modified SARSA algorithm which minimizes the inconsistency of the Q-values of consecutive state-action pairs. Actions are chosen based on the current value estimates by fixing the current state and sampling actions from the network using Gibbs sampling. The algorithm is tested on a co-operative multi-agent task. The product of experts model is found to perform comparably to table-based Q-learning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation.
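A rough sketch of how such a Q-value could be computed, using the standard closed-form free energy of a network with binary hidden units; the variable names and shapes are assumptions, not the paper's code.

```python
import numpy as np

def neg_free_energy(v, W, b_vis, b_hid):
    """-F(v) for a network with binary hidden units, where v is the
    concatenated state-action vector:
        -F(v) = b_vis . v + sum_j softplus(v . W[:, j] + b_hid[j])
    The negative free energy plays the role of the approximate Q-value."""
    return v @ b_vis + np.sum(np.logaddexp(0.0, v @ W + b_hid))
```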

27 citations


01 Jan 2000
TL;DR: This work discusses Hinton's (1989) relative payoff procedure (RPP), a static reinforcement learning algorithm whose foundation is not stochastic gradient ascent, and shows circumstances under which applying the RPP is guaranteed to increase the mean return.
Abstract: We discuss Hinton's (1989) relative payoff procedure (RPP), a static reinforcement learning algorithm whose foundation is not stochastic gradient ascent. We show circumstances under which applying the RPP is guaranteed to increase the mean return, even though it can make large changes in the values of the parameters. The proof is based on a mapping between the RPP and a form of the expectation-maximisation procedure of Dempster, Laird & Rubin (1977).
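As one illustrative reading, the RPP update for a single Bernoulli unit with non-negative returns can be sketched compactly: the new firing probability is the reward-weighted frequency of firing.

```python
import numpy as np

def rpp_update(actions, rewards):
    """actions: (N,) binary samples from one Bernoulli unit;
    rewards: (N,) non-negative returns received on those trials.
    Returns the unit's new firing probability, p <- E[r * s] / E[r]."""
    rewards = np.asarray(rewards, dtype=float)
    actions = np.asarray(actions, dtype=float)
    return (rewards @ actions) / rewards.sum()
```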

25 citations


Proceedings Article
01 Jan 2000
TL;DR: In this article, the product of experts learning procedure is used to discover a set of stochastic binary features that constitute a non-linear generative model of handwritten images of digits.
Abstract: The product of experts learning procedure [1] can discover a set of stochastic binary features that constitute a non-linear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, it is helpful to learn a hierarchy of separate models for each digit class. Each model in the hierarchy has one layer of hidden units and the nth level model is trained on data that consists of the activities of the hidden units in the already trained (n - 1)th level model. After training, each level produces a separate, unnormalized log probability score. With a three-level hierarchy for each of the 10 digit classes, a test image produces 30 scores which can be used as inputs to a supervised, logistic classification network that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data.
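Schematically, the scoring stage might look like the sketch below; the score and hidden methods are hypothetical interfaces standing in for the trained models, and the shapes are assumptions.

```python
import numpy as np

def score_vector(image, hierarchies):
    """hierarchies: 10 lists (one per digit class) of 3 trained models,
    each assumed to expose score(x) -> unnormalized log probability and
    hidden(x) -> hidden-unit activities fed to the next level."""
    scores = []
    for levels in hierarchies:        # one 3-level hierarchy per digit class
        x = image
        for model in levels:          # walk up the hierarchy
            scores.append(model.score(x))
            x = model.hidden(x)       # training input for the next level up
    return np.array(scores)           # 30 scores -> logistic classifier input
```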

21 citations


Proceedings ArticleDOI
24 Jul 2000
TL;DR: This paper proposes an extended formulation of linear relational embedding (LRE) that can use negative information and properly handle data with multiple correct answers, and presents results in two simple domains showing that learning leads to good generalization.
Abstract: Linear relational embedding (LRE) was introduced previously by the authors (1999) as a means of extracting a distributed representation of concepts from relational data. The original formulation cannot use negative information and cannot properly handle data in which there are multiple correct answers. In this paper we propose an extended formulation of LRE that solves both these problems. We present results in two simple domains, which show that learning leads to good generalization.

17 citations


Journal ArticleDOI

10 citations


Proceedings Article
30 Jul 2000
TL;DR: It is difficult to fit a product of experts to data using maximum likelihood because the gradient of the log likelihood is intractable, but there is an efficient way of optimizing a different objective function and this produces good models of high-dimensional data.
Abstract: It is possible to combine multiple non-linear probabilistic models of the same data by multiplying the probability distributions together and then renormalizing. A “product of experts” is a very efficient way to model data that simultaneously satisfies many different constraints. It is difficult to fit a product of experts to data using maximum likelihood because the gradient of the log likelihood is intractable, but there is an efficient way of optimizing a different objective function and this produces good models of high-dimensional data.
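A tiny numeric illustration of the idea, with made-up distributions over four discrete outcomes: multiplying the experts and renormalizing concentrates probability mass where all experts agree.

```python
import numpy as np

expert_a = np.array([0.1, 0.4, 0.4, 0.1])   # illustrative distributions over
expert_b = np.array([0.4, 0.4, 0.1, 0.1])   # the same four outcomes
product = expert_a * expert_b               # multiply the experts pointwise
product /= product.sum()                    # renormalize
print(product)   # [0.16 0.64 0.16 0.04]: sharper than either expert alone
```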

7 citations


Proceedings Article
29 Jun 2000
TL;DR: Linear Relational Embedding is a method of learning a distributed representation of concepts from data consisting of binary relations between concepts; on a task involving family relationships, learning is fast and leads to good generalization.
Abstract: Linear Relational Embedding is a method of learning a distributed representation of concepts from data consisting of binary relations between concepts. Concepts are represented as vectors, binary relations as matrices, and the operation of applying a relation to a concept as a matrix-vector multiplication that produces an approximation to the related concept. A representation for concepts and relations is learned by maximizing an appropriate discriminative goodness function using gradient ascent. On a task involving family relationships, learning is fast and leads to good generalization.
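A minimal toy sketch of the representation, with made-up two-dimensional concept vectors and a relation matrix chosen by hand rather than learned by gradient ascent:

```python
import numpy as np

concepts = {"Colin": np.array([1.0, 0.0]),     # toy 2-D concept vectors
            "James": np.array([0.0, 1.0])}
father_of = np.array([[0.0, 0.0],              # toy relation matrix, chosen so
                      [1.0, 0.0]])             # that father_of(Colin) ~ James

pred = father_of @ concepts["Colin"]           # apply relation: matrix-vector product
answer = min(concepts,                         # recognize the nearest concept vector
             key=lambda name: np.linalg.norm(concepts[name] - pred))
print(answer)   # -> "James"
```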

7 citations