
Showing papers by "Geoffrey E. Hinton published in 1990"


Journal ArticleDOI
TL;DR: A translation-invariant back-propagation network is described that performs better than a sophisticated continuous acoustic parameter hidden Markov model on a noisy, 100-speaker confusable vocabulary isolated word recognition task.

635 citations


Book ChapterDOI
TL;DR: These relatively simple, gradient-descent learning procedures work well for small tasks, and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
Abstract: A major goal of research on networks of neuron-like processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradient-descent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple, gradient-descent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.

514 citations
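The per-connection update described in this abstract is easy to make concrete. The sketch below shows each connection strength being adjusted along the negative derivative of a global error measure; the tiny two-layer network, the squared-error measure, and the finite-difference derivative are illustrative assumptions, not code from the chapter itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: decide whether the inputs sum to a positive number (illustrative only).
X = rng.normal(size=(20, 3))
T = (X.sum(axis=1, keepdims=True) > 0).astype(float)

W1 = rng.normal(scale=0.1, size=(3, 4))   # input -> hidden connection strengths
W2 = rng.normal(scale=0.1, size=(4, 1))   # hidden -> output connection strengths

def global_error():
    """A global measure of the error in the performance of the network."""
    H = np.tanh(X @ W1)                    # internal (hidden) unit activities
    Y = 1.0 / (1.0 + np.exp(-(H @ W2)))    # output unit activities
    return np.sum((Y - T) ** 2)

lr, eps = 0.1, 1e-5
for step in range(100):
    for W in (W1, W2):
        grad = np.zeros_like(W)
        for idx in np.ndindex(W.shape):
            # Each connection computes the derivative of the global error with
            # respect to its own strength (finite differences here for clarity;
            # back-propagation computes the same quantity far more cheaply).
            W[idx] += eps
            e_plus = global_error()
            W[idx] -= 2 * eps
            e_minus = global_error()
            W[idx] += eps
            grad[idx] = (e_plus - e_minus) / (2 * eps)
        W -= lr * grad                     # adjust in the direction that decreases the error
```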


Journal ArticleDOI
TL;DR: Three different ways of mapping part-whole hierarchies into connectionist networks are described, suggesting that neural networks have two quite different methods for performing inference.

354 citations


Book
03 Jan 1990
TL;DR: By using the appropriate interpretation for the way in which a DBM represents the probability of an output vector given an input vector, it is shown that the DBM performs steepest descent in the same function as the original SBM, except at rare discontinuities.
Abstract: The Boltzmann machine learning procedure has been successfully applied in deterministic networks of analog units that use a mean field approximation to efficiently simulate a truly stochastic system (Peterson and Anderson 1987). This type of “deterministic Boltzmann machine” (DBM) learns much faster than the equivalent “stochastic Boltzmann machine” (SBM), but since the learning procedure for DBM's is only based on an analogy with SBM's, there is no existing proof that it performs gradient descent in any function, and it has only been justified by simulations. By using the appropriate interpretation for the way in which a DBM represents the probability of an output vector given an input vector, it is shown that the DBM performs steepest descent in the same function as the original SBM, except at rare discontinuities. A very simple way of forcing the weights to become symmetrical is also described, and this makes the DBM more biologically plausible than back-propagation (Werbos 1974; Parker 1985; Rumelhart...

150 citations
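One way to picture the deterministic Boltzmann machine described above: the mean-field equations are iterated to a fixed point first with the environment clamped and then with only the input clamped, and each symmetric weight is changed in proportion to the difference between the pairwise co-activities in the two phases. The network size, the assignment of units to input, hidden, and output roles, the settling schedule, and the learning rate below are illustrative assumptions, not the setup analysed in the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6                                      # units 0-1: input, 2-3: hidden, 4-5: output (assumed layout)
W = rng.normal(scale=0.1, size=(n, n))
W = (W + W.T) / 2                          # symmetric connection strengths
np.fill_diagonal(W, 0.0)                   # no self-connections

def settle(clamped, n_sweeps=30):
    """Iterate the mean-field equations s_i = sigma(sum_j w_ij s_j) toward a fixed
    point, holding the units listed in `clamped` at fixed values."""
    s = np.full(n, 0.5)
    for i, v in clamped.items():
        s[i] = v
    for _ in range(n_sweeps):
        for i in range(n):
            if i not in clamped:
                s[i] = 1.0 / (1.0 + np.exp(-W[i] @ s))
    return s

x, y, lr = [1.0, 0.0], [0.0, 1.0], 0.05    # a single training case
for epoch in range(100):
    s_plus = settle({0: x[0], 1: x[1], 4: y[0], 5: y[1]})   # clamped phase: inputs and outputs fixed
    s_minus = settle({0: x[0], 1: x[1]})                     # free phase: only inputs fixed
    dW = lr * (np.outer(s_plus, s_plus) - np.outer(s_minus, s_minus))
    np.fill_diagonal(dW, 0.0)
    W += dW                                # contrastive update of the symmetric weights
```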


Book
01 Oct 1990
TL;DR: The authors concentrate here on connectionism's potential as a practical technology for building intelligent systems, and also some of the unsolved problems facing this approach.
Abstract: A number of researchers have begun exploring the use of massively parallel architectures in an attempt to get around the limitations of conventional symbol processing. Many of these parallel architectures are connectionist: The system's collection of permanent knowledge is stored as a pattern of connections or connection strengths among the processing elements, so the knowledge directly determines how the processing elements interact rather than sitting passively in a memory, waiting to be looked at by the CPU. Some connectionist schemes use formal, symbolic representations, while others use more analog approaches. Some even develop their own internal representations after seeing examples of the patterns they are to recognize or the relationships they are to store. Connectionism is somewhat controversial in the AI community. It is new, still unproven in large-scale practical applications, and very different in style from the traditional AI approach. The authors have only begun to explore the behavior and potential of connectionist networks. In this article, the authors describe some of the central issues and ideas of connectionism, and also some of the unsolved problems facing this approach. Part of the motivation for connectionist research is the possible similarity in function between connectionist networks and the neural networks of the human cortex, but they concentrate here on connectionism's potential as a practical technology for building intelligent systems.

136 citations


Journal ArticleDOI
TL;DR: Using the probably approximately correct framework developed in [12], Baum and Haussler have shown that if a neural network can be trained to automatically construct its own internal representations, then it might be better to settle for the system that works best.

135 citations


Proceedings Article
01 Oct 1990
TL;DR: Simulations reveal that the modular architecture, composed of competing expert networks, suggested by Jacobs, Jordan, Nowlan and Hinton (1991), is capable of uncovering interesting decompositions in a complex task.
Abstract: We compare the performance of the modular architecture, composed of competing expert networks, suggested by Jacobs, Jordan, Nowlan and Hinton (1991) to the performance of a single back-propagation network on a complex, but low-dimensional, vowel recognition task. Simulations reveal that this system is capable of uncovering interesting decompositions in a complex task. The type of decomposition is strongly influenced by the nature of the input to the gating network that decides which expert to use for each case. The modular architecture also exhibits consistently better generalization on many variations of the task.

108 citations
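A minimal sketch of the competing-experts idea from the abstract: a gating network assigns a probability to each expert, the experts make their own predictions, and the learning signal pushes each expert toward the cases for which the gate holds it responsible. The linear experts, linear gate, layer sizes, and Gaussian-likelihood objective below are simplifying assumptions rather than the exact vowel-recognition setup of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_out, n_experts = 4, 2, 3                  # illustrative sizes
W = rng.normal(scale=0.1, size=(n_experts, n_out, n_in))   # linear expert networks
V = rng.normal(scale=0.1, size=(n_experts, n_in))          # linear gating network

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_case(x, d, lr=0.1):
    o = np.einsum('eij,j->ei', W, x)              # each expert's output for this case
    g = softmax(V @ x)                            # gating probabilities
    # Responsibilities: how strongly each expert should claim this case.
    lik = np.exp(-0.5 * np.sum((d - o) ** 2, axis=1))
    h = g * lik
    h = h / h.sum()
    # Gradient step on -log(sum_e g_e * lik_e) for linear experts and a linear gate.
    for e in range(n_experts):
        W[e] -= lr * h[e] * np.outer(o[e] - d, x)
    V[:] -= lr * np.outer(g - h, x)
    return float((h * np.sum((d - o) ** 2, axis=1)).sum())

# One illustrative training step on random data.
err = train_case(rng.normal(size=n_in), rng.normal(size=n_out))
```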


Journal ArticleDOI
TL;DR: This work shows that the bootstrap or decision-directed version of the Widrow-Hoff rule can be viewed as an unsupervised clustering algorithm in which the data points are transformed so that they form two clusters that are as tight as possible.
Abstract: An algorithm that is widely used for adaptive equalization in current modems is the bootstrap or decision-directed version of the Widrow-Hoff rule. We show that this algorithm can be viewed as an unsupervised clustering algorithm in which the data points are transformed so that they form two clusters that are as tight as possible. The standard algorithm performs gradient ascent in a crude model of the log likelihood of generating the transformed data points from two gaussian distributions with fixed centers. Better convergence is achieved by using the exact gradient of the log likelihood.

42 citations
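A sketch of the decision-directed rule and its clustering reading: the equalizer output is pulled toward whichever of two fixed cluster centres (here +1 and -1) is nearer, which amounts to a gradient step on a crude model of the log likelihood under two Gaussians with fixed centers. The toy channel, filter length, and step size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "channel": binary symbols passed through a short filter, plus noise.
symbols = rng.choice([-1.0, 1.0], size=500)
channel = np.array([1.0, 0.4, -0.2])
received = np.convolve(symbols, channel, mode='same') + 0.05 * rng.normal(size=500)

n_taps, lr = 5, 0.01
w = np.zeros(n_taps)
w[n_taps // 2] = 1.0                      # start near the identity transform

for t in range(n_taps, len(received)):
    x = received[t - n_taps:t][::-1]      # most recent received samples
    y = w @ x                             # transformed (equalized) output
    # Decision-directed Widrow-Hoff: treat the nearer of the two fixed cluster
    # centres as the target, i.e. pull y toward its own cluster centre.
    target = 1.0 if y >= 0 else -1.0
    w += lr * (target - y) * x
```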


Book
01 Jan 1990
TL;DR: This thesis describes a frame system similar to KL-ONE, called micro-KLONE, for representing and reasoning about knowledge which may be incomplete or inconsistent, based on probabilistic sampling to find a single plausible model of the domain in order to answer a query.
Abstract: This thesis describes a frame system similar to KL-ONE, called micro-KLONE, for representing and reasoning about knowledge which may be incomplete or inconsistent. An unusual semantics appropriate to familiar situations is proposed. It is based on probabilistic sampling to find a single plausible model of the domain in order to answer a query. Correct answering of queries is intractable, so the implementation makes two approximations in order to run quickly: (1) The underlying connectionist architecture is only large enough to represent partial models of the domain, and (2) the system is only allowed to search for a limited time, so it may not even find the best partial interpretation. Lacking a provably correct implementation, the usefulness of the system becomes an empirical question. The "Ted Turner" problem is presented as an example in which the system draws an interesting common sense conclusion to a counterfactual query.

Proceedings Article
01 Oct 1990
TL;DR: Using an unsupervised learning procedure, a network is trained on an ensemble of images of the same two-dimensional object at different positions, orientations and sizes, and can reject instances of other shapes by using the fact that the predictions made by its two halves disagree.
Abstract: Using an unsupervised learning procedure, a network is trained on an ensemble of images of the same two-dimensional object at different positions, orientations and sizes. Each half of the network "sees" one fragment of the object, and tries to produce as output a set of 4 parameters that have high mutual information with the 4 parameters output by the other half of the network. Given the ensemble of training patterns, the 4 parameters on which the two halves of the network can agree are the position, orientation, and size of the whole object, or some recoding of them. After training, the network can reject instances of other shapes by using the fact that the predictions made by its two halves disagree. If two competing networks are trained on an unlabelled mixture of images of two objects, they cluster the training cases on the basis of the objects' shapes, independently of the position, orientation, and size.
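To make the "agreement" objective concrete, the sketch below uses the variance-ratio agreement measure that Becker and Hinton used for real-valued outputs in closely related work, I ≈ 0.5 * log(Var(a+b) / Var(a-b)), where a and b are the outputs of the two halves over the ensemble. The linear halves, single-parameter outputs, toy ensemble, and finite-difference gradient are all assumptions made to keep the example short, not the architecture of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy ensemble: a common "object parameter" seen through two different fragments.
n_cases = 200
latent = rng.normal(size=n_cases)
left = np.stack([latent + 0.1 * rng.normal(size=n_cases) for _ in range(3)], axis=1)
right = np.stack([latent + 0.1 * rng.normal(size=n_cases) for _ in range(3)], axis=1)

wa = rng.normal(scale=0.1, size=3)        # weights of the "left half" network
wb = rng.normal(scale=0.1, size=3)        # weights of the "right half" network

def agreement(wa, wb):
    a, b = left @ wa, right @ wb
    # High when the two halves' outputs vary together across the ensemble.
    return 0.5 * np.log(np.var(a + b) / (np.var(a - b) + 1e-8))

lr, eps = 0.05, 1e-4
for step in range(300):
    for w in (wa, wb):
        grad = np.zeros_like(w)
        for i in range(len(w)):           # finite-difference gradient, for brevity
            w[i] += eps; up = agreement(wa, wb)
            w[i] -= 2 * eps; down = agreement(wa, wb)
            w[i] += eps
            grad[i] = (up - down) / (2 * eps)
        w += lr * grad                    # gradient ascent on the agreement
```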

ReportDOI
01 Jan 1990
TL;DR: The simplicity and locality of the contrastive Hebb synapse used in Boltzmann machine learning makes it an attractive model for real biological synapses and it is shown that the CHS still works in practice provided the connectivity is grossly symmetrical.
Abstract: The simplicity and locality of the contrastive Hebb synapse (CHS) used in Boltzmann machine learning makes it an attractive model for real biological synapses. The slow learning exhibited by the stochastic Boltzmann machine can be greatly improved by using a mean field approximation and it has been shown (Hinton, 1989) that the CHS also performs steepest descent in these deterministic mean field networks. A major weakness of the learning procedure, from a biological perspective, is that the derivation assumes detailed symmetry of the connectivity. Using networks with purely asymmetric connectivity, we show that the CHS still works in practice provided the connectivity is grossly symmetrical so that if unit i sends a connection to unit j, there are numerous indirect feedback paths from j to i.
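The locality that makes the contrastive Hebb synapse biologically attractive is easy to state in code: the change to a single connection depends only on the activities of its own two units in the clamped and free phases. In the sketch below the settled activities are placeholders (they would come from a settling procedure like the one sketched under the deterministic Boltzmann machine entry), and the small weight decay is an assumed addition, included only to show one simple way initially asymmetric weights can be driven toward symmetry.

```python
import numpy as np

def contrastive_hebb(w_ij, si_plus, sj_plus, si_minus, sj_minus, lr=0.05):
    """Local rule: the change to the synapse from unit j to unit i uses only the
    activities of units i and j in the clamped (+) and free (-) phases."""
    return w_ij + lr * (si_plus * sj_plus - si_minus * sj_minus)

rng = np.random.default_rng(5)
W = rng.normal(scale=0.1, size=(4, 4))    # fully recurrent but asymmetric initial weights
s_plus = rng.uniform(size=4)              # settled activities, clamped phase (placeholder)
s_minus = rng.uniform(size=4)             # settled activities, free phase (placeholder)

# A single synapse updated with the local rule.
w12_new = contrastive_hebb(W[1, 2], s_plus[1], s_plus[2], s_minus[1], s_minus[2])

# Applied across the whole matrix, the increment is the same for w[i, j] and
# w[j, i]; with a small weight decay the asymmetric part of W shrinks each step.
lr, decay = 0.05, 0.01
W = (1 - decay) * W + lr * (np.outer(s_plus, s_plus) - np.outer(s_minus, s_minus))
```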

20 Apr 1990
TL;DR: One reason for expecting back-propagation to be good at speech is the success that hidden Markov models have enjoyed in speech recognition, which shows that a statistical model can be useful when there is a rigorous automatic method for tuning its parameters.
Abstract: Currently, one of the most powerful connectionist learning procedures is back-propagation, which repeatedly adjusts the weights in a network so as to minimize a measure of the difference between the actual output vector of the network and a desired output vector given the current input vector. The simple weight adjusting rule is derived by propagating partial derivatives of the error backwards through the net using the chain rule. Experiments have shown that back-propagation has most of the properties desired by connectionists. As with any worthwhile learning rule, it can learn non-linear black box functions and make fine distinctions between input patterns in the presence of noise. Moreover, starting from random initial states, back-propagation networks can learn to use their hidden (intermediate layer) units to efficiently represent the structure that is inherent in their input data, often discovering intuitively pleasing features. The fact that back-propagation can discover features and distinguish between similar patterns in the presence of noise makes it a natural candidate as a speech recognition method. Another reason for expecting back-propagation to be good at speech is the success that hidden Markov models have enjoyed in speech recognition, which shows that a statistical model can be useful when there is a rigorous automatic method for tuning its parameters.
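Written out for a one-hidden-layer network, the weight-adjusting rule described here looks as follows; the sigmoid units, squared-error measure, and layer sizes are assumptions made for the sketch and are not taken from the report.

```python
import numpy as np

rng = np.random.default_rng(6)
n_in, n_hid, n_out = 8, 5, 3              # illustrative layer sizes
W1 = rng.normal(scale=0.1, size=(n_in, n_hid))
W2 = rng.normal(scale=0.1, size=(n_hid, n_out))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, lr=0.1):
    global W1, W2
    # Forward pass: actual output vector for the current input vector.
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    # Error measure: half the squared difference from the desired output vector.
    error = 0.5 * np.sum((y - target) ** 2)
    # Backward pass: propagate partial derivatives of the error with the chain rule.
    dE_dy = y - target
    dE_dnet2 = dE_dy * y * (1 - y)         # through the output nonlinearity
    dE_dW2 = np.outer(h, dE_dnet2)
    dE_dh = W2 @ dE_dnet2                  # back through the hidden-to-output weights
    dE_dnet1 = dE_dh * h * (1 - h)         # through the hidden nonlinearity
    dE_dW1 = np.outer(x, dE_dnet1)
    # Adjust the weights so as to reduce the error.
    W1 -= lr * dE_dW1
    W2 -= lr * dE_dW2
    return error

err = backprop_step(rng.normal(size=n_in), rng.uniform(size=n_out))
```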