
Showing papers by "Geoffrey E. Hinton published in 1991"


Journal ArticleDOI
TL;DR: A new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases; the procedure is shown to divide a vowel discrimination task into subtasks, each of which can be solved by a very simple expert network.
Abstract: We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.
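A minimal numpy sketch of the general idea (not the paper's exact implementation): a softmax gating network assigns mixing proportions to several simple experts, and the system is scored by the resulting mixture likelihood. The linear experts, dimensions, and data below are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_in, d_out = 4, 10, 3
W_exp = rng.normal(scale=0.1, size=(n_experts, d_out, d_in))   # one linear expert per subtask
W_gate = rng.normal(scale=0.1, size=(n_experts, d_in))          # softmax gating network

def forward(x):
    expert_out = W_exp @ x                       # (n_experts, d_out)
    gate = np.exp(W_gate @ x)
    gate /= gate.sum()                           # mixing proportions p_i
    return expert_out, gate

def neg_log_likelihood(x, target):
    # Mixture objective: -log sum_i p_i N(target | o_i, I), up to an additive constant.
    expert_out, gate = forward(x)
    sq = np.sum((target - expert_out) ** 2, axis=1)
    return -np.log(np.sum(gate * np.exp(-0.5 * sq)) + 1e-12)

x = rng.normal(size=d_in)
t = rng.normal(size=d_out)
print(neg_log_likelihood(x, t))
```

Training such a system by gradient descent on this objective encourages each case to be handled by a single expert rather than a blend, which is what lets the task decompose into subtasks.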

4,338 citations


Journal ArticleDOI
TL;DR: In this paper, a recurrent connectionist network was trained to output semantic feature vectors when presented with letter strings, and when damaged, the network exhibited characteristics that resembled several of the phenomena found in deep dyslexia and semantic-access dyslexia.
Abstract: A recurrent connectionist network was trained to output semantic feature vectors when presented with letter strings. When damaged, the network exhibited characteristics that resembled several of the phenomena found in deep dyslexia and semantic-access dyslexia. Damaged networks sometimes settled to the semantic vectors for semantically similar but visually dissimilar words. With severe damage, a forced-choice decision between categories was possible even when the choice of the particular semantic vector within the category was not possible. The damaged networks typically exhibited many mixed visual and semantic errors in which the output corresponded to a word that was both visually and semantically similar. Surprisingly, damage near the output sometimes caused pure visual errors. Indeed, the characteristic error pattern of deep dyslexia occurred with damage to virtually any part of the network.

635 citations


01 Jan 1991
TL;DR: This chapter contains sections titled: Connectionist Representation and Tensor Product Binding: Definition and Examples, and Tensor Product Representation: Properties.
Abstract: This chapter contains sections titled: 1 Introduction, 2 Connectionist Representation and Tensor Product Binding: Definition and Examples, 3 Tensor Product Representation: Properties, 4 Conclusion
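A minimal numpy sketch of tensor product binding as named in the chapter title: each filler vector is bound to a role vector by an outer product, the bindings are summed into one distributed representation, and (with orthonormal role vectors) a filler is recovered by contracting with its role. The particular roles, fillers, and dimensions are illustrative assumptions.

```python
import numpy as np

# Roles (e.g. "subject", "object") and fillers (e.g. "John", "ball") as vectors.
r_subj = np.array([1.0, 0.0])
r_obj  = np.array([0.0, 1.0])
f_john = np.array([1.0, 0.0, 1.0])
f_ball = np.array([0.0, 1.0, 1.0])

# Bind each filler to its role with an outer product, then sum the bindings
# into a single distributed representation of the whole structure.
structure = np.outer(f_john, r_subj) + np.outer(f_ball, r_obj)

# Unbind: contract with a role vector to recover the filler
# (exact here because the role vectors are orthonormal).
print(structure @ r_subj)   # -> f_john
print(structure @ r_obj)    # -> f_ball
```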

515 citations


Proceedings Article
02 Dec 1991
TL;DR: An elastic matching algorithm is used to minimize an energy function that includes both the deformation energy of the digit model and the log probability that the model would generate the inked pixels in the image.
Abstract: Hand-printed digits can be modeled as splines that are governed by about 8 control points. For each known digit, the control points have preferred "home" locations, and deformations of the digit are generated by moving the control points away from their home locations. Images of digits can be produced by placing Gaussian ink generators uniformly along the spline. Real images can be recognized by finding the digit model most likely to have generated the data. For each digit model we use an elastic matching algorithm to minimize an energy function that includes both the deformation energy of the digit model and the log probability that the model would generate the inked pixels in the image. The model with the lowest total energy wins. If a uniform noise process is included in the model of image generation, some of the inked pixels can be rejected as noise as a digit model is fitting a poorly segmented image. The digit models learn by modifying the home locations of the control points.
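Under simplifying assumptions (a quadratic deformation cost on control-point displacements, Gaussian ink generators spaced uniformly along the spline, and a uniform noise component), the total energy described above can be sketched as

$$E(\mathbf{c}) = \frac{\lambda}{2}\sum_{k}\lVert \mathbf{c}_k - \mathbf{h}_k\rVert^{2} \;-\; \sum_{p \in \text{ink}} \log\Big[(1-\pi_0)\,\frac{1}{B}\sum_{b=1}^{B}\mathcal{N}\big(\mathbf{x}_p;\ \boldsymbol{\mu}_b(\mathbf{c}),\ \sigma^{2}I\big) + \pi_0\, u\Big]$$

where the $\mathbf{c}_k$ are control points, the $\mathbf{h}_k$ their home locations, the $\boldsymbol{\mu}_b(\mathbf{c})$ are the $B$ ink-generator centres placed along the spline, $\pi_0$ is the noise mixing proportion, and $u$ is the uniform noise density; the exact form of the deformation term in the paper may differ.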

106 citations



Book
01 Oct 1991
TL;DR: A collection of chapters on connectionist symbol processing, including BoltzCONS (D.S. Touretzky), mapping part-whole hierarchies into connectionist networks (G.E. Hinton), recursive distributed representations (J.B. Pollack), and tensor product variable binding and the representation of symbolic structures in connectionist systems (P. Smolensky).
Abstract: BoltzCONS - dynamic symbol structures in a connectionist network, D.S. Touretzky; mapping part-whole hierarchies into connectionist networks, G.E. Hinton; recursive distributed representations, J.B. Pollack; mundane reasoning by settling on a plausible model, M. Derthick; tensor product variable binding and the representation of symbolic structures in connectionist systems, P. Smolensky; learning and applying contextual constraints in sentence comprehension, M.F. St. John and J.L. McClelland.

48 citations


Book ChapterDOI
01 Jan 1991
TL;DR: A mean field network (MFN) with tied weights that is capable of approximating the recognizer for a hidden Markov model (HMM) is presented as a way of allowing more powerful representations without abandoning automatic parameter estimation procedures.
Abstract: Neural networks can be used to discriminate between very similar phonemes, and they can handle the variability in time of occurrence by using a time-delay architecture followed by a temporal integration (Lang, Hinton and Waibel, 1990). So far, however, neural networks have been less successful at handling longer duration events that require something equivalent to “time warping” in order to match stored knowledge to the data. We present a type of mean field network (MFN) with tied weights that is capable of approximating the recognizer for a hidden Markov model (HMM). In the process of settling to a stable state, the MFN finds a blend of likely ways of generating the input string given its internal model of the probabilities of transitions between hidden states and the probabilities of input symbols given a hidden state. This blend is a heuristic approximation to the full set of path probabilities that is implicitly represented by an HMM recognizer. The learning algorithm for the MFN is less efficient than for an HMM of the same size. However, the MFN is capable of using distributed representations of the hidden state, and this can make it exponentially more efficient than an HMM when modelling strings produced by a generator that itself has componential states. We view this type of MFN as a way of allowing more powerful representations without abandoning the automatic parameter estimation procedures that have allowed relatively simple models like HMMs to outperform complex AI representations on real tasks.
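As a toy illustration only (not the paper's MFN architecture), the numpy sketch below runs a fully factorized mean field approximation to the posterior over the hidden states of a small HMM, settling each time slice given its neighbours; all model parameters here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy HMM: S hidden states, V output symbols, T-step observation string.
S, V, T = 3, 4, 6
A = rng.dirichlet(np.ones(S), size=S)        # transition probabilities A[i, j] = P(j | i)
B = rng.dirichlet(np.ones(V), size=S)        # emission probabilities  B[i, v] = P(v | i)
obs = rng.integers(V, size=T)

# Fully factorized mean field posterior q[t, i] ~ P(state_t = i | obs),
# settled by repeatedly updating each time slice given its neighbours.
q = np.full((T, S), 1.0 / S)
logA, logB = np.log(A), np.log(B)
for _ in range(50):                          # settle to a stable state
    for t in range(T):
        field = logB[:, obs[t]].copy()
        if t > 0:
            field += q[t - 1] @ logA         # influence of the previous slice
        if t < T - 1:
            field += logA @ q[t + 1]         # influence of the next slice
        q[t] = np.exp(field - field.max())
        q[t] /= q[t].sum()

print(np.round(q, 3))
```

The settled q represents a blend of likely state sequences rather than the exact path probabilities that a forward-backward HMM recognizer would compute, which is the approximation the abstract refers to.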

40 citations


Proceedings Article
01 Jan 1991
TL;DR: A new complexity penalty term in which the distribution of weight values is modelled as a mixture of Gaussians is proposed, and simulations demonstrate that it is more effective than previous complexity terms.
Abstract: One way of simplifying neural networks so they generalize better is to add an extra term to the error function that will penalize complexity. We propose a new penalty term in which the distribution of weight values is modelled as a mixture of multiple Gaussians. Under this model, a set of weights is simple if the weights can be clustered into subsets so that the weights in each cluster have similar values. We allow the parameters of the mixture model to adapt at the same time as the network learns. Simulations demonstrate that this complexity term is more effective than previous complexity terms.
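A small numpy sketch of the penalty described above, assuming a fixed two-component Gaussian mixture over the weights (in the paper the mixture parameters adapt along with the weights); the example weights and mixture settings are placeholders.

```python
import numpy as np

def gaussian(w, mu, sigma):
    return np.exp(-0.5 * ((w - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def mixture_penalty(weights, pi, mu, sigma):
    """Complexity cost: -sum_i log sum_j pi_j N(w_i; mu_j, sigma_j^2)."""
    w = weights[:, None]                                  # (n_weights, 1)
    p = pi[None, :] * gaussian(w, mu[None, :], sigma[None, :])
    return -np.sum(np.log(p.sum(axis=1) + 1e-12))

def penalty_grad(weights, pi, mu, sigma):
    """d penalty / d w_i = sum_j r_ij (w_i - mu_j) / sigma_j^2,
    where r_ij is the posterior responsibility of component j for weight i."""
    w = weights[:, None]
    p = pi[None, :] * gaussian(w, mu[None, :], sigma[None, :])
    r = p / (p.sum(axis=1, keepdims=True) + 1e-12)
    return np.sum(r * (w - mu[None, :]) / sigma[None, :] ** 2, axis=1)

weights = np.array([0.05, -0.02, 0.5, 0.48, 0.51])
pi, mu, sigma = np.array([0.5, 0.5]), np.array([0.0, 0.5]), np.array([0.1, 0.1])
print(mixture_penalty(weights, pi, mu, sigma))
print(penalty_grad(weights, pi, mu, sigma))
```

The gradient pulls each weight toward the mixture component that takes most responsibility for it, which is what clusters the weights into subsets with similar values.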

32 citations


Book ChapterDOI
01 Jan 1991
TL;DR: It is shown that the CHS approximates steepest descent and that the proportional error in the approximation can be expected to decrease as the size of the network increases.
Abstract: The simplicity and locality of the “contrastive Hebb synapse” (CHS) used in Boltzmann machine learning makes it an attractive model for real biological synapses. The slow learning exhibited by the stochastic Boltzmann machine can be greatly improved by using a mean field approximation and it has been shown (Hinton, 1989) that the CHS also performs steepest descent in these deterministic mean field networks. A major weakness of the learning procedure, from a biological perspective, is that the derivation assumes detailed symmetry of the connectivity. Using networks with purely asymmetric connectivity, we show that the CHS still works in practice provided the connectivity is grossly symmetrical so that if unit i sends a connection to unit j, there are numerous indirect feedback paths from j to i. So long as the network settles to a stable state, we show that the CHS approximates steepest descent and that the proportional error in the approximation can be expected to decrease as the size of the network increases.
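In symbols, the contrastive Hebb synapse update for the deterministic mean field case can be written as

$$\Delta w_{ij} = \epsilon\,\big(s_i^{+} s_j^{+} - s_i^{-} s_j^{-}\big)$$

where $s^{+}$ are the settled activities with the training cases clamped and $s^{-}$ the settled activities in the free-running phase; this is the standard form of the rule, and the paper's contribution is showing that it still approximates steepest descent when the connectivity is only grossly, rather than exactly, symmetrical.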

18 citations


01 Jan 1991
TL;DR: This chapter contains sections titled Representing Linked Lists on an Associative Retrieval Machine, Managing a Distributed Memory, and Connectionist Implementation.
Abstract: This chapter contains sections titled: 1. Introduction, 2. Direct and Indirect Representations, 3. Representing Linked Lists on an Associative Retrieval Machine, 4. Associative Stacks, 5. Associative Trees, 6. Connectionist Implementation, 7. Managing a Distributed Memory, 8. Discussion, 9. Conclusions

17 citations


Proceedings Article
02 Dec 1991
TL;DR: Two new models that handle surfaces with discontinuities are proposed; the second develops a mixture of expert interpolators, learning to detect the locations of discontinuities and to invoke specialized, asymmetric interpolators that do not cross them.
Abstract: We have previously described an unsupervised learning procedure that discovers spatially coherent properties of the world by maximizing the information that parameters extracted from different parts of the sensory input convey about some common underlying cause. When given random dot stereograms of curved surfaces, this procedure learns to extract surface depth because that is the property that is coherent across space. It also learns how to interpolate the depth at one location from the depths at nearby locations (Becker and Hinton, 1992). In this paper, we propose two new models which handle surfaces with discontinuities. The first model attempts to detect cases of discontinuities and reject them. The second model develops a mixture of expert interpolators. It learns to detect the locations of discontinuities and to invoke specialized, asymmetric interpolators that do not cross the discontinuities.
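A minimal numpy sketch of the second model's flavour: a gating network softly chooses among fixed interpolators, including one-sided interpolators that avoid crossing a suspected discontinuity. The interpolator weights below (a cubic fit through four neighbours and two simple one-sided averages) and the untrained gating weights are illustrative assumptions, not the learned ones from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Depths at 4 neighbouring locations; predict the centre depth with a
# gated mixture of three fixed interpolators: use-left, use-right, symmetric.
interpolators = np.array([
    [0.5, 0.5, 0.0, 0.0],            # left-only (ignores a discontinuity on the right)
    [0.0, 0.0, 0.5, 0.5],            # right-only
    [-1/6, 2/3, 2/3, -1/6],          # cubic fit through all four neighbours
])
W_gate = rng.normal(scale=0.1, size=(3, 4))   # untrained gating network (placeholder)

def predict(neighbour_depths):
    gate = np.exp(W_gate @ neighbour_depths)
    gate /= gate.sum()                        # soft choice of interpolator
    return gate @ (interpolators @ neighbour_depths)

print(predict(np.array([1.0, 1.0, 5.0, 5.0])))   # a depth step between locations 2 and 3
```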

01 Jan 1991
TL;DR: This chapter describes Recursive Auto-Associative Memory (RAAM) and reports the experiments conducted with it.
Abstract: This chapter contains sections titled: 1 Introduction, 2 Recursive Auto-Associative Memory, 3 Experiments with Recursive Auto-Associative Memories, 4 Discussion, 5 Conclusion


Proceedings ArticleDOI
TL;DR: Simulations show that using an information-theoretic algorithm called IMAX, a network can be trained to represent depth by observing random dot stereograms of surfaces with continuously varying disparities.
Abstract: In the unsupervised learning paradigm, a network of neuron-like units is presented with an ensemble of input patterns from a structured environment, such as the visual world, and learns to represent the regularities in that input. The major goal in developing unsupervised learning algorithms is to find objective functions that characterize the quality of the network's representation without explicitly specifying the desired outputs of any of the units. The sort of objective functions considered cause a unit to become tuned to spatially coherent features of visual images (such as texture, depth, shading, and surface orientation), by learning to predict the outputs of other units which have spatially adjacent receptive fields. Simulations show that using an information-theoretic algorithm called IMAX, a network can be trained to represent depth by observing random dot stereograms of surfaces with continuously varying disparities. Once a layer of depth-tuned units has developed, subsequent layers are trained to perform surface interpolation of curved surfaces, by learning to predict the depth of one image region based on depth measurements in surrounding regions. An extension of the basic model allows a population of competing neurons to learn a distributed code for disparity, which naturally gives rise to a representation of discontinuities.
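One published form of the IMAX objective in this line of work (Becker and Hinton, 1992) maximizes the mutual information between the outputs $d_a$ and $d_b$ of two modules with spatially adjacent receptive fields; assuming the two outputs are noisy versions of a common Gaussian signal, it reduces to

$$I(d_a; d_b) \approx \tfrac{1}{2}\,\log\frac{V(d_a + d_b)}{V(d_a - d_b)}$$

where $V(\cdot)$ denotes variance over the ensemble of input patterns, so the objective is maximized when the two depth estimates agree up to independent noise.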