
Showing papers by "Geoffrey E. Hinton published in 1987"


Journal Article
TL;DR: The assumption that acquired characteristics are not inherited is often taken to imply that the adaptations an organism learns during its lifetime cannot guide the course of evolution, as discussed by the authors.
Abstract: The assumption that acquired characteristics are not inherited is often taken to imply that the adaptations that an organism learns during its lifetime cannot guide the course of evolution. This inference is incorrect (2). Learning alters the shape of the search space in which evolution operates and thereby provides good evolutionary paths towards sets of co-adapted alleles. We demonstrate that this effect allows learning organisms to evolve much faster than their non-learning equivalents, even though the characteristics acquired by the phenotype are not communicated to the genotype.
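
A minimal sketch of the kind of simulation this describes (genome length, trial count, population size, and the fitness schedule here are illustrative assumptions, not necessarily the paper's values): alleles may be hard-wired correct ('1'), hard-wired wrong ('0'), or plastic ('?') and settable by lifetime learning, and fitness rewards discovering the target configuration early in life.

```python
import random

GENOME_LEN, TRIALS, POP = 20, 1000, 200   # illustrative parameters

def fitness(genome):
    # '1' = correct hard-wired allele, '0' = wrong, '?' = settable by learning
    if '0' in genome:
        return 1.0                        # needle in a haystack: no partial credit
    plastic = genome.count('?')
    for t in range(TRIALS):
        if random.random() < 0.5 ** plastic:   # one random guess per trial
            return 1.0 + 19.0 * (TRIALS - t) / TRIALS
    return 1.0

def generation(pop):
    scores = [fitness(g) for g in pop]
    def child():
        a, b = random.choices(pop, weights=scores, k=2)  # fitness-proportional
        cut = random.randrange(GENOME_LEN)               # single-point crossover
        return a[:cut] + b[cut:]
    return [child() for _ in range(POP)]

pop = [''.join(random.choice('01??') for _ in range(GENOME_LEN))
       for _ in range(POP)]
for _ in range(50):
    pop = generation(pop)
```

Without the plastic alleles the fitness landscape is a single spike that selection cannot climb; learning surrounds the spike with a gradient, which is the effect the abstract describes.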

1,065 citations


Proceedings Article
01 Jan 1987
TL;DR: All the original associations of a network can be "deblurred" by rehearsing on just a few of them; rehearsal lets the fast weights take on values that temporarily cancel out the changes in the slow weights caused by the subsequent learning.
Abstract: Connectionist models usually have a single weight on each connection. Some interesting new properties emerge if each connection has two weights: A slowly changing, plastic weight which stores long-term knowledge and a fast-changing, elastic weight which stores temporary knowledge and spontaneously decays towards zero. If a network learns a set of associations and then these associations are "blurred" by subsequent learning, all the original associations can be "deblurred" by rehearsing on just a few of them. The rehearsal allows the fast weights to take on values that temporarily cancel out the changes in the slow weights caused by the subsequent learning.
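
A minimal linear-associator sketch of the two-weight idea (layer sizes, the delta-rule updates, learning rates, and the decay constant are assumptions for illustration; the paper's networks need not be linear):

```python
import numpy as np

class TwoWeightLayer:
    """Each connection carries a slow, plastic weight plus a fast, elastic
    weight that spontaneously decays toward zero; the effective weight used
    for inference is their sum."""
    def __init__(self, n_in, n_out, slow_lr=0.01, fast_lr=0.5, decay=0.9):
        self.slow = np.zeros((n_out, n_in))
        self.fast = np.zeros((n_out, n_in))
        self.slow_lr, self.fast_lr, self.decay = slow_lr, fast_lr, decay

    def predict(self, x):
        return (self.slow + self.fast) @ x           # effective weight

    def train(self, x, target):
        err = target - self.predict(x)               # simple delta rule
        self.slow += self.slow_lr * np.outer(err, x)
        self.fast += self.fast_lr * np.outer(err, x)
        self.fast *= self.decay                      # elastic decay toward zero
```

After new associations blur an old set, rehearsing a few old pairs drives the fast weights to values that temporarily cancel the interference in the slow weights; because the interference is shared across the stored associations, the cancellation generalizes to the unrehearsed ones.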

243 citations


Book ChapterDOI
15 Jun 1987
TL;DR: This paper describes a recently developed learning procedure that can learn to perform a recognition task; the trained network uses canonical internal representations of the patterns to identify familiar patterns in novel positions.
Abstract: One major goal of research on massively parallel networks of neuron-like processing elements is to discover efficient methods for recognizing patterns. Another goal is to discover general learning procedures that allow networks to construct the internal representations that are required for complex tasks. This paper describes a recently developed procedure that can learn to perform a recognition task. The network is trained on examples in which the input vector represents an instance of a pattern in a particular position and the required output vector represents its name. After prolonged training, the network develops canonical internal representations of the patterns and it uses these canonical representations to identify familiar patterns in novel positions.
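
A toy rendering of the training setup the abstract describes, using plain backprop with squared error (the pattern set, vector sizes, learning rate, and training length are illustrative assumptions): each input is a fixed binary pattern placed at a random position inside a larger vector, and the target is a one-hot "name" for the pattern.

```python
import numpy as np
rng = np.random.default_rng(0)

# Two illustrative 4-bit patterns presented at random shifts in a 12-unit input.
PATTERNS = [np.array([1., 1., 0., 1.]), np.array([1., 0., 1., 1.])]
N_IN, N_HID, LR = 12, 8, 0.5

def example():
    k = rng.integers(len(PATTERNS))
    x = np.zeros(N_IN)
    s = rng.integers(N_IN - 4 + 1)          # random position of the instance
    x[s:s + 4] = PATTERNS[k]
    return x, np.eye(len(PATTERNS))[k]      # input vector, one-hot name

sig = lambda z: 1.0 / (1.0 + np.exp(-z))
W1 = rng.normal(0, 0.1, (N_HID, N_IN))
W2 = rng.normal(0, 0.1, (len(PATTERNS), N_HID))

for _ in range(20000):                      # "prolonged training"
    x, t = example()
    h = sig(W1 @ x)
    y = sig(W2 @ h)
    dy = (y - t) * y * (1 - y)              # backprop of squared error
    dh = (W2.T @ dy) * h * (1 - h)
    W2 -= LR * np.outer(dy, h)
    W1 -= LR * np.outer(dh, x)
```

The intent, per the abstract, is that after enough training the hidden activity for a pattern becomes similar at every position, so familiar patterns can be named in novel positions.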

205 citations


Proceedings Article
01 Jan 1987
TL;DR: Simulations in simple networks show that the learning procedure usually converges rapidly on a good set of codes, and analysis shows that in certain restricted cases it performs gradient descent in the squared reconstruction error.
Abstract: We describe a new learning procedure for networks that contain groups of nonlinear units arranged in a closed loop. The aim of the learning is to discover codes that allow the activity vectors in a "visible" group to be represented by activity vectors in a "hidden" group. One way to test whether a code is an accurate representation is to try to reconstruct the visible vector from the hidden vector. The difference between the original and the reconstructed visible vectors is called the reconstruction error, and the learning procedure aims to minimize this error. The learning procedure has two passes. On the first pass, the original visible vector is passed around the loop, and on the second pass an average of the original vector and the reconstructed vector is passed around the loop. The learning procedure changes each weight by an amount proportional to the product of the "presynaptic" activity and the difference in the post-synaptic activity on the two passes. This procedure is much simpler to implement than methods like back-propagation. Simulations in simple networks show that it usually converges rapidly on a good set of codes, and analysis shows that in certain restricted cases it performs gradient descent in the squared reconstruction error.
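
A minimal sketch of the two-pass rule exactly as the abstract states it (group sizes, the averaging constant, the learning rate, and the random training set are assumed for illustration):

```python
import numpy as np
rng = np.random.default_rng(0)

N_VIS, N_HID, LAM, EPS = 8, 4, 0.75, 0.2     # illustrative sizes/constants
W_vh = rng.normal(0, 0.1, (N_HID, N_VIS))    # visible -> hidden
W_hv = rng.normal(0, 0.1, (N_VIS, N_HID))    # hidden -> visible, closing the loop
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

data = (rng.random((8, N_VIS)) > 0.5).astype(float)

for epoch in range(500):
    for v0 in data:
        h1 = sig(W_vh @ v0)                          # first pass round the loop
        v2 = LAM * v0 + (1 - LAM) * sig(W_hv @ h1)   # second pass carries an
        h3 = sig(W_vh @ v2)                          # average of original and
                                                     # reconstructed vectors
        # rule: presynaptic activity x difference in postsynaptic activity
        # on the two passes
        W_hv += EPS * np.outer(v0 - v2, h1)          # postsyn diff at visible units
        W_vh += EPS * np.outer(h1 - h3, v2)          # postsyn diff at hidden units
```

Note that v0 - v2 equals (1 - LAM) times the reconstruction error at the visible units, so driving this difference toward zero is what minimizes the squared reconstruction error the abstract refers to.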

153 citations


Journal ArticleDOI
TL;DR: Further research on back-propagation for layered networks of deterministic, neuron-like units is described, with an example in which a network learns a set of filters that enable it to discriminate formant-like patterns in the presence of noise.

125 citations


01 Mar 1987
TL;DR: In this article, the authors describe an extension of the basic idea which makes it resemble competitive learning and which causes members of a population of these units to differentiate, each extracting different structure from the input.
Abstract: Hill climbing is used to maximize an information theoretic measure of the difference between the actual behavior of a unit and the behavior that would be predicted by a statistician who knew the first order statistics of the inputs but believed them to be independent. This causes the unit to detect higher order correlations among its inputs. Initial simulations are presented and seem encouraging. We describe an extension of the basic idea which makes it resemble competitive learning and which causes members of a population of these units to differentiate, each extracting different structure from the input.
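
One reading of that measure, sketched in code (the input ensemble, the perturbation-based hill climbing, and treating the unit's behavior as a Bernoulli output distribution are all illustrative assumptions): the unit's on-probability under the real input distribution is compared, via an asymmetric divergence, with its on-probability under a factorial distribution having the same first-order statistics.

```python
import numpy as np
from itertools import product
rng = np.random.default_rng(0)

sig = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy ensemble with a purely higher-order regularity: the two input bits
# always agree, yet each bit's first-order probability of being on is 0.5.
def sample(n):
    a = (rng.random(n) > 0.5).astype(float)
    return np.stack([a, a], axis=1)

def G(w, b, data):
    p_real = sig(data @ w + b).mean()     # on-probability over real inputs
    q = data.mean(axis=0)                 # first-order stats of each input
    # on-probability if inputs were independent with the same marginals q
    p_ind = sum(
        np.prod(np.where(np.array(s), q, 1.0 - q)) * sig(np.dot(s, w) + b)
        for s in product([0.0, 1.0], repeat=len(q))
    )
    eps = 1e-12                           # guard the logarithms
    return (p_real * np.log((p_real + eps) / (p_ind + eps))
            + (1 - p_real) * np.log((1 - p_real + eps) / (1 - p_ind + eps)))

data = sample(500)
w, b = rng.normal(0.0, 0.1, 2), 0.0
for _ in range(2000):                     # hill climbing by random perturbation
    dw, db = rng.normal(0.0, 0.05, 2), rng.normal(0.0, 0.05)
    if G(w + dw, b + db, data) > G(w, b, data):
        w, b = w + dw, b + db
```

On this ensemble a purely first-order statistician predicts nothing unusual, so any positive divergence the unit achieves reflects the higher-order agreement between the two inputs.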

44 citations


Journal ArticleDOI
TL;DR: Most people can correctly apply the concepts of horizontal and vertical in describing objects, but a simple demonstration shows that they are confused about how these concepts work.
Abstract: Most people can correctly apply the concepts of horizontal and vertical in describing objects, but a simple demonstration shows that they are confused about how these concepts work. The nature of the confusion and its possible causes are briefly discussed.

4 citations


Journal ArticleDOI
01 Feb 1987
TL;DR: This paper describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology, and major support for the research reported in this paper was provided by the System Development Foundation.
Abstract: I would like to acknowledge the help of Gul Agha, Jonathon Amsterdam, Peter de Jong, Carl Manning, Richard Waldinger, and Fanya Montalvo in improving the presentation. I owe a tremendous intellectual debt to my colleagues in the Message Passing Semantics Group, the Tremont Research Institute, and the MIT Artificial Intelligence Laboratory. Ken Kahn, Ueda, Keith Clark, and Takeuchi helped me greatly to understand (Flat) Concurrent Prolog, (Flat) Parlog, and (Flat) Guarded Horn Clauses. This paper describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Major support for the research reported in this paper was provided by the System Development Foundation. Major support for other related work in the Artificial Intelligence Laboratory is provided, in part, by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N0014-80-C-0505. I would like to thank Carl York, Charles Smith, and Patrick Winston for their support and encouragement.

1 citation