
Showing papers on "Unsupervised learning" published in 1987


01 Dec 1987
TL;DR: In this article, the stability-plasticity dilemma and Adaptive Resonance Theory are discussed in the context of self-organizing learning and recognition systems, along with the three R's: Recognition, Reinforcement, and Recall.
Abstract: Partial Contents: Attention and Expectation in Self-Organizing Learning and Recognition Systems; The Stability-Plasticity Dilemma and Adaptive Resonance Theory; Competitive Learning Models; Self-Stabilized Learning by an ART Architecture in an Arbitrary Input Environment; Attentional Priming and Prediction: Matching by the 2/3 Rule; Automatic Control of Hypothesis Testing by Attentional-Orienting Interactions; Learning to Recognize an Analog World; Invariant Visual Pattern Recognition; The Three R's: Recognition, Reinforcement, and Recall; Self-Stabilization of Speech Perception and Production Codes: New Light on Motor Theory; and Psychophysiological and Neurophysiological Predictions of ART.

1,196 citations


Book
31 Aug 1987
TL;DR: This dissertation develops stochastic iterated genetic hillclimbing (SIGH), a connectionist search algorithm that combines point-based hillclimbing with population-based genetic search to learn while searching high-dimensional binary function spaces, and evaluates it against six other algorithms on a suite of test functions and on graph partitioning.
Abstract: 1. Introduction.- 1.1. Satisfying hidden strong constraints.- 1.2. Function optimization.- 1.2.1. The methodology of heuristic search.- 1.2.2. The shape of function spaces.- 1.3. High-dimensional binary vector spaces.- 1.3.1. Graph partitioning.- 1.4. Dissertation overview.- 1.5. Summary.- 2. The model.- 2.1. Design goal: Learning while searching.- 2.1.1. Knowledge representation.- 2.1.2. Point-based search strategies.- 2.1.3. Population-based search strategies.- 2.1.4. Combination rules.- 2.1.5. Election rules.- 2.1.6. Summary: Learning while searching.- 2.2. Design goal: Sustained exploration.- 2.2.1. Searching broadly.- 2.2.2. Convergence and divergence.- 2.2.3. Mode transitions.- 2.2.4. Resource allocation via taxation.- 2.2.5. Summary: Sustained exploration.- 2.3. Connectionist computation.- 2.3.1. Units and links.- 2.3.2. A three-state stochastic unit.- 2.3.3. Receptive fields.- 2.4. Stochastic iterated genetic hillclimbing.- 2.4.1. Knowledge representation in SIGH.- 2.4.2. The SIGH control algorithm.- 2.4.3. Formal definition.- 2.5. Summary.- 3. Empirical demonstrations.- 3.1. Methodology.- 3.1.1. Notation.- 3.1.2. Parameter tuning.- 3.1.3. Non-termination.- 3.2. Seven algorithms.- 3.2.1. Iterated hillclimbing-steepest ascent (IHC-SA).- 3.2.2. Iterated hillclimbing-next ascent (IHC-NA).- 3.2.3. Stochastic hillclimbing (SHC).- 3.2.4. Iterated simulated annealing (ISA).- 3.2.5. Iterated genetic search-Uniform combination (IGS-U).- 3.2.6. Iterated genetic search-Ordered combination (IGS-O).- 3.2.7. Stochastic iterated genetic hillclimbing (SIGH).- 3.3. Six functions.- 3.3.1. A linear space-"One Max".- 3.3.2. A local maximum-"Two Max".- 3.3.3. A large local maximum-"Trap".- 3.3.4. Fine-grained local maxima-"Porcupine".- 3.3.5. Flat areas-"Plateaus".- 3.3.6. A combination space-"Mix".- 4. Analytic properties.- 4.1. Problem definition.- 4.2. Energy functions.- 4.3. Basic properties of the learning algorithm.- 4.3.1. Motivating the approach.- 4.3.2. 
Defining reinforcement signals.- 4.3.3. Defining similarity measures.- 4.3.4. The equilibrium distribution.- 4.4. Convergence.- 4.5. Divergence.- 5. Graph partitioning.- 5.1. Methodology.- 5.1.1. Problems.- 5.1.2. Algorithms.- 5.1.3. Data collection.- 5.1.4. Parameter tuning.- 5.2. Adding a linear component.- 5.3. Experiments on random graphs.- 5.4. Experiments on multilevel graphs.- 6. Related work.- 6.1. The problem space formulation.- 6.2. Search and learning.- 6.2.1. Learning while searching.- 6.2.2. Symbolic learning.- 6.2.3. Hillclimbing.- 6.2.4. Stochastic hillclimbing and simulated annealing.- 6.2.5. Genetic algorithms.- 6.3. Connectionist modelling.- 6.3.1. Competitive learning.- 6.3.2. Back propagation.- 6.3.3. Boltzmann machines.- 6.3.4. Stochastic iterated genetic hillclimbing.- 6.3.5. Harmony theory.- 6.3.6. Reinforcement models.- 7. Limitations and variations.- 7.1. Current limitations.- 7.1.1. The problem.- 7.1.2. The SIGH model.- 7.2. Possible variations.- 7.2.1. Exchanging parameters.- 7.2.2. Beyond symmetric connections.- 7.2.3. Simultaneous optimization.- 7.2.4. Widening the bottleneck.- 7.2.5. Temporal credit assignment.- 7.2.6. Learning a function.- 8. Discussion and conclusions.- 8.1. Stability and change.- 8.2. Architectural goals.- 8.2.1 High potential parallelism.- 8.2.2 Highly incremental.- 8.2.3 "Generalized Hebbian" learning.- 8.2.4 Unsupervised learning.- 8.2.5 "Closed loop" interactions.- 8.2.6 Emergent properties.- 8.3. Discussion.- 8.3.1 The processor/memory distinction.- 8.3.2 Physical computation systems.- 8.3.3 Between mind and brain.- 8.4. Conclusions.- 8.4.1. Recapitulation.- 8.4.2. Contributions.- References.
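The contents above name next-ascent hillclimbing (IHC-NA) among the seven algorithms and "One Max" among the six test functions. A minimal sketch of that baseline pairing, with function and parameter names of our own choosing rather than the dissertation's:

```python
import random

def one_max(bits):
    """The "One Max" linear test function: fitness = number of 1 bits."""
    return sum(bits)

def next_ascent_hillclimb(f, n_bits, rng):
    """Next-ascent hillclimbing (cf. IHC-NA): keep the first improving bit flip."""
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    best = f(x)
    improved = True
    while improved:
        improved = False
        for i in range(n_bits):
            x[i] ^= 1              # try flipping bit i
            if f(x) > best:
                best = f(x)        # keep the flip and rescan from the start
                improved = True
                break
            x[i] ^= 1              # undo the unhelpful flip
    return x, best

solution, fitness = next_ascent_hillclimb(one_max, 32, random.Random(0))
# One Max is linear (no local optima), so hillclimbing reaches the all-ones string.
```

On functions such as "Trap" or "Porcupine", this same loop stalls at local maxima, which is the failure mode SIGH's population-based machinery is designed to escape.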

750 citations



Proceedings Article
01 Jan 1987
TL;DR: The back propagation algorithm for supervised learning can be generalized, put on a satisfactory conceptual footing, and very likely made more efficient by defining the values of the output and input neurons as probabilities and varying the synaptic weights in the gradient direction of the log likelihood, rather than the 'error'.
Abstract: We propose that the back propagation algorithm for supervised learning can be generalized, put on a satisfactory conceptual footing, and very likely made more efficient by defining the values of the output and input neurons as probabilities and varying the synaptic weights in the gradient direction of the log likelihood, rather than the 'error'.
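The proposed reading can be illustrated on a single sigmoid output unit: near saturation, the gradient of the squared error nearly vanishes, while the gradient of the log likelihood does not. A small sketch with made-up numbers:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One sigmoid output unit with a single input; target t is 0 or 1.
x, w, t = 1.0, 4.0, 0.0
p = sigmoid(w * x)              # output read as a probability, per the proposal

# Gradient of the squared error E = (t - p)^2 / 2 with respect to w:
grad_error = (p - t) * p * (1.0 - p) * x

# Gradient of the negative log likelihood L = -[t*log(p) + (1-t)*log(1-p)]:
grad_loglik = (p - t) * x

# Near saturation (p close to 1 while the target is 0) the error gradient is
# crushed by the p*(1-p) factor, but the likelihood gradient stays large.
```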

186 citations


Proceedings Article
01 Jan 1987
TL;DR: A family of learning algorithms that operate on a recurrent, symmetrically connected, neuromorphic network that, like the Boltzmann machine, settles in the presence of noise and a version of the supervised learning algorithm for a network with analog activation functions.
Abstract: We describe a family of learning algorithms that operate on a recurrent, symmetrically connected, neuromorphic network that, like the Boltzmann machine, settles in the presence of noise. These networks learn by modifying synaptic connection strengths on the basis of correlations seen locally by each synapse. We describe a version of the supervised learning algorithm for a network with analog activation functions. We also demonstrate unsupervised competitive learning with this approach, where weight saturation and decay play an important role, and describe preliminary experiments in reinforcement learning, where noise is used in the search procedure. We identify the above-described phenomena as elements that can unify learning techniques at a physical microscopic level. These algorithms were chosen for ease of implementation in VLSI. We have designed a CMOS test chip in 2 micron rules that can speed up the learning about a millionfold over an equivalent simulation on a VAX 11/780. The speedup is due to parallel analog computation for summing and multiplying weights and activations, and the use of physical processes for generating random noise. The components of the test chip are a noise amplifier, a neuron amplifier, and a 300 transistor adaptive synapse, each of which is separately testable. These components are also integrated into a 6 neuron and 15 synapse network. Finally, we point out techniques for reducing the area of the electronic correlational synapse both in technology and design and show how the algorithms we study can be implemented naturally in electronic systems.
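The unsupervised competitive-learning element described above, in which weight saturation and decay play an important role, can be sketched in software (learning rate, decay constant, and network size here are illustrative choices of ours, not the chip's):

```python
import random

def competitive_step(weights, x, lr=0.2, decay=0.01, w_max=1.0):
    """One unsupervised competitive-learning step: the best-matching unit moves
    its weights toward the input; decay and saturation keep weights bounded,
    echoing the role they play in the analog synapses. Constants are ours."""
    # Winner: the unit whose weight vector best matches the input.
    winner = max(range(len(weights)),
                 key=lambda i: sum(w * xi for w, xi in zip(weights[i], x)))
    for j, xi in enumerate(x):
        # Local correlational update: uses only this synapse's pre/post activity.
        w = weights[winner][j] + lr * (xi - weights[winner][j]) - decay * weights[winner][j]
        weights[winner][j] = max(0.0, min(w_max, w))   # saturation
    return winner

rng = random.Random(1)
weights = [[rng.random() for _ in range(2)] for _ in range(2)]
for _ in range(100):                       # two orthogonal input patterns
    competitive_step(weights, [1.0, 0.0])
    competitive_step(weights, [0.0, 1.0])
# The two units differentiate, each specializing on one of the patterns.
```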

55 citations


01 Mar 1987
TL;DR: In this article, the authors describe an extension of the basic idea which makes it resemble competitive learning and which causes members of a population of these units to differentiate, each extracting different structure from the input.
Abstract: Hill climbing is used to maximize an information theoretic measure of the difference between the actual behavior of a unit and the behavior that would be predicted by a statistician who knew the first order statistics of the inputs but believed them to be independent. This causes the unit to detect higher order correlations among its inputs. Initial simulations are presented and seem encouraging. We describe an extension of the basic idea which makes it resemble competitive learning and which causes members of a population of these units to differentiate, each extracting different structure from the input.
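The gap between actual input statistics and what an independence-assuming statistician would predict from first-order statistics can be illustrated as a KL divergence, which is zero exactly when there is no higher-order structure to detect. The joint distribution below is made up for the sketch:

```python
import math

# Joint distribution of two binary inputs with higher-order (here pairwise)
# structure that first-order statistics alone cannot capture.
p_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# First-order (marginal) statistics.
p_a = sum(p for (a, b), p in p_joint.items() if a == 1)   # P(input 1 = 1)
p_b = sum(p for (a, b), p in p_joint.items() if b == 1)   # P(input 2 = 1)

def p_independent(a, b):
    """What a statistician assuming independence would predict."""
    return (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)

# KL divergence between the actual joint and the independence prediction:
# positive here, because the inputs carry structure beyond their marginals.
kl = sum(p * math.log2(p / p_independent(a, b)) for (a, b), p in p_joint.items())
```

Here both marginals are 0.5, so the independence model predicts 0.25 for every input pair, yet the actual distribution concentrates on the "equal inputs" pairs; the positive divergence is the kind of higher-order correlation the unit is driven to detect.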

44 citations


Journal ArticleDOI
TL;DR: A set of dynamic adaptation procedures for updating expected feature values during recognition using maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of feature values on a speaker-by-speaker basis.
Abstract: In this paper, we describe efforts to improve the performance of FEATURE, the Carnegie-Mellon University speaker-independent speech recognition system that classifies isolated letters of the English alphabet, by enabling the system to learn the acoustical characteristics of individual speakers. Even when features are designed to be speaker-independent, it is frequently observed that feature values may vary more from speaker to speaker for a single letter than they vary from letter to letter. In these cases, it is necessary to adjust the system's statistical description of the features of individual speakers to obtain improved recognition performance. This paper describes a set of dynamic adaptation procedures for updating expected feature values during recognition. The algorithm uses maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of feature values on a speaker-by-speaker basis. The MAP estimation algorithm makes use of both knowledge of the observations input to the system from an individual speaker and the relative variability of the features' means within and across all speakers. In addition, knowledge of the covariance of the features' mean vectors across the various letters enables the system to adapt its representation of similar-sounding letters after any one of them is presented to the classifier. The use of dynamic speaker adaptation improves classification performance of FEATURE by 49 percent after four presentations of the alphabet, when the system is provided with supervised training indicating which specific utterance had been presented to the classifier from a particular user. Performance can be improved by as much as 31 percent when the system is allowed to adapt passively in an unsupervised learning mode, without any information from individual users.
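The MAP update of a feature mean can be sketched for a single scalar feature under Gaussian assumptions; all numbers below are illustrative, not FEATURE's actual statistics:

```python
# MAP update of one scalar feature mean for one speaker, assuming a Gaussian
# prior over speaker means and Gaussian observations. All numbers are made up.
prior_mean = 0.0                  # speaker-independent expected feature value
prior_var = 4.0                   # variability of the mean across speakers
obs_var = 1.0                     # variability of the feature within a speaker
observations = [2.1, 1.9, 2.3]    # this speaker's utterances so far

n = len(observations)
sample_mean = sum(observations) / n

# Posterior (MAP) mean: a precision-weighted blend of prior and data. With few
# observations it stays near the speaker-independent prior; as utterances
# accumulate it moves toward the speaker's own mean.
post_mean = (prior_mean / prior_var + n * sample_mean / obs_var) / \
            (1.0 / prior_var + n / obs_var)
```

This captures the within/across-speaker variability trade-off described above for one feature; the paper's procedure additionally exploits covariances across letters to update similar-sounding letters jointly.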

41 citations


Book ChapterDOI
01 Jan 1987
TL;DR: In this paper, the authors present a new approach to inferring the structure of a deterministic finite-state environment by experimentation based on the notion of a test: a sequence of actions followed by a predicted sensation.
Abstract: We present a new approach to the problem of inferring the structure of a deterministic finite-state environment by experimentation. The learner is presumed to have no a priori knowledge of the environment other than knowing how to perform the set of basic actions and knowing what elementary sensations are possible. The actions affect the state of the environment and the sensations of the learner according to deterministic rules that are to be learned. The goal of the learner is to construct a perfect model of his environment – one that enables him to predict perfectly the result of any proposed sequence of actions. Our approach is based on the notion of a “test”: a sequence of actions followed by a predicted sensation. The value (true or false) of a test at the current state can be easily determined by executing it. We define two tests to be “equivalent” if they have the same value at any global state. Our procedure uses systematic experimentation to discover the equivalence relation on tests determined by the environment, and produces a set of “canonical” tests. The equivalence classes produced correspond in many cases to a natural decomposition of the structure of the environment; one may say that our procedure discovers the appropriate set of state variables useful for describing the environment. Our procedure has been implemented, and appears to be remarkably effective in practice. For example, it has successfully inferred in a few minutes each the structure of Rubik's Cube (over 10^19 global states) and a simple “grid world” environment (over 10^11 global states); these examples are many orders of magnitude larger than what was possible with previous techniques.

34 citations


Proceedings Article
23 Aug 1987
TL;DR: This paper compares network learning using the generalized delta rule to human learning on two concept identification tasks, examining the relative ease of concept identification and generalization from incomplete data.
Abstract: The generalized delta rule (which is also known as error back-propagation) is a significant advance over previous procedures for network learning. In this paper, we compare network learning using the generalized delta rule to human learning on two concept identification tasks: relative ease of concept identification, and generalizing from incomplete data.

16 citations


Proceedings ArticleDOI
27 Mar 1987
TL;DR: This work describes some experiments in real-time 3-D object classification using a learning system derived from a general neural model for supervised learning and examines the feasibility and merits of the learning system in a simple machine vision problem.
Abstract: We describe some experiments in real-time 3-D object classification using a learning system derived from a general neural model for supervised learning. The primary advantages of the learning system are its ability to learn from experience to recognize patterns and its inherent massive parallelism. Our motivation is to examine the feasibility and merits of the learning system in a simple machine vision problem.

14 citations


Proceedings Article
01 Jan 1987
TL;DR: This research investigates a new technique for unsupervised learning of nonlinear control problems, applied both to Michie and Chambers BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata.
Abstract: This research investigates a new technique for unsupervised learning of nonlinear control problems. The approach is applied both to Michie and Chambers' BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata. Recurrence learning is a new nonlinear reward-penalty algorithm. It exploits information found during learning trials to reinforce decisions resulting in the recurrence of nonfailing states. Recurrence learning applies positive reinforcement during the exploration of the search space, whereas in the BOXES or ASE algorithms, only negative weight reinforcement is applied, and then only on failure. Simulation results show that the added information from recurrence learning increases the learning rate. Our empirical results show that recurrence learning is faster than both basic failure-driven learning and failure-prediction methods. Although recurrence learning has only been tested in failure-driven experiments, there are goal-directed learning applications where detection of recurring oscillations may provide useful information that reduces the learning time by applying negative, instead of positive, reinforcement. Detection of cycles provides a heuristic to improve the balance between evidence gathering and goal-directed search.
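For context, the stochastically based learning-automata family that BOXES/ASE-style controllers build on can be sketched as a two-action linear reward-penalty automaton. This is a generic sketch, not the recurrence-learning algorithm itself, and all names and constants are ours:

```python
import random

def linear_reward_penalty(success_prob, a=0.1, b=0.02, steps=3000, seed=0):
    """Two-action linear reward-penalty automaton. success_prob[i] is the
    chance that action i avoids failure; a and b are reward and penalty
    step sizes. Illustrative only."""
    rng = random.Random(seed)
    p = [0.5, 0.5]                              # action probabilities
    for _ in range(steps):
        i = 0 if rng.random() < p[0] else 1     # sample an action
        if rng.random() < success_prob[i]:      # reward: shift toward action i
            p[i] += a * (1.0 - p[i])
        else:                                   # penalty: shift away from i
            p[i] -= b * p[i]
        p[1 - i] = 1.0 - p[i]                   # keep probabilities normalized
    return p

p = linear_reward_penalty([0.8, 0.4])
# Probability mass concentrates on the action that fails less often.
```

Recurrence learning's contribution, per the abstract, is to add positive reinforcement for decisions that lead back to nonfailing states, rather than relying on failure-driven penalties alone as above.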

Proceedings Article
01 Jan 1987
TL;DR: It is observed that networks with ±1 units quite generally exhibit significantly better learning behavior than the corresponding 0,1 versions, and that an adaptation of the weight structure to the symmetries of the problem can lead to a drastic increase in learning speed.
Abstract: We investigate the behavior of different learning algorithms for networks of neuron-like units. As test cases we use simple pattern association problems, such as the XOR-problem and symmetry detection problems. The algorithms considered are either versions of the Boltzmann machine learning rule or based on the backpropagation of errors. We also propose and analyze a generalized delta rule for linear threshold units. We find that the performance of a given learning algorithm depends strongly on the type of units used. In particular, we observe that networks with ±1 units quite generally exhibit a significantly better learning behavior than the corresponding 0,1 versions. We also demonstrate that an adaptation of the weight-structure to the symmetries of the problem can lead to a drastic increase in learning speed.
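One commonly cited core of the ±1 versus 0,1 effect: with delta-rule-style updates proportional to the input activation, a 0-valued input contributes no weight change at all, while a −1 input still drives learning. A minimal one-pattern sketch of our own:

```python
# Why +/-1 units can outperform 0,1 units: with delta-rule updates
# dw = lr * error * input, a 0-valued input produces no weight change,
# while a -1 input still drives learning.

def delta_update(w, x, target, lr=0.5):
    y = sum(wi * xi for wi, xi in zip(w, x))          # linear unit
    err = target - y
    return [wi + lr * err * xi for wi, xi in zip(w, x)]

# The same logical pattern, second input "off", in the two encodings:
w01  = delta_update([0.0, 0.0], [1.0, 0.0],  1.0)     # 0,1 coding
wpm1 = delta_update([0.0, 0.0], [1.0, -1.0], 1.0)     # +/-1 coding

# The 0,1 network left w[1] untouched (silent inputs never learn);
# the +/-1 network updated both weights in a single step.
```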

Proceedings ArticleDOI
10 Sep 1987
TL;DR: This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis and a comparison with a supervised technique using labeled data.
Abstract: This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons, two objectives are sought: a) How well does the clustering method group the data? b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data.
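A sketch of the clusters-versus-known-classes comparison (question b), using plain 2-means and purity as a stand-in correspondence measure; the paper's own clustering technique and similarity/dispersion measure are not reproduced here, and the data are synthetic:

```python
import random

def kmeans2(points, iters=20):
    """Plain 2-means; deterministic seeding (first and last point) for the sketch."""
    centers = [points[0], points[-1]]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):   # assign each point to its nearest center
            assign[i] = min((0, 1), key=lambda c: sum((a - b) ** 2
                                                      for a, b in zip(p, centers[c])))
        for c in (0, 1):                 # move each center to its cluster mean
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

rng = random.Random(42)
# Synthetic stand-in for 2-D acoustic feature vectors of two tissue classes.
class_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(50)]
class_b = [(rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)) for _ in range(50)]
labels = [0] * 50 + [1] * 50
assign = kmeans2(class_a + class_b)

# Purity: fraction of points whose cluster's majority label matches their own;
# high purity means the unsupervised clusters line up with the known classes.
purity = sum(
    max(sum(1 for a, l in zip(assign, labels) if a == c and l == t) for t in (0, 1))
    for c in (0, 1)
) / len(labels)
```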

01 May 1987
TL;DR: This dissertation investigates new parametric and nonparametric bounds on the Bayes risk that can be used as a criterion in feature selection and extraction in radar target identification (RTI).
Abstract: This dissertation investigates new parametric and nonparametric bounds on the Bayes risk that can be used as a criterion in feature selection and extraction in radar target identification (RTI). For the parametric case, where the form of the underlying statistical distributions is known, Bayesian decision theory offers a well-motivated methodology for the design of parametric classifiers. This investigation provides new bounds on the Bayes risk for both simple and composite classes. Bounds on the Bayes risk for M classes are derived in terms of the risk functions for (M-1) classes, and so on until the result depends only on the pairwise Bayes risks. When the parameters of the underlying distributions are unknown, an analysis of the effect of finite sample size and dimensionality on these bounds is given for the case of supervised learning. For the case of unsupervised learning, the parameters of these distributions are evaluated by using the maximum likelihood technique by means of an iterative method and an appropriate algorithm. Finally, for the nonparametric case, where the form of the underlying statistical distributions is unknown, a nonparametric technique, the nearest-neighbor (NN) rule, is used to provide estimated bounds on the Bayes risk. Two methods are proposed to produce a finite size risk close to the asymptotic one. The difference between the finite sample size risk and the asymptotic risk is used as the criterion of improvement.
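For the nonparametric part, the classical asymptotic Cover-Hart relation between the NN risk and the Bayes risk can be sketched with a leave-one-out estimate on synthetic data. This is the standard two-class relation only, illustrative data, and not the dissertation's new finite-sample bounds:

```python
import math
import random

rng = random.Random(0)
# Two overlapping 1-D Gaussian classes (illustrative data, not radar features).
samples = ([(rng.gauss(0.0, 1.0), 0) for _ in range(200)]
           + [(rng.gauss(2.0, 1.0), 1) for _ in range(200)])

# Leave-one-out nearest-neighbor error rate: a finite-sample estimate of NN risk.
errors = 0
for i, (x, label) in enumerate(samples):
    nearest = min((s for j, s in enumerate(samples) if j != i),
                  key=lambda s: abs(s[0] - x))
    errors += nearest[1] != label
r_nn = errors / len(samples)

# Asymptotic Cover-Hart relation for two classes: R* <= R_NN <= 2 R* (1 - R*).
# Inverting the right-hand side gives an estimated lower bound on the Bayes risk,
# while the NN risk itself serves as the upper bound.
bayes_lower = (1.0 - math.sqrt(max(0.0, 1.0 - 2.0 * r_nn))) / 2.0
bayes_upper = r_nn
```

The gap between such a finite-sample estimate and the asymptotic quantity is exactly the kind of discrepancy the dissertation's two proposed methods aim to shrink.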


Proceedings ArticleDOI
27 Mar 1987
TL;DR: The results presented herein demonstrate the ability of a pulse-driven learning network to exhibit learning from association, learning from reward/punishment for simple problems and the existence of a stable solution for solving a complex problem.
Abstract: A pulse-driven learning network can be applied to any problem where adaptive behavior (i.e., the ability to adjust behavior to situations where a priori solutions are not known) is important. The pulse-driven learning network approach is different from other connectionist techniques in the way communication occurs between nodes. Since other connectionist techniques allow communication to occur in a continuous fashion, solutions at each compute cycle exist only when the system is in an equilibrium state. Not only is this a very computationally intensive process, but false solutions are also possible. The learning network does not have either of these problems because communication between nodes is in the form of a pulse and the correct solution is extracted from the network in as few as ten pulses from the input nodes. The results presented herein demonstrate the ability of a pulse-driven learning network to exhibit learning from association, learning from reward/punishment for simple problems, and the existence of a stable solution for solving a complex problem.