
Showing papers on "Unsupervised learning" published in 1987


01 Dec 1987
TL;DR: In this article, the stability-plasticity dilemma and Adaptive Resonance Theory are discussed in the context of self-organizing learning and recognition systems, along with the three R's: Recognition, Reinforcement, and Recall.
Abstract: Partial Contents: Attention and Expectation in Self-Organizing Learning and Recognition Systems; The Stability-Plasticity Dilemma and Adaptive Resonance Theory; Competitive Learning Models; Self-Stabilized Learning by an ART Architecture in an Arbitrary Input Environment; Attentional Priming and Prediction: Matching by the 2/3 Rule; Automatic Control of Hypothesis Testing by Attentional-Orienting Interactions; Learning to Recognize an Analog World; Invariant Visual Pattern Recognition; The Three R's: Recognition, Reinforcement, and Recall; Self-Stabilization of Speech Perception and Production Codes: New Light on Motor Theory; and Psychophysiological and Neurophysiological Predictions of ART.

1,196 citations


Book
31 Aug 1987
TL;DR: This dissertation develops stochastic iterated genetic hillclimbing (SIGH), a connectionist search algorithm that combines point-based hillclimbing with population-based genetic search to learn while searching high-dimensional binary function spaces, and evaluates it against six other algorithms on a suite of test functions and on graph partitioning.
Abstract: 1. Introduction.- 1.1. Satisfying hidden strong constraints.- 1.2. Function optimization.- 1.2.1. The methodology of heuristic search.- 1.2.2. The shape of function spaces.- 1.3. High-dimensional binary vector spaces.- 1.3.1. Graph partitioning.- 1.4. Dissertation overview.- 1.5. Summary.- 2. The model.- 2.1. Design goal: Learning while searching.- 2.1.1. Knowledge representation.- 2.1.2. Point-based search strategies.- 2.1.3. Population-based search strategies.- 2.1.4. Combination rules.- 2.1.5. Election rules.- 2.1.6. Summary: Learning while searching.- 2.2. Design goal: Sustained exploration.- 2.2.1. Searching broadly.- 2.2.2. Convergence and divergence.- 2.2.3. Mode transitions.- 2.2.4. Resource allocation via taxation.- 2.2.5. Summary: Sustained exploration.- 2.3. Connectionist computation.- 2.3.1. Units and links.- 2.3.2. A three-state stochastic unit.- 2.3.3. Receptive fields.- 2.4. Stochastic iterated genetic hillclimbing.- 2.4.1. Knowledge representation in SIGH.- 2.4.2. The SIGH control algorithm.- 2.4.3. Formal definition.- 2.5. Summary.- 3. Empirical demonstrations.- 3.1. Methodology.- 3.1.1. Notation.- 3.1.2. Parameter tuning.- 3.1.3. Non-termination.- 3.2. Seven algorithms.- 3.2.1. Iterated hillclimbing-steepest ascent (IHC-SA).- 3.2.2. Iterated hillclimbing-next ascent (IHC-NA).- 3.2.3. Stochastic hillclimbing (SHC).- 3.2.4. Iterated simulated annealing (ISA).- 3.2.5. Iterated genetic search-Uniform combination (IGS-U).- 3.2.6. Iterated genetic search-Ordered combination (IGS-O).- 3.2.7. Stochastic iterated genetic hillclimbing (SIGH).- 3.3. Six functions.- 3.3.1. A linear space-"One Max".- 3.3.2. A local maximum-"Two Max".- 3.3.3. A large local maximum-"Trap".- 3.3.4. Fine-grained local maxima-"Porcupine".- 3.3.5. Flat areas-"Plateaus".- 3.3.6. A combination space-"Mix".- 4. Analytic properties.- 4.1. Problem definition.- 4.2. Energy functions.- 4.3. Basic properties of the learning algorithm.- 4.3.1. Motivating the approach.- 4.3.2. 
Defining reinforcement signals.- 4.3.3. Defining similarity measures.- 4.3.4. The equilibrium distribution.- 4.4. Convergence.- 4.5. Divergence.- 5. Graph partitioning.- 5.1. Methodology.- 5.1.1. Problems.- 5.1.2. Algorithms.- 5.1.3. Data collection.- 5.1.4. Parameter tuning.- 5.2. Adding a linear component.- 5.3. Experiments on random graphs.- 5.4. Experiments on multilevel graphs.- 6. Related work.- 6.1. The problem space formulation.- 6.2. Search and learning.- 6.2.1. Learning while searching.- 6.2.2. Symbolic learning.- 6.2.3. Hillclimbing.- 6.2.4. Stochastic hillclimbing and simulated annealing.- 6.2.5. Genetic algorithms.- 6.3. Connectionist modelling.- 6.3.1. Competitive learning.- 6.3.2. Back propagation.- 6.3.3. Boltzmann machines.- 6.3.4. Stochastic iterated genetic hillclimbing.- 6.3.5. Harmony theory.- 6.3.6. Reinforcement models.- 7. Limitations and variations.- 7.1. Current limitations.- 7.1.1. The problem.- 7.1.2. The SIGH model.- 7.2. Possible variations.- 7.2.1. Exchanging parameters.- 7.2.2. Beyond symmetric connections.- 7.2.3. Simultaneous optimization.- 7.2.4. Widening the bottleneck.- 7.2.5. Temporal credit assignment.- 7.2.6. Learning a function.- 8. Discussion and conclusions.- 8.1. Stability and change.- 8.2. Architectural goals.- 8.2.1 High potential parallelism.- 8.2.2 Highly incremental.- 8.2.3 "Generalized Hebbian" learning.- 8.2.4 Unsupervised learning.- 8.2.5 "Closed loop" interactions.- 8.2.6 Emergent properties.- 8.3. Discussion.- 8.3.1 The processor/memory distinction.- 8.3.2 Physical computation systems.- 8.3.3 Between mind and brain.- 8.4. Conclusions.- 8.4.1. Recapitulation.- 8.4.2. Contributions.- References.
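The contents above name next-ascent hillclimbing (IHC-NA) among the seven algorithms and "One Max" among the six test functions. A minimal sketch of that baseline pairing, with function and parameter names of our own choosing rather than the dissertation's:

```python
import random

def one_max(bits):
    """The "One Max" linear test function: fitness = number of 1 bits."""
    return sum(bits)

def next_ascent_hillclimb(f, n_bits, rng):
    """Next-ascent hillclimbing (cf. IHC-NA): keep the first improving bit flip."""
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    best = f(x)
    improved = True
    while improved:
        improved = False
        for i in range(n_bits):
            x[i] ^= 1              # try flipping bit i
            if f(x) > best:
                best = f(x)        # keep the flip and rescan from the start
                improved = True
                break
            x[i] ^= 1              # undo the unhelpful flip
    return x, best

solution, fitness = next_ascent_hillclimb(one_max, 32, random.Random(0))
# One Max is linear (no local optima), so hillclimbing reaches the all-ones string.
```

On functions such as "Trap" or "Porcupine", this same loop stalls at local maxima, which is the failure mode SIGH's population-based machinery is designed to escape.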

750 citations



Proceedings Article
01 Jan 1987
TL;DR: The back propagation algorithm for supervised learning can be generalized, put on a satisfactory conceptual footing, and very likely made more efficient by defining the values of the output and input neurons as probabilities and varying the synaptic weights in the gradient direction of the log likelihood, rather than the 'error'.
Abstract: We propose that the back propagation algorithm for supervised learning can be generalized, put on a satisfactory conceptual footing, and very likely made more efficient by defining the values of the output and input neurons as probabilities and varying the synaptic weights in the gradient direction of the log likelihood, rather than the 'error'.
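The proposed reading can be illustrated on a single sigmoid output unit: near saturation, the gradient of the squared error nearly vanishes, while the gradient of the log likelihood does not. A small sketch with made-up numbers:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One sigmoid output unit with a single input; target t is 0 or 1.
x, w, t = 1.0, 4.0, 0.0
p = sigmoid(w * x)              # output read as a probability, per the proposal

# Gradient of the squared error E = (t - p)^2 / 2 with respect to w:
grad_error = (p - t) * p * (1.0 - p) * x

# Gradient of the negative log likelihood L = -[t*log(p) + (1-t)*log(1-p)]:
grad_loglik = (p - t) * x

# Near saturation (p close to 1 while the target is 0) the error gradient is
# crushed by the p*(1-p) factor, but the likelihood gradient stays large.
```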

186 citations


Proceedings Article
01 Jan 1987
TL;DR: A family of learning algorithms that operate on a recurrent, symmetrically connected, neuromorphic network that, like the Boltzmann machine, settles in the presence of noise and a version of the supervised learning algorithm for a network with analog activation functions.
Abstract: We describe a family of learning algorithms that operate on a recurrent, symmetrically connected, neuromorphic network that, like the Boltzmann machine, settles in the presence of noise. These networks learn by modifying synaptic connection strengths on the basis of correlations seen locally by each synapse. We describe a version of the supervised learning algorithm for a network with analog activation functions. We also demonstrate unsupervised competitive learning with this approach, where weight saturation and decay play an important role, and describe preliminary experiments in reinforcement learning, where noise is used in the search procedure. We identify the above-described phenomena as elements that can unify learning techniques at a physical microscopic level. These algorithms were chosen for ease of implementation in VLSI. We have designed a CMOS test chip in 2 micron rules that can speed up the learning about a millionfold over an equivalent simulation on a VAX 11/780. The speedup is due to parallel analog computation for summing and multiplying weights and activations, and the use of physical processes for generating random noise. The components of the test chip are a noise amplifier, a neuron amplifier, and a 300 transistor adaptive synapse, each of which is separately testable. These components are also integrated into a 6 neuron and 15 synapse network. Finally, we point out techniques for reducing the area of the electronic correlational synapse both in technology and design and show how the algorithms we study can be implemented naturally in electronic systems.
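The unsupervised competitive-learning element described above, in which weight saturation and decay play an important role, can be sketched in software (learning rate, decay constant, and network size here are illustrative choices of ours, not the chip's):

```python
import random

def competitive_step(weights, x, lr=0.2, decay=0.01, w_max=1.0):
    """One unsupervised competitive-learning step: the best-matching unit moves
    its weights toward the input; decay and saturation keep weights bounded,
    echoing the role they play in the analog synapses. Constants are ours."""
    # Winner: the unit whose weight vector best matches the input.
    winner = max(range(len(weights)),
                 key=lambda i: sum(w * xi for w, xi in zip(weights[i], x)))
    for j, xi in enumerate(x):
        # Local correlational update: uses only this synapse's pre/post activity.
        w = weights[winner][j] + lr * (xi - weights[winner][j]) - decay * weights[winner][j]
        weights[winner][j] = max(0.0, min(w_max, w))   # saturation
    return winner

rng = random.Random(1)
weights = [[rng.random() for _ in range(2)] for _ in range(2)]
for _ in range(100):                       # two orthogonal input patterns
    competitive_step(weights, [1.0, 0.0])
    competitive_step(weights, [0.0, 1.0])
# The two units differentiate, each specializing on one of the patterns.
```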

55 citations


01 Mar 1987
TL;DR: In this article, the authors describe an extension of the basic idea which makes it resemble competitive learning and which causes members of a population of these units to differentiate, each extracting different structure from the input.
Abstract: Hill climbing is used to maximize an information theoretic measure of the difference between the actual behavior of a unit and the behavior that would be predicted by a statistician who knew the first order statistics of the inputs but believed them to be independent. This causes the unit to detect higher order correlations among its inputs. Initial simulations are presented and seem encouraging. We describe an extension of the basic idea which makes it resemble competitive learning and which causes members of a population of these units to differentiate, each extracting different structure from the input.
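The gap between actual input statistics and what an independence-assuming statistician would predict from first-order statistics can be illustrated as a KL divergence, which is zero exactly when there is no higher-order structure to detect. The joint distribution below is made up for the sketch:

```python
import math

# Joint distribution of two binary inputs with higher-order (here pairwise)
# structure that first-order statistics alone cannot capture.
p_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# First-order (marginal) statistics.
p_a = sum(p for (a, b), p in p_joint.items() if a == 1)   # P(input 1 = 1)
p_b = sum(p for (a, b), p in p_joint.items() if b == 1)   # P(input 2 = 1)

def p_independent(a, b):
    """What a statistician assuming independence would predict."""
    return (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)

# KL divergence between the actual joint and the independence prediction:
# positive here, because the inputs carry structure beyond their marginals.
kl = sum(p * math.log2(p / p_independent(a, b)) for (a, b), p in p_joint.items())
```

Here both marginals are 0.5, so the independence model predicts 0.25 for every input pair, yet the actual distribution concentrates on the "equal inputs" pairs; the positive divergence is the kind of higher-order correlation the unit is driven to detect.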

44 citations


Journal ArticleDOI
TL;DR: A set of dynamic adaptation procedures for updating expected feature values during recognition using maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of feature values on a speaker-by-speaker basis.
Abstract: In this paper, we describe efforts to improve the performance of FEATURE, the Carnegie-Mellon University speaker-independent speech recognition system that classifies isolated letters of the English alphabet, by enabling the system to learn the acoustical characteristics of individual speakers. Even when features are designed to be speaker-independent, it is frequently observed that feature values may vary more from speaker to speaker for a single letter than they vary from letter to letter. In these cases, it is necessary to adjust the system's statistical description of the features of individual speakers to obtain improved recognition performance. This paper describes a set of dynamic adaptation procedures for updating expected feature values during recognition. The algorithm uses maximum a posteriori probability (MAP) estimation techniques to update the mean vectors of sets of feature values on a speaker-by-speaker basis. The MAP estimation algorithm makes use of both knowledge of the observations input to the system from an individual speaker and the relative variability of the features' means within and across all speakers. In addition, knowledge of the covariance of the features' mean vectors across the various letters enables the system to adapt its representation of similar-sounding letters after any one of them is presented to the classifier. The use of dynamic speaker adaptation improves classification performance of FEATURE by 49 percent after four presentations of the alphabet, when the system is provided with supervised training indicating which specific utterance had been presented to the classifier from a particular user. Performance can be improved by as much as 31 percent when the system is allowed to adapt passively in an unsupervised learning mode, without any information from individual users.
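The MAP update of a feature mean can be sketched for a single scalar feature under Gaussian assumptions; all numbers below are illustrative, not FEATURE's actual statistics:

```python
# MAP update of one scalar feature mean for one speaker, assuming a Gaussian
# prior over speaker means and Gaussian observations. All numbers are made up.
prior_mean = 0.0                  # speaker-independent expected feature value
prior_var = 4.0                   # variability of the mean across speakers
obs_var = 1.0                     # variability of the feature within a speaker
observations = [2.1, 1.9, 2.3]    # this speaker's utterances so far

n = len(observations)
sample_mean = sum(observations) / n

# Posterior (MAP) mean: a precision-weighted blend of prior and data. With few
# observations it stays near the speaker-independent prior; as utterances
# accumulate it moves toward the speaker's own mean.
post_mean = (prior_mean / prior_var + n * sample_mean / obs_var) / \
            (1.0 / prior_var + n / obs_var)
```

This captures the within/across-speaker variability trade-off described above for one feature; the paper's procedure additionally exploits covariances across letters to update similar-sounding letters jointly.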

41 citations


Book ChapterDOI
01 Jan 1987
TL;DR: In this paper, the authors present a new approach to inferring the structure of a deterministic finite-state environment by experimentation based on the notion of a test: a sequence of actions followed by a predicted sensation.
Abstract: We present a new approach to the problem of inferring the structure of a deterministic finite-state environment by experimentation. The learner is presumed to have no a priori knowledge of the environment other than knowing how to perform the set of basic actions and knowing what elementary sensations are possible. The actions affect the state of the environment and the sensations of the learner according to deterministic rules that are to be learned. The goal of the learner is to construct a perfect model of his environment – one that enables him to predict perfectly the result of any proposed sequence of actions. Our approach is based on the notion of a “test”: a sequence of actions followed by a predicted sensation. The value (true or false) of a test at the current state can be easily determined by executing it. We define two tests to be “equivalent” if they have the same value at any global state. Our procedure uses systematic experimentation to discover the equivalence relation on tests determined by the environment, and produces a set of “canonical” tests. The equivalence classes produced correspond in many cases to a natural decomposition of the structure of the environment; one may say that our procedure discovers the appropriate set of state variables useful for describing the environment. Our procedure has been implemented, and appears to be remarkably effective in practice. For example, it has successfully inferred in a few minutes each the structure of Rubik's Cube (over 10^19 global states) and a simple “grid world” environment (over 10^11 global states); these examples are many orders of magnitude larger than what was possible with previous techniques.

34 citations


Proceedings Article
23 Aug 1987
TL;DR: This paper compares network learning using the generalized delta rule to human learning on two concept identification tasks, examining the relative ease of concept identification and generalization from incomplete data.
Abstract: The generalized delta rule (which is also known as error back-propagation) is a significant advance over previous procedures for network learning. In this paper, we compare network learning using the generalized delta rule to human learning on two concept identification tasks: relative ease of concept identification, and generalizing from incomplete data.

16 citations


Proceedings ArticleDOI
27 Mar 1987
TL;DR: This work describes some experiments in real-time 3-D object classification using a learning system derived from a general neural model for supervised learning and examines the feasibility and merits of the learning system in a simple machine vision problem.
Abstract: We describe some experiments in real-time 3-D object classification using a learning system derived from a general neural model for supervised learning. The primary advantages of the learning system are its ability to learn from experience to recognize patterns and its inherent massive parallelism. Our motivation is to examine the feasibility and merits of the learning system in a simple machine vision problem.

14 citations


Proceedings Article
01 Jan 1987
TL;DR: This research investigates a new technique for unsupervised learning of nonlinear control problems, applied both to Michie and Chambers BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata.
Abstract: This research investigates a new technique for unsupervised learning of nonlinear control problems. The approach is applied both to Michie and Chambers' BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata. Recurrence learning is a new nonlinear reward-penalty algorithm. It exploits information found during learning trials to reinforce decisions resulting in the recurrence of nonfailing states. Recurrence learning applies positive reinforcement during the exploration of the search space, whereas in the BOXES or ASE algorithms, only negative weight reinforcement is applied, and then only on failure. Simulation results show that the added information from recurrence learning increases the learning rate. Our empirical results show that recurrence learning is faster than both basic failure-driven learning and failure-prediction methods. Although recurrence learning has only been tested in failure-driven experiments, there are goal-directed learning applications where detection of recurring oscillations may provide useful information that reduces the learning time by applying negative, instead of positive, reinforcement. Detection of cycles provides a heuristic to improve the balance between evidence gathering and goal-directed search.
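For context, the stochastically based learning-automata family that BOXES/ASE-style controllers build on can be sketched as a two-action linear reward-penalty automaton. This is a generic sketch, not the recurrence-learning algorithm itself, and all names and constants are ours:

```python
import random

def linear_reward_penalty(success_prob, a=0.1, b=0.02, steps=3000, seed=0):
    """Two-action linear reward-penalty automaton. success_prob[i] is the
    chance that action i avoids failure; a and b are reward and penalty
    step sizes. Illustrative only."""
    rng = random.Random(seed)
    p = [0.5, 0.5]                              # action probabilities
    for _ in range(steps):
        i = 0 if rng.random() < p[0] else 1     # sample an action
        if rng.random() < success_prob[i]:      # reward: shift toward action i
            p[i] += a * (1.0 - p[i])
        else:                                   # penalty: shift away from i
            p[i] -= b * p[i]
        p[1 - i] = 1.0 - p[i]                   # keep probabilities normalized
    return p

p = linear_reward_penalty([0.8, 0.4])
# Probability mass concentrates on the action that fails less often.
```

Recurrence learning's contribution, per the abstract, is to add positive reinforcement for decisions that lead back to nonfailing states, rather than relying on failure-driven penalties alone as above.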

Proceedings Article
01 Jan 1987
TL;DR: It is observed that networks with ±1 units quite generally exhibit significantly better learning behavior than the corresponding 0,1 versions, and that an adaptation of the weight structure to the symmetries of the problem can lead to a drastic increase in learning speed.
Abstract: We investigate the behavior of different learning algorithms for networks of neuron-like units. As test cases we use simple pattern association problems, such as the XOR-problem and symmetry detection problems. The algorithms considered are either versions of the Boltzmann machine learning rule or based on the backpropagation of errors. We also propose and analyze a generalized delta rule for linear threshold units. We find that the performance of a given learning algorithm depends strongly on the type of units used. In particular, we observe that networks with ±1 units quite generally exhibit a significantly better learning behavior than the corresponding 0,1 versions. We also demonstrate that an adaptation of the weight-structure to the symmetries of the problem can lead to a drastic increase in learning speed.
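One commonly cited core of the ±1 versus 0,1 effect: with delta-rule-style updates proportional to the input activation, a 0-valued input contributes no weight change at all, while a −1 input still drives learning. A minimal one-pattern sketch of our own:

```python
# Why +/-1 units can outperform 0,1 units: with delta-rule updates
# dw = lr * error * input, a 0-valued input produces no weight change,
# while a -1 input still drives learning.

def delta_update(w, x, target, lr=0.5):
    y = sum(wi * xi for wi, xi in zip(w, x))          # linear unit
    err = target - y
    return [wi + lr * err * xi for wi, xi in zip(w, x)]

# The same logical pattern, second input "off", in the two encodings:
w01  = delta_update([0.0, 0.0], [1.0, 0.0],  1.0)     # 0,1 coding
wpm1 = delta_update([0.0, 0.0], [1.0, -1.0], 1.0)     # +/-1 coding

# The 0,1 network left w[1] untouched (silent inputs never learn);
# the +/-1 network updated both weights in a single step.
```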

Proceedings ArticleDOI
10 Sep 1987
TL;DR: This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis and a comparison with a supervised technique using labeled data.
Abstract: This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons, two objectives are sought: a) How well does the clustering method group the data? b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data.
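A sketch of the clusters-versus-known-classes comparison (question b), using plain 2-means and purity as a stand-in correspondence measure; the paper's own clustering technique and similarity/dispersion measure are not reproduced here, and the data are synthetic:

```python
import random

def kmeans2(points, iters=20):
    """Plain 2-means; deterministic seeding (first and last point) for the sketch."""
    centers = [points[0], points[-1]]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):   # assign each point to its nearest center
            assign[i] = min((0, 1), key=lambda c: sum((a - b) ** 2
                                                      for a, b in zip(p, centers[c])))
        for c in (0, 1):                 # move each center to its cluster mean
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

rng = random.Random(42)
# Synthetic stand-in for 2-D acoustic feature vectors of two tissue classes.
class_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(50)]
class_b = [(rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)) for _ in range(50)]
labels = [0] * 50 + [1] * 50
assign = kmeans2(class_a + class_b)

# Purity: fraction of points whose cluster's majority label matches their own;
# high purity means the unsupervised clusters line up with the known classes.
purity = sum(
    max(sum(1 for a, l in zip(assign, labels) if a == c and l == t) for t in (0, 1))
    for c in (0, 1)
) / len(labels)
```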

01 May 1987
TL;DR: This dissertation investigates new parametric and nonparametric bounds on the Bayes risk that can be used as a criterion in feature selection and extraction in radar target identification (RTI).
Abstract: This dissertation investigates new parametric and nonparametric bounds on the Bayes risk that can be used as a criterion in feature selection and extraction in radar target identification (RTI). For the parametric case, where the form of the underlying statistical distributions is known, Bayesian decision theory offers a well-motivated methodology for the design of parametric classifiers. This investigation provides new bounds on the Bayes risk for both simple and composite classes. Bounds on the Bayes risk for M classes are derived in terms of the risk functions for (M-1) classes, and so on until the result depends only on the pairwise Bayes risks. When the parameters of the underlying distributions are unknown, an analysis of the effect of finite sample size and dimensionality on these bounds is given for the case of supervised learning. For the case of unsupervised learning, the parameters of these distributions are evaluated by using the maximum likelihood technique by means of an iterative method and an appropriate algorithm. Finally, for the nonparametric case, where the form of the underlying statistical distributions is unknown, a nonparametric technique, the nearest-neighbor (NN) rule, is used to provide estimated bounds on the Bayes risk. Two methods are proposed to produce a finite size risk close to the asymptotic one. The difference between the finite sample size risk and the asymptotic risk is used as the criterion of improvement.
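For the nonparametric part, the classical asymptotic Cover-Hart relation between the NN risk and the Bayes risk can be sketched with a leave-one-out estimate on synthetic data. This is the standard two-class relation only, illustrative data, and not the dissertation's new finite-sample bounds:

```python
import math
import random

rng = random.Random(0)
# Two overlapping 1-D Gaussian classes (illustrative data, not radar features).
samples = ([(rng.gauss(0.0, 1.0), 0) for _ in range(200)]
           + [(rng.gauss(2.0, 1.0), 1) for _ in range(200)])

# Leave-one-out nearest-neighbor error rate: a finite-sample estimate of NN risk.
errors = 0
for i, (x, label) in enumerate(samples):
    nearest = min((s for j, s in enumerate(samples) if j != i),
                  key=lambda s: abs(s[0] - x))
    errors += nearest[1] != label
r_nn = errors / len(samples)

# Asymptotic Cover-Hart relation for two classes: R* <= R_NN <= 2 R* (1 - R*).
# Inverting the right-hand side gives an estimated lower bound on the Bayes risk,
# while the NN risk itself serves as the upper bound.
bayes_lower = (1.0 - math.sqrt(max(0.0, 1.0 - 2.0 * r_nn))) / 2.0
bayes_upper = r_nn
```

The gap between such a finite-sample estimate and the asymptotic quantity is exactly the kind of discrepancy the dissertation's two proposed methods aim to shrink.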


Proceedings ArticleDOI
27 Mar 1987
TL;DR: The results presented herein demonstrate the ability of a pulse-driven learning network to exhibit learning from association, learning from reward/punishment for simple problems and the existence of a stable solution for solving a complex problem.
Abstract: A pulse-driven learning network can be applied to any problem where adaptive behavior (i.e., the ability to adjust behavior to situations where a priori solutions are not known) is important. The pulse-driven learning network approach is different from other connectionist techniques in the way communication occurs between nodes. Since other connectionist techniques allow communication to occur in a continuous fashion, solutions at each compute cycle exist only when the system is in an equilibrium state. Not only is this a very computationally intensive process, but false solutions are also possible. The learning network does not have either of these problems because communication between nodes is in the form of a pulse and the correct solution is extracted from the network in as few as ten pulses from the input nodes. The results presented herein demonstrate the ability of a pulse-driven learning network to exhibit learning from association, learning from reward/punishment for simple problems, and the existence of a stable solution for solving a complex problem.