
Showing papers on "Unsupervised learning published in 1991"


Book
01 Jan 1991
TL;DR: This book is a detailed, logically-developed treatment that covers the theory and uses of collective computational networks, including associative memory, feed forward networks, and unsupervised learning.
Abstract: From the Publisher: This book is a comprehensive introduction to the neural network models currently under intensive study for computational applications. It is a detailed, logically-developed treatment that covers the theory and uses of collective computational networks, including associative memory, feed forward networks, and unsupervised learning. It also provides coverage of neural network applications in a variety of problems of both theoretical and practical interest.

7,518 citations


Journal ArticleDOI
TL;DR: A new supervised learning procedure is presented for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases; it is demonstrated to divide a vowel discrimination task into subtasks, each solvable by a very simple expert network.
Abstract: We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.

4,338 citations
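
The modular architecture described above lends itself to a compact sketch. The following is a minimal, hypothetical NumPy rendering of the idea, assuming linear experts, a softmax gating network, and a Gaussian-style responsibility weighting; all names, sizes, and update details are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a mixture of expert networks with a softmax gate.
import numpy as np

def forward(x, W_experts, W_gate):
    g = np.exp(W_gate @ x)
    g /= g.sum()                          # softmax gate: p(expert | x)
    y_experts = W_experts @ x             # (n_experts, n_out)
    return g, y_experts, (g[:, None] * y_experts).sum(axis=0)

def update(x, target, W_experts, W_gate, lr=0.1):
    g, y_experts, _ = forward(x, W_experts, W_gate)
    err = ((y_experts - target) ** 2).sum(axis=1)
    resp = g * np.exp(-0.5 * err)         # experts that fit the case better
    resp /= resp.sum()                    # take more responsibility for it
    for i in range(len(W_experts)):       # each expert moves toward its cases
        W_experts[i] -= lr * resp[i] * np.outer(y_experts[i] - target, x)
    W_gate += lr * np.outer(resp - g, x)  # gate learns to pick the right expert

rng = np.random.default_rng(0)
W_experts = 0.1 * rng.normal(size=(3, 2, 4))   # 3 experts: 4 inputs -> 2 outputs
W_gate = 0.1 * rng.normal(size=(3, 4))
update(rng.normal(size=4), np.array([1.0, 0.0]), W_experts, W_gate)
```

The competitive flavor comes from the responsibility weighting: experts are pulled only toward the cases they already handle well, which is what lets the task decompose into subtasks.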


Book
01 Jan 1991
TL;DR: This book covers supervised learning using parametric and nonparametric approaches, unsupervised learning and clustering, syntactic pattern recognition, and neural pattern recognition (NeurPR), including feedforward networks and training by backpropagation.
Abstract: STATISTICAL PATTERN RECOGNITION (StatPR). Supervised Learning (Training) Using Parametric and Nonparametric Approaches. Linear Discriminant Functions and the Discrete and Binary Feature Cases. Unsupervised Learning and Clustering. SYNTACTIC PATTERN RECOGNITION (SyntPR). Overview. Syntactic Recognition via Parsing and Other Grammars. Graphical Approaches to SyntPR. Learning via Grammatical Inference. NEURAL PATTERN RECOGNITION (NeurPR). Introduction to Neural Networks. Introduction to Neural Pattern Associators and Matrix Approaches. Feedforward Networks and Training by Backpropagation. Content Addressable Memory Approaches and Unsupervised Learning in NeurPR. Appendices. References. Permission Source Notes. Index.

970 citations



Proceedings Article
02 Dec 1991
TL;DR: It is shown that arbitrary distributions of binary vectors can be approximated by the combination model, that the weight vectors in the model can be interpreted as high-order correlation patterns among the input bits, and that the combination machine can be used as a mechanism for detecting these patterns.
Abstract: We present a distribution model for binary vectors, called the influence combination model, and show how this model can be used as the basis for unsupervised learning algorithms for feature selection. The model can be represented by a particular type of Boltzmann machine with a bipartite graph structure that we call the combination machine. This machine is closely related to the Harmonium model defined by Smolensky. In the first part of the paper we analyze properties of this distribution representation scheme. We show that arbitrary distributions of binary vectors can be approximated by the combination model. We show how the weight vectors in the model can be interpreted as high-order correlation patterns among the input bits, and how the combination machine can be used as a mechanism for detecting these patterns. We compare the combination model with the mixture model and with principal component analysis. In the second part of the paper we present two algorithms for learning the combination model from examples. The first learning algorithm is the standard gradient ascent heuristic for computing maximum likelihood estimates for the parameters of the model. Here we give a closed form for this gradient that is significantly easier to compute than the corresponding gradient for the general Boltzmann machine. The second learning algorithm is a greedy method that creates the hidden units and computes their weights one at a time. This method is a variant of projection pursuit density estimation. In the third part of the paper we give experimental results for these learning methods on synthetic data and on natural data of handwritten digit images.

350 citations
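
The closed-form gradient mentioned in the abstract follows from the bipartite structure: hidden units are conditionally independent given the visibles (the combination machine is closely related to what is now called a restricted Boltzmann machine). A rough numerical sketch, assuming binary units and using exact enumeration for the model expectation, which is feasible only for tiny input dimensions; names and the omission of bias gradients are simplifications.

```python
# Sketch: likelihood gradient for a bipartite ("combination") machine.
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loglik(V, W, b, c):
    """V: (m, n) binary data; W: (n_hid, n) weights; b, c: hidden/visible
    biases. Returns the gradient w.r.t. W only (biases omitted for brevity)."""
    n_hid, n = W.shape
    # Positive (data) phase: hidden units are conditionally independent
    # given the visibles, so their expectations have a closed form.
    H = sigmoid(V @ W.T + b)                     # (m, n_hid)
    pos = H.T @ V / len(V)
    # Negative (model) phase: exact expectation by enumerating all
    # visible configurations (tiny n only).
    configs = np.array(list(itertools.product([0, 1], repeat=n)))
    # Unnormalized log-probability with hidden units summed out.
    logp = configs @ c + np.log1p(np.exp(configs @ W.T + b)).sum(axis=1)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    Hm = sigmoid(configs @ W.T + b)              # (2^n, n_hid)
    neg = (Hm * p[:, None]).T @ configs
    return pos - neg                             # ascend this to raise likelihood
```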


Proceedings ArticleDOI
08 Jul 1991
TL;DR: An original approach to neural modeling based on the idea of searching, with learning methods, for a synaptic learning rule which is biologically plausible and yields networks that are able to learn to perform difficult tasks is discussed.
Abstract: Summary form only given, as follows. The authors discuss an original approach to neural modeling based on the idea of searching, with learning methods, for a synaptic learning rule which is biologically plausible and yields networks that are able to learn to perform difficult tasks. The proposed method of automatically finding the learning rule relies on the idea of considering the synaptic modification rule as a parametric function. This function has local inputs and is the same in many neurons. The parameters that define this function can be estimated with known learning methods. For this optimization, particular attention is given to gradient descent and genetic algorithms. In both cases, estimation of this function consists of a joint global optimization of the synaptic modification function and the networks that are learning to perform some tasks. Both network architecture and the learning function can be designed within constraints derived from biological knowledge.

293 citations
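
One way to picture the parametric synaptic rule is as a small local function shared by every synapse, whose coefficients are what the outer optimization (gradient descent or a genetic algorithm) adjusts. A hedged sketch; the quadratic parameterization below is one plausible choice, not the authors' exact form.

```python
# Sketch: a shared, parametric local learning rule delta_w(theta, ...).
import numpy as np

def delta_w(theta, pre, post, w):
    """Local rule: inputs are presynaptic activity, postsynaptic activity,
    and the current weight; theta is what the outer search optimizes."""
    t0, t1, t2, t3, t4 = theta
    return t0 + t1 * pre + t2 * post + t3 * pre * post + t4 * w

def apply_rule(theta, W, x, y):
    """Apply the same learned rule to every synapse in a layer:
    W: (n_out, n_in), x: (n_in,) presynaptic, y: (n_out,) postsynaptic."""
    return W + delta_w(theta, x[None, :], y[:, None], W)
```

The outer loop would then score each candidate theta by how well networks trained with that rule perform on the target tasks, which is the joint global optimization the abstract describes.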


Proceedings Article
24 Aug 1991
TL;DR: This paper describes the input generalization problem (whereby the system must generalize to produce similar actions in similar situations) and an implemented solution, the G algorithm, which recursively splits the state space using statistical measures of differences in reinforcements received.
Abstract: Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning techniques have shown promise in simple domains. However, a number of hurdles must be passed before they are applicable to realistic problems. This paper describes one such difficulty, the input generalization problem (whereby the system must generalize to produce similar actions in similar situations), and an implemented solution, the G algorithm. This algorithm is based on recursive splitting of the state space, guided by statistical measures of differences in reinforcements received. Connectionist backpropagation has previously been used for input generalization in reinforcement learning. We compare the two techniques analytically and empirically. The G algorithm's sound statistical basis makes it easy to predict when it should and should not work, whereas the behavior of backpropagation is unpredictable. We found that a previous successful use of backpropagation can be explained by the linearity of the application domain. We found that in another domain, G reliably found the optimal policy, whereas none of a set of runs of backpropagation with many combinations of parameters did.

272 citations
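
The statistical splitting criterion can be sketched as follows, assuming binary input bits, per-leaf reward samples, and a simple two-sample t-statistic with an illustrative threshold; the actual tests and tree bookkeeping in the paper may differ.

```python
# Sketch: pick the input bit on which reinforcement statistics differ
# most significantly within a leaf of the state-space tree.
import numpy as np

def split_candidate(bits, rewards, threshold=2.0):
    """bits: (m, n_bits) binary observations; rewards: (m,) returns
    collected while in this leaf. Returns a bit index or None."""
    best_bit, best_t = None, threshold
    for j in range(bits.shape[1]):
        r0, r1 = rewards[bits[:, j] == 0], rewards[bits[:, j] == 1]
        if len(r0) < 2 or len(r1) < 2:
            continue
        se = np.sqrt(r0.var(ddof=1) / len(r0) + r1.var(ddof=1) / len(r1))
        if se == 0:
            continue
        t = abs(r0.mean() - r1.mean()) / se
        if t > best_t:                  # relevant bit: reinforcements differ
            best_bit, best_t = j, t
    return best_bit                     # None means: do not split this leaf
```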


Book
01 Sep 1991
TL;DR: In this article, the authors bring together results on concept formation from cognitive psychology and machine learning, including explanation-based and inductive approaches, and highlight the commonality of their research agendas.
Abstract: Concept formation lies at the center of learning and cognition. Unlike much work in machine learning and cognitive psychology, research on this topic focuses on the unsupervised and incremental acquisition of conceptual knowledge. Recent work on concept formation addresses a number of important issues. Foremost among these are the principles of similarity that guide concept learning and retrieval in human and machine, including the contribution of surface features, goals, and 'deep' features. Another active area of research explores mechanisms for efficiently reorganizing memory in response to the ongoing experiences that confront intelligent agents. Finally, methods for concept formation play an increasing role in work on problem solving and planning, developmental psychology, engineering applications, and constructive induction. This book brings together results on concept formation from cognitive psychology and machine learning, including explanation-based and inductive approaches. Chapters from these differing perspectives are intermingled to highlight the commonality of their research agendas. In addition to cognitive scientists and AI researchers, the book will interest data analysts involved in clustering, philosophers concerned with the nature and origin of concepts, and any researcher dealing with issues of similarity, memory organization, and problem solving.

197 citations


Book ChapterDOI
16 Oct 1991
TL;DR: This paper shows that the existing approaches to learning from examples with unknown attribute values are not sufficient, and a new method is suggested, which transforms the original decision table with unknown values into a new decision table in which every attribute value is known.
Abstract: In many real-life machine learning applications, data are characterized by attributes with unknown values. This paper shows that the existing approaches to learning from such examples are not sufficient. A new method is suggested, which transforms the original decision table with unknown values into a new decision table in which every attribute value is known. Such a new table, in general, is inconsistent. This problem is solved by a technique of learning from inconsistent examples, based on rough set theory. Thus, two sets of rules, certain and possible, are induced. Certain rules are categorical, while possible rules are supported by existing data, although conflicting data may exist as well. The presented approach may be combined with any other approach to uncertainty when processing of possible rules is concerned.

195 citations
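
The certain/possible distinction corresponds to the lower and upper approximations of rough set theory. A toy sketch with an invented two-attribute decision table, taken as it would look after unknown values have been expanded into known ones:

```python
# Sketch: lower/upper approximations over an inconsistent decision table.
from collections import defaultdict

table = [  # (attribute tuple, decision) -- invented illustrative data
    (("high", "yes"), "flu"),
    (("high", "yes"), "flu"),
    (("high", "no"),  "flu"),
    (("high", "no"),  "healthy"),   # conflicts with the row above
    (("low",  "no"),  "healthy"),
]

blocks = defaultdict(set)             # indiscernibility classes
for attrs, dec in table:
    blocks[attrs].add(dec)

lower = {a for a, decs in blocks.items() if decs == {"flu"}}   # certain rules
upper = {a for a, decs in blocks.items() if "flu" in decs}     # possible rules
print("certain:", lower)   # attribute tuples that always imply flu
print("possible:", upper)  # attribute tuples consistent with flu
```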


Book ChapterDOI
14 Jul 1991
TL;DR: In this paper, error-correcting output codes are employed as a distributed output representation to improve the performance of ID3 on the NETtalk task and of backpropagation on an isolated-letter speech-recognition task.
Abstract: Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying large collections of training examples of the form ⟨x_i, f(x_i)⟩. Existing approaches to this problem include (a) direct application of multiclass algorithms such as the decision-tree algorithms ID3 and CART, (b) application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and (c) application of binary concept learning algorithms with distributed output codes such as those employed by Sejnowski and Rosenberg in the NETtalk system. This paper compares these three approaches to a new technique in which BCH error-correcting codes are employed as a distributed output representation. We show that these output representations improve the performance of ID3 on the NETtalk task and of backpropagation on an isolated-letter speech-recognition task. These results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.

188 citations
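
The error-correcting output code scheme reduces a k-class problem to one binary learner per code bit and decodes by nearest codeword. A minimal sketch with an invented 7-bit code (the paper uses BCH codes) and a user-supplied binary learner:

```python
# Sketch: train one binary classifier per code bit, decode by Hamming distance.
import numpy as np

codes = np.array([                     # rows: classes, cols: binary subproblems
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 1, 0],
])

def train_ecoc(X, y, fit_binary):
    """fit_binary(X, labels01) -> predictor returning 0/1; one per bit."""
    return [fit_binary(X, codes[y, bit]) for bit in range(codes.shape[1])]

def predict_ecoc(models, x):
    word = np.array([m(x) for m in models])
    return int(np.argmin((codes != word).sum(axis=1)))   # nearest codeword
```

Because the codewords are well separated in Hamming distance, a few erroneous bit predictions can still decode to the correct class, which is where the error-correcting benefit comes from.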


Proceedings ArticleDOI
15 Aug 1991
TL;DR: A systematic investigation and comparison is undertaken of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain.
Abstract: In this paper we study a Bayesian or average-case model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the learner, and to smoothly unite in a common framework the popular statistical physics and VC dimension theories of learning curves. To achieve this, we undertake a systematic investigation and comparison of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This study leads to a new understanding of the sample complexity of learning in several existing models.

Journal ArticleDOI
TL;DR: Neural network models for adaptive control of arm movement trajectories during visually guided reaching and, more generally, a framework for unsupervised real-time error-based learning are described.

Journal ArticleDOI
TL;DR: The possibility of using unsupervised and supervised learning paradigms to discover which combinations of raw measurements are significant in determining CCT is considered, and an example of a 4-machine power system is used to illustrate the suggested approach.
Abstract: It is highly desirable that the security and stability of electric power systems after exposure to large disturbances be assessable. In this connection, the critical clearing time (CCT) is an attribute which provides significant information about the quality of the post-fault system behavior. It may be regarded as a complex mapping of the prefault, fault-on, and post-fault system conditions in the time domain. Y.-H. Pao and D.J. Solajic (1989) showed that a feedforward neural network can be used to learn this mapping and successfully perform under variable system operating conditions and topologies. In that work the system was described in terms of some conventionally used parameters. In contrast to using those pragmatic features selected on the basis of the engineering understanding of the problem, the possibility of using unsupervised and supervised learning paradigms to discover which combinations of raw measurements are significant in determining CCT is considered. Correlation analysis and the Euclidean metric are used to specify interfeature dependencies. An example of a 4-machine power system is used to illustrate the suggested approach.

Journal ArticleDOI
TL;DR: This paper describes the major approaches that have been taken to model unsupervised learning, and gives an in-depth review of several examples of each approach.
Abstract: Supervised learning procedures for neural networks have recently met with considerable success in learning difficult mappings. However, their range of applicability is limited by their poor scaling behavior, lack of biological plausibility, and restriction to problems for which an external teacher is available. A promising alternative is to develop unsupervised learning algorithms which can adaptively learn to encode the statistical regularities of the input patterns, without being told explicitly the correct response for each pattern. In this paper, we describe the major approaches that have been taken to model unsupervised learning, and give an in-depth review of several examples of each approach.

Book ChapterDOI
01 Jun 1991
TL;DR: Providing domain knowledge to the integrated system can decrease the amount of search required during learning and increase the accuracy of learned concepts, even when the domain knowledge is incorrect and incomplete.
Abstract: We describe a new approach to integrating explanation-based and empirical learning methods for learning relational concepts. The approach uses an information-based heuristic to evaluate components of a hypothesis that are proposed either by explanation-based or empirical learning methods. Providing domain knowledge to the integrated system can decrease the amount of search required during learning and increase the accuracy of learned concepts, even when the domain knowledge is incorrect and incomplete.

Book ChapterDOI
01 Jan 1991
TL;DR: In this paper, the cascade-correlation learning algorithm is used to predict a real-valued time series, and the results of learning to predict the Mackey-Glass chaotic time series using cascade-correlation are compared with other neural net learning algorithms as well as standard techniques.
Abstract: The cascade-correlation learning algorithm has been shown to learn some binary output tasks 10-100 times more quickly than back-propagation. This paper shows that the cascade-correlation algorithm can be used to predict a real-valued time series. Results of learning to predict the Mackey-Glass chaotic time series using cascade-correlation are compared with other neural net learning algorithms as well as standard techniques. Learning speed results are presented in terms that allow easy comparison between cascade-correlation and other learning algorithms, independent of machine architecture or simulator implementation.
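For reference, the Mackey-Glass benchmark series comes from a delay differential equation, and a usable approximation of it is easy to generate; the sketch below uses a plain Euler step with the conventional chaotic regime tau = 17, while the step size and constant initial history are assumed conventions.

```python
# Sketch: generate the Mackey-Glass series from
#   dx/dt = a*x(t - tau) / (1 + x(t - tau)**10) - b*x(t)
# via Euler integration.
import numpy as np

def mackey_glass(n, tau=17, a=0.2, b=0.1, dt=1.0, x0=1.2):
    hist = int(tau / dt)
    x = np.full(n + hist, x0)           # constant initial history
    for t in range(hist, n + hist - 1):
        xd = x[t - hist]                # the delayed term x(t - tau)
        x[t + 1] = x[t] + dt * (a * xd / (1.0 + xd ** 10) - b * x[t])
    return x[hist:]
```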


Journal ArticleDOI
TL;DR: The CL approach provides a general unified framework for developing new learning algorithms and shows that many different types of clamping and teacher signals are possible; two extensions of contrastive learning to time-dependent trajectories are also examined.
Abstract: The concept of Contrastive Learning (CL) is developed as a family of possible learning algorithms for neural networks. CL is an extension of Deterministic Boltzmann Machines to more general dynamical systems. During learning, the network oscillates between two phases. One phase has a teacher signal and one phase has no teacher signal. The weights are updated using a learning rule that corresponds to gradient descent on a contrast function that measures the discrepancy between the free network and the network with a teacher signal. The CL approach provides a general unified framework for developing new learning algorithms. It also shows that many different types of clamping and teacher signals are possible. Several examples are given and an analysis of the landscape of the contrast function is proposed with some relevant predictions for the CL curves. An approach that may be suitable for collective analog implementations is described. Simulation results and possible extensions are briefly discussed together with a new conjecture regarding the function of certain oscillations in the brain. In the appendix, we also examine two extensions of contrastive learning to time-dependent trajectories.
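In the deterministic-Boltzmann special case, the two-phase update described above takes a familiar form; the notation below is assumed rather than taken from the paper:

```latex
% Two-phase contrastive update (sketch; notation assumed):
\Delta w_{ij} \;=\; \eta\,\bigl(\langle s_i s_j\rangle_{\text{teacher}}
                              - \langle s_i s_j\rangle_{\text{free}}\bigr)
```

That is, gradient descent on a contrast function measuring the discrepancy between the settled states of the clamped (teacher) phase and the free-running phase; the weights stop changing exactly when the two phases agree.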

Book ChapterDOI
01 Jan 1991
TL;DR: It is suggested that given a fixed amount of computational power available per control action, it may be better to use a direct reinforcement learning method augmented with indirect techniques than to devote all available resources to a computationally costly indirect method.
Abstract: Following terminology used in adaptive control, we distinguish between indirect learning methods, which learn explicit models of the dynamic structure of the system to be controlled, and direct learning methods, which do not. We compare an existing indirect method, which uses a conventional dynamic programming algorithm, with a closely related direct reinforcement learning method by applying both methods to an infinite horizon Markov decision problem with unknown state-transition probabilities. The simulations show that although the direct method requires much less space and dramatically less computation per control action, its learning ability in this task is superior to, or compares favorably with, that of the more complex indirect method. Although these results do not address how the methods’ performances compare as problems become more difficult, they suggest that given a fixed amount of computational power available per control action, it may be better to use a direct reinforcement learning method augmented with indirect techniques than to devote all available resources to a computationally costly indirect method. Comprehensive answers to the questions raised by this study depend on many factors making up the economic context of the computation.
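The computational asymmetry the chapter highlights can be made concrete: a direct method needs one cheap table update per control action, while an indirect method must maintain estimated models and run dynamic programming over them. A hedged Python sketch, with learning rate, discount, and array layouts as assumptions:

```python
# Sketch: direct (model-free) update vs. indirect (model-based) planning.
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One direct update: cheap per control action, no model needed."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def value_iteration(P, R, gamma=0.95, iters=100):
    """Indirect route: needs estimated P[s, a, s'] and R[s, a], and far
    more computation per action, but uses the experience more fully."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (R + gamma * (P @ V)).max(axis=1)
    return V
```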

Book ChapterDOI
01 Jun 1991
TL;DR: A new method for learning to refine the control rules of approximate reasoning-based controllers that can use the control knowledge of an experienced operator and fine-tune it through the process of learning.
Abstract: Previous reinforcement learning models for learning control do not use existing knowledge of a physical system's behavior, but rather train the network from scratch. The learning process is usually long, and even after the learning is completed, the resulting network can not be easily explained. On the other hand, approximate reasoning-based controllers provide a clear understanding of the control strategy but can not learn from experience. In this paper, we introduce a new method for learning to refine the control rules of approximate reasoning-based controllers. A reinforcement learning technique is used in conjunction with a multi-layer neural network model of an approximate reasoning-based controller. The model learns by updating its prediction of the physical system's behavior. Unlike previous models, our model can use the control knowledge of an experienced operator and fine-tune it through the process of learning. We demonstrate the application of the new approach to a small but challenging real-world control problem.

Proceedings ArticleDOI
08 Jul 1991
TL;DR: It is pointed out that the genetic algorithms which have been shown to yield good performance for neural network weight optimization are really genetic hill-climbers, with a strong reliance on mutation rather than hyperplane sampling.
Abstract: It is pointed out that the genetic algorithms which have been shown to yield good performance for neural network weight optimization are really genetic hill-climbers, with a strong reliance on mutation rather than hyperplane sampling. Neural control problems are more appropriate for these genetic hill-climbers than supervised learning applications because in reinforcement learning applications gradient information is not directly available. Genetic reinforcement learning produces competitive results with the adaptive heuristic critic method, another reinforcement learning paradigm for neural networks that employs temporal difference methods. The genetic hill-climbing algorithm appears to be robust over a wide range of learning conditions.
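The "genetic hill-climber" reading suggests something like the following mutation-driven search over weight vectors, with fitness supplied by the reinforcement task; the Gaussian mutation scale and acceptance rule are illustrative assumptions, not the paper's exact operators.

```python
# Sketch: mutation-driven hill climbing over a network's weight vector.
import numpy as np

def genetic_hillclimb(fitness, n_weights, steps=1000, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n_weights)
    f = fitness(w)
    for _ in range(steps):
        w_new = w + sigma * rng.normal(size=n_weights)   # mutation, no crossover
        f_new = fitness(w_new)
        if f_new >= f:                                   # hill-climbing acceptance
            w, f = w_new, f_new
    return w, f
```

No gradient of the reinforcement signal is needed, which is why this style of search suits control problems where supervised targets are unavailable.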

Proceedings ArticleDOI
15 Aug 1991
TL;DR: The fundamental question of pattern recognition, the problem of correctly labeling samples given m labeled samples in a partially supervised, partially unsupervised learning environment, is investigated, and it is shown that the first labeled sample reduces the probability of error from 1/2 to 2R*(1 − R*).
Abstract: Generalization begins where learning ends. We provide a degrees-of-freedom argument in support of this statement. We then investigate the fundamental question of pattern recognition - the problem of correctly labeling samples given m labeled samples in a partially supervised, partially unsupervised learning environment. Let R* be the probability of classification error based on m = ∞ labeled samples. We describe some joint work with Vittorio Castelli on the two-class problem with n = ∞ unlabeled samples drawn from an identifiable family of distributions. We show that the first labeled sample reduces the probability of error from 1/2 to 2R*(1 − R*). Thus the remainder of the labeled samples reduces the probability of error by at most a factor of two. In this sense, one half of the classification information is contained in the first labeled sample. (The resemblance with the same answer for the nearest-neighbor probability of error is purely coincidental.)
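The quantitative claim can be restated compactly (notation as in the abstract):

```latex
% With n = infinity unlabeled samples from an identifiable family,
% the first labeled sample takes the error probability from 1/2 to
P_e \;=\; 2R^{*}\bigl(1 - R^{*}\bigr),
% and since R^* \le 1/2 for a two-class problem,
R^{*} \;\le\; 2R^{*}\bigl(1 - R^{*}\bigr) \;\le\; 2R^{*}
```

so all remaining labeled samples together can improve the error probability by at most a factor of two, which is the sense in which half the classification information sits in the first labeled sample.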

Proceedings ArticleDOI
18 Nov 1991
TL;DR: The author presents two implemented neuronal methods for free-text database search in detail; the first exhibits much better scalability than its statistical counterparts, resulting in higher speeds, smaller memory needs, and easier maintainability.
Abstract: The author presents two implemented neuronal methods for free-text database search in detail. In the first method, a specific interest (or query) is taught to a Kohonen feature map. By using this network as a neural filter on a dynamic free-text database, only the associated subjects are selected from this database. The second method can be used in more static environments. Statistical properties (n-grams) from various texts are taught to a feature map. A comparison of a query with this feature map results in the selection of texts which are closely related with respect to their contents. Both methods are compared with classical statistical information-retrieval algorithms. Various simulations show that the neural net converges towards a proper representation of the query as well as the objects in the database. The first algorithm exhibits much better scalability than its statistical counterparts, resulting in higher speeds, smaller memory needs, and easier maintainability. The second one shows an elegant and uniform generalization and association method, increasing the selection quality.
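The n-gram representation in the second method can be sketched as character-bigram frequency vectors matched against map prototypes; everything below (the alphabet, the normalization, a pre-trained map) is an illustrative assumption rather than the paper's setup.

```python
# Sketch: bigram features and nearest-prototype lookup on a feature map.
import numpy as np
from itertools import product

BIGRAMS = {bg: i for i, bg in enumerate(
    "".join(p) for p in product("abcdefghijklmnopqrstuvwxyz ", repeat=2))}

def bigram_vector(text):
    v = np.zeros(len(BIGRAMS))
    t = "".join(c if c.isalpha() else " " for c in text.lower())
    for a, b in zip(t, t[1:]):
        v[BIGRAMS[a + b]] += 1
    return v / max(v.sum(), 1)          # frequency normalization

def best_matching_unit(query, som_prototypes):
    """som_prototypes: (n_units, n_bigrams) weights of a trained map."""
    d = ((som_prototypes - bigram_vector(query)) ** 2).sum(axis=1)
    return int(d.argmin())              # texts mapped nearby are selected
```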

Proceedings Article
24 Aug 1991
TL;DR: Comparative experiments show that the derived Bayesian algorithm is consistently as good as or better than the several mature AI and statistical families of tree learning algorithms currently in use, although sometimes at computational cost.
Abstract: This paper describes how a competitive tree learning algorithm can be derived from first principles. The algorithm approximates the Bayesian decision theoretic solution to the learning task. Comparative experiments with the algorithm and the several mature AI and statistical families of tree learning algorithms currently in use show that the derived Bayesian algorithm is consistently as good as or better, although sometimes at computational cost. Using the same strategy, we can design algorithms for many other supervised and model learning tasks given just a probabilistic representation for the kind of knowledge to be learned. As an illustration, a second learning algorithm is derived for learning Bayesian networks from data. Implications for incremental learning and the use of multiple models are also discussed.

Journal ArticleDOI
TL;DR: Structural stability is proved for a large class of unsupervised nonlinear feedback neural networks, the adaptive bidirectional associative memory (ABAM) models, and it is proved that a much larger family of models, the random ABAM (RABAM) models, is globally stable.
Abstract: Structural stability is proved for a large class of unsupervised nonlinear feedback neural networks, adaptive bidirectional associative memory (ABAM) models. The approach extends the ABAM models to the random-process domain as systems of stochastic differential equations and appends scaled Brownian diffusions. It is also proved that this much larger family of models, random ABAM (RABAM) models, is globally stable. Intuitively, RABAM equilibria equal ABAM equilibria that vibrate randomly. The ABAM family includes many unsupervised feedback and feedforward neural models. All RABAM models permit Brownian annealing. The RABAM noise suppression theorem characterizes RABAM system vibration. The mean-squared activation and synaptic velocities decrease exponentially to their lower bounds, the respective temperature-scaled noise variances. The many neuronal and synaptic parameters missing from such neural network models are included, but as net random unmodeled effects. They do not affect the structure of real-time global computations.

Book ChapterDOI
01 Sep 1991
TL;DR: This chapter presents the models of inductive learning, and it is suggested that an important evaluation task for these systems is attribute prediction.
Abstract: Publisher Summary: This chapter presents the models of inductive learning. Concept formation and unsupervised learning generally are not viewed as methods of improving performance. Rather, an implicit assumption is that the primary performance task of interest for unsupervised methods is communicability or rediscovery. However, it is suggested that an important evaluation task for these systems is attribute prediction. An explicit consideration of suitable performance tasks can have significant implications for the design of both psychological and computational models of unsupervised learning, but the importance of this observation is sometimes overlooked. In addition, research on concept formation continues to progress in several directions that are shared by other learning paradigms. For example, important areas concern more complete representation languages for objects and concepts, notably structured descriptions that place an added burden on search. The complications caused by noise in the environment are a traditional research topic in supervised scenarios, and they are receiving increased attention in concept formation and unsupervised learning.

Proceedings ArticleDOI
08 Jul 1991
TL;DR: A framework for texture analysis based on combined unsupervised and supervised learning is proposed and suggested as a general framework for pattern classification.
Abstract: A framework for texture analysis based on combined unsupervised and supervised learning is proposed. The textured input is represented in the frequency-orientation space via a Gabor-wavelet pyramidal decomposition. In the unsupervised learning phase, a neural network vector quantization scheme is used for the quantization of the feature-vector attributes and a projection onto a reduced-dimension clustered map for initial segmentation. A supervised stage follows, in which labeling of the textured map is achieved using a rule-based system. A set of informative features is extracted in the supervised stage as congruency rules between attributes using an information-theoretic measure. This learned set can now act as a classification set for test images. This approach is suggested as a general framework for pattern classification. Simulation results for texture classification are given.

Proceedings Article
14 Jul 1991
TL;DR: The approach integrates cost-sensitive learning with reinforcement learning to learn an efficient internal state representation and a decision policy simultaneously in a finite, deterministic environment; it maximizes the long-term discounted reward per action and reduces the average sensing cost per state.
Abstract: Standard reinforcement learning methods assume they can identify each state distinctly before making an action decision. In reality, a robot agent has only a limited sensing capability, and identifying each state by extensive sensing can be time consuming. This paper describes an approach that learns active perception strategies in reinforcement learning and considers sensing costs explicitly. The approach integrates cost-sensitive learning with reinforcement learning to learn an efficient internal state representation and a decision policy simultaneously in a finite, deterministic environment. It not only maximizes the long-term discounted reward per action but also reduces the average sensing cost per state. The initial experimental results in a simulated robot navigation domain are encouraging.

Proceedings ArticleDOI
18 Nov 1991
TL;DR: The authors present a learning algorithm that uses a genetic algorithm for creating novel examples to teach multilayer feedforward networks, and show that the self-teaching neural networks not only reduce the teaching efforts of the human, but the genetically created examples also contribute robustly to the improvement of generalization performance and the interpretation of the connectionist knowledge.
Abstract: The authors introduce an active learning paradigm for neural networks. In contrast to the passive paradigm, the learning in the active paradigm is initiated by the machine learner instead of its environment or teacher. The authors present a learning algorithm that uses a genetic algorithm for creating novel examples to teach multilayer feedforward networks. The creative learning networks, based on their own knowledge, discover new examples, criticize and select useful ones, train themselves, and thereby extend their existing knowledge. Experiments on function extrapolation show that the self-teaching neural networks not only reduce the teaching efforts of the human, but the genetically created examples also contribute robustly to the improvement of generalization performance and the interpretation of the connectionist knowledge.
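The create/criticize/select loop might look roughly like the following; the uniform-crossover operator and the uncertainty-based selection criterion are plausible stand-ins, not the authors' specification.

```python
# Sketch: genetically create candidate examples, keep the informative ones.
import numpy as np

def create_examples(X, rng, n_new=10, sigma=0.1):
    """Genetic variation: crossover of random parent pairs plus mutation."""
    i, j = rng.integers(len(X), size=(2, n_new))
    mask = rng.random((n_new, X.shape[1])) < 0.5
    children = np.where(mask, X[i], X[j])                       # uniform crossover
    return children + sigma * rng.normal(size=children.shape)   # mutation

def select_informative(candidates, predict, labeler, k=3):
    """Keep candidates the current network is most uncertain about
    (one plausible 'criticize and select' criterion; predict is assumed
    to return probabilities in [0, 1], labeler is the hypothetical teacher)."""
    scores = np.abs(predict(candidates) - 0.5)    # near 0.5 = most uncertain
    keep = candidates[np.argsort(scores)[:k]]
    return keep, labeler(keep)
```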

Proceedings Article
02 Dec 1991
TL;DR: A more direct approach to invariant learning based on an anti-Hebbian learning rule is suggested; an unsupervised two-layer network implementing this method in a competitive setting learns to extract coherent depth information from random-dot stereograms.
Abstract: Although the detection of invariant structure in a given set of input patterns is vital to many recognition tasks, connectionist learning rules tend to focus on directions of high variance (principal components). The prediction paradigm is often used to reconcile this dichotomy; here we suggest a more direct approach to invariant learning based on an anti-Hebbian learning rule. An unsupervised two-layer network implementing this method in a competitive setting learns to extract coherent depth information from random-dot stereograms.
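
The anti-Hebbian rule named in the abstract flips the sign of the Hebbian correlation update, pushing a unit toward low-variance (invariant) directions rather than principal components; below is a one-line sketch with an assumed renormalization to keep the weight vector from collapsing to zero.

```python
# Sketch: anti-Hebbian update toward a low-variance (invariant) direction.
import numpy as np

def anti_hebbian_step(w, x, eta=0.01):
    y = w @ x
    w = w - eta * y * x             # minus sign: suppress correlated directions
    return w / np.linalg.norm(w)    # renormalize so w does not shrink to zero
```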