Showing papers on "Unsupervised learning published in 1993"


Journal ArticleDOI
TL;DR: An optimized self-organizing map algorithm has been used to obtain protein topological (proteinotopic) maps and analysis of the proteinotopic map reveals that the network extracts the main secondary structure features even with the small number of examples used.
Abstract: An optimized self-organizing map algorithm has been used to obtain protein topological (proteinotopic) maps. A neural network is able to arrange a set of proteins depending on their ultraviolet circular dichroism spectra in a completely unsupervised learning process. Analysis of the proteinotopic map reveals that the network extracts the main secondary structure features even with the small number of examples used. Some methods to use the proteinotopic map for protein secondary structure prediction are tested showing a good performance in the 200-240 nm wavelength range that is likely to increase as new protein structures are known.
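
The abstract does not include an implementation; the following is a minimal NumPy sketch of a 2-D self-organizing map of the kind described, assuming the circular dichroism spectra arrive as one row per protein. The map size, learning-rate schedule, and neighbourhood width are illustrative placeholders, not the authors' optimized settings.

```python
import numpy as np

def train_som(spectra, grid=(10, 10), epochs=200, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal 2-D self-organizing map; `spectra` is (n_proteins, n_wavelengths)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n, d = spectra.shape
    weights = rng.normal(size=(rows * cols, d)) * 0.01
    # grid coordinates of each map unit, used by the neighbourhood function
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(epochs):
        lr = lr0 * np.exp(-t / epochs)          # decaying learning rate
        sigma = sigma0 * np.exp(-t / epochs)    # shrinking neighbourhood
        for x in spectra[rng.permutation(n)]:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))     # best-matching unit
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)     # grid distance to BMU
            h = np.exp(-dist2 / (2 * sigma ** 2))                 # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)            # pull units toward x
    return weights.reshape(rows, cols, d)

# usage sketch: proteinotopic_map = train_som(cd_spectra)  # e.g. 200-240 nm CD values
```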

1,010 citations


Book ChapterDOI
27 Jun 1993
TL;DR: A general method is presented that allows predictions to use both instance-based and model-based learning; results with three approaches to constructing models and with eight datasets demonstrate improvements due to the composite method.
Abstract: This paper concerns learning tasks that require the prediction of a continuous value rather than a discrete class. A general method is presented that allows predictions to use both instance-based and model-based learning. Results with three approaches to constructing models and with eight datasets demonstrate improvements due to the composite method. Keywords: learning with continuous classes, instance-based learning, model-based learning, empirical evaluation.

705 citations


Journal ArticleDOI
TL;DR: Experimental results show that RPCL outperforms FSCL when used for unsupervised classification, for training a radial basis function (RBF) network, and for curve detection in digital images.
Abstract: It is shown that frequency-sensitive competitive learning (FSCL), one version of the recently improved competitive learning (CL) algorithms, significantly deteriorates in performance when the number of units is inappropriately selected. An algorithm called rival penalized competitive learning (RPCL) is proposed. In this algorithm, for each input, not only is the winner unit modified to adapt to the input, but its rival (the second winner) is also de-learned by a smaller learning rate. RPCL can be regarded as an unsupervised extension of Kohonen's supervised LVQ2. RPCL has the ability to automatically allocate an appropriate number of units for an input data set. The experimental results show that RPCL outperforms FSCL when used for unsupervised classification, for training a radial basis function (RBF) network, and for curve detection in digital images.
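
The rival-penalized update itself is compact enough to sketch. The toy NumPy version below uses plain Euclidean winner/rival selection and fixed rates alpha and beta (with alpha much larger than beta); the published RPCL additionally weights selection by how often each unit has won, so treat this as an illustration of the de-learning idea rather than a faithful reimplementation.

```python
import numpy as np

def rpcl(X, n_units, epochs=50, alpha=0.05, beta=0.002, seed=0):
    """Rival-penalized competitive learning sketch: the winner moves toward the
    input, while the runner-up ("rival") is pushed away at a much smaller rate."""
    rng = np.random.default_rng(seed)
    units = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            d = ((units - x) ** 2).sum(axis=1)
            winner, rival = np.argsort(d)[:2]
            units[winner] += alpha * (x - units[winner])   # adapt winner toward x
            units[rival]  -= beta  * (x - units[rival])    # de-learn the rival
    return units
```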

634 citations


Proceedings Article
29 Nov 1993
TL;DR: A framework based on maximum likelihood density estimation for learning from high-dimensional data sets with arbitrary patterns of missing data is presented and results from a classification benchmark--the iris data set--are presented.
Abstract: Real-world learning tasks may involve high-dimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeals to the Expectation-Maximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithm--EM is used both for the estimation of mixture components and for coping with missing data. The resulting algorithm is applicable to a wide range of supervised as well as unsupervised learning problems. Results from a classification benchmark--the iris data set--are presented.
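
As a hedged illustration of the general recipe, the sketch below runs EM for a diagonal-covariance Gaussian mixture on data with NaN-marked missing entries: responsibilities come from the marginal likelihood of the observed dimensions, and missing dimensions enter the M-step through their conditional expectations. The paper's framework is more general; every name and default here is an assumption.

```python
import numpy as np

def em_gmm_missing(X, k, n_iter=100, seed=0):
    """EM for a diagonal-covariance Gaussian mixture; NaN marks missing entries."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    obs = ~np.isnan(X)                          # mask of observed entries
    init = X[rng.choice(n, k, replace=False)]
    mu = np.where(np.isnan(init), 0.0, init)
    var = np.ones((k, d))
    mix = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities from the marginal likelihood of observed dims
        logr = np.zeros((n, k))
        for j in range(k):
            diff = np.where(obs, X - mu[j], 0.0)
            ll = -0.5 * np.sum(obs * (np.log(2 * np.pi * var[j]) + diff ** 2 / var[j]),
                               axis=1)
            logr[:, j] = np.log(mix[j]) + ll
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: with diagonal covariances, the conditional expectation of a
        # missing entry under component j is mu[j]; its second moment adds var[j].
        for j in range(k):
            w = r[:, j][:, None]
            xhat = np.where(obs, X, mu[j])
            x2hat = np.where(obs, X ** 2, mu[j] ** 2 + var[j])
            nk = w.sum()
            mu[j] = (w * xhat).sum(axis=0) / nk
            var[j] = (w * x2hat).sum(axis=0) / nk - mu[j] ** 2 + 1e-6
            mix[j] = nk / n
    return mix, mu, var
```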

630 citations


Journal ArticleDOI
TL;DR: The method of abstracting the genetic algorithm to the problem level, described here for supervised inductive learning, can also be extended to other domains and tasks, since it provides a framework for combining recently popular genetic algorithm methods with traditional problem-solving methodologies.
Abstract: Supervised learning in attribute-based spaces is one of the most popular machine learning problems studied and, consequently, has attracted considerable attention of the genetic algorithm community. The full-memory approach developed here uses the same high-level descriptive language that is used in rule-based systems. This allows for an easy utilization of inference rules of the well-known inductive learning methodology, which replace the traditional domain-independent operators and make the search task-specific. Moreover, a closer relationship between the underlying task and the processing mechanisms provides a setting for an application of more powerful task-specific heuristics. Initial results obtained with a prototype implementation for the simplest case of single concepts indicate that genetic algorithms can be effectively used to process high-level concepts and incorporate task-specific knowledge. The method of abstracting the genetic algorithm to the problem level, described here for supervised inductive learning, can also be extended to other domains and tasks, since it provides a framework for combining recently popular genetic algorithm methods with traditional problem-solving methodologies. Moreover, in this particular case, it provides a very powerful tool enabling study of the widely accepted but not so well understood inductive learning methodology.

260 citations


Proceedings Article
29 Nov 1993
TL;DR: This paper shows that minimizing the disagreement between the outputs of networks processing patterns from these different modalities is a sensible approximation to minimizing the number of misclassifications in each modality, and leads to similar results.
Abstract: One of the advantages of supervised learning is that the final error metric is available during training. For classifiers, the algorithm can directly reduce the number of misclassifications on the training set. Unfortunately, when modeling human learning or constructing classifiers for autonomous robots, supervisory labels are often not available or too expensive. In this paper we show that we can substitute for the labels by making use of structure between the pattern distributions to different sensory modalities. We show that minimizing the disagreement between the outputs of networks processing patterns from these different modalities is a sensible approximation to minimizing the number of misclassifications in each modality, and leads to similar results. Using the Peterson-Barney vowel dataset we show that the algorithm performs well in finding appropriate placement for the codebook vectors, particularly when the confusable classes are different for the two modalities.
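
A heavily simplified sketch of the disagreement-minimization idea follows: two labelled codebooks, one per modality, are updated LVQ-style on paired but unlabeled inputs, with each modality's current winning class acting as the pseudo-label for the other. The real algorithm bootstraps the codebook labels and placement more carefully and uses an LVQ2-style update, so the random initial labels and fixed learning rate here are illustrative assumptions only.

```python
import numpy as np

def cross_modal_lvq(X1, X2, n_codes, n_classes, epochs=30, lr=0.05, seed=0):
    """Toy disagreement minimization: paired inputs (X1[i], X2[i]) from two
    modalities, no labels. Each codebook vector carries a class; a winner is
    attracted to the input when the two modalities' winning classes agree and
    repelled when they disagree."""
    rng = np.random.default_rng(seed)

    def init(X):
        codes = X[rng.choice(len(X), n_codes, replace=False)].astype(float)
        labels = rng.integers(n_classes, size=n_codes)   # placeholder initial labels
        return codes, labels

    c1, y1 = init(X1)
    c2, y2 = init(X2)
    for _ in range(epochs):
        for i in rng.permutation(len(X1)):
            x1, x2 = X1[i], X2[i]
            w1 = np.argmin(((c1 - x1) ** 2).sum(axis=1))
            w2 = np.argmin(((c2 - x2) ** 2).sum(axis=1))
            sign = 1.0 if y1[w1] == y2[w2] else -1.0   # agree: attract, disagree: repel
            c1[w1] += sign * lr * (x1 - c1[w1])
            c2[w2] += sign * lr * (x2 - c2[w2])
    return (c1, y1), (c2, y2)
```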

245 citations


Journal ArticleDOI
TL;DR: On a simulated inverted-pendulum control problem, “genetic reinforcement learning” produces competitive results with AHC, another well-known reinforcement learning paradigm for neural networks that employs the temporal difference method.
Abstract: Empirical tests indicate that at least one class of genetic algorithms yields good performance for neural network weight optimization in terms of learning rates and scalability. The successful application of these genetic algorithms to supervised learning problems sets the stage for the use of genetic algorithms in reinforcement learning problems. On a simulated inverted-pendulum control problem, “genetic reinforcement learning” produces competitive results with AHC, another well-known reinforcement learning paradigm for neural networks that employs the temporal difference method. These algorithms are compared in terms of learning rates, performance-based generalization, and control behavior over time.
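
"Genetic reinforcement learning" in this sense treats the network's weight vector as a genome and its control performance as fitness. Below is a generic real-valued GA over weight vectors with uniform crossover and Gaussian mutation; the `evaluate` callback (e.g. the number of steps an inverted pendulum stays balanced) is a hypothetical placeholder, and the paper's own GA may use different operators and selection.

```python
import numpy as np

def genetic_weight_search(evaluate, n_weights, pop_size=50, generations=100,
                          sigma=0.1, elite_frac=0.2, seed=0):
    """Simple real-valued GA over neural-network weight vectors.
    `evaluate(weights) -> fitness` is assumed to run the network as a controller
    and is NOT defined here."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, n_weights))
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        fitness = np.array([evaluate(w) for w in pop])
        elite = pop[np.argsort(fitness)[-n_elite:]]                  # keep the best vectors
        parents = elite[rng.integers(n_elite, size=(pop_size - n_elite, 2))]
        cross = rng.random((pop_size - n_elite, n_weights)) < 0.5    # uniform crossover
        children = np.where(cross, parents[:, 0], parents[:, 1])
        children += rng.normal(scale=sigma, size=children.shape)     # Gaussian mutation
        pop = np.vstack([elite, children])
    fitness = np.array([evaluate(w) for w in pop])
    return pop[np.argmax(fitness)]
```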

186 citations


11 Jul 1993
TL;DR: Several meta-learning strategies for integrating independently learned classifiers by the same learner in a parallel and distributed computing environment are outlined, particularly suited for massive amounts of data that main-memory-based learning algorithms cannot efficiently handle.
Abstract: Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of very large network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Learning techniques are central to knowledge discovery and the approach proposed in this paper may substantially increase the amount of data a knowledge discovery system can handle effectively. Meta-learning is proposed as a general technique for integrating a number of distinct learning processes. This paper details several meta-learning strategies for integrating independently learned classifiers by the same learner in a parallel and distributed computing environment. Our strategies are particularly suited for massive amounts of data that main-memory-based learning algorithms cannot efficiently handle. The strategies are also independent of the particular learning algorithm used and the underlying parallel and distributed platform. Preliminary experiments using different data sets and algorithms demonstrate encouraging results: parallel learning by meta-learning can achieve comparable prediction accuracy in less space and time than purely serial learning.
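
The simplest reading of this setup is to partition the data, train the same learner on each partition (possibly on separate processors), and integrate the resulting classifiers. The sketch below does the integration by plain majority vote using scikit-learn decision trees; the paper's meta-learning strategies go beyond simple voting, and integer class labels are assumed here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def partitioned_majority_vote(X, y, X_test, n_partitions=4, seed=0):
    """Train the same learner on disjoint partitions of the data (as could be
    done in parallel) and combine the learned classifiers by majority vote."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    models = [DecisionTreeClassifier(random_state=seed).fit(X[part], y[part])
              for part in np.array_split(idx, n_partitions)]
    votes = np.stack([m.predict(X_test) for m in models])    # shape: (n_models, n_test)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```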

185 citations


Proceedings ArticleDOI
01 Dec 1993
TL;DR: Meta-learning is proposed as a general technique to combine the results of multiple learning algorithms, each applied to a set of training data, to improve overall prediction accuracy.
Abstract: In this paper, we propose meta-learning as a general technique to combine the results of multiple learning algorithms, each applied to a set of training data. We detail several meta-learning strategies for combining independently learned classifiers, each computed by different algorithms, to improve overall prediction accuracy. The overall resulting classifier is composed of the classifiers generated by the different learning algorithms and a meta-classifier generated by a meta-learning strategy. The strategies described here are independent of the learning algorithms used. Preliminary experiments using different strategies and learning algorithms on two molecular biology sequence analysis data sets demonstrate encouraging results. Machine learning techniques are central to automated knowledge discovery systems and hence our approach can enhance the effectiveness of such systems.
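
One way to read the combiner idea is as what is now called stacking: train heterogeneous base learners, then train a meta-classifier on their predictions for held-out data. The scikit-learn sketch below is an illustration under that reading, not the paper's exact strategies; the particular base learners and the 50/50 split are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def combine_by_meta_learning(X, y, X_test, seed=0):
    """Train several different base learners, then train a meta-classifier on
    their predictions for held-out data (a combiner-style strategy)."""
    X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5,
                                                      random_state=seed)
    bases = [DecisionTreeClassifier(random_state=seed).fit(X_base, y_base),
             GaussianNB().fit(X_base, y_base),
             KNeighborsClassifier().fit(X_base, y_base)]
    meta_features = np.column_stack([m.predict(X_meta) for m in bases])
    meta = LogisticRegression(max_iter=1000).fit(meta_features, y_meta)
    test_features = np.column_stack([m.predict(X_test) for m in bases])
    return meta.predict(test_features)
```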

181 citations


Proceedings Article
11 Jul 1993
TL;DR: This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous realtime versions of Q-learning and value-iteration, applied to the problem of reaching a goal state in deterministic domains and shows that the algorithms are tractable with only a simple change in the task representation or initialization.
Abstract: This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous realtime versions of Q-learning and value-iteration, applied to the problem of reaching a goal state in deterministic domains. Previous work had concluded that, in many cases, tabula rasa reinforcement learning was exponential for such problems, or was tractable only if the learning algorithm was augmented. We show that, to the contrary, the algorithms are tractable with only a simple change in the task representation or initialization. We provide tight bounds on the worst-case complexity, and show how the complexity is even smaller if the reinforcement learning algorithms have initial knowledge of the topology of the state space or the domain has certain special properties. We also present a novel bidirectional Q-learning algorithm to find optimal paths from all states to a goal state and show that it is no more complex than the other algorithms.
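
The representation change the paper points to can be illustrated with ordinary tabular Q-learning on a deterministic goal problem: give every action a reward of -1 (so path length is being minimized) and initialize Q to zero, which is optimistic and drives systematic exploration. The successor function, tie-breaking, and episode count below are illustrative assumptions; a learning rate of 1 and no discount are used because the domain is deterministic.

```python
import numpy as np

def q_learning_goal_task(n_states, n_actions, successor, goal, start=0,
                         episodes=200, seed=0):
    """Tabular 1-step Q-learning with the action-penalty representation:
    reward -1 per action, Q initialized to zero (optimistic), greedy action
    selection with random tie-breaking. `successor(s, a) -> s2` is an assumed
    deterministic transition function with the goal reachable from every state."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = start
        while s != goal:
            best = np.flatnonzero(Q[s] == Q[s].max())   # greedy, ties broken randomly
            a = int(rng.choice(best))
            s2 = successor(s, a)
            Q[s, a] = -1.0 + (0.0 if s2 == goal else Q[s2].max())  # rate 1: exact here
            s = s2
    return Q        # -Q[s].max() approximates the remaining distance to the goal
```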

134 citations


Book ChapterDOI
27 Jun 1993
TL;DR: It is shown that simple random-representation methods can perform as well as nearest-neighbor methods (while being more suited to online learning) and significantly better than backpropagation, and the results suggest that randomness has a useful role to play in online supervised learning and constructive induction.
Abstract: We consider the requirements of online learning: learning which must be done incrementally and in real time, with the results of learning available soon after each new example is acquired. Despite the abundance of methods for learning from examples, there are few that can be used effectively for online learning, e.g., as components of reinforcement learning systems. Most of these few, including radial basis functions, CMACs, Kohonen's self-organizing maps, and those developed in this paper, share the same structure. All expand the original input representation into a higher dimensional representation in an unsupervised way, and then map that representation to the final answer using a relatively simple supervised learner, such as a perceptron or LMS rule. Such structures learn very rapidly and reliably, but have been thought either to scale poorly or to require extensive domain knowledge. To the contrary, some researchers (Rosenblatt, 1962; Gallant & Smith, 1987; Kanerva, 1988; Prager & Fallside, 1988) have argued that the expanded representation can be chosen largely at random with good results. The main contribution of this paper is to develop and test this hypothesis. We show that simple random-representation methods can perform as well as nearest-neighbor methods (while being more suited to online learning), and significantly better than backpropagation. We find that the size of the random representation does increase with the dimensionality of the problem, but not unreasonably so, and that the required size can be reduced substantially using unsupervised learning techniques. Our results suggest that randomness has a useful role to play in online supervised learning and constructive induction.
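
A minimal version of the shared structure described here: a fixed, randomly chosen expansion of the input into many binary features, followed by an online LMS (delta-rule) output layer. The feature count, threshold units, and learning rate are placeholder choices; the paper studies several random-representation schemes that differ in detail.

```python
import numpy as np

def make_random_features(d_in, n_features=500, seed=0):
    """Fixed random hyperplanes that expand a d_in-dimensional input into a
    binary feature vector (one of several possible random representations)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d_in, n_features))
    b = rng.normal(size=n_features)
    return lambda X: (X @ W + b > 0).astype(float)

def lms_online(phi, y, lr=0.01, epochs=10):
    """LMS / delta rule on the expanded representation, one example at a time."""
    w = np.zeros(phi.shape[1])
    for _ in range(epochs):
        for x_t, y_t in zip(phi, y):
            w += lr * (y_t - w @ x_t) * x_t
    return w

# usage sketch: expand = make_random_features(X_train.shape[1])
#               w = lms_online(expand(X_train), y_train)
#               predictions = expand(X_test) @ w
```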

Journal ArticleDOI
TL;DR: A local feature measure determining how much a single feature reduces the total redundancy is derived which turns out to depend only on the probability of the feature and of its components, but not on the statistical properties of any other features.
Abstract: A redundancy reduction strategy, which can be applied in stages, is proposed as a way to learn as efficiently as possible the statistical properties of an ensemble of sensory messages. The method works best for inputs consisting of strongly correlated groups, that is features, with weaker statistical dependence between different features. This is the case for localized objects in an image or for words in a text. A local feature measure determining how much a single feature reduces the total redundancy is derived which turns out to depend only on the probability of the feature and of its components, but not on the statistical properties of any other features. The locality of this measure makes it ideal as the basis for a "neural" implementation of redundancy reduction, and an example of a very simple non-Hebbian algorithm is given. The effect of noise on learning redundancy is also discussed.

Journal ArticleDOI
TL;DR: The present work shows the capabilities of self-organizing feature maps for the analysis and representation of financial data and for aid in financial decision-making.
Abstract: Many recent papers have dealt with the application of feedforward neural networks in financial data processing. This powerful neural model can implement very complex nonlinear mappings, but when outputs are not available or clustering of patterns is required, the use of unsupervised models such as self-organizing maps is more suitable. The present work shows the capabilities of self-organizing feature maps for the analysis and representation of financial data and for aid in financial decision-making. For this purpose, we analyse the Spanish banking crisis of 1977–1985 and the Spanish economic situation in 1990 and 1991, making use of this unsupervised model. Emphasis is placed on the analysis of the synaptic weights, fundamental for delimiting regions on the map, such as bankrupt or solvent regions, where similar companies are clustered. The time evolution of the companies and other important conclusions can be drawn from the resulting maps.

Book ChapterDOI
27 Jun 1993
TL;DR: This research concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling its application to complex robot-learning problems.
Abstract: The aim of this research is to extend the state of the art of reinforcement learning and enable its applications to complex robot-learning problems. This paper presents a series of scaling-up extensions to reinforcement learning, including: generalization by neural networks, using action models, teaching, hierarchical learning, and having a short-term memory. These extensions have been tested in a physically-realistic robot simulator, and combined to solve a complex robot-learning problem. Simulation results indicate that each of the extensions could result in either significant learning speedup or new capabilities. This research concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning.

Book ChapterDOI
TL;DR: This work studies on-line learning processes in artificial neural networks from a general point of view, and applies the results on the transitions from “twists” in two-dimensional self-organizing maps to perfectly ordered configurations.
Abstract: We study on-line learning processes in artificial neural networks from a general point of view. On-line learning means that a learning step takes place at each presentation of a randomly drawn training pattern. It can be viewed as a stochastic process governed by a continuous-time master equation. On-line learning is necessary if not all training patterns are available all the time. This occurs in many applications when the training patterns are drawn from a time-dependent environmental distribution. Studying learning in a changing environment, we encounter a conflict between the adaptability and the confidence of the network's representation. Minimization of a criterion incorporating both effects yields an algorithm for on-line adaptation of the learning parameter. The inherent noise of on-line learning makes it possible to escape from undesired local minima of the error potential on which the learning rule performs (stochastic) gradient descent. We try to quantify these often made claims by considering the transition times between various minima. We apply our results on the transitions from “twists” in two-dimensional self-organizing maps to perfectly ordered configurations. Finally, we discuss the capabilities of on-line learning for global optimization.

Journal ArticleDOI
TL;DR: A biologically plausible extension of the HyperBF model is developed that takes into account basic features of the functional architecture of early vision and results of psychophysical experiments are reported that are consistent with the hypothesis that activity-dependent presynaptic amplification may be involved in perceptual learning in hyperacuity.
Abstract: Performance of human subjects in a wide variety of early visual processing tasks improves with practice. HyperBF networks (Poggio and Girosi 1990) constitute a mathematically well-founded framework for understanding such improvement in performance, or perceptual learning, in the class of tasks known as visual hyperacuity. The present article concentrates on two issues raised by the recent psychophysical and computational findings reported in Poggio et al. (1992b) and Fahle and Edelman (1992). First, we develop a biologically plausible extension of the HyperBF model that takes into account basic features of the functional architecture of early vision. Second, we explore various learning modes that can coexist within the HyperBF framework and focus on two unsupervised learning rules that may be involved in hyperacuity learning. Finally, we report results of psychophysical experiments that are consistent with the hypothesis that activity-dependent presynaptic amplification may be involved in perceptual learning in hyperacuity.
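
For readers unfamiliar with HyperBF networks: in their plainest form they reduce to a radial-basis-function expansion with a linear output layer, as in the short least-squares sketch below. This omits exactly what HyperBF adds (movable centers and a learned metric) and everything in the paper's biologically motivated extension; it is only meant to fix the baseline architecture being discussed, and all parameters are illustrative.

```python
import numpy as np

def fit_rbf(X, y, centers, width):
    """Plain Gaussian RBF network: fixed centers, linear output weights fitted
    by least squares."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-d2 / (2 * width ** 2))
    w, *_ = np.linalg.lstsq(G, y, rcond=None)
    return w

def predict_rbf(X, centers, width, w):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * width ** 2)) @ w
```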

Journal ArticleDOI
01 Jul 1993
TL;DR: A novel AI-based system for generating static schedules that makes heavy use of an unsupervised learning module in acquiring significant portions of the requisite problem processing knowledge, and pursues a hybrid schedule generation strategy.
Abstract: Existing computerized systems that support scheduling decisions for flexible manufacturing systems (FMS's) rely largely on knowledge acquired through rote learning for schedule generation. In a few instances, the systems also possess some ability to learn using deduction or supervised induction. We introduce a novel AI-based system for generating static schedules that makes heavy use of an unsupervised learning module in acquiring significant portions of the requisite problem processing knowledge. This scheduler pursues a hybrid schedule generation strategy wherein it effectively combines knowledge acquired via genetics-based unsupervised induction with rote-learned knowledge in generating high-quality schedules in an efficient manner. Through a series of experiments conducted on a randomly generated problem of practical complexity, we show that the hybrid scheduler strategy is viable, promising, and worthy of more in-depth investigations.

Book ChapterDOI
27 Jun 1993
TL;DR: A density-adaptive reinforcement learning algorithm and a density-adaptive forgetting algorithm are described; the forgetting algorithm deletes observations from the learning set depending on whether subsequent evidence is available in a local region of the parameter space.
Abstract: We describe a density-adaptive reinforcement learning and a density-adaptive forgetting algorithm. This learning algorithm uses hybrid D κ-D/2 κ -trees to allow for a variable resolution partitioning and labelling of the input space. The density-adaptive forgetting algorithm deletes observations from the learning set depending on whether subsequent evidence is available in a local region of the parameter space. The algorithms are demonstrated in a simulation for learning feasible robotic grasp approach directions and orientations and then adapting to subsequent mechanical failures in the gripper.

Book ChapterDOI
01 Jan 1993
TL;DR: The system enhances the reasoning capabilities of classical expert systems with the ability to generalise and to handle incomplete cases, and uses neural nets with unsupervised learning algorithms to extract regularities out of case data.
Abstract: In this work we present the integration of neural networks with a rule-based expert system. The system realizes the automatic acquisition of knowledge out of a set of examples. It enhances the reasoning capabilities of classical expert systems with the ability to generalise and to handle incomplete cases. It uses neural nets with unsupervised learning algorithms to extract regularities out of case data. A symbolic rule generator transforms these regularities into Prolog rules. The generated rules and the trained neural nets are embedded into the expert system as knowledge bases. In the system's diagnosis phase it is possible to use these knowledge bases together with human expert's knowledge bases in order to diagnose an unknown case. Furthermore, the system is able to diagnose and to complete inconsistent data using the trained neural nets, exploiting their ability to generalise.

Proceedings ArticleDOI
28 Mar 1993
TL;DR: It is shown how reinforcement learning can be made practical for complex problems by introducing hierarchical learning and artificial neural networks are used to generalize experiences.
Abstract: It is shown how reinforcement learning can be made practical for complex problems by introducing hierarchical learning. The agent at first learns elementary skills for solving elementary problems. To learn a new skill for solving a complex problem later on, the agent can ignore the low-level details and focus on the problem of coordinating the elementary skills it has developed. A physically-realistic mobile robot simulator is used to demonstrate the success and importance of hierarchical learning. For fast learning, artificial neural networks are used to generalize experiences, and a teaching technique is employed to save many learning trials of the simulated robot.

Proceedings Article
01 Jan 1993
TL;DR: This paper presents a method using likelihood ratios attached to clauses to classify test examples, and demonstrates that attaching weights and allowing concept descriptions to compete to classify examples reduces an algorithm's susceptibility to noise.
Abstract: Many learning algorithms form concept descriptions composed of clauses, each of which covers some proportion of the positive training data and a small to zero proportion of the negative training data. This paper presents a method for attaching likelihood ratios to clauses and a method for using such ratios to classify test examples. This paper presents the relational concept learner HYDRA that learns a concept description for each class. Each concept description competes to classify the test example using the likelihood ratios assigned to clauses of that concept description. By testing on several artificial and "real world" domains, we demonstrate that attaching weights and allowing concept descriptions to compete to classify examples reduces an algorithm's susceptibility to noise.
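
The weighting scheme can be pictured with a small sketch: estimate, for each clause, how much more likely a covered example is to be positive than negative (with Laplace smoothing), and let the per-class concept descriptions compete by the best ratio among their clauses that cover the test example. HYDRA's exact estimates and tie-handling may differ; the function names here are illustrative.

```python
def clause_likelihood_ratio(pos_covered, n_pos, neg_covered, n_neg):
    """Laplace-smoothed likelihood ratio for a single clause."""
    return ((pos_covered + 1) / (n_pos + 2)) / ((neg_covered + 1) / (n_neg + 2))

def classify(example, concept_descriptions):
    """concept_descriptions: {class: [(covers, ratio), ...]} where `covers` is a
    predicate over examples. Classes compete via their best-matching clause."""
    best_class, best_ratio = None, 0.0
    for cls, clauses in concept_descriptions.items():
        for covers, ratio in clauses:
            if covers(example) and ratio > best_ratio:
                best_class, best_ratio = cls, ratio
    return best_class
```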

Journal ArticleDOI
TL;DR: It is demonstrated with a system that learns to classify mirror symmetric pixel patterns from single examples with a different learning style where significant relations in the input pattern are recognized and expressed by the unsupervised self-organization of dynamic links.
Abstract: A large attraction of neural systems lies in their promise of replacing programming by learning. A problem with many current neural models is that with realistically large input patterns learning time explodes. This is a problem inherent in a notion of learning that is based almost entirely on statistical estimation. We propose here a different learning style where significant relations in the input pattern are recognized and expressed by the unsupervised self-organization of dynamic links. The power of this mechanism is due to the very general a priori principle of conservation of topological structure. We demonstrate this style with a system that learns to classify mirror-symmetric pixel patterns from single examples.

Journal ArticleDOI
TL;DR: Experimental results are presented to show that oriented dynamic learning is far more efficient than dynamic learning in SOCRATES.
Abstract: An efficient technique for dynamic learning called oriented dynamic learning is proposed. Instead of learning being performed for almost all signals in the circuit, it is shown that it is possible to determine a subset of these signals to which all learning operations can be restricted. It is further shown that learning for this set of signals provides the same knowledge about the nonsolution areas in the decision trees as the dynamic learning of SOCRATES. High efficiency is achieved by limiting learning to certain learning lines that lie within a certain area of the circuit, called the active area. Experimental results are presented to show that oriented dynamic learning is far more efficient than dynamic learning in SOCRATES.

Journal ArticleDOI
TL;DR: Two new models that handle surfaces with discontinuities are proposed that develop a mixture of expert interpolators and specialized, asymmetric interpolators that do not cross the discontinUities.
Abstract: We have previously described an unsupervised learning procedure that discovers spatially coherent properties of the world by maximizing the information that parameters extracted from different parts of the sensory input convey about some common underlying cause. When given random dot stereograms of curved surfaces, this procedure learns to extract surface depth because that is the property that is coherent across space. It also learns how to interpolate the depth at one location from the depths at nearby locations (Becker and Hinton 1992b). In this paper, we propose two new models that handle surfaces with discontinuities. The first model attempts to detect cases of discontinuities and reject them. The second model develops a mixture of expert interpolators. It learns to detect the locations of discontinuities and to invoke specialized, asymmetric interpolators that do not cross the discontinuities.

Proceedings Article
28 Aug 1993
TL;DR: The learning system SMART+ is described, which embeds sophisticated knowledge-based heuristics to control the search process and is able to deal with numerical features.
Abstract: Inducing concept descriptions in First Order Logic is inherently a complex task. There are two main reasons: on the one hand, the task is usually formulated as a search problem inside a very large space of logical descriptions which needs strong heuristics to be kept to a manageable size. On the other hand, most developed algorithms are unable to handle numerical features, typically occurring in real-world data. In this paper, we describe the learning system SMART+, which embeds sophisticated knowledge-based heuristics to control the search process and is able to deal with numerical features. SMART+ can use different learning strategies, such as inductive, deductive and abductive ones, and exploits both background knowledge and statistical evaluation criteria. Furthermore, it can use simple Genetic Algorithms to refine predicate semantics, and this aspect will be described in detail. Finally, an evaluation of SMART+ performance is made on a complex task.

Book ChapterDOI
13 Sep 1993
TL;DR: This work denominates the method projective mapping, which is the most common method in feed forward neural networks, where an input vector is projected on a “weight vector”.
Abstract: A response generating system can be seen as a mapping from a set of external states (inputs) to a set of actions (outputs). This mapping can be done in principally different ways. One method is to divide the state space into a set of discrete states and store the optimal response for each state. This is denominated a memory mapping system. Another method is to approximate continuous functions from the input space to the output space. I denominate this method projective mapping, although the function does not have to be linear. The latter method is the most common one in feed forward neural networks, where an input vector is projected on a “weight vector”.
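
The two mapping styles contrast nicely in code. A hedged sketch: the memory mapping discretizes the state space into a grid and looks up a stored response per cell, while the projective mapping computes a continuous function of the state, here the simplest case of projecting onto a weight vector. Grid shape, bounds, and weights are placeholders.

```python
import numpy as np

def memory_map_response(state, table, lo, hi):
    """Memory mapping: `table` stores one response per discrete cell of the
    state space, bounded per dimension by `lo` and `hi`."""
    bins = np.array(table.shape)
    cell = ((state - lo) / (hi - lo) * bins).astype(int)
    cell = np.clip(cell, 0, bins - 1)
    return table[tuple(cell)]

def projective_response(state, weights, bias=0.0):
    """Projective mapping: a continuous map from states to a response, here a
    projection of the state onto a weight vector."""
    return float(state @ weights + bias)
```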

Proceedings Article
28 Aug 1993
TL;DR: A learning method that combines explanation-based learning from a previously learned approximate domain theory, together with inductive learning from observations, based on a neural network representation of domain knowledge that is robust to errors in the domain theory.
Abstract: Many researchers have noted the importance of combining inductive and analytical learning, yet we still lack combined learning methods that are effective in practice. We present here a learning method that combines explanation-based learning from a previously learned approximate domain theory, together with inductive learning from observations. This method, called explanation-based neural network learning (EBNN), is based on a neural network representation of domain knowledge. Explanations are constructed by chaining together inferences from multiple neural networks. In contrast with symbolic approaches to explanation-based learning which extract weakest preconditions from the explanation, EBNN extracts the derivatives of the target concept with respect to the training example features. These derivatives summarize the dependencies within the explanation, and are used to bias the inductive learning of the target concept. Experimental results on a simulated robot control task show that EBNN requires significantly fewer training examples than standard inductive learning. Furthermore, the method is shown to be robust to errors in the domain theory, operating effectively over a broad spectrum from very strong to very weak domain theories.
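
The core mechanism can be miniaturized as follows: take a previously learned domain-theory predictor (here an arbitrary callable, with its derivatives extracted by finite differences rather than by chaining network gradients), and fit a model that matches both the observed target values and those extracted slopes. The linear model and the closed-form combined objective below are stand-ins for EBNN's neural-network learner; `mu` trades off value fit against slope fit, and all names are illustrative.

```python
import numpy as np

def finite_difference_slopes(domain_theory, X, eps=1e-4):
    """Derivatives of the (assumed, previously learned) domain-theory prediction
    with respect to each input feature, by central finite differences.
    `domain_theory` maps an (n, d) array to an (n,) array of predictions."""
    n, d = X.shape
    slopes = np.zeros((n, d))
    for j in range(d):
        dx = np.zeros(d)
        dx[j] = eps
        slopes[:, j] = (domain_theory(X + dx) - domain_theory(X - dx)) / (2 * eps)
    return slopes

def fit_with_slopes(X, y, slopes, mu=1.0):
    """Fit a linear model that matches both the observed values and the extracted
    slopes: minimize ||Xw - y||^2 + mu * sum_i ||w - s_i||^2 in closed form."""
    n, d = X.shape
    A = X.T @ X + mu * n * np.eye(d)
    b = X.T @ y + mu * slopes.sum(axis=0)
    return np.linalg.solve(A, b)
```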

Journal ArticleDOI
10 Mar 1993-EPL
TL;DR: It is found that the optimally-trained spherical perceptron may learn a linearly-separable rule as well as any possible network, and simulation results support these conclusions.
Abstract: We introduce optimal learning with a neural network, which we define as minimising the expected generalisation error. We find that the optimally-trained spherical perceptron may learn a linearly-separable rule as well as any possible network. We sketch an algorithm to generate optimal learning, and simulation results support our conclusions. Optimal learning of a well-known, significant unlearnable problem, the mismatched weight problem, gives better asymptotic learning than conventional techniques, and may be simulated enormously more easily. Unlike many other learning schemes, optimal learning extends to more general networks learning more complex rules.

Journal ArticleDOI
TL;DR: An unsupervised segmentation strategy for textured images is proposed, based on a hierarchical model in terms of discrete Markov Random Fields: the textures are modeled as Gaussian Gibbs Fields, while the image partition is modeled as a Markov Mesh Random Field.

02 Jan 1993
TL;DR: This thesis proposes a class of information-theoretic learning algorithms which cause a network to become tuned to spatially coherent features of visual images, and shows that this method works well for learning depth from random dot stereograms of curved surfaces.
Abstract: In the unsupervised learning paradigm, a network of neuron-like units is presented with an ensemble of input patterns from a structured environment, such as the visual world, and learns to represent the regularities in that input. The major goal in developing unsupervised learning algorithms is to find objective functions that characterize the quality of the network's representation without explicitly specifying the desired outputs of any of the units. Previous approaches in unsupervised learning, such as clustering, principal components analysis, and information-transmission-based methods, make minimal assumptions about the kind of structure in the environment, and they are good for preprocessing raw signal input. These methods try to model all of the structure in the environment in a single processing stage. The approach taken in this thesis is novel, in that our unsupervised learning algorithms do not try to preserve all of the information in the signal. Rather, we start by making strongly constraining assumptions about the kind of structure of interest in the environment. We then proceed to design learning algorithms which will discover precisely that structure. By constraining what kind of structure will be extracted by the network, we can force the network to discover higher level, more abstract features. Additionally, the constraining assumptions we make can provide a way of decomposing difficult learning problems into multiple simpler feature extraction stages. We propose a class of information-theoretic learning algorithms which cause a network to become tuned to spatially coherent features of visual images. Under Gaussian assumptions about the spatially coherent features in the environment, we have shown that this method works well for learning depth from random dot stereograms of curved surfaces. Using mixture models of coherence, these algorithms can be extended to deal with discontinuities, and to form multiple models of the regularities in the environment. Our simulations demonstrate the general utility of the Imax algorithms in discovering interesting, non-trivial structure (disparity and depth discontinuities) in artificial stereo images. This is the first attempt we know of to model perceptual learning beyond the earliest stages of low-level feature extraction, and to model multiple stages of unsupervised learning.