
Showing papers in "Neural Computation in 1991"


Journal ArticleDOI
TL;DR: A new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases; the procedure is demonstrated to divide a vowel discrimination task into subtasks, each of which can be solved by a very simple expert network.
Abstract: We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.

4,338 citations
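
A minimal sketch of the modular idea described above, assuming linear experts and a softmax gating network; the shapes, names, and gating form are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, n_in, n_out = 4, 8, 3
W_experts = rng.normal(size=(n_experts, n_out, n_in))   # one linear map per expert (hypothetical)
W_gate = rng.normal(size=(n_experts, n_in))             # gating network weights (hypothetical)

def forward(x):
    """Blend expert outputs with gating probabilities for one input vector x."""
    logits = W_gate @ x
    g = np.exp(logits - logits.max())
    g /= g.sum()                                        # softmax gate: experts compete for the case
    expert_out = np.einsum('koi,i->ko', W_experts, x)   # each expert's output
    return g @ expert_out, g                            # blended output and expert responsibilities

y, g = forward(rng.normal(size=n_in))
print(y, g)
```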


Journal ArticleDOI
TL;DR: It is proved that RBF networks having one hidden layer are capable of universal approximation, and that a certain class of RBF networks with the same smoothing factor in each kernel node is broad enough for universal approximation.
Abstract: There have been several recent studies concerning feedforward networks and the problem of approximating arbitrary functionals of a finite number of real variables. Some of these studies deal with cases in which the hidden-layer nonlinearity is not a sigmoid. This was motivated by successful applications of feedforward networks with nonsigmoidal hidden-layer units. This paper reports on a related study of radial-basis-function (RBF) networks, and it is proved that RBF networks having one hidden layer are capable of universal approximation. Here the emphasis is on the case of typical RBF networks, and the results show that a certain class of RBF networks with the same smoothing factor in each kernel node is broad enough for universal approximation.

3,755 citations
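
A small sketch of the network class referred to above: Gaussian kernel nodes sharing a single smoothing factor and a linear output layer fitted by least squares. The centers, the smoothing factor, and the target function are arbitrary choices for illustration.

```python
import numpy as np

x = np.linspace(-3, 3, 200)[:, None]              # 1-D inputs
target = np.sin(2 * x).ravel()                    # an arbitrary continuous function to approximate

centers = np.linspace(-3, 3, 25)[None, :]         # kernel nodes
sigma = 0.4                                       # one smoothing factor shared by every node
Phi = np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))

w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # linear output layer, least-squares fit
print("max abs error:", np.abs(Phi @ w - target).max())
```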


Journal ArticleDOI
TL;DR: A network that allocates a new computational unit whenever an unusual pattern is presented to it; the network learns much faster than backpropagation networks and uses a comparable number of synapses.
Abstract: We have created a network that allocates a new computational unit whenever an unusual pattern is presented to the network. This network forms compact representations, yet learns easily and rapidly. The network can be used at any time in the learning process and the learning patterns do not have to be repeated. The units in this network respond to only a local region of the space of input values. The network learns by allocating new units and adjusting the parameters of existing units. If the network performs poorly on a presented pattern, then a new unit is allocated that corrects the response to the presented pattern. If the network performs well on a presented pattern, then the network parameters are updated using standard LMS gradient descent. We have obtained good results with our resource-allocating network (RAN). For predicting the Mackey-Glass chaotic time series, RAN learns much faster than do those using backpropagation networks and uses a comparable number of synapses.

1,403 citations
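
A rough sketch of the allocation-versus-LMS decision described above, assuming Gaussian units; the thresholds, width rule, and constants are placeholders rather than the paper's exact schedule.

```python
import numpy as np

centers, widths, heights = [], [], []
dist_thresh, err_thresh, lr = 0.5, 0.1, 0.05     # placeholder thresholds and learning rate

def predict(x):
    return sum(h * np.exp(-np.sum((x - c) ** 2) / w ** 2)
               for c, w, h in zip(centers, widths, heights))

def observe(x, y):
    err = y - predict(x)
    nearest = min((np.linalg.norm(x - c) for c in centers), default=np.inf)
    if abs(err) > err_thresh and nearest > dist_thresh:
        # unusual pattern: allocate a unit that corrects the response at x
        width = 0.8 * nearest if np.isfinite(nearest) else dist_thresh
        centers.append(x.copy()); widths.append(width); heights.append(err)
    else:
        # familiar pattern: LMS-style adjustment of the existing unit heights
        for i, (c, w) in enumerate(zip(centers, widths)):
            heights[i] += lr * err * np.exp(-np.sum((x - c) ** 2) / w ** 2)

rng = np.random.default_rng(1)
for _ in range(200):
    x = rng.uniform(-1, 1, size=1)
    observe(x, np.sin(3 * x[0]))
print("units allocated:", len(centers))
```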


Journal ArticleDOI
TL;DR: Results of Monte Carlo simulations performed using multilayer perceptron (MLP) networks trained with backpropagation, radial basis function (RBF) networks, and high-order polynomial networks graphically demonstrate that network outputs provide good estimates of Bayesian probabilities.
Abstract: Many neural network classifiers provide outputs which estimate Bayesian a posteriori probabilities. When the estimation is accurate, network outputs can be treated as probabilities and sum to one. Simple proofs show that Bayesian probabilities are estimated when desired network outputs are 1 of M (one output unity, all others zero) and a squared-error or cross-entropy cost function is used. Results of Monte Carlo simulations performed using multilayer perceptron (MLP) networks trained with backpropagation, radial basis function (RBF) networks, and high-order polynomial networks graphically demonstrate that network outputs provide good estimates of Bayesian probabilities. Estimation accuracy depends on network complexity, the amount of training data, and the degree to which training data reflect true likelihood distributions and a priori class probabilities. Interpretation of network outputs as Bayesian probabilities allows outputs from multiple networks to be combined for higher level decision making, sim...

1,140 citations
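
For the squared-error case mentioned above, a one-line decomposition (a standard identity restated here, not the paper's full proof) shows why the minimizing outputs are the posterior class probabilities when targets are 1 of M:

```latex
% With binary targets t_k \in \{0,1\} indicating class C_k,
\[
  E\!\left[(y_k(\mathbf{x}) - t_k)^2 \mid \mathbf{x}\right]
  = \bigl(y_k(\mathbf{x}) - E[t_k \mid \mathbf{x}]\bigr)^2 + \mathrm{Var}(t_k \mid \mathbf{x}),
  \qquad
  E[t_k \mid \mathbf{x}] = P(C_k \mid \mathbf{x}),
\]
% so the expected cost is minimized by y_k(x) = P(C_k | x); the variance term
% does not depend on the network.
```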


Journal ArticleDOI
TL;DR: The training techniques that allow ALVINN to learn in under 5 minutes to autonomously control the Navlab by watching the reactions of a human driver are described.
Abstract: The ALVINN (Autonomous Land Vehicle In a Neural Network) project addresses the problem of training artificial neural networks in real time to perform difficult perception tasks. ALVINN is a backpropagation network designed to drive the CMU Navlab, a modified Chevy van. This paper describes the training techniques that allow ALVINN to learn in under 5 minutes to autonomously control the Navlab by watching the reactions of a human driver. Using these techniques, ALVINN has been trained to drive in a variety of circumstances including single-lane paved and unpaved roads, and multilane lined and unlined roads, at speeds of up to 20 miles per hour.

746 citations


Journal ArticleDOI
TL;DR: In this paper, a local learning rule is proposed that allows a network to learn to generalize across such transformations: the network is exposed to temporal sequences of patterns undergoing the transformation, and in the application presented it learns invariance to shift in retinal position.
Abstract: The visual system can reliably identify objects even when the retinal image is transformed considerably by commonly occurring changes in the environment. A local learning rule is proposed, which allows a network to learn to generalize across such transformations. During the learning phase, the network is exposed to temporal sequences of patterns undergoing the transformation. An application of the algorithm is presented in which the network learns invariance to shift in retinal position. Such a principle may be involved in the development of the characteristic shift invariance property of complex cells in the primary visual cortex, and also in the development of more complicated invariance properties of neurons in higher visual areas.

664 citations
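
One common reading of such a rule is a trace-style Hebbian update in which the postsynaptic term is a decaying average over the sequence; the sketch below is illustrative under that assumption, and the competitive activation, constants, and shift sequence are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_units = 16, 4
W = rng.uniform(size=(n_units, n_in))
W /= W.sum(axis=1, keepdims=True)

alpha, delta = 0.02, 0.2                         # learning rate and trace decay (placeholders)
trace = np.zeros(n_units)

def shift_sequence(length=8):
    """A pattern undergoing the transformation: a bump shifting across the input."""
    bump = np.zeros(n_in)
    bump[:4] = 1.0
    start = rng.integers(0, n_in)
    return [np.roll(bump, start + t) for t in range(length)]

for _ in range(500):
    trace[:] = 0.0                               # reset the trace between sequences
    for x in shift_sequence():
        y = np.zeros(n_units)
        y[np.argmax(W @ x)] = 1.0                # simple competitive activation
        trace = (1 - delta) * trace + delta * y  # decaying trace of recent output
        W += alpha * trace[:, None] * (x[None, :] - W)
```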


Journal ArticleDOI
TL;DR: It is shown that a modification to the error functional allows smoothing to be introduced explicitly without significantly affecting the speed of training.
Abstract: An important feature of radial basis function neural networks is the existence of a fast, linear learning algorithm in a network capable of representing complex nonlinear mappings. Satisfactory generalization in these networks requires that the network mapping be sufficiently smooth. We show that a modification to the error functional allows smoothing to be introduced explicitly without significantly affecting the speed of training. A simple example is used to demonstrate the resulting improvement in the generalization properties of the network.

325 citations
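
Because the RBF output layer is linear, an explicit smoothing (regularization) term added to the error functional still admits a fast closed-form solve for the output weights. A minimal sketch, with a simple weight penalty standing in for the paper's specific smoothing functional:

```python
import numpy as np

def fit_rbf_outputs(Phi, targets, lam=1e-2):
    """Closed-form solve of min ||Phi w - targets||^2 + lam ||w||^2,
    where Phi holds the basis-function activations for all training inputs."""
    n = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n), Phi.T @ targets)
```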


Journal ArticleDOI
TL;DR: A delayed nonlinear oscillator is introduced to investigate temporal coding in neuronal networks, and synchronization is shown within two-dimensional layers consisting of oscillatory elements coupled by excitatory delay connections.
Abstract: Current concepts in neurobiology of vision assume that local object features are represented by distributed neuronal populations in the brain. Such representations can lead to ambiguities if several distinct objects are simultaneously present in the visual field. Temporal characteristics of the neuronal activity have been proposed as a possible solution to this problem and have been found in various cortical areas. In this paper we introduce a delayed nonlinear oscillator to investigate temporal coding in neuronal networks. We show synchronization within two-dimensional layers consisting of oscillatory elements coupled by excitatory delay connections. The observed correlation length is large compared to coupling length. Following the experimental situation, we then demonstrate the response of such layers to two short stimulus bars of varying gap distance. Coherency of stimuli is reflected by the temporal correlation of the responses, which closely resembles the experimental observations.

313 citations


Journal ArticleDOI
TL;DR: A learning rule minimizing a mean square error criterion is derived, and it is shown that the local-recurrent global-feedforward model performs better than the local-feedforward global-feedforward model.
Abstract: A new neural network architecture involving either local-feedforward global-feedforward and/or local-recurrent global-feedforward structure is proposed. A learning rule minimizing a mean square error criterion is derived. The performance of this algorithm (local-recurrent global-feedforward architecture) is compared with a local-feedforward global-feedforward architecture. It is shown that the local-recurrent global-feedforward model performs better than the local-feedforward global-feedforward model.

227 citations
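
One way to picture the two structures being compared, under the usual synapse-as-filter reading (the coefficients and exact filter forms below are illustrative assumptions): a local-feedforward synapse filters its input with finite memory, while a local-recurrent synapse also feeds back its own past output, inside an otherwise feedforward network.

```python
import numpy as np

def local_feedforward_synapse(x, b=(0.5, 0.3, 0.2)):
    """FIR-style synapse: output depends on a finite window of past inputs."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        y[t] = sum(bk * x[t - k] for k, bk in enumerate(b) if t - k >= 0)
    return y

def local_recurrent_synapse(x, b=0.5, a=0.5):
    """IIR-style synapse: output also depends on its own previous value."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        y[t] = b * x[t] + a * (y[t - 1] if t > 0 else 0.0)
    return y

x = np.random.default_rng(0).normal(size=20)
print(local_feedforward_synapse(x)[:3], local_recurrent_synapse(x)[:3])
```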


Journal ArticleDOI
TL;DR: These results on a large, high input dimensional problem demonstrate that practical constraints including training time, memory usage, and classification time often constrain classifier selection more strongly than small differences in overall error rate.
Abstract: Results of recent research suggest that carefully designed multilayer neural networks with local receptive fields and shared weights may be unique in providing low error rates on handwritten digit recognition tasks. This study, however, demonstrates that these networks, radial basis function (RBF) networks, and k nearest-neighbor (kNN) classifiers, all provide similar low error rates on a large handwritten digit database. The backpropagation network is overall superior in memory usage and classification time but can provide false positive classifications when the input is not a digit. The backpropagation network also has the longest training time. The RBF classifier requires more memory and more classification time, but less training time. When high accuracy is warranted, the RBF classifier can generate a more effective confidence judgment for rejecting ambiguous inputs. The simple kNN classifier can also perform handwritten digit recognition, but requires a prohibitively large amount of memory and is much slower at classification. Nevertheless, the simplicity of the algorithm and fast training characteristics make the kNN classifier an attractive candidate in hardware-assisted classification tasks. These results on a large, high input dimensional problem demonstrate that practical constraints including training time, memory usage, and classification time often constrain classifier selection more strongly than small differences in overall error rate.

187 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that Kolmogorov's theorem on representations of continuous functions of n-variables by sums and superpositions of continuous function of one variable is relevant in the context of neural networks.
Abstract: We show that Kolmogorov's theorem on representations of continuous functions of n-variables by sums and superpositions of continuous functions of one variable is relevant in the context of neural networks. We give a version of this theorem with all of the one-variable functions approximated arbitrarily well by linear combinations of compositions of affine functions with some given sigmoidal function. We derive an upper estimate of the number of hidden units.
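
For reference, one standard statement of the classical theorem invoked above (background only, not the paper's refined version):

```latex
% Kolmogorov superposition theorem: every continuous f on [0,1]^n can be written
% exactly as a superposition of continuous one-variable functions,
\[
  f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{pq}(x_p) \right),
\]
% where the inner functions \psi_{pq} do not depend on f and the outer functions
% \Phi_q do. The version discussed above replaces each one-variable function by an
% arbitrarily good approximation of the form \sum_i c_i \, \sigma(a_i t + b_i)
% for a given sigmoidal \sigma.
```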

Journal ArticleDOI
TL;DR: The learning-based model described here demonstrates that a mechanism using only the dynamic activity in recurrent networks is sufficient to account for the observed phenomena.
Abstract: Two decades of single unit recording in monkeys performing short-term memory tasks has established that information can be stored as sustained neural activity. The mechanism of this information storage is unknown. The learning-based model described here demonstrates that a mechanism using only the dynamic activity in recurrent networks is sufficient to account for the observed phenomena. The temporal activity patterns of neurons in the model match those of real memory-associated neurons, while the model's gating properties and attractor dynamics provide explanations for puzzling aspects of the experimental data.

Journal ArticleDOI
Zhi-Quan Luo1
TL;DR: It is shown that, by dynamically decreasing the learning rate during each training cycle, the sequence of matrices generated by the LMS algorithm will converge to the optimal weight matrix.
Abstract: We consider the problem of training a linear feedforward neural network by using a gradient descent-like LMS learning algorithm. The objective is to find a weight matrix for the network, by repeatedly presenting to it a finite set of examples, so that the sum of the squares of the errors is minimized. Kohonen showed that with a small but fixed learning rate (or stepsize) some subsequences of the weight matrices generated by the algorithm will converge to certain matrices close to the optimal weight matrix. In this paper, we show that, by dynamically decreasing the learning rate during each training cycle, the sequence of matrices generated by the algorithm will converge to the optimal weight matrix. We also show that for any given ∊ > 0 the LMS algorithm, with decreasing learning rates, will generate an ∊-optimal weight matrix (i.e., a matrix of distance at most ∊ away from the optimal matrix) after O(1/∊) training cycles. This is in contrast to Ω(1/∊log 1/∊) training cycles needed to generate an ∊-optima...
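
A minimal sketch of the scheme analyzed above: LMS passes over a finite example set with a stepsize that decreases across training cycles. The 1/cycle schedule and all constants are illustrative choices consistent with "dynamically decreasing," not necessarily the paper's exact schedule.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 10))                    # a finite set of examples
W_true = rng.normal(size=(10, 3))
Y = X @ W_true                                   # consistent targets, so the optimum is W_true

W = np.zeros((10, 3))
for cycle in range(1, 201):
    eta = 0.05 / cycle                           # decreasing learning rate
    for x, y in zip(X, Y):                       # one training cycle = one pass over the examples
        err = x @ W - y
        W -= eta * np.outer(x, err)              # LMS step (gradient of the squared error)
print("distance to optimum:", np.linalg.norm(W - W_true))
```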

Journal ArticleDOI
TL;DR: Results suggest that a large and representative training sample may be the single, most important factor in achieving high recognition accuracy in hand-printed character recognition systems, and benefits of reducing the number of net connections are discussed.
Abstract: We report on results of training backpropagation nets with samples of hand-printed digits scanned off of bank checks and hand-printed letters interactively entered into a computer through a stylus digitizer. Generalization results are reported as a function of training set size and network capacity. Given a large training set, and a net with sufficient capacity to achieve high performance on the training set, nets typically achieved error rates of 4-5% at a 0% reject rate and 1-2% at a 10% reject rate. The topology and capacity of the system, as measured by the number of connections in the net, have surprisingly little effect on generalization. For those developing hand-printed character recognition systems, these results suggest that a large and representative training sample may be the single, most important factor in achieving high recognition accuracy. Benefits of reducing the number of net connections, other than improving generalization, are discussed.

Journal ArticleDOI
TL;DR: It is postulated that an interesting behavior displayed by gaussian bar functions under gradient descent dynamics, which is called automatic connection pruning, is an important factor in the success of this representation.
Abstract: In investigating gaussian radial basis function (RBF) networks for their ability to model nonlinear time series, we have found that while RBF networks are much faster than standard sigmoid unit backpropagation for low-dimensional problems, their advantages diminish in high-dimensional input spaces. This is particularly troublesome if the input space contains irrelevant variables. We suggest that this limitation is due to the localized nature of RBFs. To gain the advantages of the highly nonlocal sigmoids and the speed advantages of RBFs, we propose a particular class of semilocal activation functions that is a natural interpolation between these two families. We present evidence that networks using these gaussian bar units avoid the slow learning problem of sigmoid unit networks, and, very importantly, are more accurate than RBF networks in the presence of irrelevant inputs. On the Mackey-Glass and Coupled Lattice Map problems, the speedup over sigmoid networks is so dramatic that the difference in training time between RBF and gaussian bar networks is minor. Gaussian bar architectures that superpose composed gaussians (gaussians-of-gaussians) to approximate the unknown function have the best performance. We postulate that an interesting behavior displayed by gaussian bar functions under gradient descent dynamics, which we call automatic connection pruning, is an important factor in the success of this representation.
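
A small sketch of the contrast drawn above, with made-up parameters: a standard Gaussian RBF unit multiplies per-dimension factors, so one distant (possibly irrelevant) dimension silences it, while a gaussian bar unit sums weighted per-dimension Gaussians, can still respond, and can effectively drop a dimension by driving its weight to zero.

```python
import numpy as np

def rbf_unit(x, mu, sigma):
    """Standard RBF: product of 1-D Gaussians, local in every dimension."""
    return np.exp(-np.sum((x - mu) ** 2 / (2 * sigma ** 2)))

def gaussian_bar_unit(x, mu, sigma, w):
    """Gaussian bar: weighted sum of 1-D Gaussians, semilocal."""
    return np.sum(w * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)))

x = np.array([0.1, -0.3, 5.0])                   # third dimension is far from the center
mu, sigma, w = np.zeros(3), np.ones(3), np.ones(3)
print(rbf_unit(x, mu, sigma))                    # nearly zero: one distant dimension kills the product
print(gaussian_bar_unit(x, mu, sigma, w))        # still responds through the first two dimensions
```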

Journal ArticleDOI
TL;DR: The network conjoins attributes of different objects, thus showing the phenomenon of illusory conjunctions, as in human vision, within the framework of a model of excitatory and inhibitory cell assemblies that form an oscillating neural network.
Abstract: We investigate binding within the framework of a model of excitatory and inhibitory cell assemblies that form an oscillating neural network. Our model is composed of two such networks that are connected through their inhibitory neurons. The excitatory cell assemblies represent memory patterns. The latter have different meanings in the two networks, representing two different attributes of an object, such as shape and color. The networks segment an input that contains mixtures of such pairs into staggered oscillations of the relevant activities. Moreover, the phases of the oscillating activities representing the two attributes in each pair lock with each other to demonstrate binding. The system works very well for two inputs, but displays faulty correlations when the number of objects is larger than two. In other words, the network conjoins attributes of different objects, thus showing the phenomenon of illusory conjunctions, as in human vision.

Journal ArticleDOI
TL;DR: The results suggest that the synchronized bursting observed between cortical neurons responding to coherent visual stimuli is a simple consequence of the principles of intracortical connectivity.
Abstract: We have used the morphology derived from single horseradish peroxidase-labeled neurons, known membrane conductance properties and microanatomy to construct a model neocortical network that exhibits synchronized bursting. The network was composed of interconnected pyramidal (excitatory) neurons with different intrinsic burst frequencies, and smooth (inhibitory) neurons that provided global feedback inhibition to all of the pyramids. When the network was activated by geniculocortical afferents the burst discharges of the pyramids quickly became synchronized with zero average phase-shift. The synchronization was strongly dependent on global feedback inhibition, which acted to group the coactivated bursts generated by intracortical reexcitation. Our results suggest that the synchronized bursting observed between cortical neurons responding to coherent visual stimuli is a simple consequence of the principles of intracortical connectivity.

Journal ArticleDOI
TL;DR: A feedback neural network whose elements possess dynamic thresholds has an oscillatory mode that is investigated by measuring the activities of memory patterns as functions of time and exhibits pattern segmentation, by oscillating between different memories that are included as a mixture in a constant input.
Abstract: We describe a feedback neural network whose elements possess dynamic thresholds. This network has an oscillatory mode that we investigate by measuring the activities of memory patterns as functions of time. We observe spontaneous and induced transitions between the different oscillating memories. Moreover, the network exhibits pattern segmentation, by oscillating between different memories that are included as a mixture in a constant input. The efficiency of pattern segmentation decreases strongly as the number of the input memories is increased. Using oscillatory inputs we observe resonance behavior.

Journal ArticleDOI
TL;DR: A local synaptic learning rule is described that can be used to remove the effects of certain types of systematic temporal variation in the inputs to a unit and will generate center-surround receptive fields that remove temporally varying linear gradients from the inputs.
Abstract: I describe a local synaptic learning rule that can be used to remove the effects of certain types of systematic temporal variation in the inputs to a unit. According to this rule, changes in synaptic weight result from a conjunction of short-term temporal changes in the inputs and the output. Formally, this is like the differential rule proposed by Klopf (1986) and Kosko (1986), except for a change of sign, which gives it an anti-Hebbian character. By itself this rule is insufficient. A weight conservation condition is needed to prevent the weights from collapsing to zero, and some further constraint, implemented here by a biasing term, to select particular sets of weights from the subspace of those which give minimal variation. As an example, I show that this rule will generate center-surround receptive fields that remove temporally varying linear gradients from the inputs.
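
A hedged sketch of the update's form: the weight change is the negative product of short-term temporal changes in input and output, followed by a weight-conservation step. The input sequence, constants, and normalization below are placeholders, and the sketch does not attempt to reproduce the center-surround result.

```python
import numpy as np

rng = np.random.default_rng(4)
n_in = 10
w = np.full(n_in, 1.0 / n_in)
alpha = 0.05

base = rng.normal(size=n_in)                     # fixed underlying pattern
grad = np.linspace(-1.0, 1.0, n_in)              # a linear gradient across the inputs
prev_x = base.copy()
prev_y = w @ prev_x
for t in range(1, 1001):
    x = base + np.sin(0.1 * t) * grad            # temporally varying linear gradient
    y = w @ x
    w -= alpha * (x - prev_x) * (y - prev_y)     # anti-Hebbian in the short-term temporal changes
    w += (1.0 - w.sum()) / n_in                  # weight conservation: keep the summed weight fixed
    prev_x, prev_y = x, y
print(w.round(3))
```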

Journal ArticleDOI
TL;DR: High-order models that use sigma-pi units are shown to be equivalent to the standard quadratic models with additional hidden units, and an algorithm to convert high-order networks to low-order ones is used to implement a satisfiability problem-solver on a connectionist network.
Abstract: Connectionist networks with symmetric weights (like Hopfield networks and Boltzmann Machines) use gradient descent to find a minimum for quadratic energy functions. We show equivalence between the problem of satisfiability in propositional calculus and the problem of minimizing those energy functions. The equivalence is in the sense that for any satisfiable well-formed formula (WFF) we can find a quadratic function that describes it, such that the set of solutions that minimizes the function is equal to the set of truth assignments that satisfy the WFF. We also show that in the same sense every quadratic energy function describes some satisfiable WFF. Algorithms are given to transform any propositional WFF into an energy function that describes it and vice versa. High-order models that use sigma-pi units are shown to be equivalent to the standard quadratic models with additional hidden units. An algorithm to convert high-order networks to low-order ones is used to implement a satisfiability problem-solver on a connectionist network. The results give better understanding of the role of hidden units and of the limitations and capabilities of symmetric connectionist models. The techniques developed for the satisfiability problem may be applied to a wide range of other problems, such as associative memories, finding maximal consistent subsets, automatic deduction, and even nonmonotonic reasoning.
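
As a small illustration of the correspondence (a textbook-style example, not the paper's general construction), the energy for a two-literal clause can be written down directly:

```latex
% For the clause (x \lor y) with truth values x, y \in \{0, 1\}, define
\[
  E(x, y) \;=\; (1 - x)(1 - y) \;=\; 1 - x - y + xy .
\]
% E is quadratic, equals 0 exactly on the satisfying assignments, and equals 1 on
% the single falsifying assignment, so minimizing E recovers the satisfying truth
% assignments. A conjunction of clauses sums such terms; clauses with three or more
% literals produce higher-order (sigma-pi) terms, which the paper shows can be
% reduced to quadratic form by adding hidden units.
```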

Journal ArticleDOI
TL;DR: This work describes a biologically plausible solution to the variable binding problem and outlines how a knowledge representation and reasoning system can use this solution to perform a class of predictive inferences with extreme efficiency.
Abstract: A fundamental problem that must be addressed by connectionism is that of creating and representing dynamic structures (Feldman 1982; von der Malsburg 1985). In the context of reasoning with systematic and abstract knowledge, this problem takes the form of the variable binding problem. We describe a biologically plausible solution to this problem and outline how a knowledge representation and reasoning system can use this solution to perform a class of predictive inferences with extreme efficiency. The proposed system solves the variable binding problem by propagating rhythmic patterns of activity wherein dynamic bindings are represented as the synchronous firing of appropriate nodes.

Journal ArticleDOI
TL;DR: The main point of the elastic net algorithm is seen to be in the way one deals with the constraints when evaluating the effective cost function (free energy in the thermodynamic analogy), and not in its geometric foundation emphasized originally by Durbin and Willshaw.
Abstract: Some time ago Durbin and Willshaw proposed an interesting parallel algorithm (the “elastic net”) for approximately solving some geometric optimization problems, such as the Traveling Salesman Problem. Recently it has been shown that their algorithm is related to neural networks of Hopfield and Tank, and that they both can be understood as the semiclassical approximation to statistical mechanics of related physical models. The main point of the elastic net algorithm is seen to be in the way one deals with the constraints when evaluating the effective cost function (free energy in the thermodynamic analogy), and not in its geometric foundation emphasized originally by Durbin and Willshaw. As a consequence, the elastic net algorithm is a special case of the more general physically based computations and can be generalized to a large class of nongeometric problems. In this paper we further elaborate on this observation, and generalize the elastic net to the quadratic assignment problem. We work out in detail its special case, the graph matching problem, because it is an important problem with many applications in computational vision and neural modeling. Simulation results on random graphs, and on structured (hand-designed) graphs of moderate size (20-100 nodes) are discussed.

Journal ArticleDOI
TL;DR: A method is proposed for representing some context information so that the correct meaning for a word in a sentence can be selected; the development of more powerful context algorithms will be an important topic for future research.
Abstract: Representing and manipulating context information is one of the hardest problems in natural language processing. This paper proposes a method for representing some context information so that the correct meaning for a word in a sentence can be selected. The approach is primarily based on work by Waltz and Pollack (1985, 1984), who emphasized neurally plausible systems. By contrast this paper focuses on computationally feasible methods applicable to full-scale natural language processing systems. There are two key elements: a collection of context vectors defined for every word used by a natural language processing system, and a context algorithm that computes a dynamic context vector at any position in a body of text. Once the dynamic context vector has been computed it is easy to choose among competing meanings for a word. This choice of definitions is essentially a neural network computation, and neural network learning algorithms should be able to improve such choices. Although context vectors do not represent all context information, their use should improve those full-scale systems that have avoided context as being too difficult to deal with. Good candidates for full-scale context vector implementations are machine translation systems and Japanese word processors. A main goal of this paper is to encourage such large-scale implementations and tests of context vector approaches. A variety of interesting directions for research in natural language processing and machine learning will be possible once a full set of context vectors has been created. In particular the development of more powerful context algorithms will be an important topic for future research.
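
An illustrative sketch of the two key elements named above, with made-up vectors, a made-up decay factor, and a toy sense inventory; none of this is the paper's data or exact algorithm.

```python
import numpy as np

dim = 50
rng = np.random.default_rng(5)
context_vectors = {w: rng.normal(size=dim) for w in
                   ["river", "water", "flow", "money", "deposit", "bank"]}
sense_vectors = {                                # toy sense inventory (hypothetical)
    "bank/financial": context_vectors["money"] + context_vectors["deposit"],
    "bank/river": context_vectors["river"] + context_vectors["water"],
}

def choose_sense(words, decay=0.7):
    """Accumulate a dynamic context vector over the text, then pick the closest sense."""
    ctx = np.zeros(dim)
    for w in words:
        ctx = decay * ctx + context_vectors.get(w, 0.0)
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(sense_vectors, key=lambda s: cos(ctx, sense_vectors[s]))

print(choose_sense(["money", "deposit", "bank"]))   # expected to favor the financial sense here
```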

Journal ArticleDOI
TL;DR: It is shown that a form of synaptic plasticity recently discovered in slices of the rat visual cortex can support an error-correcting learning rule and that this rule performs better than the optimal Hebbian learning rule reported by Willshaw and Dayan (1990).
Abstract: We show that a form of synaptic plasticity recently discovered in slices of the rat visual cortex (Artola et al. 1990) can support an error-correcting learning rule. The rule increases weights when both pre- and postsynaptic units are highly active, and decreases them when pre-synaptic activity is high and postsynaptic activation is less than the threshold for weight increment but greater than a lower threshold. We show that this rule corrects false positive outputs in feedforward associative memory, that in an appropriate opponent-unit architecture it corrects misses, and that it performs better than the optimal Hebbian learning rule reported by Willshaw and Dayan (1990).
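
The rule summarized above, written as a weight update under illustrative thresholds and rates; this is a sketch of the stated form, not the authors' parameterization.

```python
import numpy as np

def update_weights(w, pre, post, theta_plus=0.8, theta_minus=0.3, lr=0.1):
    """pre: presynaptic activities (array); post: postsynaptic activation (scalar in [0, 1])."""
    if post >= theta_plus:
        return w + lr * pre                      # increase: pre and post both highly active
    if post > theta_minus:
        return w - lr * pre                      # decrease: pre active, post in the middle range
    return w                                     # below the lower threshold: no change
```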

Journal ArticleDOI
TL;DR: The results suggest that for a network with a single layer of hidden sigmoidal nodes, the accuracy of a functional representation is reduced as the nonlinearity of the function increases.
Abstract: A matrix method is described that optimizes the set of weights and biases for the output side of a network with a single hidden layer of neurons, given any set of weights and biases for the input side of the hidden layer. All the input patterns are included in a single optimization cycle. A simple iterative minimization procedure is used to optimize the weights and biases on the input side of the hidden layer. Many test problems have been solved, confirming the validity of the method. The results suggest that for a network with a single layer of hidden sigmoidal nodes, the accuracy of a functional representation is reduced as the nonlinearity of the function increases.
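
A minimal sketch of the matrix step described above: holding the input-side weights fixed, the hidden activations of all patterns form a matrix, and the output-side weights and biases that minimize squared error over every pattern at once come from a single linear least-squares solve. The sigmoid and shapes are illustrative.

```python
import numpy as np

def optimize_output_side(X, T, W_in, b_in):
    """X: patterns (N x d), T: targets (N x m); returns output weights and biases."""
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b_in)))     # hidden activations for all patterns at once
    H1 = np.hstack([H, np.ones((X.shape[0], 1))])    # absorb the output bias into the solve
    Wb, *_ = np.linalg.lstsq(H1, T, rcond=None)
    return Wb[:-1], Wb[-1]                           # output weights, output biases
```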

Journal ArticleDOI
TL;DR: The CL approach provides a general unified framework for developing new learning algorithms and shows that many different types of clamping and teacher signals are possible; two extensions of contrastive learning to time-dependent trajectories are also examined.
Abstract: The concept of Contrastive Learning (CL) is developed as a family of possible learning algorithms for neural networks. CL is an extension of Deterministic Boltzmann Machines to more general dynamical systems. During learning, the network oscillates between two phases. One phase has a teacher signal and one phase has no teacher signal. The weights are updated using a learning rule that corresponds to gradient descent on a contrast function that measures the discrepancy between the free network and the network with a teacher signal. The CL approach provides a general unified framework for developing new learning algorithms. It also shows that many different types of clamping and teacher signals are possible. Several examples are given and an analysis of the landscape of the contrast function is proposed with some relevant predictions for the CL curves. An approach that may be suitable for collective analog implementations is described. Simulation results and possible extensions are briefly discussed together with a new conjecture regarding the function of certain oscillations in the brain. In the appendix, we also examine two extensions of contrastive learning to time-dependent trajectories.
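
A hedged sketch of the two-phase update in the deterministic-Boltzmann special case mentioned above: relax the network once with the teacher signal clamped and once free, then move each weight toward the clamped co-activations and away from the free ones. The relaxation dynamics and constants below are generic placeholders, not the paper's general formulation.

```python
import numpy as np

def relax(W, s, clamped_idx=(), clamped_vals=(), steps=50):
    """Generic fixed-point relaxation of a symmetric network (placeholder dynamics)."""
    s = s.copy()
    for _ in range(steps):
        s = np.tanh(W @ s)
        for i, v in zip(clamped_idx, clamped_vals):
            s[i] = v                                 # hold teacher-driven units at their targets
    return s

def contrastive_step(W, s0, teacher_idx, teacher_vals, lr=0.01):
    s_teacher = relax(W, s0, teacher_idx, teacher_vals)   # phase with the teacher signal
    s_free = relax(W, s0)                                 # free phase
    dW = np.outer(s_teacher, s_teacher) - np.outer(s_free, s_free)
    return W + lr * dW                                    # descend the contrast between the phases
```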

Journal ArticleDOI
TL;DR: The pyloric network of the stomatogastric ganglion in crustacea is a central pattern generator that can produce the same basic rhythm over a wide frequency range. Physiological and modeling studies indicate that the PD neurons play an important role in regulating the duration of the bursts produced by the pacemaker unit.
Abstract: The pyloric network of the stomatogastric ganglion in crustacea is a central pattern generator that can produce the same basic rhythm over a wide frequency range. Three electrically coupled neurons, the anterior burster (AB) neuron and two pyloric dilator (PD) neurons, act as a pacemaker unit for the pyloric network. The functional characteristics of the pacemaker network are the result of electrical coupling between neurons with quite different intrinsic properties, each contributing a basic feature to the complete circuit. The AB neuron, a conditional oscillator, plays a dominant role in rhythm generation. In the work described here, we manipulate the frequency of the AB neuron both isolated and electrically coupled to the PD neurons. Physiological and modeling studies indicate that the PD neurons play an important role in regulating the duration of the bursts produced by the pacemaker unit.

Journal ArticleDOI
TL;DR: A realistic neural model is developed, based on structural features of visual cortex, which replicates observed oscillatory phenomena and suggests that the phase and frequency of cortical oscillations may reflect the coordination of general computational processes within and between cortical areas.
Abstract: Periodic variations in correlated cellular activity have been observed in many regions of the cerebral cortex. The recent discovery of stimulus-dependent, spatially-coherent oscillations in primary visual cortex of the cat has led to suggestions of neural information encoding schemes based on phase and/or frequency variation. To explore the mechanisms underlying this behavior and their possible functional consequences, we have developed a realistic neural model, based on structural features of visual cortex, which replicates observed oscillatory phenomena. In the model, this oscillatory behavior emerges directly from the structure of the cortical network and the properties of its intrinsic neurons; however, phase coherence is shown to be an average phenomenon seen only when measurements are made over multiple trials. Because average coherence does not ensure synchrony of firing over the course of single stimuli, oscillatory phase may not be a robust strategy for directly encoding stimulus-specific information. Instead, the phase and frequency of cortical oscillations may reflect the coordination of general computational processes within and between cortical areas. Under this interpretation, coherence emerges as a result of horizontal interactions that could be involved in the formation of receptive field properties.

Journal ArticleDOI
TL;DR: A new algorithm for approximating continuous functions in high-dimensional input spaces and an example that predicts future values of the Mackey-Glass differential delay equation are presented.
Abstract: I describe a new algorithm for approximating continuous functions in high-dimensional input spaces. The algorithm builds a tree-structured network of variable size, which is determined both by the distribution of the input data and by the function to be approximated. Unlike other tree-structured algorithms, learning occurs through completely local mechanisms and the weights and structure are modified incrementally as data arrives. Efficient computation in the tree structure takes advantage of the potential for low-order dependencies between the output and the individual dimensions of the input. This algorithm is related to the ideas behind k-d trees (Bentley 1975), CART (Breiman et al. 1984), and MARS (Friedman 1988). I present an example that predicts future values of the Mackey-Glass differential delay equation.

Journal ArticleDOI
TL;DR: It is shown how to learn a hint and how to incorporate it into the learning algorithm; modifications in the net structure and its operation are suggested, which allow for better generalization.
Abstract: The aim of a neural net is to partition the data space into near optimal decision regions. Learning such a partitioning solely from examples has proven to be a very hard problem (Blum and Rivest 1988; Judd 1988). To remedy this, we use the idea of supplying hints to the network as discussed by Abu-Mostafa (1990). Hints reduce the solution space, and as a consequence speed up the learning process. The minimum Hamming distance between the patterns serves as the hint. Next, it is shown how to learn such a hint and how to incorporate it into the learning algorithm. Modifications in the net structure and its operation are suggested, which allow for a better generalization. The sensitivity to errors in such a hint is studied through some simulations.