
Showing papers on "Backpropagation published in 1992"


Journal ArticleDOI
TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units that are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
Abstract: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.
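A minimal sketch of the core REINFORCE rule for a single Bernoulli-logistic unit in an immediate-reinforcement task (the toy task and variable names below are my own, not the paper's): the weight change is alpha * (r - b) * (a - p) * x, where (a - p) is the unit's characteristic eligibility.

```python
import numpy as np

# Hypothetical toy setup: one stochastic Bernoulli-logistic unit, immediate reinforcement.
rng = np.random.default_rng(0)
w = np.zeros(3)                                   # weights of the stochastic unit
alpha, baseline = 0.1, 0.0                        # learning rate and reinforcement baseline

for _ in range(1000):
    x = rng.normal(size=3)                        # input pattern
    p = 1.0 / (1.0 + np.exp(-w @ x))              # probability of firing
    a = float(rng.random() < p)                   # sampled binary output
    r = 1.0 if a == float(x[0] > 0.0) else 0.0    # toy reinforcement signal
    w += alpha * (r - baseline) * (a - p) * x     # REINFORCE update (eligibility = a - p)
```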

7,930 citations


Journal ArticleDOI
TL;DR: A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks; it automatically embodies "Occam's razor," penalizing overflexible and overcomplex models.
Abstract: A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
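One ingredient of that framework lends itself to a short sketch: under the quadratic (Gaussian) approximation, the effective number of well-determined parameters is gamma = sum_i lambda_i / (lambda_i + alpha), with lambda_i the eigenvalues of the data-error Hessian and alpha the weight-decay coefficient. The function below is my own illustration of that formula, not code from the paper.

```python
import numpy as np

def effective_parameters(hessian, alpha):
    """gamma = sum_i lambda_i / (lambda_i + alpha) under a quadratic approximation."""
    lam = np.linalg.eigvalsh(hessian)          # eigenvalues of the data-error Hessian
    return float(np.sum(lam / (lam + alpha)))
```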

2,906 citations


Proceedings Article
30 Nov 1992
TL;DR: Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case, and thus yields better generalization on test data.
Abstract: We investigate the use of information from all second order derivatives of the error function to perform network pruning (i.e., removing unimportant weights from a trained network) in order to improve generalization, simplify networks, reduce hardware or storage requirements, increase the speed of further training, and in some cases enable rule extraction. Our method, Optimal Brain Surgeon (OBS), is significantly better than magnitude-based methods and Optimal Brain Damage [Le Cun, Denker and Solla, 1990], which often remove the wrong weights. OBS permits the pruning of more weights than other methods (for the same error on the training set), and thus yields better generalization on test data. Crucial to OBS is a recursion relation for calculating the inverse Hessian matrix H^-1 from training data and structural information of the net. OBS permits a 90%, a 76%, and a 62% reduction in weights over backpropagation with weight decay on three benchmark MONK's problems [Thrun et al., 1991]. Of OBS, Optimal Brain Damage, and magnitude-based methods, only OBS deletes the correct weights from a trained XOR network in every case. Finally, whereas Sejnowski and Rosenberg [1987] used 18,000 weights in their NETtalk network, we used OBS to prune a network to just 1560 weights, yielding better generalization.
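A minimal sketch of a single OBS pruning step, using my own variable names: delete the weight with the smallest saliency w_q^2 / (2 [H^-1]_qq) and compensate the remaining weights with delta_w = -(w_q / [H^-1]_qq) H^-1 e_q. The recursion for building H^-1 from training data is omitted here.

```python
import numpy as np

def obs_prune_step(w, H_inv):
    """One Optimal Brain Surgeon step: returns adjusted weights and the pruned index."""
    saliency = w ** 2 / (2.0 * np.diag(H_inv))     # L_q = w_q^2 / (2 [H^-1]_qq)
    q = int(np.argmin(saliency))                   # least-salient weight
    delta = -(w[q] / H_inv[q, q]) * H_inv[:, q]    # compensating change to all weights
    w_new = w + delta
    w_new[q] = 0.0                                 # force the pruned weight exactly to zero
    return w_new, q
```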

1,785 citations


Book ChapterDOI
01 Jan 1992
TL;DR: A speculative neurophysiological model illustrating how the backpropagation neural network architecture might plausibly be implemented in the mammalian brain for corticocortical learning between nearby regions of the cerebral cortex is presented.
Abstract: Publisher Summary This chapter presents a survey of the elementary theory of the basic backpropagation neural network architecture, covering the areas of architectural design, performance measurement, function approximation capability, and learning. The survey includes a formulation of the backpropagation neural network architecture to make it a valid neural network and a proof that the backpropagation mean squared error function exists and is differentiable. Also included in the survey is a theorem showing that any L2 function can be implemented to any desired degree of accuracy with a three-layer backpropagation neural network. An appendix presents a speculative neurophysiological model illustrating the way in which the backpropagation neural network architecture might plausibly be implemented in the mammalian brain for corticocortical learning between nearby regions of cerebral cortex. One of the crucial decisions in the design of the backpropagation architecture is the selection of a sigmoidal activation function.
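As a small illustration of the activation-function choice the chapter discusses (my own code, assuming the logistic sigmoid), the derivative identity f'(v) = f(v)(1 - f(v)) is what keeps the mean squared error differentiable and cheap to backpropagate:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_deriv(v):
    s = sigmoid(v)
    return s * (1.0 - s)        # f'(v) = f(v) * (1 - f(v))
```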

1,729 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on the promise of artificial neural networks in the realm of modelling, identification and control of nonlinear systems and explore the links between the fields of control science and neural networks.

1,721 citations


Journal ArticleDOI
TL;DR: First- and second-order optimization methods for learning in feedforward neural networks are reviewed to illustrate the main characteristics of the different methods and their mutual relations.
Abstract: On-line first-order backpropagation is sufficiently fast and effective for many large-scale classification problems, but for very high-precision mappings batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
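A toy sketch (my own, not from the review) contrasting the two families it covers: a plain first-order gradient step versus a damped Newton step that uses second-order (Hessian) information.

```python
import numpy as np

def first_order_step(w, grad, eta=0.01):
    return w - eta * grad                          # plain gradient descent

def second_order_step(w, grad, hessian, mu=1e-3):
    # Levenberg-Marquardt-style damping keeps the Newton step well defined
    H = hessian + mu * np.eye(len(w))
    return w - np.linalg.solve(H, grad)
```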

1,218 citations


Journal ArticleDOI
TL;DR: A fuzzy neural network model based on the multilayer perceptron, using the backpropagation algorithm, and capable of fuzzy classification of patterns is described, and the results are compared with those of the conventional MLP, the Bayes classifier, and other related models.
Abstract: A fuzzy neural network model based on the multilayer perceptron, using the backpropagation algorithm, and capable of fuzzy classification of patterns is described. The input vector consists of membership values to linguistic properties while the output vector is defined in terms of fuzzy class membership values. This allows efficient modeling of fuzzy uncertain patterns with appropriate weights being assigned to the backpropagated errors depending upon the membership values at the corresponding outputs. During training, the learning rate is gradually decreased in discrete steps until the network converges to a minimum error solution. The effectiveness of the algorithm is demonstrated on a speech recognition problem. The results are compared with those of the conventional MLP, the Bayes classifier, and other related models.
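A sketch of the input-encoding idea (my own simplification, using triangular memberships rather than the paper's exact membership functions): each raw feature is replaced by its degrees of membership in "low", "medium", and "high" linguistic sets before being fed to the multilayer perceptron.

```python
import numpy as np

def triangular(x, a, b, c):
    """Membership that rises from a to a peak at b and falls back to zero at c."""
    return max(0.0, min((x - a) / (b - a + 1e-12), (c - x) / (c - b + 1e-12)))

def linguistic_encode(x, lo, hi):
    """Replace a raw feature x by its low/medium/high membership values."""
    mid = 0.5 * (lo + hi)
    return np.array([triangular(x, lo - (mid - lo), lo, mid),   # low
                     triangular(x, lo, mid, hi),                # medium
                     triangular(x, mid, hi, hi + (hi - mid))])  # high
```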

1,031 citations


Journal ArticleDOI
TL;DR: A fuzzy modeling method using fuzzy neural networks with the backpropagation algorithm is presented that can identify the fuzzy model of a nonlinear system automatically.
Abstract: A fuzzy modeling method using fuzzy neural networks with the backpropagation algorithm is presented. The method can identify the fuzzy model of a nonlinear system automatically. The feasibility of the method is examined using simple numerical data.

894 citations


Journal ArticleDOI
TL;DR: A neural network is developed to forecast rainfall intensity fields in space and time using a three-layer learning network with input, hidden, and output layers and is shown to perform well when a relatively large number of hidden nodes are utilized.

675 citations


Journal ArticleDOI
TL;DR: A theoretical framework for backpropagation (BP) is proposed and it is proven in particular that convergence holds if the classes are linearly separable; experiments further show that multilayered neural networks (MLNs) exceed perceptrons in generalization to new examples.
Abstract: The authors propose a theoretical framework for backpropagation (BP) in order to identify some of its limitations as a general learning procedure and the reasons for its success in several experiments on pattern recognition. The first important conclusion is that examples can be found in which BP gets stuck in local minima. A simple example in which BP can get stuck during gradient descent without having learned the entire training set is presented. This example guarantees the existence of a solution with null cost. Some conditions on the network architecture and the learning environment that ensure the convergence of the BP algorithm are proposed. It is proven in particular that the convergence holds if the classes are linearly separable. In this case, the experience gained in several experiments shows that multilayered neural networks (MLNs) exceed perceptrons in generalization to new examples.

659 citations


Proceedings ArticleDOI
08 Mar 1992
TL;DR: The authors develop a training algorithm, similar to the backpropagation algorithm for neural networks, to train fuzzy systems to match desired input-output pairs and demonstrate how the fuzzy system learns to match an unknown nonlinear mapping as training progresses and that performance is improved by incorporating linguistic rules.
Abstract: The authors develop a training algorithm, similar to the backpropagation algorithm for neural networks, to train fuzzy systems to match desired input-output pairs. The key ideas in developing this training algorithm are to view a fuzzy system as a three-layer feedforward network, and to use the chain rule to determine gradients of the output errors of the fuzzy system with respect to its design parameters. It is shown that this training algorithm performs an error backpropagation procedure; hence, the fuzzy system equipped with the backpropagation training algorithm is called the backpropagation fuzzy system (BP FS). An online initial parameter choosing method is proposed for the BP FS, and it is shown that it is straightforward to incorporate linguistic if-then rules into the BP FS. Two examples are presented which demonstrate (1) how the fuzzy system learns to match an unknown nonlinear mapping as training progresses and (2) that performance is improved by incorporating linguistic rules.
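A reduced sketch of the same idea (my own variable names; only the consequent centres are trained here, whereas the full method also tunes the membership centres and widths by the chain rule): a fuzzy system with Gaussian memberships, product inference, and centre-average defuzzification, updated by gradient descent on the squared output error.

```python
import numpy as np

def fuzzy_forward(x, centers, widths, y_bar):
    """centers, widths: (rules, inputs); y_bar: (rules,). Returns output and firing strengths."""
    z = np.exp(-((x - centers) ** 2) / widths ** 2).prod(axis=1)   # rule firing strengths
    return float((y_bar * z).sum() / (z.sum() + 1e-12)), z

def train_consequents(x, target, centers, widths, y_bar, eta=0.1):
    y, z = fuzzy_forward(x, centers, widths, y_bar)
    # dE/dy_bar_l = (y - target) * z_l / sum(z)  for  E = 0.5 * (y - target)^2
    return y_bar - eta * (y - target) * z / (z.sum() + 1e-12)
```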

Journal ArticleDOI
TL;DR: Autoassociative networks are used to preprocess data so that sensor-based calculations can be performed correctly even in the presence of large sensor biases and failures; results show that the network approach is more effective than competing linear techniques in this application.

Journal ArticleDOI
TL;DR: A recursive algorithm for updating the coefficients of a neural network structure for complex signals is presented and the method yields the complex form of the conventional backpropagation algorithm.
Abstract: A recursive algorithm for updating the coefficients of a neural network structure for complex signals is presented. Various complex activation functions are considered and a practical definition is proposed. The method, associated with a mean-square-error criterion, yields the complex form of the conventional backpropagation algorithm.
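A stripped-down sketch (my own simplification to a single linear complex neuron, i.e. the degenerate case where the activation is the identity): the complex least-mean-squares update w <- w + mu * e * conj(x), loosely the single-neuron case that complex backpropagation generalizes to layered networks with nonlinear activations.

```python
import numpy as np

# Hypothetical toy identification task: learn a fixed complex weight vector from phasor-like inputs.
rng = np.random.default_rng(1)
w_true = np.array([0.5 + 0.2j, -0.3 + 0.7j])
w = np.zeros(2, dtype=complex)
mu = 0.05

for _ in range(500):
    x = rng.normal(size=2) + 1j * rng.normal(size=2)   # complex input sample
    e = w_true @ x - w @ x                              # complex error signal
    w += mu * e * np.conj(x)                            # complex LMS / gradient step
```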

Journal ArticleDOI
07 Sep 1992
TL;DR: A learning algorithm is presented for the recurrent random network model (Gelenbe 1989, 1990) using gradient descent of a quadratic error function; it requires the solution of a system of n linear and n nonlinear equations each time the n-neuron network learns a new input-output pair.
Abstract: The capacity to learn from examples is one of the most desirable features of neural network models. We present a learning algorithm for the recurrent random network model (Gelenbe 1989, 1990) using gradient descent of a quadratic error function. The analytical properties of the model lead to a "backpropagation" type algorithm that requires the solution of a system of n linear and n nonlinear equations each time the n-neuron network "learns" a new input-output pair.

Journal ArticleDOI
TL;DR: It is not proved that the introduction of additive noise to the training vectors always improves network generalization, but the analysis suggests mathematically justified rules for choosing the characteristics of noise if additive noise is used in training.
Abstract: The possibility of improving the generalization capability of a neural network by introducing additive noise to the training samples is discussed. The network considered is a feedforward layered neural network trained with the back-propagation algorithm. Back-propagation training is viewed as nonlinear least-squares regression and the additive noise is interpreted as generating a kernel estimate of the probability density that describes the training vector distribution. Two specific application types are considered: pattern classifier networks and estimation of a nonstochastic mapping from data corrupted by measurement errors. It is not proved that the introduction of additive noise to the training vectors always improves network generalization. However, the analysis suggests mathematically justified rules for choosing the characteristics of noise if additive noise is used in training. Results of mathematical statistics are used to establish various asymptotic consistency results for the proposed method. Numerical simulations support the applicability of the training method.
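A minimal sketch of the training scheme (my own code; the noise scale sigma is left as a free choice here, whereas the paper derives kernel-estimation-based rules for picking it): every epoch the network is shown a freshly jittered copy of the training vectors.

```python
import numpy as np

def noisy_epochs(X, Y, sigma, epochs, seed=0):
    """Yield (inputs + fresh Gaussian noise, targets) once per training epoch."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        yield X + rng.normal(scale=sigma, size=X.shape), Y
```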

Journal ArticleDOI
TL;DR: In this article, the backpropagation algorithm is extended to complex domain back-propagations, which can be used to train neural networks for which the inputs, weights, activation functions, and outputs are complex-valued.
Abstract: The backpropagation algorithm is extended to complex domain backpropagation (CDBP), which can be used to train neural networks for which the inputs, weights, activation functions, and outputs are complex-valued. Previous derivations of CDBP necessarily admitted activation functions that have singularities, which is highly undesirable. Here, CDBP is derived so that it accommodates classes of suitable activation functions. One such function is found and the circuit implementation of the corresponding neuron is given. CDBP hardware circuits can be used to process sinusoidal signals all at the same frequency (phasors).

Book ChapterDOI
08 Mar 1992
TL;DR: The authors propose a learning method for fuzzy inference rules by a descent method that has the capability to express the knowledge acquired from input-output data in the form of fuzzy inference rules.
Abstract: The authors propose a learning method for fuzzy inference rules by a descent method. From input-output data gathered from specialists, the inference rules expressing the input-output relation of the data are obtained automatically. The membership functions in the antecedent part and the real number in the consequent part of the inference rules are tuned by means of the descent method. The learning speed and the generalization capability of this method are higher than those of a conventional backpropagation type neural network. This method has the capability to express the knowledge acquired from input-output data in the form of fuzzy inference rules. Some numerical examples are described to show these advantages over the conventional neural network. An application of the method to a mobile robot that avoids a moving obstacle and its computer simulation are reported.

Journal ArticleDOI
TL;DR: The fuzzy systems performed well until over 50% of their fuzzy-associative-memory (FAM) rules were removed, and they also performed well when the key FAM equilibration rule was replaced with destructive, or 'sabotage', rules.
Abstract: Fuzzy control systems and neural-network control systems for backing up a simulated truck, and truck-and-trailer, to a loading dock in a parking lot are presented. The supervised backpropagation learning algorithm trained the neural network systems. The robustness of the neural systems was tested by removing random subsets of training data in learning sequences. The neural systems performed well but required extensive computation for training. The fuzzy systems performed well until over 50% of their fuzzy-associative-memory (FAM) rules were removed. They also performed well when the key FAM equilibration rule was replaced with destructive, or 'sabotage', rules. Unsupervised differential competitive learning (DCL) and product-space clustering adaptively generated FAM rules from training data. The original fuzzy control systems and neural control systems generated trajectory data. The DCL system rapidly recovered the underlying FAM rules. Product-space clustering converted the neural truck systems into structured sets of FAM rules that approximated the neural system's behavior.

Journal ArticleDOI
H. Drucker1, Y. Le Cun
TL;DR: It is shown that double backpropagation, as compared to backpropagation, creates weights that are smaller, thereby causing the output of the neurons to spend more time in the linear region.
Abstract: In order to generalize from a training set to a test set, it is desirable that small changes in the input space of a pattern do not change the output components. This can be done by forcing this behavior as part of the training algorithm. This is done in double backpropagation by forming an energy function that is the sum of the normal energy term found in backpropagation and an additional term that is a function of the Jacobian. Significant improvement is shown with different architectures and different test sets, especially with architectures that had previously been shown to have very good performance when trained using backpropagation. It is shown that double backpropagation, as compared to backpropagation, creates weights that are smaller, thereby causing the output of the neurons to spend more time in the linear region.
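A minimal sketch of the double-backpropagation loss using automatic differentiation (assuming PyTorch; the original work derives the extra backward pass by hand): the objective is the usual error plus a penalty on the gradient of that error with respect to the input pattern.

```python
import torch

def double_backprop_loss(model, x, target, lam=0.1):
    x = x.clone().requires_grad_(True)                     # track gradients w.r.t. the input
    err = torch.nn.functional.mse_loss(model(x), target)   # standard backpropagation energy
    gx, = torch.autograd.grad(err, x, create_graph=True)   # d(err)/d(input), kept differentiable
    return err + lam * gx.pow(2).sum()                     # add the Jacobian-based penalty term
```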

Journal ArticleDOI
TL;DR: The gamma neural model is a neural network architecture for processing temporal patterns in which only current signal values are presented to the neural net, which adapts its own internal memory to store the past.

Journal ArticleDOI
TL;DR: A fast new algorithm is presented for training multilayer perceptrons as an alternative to the backpropagation algorithm; it reduces the required training time considerably and overcomes many of the shortcomings of the conventional backpropagation algorithm.
Abstract: A fast algorithm is presented for training multilayer perceptrons as an alternative to the backpropagation algorithm. The number of iterations required by the new algorithm to converge is less than 20% of what is required by the backpropagation algorithm. Also, it is less affected by the choice of initial weights and setup parameters. The algorithm uses a modified form of the backpropagation algorithm to minimize the mean-squared error between the desired and actual outputs with respect to the inputs to the nonlinearities. This is in contrast to the standard algorithm which minimizes the mean-squared error with respect to the weights. Error signals, generated by the modified backpropagation algorithm, are used to estimate the inputs to the nonlinearities, which along with the input vectors to the respective nodes, are used to produce an updated set of weights through a system of linear equations at each node. These systems of linear equations are solved using a Kalman filter at each layer.

Proceedings ArticleDOI
31 Aug 1992
TL;DR: The authors propose a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence for stochastic gradient descent; empirical tests agree with theoretical expectations that drift can be used to determine whether the crucial parameter c is large enough.
Abstract: The authors propose a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence for stochastic gradient descent. Empirical tests agree with theoretical expectations that drift can be used to determine whether the crucial parameter c is large enough. Using this statistic, it will be possible to produce the first adaptive learning rates which converge at optimal speed.
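For context, a sketch of the classical 1/t schedule this line of work builds on (my own toy code; the paper's contribution is adapting the constant automatically by monitoring a drift statistic, which is not reproduced here):

```python
def learning_rate(t, c=1.0, t0=10.0):
    # eta_t ~ c / t attains the optimal asymptotic rate for stochastic gradient
    # descent provided the constant c is large enough, which is the condition the
    # drift statistic is meant to detect.
    return c / (t0 + t)
```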

Journal ArticleDOI
TL;DR: This paper applies optimal filtering techniques to train feedforward networks in the standard supervised learning framework, and presents three algorithms which are computationally more expensive than standard backpropagation, but local at the neuron level.

Proceedings ArticleDOI
06 Jun 1992
TL;DR: The empirical studies show that the SGA can efficiently determine the network size and topology along with the optimal set of connection weights appropriate for desired tasks, without using backpropagation or any other learning algorithm.
Abstract: Presents a different type of genetic algorithm called the structured genetic algorithm (SGA) for the design of application-specific neural networks. The novelty of this new genetic approach is that it can determine the network structures and their weights solely by an evolutionary process. This is made possible for the SGA primarily due to its redundant genetic material and a gene activation mechanism which in combination provide a multi-layered structure to the chromosome. The authors focus on the use of this learning algorithm for automatic generation of a complete application specific neural network. With this approach, no a priori assumptions about topology are needed and the only information required is the input and output characteristics of the task. The empirical studies show that the SGA can efficiently determine the network size and topology along with the optimal set of connection weights appropriate for desired tasks, without using backpropagation or any other learning algorithm.

Journal ArticleDOI
TL;DR: This paper presents an extended backpropagation algorithm that allows all elements of the Hessian matrix to be evaluated exactly for a feedforward network of arbitrary topology.
Abstract: The elements of the Hessian matrix consist of the second derivatives of the error measure with respect to the weights and thresholds in the network. They are needed in Bayesian estimation of network regularization parameters, for estimation of error bars on the network outputs, for network pruning algorithms, and for fast retraining of the network following a small change in the training data. In this paper we present an extended backpropagation algorithm that allows all elements of the Hessian matrix to be evaluated exactly for a feedforward network of arbitrary topology. Software implementation of the algorithm is straightforward.
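For a feel of what the algorithm computes (illustration only, assuming PyTorch; this is not the paper's extended backpropagation recursion, just automatic differentiation applied to a tiny hand-written 1-2-1 network), here is the exact Hessian of a sum-of-squares error with respect to the flattened weights:

```python
import torch

x = torch.linspace(-1.0, 1.0, 20).unsqueeze(1)   # toy inputs
t = torch.sin(3.0 * x)                            # toy targets

def sse(w):
    """Sum-of-squares error of a 1-2-1 tanh network with flattened weights w (7 values)."""
    h = torch.tanh(x @ w[0:2].unsqueeze(0) + w[2:4])   # hidden layer: weights w[0:2], biases w[2:4]
    y = h @ w[4:6].unsqueeze(1) + w[6]                 # output layer: weights w[4:6], bias w[6]
    return 0.5 * ((y - t) ** 2).sum()

H = torch.autograd.functional.hessian(sse, torch.randn(7))   # exact 7x7 Hessian
```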

Proceedings ArticleDOI
07 Jun 1992
TL;DR: A neural network system developed for forecasting stock prices in the Japanese market that combines the modified BP (backpropagation) method with the random optimization method is presented.
Abstract: A neural network system developed for forecasting stock prices in the Japanese market is presented. The hybrid algorithm, which combines the modified BP (backpropagation) method with the random optimization method, has been used for training the parameters in the neural network. It has been shown by several simulation results that this neural network system is quite helpful for making a good forecast of stock prices.

Journal ArticleDOI
TL;DR: It is shown that the neural network behaves in the problem as a Bayesian classifier, i.e. it assigns the a posteriori probability for each of the five classes considered in the catalogue, and the network highest probability choice agrees with the catalogue classification.
Abstract: We explore a method for automatic morphological classification of galaxies by an Artificial Neural Network algorithm. The method is illustrated using 13 galaxy parameters measured by machine (ESO-LV), and classified into five types (E, S0, Sa + Sb, Sc + Sd and Irr). A simple backpropagation algorithm allows us to train a network on a subset of the catalogue according to human classification, and then to predict, using the measured parameters, the classification for the rest of the catalogue. We show that the neural network behaves in our problem as a Bayesian classifier, i.e. it assigns the a posteriori probability for each of the five classes considered. The network highest probability choice agrees with the catalogue classification for 64 per cent of the galaxies. If either the first or the second highest probability choice of the network is considered, the success rate is 90 per cent. The technique allows uniform and more objective classification of very large extragalactic data sets.
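The two success rates quoted translate into a small piece of evaluation code (my own sketch, with made-up variable names): agreement of the network's highest-probability class with the catalogue, and agreement of either of its two highest-probability classes.

```python
import numpy as np

def top1_top2_agreement(probs, labels):
    """probs: (galaxies, 5) a posteriori class probabilities; labels: catalogue class indices."""
    order = np.argsort(probs, axis=1)[:, ::-1]                         # classes sorted by probability
    top1 = float((order[:, 0] == labels).mean())
    top2 = float((order[:, :2] == labels[:, None]).any(axis=1).mean())
    return top1, top2
```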

Journal ArticleDOI
TL;DR: The authors develop back-propagation learning for acyclic, event-driven networks in general and derive a specific algorithm for learning in EMYCIN-derived expert networks, which offers automation of the knowledge acquisition task for certainty factors, often the most difficult part of knowledge extraction.
Abstract: Expert networks are event-driven, acyclic networks of neural objects derived from expert systems. The neural objects process information through a nonlinear combining function that is different from, and more complex than, typical neural network node processors. The authors develop back-propagation learning for acyclic, event-driven networks in general and derive a specific algorithm for learning in EMYCIN-derived expert networks. The algorithm combines back-propagation learning with other features of expert networks, including calculation of gradients of the nonlinear combining functions and the hypercube nature of the knowledge space. It offers automation of the knowledge acquisition task for certainty factors, often the most difficult part of knowledge extraction. Results of testing the learning algorithm with a medium-scale (97-node) expert network are presented.
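A sketch of the kind of combining-function gradient such learning needs (my own code, restricted to the positive-evidence case of the EMYCIN certainty-factor rule; the mixed-sign cases are more involved): cf = a + b(1 - a), so d(cf)/da = 1 - b and d(cf)/db = 1 - a.

```python
def combine_cf(a, b):
    """EMYCIN-style combination of two non-negative certainty factors."""
    return a + b * (1.0 - a)

def combine_cf_grads(a, b):
    """Partial derivatives used when backpropagating through the combining function."""
    return 1.0 - b, 1.0 - a
```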

Journal ArticleDOI
TL;DR: The report shows that the backpropagation neural network can learn a functional mapping between input and output based on a set of training examples, as demonstrated for both static single-variable and multivariable systems and for linear and nonlinear dynamic systems.

Journal ArticleDOI
TL;DR: This paper proposes simple but powerful methods for fuzzy regression analysis using neural networks and shows two methods for deriving nonlinear fuzzy models from the interval model determined by the proposed algorithms.