
Learning Algorithms for Connectionist Networks: Applied Gradient Methods of Nonlinear Optimization

Raymond L. Watrous
Vol. 2, pp. 619-627
TLDR
It is shown that in plateau regions of relatively constant gradient, the momentum term acts to increase the step size by a factor of 1/(1-μ), where μ is the momentum term, and in valley regions with steep sides, the momentum constant acts to focus the search direction toward the local minimum by averaging oscillations in the gradient.
Abstract
The problem of learning using connectionist networks, in which network connection strengths are modified systematically so that the response of the network increasingly approximates the desired response, can be structured as an optimization problem. The widely used back propagation method of connectionist learning [19, 21, 18] is set in the context of nonlinear optimization. In this framework, the issues of stability, convergence and parallelism are considered. As a form of gradient descent with fixed step size, back propagation is known to be unstable, which is illustrated using Rosenbrock's function. This is contrasted with stable methods that involve a line search in the gradient direction. The convergence criterion for connectionist problems involving binary functions is discussed relative to the behavior of gradient descent in the vicinity of local minima. A minimax criterion is compared with the least squares criterion. The contribution of the momentum term [19, 18] to more rapid convergence is interpreted relative to the geometry of the weight space. It is shown that in plateau regions of relatively constant gradient, the momentum term acts to increase the step size by a factor of 1/(1-μ), where μ is the momentum term. In valley regions with steep sides, the momentum constant acts to focus the search direction toward the local minimum by averaging oscillations in the gradient.

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-88-62. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/597
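The 1/(1-μ) factor follows from unrolling the momentum recursion over a region of roughly constant gradient; a minimal sketch of that derivation is given below, written with the usual momentum update and a step size η (the symbol η and the exact update form are assumed notation, since only μ appears in the abstract).

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Momentum update: each weight change is a gradient step plus a fraction
% \mu of the previous weight change.
\[
  \Delta w_t = -\eta\, \nabla E(w_t) + \mu\, \Delta w_{t-1}
\]
% On a plateau the gradient is approximately constant, \nabla E(w_t) \approx g,
% so (starting from \Delta w_0 = 0) the recursion unrolls into a geometric
% series in \mu:
\[
  \Delta w_t = -\eta\, g \sum_{k=0}^{t-1} \mu^{k}
  \;\longrightarrow\;
  -\frac{\eta}{1-\mu}\, g
  \qquad (t \to \infty,\; 0 \le \mu < 1),
\]
% i.e. the effective step size is amplified by the factor 1/(1-\mu).
\end{document}

In valley regions the same averaging works in the opposite direction: gradient components that alternate in sign from step to step largely cancel in the running sum, which is the focusing effect described in the abstract.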



Citations
Journal ArticleDOI

ANFIS: adaptive-network-based fuzzy inference system

TL;DR: The architecture and learning procedure underlying ANFIS (adaptive-network-based fuzzy inference system) are presented; ANFIS is a fuzzy inference system implemented in the framework of adaptive networks.
Journal ArticleDOI

Original Contribution: A scaled conjugate gradient algorithm for fast supervised learning

TL;DR: Experiments show that SCG is considerably faster than BP, CGL, and BFGS, and avoids a time-consuming line search.
Proceedings Article

The Cascade-Correlation Learning Architecture

TL;DR: The Cascade-Correlation architecture has several advantages over existing algorithms: it learns very quickly, the network determines its own size and topology, it retains the structures it has built even if the training set changes, and it requires no back-propagation of error signals through the connections of the network.
Journal ArticleDOI

Fuzzy logic systems for engineering: a tutorial

TL;DR: After a fuzzy logic system (FLS) is synthesized, it is demonstrated that the FLS can be expressed mathematically as a linear combination of fuzzy basis functions and is a nonlinear universal function approximator, a property it shares with feedforward neural networks.
Book ChapterDOI

Theory of the backpropagation neural network

TL;DR: A speculative neurophysiological model illustrating how the backpropagation neural network architecture might plausibly be implemented in the mammalian brain for corticocortical learning between nearby regions of the cerebral cortex is presented.
References
Journal ArticleDOI

A simplex method for function minimization

TL;DR: A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of the vertex with the highest value by another point.
Book ChapterDOI

Learning internal representations by error propagation

TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Journal ArticleDOI

Neural networks and physical systems with emergent collective computational abilities

TL;DR: A model of a system having a large number of simple equivalent components, based on aspects of neurobiology but readily adapted to integrated circuits, produces a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size.