
Showing papers on "Recurrent neural network" published in 2000


Journal ArticleDOI
TL;DR: This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.
Abstract: Long short-term memory (LSTM; Hochreiter & Schmidhuber, 1997) can solve numerous tasks not solvable by previous learning algorithms for recurrent neural networks (RNNs). We identify a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset. Without resets, the state may grow indefinitely and eventually cause the network to break down. Our remedy is a novel, adaptive "forget gate" that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources. We review illustrative benchmark problems on which standard LSTM outperforms other RNN algorithms. All algorithms (including LSTM) fail to solve continual versions of these problems. LSTM with forget gates, however, easily solves them, and in an elegant way.
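
For readers who want to see the mechanism concretely, the following is a minimal NumPy sketch of a single LSTM cell step with a forget gate, in the spirit of the variant described above. It is not the authors' implementation; the weight layout, sizes, and initialization are illustrative assumptions.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of an LSTM cell with a forget gate (hedged sketch).

    x      : input vector at the current time step
    h_prev : previous hidden state
    c_prev : previous cell state
    W, b   : stacked weights/biases for the input, forget, output gates
             and the candidate cell update (4 blocks of rows).
    """
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.size
    i = 1.0 / (1.0 + np.exp(-z[0:H]))          # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))        # forget gate: lets the cell reset itself
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))      # output gate
    g = np.tanh(z[3*H:4*H])                    # candidate cell update
    c = f * c_prev + i * g                     # forget gate scales the old state
    h = o * np.tanh(c)
    return h, c

# Illustrative usage with random weights (D inputs, H hidden units).
D, H = 3, 5
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(100):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, b)
```

The key line is `c = f * c_prev + i * g`: when the forget gate output drops toward zero, the cell state is effectively reset, which is the "release of internal resources" the abstract refers to.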

3,135 citations


BookDOI
01 Jan 2000
TL;DR: This chapter discusses Neural-Network-based Control, a method for automating the design and execution of nonlinear control systems, and its application to Predictive Control.
Abstract: 1. Introduction.- 1.1 Background.- 1.1.1 Inferring Models and Controllers from Data.- 1.1.2 Why Use Neural Networks?.- 1.2 Introduction to Multilayer Perceptron Networks.- 1.2.1 The Neuron.- 1.2.2 The Multilayer Perceptron.- 1.2.3 Choice of Neural Network Architecture.- 1.2.4 Models of Dynamic Systems.- 1.2.5 Recurrent Networks.- 1.2.6 Other Neural Network Architectures.- 2. System Identification with Neural Networks.- 2.1 Introduction to System Identification.- 2.1.1 The Procedure.- 2.2 Model Structure Selection.- 2.2.1 Some Linear Model Structures.- 2.2.2 Nonlinear Model Structures Based on Neural Networks.- 2.2.3 A Few Remarks on Stability.- 2.2.4 Terminology.- 2.2.5 Selecting the Lag Space.- 2.2.6 Section Summary.- 2.3 Experiment.- 2.3.1 When is a Linear Model Insufficient?.- 2.3.2 Issues in Experiment Design.- 2.3.3 Preparing the Data for Modelling.- 2.3.4 Section Summary.- 2.4 Determination of the Weights.- 2.4.1 The Prediction Error Method.- 2.4.2 Regularization and the Concept of Generalization.- 2.4.3 Remarks on Implementation.- 2.4.4 Section Summary.- 2.5 Validation.- 2.5.1 Looking for Correlations.- 2.5.2 Estimation of the Average Generalization Error.- 2.5.3 Visualization of the Predictions.- 2.5.4 Section Summary.- 2.6 Going Backwards in the Procedure.- 2.6.1 Training the Network Again.- 2.6.2 Finding the Optimal Network Architecture.- 2.6.3 Redoing the Experiment.- 2.6.4 Section Summary.- 2.7 Recapitulation of System Identification.- 3. Control with Neural Networks.- 3.1 Introduction to Neural-Network-based Control.- 3.1.1 The Benchmark System.- 3.2 Direct Inverse Control.- 3.2.1 General Training.- 3.2.2 Direct Inverse Control of the Benchmark System.- 3.2.3 Specialized Training.- 3.2.4 Specialized Training and Direct Inverse Control of the Benchmark System.- 3.2.5 Section Summary.- 3.3 Internal Model Control (IMC).- 3.3.1 Internal Model Control with Neural Networks.- 3.3.2 Section Summary.- 3.4 Feedback Linearization.- 3.4.1 The Basic Principle of Feedback Linearization.- 3.4.2 Feedback Linearization Using Neural Network Models..- 3.4.3 Feedback Linearization of the Benchmark System.- 3.4.4 Section Summary.- 3.5 Feedforward Control.- 3.5.1 Feedforward for Optimizing an Existing Control System.- 3.5.2 Feedforward Control of the Benchmark System.- 3.5.3 Section Summary.- 3.6 Optimal Control.- 3.6.1 Training of an Optimal Controller.- 3.6.2 Optimal Control of the Benchmark System.- 3.6.3 Section Summary.- 3.7 Controllers Based on Instantaneous Linearization.- 3.7.1 Instantaneous Linearization.- 3.7.2 Applying Instantaneous Linearization to Control.- 3.7.3 Approximate Pole Placement Design.- 3.7.4 Pole Placement Control of the Benchmark System.- 3.7.5 Approximate Minimum Variance Design.- 3.7.6 Section Summary.- 3.8 Predictive Control.- 3.8.1 Nonlinear Predictive Control (NPC).- 3.8.2 NPC Applied to the Benchmark System.- 3.8.3 Approximate Predictive Control (APC).- 3.8.4 APC applied to the Benchmark System.- 3.8.5 Extensions to the Predictive Controller.- 3.8.6 Section Summary.- 3.9 Recapitulation of Control Design Methods.- 4. 
Case Studies.- 4.1 The Sunspot Benchmark.- 4.1.1 Modelling with a Fully Connected Network.- 4.1.2 Pruning of the Network Architecture.- 4.1.3 Section Summary.- 4.2 Modelling of a Hydraulic Actuator.- 4.2.1 Estimation of a Linear Model.- 4.2.2 Neural Network Modelling of the Actuator.- 4.2.3 Section Summary.- 4.3 Pneumatic Servomechanism.- 4.3.1 Identification of the Pneumatic Servomechanism.- 4.3.2 Nonlinear Predictive Control of the Servo.- 4.3.3 Approximate Predictive Control of the Servo.- 4.3.4 Section Summary.- 4.4 Control of Water Level in a Conic Tank.- 4.4.1 Linear Analysis and Control.- 4.4.2 Direct Inverse Control of the Water Level.- 4.4.3 Section Summary.- References.

889 citations


Proceedings ArticleDOI
26 Jan 2000
TL;DR: Surprisingly, LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes separated by either 50 or 49 discrete time steps, without the help of any short training exemplars.
Abstract: The size of the time intervals between events conveys information essential for numerous sequential tasks such as motor control and rhythm detection. While hidden Markov models tend to ignore this information, recurrent neural networks (RNNs) can in principle learn to make use of it. We focus on long short-term memory (LSTM) because it usually outperforms other RNNs. Surprisingly, LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes separated by either 50 or 49 discrete time steps, without the help of any short training exemplars. Without external resets or teacher forcing or loss of performance on tasks reported earlier, our LSTM variant also learns to generate very stable sequences of highly nonlinear, precisely timed spikes. This makes LSTM a promising approach for real-world tasks that require timing and counting.
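
A hedged sketch of how peephole connections change the gate computations, extending the plain LSTM step shown earlier. The convention that the input and forget gates see the old cell state while the output gate sees the updated one follows the common formulation; all names and sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x, h_prev, c_prev, W, b, p_i, p_f, p_o):
    """LSTM step with peephole connections (sketch).

    p_i, p_f, p_o are per-unit peephole weights that let the gates
    'see' the internal cell state, which is what allows the cell to
    measure elapsed time between events.
    """
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.size
    i = sigmoid(z[0:H]     + p_i * c_prev)   # input gate peeks at old cell state
    f = sigmoid(z[H:2*H]   + p_f * c_prev)   # forget gate peeks at old cell state
    g = np.tanh(z[3*H:4*H])
    c = f * c_prev + i * g
    o = sigmoid(z[2*H:3*H] + p_o * c)        # output gate peeks at the new cell state
    h = o * np.tanh(c)
    return h, c

# Illustrative usage (3 inputs, 4 hidden units, random parameters).
D, H = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
p_i, p_f, p_o = (rng.standard_normal(H) * 0.1 for _ in range(3))
h, c = np.zeros(H), np.zeros(H)
for t in range(50):
    h, c = peephole_lstm_step(rng.standard_normal(D), h, c, W, b, p_i, p_f, p_o)
```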

613 citations


Journal ArticleDOI
TL;DR: A comparison of various forecasting approaches, using time series analysis, on mean hourly wind speed data, including the traditional linear (ARMA) models and the commonly used feed forward and recurrent neural networks is presented.

355 citations


Journal ArticleDOI
TL;DR: This paper introduces SVMs within the context of recurrent neural networks and, instead of Vapnik's epsilon-insensitive loss function, considers a least squares version related to a cost function with equality constraints for a recurrent network.
Abstract: The method of support vector machines (SVMs) has been developed for solving classification and static function approximation problems. In this paper we introduce SVMs within the context of recurrent neural networks. Instead of Vapnik's epsilon-insensitive loss function, we consider a least squares version related to a cost function with equality constraints for a recurrent network. Essential features of SVMs remain, such as Mercer's condition and the fact that the output weights are a Lagrange multiplier weighted sum of the data points. The solution to recurrent least squares SVMs (LS-SVMs) is characterized by a set of nonlinear equations. Due to its high computational complexity, we focus on a limited case of assigning the squared error an infinitely large penalty factor with early stopping as a form of regularization. The effectiveness of the approach is demonstrated on trajectory learning of the double scroll attractor in Chua's circuit.
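
The recurrent LS-SVM of this paper leads to a set of nonlinear equations, which is beyond a short sketch; the following instead shows the standard static LS-SVM regression fit, which reduces to a single linear system and illustrates the point that the output weights are a Lagrange-multiplier-weighted sum of the data points. The kernel choice, gamma value, and toy data are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM regression dual: one linear system in (b, alpha)."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(Xq, X, alpha, b, sigma=1.0):
    # The prediction is a Lagrange-multiplier-weighted sum of kernel evaluations.
    return rbf_kernel(Xq, X, sigma) @ alpha + b

# Toy usage: fit a noisy sine.
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(40)
alpha, b = lssvm_fit(X, y)
print(lssvm_predict(np.array([[0.5]]), X, alpha, b))
```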

320 citations


Book
19 Oct 2000
TL;DR: The inner workings of the human mind, consciousness, language-mind relationships, learning, and emotions are explored mathematically in amazing detail in Neural Networks and Intellect: Using Model-Based Concepts.
Abstract: With the first few words of Neural Networks and Intellect: Using Model-Based Concepts, Leonid Perlovsky embarks on the daring task of creating a mathematical concept of “the mind.” The content of the book actually exceeds even the most daring of expectations. A wide variety of concepts are linked together intertwining the development of artificial intelligence, evolutionary computation, and even the philosophical observations ranging from Aristotle and Plato to Kant and Gödel. Perlovsky discusses fundamental questions with a number of engineering applications to filter them through philosophical categories (both ontological and epistemological). In such a fashion, the inner workings of the human mind, consciousness, language-mind relationships, learning, and emotions are explored mathematically in amazing detail. Perlovsky even manages to discuss the concept of beauty perception in mathematical terms. Beginners will appreciate that Perlovsky starts with the basics. The first chapter contains an introduction to probability, statistics, and pattern recognition, along with the intuitive explanation of the complicated mathematical concepts. The second chapter reviews numerous mathematical approaches, algorithms, neural networks, and the fundamental mathematical ideas underlying each method. It analyzes fundamental limitations of the nearest neighbor methods and the simple neural network. Vapnik’s statistical learning theory, support vector machines, and Grossberg’s neural field theories are clearly explained. Roles of hierarchical organization and evolutionary computation are analyzed. Even experts in the field might find interesting the relationships among various algorithms and approaches. Fundamental mathematical issues include origins of combinatorial complexity (CC) of many algorithms and neural networks (operations or training) and its relationship to di-

266 citations


Journal ArticleDOI
TL;DR: A biologically inspired neural network approach to real-time collision-free motion planning of mobile robots or robot manipulators in a nonstationary environment is proposed, with stability guaranteed by qualitative analysis and Lyapunov stability theory.

229 citations


Journal ArticleDOI
TL;DR: This paper reports experiments on three phonological feature systems: the Sound Pattern of English (SPE) system, a multi-valued (MV) feature system which uses traditional phonetic categories such as manner, place, etc., and Government Phonology which uses a set of structured primes.

199 citations


Journal ArticleDOI
TL;DR: This method, called Automated ANNs, is an attempt to develop an automatic procedure for selecting the architecture of an artificial neural network for forecasting purposes; the results show that ANNs compete well with the other methods investigated but may produce poor results under certain conditions.

190 citations


BookDOI
01 Jan 2000
TL;DR: An overview of Hybrid Neural Systems and Lessons from Past, Current Issues, and Future Research Directions in Extracting the Knowledge Embedded in Artificial Neural Networks are presented.
Abstract: An Overview of Hybrid Neural Systems.- An Overview of Hybrid Neural Systems.- Structured Connectionism and Rule Representation.- Layered Hybrid Connectionist Models for Cognitive Science.- Types and Quantifiers in SHRUTI - A Connectionist Model of Rapid Reasoning and Relational Processing.- A Recursive Neural Network for Reflexive Reasoning.- A Novel Modular Neural Architecture for Rule-Based and Similarity-Based Reasoning.- Addressing Knowledge-Representation Issues in Connectionist Symbolic Rule Encoding for General Inference.- Towards a Hybrid Model of First-Order Theory Refinement.- Distributed Neural Architectures and Language Processing.- Dynamical Recurrent Networks for Sequential Data Processing.- Fuzzy Knowledge and Recurrent Neural Networks: A Dynamical Systems Perspective.- Combining Maps and Distributed Representations for Shift-Reduce Parsing.- Towards Hybrid Neural Learning Internet Agents.- A Connectionist Simulation of the Empirical Acquisition of Grammatical Relations.- Large Patterns Make Great Symbols: An Example of Learning from Example.- Context Vectors: A Step Toward a "Grand Unified Representation".- Integration of Graphical Rules with Adaptive Learning of Structured Information.- Transformation and Explanation.- Lessons from Past, Current Issues, and Future Research Directions in Extracting the Knowledge Embedded in Artificial Neural Networks.- Symbolic Rule Extraction from the DIMLP Neural Network.- Understanding State Space Organization in Recurrent Neural Networks with Iterative Function Systems Dynamics.- Direct Explanations and Knowledge Extraction from a Multilayer Perceptron Network that Performs Low Back Pain Classification.- High Order Eigentensors as Symbolic Rules in Competitive Learning.- Holistic Symbol Processing and the Sequential RAAM: An Evaluation.- Robotics, Vision and Cognitive Approaches.- Life, Mind, and Robots.- Supplementing Neural Reinforcement Learning with Symbolic Methods.- Self-Organizing Maps in Symbol Processing.- Evolution of Symbolisation: Signposts to a Bridge between Connectionist and Symbolic Systems.- A Cellular Neural Associative Array for Symbolic Vision.- Application of Neurosymbolic Integration for Environment Modelling in Mobile Robots.

172 citations


Journal ArticleDOI
TL;DR: The research demonstrates that the proposed network architecture and the associated learning algorithm are quite effective in modeling the dynamics of complex processes and performing accurate MS predictions.

Journal ArticleDOI
H. Tsukimoto
TL;DR: The algorithm is a decompositional approach which can be applied to any neural network whose output function is monotone, such as a sigmoid function; it does not depend on training algorithms, and its computational complexity is polynomial.
Abstract: Presents an algorithm for extracting rules from trained neural networks. The algorithm is a decompositional approach which can be applied to any neural network whose output function is monotone such as a sigmoid function. Therefore, the algorithm can be applied to multilayer neural networks, recurrent neural networks and so on. It does not depend on training algorithms, and its computational complexity is polynomial. The basic idea is that the units of neural networks are approximated by Boolean functions. But the computational complexity of the approximation is exponential, and so a polynomial algorithm is presented. The author has applied the algorithm to several problems to extract understandable and accurate rules. The paper shows the results for the votes data, mushroom data, and others. The algorithm is extended to the continuous domain, where extracted rules are continuous Boolean functions. Roughly speaking, the representation by continuous Boolean functions means the representation using conjunction, disjunction, direct proportion, and reverse proportion. This paper shows the results for iris data.
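
As a toy illustration of the basic idea (not Tsukimoto's polynomial-time algorithm), the sketch below approximates one sigmoid unit over binary inputs by a Boolean truth table via brute-force enumeration, which is exactly the exponential step the paper's method avoids. The example weights are made up.

```python
import numpy as np
from itertools import product

def boolean_approximation(weights, bias, threshold=0.5):
    """Approximate a sigmoid unit over {0,1} inputs by a Boolean truth table.

    Brute-force version for illustration only: it enumerates all 2^n input
    combinations, the exponential cost that a practical (polynomial) rule
    extraction algorithm must avoid.
    """
    n = len(weights)
    table = {}
    for bits in product([0, 1], repeat=n):
        act = 1.0 / (1.0 + np.exp(-(np.dot(weights, bits) + bias)))
        table[bits] = act >= threshold
    return table

# A unit that roughly encodes the rule "first input AND NOT second input".
table = boolean_approximation(np.array([4.0, -4.0]), bias=-2.0)
for bits, out in table.items():
    print(bits, "->", out)
```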

Journal ArticleDOI
TL;DR: A method for determining the parameters of genetic regulatory networks, given expression level time series data, is introduced and evaluated using artificial data and applied to a set of actual expression data from the development of rat central nervous system.
Abstract: We have modeled genetic regulatory networks in the framework of continuous-time recurrent neural networks. A method for determining the parameters of such networks, given expression level time series data, is introduced and evaluated using artificial data. The method is also applied to a set of actual expression data from the development of rat central nervous system.
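
A minimal sketch of the continuous-time recurrent neural network dynamics typically used in this kind of model, integrated with Euler's method. The equation form is the standard CTRNN one; the paper's parameter-estimation procedure is not reproduced, and the two-gene example is invented.

```python
import numpy as np

def ctrnn_simulate(W, tau, theta, x0, steps=200, dt=0.1):
    """Euler integration of a continuous-time RNN:

        tau_i dx_i/dt = -x_i + sum_j W_ij * sigmoid(x_j - theta_j)

    Here x_i plays the role of the expression level of gene i, and W_ij the
    regulatory influence of gene j on gene i (signs give activation/repression).
    """
    x = np.array(x0, dtype=float)
    trajectory = [x.copy()]
    for _ in range(steps):
        s = 1.0 / (1.0 + np.exp(-(x - theta)))
        x = x + dt * (-x + W @ s) / tau
        trajectory.append(x.copy())
    return np.array(trajectory)

# Two genes: gene 0 activates gene 1, gene 1 represses gene 0.
W = np.array([[0.0, -2.0],
              [2.0,  0.0]])
traj = ctrnn_simulate(W, tau=np.array([1.0, 1.5]), theta=np.zeros(2), x0=[0.5, 0.0])
print(traj[-1])
```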

Journal ArticleDOI
TL;DR: This paper presents a continuous-time recurrent neural-network model for nonlinear optimization with any continuously differentiable objective function and bound constraints and shows that the recurrent neural network is globally exponentially stable for almost any positive network parameters.
Abstract: This paper presents a continuous-time recurrent neural-network model for nonlinear optimization with any continuously differentiable objective function and bound constraints. Quadratic optimization with bound constraints is a special problem which can be solved by the recurrent neural network. The proposed recurrent neural network has the following characteristics. 1) It is regular in the sense that any optimum of the objective function with bound constraints is also an equilibrium point of the neural network. If the objective function to be minimized is convex, then the recurrent neural network is complete in the sense that the set of optima of the function with bound constraints coincides with the set of equilibria of the neural network. 2) The recurrent neural network is primal and quasiconvergent in the sense that its trajectory cannot escape from the feasible region and will converge to the set of equilibria of the neural network for any initial point in the feasible bound region. 3) The recurrent neural network has an attractivity property in the sense that its trajectory will eventually converge to the feasible region for any initial states even at outside of the bounded feasible region. 4) For minimizing any strictly convex quadratic objective function subject to bound constraints, the recurrent neural network is globally exponentially stable for almost any positive network parameters. Simulation results are given to demonstrate the convergence and performance of the proposed recurrent neural network for nonlinear optimization with bound constraints.
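
One common way to realize such a network is with projected gradient dynamics over the feasible box; the sketch below uses that formulation as an assumption rather than the paper's exact model, and integrates it with Euler's method on a small bound-constrained quadratic.

```python
import numpy as np

def bound_constrained_rnn(grad_f, lower, upper, x0, dt=0.01, steps=5000):
    """Continuous-time dynamics  dx/dt = -x + P(x - grad_f(x)),

    where P clips to the box [lower, upper]. Equilibria of the dynamics
    satisfy the optimality conditions of  min f(x)  s.t.  lower <= x <= upper.
    """
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        proj = np.clip(x - grad_f(x), lower, upper)
        x = x + dt * (-x + proj)
    return x

# Strictly convex quadratic: f(x) = 0.5 x'Qx - c'x with bounds [0, 1].
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
c = np.array([4.0, -1.0])
grad = lambda x: Q @ x - c
print(bound_constrained_rnn(grad, 0.0, 1.0, x0=[0.5, 0.5]))   # converges to [1, 0]
```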

Journal ArticleDOI
TL;DR: It was found that certain architectures are better able to learn an appropriate grammar than others, and the extraction of rules in the form of deterministic finite state automata is investigated.
Abstract: This paper examines the inductive inference of a complex grammar with neural networks and specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky (1956), in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. How a recurrent neural network could possess linguistic capability and the properties of various common recurrent neural network architectures are discussed. The problem exhibits training behavior which is often not present with smaller grammars and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient descent backpropagation-through-time training algorithm, significant learning was possible. It was found that certain architectures are better able to learn an appropriate grammar. The operation of the networks and their training is analyzed. Finally, the extraction of rules in the form of deterministic finite state automata is investigated.

Journal ArticleDOI
TL;DR: It is proved that the proposed neural network can converge globally to the solution set of the problem when the matrix involved in the problem is positive semidefinite and can converge exponentially to a unique solution when the Matrix is positive definite.

Journal ArticleDOI
01 Feb 2000
TL;DR: An approach to model reference adaptive control based on neural networks is proposed and analyzed for a class of first-order continuous-time nonlinear dynamical systems and results showing the feasibility and performance are given.
Abstract: In this paper, an approach to model reference adaptive control based on neural networks is proposed and analyzed for a class of first-order continuous-time nonlinear dynamical systems. The controller structure can employ either a radial basis function network or a feedforward neural network to adaptively compensate for the nonlinearities in the plant. A stable controller-parameter adjustment mechanism, which is determined using the Lyapunov theory, is constructed using a σ-modification-type updating law. The evaluation of control error in terms of the neural network learning error is performed. That is, the control error converges asymptotically to a neighborhood of zero, whose size is evaluated and depends on the approximation error of the neural network. In the design and analysis of neural network-based control systems, it is important to take into account the neural network learning error and its influence on the control error of the plant. Simulation results showing the feasibility and performance of the proposed approach are given.
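
A hedged simulation sketch of a σ-modification-type adaptive law for a scalar plant: an RBF network adaptively compensates an unknown nonlinearity while a reference model defines the desired behavior. The plant, reference model, gains, and RBF centers are all illustrative assumptions, not the system analyzed in the paper.

```python
import numpy as np

def rbf_features(x, centers, width=1.0):
    """Gaussian radial basis functions evaluated at the scalar state x."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

# Unknown plant nonlinearity (the controller never sees this expression).
f_true = lambda x: np.sin(x)
# Reference model: xm_dot = -a_m * xm + a_m * r.
a_m = 2.0

centers = np.linspace(-3.0, 3.0, 11)
theta = np.zeros_like(centers)            # adaptive weights of the RBF network
gamma, sigma_mod = 20.0, 0.05             # adaptation gain and sigma-modification gain
x, xm, dt = 0.0, 0.0, 0.001

for k in range(20000):
    r = np.sin(0.002 * k)                 # slowly varying reference signal
    phi = rbf_features(x, centers)
    u = -a_m * x + a_m * r - theta @ phi  # network term compensates the nonlinearity
    e = x - xm                            # control (tracking) error
    # sigma-modification: the leakage term -sigma*theta keeps the weights bounded
    # even in the presence of network approximation error.
    theta = theta + dt * (gamma * e * phi - sigma_mod * gamma * theta)
    x = x + dt * (f_true(x) + u)          # plant: x_dot = f(x) + u
    xm = xm + dt * (-a_m * xm + a_m * r)  # reference model

print("final tracking error:", x - xm)
```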

Journal ArticleDOI
TL;DR: In this article, artificial neural networks (ANNs) were utilized for predicting runoff over three medium-sized watersheds in Kansas, and the performances of ANNs possessing different architectures and recurrent neural networks were evaluated by comparison with other empirical approaches.
Abstract: Prediction of a watershed runoff resulting from precipitation events is of great interest to hydrologists. The nonlinear response of a watershed (in terms of runoff) to rainfall events makes the problem very complicated. In addition, spatial heterogeneity of various physical and geomorphological properties of a watershed cannot be easily represented in physical models. In this study, artificial neural networks (ANNs) were utilized for predicting runoff over three medium-sized watersheds in Kansas. The performances of ANNs possessing different architectures and recurrent neural networks were evaluated by comparisons with other empirical approaches. Monthly precipitation and temperature formed the inputs, and monthly average runoff was chosen as the output. The issues of overtraining and influence of derived inputs were addressed. It appears that a direct use of feedforward neural networks without time-delayed input may not provide a significant improvement over other regression techniques. However, inclusion of feedback with recurrent neural networks generally resulted in better performance.

Journal ArticleDOI
TL;DR: Existence and uniqueness of equilibrium, as well as its stability and instability, of a continuous-time Hopfield neural network are studied and a set of new and simple sufficient conditions are derived.
Abstract: Existence and uniqueness of equilibrium, as well as its stability and instability, of a continuous-time Hopfield neural network are studied. A set of new and simple sufficient conditions are derived.
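
For context, a small sketch of the continuous-time Hopfield dynamics whose equilibria such results concern. The specific sufficient conditions of the paper are not encoded here; the weights below are simply chosen small enough that the simulation settles to a unique equilibrium.

```python
import numpy as np

def hopfield_simulate(W, I, u0, dt=0.01, steps=2000, beta=1.0):
    """Euler integration of a continuous-time Hopfield network:

        du_i/dt = -u_i + sum_j W_ij * tanh(beta * u_j) + I_i

    Whether the network has a unique, globally stable equilibrium depends on
    conditions on W and beta, which is what sufficient conditions of the kind
    derived in the paper address.
    """
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u = u + dt * (-u + W @ np.tanh(beta * u) + I)
    return u

W = np.array([[0.0, 0.4],
              [0.4, 0.0]])
print(hopfield_simulate(W, I=np.array([0.1, -0.2]), u0=[1.0, -1.0]))
```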

Journal ArticleDOI
TL;DR: The capability of recurrent neural networks to approximate functions from lists of real vectors to a real vector space is examined, and bounds on the resources sufficient for an approximation can be derived in interesting cases.

Journal ArticleDOI
TL;DR: A new macromodeling approach is developed in which a recurrent neural network (RNN) is trained to learn the dynamic responses of nonlinear microwave circuits to provide fast prediction of the full analog behavior of the original circuit.
Abstract: A new macromodeling approach is developed in which a recurrent neural network (RNN) is trained to learn the dynamic responses of nonlinear microwave circuits. Input and output waveforms of the original circuit are used as training data. A training algorithm based on backpropagation through time is developed. Once trained, the RNN macromodel provides fast prediction of the full analog behavior of the original circuit, which can be useful for high-level simulation and optimization. Three practical examples of macromodeling a power amplifier, mixer, and MOSFET are used to demonstrate the validity of the proposed macromodeling approach.

Journal ArticleDOI
TL;DR: In this article, a particular class of n-node recurrent neural networks (RNNs) is studied, and it is shown that for a well-defined set of parameters, every orbit of the RNN is asymptotic to a periodic orbit.
Abstract: We study a particular class of n-node recurrent neural networks (RNNs). In the 3-node case we use monotone dynamical systems theory to show, for a well-defined set of parameters, that, generically, every orbit of the RNN is asymptotic to a periodic orbit. We then investigate whether RNNs of this class can adapt their internal parameters so as to "learn" and then replicate autonomously (in feedback) certain external periodic signals. Our learning algorithm is similar to the identification algorithms in adaptive control theory. The main feature of the algorithm is that global exponential convergence of parameters is guaranteed. We also obtain partial convergence results in the n-node case.

Journal ArticleDOI
TL;DR: An effective algorithm for extracting M-of-N rules from trained feedforward neural networks is proposed and the rules extracted are surprisingly simple and accurate.
Abstract: An effective algorithm for extracting M-of-N rules from trained feedforward neural networks is proposed. First, we train a network where each input of the data can only have one of the two possible values, -1 or 1. Next, we apply the hyperbolic tangent function to each connection from the input layer to the hidden layer of the network. By applying this squashing function, the activation values at the hidden units are effectively computed as the hyperbolic tangent (or the sigmoid) of the weighted inputs, where the weights have magnitudes equal to one. By restricting the inputs and the weights to binary values, either -1 or 1, the extraction of M-of-N rules from the networks becomes trivial. We demonstrate the effectiveness of the proposed algorithm on several widely tested datasets. For datasets consisting of thousands of patterns with many attributes, the rules extracted by the algorithm are simple and accurate.
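
A sketch of why the ±1 restriction makes extraction easy: with inputs and weights in {-1, +1}, the unit's net input depends only on how many inputs agree in sign with their weights, so the unit is equivalent to an M-of-N rule. The helper names and the example unit are assumptions; the paper's training procedure is not shown.

```python
import numpy as np
from itertools import product

def extract_m_of_n(weights, bias):
    """Read off an M-of-N rule from a unit with weights in {-1, +1} acting on
    inputs in {-1, +1}.

    With k inputs agreeing in sign with their weights, the net input is
    2k - N + bias, so the unit is active (tanh output > 0) exactly when
    k >= M, with M computed below.
    """
    N = len(weights)
    literals = [f"x{i}=+1" if w > 0 else f"x{i}=-1" for i, w in enumerate(weights)]
    M = int(np.floor((N - bias) / 2)) + 1
    return M, literals

def unit_active(weights, bias, x):
    return np.tanh(np.dot(weights, x) + bias) > 0

weights, bias = np.array([1.0, -1.0, 1.0]), -1.0
M, literals = extract_m_of_n(weights, bias)
print(f"rule: at least {M} of {literals}")

# Verify the extracted rule agrees with the unit on every +/-1 input pattern.
for x in product([-1, 1], repeat=len(weights)):
    agree = sum(1 for w, xi in zip(weights, x) if w * xi > 0)
    assert bool(unit_active(weights, bias, np.array(x))) == (agree >= M)
print("rule matches the unit on all inputs")
```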

Proceedings ArticleDOI
07 Mar 2000
TL;DR: The application of the wavelet neural networks (WNNs) to short-term load forecasting is reported and the results have been compared with an artificial neural network and show an improved forecast with fast convergence.
Abstract: The application of the wavelet neural networks (WNNs) to short-term load forecasting is reported in this work. The wavelet neural network has much better generalization ability and faster learning convergence than a multilayer feedforward neural network. The Morlet wavelet has been chosen in this study as the activation function. The 3-layer backpropagation algorithm is used to train the network by learning the nonlinear relationship between input and output of the network. The input data consist of historical load and weather information collected over a period of two years (1994-1995) to train the network, and data from one year (1996) are used to test the network. The results of the network have been compared with those of an artificial neural network and show an improved forecast with fast convergence.
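
A minimal forward-pass sketch of a one-hidden-layer wavelet neural network with a Morlet activation of the commonly used form cos(1.75t)·exp(-t²/2). The dimensions, random parameters, and the exact wavelet constant are assumptions, and the load-forecasting training loop is not reproduced.

```python
import numpy as np

def morlet(t):
    """Real Morlet wavelet, a common choice of activation in wavelet networks."""
    return np.cos(1.75 * t) * np.exp(-t ** 2 / 2.0)

def wnn_forward(x, weights_in, translations, dilations, weights_out, bias_out):
    """One-hidden-layer wavelet neural network.

    Each hidden unit applies the Morlet wavelet to a translated and dilated
    version of its weighted input; the output layer is linear.
    """
    z = weights_in @ x                              # weighted inputs to hidden units
    h = morlet((z - translations) / dilations)      # wavelet activations
    return weights_out @ h + bias_out

# Illustrative dimensions: 4 inputs (e.g. lagged load and temperature), 6 wavelons.
rng = np.random.default_rng(0)
Wi = rng.standard_normal((6, 4))
t, d = rng.standard_normal(6), np.abs(rng.standard_normal(6)) + 0.5
Wo, b = rng.standard_normal(6), 0.0
print(wnn_forward(rng.standard_normal(4), Wi, t, d, Wo, b))
```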

Journal ArticleDOI
TL;DR: Numerical results show that the RNN proposed here is a very powerful tool for predicting chaotic time series.
Abstract: The recurrent neural network (RNN) is introduced as a new method to predict chaotic time series. The effectiveness of using an RNN for making one-step and multi-step predictions is tested on remarkably few data points from computer-generated chaotic time series. Numerical results show that the RNN proposed here is a very powerful tool for predicting chaotic time series.

Journal ArticleDOI
TL;DR: This paper presents a genetic algorithm capable of obtaining not only the optimal topology of a recurrent neural network but also the minimum number of connections necessary; the method is applied to a problem of grammatical inference using neural networks, with very good results.

Journal ArticleDOI
01 Aug 2000
TL;DR: This paper attempts to define hysteretic memory (rate-independent memory), examines whether or not it can be modeled in neural networks, and proposes a novel neural cell based on a notion related to the submemory pool, which accumulates the stimulus and ultimately assists neural networks in modeling hysteresis.
Abstract: Hysteresis is a unique type of dynamics, which contains an important property, rate-independent memory. In addition to other memory-related studies such as time delay neural networks, recurrent networks, and reinforcement learning, rate-independent memory deserves further attention owing to its potential applications. In this paper, we attempt to define hysteretic memory (rate-independent memory) and examine whether or not it could be modeled in neural networks. Our analysis results demonstrate that other memory-related mechanisms are not hysteresis systems. A novel neural cell, referred to herein as the propulsive neural unit, is then proposed. The proposed cell is based on a notion related to the submemory pool, which accumulates the stimulus and ultimately assists neural networks in modeling hysteresis. In addition to training by backpropagation, a combination of such cells can simulate given hysteresis trajectories.

Journal ArticleDOI
TL;DR: The dynamics in recurrent neural networks that process context-free languages can also be employed in processing some context-sensitive languages, and this continuity of mechanism between language classes contributes to the understanding of neural networks in modelling language learning and processing.
Abstract: Continuous-valued recurrent neural networks can learn mechanisms for processing context-free languages. The dynamics of such networks is usually based on damped oscillation around fixed points in state space and requires that the dynamical components are arranged in certain ways. It is shown that qualitatively similar dynamics with similar constraints hold for a^nb^nc^n, a context-sensitive language. The additional difficulty with a^nb^nc^n, compared with the context-free language a^nb^n, consists of 'counting up' and 'counting down' letters simultaneously. The network solution is to oscillate in two principal dimensions, one for counting up and one for counting down. This study focuses on the dynamics employed by the sequential cascaded network, in contrast to the simple recurrent network, and the use of backpropagation through time. Found solutions generalize well beyond the training data; however, learning is not reliable. The contribution of this study lies in demonstrating how the dynamics in recurrent neural networks that process context-free languages can also be employed in processing some context-sensitive languages (traditionally thought of as requiring additional computation resources). This continuity of mechanism between language classes contributes to our understanding of neural networks in modelling language learning and processing.

Proceedings ArticleDOI
27 Jul 2000
TL;DR: The paper discusses the problems with using gradient descent to train product unit neural networks, and shows that particle swarm optimization, genetic algorithms and LeapFrog are efficient alternatives to successfully train product units.
Abstract: Product units in the hidden layer of multilayer neural networks provide a powerful mechanism for neural networks to efficiently learn higher-order combinations of inputs. Training product unit networks using local optimization algorithms is difficult due to an increased number of local minima and increased chances of network paralysis. The paper discusses the problems with using gradient descent to train product unit neural networks, and shows that particle swarm optimization, genetic algorithms and LeapFrog are efficient alternatives to successfully train product unit neural networks.
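
A short sketch of what a product unit computes and why its error surface is awkward for gradient descent; the implementation via exp(W·log x) and the positivity restriction on inputs are simplifying assumptions.

```python
import numpy as np

def product_unit_layer(x, W):
    """Hidden layer of product units: unit j computes prod_i x_i ** W[j, i].

    Implemented as exp(W @ log(x)). The trainable exponents make the error
    surface highly non-convex, which is one reason gradient descent struggles
    and global methods (PSO, genetic algorithms, LeapFrog) are attractive.
    Inputs are assumed positive here; handling negative inputs needs an
    explicit sign/phase term.
    """
    return np.exp(W @ np.log(x))

# A single product unit with exponents (2, -1) computes x0**2 / x1,
# a higher-order combination that an ordinary summing unit cannot represent directly.
W = np.array([[2.0, -1.0]])
x = np.array([3.0, 4.0])
print(product_unit_layer(x, W))   # -> [2.25]
```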

Journal ArticleDOI
TL;DR: The authors do not claim this neural network control algorithm is the best approach to the problem, nor that it is better than a fuzzy controller; rather, it is a contribution to the scientific dialogue about the boundary between the two overlapping disciplines.
Abstract: The ball-and-beam problem is a benchmark for testing control algorithms. Zadeh proposed (1994) a twist to the problem, which, he suggested, would require a fuzzy logic controller. This experiment uses a beam, partially covered with a sticky substance, increasing the difficulty of predicting the ball's motion. We complicated this problem even more by not using any information concerning the ball's velocity. Although it is common to use the first differences of the ball's consecutive positions as a measure of velocity and explicit input to the controller, we preferred to exploit recurrent neural networks, inputting only consecutive positions instead. We have used truncated backpropagation through time with the node-decoupled extended Kalman filter (NDEKF) algorithm to update the weights in the networks. Our best neurocontroller uses a form of approximate dynamic programming called an adaptive critic design. A hierarchy of such designs exists. Our system uses dual heuristic programming (DHP), an upper-level design. To our best knowledge, our results are the first use of DHP to control a physical system. It is also the first system we know of to respond to Zadeh's challenge. We do not claim this neural network control algorithm is the best approach to this problem, nor do we claim it is better than a fuzzy controller. It is instead a contribution to the scientific dialogue about the boundary between the two overlapping disciplines.