


Learning State Space Trajectories
in Recurrent Neural Networks
Barak A. Pearlmutter
December 31, 1988
CMU-CS-88-191
Abstract
We describe a number of procedures for finding $\partial E/\partial w_{ij}$, where $E$ is an error functional of the temporal trajectory of the states of a continuous recurrent network and $w_{ij}$ are the weights of that network. Computing these quantities allows one to perform gradient descent in the weights to minimize $E$, so these procedures form the kernels of connectionist learning algorithms. Simulations in which networks are taught to move through limit cycles are shown. We also describe a number of elaborations of the basic idea, such as mutable time delays and teacher forcing, and conclude with a complexity analysis. This type of network seems particularly suited for temporally continuous domains, such as signal processing, control, and speech.
This research was sponsored in part by National Science Foundation grant EET-8716324 and by the Office of Naval
Research under contract number N00014-86-K-0678. Barak Pearlmutter is a Fannie and John Hertz Foundation fellow.
The views and conclusions contained in this document are those of the author and should not be interpreted as representing
the official policies, either expressed or implied, of NSF, ONR, the Fannie and John Hertz Foundation or the U.S.
Government.

1 Introduction

Note: this is an expanded version of an earlier paper of the same title [9].
Pineda [11] has shown how to train the fixpoints of a recurrent temporally continuous generalization of backpropagation networks [8,12,14]. Such networks are governed by the coupled differential equations

$$T_i \frac{dy_i}{dt} = -y_i + \sigma(x_i) + I_i \qquad (1)$$

where

$$x_i = \sum_j w_{ji}\, y_j \qquad (2)$$

is the total input to unit $i$, $y_i$ is the state of unit $i$, $T_i$ is the time constant of unit $i$, $\sigma$ is an arbitrary differentiable function (typically $\sigma(\xi) = (1 + e^{-\xi})^{-1}$, in which case $\sigma'(\xi) = \sigma(\xi)(1 - \sigma(\xi))$), $w_{ij}$ are the weights, and the initial conditions $y_i(t_0)$ and driving functions $I_i(t)$ are the inputs to the system.
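
For concreteness, here is a minimal NumPy sketch of these dynamics under the first-order discretization developed in section 2. The network size, weights, and input function are whatever the caller supplies; all names here are ours rather than the report's.

    import numpy as np

    def sigma(x):
        # The usual squashing function, sigma(xi) = 1 / (1 + exp(-xi)).
        return 1.0 / (1.0 + np.exp(-x))

    def forward(w, T, y0, I, dt, n_steps):
        # Integrate T_i dy_i/dt = -y_i + sigma(x_i) + I_i, where
        # x_i = sum_j w_ji y_j; w[i, j] is the weight from unit i to unit j,
        # and I(t) returns the vector of external inputs at time t.
        y = y0.copy()
        traj = [y.copy()]
        for n in range(n_steps):
            x = w.T @ y                                  # x_j = sum_i w_ij y_i
            y = y + (dt / T) * (-y + sigma(x) + I(n * dt))
            traj.append(y.copy())
        return np.array(traj)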
Consider minimizing $E(\mathbf{y})$, some functional of the trajectory taken by $\mathbf{y}$ between $t_0$ and $t_1$. For instance, $E = \int_{t_0}^{t_1} (y_0(t) - f(t))^2\, dt$ measures the deviation of $y_0$ from the function $f$, and minimizing this $E$ would teach the network to have $y_0$ imitate $f$. Below, we develop a technique for computing $\partial E(\mathbf{y})/\partial w_{ij}$ and $\partial E(\mathbf{y})/\partial T_i$, thus allowing us to do gradient descent in the weights and time constants so as to minimize $E$. The computation of $\partial E/\partial w_{ij}$ seems to require a phase in which the network is run backwards in time, and tricks for avoiding this are also described.
2 A Forward/Backward Technique
We can approximate the derivative in (1) with

$$\frac{dy_i}{dt}(t) \approx \frac{\tilde{y}_i(t + \Delta t) - \tilde{y}_i(t)}{\Delta t} \qquad (3)$$

which yields a first-order difference approximation to (1),

$$\tilde{y}_i(t + \Delta t) = \left(1 - \frac{\Delta t}{T_i}\right)\tilde{y}_i(t) + \frac{\Delta t}{T_i}\left(\sigma(\tilde{x}_i(t)) + \tilde{I}_i(t)\right). \qquad (4)$$

We use tildes to indicate temporally discretized versions of continuous functions. The notation $\tilde{y}_i(t)$ is being used as shorthand for the particular variable representing the discrete version of $y_i(t_0 + n\Delta t)$, where $n$ is an integer and $t = t_0 + n\Delta t$.
Let us define

$$\tilde{e}_i(t) = \frac{1}{\Delta t}\frac{\partial E}{\partial \tilde{y}_i(t)}. \qquad (5)$$

In the usual case $E$ is of the form $\int_{t_0}^{t_1} f(\mathbf{y}(t), t)\, dt$, so $e_i(t) = \partial f(\mathbf{y}(t), t)/\partial y_i(t)$. Intuitively, $\tilde{e}_i(t)$ measures how much a small change to $y_i$ at time $t$ affects $E$ if everything else is left unchanged.

Figure 1: The infinitesimal changes to $y$ considered in $\tilde{e}_i(t)$ (left) and $\tilde{z}_i(t)$ (right).
Let us define

$$\tilde{z}_i(t) = \frac{\partial^+ E}{\partial \tilde{y}_i(t)} \qquad (6)$$

where the $\partial^+$ denotes an ordered derivative [15], with variables ordered by time. Intuitively, $\tilde{z}_i(t)$ measures how much a small change to $y_i$ at time $t$ affects $E$ when this change is propagated forward through time and influences the remainder of the trajectory, as in figure 1. Of course, $z_i$ is the limit of $\tilde{z}_i$ as $\Delta t \to 0$.
We can use the chain rule for ordered derivatives to calculate $\tilde{z}_i(t)$ in terms of the $\tilde{z}_j(t + \Delta t)$. According to the chain rule, we add all the separate influences that varying $\tilde{y}_i(t)$ has on $E$. It has a direct contribution of $\Delta t\, \tilde{e}_i(t)$, which comprises the first term of our equation for $\tilde{z}_i(t)$. Varying $\tilde{y}_i(t)$ by $\epsilon$ has an effect on $\tilde{y}_i(t + \Delta t)$ of $\epsilon(1 - \Delta t/T_i)$, giving us a second term, namely $(1 - \Delta t/T_i)\, \tilde{z}_i(t + \Delta t)$.

Each weight $w_{ij}$ allows $\tilde{y}_i(t)$ to influence $\tilde{y}_j(t + \Delta t)$. Let us compute this influence in stages: varying $\tilde{y}_i(t)$ by $\epsilon$ varies $\tilde{x}_j(t)$ by $\epsilon w_{ij}$, which varies $\sigma(\tilde{x}_j(t))$ by $\epsilon w_{ij} \sigma'(\tilde{x}_j(t))$, which varies $\tilde{y}_j(t + \Delta t)$ by $\epsilon w_{ij} \sigma'(\tilde{x}_j(t)) \Delta t / T_j$. This gives us our third and final term, $\sum_j w_{ij}\, \sigma'(\tilde{x}_j(t)) (\Delta t / T_j)\, \tilde{z}_j(t + \Delta t)$.

Combining these,

$$\tilde{z}_i(t) = \Delta t\, \tilde{e}_i(t) + \left(1 - \frac{\Delta t}{T_i}\right)\tilde{z}_i(t + \Delta t) + \sum_j w_{ij}\, \sigma'(\tilde{x}_j(t))\, \frac{\Delta t}{T_j}\, \tilde{z}_j(t + \Delta t). \qquad (7)$$
If we put this in the form of (3) and take the limit as $\Delta t \to 0$ we obtain the differential equation

$$\frac{dz_i}{dt} = \frac{1}{T_i} z_i - e_i - \sum_j \frac{1}{T_j}\, w_{ij}\, \sigma'(x_j)\, z_j. \qquad (8)$$

For boundary conditions note that by (5) and (6) $\tilde{z}_i(t_1) = \Delta t\, \tilde{e}_i(t_1)$, so in the limit as $\Delta t \to 0$ we have $z_i(t_1) = 0$.
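
In code, one sweep of the recurrence (7) might look like the following sketch, in the same hypothetical conventions as the earlier forward sketch; x_t and e_t are the stored total input and injected error at time t, and sigma is the logistic function, so $\sigma' = \sigma(1 - \sigma)$.

    def backward_step(z_next, e_t, x_t, w, T, dt):
        # Recurrence (7): compute z_i(t) from z_i(t + dt), e_i(t), x_i(t).
        s = sigma(x_t)
        sp = s * (1.0 - s)                       # sigma'(x_j(t))
        return (dt * e_t
                + (1.0 - dt / T) * z_next
                + w @ (sp * (dt / T) * z_next))  # sum_j w_ij sigma'(x_j) ...

Starting from the boundary value $\tilde{z}_i(t_1) = \Delta t\, \tilde{e}_i(t_1)$ and applying this step repeatedly carries $z$ from $t_1$ back to $t_0$.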
Consider making an infinitesimal change $dw_{ij}$ to $w_{ij}$ for a period $\Delta t$ starting at $t$. This will cause a corresponding infinitesimal change in $E$ of

$$\Delta t\, y_i(t)\, \sigma'(x_j(t))\, \frac{1}{T_j}\, z_j(t)\, dw_{ij}.$$

Figure 2: A lattice representation of (4).
Since we wish to know the effect of making this infinitesimal change to $w_{ij}$ throughout time, we integrate over the entire interval, yielding

$$\frac{\partial E}{\partial w_{ij}} = \frac{1}{T_j} \int_{t_0}^{t_1} y_i\, \sigma'(x_j)\, z_j\, dt. \qquad (9)$$
If we substitute $\rho_i = T_i^{-1}$ into (4), find $\partial E/\partial \rho_i$ by proceeding analogously, and substitute $T_i$ back in, we get

$$\frac{\partial E}{\partial T_i} = \frac{1}{T_i} \int_{t_0}^{t_1} z_i\, \frac{dy_i}{dt}\, dt. \qquad (10)$$
One can also derive (8), (9) and (10) using the calculus
of variations and Lagrange multipliers (William Skaggs,
personal communication), or from the continuous form of
dynamic programming [5].
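
For completeness, here is a sketch of that variational route (our reconstruction; the text does not spell it out). Adjoining the constraint (1) with multipliers $\lambda_i(t)$ gives the Lagrangian

$$\mathcal{L} = \int_{t_0}^{t_1} f\, dt + \sum_i \int_{t_0}^{t_1} \lambda_i \left( T_i \frac{dy_i}{dt} + y_i - \sigma(x_i) - I_i \right) dt.$$

Integrating the $\lambda_i T_i\, dy_i/dt$ term by parts and setting $\delta\mathcal{L}/\delta y_i = 0$ yields

$$T_i \frac{d\lambda_i}{dt} = e_i + \lambda_i - \sum_j w_{ij}\, \sigma'(x_j)\, \lambda_j,$$

with the boundary term forcing $\lambda_i(t_1) = 0$. The substitution $\lambda_i = -z_i/T_i$ recovers (8) and its boundary condition, and $\partial\mathcal{L}/\partial w_{ij} = -\int \lambda_j\, \sigma'(x_j)\, y_i\, dt$ recovers (9).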
3 Simulation Results
Using first-order finite difference approximations, we integrated the system $\mathbf{y}$ forward from $t_0$ to $t_1$, set the boundary conditions $z_i(t_1) = 0$, and integrated the system $\mathbf{z}$ backwards from $t_1$ to $t_0$ while numerically integrating $z_j\, \sigma'(x_j)\, y_i$ and $z_i\, dy_i/dt$, thus computing $\partial E/\partial w_{ij}$ and $\partial E/\partial T_i$. Since computing $dz_j/dt$ requires knowing $\sigma'(x_j)$, we stored it and replayed it backwards as well. We also stored and replayed $y_i$, as it is used in expressions being numerically integrated.
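
The following sketch puts the pieces together, again with our names and with the stored quantities simply kept in Python lists rather than replayed from tape; e_fn(y, t) supplies $e_i(t)$, and the rectangle-rule quadrature matches the first-order accuracy of (4) and (7).

    def gradients(w, T, y0, I, e_fn, dt, n_steps):
        # Forward pass: integrate (4), storing y(t) and x(t) for replay.
        n = len(y0)
        y = y0.copy()
        ys, xs = [y.copy()], []
        for k in range(n_steps):
            x = w.T @ y
            xs.append(x)
            y = y + (dt / T) * (-y + sigma(x) + I(k * dt))
            ys.append(y.copy())
        # Backward pass from the boundary condition z_i(t1) = 0,
        # accumulating the integrands of (9) and (10) along the way.
        z = np.zeros(n)
        dE_dw = np.zeros_like(w)
        dE_dT = np.zeros(n)
        for k in reversed(range(n_steps)):
            s = sigma(xs[k])
            sp = s * (1.0 - s)                       # sigma'(x_j(t))
            z = (dt * e_fn(ys[k], k * dt)            # recurrence (7)
                 + (1.0 - dt / T) * z
                 + w @ (sp * (dt / T) * z))
            dE_dw += dt * np.outer(ys[k], sp * z / T)    # (9)
            dE_dT += z * (ys[k + 1] - ys[k]) / T         # (10); the dt's cancel
        return dE_dw, dE_dT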
We used the error functional

$$E = \frac{1}{2} \int_{t_0}^{t_1} \sum_i s_i(t) \left( y_i(t) - d_i(t) \right)^2 dt \qquad (11)$$

where $d_i(t)$ is the desired state of unit $i$ at time $t$ and $s_i(t)$ is the importance of unit $i$ achieving that state at that time.
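
For this functional the injected error is $e_i(t) = \partial f/\partial y_i = s_i(t)\,(y_i(t) - d_i(t))$. As a small sketch in the conventions above, with d and s hypothetical functions returning the desired-state and importance vectors:

    def make_e_fn(d, s):
        # e_i(t) = s_i(t) * (y_i(t) - d_i(t)), the derivative of the
        # integrand of (11) with respect to y_i(t).
        return lambda y, t: s(t) * (y - d(t))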
Throughout, we used $\sigma(\xi) = (1 + e^{-\xi})^{-1}$. Time constants were initialized to 1, weights were initialized to uniformly distributed random values between $-1$ and $1$, and the initial values $y_i(t_0)$ were set to $I_i(t_0) + \sigma(0)$. The simulator used the approximations (4) and (7) with $\Delta t = 0.1$.
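
In the terms of the earlier sketches, these settings correspond to something like the following (the network size is illustrative):

    rng = np.random.default_rng(0)
    n, dt = 4, 0.1                         # Delta-t = 0.1, as in the text
    T0 = np.ones(n)                        # time constants initialized to 1
    w0 = rng.uniform(-1.0, 1.0, (n, n))    # weights uniform in [-1, 1]
    # For a given input function I, the initial state is
    # y_i(t0) = I_i(t0) + sigma(0).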
All of these networks have an extra unit which has no
incoming connections, an external input of 0.5, and out-
going connections to all other units. This unit provides a
bias,
which is equivalent to the negative of a threshold.
This detail is suppressed below.
3.1 Exclusive Or
The network of figure 3 was trained to solve the XOR problem. Aside from the addition of time constants, the network topology was that used by Pineda in [11]. We defined $E = \sum_k \int_2^3 \left( y_0^{(k)}(t) - d^{(k)} \right)^2 dt$, where $k$ ranges over the four cases, $d^{(k)}$ is the correct output, and $y_0$ is the state of the output unit. The inputs to the net range over the four possible boolean combinations in the four different cases. With a suitable choice of step size and momentum, training time was comparable to that of standard backpropagation, averaging about one hundred epochs.
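
A hypothetical reconstruction of such a training run, built on the earlier sketches: a four-unit network with the inputs driving units 0 and 1, the output read from unit 2, the error window $2 \le t \le 3$, and the bias approximated by a constant 0.5 drive on unit 3 (the paper's bias unit has no incoming connections, which we do not enforce here); the step size and momentum are our guesses.

    t1 = 3.0
    n_steps = int(t1 / dt)
    cases = [(0.0, 0.0, 0.0), (0.0, 1.0, 1.0),
             (1.0, 0.0, 1.0), (1.0, 1.0, 0.0)]

    def case_gradient(a, b, target, w, T):
        I_k = lambda t: np.array([a, b, 0.0, 0.5])     # inputs + bias drive
        d = lambda t: np.array([0.0, 0.0, target, 0.0])
        s = lambda t: np.array([0.0, 0.0, 1.0, 0.0]) * (2.0 <= t <= 3.0)
        y0 = I_k(0.0) + sigma(np.zeros(4))             # y_i(t0) = I_i(t0) + sigma(0)
        return gradients(w, T, y0, I_k, make_e_fn(d, s), dt, n_steps)

    eta, mu = 0.1, 0.9                                 # step size, momentum (assumed)
    velocity = np.zeros_like(w0)
    for epoch in range(100):
        g = sum(case_gradient(a, b, tgt, w0, T0)[0] for a, b, tgt in cases)
        velocity = mu * velocity - eta * g             # gradient descent with momentum
        w0 = w0 + velocity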
For this task it is to the network's benefit for units to attain their final values as quickly as possible, so there was a tendency to lower the time constants towards 0. In an effort to avoid small time constants, which degrade the numerical accuracy of the simulation, we introduced a term to decay the time constants towards 1. This decay factor was not used in the other simulations described below, and was not really necessary in this task if a suitably small $\Delta t$ was used in the simulation.
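
The text does not give the form of this decay term; one plausible reading, as a sketch with assumed constants and with dE_dT computed per case as above, is:

    eta_T, lam = 0.05, 0.01     # step size and decay strength (assumed)
    # Gradient step on the time constants, plus decay pulling T towards 1.
    T0 = T0 - eta_T * dE_dT - lam * (T0 - 1.0)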
It is interesting that even for this binary task, the network
made use of dynamical behavior. After extensive training
the network behaved as expected, saturating the output unit
to the correct value. Earlier in training, however, we oc-
casionally (about one out of every ten training sessions)
observed the output unit at nearly the correct value be-
tween t = 2 and t = 3, but then saw it move in the wrong
direction at t = 3 and end up stabilizing at a wildly incor-
rect value. Another dynamic effect, which was present in
almost every run, is shown in figure 4. Here, the output
unit heads in the wrong direction initially and then corrects
itself before the error window. A very minor case of diving
towards the correct value and then moving away is seen in
the lower left hand corner of figure 4.
Figure 3: The XOR network (input, hidden, and output units).

References

Hopfield, J. J., and Tank, D. W. Neural computation of decisions in optimization problems. Biological Cybernetics, 1985.

Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. Optimization by simulated annealing. Science, 1983.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1. MIT Press, 1986.

Williams, R. J., and Zipser, D. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1989.