
ORIGINAL ARTICLE
A small spiking neural network with LQR control
applied to the acrobot
Lukasz Wiklendt · Stephan Chalup · Rick Middleton
Received: 8 July 2007 / Accepted: 8 April 2008
© Springer-Verlag London Limited 2008
Abstract This paper presents the results of a computer simulation which combined a small network of spiking neurons with linear quadratic regulator (LQR) control to solve the acrobot swing-up and balance task. To our knowledge, this task has not been previously solved with spiking neural networks. Input to the network was drawn from the state of the acrobot, and output was torque, either directly applied to the actuated joint, or via the switching of an LQR controller designed for balance. The neural network's weights were tuned using a (μ + λ)-evolution strategy without recombination, and neurons' parameters were chosen to roughly approximate biological neurons.
Keywords Spiking neural networks · Acrobot · LQR · Evolution
1 Introduction
We are studying the applicability of spiking neural net-
works (SNNs) to the control of robots with complex
nonlinear morphologies that lead to unstable states
requiring constant active feedback control. This is difficult
using conventional control due to the effort required in
modeling the robot’s equations of motion and deriving a
robust control scheme on the basis of that model.
Theoretical results of Maass [14] have shown that SNNs
(third generation NNs) are computationally more powerful
than standard sigmoid NNs (second generation NNs) or
networks of threshold units (perceptrons or first generation
NNs) provided certain conditions hold. Such conditions
require a delay-coded input which linearly maps an analog
input signal to the neuron’s fire time. As a first step we
chose not to implement these conditions since they are
biologically less plausible; but they will be considered for
future studies. In the present study we use rate-coded input
which maps an analog input signal to the rate at which the
neuron fires.
Additional motivation to investigate SNNs is that they
are better models to represent the spiking nature of bio-
logical neurons, which are used in the mechanical control
of biological systems. Recently, networks of 10,000
spiking neurons have been used in the large-scale imple-
mentations of the blue brain project [16] to model cortical
columns of the brain. However, spiking neurons not only
occur in the massive networks of the brain but also in
relatively small networks of the peripheral regions of the
nervous system such as reflex networks in the limbs [11],
which can be modeled with small networks in the order of
tens of neurons. It is an open question whether small net-
works of spiking neurons can successfully be employed to
control limb movement and reflexes of unstable robots,
such as for example, bi-ped robots. The aim of the present
paper is to shed light on these questions by applying small
SNNs to the control of an underactuated, simulated robot.
Primary work has been initiated on implementing SNNs
as controllers for robots governed by simple dynamics,
such as two wheeled robot cars, rather than underactuated
unstable robots governed by highly nonlinear dynamics.
Joshi and Maass [10] successfully used a SNN as the
‘liquid’ of a liquid state machine to control a fully actu-
ated two-link robot arm in the horizontal plane. The
controller was created by learning a filter which trans-
formed the state of the network to a control signal for the
robot’s motors, while the neurons and synaptic weights remained constant throughout the learning process.

L. Wiklendt (✉) · S. Chalup · R. Middleton
School of Electrical Engineering and Computer Science,
The University of Newcastle, Callaghan, NSW 2308, Australia
e-mail: lukasz.wiklendt@studentmail.newcastle.edu.au
Neural Comput & Applic
DOI 10.1007/s00521-008-0187-1

Floreano et al. [6] evolved only the structure of a SNN with ten
hidden neurons, while keeping the synaptic weights con-
stant. Even with their greatly simplified model of spiking
neurons, customized to fit the microcontroller on a small
sugar cube sized two-wheeled robot, the robot was able to
successfully navigate an oval track avoiding walls. French
and Damper [7] assembled a SNN out of two network
components, each evolved to achieve the particular subtask
‘frequency discrimination’ of an overall objective to
control a two-wheeled robot to drive towards flashing lights
of distinct frequencies. Their evolution strategy allowed for
networks of arbitrary cardinality, and neurons and synapses
of various models. Federici [5] created a novel algorithm,
which evolved the rules of development of a SNN that was
used to grow the network from a single cell to its final
structure. The technique was applied to a wall-avoidance
task for a two-wheel robot with infrared sensors.
Rather than applying SNNs to stable robotic platforms
such as the ones mentioned previously, we instead apply
them to an unstable robot with highly nonlinear dynamics.
The acrobot [19] is a two-link non-linear underactuated
robotic platform, commonly used as a benchmark for new
artificial intelligence approaches targeted at dynamics
control. A drawing of the acrobot is shown in Fig. 1. Its motion resembles that of a gymnast swinging on a horizontal bar, the difference being that unlike the gymnast the acrobot can spin its q_2 joint ‘hip’ through complete revolutions. The object of the task is to swing the acrobot from a hanging-down position to a standing-up position. This is quite difficult as only the q_2 joint is actuated. A more formal description of the acrobot and the swing-up task is given in Sect. 2.
One of the first to approach the acrobot swing-up control
problem was Spong [19]. He created two swing-up strate-
gies based on the partial feedback linearization of the
acrobot dynamics, one for each joint. He applied each
strategy with a linear quadratic regulator (LQR) to balance
the acrobot in the standing position, and both strategies
produced a successful swing-up and balance. Boone [3]
used an N-step lookahead search to select the appropriate
torque which first added the required amount of energy to
the acrobot which would allow it to reach its standing-up
position. Once this was achieved, the search objective was
changed to find a trajectory which would get the acrobot
close to the standing-up state. Torque output was only one
of two possible extreme values, either positive or negative
1 Nm, which is known as bang-bang control. Allowing
only two possible output values, and limiting the amount of
switching between these values reduced the search to a
practical size. Yoshimoto et al. [21] successfully applied
reinforcement learning to the task, which switched between
one of five conventional controllers.
The area of acrobot swing-up control is quite advanced
with many other existing successful techniques in addition
to those mentioned previously, including sigmoid NN
function approximation for reinforcement learning [4, 20],
evolving a non-feedback vector of torque values [12], a
fuzzy controller used to increase the acrobot’s energy [13],
output zeroing based on angular momentum and rotation
angle of center-of-mass [17]. Although the primary focus
of this study is the application of SNNs as control models
for complex robotic platforms, it has given some specific
insights into the techniques which may be used to improve
acrobot swing-up solutions.
This paper is organized as follows. Section 2 describes
the acrobot and the swing-up task used in this study. Sec-
tion 3 describes SNNs and the discrete time model that was
used for simulations. Section 4 describes the network
configuration and the LQR controller used in simulations.
Section 5 presents the evolution and simulation setup. A
discussion of the results is given in Sect. 6, with conclu-
sions following in Sect. 7.
2 The acrobot
The acrobot is composed of two links, an inner and outer link connected together by an actuated hinge joint, with their relative angle given by q_2. The inner link is anchored with an unactuated hinge joint, and subtends an angle of q_1 to the horizontal. The torque applied at the actuated hinge is given by τ_2, which is limited to |τ_2| ≤ τ_2max. Angular velocities of joints q_1 and q_2 are q̇_1 and q̇_2. The state of the acrobot at time t is given by the vector x(t) = (q_1(t) q_2(t) q̇_1(t) q̇_2(t))^T. There are two notable equilibria: an unstable one at q_u = (π/2 0 0 0)^T, the other stable at q_s = (−π/2 0 0 0)^T.

Acrobot Task: Given the initial state x(0) = q_s, find a control strategy which will get the acrobot to the final state x(T) = q_u, and keep it there as t → ∞.
Fig. 1 The acrobot showing directions for gravity, torque and joint
angles

In this study T = 20 s was chosen, after a series of pilot
experiments which showed this value to induce a favorable
learning rate without excessive computation time. The
acrobot task is considered a difficult control problem
because the acrobot is underactuated and has highly non-
linear dynamics.
The equations of motion governing the dynamics of the
acrobot are presented in [19]. Masses of the links are given by m_i, lengths by l_i, and inertia tensors by I_i, where i = 1, 2 for the inner and outer links, respectively. Acceleration due to gravity is given by g. The parameters used for simulation are listed in Table 1.
The acrobot is simulated using a fourth-order Runge-
Kutta integrator [18] with a time step of 1 millisecond.
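The integration step can be sketched as follows. This is a minimal generic fourth-order Runge-Kutta integrator of our own, not the authors' code, and the single-pendulum dynamics below are only a placeholder for the (much longer) acrobot equations of motion:

```python
import math

def rk4_step(f, t, x, dt):
    """Advance dx/dt = f(t, x) by one fourth-order Runge-Kutta step."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, [xi + dt / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f(t + dt / 2, [xi + dt / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f(t + dt,     [xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

# Placeholder dynamics: a single pendulum q'' = -(g/l) sin(q),
# standing in for the acrobot's equations of motion from [19].
g, l = 9.8, 1.0
def pendulum(t, x):          # x = [q, qdot]
    return [x[1], -(g / l) * math.sin(x[0])]

x, t, dt = [0.1, 0.0], 0.0, 0.001   # 1 ms time step, as in the paper
for _ in range(1000):               # simulate 1 s
    x = rk4_step(pendulum, t, x, dt)
    t += dt
```

With a 1 ms step the integrator conserves the pendulum's energy to well below 1e-6 over one simulated second, which is why such a small step is affordable here.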
3 Spiking networks
This section gives an introduction to SNNs, including the
models used in our simulations. The word ‘spiking’ will
often be omitted here when mentioning spiking neurons or
networks.
Neurons and synapses in a SNN are arranged as nodes and edges of a directed graph, respectively. Neurons send and receive data via the timing of spike events which are sent across synapses from a source neuron to a destination neuron. Each neuron i contains its own state u_i representing the exponentially decaying membrane potential of biological neurons. When a spike reaches a neuron then that neuron's membrane potential is momentarily raised (or lowered) in proportion to the postsynaptic potential function ε. If the membrane potential rises above the threshold θ then the neuron fires a spike to its destination neurons, and the neuron's membrane potential is decreased by η for a refraction period, thereby preventing more immediate firings.

In biological NNs spikes require a short amount of time to travel from their source neuron to the destination neurons. However, in this study, spikes are not subjected to an explicit synaptic delay. Synapses have a ‘weight’ value w_ij, which determines the amount a spike from a source neuron j increases a destination neuron's membrane potential u_i. It is these weights which are tuned during learning. In future studies tunable delays will also be used since they have the potential to increase the computational power of SNNs [14].
The leaky integrate-and-fire model for neurons [8, 9] is used in our simulations since it is concise, simple to implement and fast to simulate. To reduce the number of parameters that need tuning during learning all of the neurons are made homogeneous. The precise neuron model is shown in Eqs. (1)–(4).
u_i(t) = Σ_{t_i ∈ F_i(t)} η(t − t_i) + Σ_j Σ_{t_j ∈ F_j(t)} w_ij ε(t − t_j)    (1)

F_i(t) = { t_i | u_i(t_i) > θ, t_i < t }    (2)

η(s) = −m exp(−s/τ_q)    (3)

ε(s) = exp(−s/τ_l) − exp(−s/τ_r)    (4)
where F_i(t) contains all firing times for any neuron i that occur before time t, and m, τ_q, τ_l, τ_r are constants which specify the size and shape of the refraction function η and the postsynaptic potential function ε. This neuron model is approximated in discrete time to simplify implementation and allow for easy integration with the discrete time simulation of the acrobot.
We use a discrete-time model governing the dynamics of a neuron i, given by Eqs. (5)–(8).

v_i(t + Δt) = A v_i(t) + B u_i(t)    (5)

A = ( d_l  0    0
      0    d_r  0
      0    0    d_q )        B = ( 1  0
                                   1  0
                                   0  m )

u_i(t) = ( κ_i(t)
           f_i(t) )    (6)

f_i(t) = { 1, if (1 −1 −1) v_i(t) > θ
         { 0, otherwise    (7)

κ_i(t) = Σ_j w_ij f_j(t)    (8)
where d_l, d_r, d_q ∈ (0, 1), m > 0, and θ ≥ 0. The values d_l = 0.9, d_r = 0.8, d_q = 0.9, m = 1, and θ = 1 are used in the simulations presented in this study. Also, a time step of Δt = 1 ms is used, and t is an integer multiple of Δt. These values were chosen by hand so that the neurons imitate (quite roughly) dynamics such as firing rates and postsynaptic potential durations of biological neurons [11]. The values d_l, d_r and d_q can be derived from τ_l, τ_r and τ_q in Eqs. (3) and (4) with the following relation

d_a = exp(−Δt/τ_a), for a ∈ {l, r, q}.    (9)
The state of neuron i is given by the time-varying vector v_i = (v_1 v_2 v_3)^T, where v_i(0) = 0. The value of the inner product (1 −1 −1) v_i in Eq. (7) represents the membrane potential of neuron i, and θ is the threshold. When the membrane potential rises above the threshold then the neuron fires by setting f_i to 1, and the membrane potential is consequently reduced by the refraction constant m in the next step. The f_i term can be thought of as the spiking ‘output’ of neuron i.

Table 1 Acrobot parameters, where all units are in SI

m_1 = 1, m_2 = 1, l_1 = 1, l_2 = 1, I_1 = 1/12, I_2 = 1/12, g = 9.8, τ_2max = 10
The κ_i term in Eq. (8) is the ‘input’ to neuron i. The sum is over all source neurons of neuron i, and w_ij is the synaptic weight from neuron j to neuron i. If w_ij > 0 then a spike arriving at a neuron increases the neuron's membrane potential (with delayed onset since d_l > d_r > 0). This represents an excitatory post-synaptic potential and its practical purpose is to increase the chance of the neuron firing. If w_ij < 0 then an incoming spike reduces the membrane potential, which represents an inhibitory post-synaptic potential and decreases the chance of the neuron firing.
This model is used for the hidden neurons of the net-
work, that is, neurons that are connected only to other
neurons. Sensor and motor neurons used in our simulations
have a slightly different model as they must also receive
input from and send output to the environment.
3.1 Motor and sensor neurons
A hidden neuron can be used to model a motor neuron, where the motor neuron's analog output is given by

(1 −1 0) v_i(t).    (10)

However this is a verbose way to model a motor neuron, since the d_q and m parameters are no longer used because the neuron is never required to fire spikes.

A sensor neuron can be modeled by replacing A, B and u_i in Eqs. (5) and (6) with A_s, B_s and u_i^s respectively

A_s = ( 0  0  0
        0  0  0
        0  0  d_q )        B_s = ( 1  0
                                   0  0
                                   0  m )

u_i^s(t) = ( κ_i^s(t)
             f_i(t) )    (11)

where the analog input to a sensor neuron is set to κ_i^s, eliminating Eq. (8). For brevity the time-varying analog input to a sensor neuron will be referred to as κ. In simulations the constraint κ ≥ 0 is applied. A value of θ = 0 is used for all sensor neurons, since setting θ > 0 results in a dead-zone for inputs ≤ θ. This form of encoding analog input into spikes is commonly known as rate encoding, where the firing rate of a sensor neuron is proportional to the input. Another method for encoding is called delay encoding where the average firing rate remains constant and the precise time at which the neuron fires is proportional to the input.
A notable property of modeling sensor neurons this way is that their output scales either logarithmically or linearly, depending on the magnitude of the input. Assuming a constant input κ to a sensor neuron, the spiking rate of the neuron can be calculated by finding the time between spikes. The state of the sensor neuron is initialized as if it had just spiked, and the time it takes for the v_3 component of the state v to decay to the next spike is calculated using the equation d_q^y (m + κ) = κ, where y is the number of milliseconds between spikes. Therefore, the spike rate r (spikes per millisecond) of a sensor neuron for a constant input κ > 0 is given by the formula

r(κ) = 1 / log_{d_q}( κ / (m + κ) )    (12)

noting that d_q ∈ (0, 1) and m > 0. For large values of κ the approximation r(κ) ≈ kκ is observed, where the constant k > 0, since

lim_{κ→∞} r(mκ) / r(κ) = m    (13)

and for small values of κ the approximation r(κ) ≈ 1 / log_{d_q}(κ/m) is observed. This allows a sensor neuron to remain sensitive to tiny inputs by having a spiking rate that is inversely proportional to the log of the input, without an exponential increase in sensitivity between large inputs. Although these properties are derived for extreme values of κ, in practice we notice logarithmic proportions for κ as large as 0.01, and linear proportions for κ as small as 1, since d_q = 0.9 and m = 1 in our simulations. However, this property of sensor neurons may be disadvantageous, resulting in excessive output given small inputs. Such overly sensitive control is visible in the results, particularly during the acrobot's balance period (after approx. 3,300 ms) in the bottom plot of Fig. 5b.
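Eq. (12) is easy to evaluate numerically. The sketch below is our own illustration, using d_q = 0.9 and m = 1 as in the simulations, and shows the near-linear scaling for large inputs alongside the logarithmic fall-off for small ones:

```python
import math

d_q, m = 0.9, 1.0   # constants used in the paper's simulations

def spike_rate(kappa):
    """Spike rate in spikes/ms of a sensor neuron under constant input
    kappa > 0, from Eq. (12): r = 1 / log_{d_q}(kappa / (m + kappa))."""
    y = math.log(kappa / (m + kappa), d_q)   # ms between spikes
    return 1.0 / y

# Large inputs: doubling kappa roughly doubles the rate (linear regime)
ratio = spike_rate(200.0) / spike_rate(100.0)   # close to 2

# Small inputs: the rate tracks 1 / log_{d_q}(kappa / m) (logarithmic regime)
slow = spike_rate(0.01)
```

This reproduces the two regimes discussed above: for κ around 100–200 the rate is nearly proportional to the input, while for κ = 0.01 the neuron still spikes, just slowly.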
4 Controller
A combined SNN and LQR [1] controller was used in this
study to solve the acrobot task. A picture of the network
topology is shown in Fig. 2.
The network contains eight sensor neurons, two for each element x_i ∈ x of the acrobot state. Each element x_i is split into a positive and negative sensor neuron, whose inputs are κ = x_i and κ = −x_i, respectively. Without this treatment there would be no sensor information to the network about negative state values. Sensor neurons in Fig. 2 are labeled with the acrobot state variable which supplies the input to that neuron, appended with a ‘+’ or ‘−’ symbol to discern between positive and negative input signing.

The network contains two motor neurons. The torque motor neuron sends its output directly to the torque τ_2, and is modeled as mentioned in Sect. 3.1. The LQR controller is activated when it receives positive input (κ > 0) and deactivated when it receives negative input (κ < 0). Also, when the LQR neuron is active the torque motor neuron is disabled, and the LQR neuron takes control by setting its own value for τ_2.
There are also four hidden neurons which are com-
pletely connected with each other, without loop-back
connections as shown in Fig. 2. They bridge the sensor and
motor neurons and their recursive connections allow for
complex dynamics.
4.1 LQR controller
When the acrobot is near the unstable equilibrium state q_u it can be kept there using an LQR [1]. To create the LQR the acrobot's equations of motion are linearized about the state q_u and the control law τ_2(t) = −Kx(t) is optimized by minimizing the quadratic cost

J = ∫_0^∞ ( ||x(t) − q_u||² + τ_2² ) dt.    (14)

The approximate value of the gain matrix becomes K = (269.522  67.522  98.966  29.047) for the parameters given in Table 1.
The LQR controller is designed for a linearized version of the acrobot at the standing-up state q_u, which is an approximation of the acrobot accurate to within only a small margin of q_u. The linearization simplifies the acrobot model by making, within the equations of motion, replacements such as sin(ε) → ε, cos(ε) → 1 and ε² → 0, for values of ε close to 0. This means the LQR controller does not function as intended for states too far from q_u, which empirically corresponds to a few degrees from the standing-up state, and so is incapable of swinging up and balancing the acrobot on its own from the initial state q_s.
5 Evolution
This section covers the evolution strategy (ES) used for
evolving the SNN controller. The SNN’s weights were
evolved using a (μ + λ)-ES [2] for 100 generations, with μ = 10 and λ = 70. To elaborate, there was a population of 10 parents producing 70 offspring each generation, and the top 10 fittest of the combination of parents and offspring survived to make up the parents for the next generation. No recombination was used, and
mutation of the synaptic weights occurred with an evolving
standard deviation strategy parameter for each weight. This
approach was chosen, after a series of pilot experiments, in
order to produce successful swing-up strategies without
excessive computation time.
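The selection scheme described above can be sketched as a toy (μ + λ)-ES with self-adapted per-weight mutation step sizes. This is our illustration on a stand-in quadratic cost; in the paper the fitness comes from simulating the acrobot:

```python
import math
import random

random.seed(0)
MU, LAMBDA, DIM, GENERATIONS = 10, 70, 5, 30

def cost(w):
    """Stand-in for the simulation cost J: lower is fitter."""
    return sum(wi * wi for wi in w)

def mutate(parent):
    """Gaussian mutation with a self-adapted step size per weight."""
    w, sigma = parent
    new_sigma = [s * math.exp(0.1 * random.gauss(0, 1)) for s in sigma]
    new_w = [wi + s * random.gauss(0, 1) for wi, s in zip(w, new_sigma)]
    return new_w, new_sigma

# Initial parent population of (weights, step sizes) pairs
pop = [([random.uniform(-1, 1) for _ in range(DIM)], [0.1] * DIM)
       for _ in range(MU)]
initial_best = min(cost(w) for w, _ in pop)

for _ in range(GENERATIONS):
    offspring = [mutate(random.choice(pop)) for _ in range(LAMBDA)]
    # (mu + lambda): parents compete with offspring for survival
    pop = sorted(pop + offspring, key=lambda ind: cost(ind[0]))[:MU]

best = cost(pop[0][0])
```

Because survivors are drawn from parents plus offspring, the best cost is monotonically non-increasing across generations, which is the property that makes the strategy safe to run without recombination.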
The fitness of an individual was calculated from the cost J given by the equation

J = Σ_{t=0}^{2×10⁴} (x(t) − q_u)^T Q (x(t) − q_u)    (15)

where the matrix Q was tuned by hand and given by

Q = ( 10  0  0    0
      0   5  0    0
      0   0  1/2  0
      0   0  0    1/2 )    (16)

to prioritize q_1 over q_2 and angles over angular velocities. Note, the closer the cost is to 0 the fitter the individual chromosome, and only the relative fitness of individuals in the population is important rather than the actual value of their cost.
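Evaluating Eq. (15) for a recorded trajectory is straightforward; a minimal sketch of ours, with the diagonal Q of Eq. (16):

```python
import math

Q_DIAG = (10.0, 5.0, 0.5, 0.5)          # diagonal of Q, Eq. (16)
q_u = (math.pi / 2, 0.0, 0.0, 0.0)      # standing-up equilibrium

def trajectory_cost(states):
    """Cost J of Eq. (15): sum of (x - q_u)^T Q (x - q_u) over all
    1-ms state samples x(t) of a simulation run."""
    J = 0.0
    for x in states:
        err = [xi - qi for xi, qi in zip(x, q_u)]
        J += sum(q * e * e for q, e in zip(Q_DIAG, err))
    return J
```

A trajectory that sits at q_u costs nothing, while one that hangs at q_s accumulates 10π² per millisecond from the q_1 term alone, so individuals that reach and hold the upright state dominate the selection.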
The fitness for each individual chromosome was determined by transcribing it to the weights of the network and running a 20-s simulation (20,000 1-ms steps) of the acrobot under the control of the network.

Fig. 2 Network topology and synaptic weights. Positive and negative weights are shown in black and gray lines, respectively. A synapse's thickness is shown in proportion to the size of its weight |w_ij|, where the thinnest line represents a weight size of 0.2, and the maximum and minimum weights are 17.4 and −37.7, respectively. The network's synaptic connections have been split into two diagrams for clarity. The diagram on the left shows connections from the sensor to the hidden neurons, and from the hidden to the motor neurons. The diagram on the right shows connections between hidden neurons only.

Fig. 3 Stroboscopic sequences of each of the fittest individuals from 5 generations of a single evolution. From top to bottom the generations are 0th, 5th, 11th, 21st and the elite solution at 41st. Each row represents a separate 20-s simulation with 50 ms between frames.
