
ORIGINAL ARTICLE
A small spiking neural network with LQR control
applied to the acrobot
Lukasz Wiklendt · Stephan Chalup · Rick Middleton
Received: 8 July 2007 / Accepted: 8 April 2008
© Springer-Verlag London Limited 2008
Abstract This paper presents the results of a computer simulation which combined a small network of spiking neurons with linear quadratic regulator (LQR) control to solve the acrobot swing-up and balance task. To our knowledge, this task has not been previously solved with spiking neural networks. Input to the network was drawn from the state of the acrobot, and output was torque, either directly applied to the actuated joint, or via the switching of an LQR controller designed for balance. The neural network's weights were tuned using a (μ + λ)-evolution strategy without recombination, and neurons' parameters were chosen to roughly approximate biological neurons.
Keywords Spiking neural networks · Acrobot · LQR · Evolution
1 Introduction
We are studying the applicability of spiking neural net-
works (SNNs) to the control of robots with complex
nonlinear morphologies that lead to unstable states
requiring constant active feedback control. This is difficult
using conventional control due to the effort required in
modeling the robot’s equations of motion and deriving a
robust control scheme on the basis of that model.
Theoretical results of Maass [14] have shown that SNNs
(third generation NNs) are computationally more powerful
than standard sigmoid NNs (second generation NNs) or
networks of threshold units (perceptrons or first generation
NNs) provided certain conditions hold. Such conditions
require a delay-coded input which linearly maps an analog
input signal to the neuron’s fire time. As a first step we
chose not to implement these conditions since they are
biologically less plausible; but they will be considered for
future studies. In the present study we use rate-coded input
which maps an analog input signal to the rate at which the
neuron fires.
Additional motivation to investigate SNNs is that they
are better models to represent the spiking nature of bio-
logical neurons, which are used in the mechanical control
of biological systems. Recently, networks of 10,000
spiking neurons have been used in the large-scale imple-
mentations of the blue brain project [16] to model cortical
columns of the brain. However, spiking neurons not only
occur in the massive networks of the brain but also in
relatively small networks of the peripheral regions of the
nervous system such as reflex networks in the limbs [11],
which can be modeled with small networks in the order of
tens of neurons. It is an open question whether small net-
works of spiking neurons can successfully be employed to
control limb movement and reflexes of unstable robots,
such as for example, bi-ped robots. The aim of the present
paper is to shed light on these questions by applying small
SNNs to the control of an underactuated, simulated robot.
Primary work has been initiated on implementing SNNs
as controllers for robots governed by simple dynamics,
such as two wheeled robot cars, rather than underactuated
unstable robots governed by highly nonlinear dynamics.
Joshi and Maass [10] successfully used a SNN as the
‘liquid’ of a liquid state machine to control a fully actu-
ated two-link robot arm in the horizontal plane. The
controller was created by learning a filter which trans-
formed the state of the network to a control signal for the
robot’s motors, while the neurons and synaptic weights remained constant throughout the learning process.

L. Wiklendt (✉) · S. Chalup · R. Middleton
School of Electrical Engineering and Computer Science,
The University of Newcastle, Callaghan, NSW 2308, Australia
e-mail: lukasz.wiklendt@studentmail.newcastle.edu.au
Neural Comput & Applic
DOI 10.1007/s00521-008-0187-1

Floreano et al. [6] evolved only the structure of a SNN with ten
hidden neurons, while keeping the synaptic weights con-
stant. Even with their greatly simplified model of spiking
neurons, customized to fit the microcontroller on a small
sugar cube sized two-wheeled robot, the robot was able to
successfully navigate an oval track avoiding walls. French
and Damper [7] assembled a SNN out of two network
components, each evolved to achieve the particular subtask
‘frequency discrimination’ of an overall objective to
control a two-wheeled robot to drive towards flashing lights
of distinct frequencies. Their evolution strategy allowed for
networks of arbitrary cardinality, and neurons and synapses
of various models. Federici [5] created a novel algorithm,
which evolved the rules of development of a SNN that was
used to grow the network from a single cell to its final
structure. The technique was applied to a wall-avoidance
task for a two-wheel robot with infrared sensors.
Rather than applying SNNs to stable robotic platforms
such as the ones mentioned previously, we instead apply
them to an unstable robot with highly nonlinear dynamics.
The acrobot [19] is a two-link non-linear underactuated
robotic platform, commonly used as a benchmark for new
artificial intelligence approaches targeted at dynamics
control. A drawing of the acrobot is shown in Fig. 1. Its motion resembles that of a gymnast swinging on a horizontal bar, the difference being that unlike the gymnast the acrobot can spin its q_2 joint ‘hip’ through complete revolutions. The object of the task is to swing the acrobot from a hanging-down position to a standing-up position. This is quite difficult as only the q_2 joint is actuated. A more formal description of the acrobot and the swing-up task is given in Sect. 2.
One of the first to approach the acrobot swing-up control
problem was Spong [19]. He created two swing-up strate-
gies based on the partial feedback linearization of the
acrobot dynamics, one for each joint. He applied each
strategy with a linear quadratic regulator (LQR) to balance
the acrobot in the standing position, and both strategies
produced a successful swing-up and balance. Boone [3]
used an N-step lookahead search to select the appropriate
torque which first added the required amount of energy to
the acrobot which would allow it to reach its standing-up
position. Once this was achieved, the search objective was
changed to find a trajectory which would get the acrobot
close to the standing-up state. Torque output was only one
of two possible extreme values, either positive or negative
1 Nm, which is known as bang-bang control. Allowing
only two possible output values, and limiting the amount of
switching between these values reduced the search to a
practical size. Yoshimoto et al. [21] successfully applied
reinforcement learning to the task, which switched between
one of five conventional controllers.
The area of acrobot swing-up control is quite advanced
with many other existing successful techniques in addition
to those mentioned previously, including sigmoid NN
function approximation for reinforcement learning [4, 20],
evolving a non-feedback vector of torque values [12], a
fuzzy controller used to increase the acrobot’s energy [13],
output zeroing based on angular momentum and rotation
angle of center-of-mass [17]. Although the primary focus
of this study is the application of SNNs as control models
for complex robotic platforms, it has given some specific
insights into the techniques which may be used to improve
acrobot swing-up solutions.
This paper is organized as follows. Section 2 describes
the acrobot and the swing-up task used in this study. Sec-
tion 3 describes SNNs and the discrete time model that was
used for simulations. Section 4 describes the network
configuration and the LQR controller used in simulations.
Section 5 presents the evolution and simulation setup. A
discussion of the results is given in Sect. 6, with conclu-
sions following in Sect. 7.
2 The acrobot
The acrobot is composed of two links, an inner and outer link connected together by an actuated hinge joint, with their relative angle given by q_2. The inner link is anchored with an unactuated hinge joint, and subtends an angle of q_1 to the horizontal. The torque applied at the actuated hinge is given by τ_2, which is limited to |τ_2| ≤ τ_2max. Angular velocities of joints q_1 and q_2 are q̇_1 and q̇_2. The state of the acrobot at time t is given by the vector x(t) = (q_1(t) q_2(t) q̇_1(t) q̇_2(t))^T. There are two notable equilibria: an unstable one at q_u = (π/2 0 0 0)^T, the other stable at q_s = (−π/2 0 0 0)^T.

Acrobot Task: Given the initial state x(0) = q_s, find a control strategy which will get the acrobot to the final state x(T) = q_u, and keep it there as t → ∞.
Fig. 1 The acrobot showing directions for gravity, torque and joint
angles

In this study T = 20 s was chosen, after a series of pilot
experiments which showed this value to induce a favorable
learning rate without excessive computation time. The
acrobot task is considered a difficult control problem
because the acrobot is underactuated and has highly non-
linear dynamics.
The equations of motion governing the dynamics of the
acrobot are presented in [19]. Masses of the links are given by m_i, lengths by l_i, and inertia tensors by I_i, where i = 1, 2 for the inner and outer links, respectively. Acceleration due to gravity is given by g. The parameters used for simulation are listed in Table 1.
The acrobot is simulated using a fourth-order Runge-
Kutta integrator [18] with a time step of 1 millisecond.
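The integration step can be sketched as follows. This is a minimal generic fourth-order Runge-Kutta integrator of our own, not the authors' code, and the single-pendulum dynamics below are only a placeholder for the (much longer) acrobot equations of motion:

```python
import math

def rk4_step(f, t, x, dt):
    """Advance dx/dt = f(t, x) by one fourth-order Runge-Kutta step."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, [xi + dt / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f(t + dt / 2, [xi + dt / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f(t + dt,     [xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

# Placeholder dynamics: a single pendulum q'' = -(g/l) sin(q),
# standing in for the acrobot's equations of motion from [19].
g, l = 9.8, 1.0
def pendulum(t, x):          # x = [q, qdot]
    return [x[1], -(g / l) * math.sin(x[0])]

x, t, dt = [0.1, 0.0], 0.0, 0.001   # 1 ms time step, as in the paper
for _ in range(1000):               # simulate 1 s
    x = rk4_step(pendulum, t, x, dt)
    t += dt
```

With a 1 ms step the integrator conserves the pendulum's energy to well below 1e-6 over one simulated second, which is why such a small step is affordable here.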
3 Spiking networks
This section gives an introduction to SNNs, including the
models used in our simulations. The word ‘spiking’ will
often be omitted here when mentioning spiking neurons or
networks.
Neurons and synapses in a SNN are arranged as nodes and edges of a directed graph, respectively. Neurons send and receive data via the timing of spike events which are sent across synapses from a source neuron to a destination neuron. Each neuron i contains its own state u_i representing the exponentially decaying membrane potential of biological neurons. When a spike reaches a neuron then that neuron's membrane potential is momentarily raised (or lowered) in proportion to the postsynaptic potential function ε. If the membrane potential rises above the threshold θ then the neuron fires a spike to its destination neurons, and the neuron's membrane potential is decreased by η for a refraction period, thereby preventing more immediate firings.

In biological NNs spikes require a short amount of time to travel from their source neuron to the destination neurons. However, in this study, spikes are not subjected to an explicit synaptic delay. Synapses have a ‘weight’ value w_ij, which determines the amount a spike from a source neuron j increases a destination neuron's membrane potential u_i. It is these weights which are tuned during learning. In future studies tunable delays will also be used since they have the potential to increase the computational power of SNNs [14].
The leaky integrate-and-fire model for neurons [8, 9] is used in our simulations since it is concise, simple to implement and fast to simulate. To reduce the number of parameters that need tuning during learning all of the neurons are made homogeneous. The precise neuron model is shown in Eqs. (1)–(4).
u_i(t) = Σ_{t_i ∈ F_i(t)} η(t − t_i) + Σ_j Σ_{t_j ∈ F_j(t)} w_ij ε(t − t_j)    (1)

F_i(t) = { t_i | u_i(t_i) > θ, t_i < t }    (2)

η(s) = −m exp(−s/τ_q)    (3)

ε(s) = exp(−s/τ_l) − exp(−s/τ_r)    (4)
where F_i(t) contains all firing times for any neuron i that occur before time t, and m, τ_q, τ_l, τ_r are constants which specify the size and shape of the refraction function η and the postsynaptic potential function ε. This neuron model is approximated in discrete time to simplify implementation and allow for easy integration with the discrete time simulation of the acrobot.
We use a discrete-time model governing the dynamics of a neuron i, given by Eqs. (5)–(8).

v_i(t + Δt) = A v_i(t) + B u_i(t)    (5)

A = ( d_l  0    0
      0    d_r  0
      0    0    d_q )        B = ( 1  0
                                   1  0
                                   0  m )

u_i(t) = ( κ_i(t)
           f_i(t) )    (6)

f_i(t) = { 1, if (1 −1 −1) v_i(t) > θ
         { 0, otherwise    (7)

κ_i(t) = Σ_j w_ij f_j(t)    (8)
where d_l, d_r, d_q ∈ (0, 1), m > 0, and θ ≥ 0. The values d_l = 0.9, d_r = 0.8, d_q = 0.9, m = 1, and θ = 1 are used in the simulations presented in this study. Also, a time step of Δt = 1 ms is used, and t is an integer multiple of Δt. These values were chosen by hand so that the neurons imitate (quite roughly) dynamics such as firing rates and postsynaptic potential durations of biological neurons [11]. The values d_l, d_r and d_q can be derived from τ_l, τ_r and τ_q in Eqs. (3) and (4) with the following relation

d_a = exp(−Δt/τ_a), for a ∈ {l, r, q}.    (9)
The state of neuron i is given by the time-varying vector v_i = (v_1 v_2 v_3)^T, where v_i(0) = 0. The value of the inner product (1 −1 −1) v_i in Eq. (7) represents the membrane potential of neuron i, and θ is the threshold. When the membrane potential rises above the threshold then the neuron fires by setting f_i to 1, and the membrane potential is consequently reduced by the refraction constant m in the next step. The f_i term can be thought of as the spiking ‘output’ of neuron i.

Table 1 Acrobot parameters, where all units are in SI

m_1 = 1, m_2 = 1, l_1 = 1, l_2 = 1, I_1 = 1/12, I_2 = 1/12, g = 9.8, τ_2max = 10
The κ_i term in Eq. (8) is the ‘input’ to neuron i. The sum is over all source neurons of neuron i, and w_ij is the synaptic weight from neuron j to neuron i. If w_ij > 0 then a spike arriving at a neuron increases the neuron's membrane potential (with delayed onset since d_l > d_r > 0). This represents an excitatory post-synaptic potential and its practical purpose is to increase the chance of the neuron firing. If w_ij < 0 then an incoming spike reduces the membrane potential, which represents an inhibitory post-synaptic potential and decreases the chance of the neuron firing.
This model is used for the hidden neurons of the net-
work, that is, neurons that are connected only to other
neurons. Sensor and motor neurons used in our simulations
have a slightly different model as they must also receive
input from and send output to the environment.
3.1 Motor and sensor neurons
A hidden neuron can be used to model a motor neuron, where the motor neuron's analog output is given by

(1 −1 0) v_i(t).    (10)

However this is a verbose way to model a motor neuron, since the d_q and m parameters are no longer used because the neuron is never required to fire spikes.

A sensor neuron can be modeled by replacing A, B and u_i in Eqs. (5) and (6) with A_s, B_s and u_i^s respectively

A_s = ( 0  0  0
        0  0  0
        0  0  d_q )        B_s = ( 1  0
                                   0  0
                                   0  m )

u_i^s(t) = ( κ_i^s(t)
             f_i(t) )    (11)

where the analog input to a sensor neuron is set to κ_i^s, eliminating Eq. (8). For brevity the time-varying analog input to a sensor neuron will be referred to as κ. In simulations the constraint κ ≥ 0 is applied. A value of θ = 0 is used for all sensor neurons, since setting θ > 0 results in a dead-zone for inputs ≤ θ. This form of encoding analog input into spikes is commonly known as rate encoding, where the firing rate of a sensor neuron is proportional to the input. Another method for encoding is called delay encoding where the average firing rate remains constant and the precise time at which the neuron fires is proportional to the input.
A notable property of modeling sensor neurons this way is that their output scales either logarithmically or linearly, depending on the magnitude of the input. Assuming a constant input κ to a sensor neuron, the spiking rate of the neuron can be calculated by finding the time between spikes. The state of the sensor neuron is initialized as if it had just spiked, and the time it takes for the v_3 component of the state v to decay to the next spike is calculated using the equation d_q^y (m + κ) = κ, where y is the number of milliseconds between spikes. Therefore, the spike rate r (spikes per millisecond) of a sensor neuron for a constant input κ > 0 is given by the formula

r(κ) = 1 / log_{d_q}( κ / (m + κ) )    (12)

noting that d_q ∈ (0, 1) and m > 0. For large values of κ the approximation r(κ) ≈ kκ is observed, where the constant k > 0, since

lim_{κ→∞} r(mκ) / r(κ) = m    (13)

and for small values of κ the approximation r(κ) ≈ 1 / log_{d_q}(κ/m) is observed. This allows a sensor neuron to remain sensitive to tiny inputs by having a spiking rate that is inversely proportional to the log of the input, without an exponential increase in sensitivity between large inputs. Although these properties are derived for extreme values of κ, in practice we notice logarithmic proportions for κ as large as 0.01, and linear proportions for κ as small as 1, since d_q = 0.9 and m = 1 in our simulations. However, this property of sensor neurons may be disadvantageous, resulting in excessive output given small inputs. Such overly sensitive control is visible in the results, particularly during the acrobot's balance period (after approx. 3,300 ms) in the bottom plot of Fig. 5b.
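Eq. (12) is easy to evaluate numerically. The sketch below is our own illustration, using d_q = 0.9 and m = 1 as in the simulations, and shows the near-linear scaling for large inputs alongside the logarithmic fall-off for small ones:

```python
import math

d_q, m = 0.9, 1.0   # constants used in the paper's simulations

def spike_rate(kappa):
    """Spike rate in spikes/ms of a sensor neuron under constant input
    kappa > 0, from Eq. (12): r = 1 / log_{d_q}(kappa / (m + kappa))."""
    y = math.log(kappa / (m + kappa), d_q)   # ms between spikes
    return 1.0 / y

# Large inputs: doubling kappa roughly doubles the rate (linear regime)
ratio = spike_rate(200.0) / spike_rate(100.0)   # close to 2

# Small inputs: the rate tracks 1 / log_{d_q}(kappa / m) (logarithmic regime)
slow = spike_rate(0.01)
```

This reproduces the two regimes discussed above: for κ around 100–200 the rate is nearly proportional to the input, while for κ = 0.01 the neuron still spikes, just slowly.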
4 Controller
A combined SNN and LQR [1] controller was used in this
study to solve the acrobot task. A picture of the network
topology is shown in Fig. 2.
The network contains eight sensor neurons, two for each element x_i ∈ x of the acrobot state. Each element x_i is split into a positive and negative sensor neuron, whose inputs are κ = x_i and κ = −x_i, respectively. Without this treatment there would be no sensor information to the network about negative state values. Sensor neurons in Fig. 2 are labeled with the acrobot state variable which supplies the input to that neuron, appended with a ‘+’ or ‘−’ symbol to discern between positive and negative input signing.

The network contains two motor neurons. The torque motor neuron sends its output directly to the torque τ_2, and is modeled as mentioned in Sect. 3.1. The LQR controller is activated when it receives positive input (κ > 0) and deactivated when it receives negative input (κ < 0). Also, when the LQR neuron is active the torque motor neuron is disabled, and the LQR neuron takes control by setting its own value for τ_2.
There are also four hidden neurons which are com-
pletely connected with each other, without loop-back
connections as shown in Fig. 2. They bridge the sensor and
motor neurons and their recursive connections allow for
complex dynamics.
4.1 LQR controller
When the acrobot is near the unstable equilibrium state q_u it can be kept there using an LQR [1]. To create the LQR the acrobot's equations of motion are linearized about the state q_u and the control law τ_2(t) = −Kx(t) is optimized by minimizing the quadratic cost

J = ∫_0^∞ ( ||x(t) − q_u||² + τ_2² ) dt.    (14)

The approximate value of the gain matrix becomes K = (269.522  67.522  98.966  29.047) for the parameters given in Table 1.
The LQR controller is designed for a linearized version of the acrobot at the standing-up state q_u, which is an approximation of the acrobot accurate to within only a small margin of q_u. The linearization simplifies the acrobot model by making, within the equations of motion, replacements such as sin(ε) → ε, cos(ε) → 1 and ε² → 0, for values of ε close to 0. This means the LQR controller does not function as intended for states too far from q_u, which empirically corresponds to a few degrees from the standing-up state, and so is incapable of swinging up and balancing the acrobot on its own from the initial state q_s.
5 Evolution
This section covers the evolution strategy (ES) used for
evolving the SNN controller. The SNN’s weights were
evolved using a (μ + λ)-ES [2] for 100 generations, with μ = 10 and λ = 70. To elaborate, there was a population of 10 parents producing 70 offspring each generation, and the top 10 fittest of the combination of parents and offspring survived to make up the parents for the next generation. No recombination was used, and
mutation of the synaptic weights occurred with an evolving
standard deviation strategy parameter for each weight. This
approach was chosen, after a series of pilot experiments, in
order to produce successful swing-up strategies without
excessive computation time.
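The selection scheme described above can be sketched as a toy (μ + λ)-ES with self-adapted per-weight mutation step sizes. This is our illustration on a stand-in quadratic cost; in the paper the fitness comes from simulating the acrobot:

```python
import math
import random

random.seed(0)
MU, LAMBDA, DIM, GENERATIONS = 10, 70, 5, 30

def cost(w):
    """Stand-in for the simulation cost J: lower is fitter."""
    return sum(wi * wi for wi in w)

def mutate(parent):
    """Gaussian mutation with a self-adapted step size per weight."""
    w, sigma = parent
    new_sigma = [s * math.exp(0.1 * random.gauss(0, 1)) for s in sigma]
    new_w = [wi + s * random.gauss(0, 1) for wi, s in zip(w, new_sigma)]
    return new_w, new_sigma

# Initial parent population of (weights, step sizes) pairs
pop = [([random.uniform(-1, 1) for _ in range(DIM)], [0.1] * DIM)
       for _ in range(MU)]
initial_best = min(cost(w) for w, _ in pop)

for _ in range(GENERATIONS):
    offspring = [mutate(random.choice(pop)) for _ in range(LAMBDA)]
    # (mu + lambda): parents compete with offspring for survival
    pop = sorted(pop + offspring, key=lambda ind: cost(ind[0]))[:MU]

best = cost(pop[0][0])
```

Because survivors are drawn from parents plus offspring, the best cost is monotonically non-increasing across generations, which is the property that makes the strategy safe to run without recombination.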
The fitness of an individual was calculated from the cost J given by the equation

J = Σ_{t=0}^{2×10⁴} (x(t) − q_u)^T Q (x(t) − q_u)    (15)

where the matrix Q was tuned by hand and given by

Q = ( 10  0  0    0
      0   5  0    0
      0   0  1/2  0
      0   0  0    1/2 )    (16)

to prioritize q_1 over q_2 and angles over angular velocities. Note, the closer the cost is to 0 the fitter the individual chromosome, and only the relative fitness of individuals in the population is important rather than the actual value of their cost.
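Evaluating Eq. (15) for a recorded trajectory is straightforward; a minimal sketch of ours, with the diagonal Q of Eq. (16):

```python
import math

Q_DIAG = (10.0, 5.0, 0.5, 0.5)          # diagonal of Q, Eq. (16)
q_u = (math.pi / 2, 0.0, 0.0, 0.0)      # standing-up equilibrium

def trajectory_cost(states):
    """Cost J of Eq. (15): sum of (x - q_u)^T Q (x - q_u) over all
    1-ms state samples x(t) of a simulation run."""
    J = 0.0
    for x in states:
        err = [xi - qi for xi, qi in zip(x, q_u)]
        J += sum(q * e * e for q, e in zip(Q_DIAG, err))
    return J
```

A trajectory that sits at q_u costs nothing, while one that hangs at q_s accumulates 10π² per millisecond from the q_1 term alone, so individuals that reach and hold the upright state dominate the selection.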
The fitness for each individual chromosome was determined by transcribing it to the weights of the network and running a 20-s simulation (20,000 1-ms steps) of the acrobot under the control of the network.

Fig. 2 Network topology and synaptic weights. Positive and negative weights are shown in black and gray lines, respectively. A synapse's thickness is shown in proportion to the size of its weight |w_ij|, where the thinnest line represents a weight size of 0.2, and the maximum and minimum weights are 17.4 and −37.7, respectively. The network's synaptic connections have been split into two diagrams for clarity. The diagram on the left shows connections from the sensor to the hidden neurons, and from the hidden to the motor neurons. The diagram on the right shows connections between hidden neurons only.

Fig. 3 Stroboscopic sequences of each of the fittest individuals from 5 generations of a single evolution. From top to bottom the generations are 0th, 5th, 11th, 21st and the elite solution at 41st. Each row represents a separate 20-s simulation with 50 ms between frames.
