Overview of the SpiNNaker System Architecture
Steve B. Furber, Fellow, IEEE, David R. Lester, Luis A. Plana, Senior Member, IEEE, Jim D. Garside, Eustace Painkras, Steve Temple, and Andrew D. Brown, Senior Member, IEEE
Abstract: SpiNNaker (a contraction of Spiking Neural Network Architecture) is a million-core computing engine whose flagship goal is to be able to simulate the behaviour of aggregates of up to a billion neurons in real time. It consists of an array of ARM9 cores, communicating via packets carried by a custom interconnect fabric. The packets are small (40 or 72 bits), and their transmission is brokered entirely by hardware, giving the overall engine an extremely high bisection bandwidth of over 5 billion packets/s. Three of the principal axioms of parallel machine design (memory coherence, synchronicity and determinism) have been discarded in the design without, surprisingly, compromising the ability to perform meaningful computations. A further attribute of the system is the acknowledgment, from the initial design stages, that the sheer size of the implementation will make component failures an inevitable aspect of day-to-day operation, and fault detection and recovery mechanisms have been built into the system at many levels of abstraction. This paper describes the architecture of the machine and outlines the underlying design philosophy; software and applications are to be described in detail elsewhere, and only introduced in passing here as necessary to illuminate the description.
Index Terms: Interconnection architectures, parallel processors, neurocomputers, real-time distributed.
1 INTRODUCTION
The SpiNNaker engine [1] is a massively-parallel multi-core computing system. It will contain up to 1,036,800 ARM9 cores and 7Tbytes of RAM distributed throughout the system in 57K nodes, each node being a System-in-Package (SiP) containing 18 cores plus a 128Mbyte off-die SDRAM (Synchronous Dynamic Random Access Memory). Each core has associated with it 64Kbytes of data tightly-coupled memory (DTCM) and 32Kbytes of instruction tightly-coupled memory (ITCM).
The cores have a variety of ways of communicating
with each other and with the memory, the dominant of
which is by packets. These are 5- or 9-byte (40- or 72-bit)
quanta of information that are transmitted around the
system under the aegis of a bespoke concurrent hardware
routing system.
The physical hierarchy of the system has each node containing two silicon dies: the SpiNNaker chip itself, plus the Mobile DDR (Double Data Rate) SDRAM, which is physically mounted on top of the SpiNNaker die and stitch-bonded to it (see Fig. 1). The nodes are packaged and mounted in a 48-node hexagonal array on a PCB (Printed Circuit Board), the full system requiring 1,200 such boards. In operation, the engine consumes at most 90kW of electrical power.
This paper will describe architectural and physical design aspects of the system. Clearly, there are many challenges associated with the design, construction and use of a system as large and complex as this; the software and application portfolio will be described in detail elsewhere. While previous papers have presented aspects of the architecture (e.g. [2], [3]; a complete list of SpiNNaker publications is available on the project web site [1]), the contribution here is to offer a comprehensive overview focusing on the motivation and rationale for the architectural decisions taken in the design of the machine.
Fig. 1. SDRAM stitch-bonded to the underlying SpiNNaker die. 3D packaging by UNISEM (Europe) Ltd.

S.B. Furber, D.R. Lester, L.A. Plana, J.D. Garside, E. Painkras and S. Temple are with the School of Computer Science, the University of Manchester, UK.
A.D. Brown is with Electronics and Computer Science, the University of Southampton, UK.
Manuscript received 14th January 2012.

2 HIGH-LEVEL PROJECT GOALS AND BACKGROUND
Multi-core processors are now clearly established as the way forward on the desktop, and highly-parallel systems have been the norm for high-performance computing for some while. In a surprisingly short space of time, industry has abandoned the exploitation of Moore's Law through ever more complex uniprocessors, and is embracing the 'new' Moore's Law: the number of processor cores on a chip will double roughly every 18 months. If projected over the next 25 years this leads inevitably to the landmark of a million-core processor system.
Much work is required to understand how to optimize the scheduling of workloads on such machines, but the nature of this task is changing: in the past, a large application was distributed 'evenly' over a few processors and much effort went into scheduling to keep all of the processor resources busy; today, the nature of the cost function is different: processing is effectively a free resource. Although the automatic parallelization of general-purpose codes remains a 'holy grail' of computer science, biological systems achieve much higher levels of parallelism, and we turn for inspiration to connectivity patterns and computational models based on our (extremely limited) understanding of the brain.
This biological inspiration draws us to two parallel, synergistic directions of enquiry [4]; significant progress in either direction will represent a major scientific breakthrough:
- How can massively-parallel computing resources accelerate our understanding of brain function?
- How can our growing understanding of brain function point the way to more efficient, parallel, fault-tolerant computation?
We start from the following question: what will happen when processors become so cheap that there is, in effect, an unlimited supply of them? The goal is now to get the job done as quickly and/or energy-efficiently as possible, and as many processors can be brought into play as is useful; this may well result in a significant number of processors doing identical calculations, or indeed nothing at all: they are a free resource.
2.1 The mammalian nervous system
The mammalian nervous system is, by any metric, one of the most remarkable, effective and efficient structures occurring in nature. The human brain exhibits massive parallelism (10^11 neurons) and massive connectivity (10^15 synapses). It consumes around 25W, and is composed of very low-performance components (neurons 'behave' at up to around 100Hz; the biological interconnect propagates information at speeds of a few m/s). It is massively tolerant of component-level failure: typically a human will lose neurons at a rate of about one per second throughout their adult life [5].
For a computer engineer, the similarities between the nervous system and a digital system are overwhelming. The principal component of the nervous system, the neuron [6], is a unidirectional device, connected to its peers via a single output, the axon. Near its terminal the axon branches and forms connections (synapses) with the inputs of its fellow neurons. The input structure of a neuron is termed the dendritic tree (see Fig. 2). Specialised neurons interface to muscles (and drive the system 'actuators'), and others to various sensors.
2.2 Spiking communication
Most biological neurons communicate predominantly via an electrochemical impulse known as an action potential [6]. This is a complex, propagating electrochemical pulse, supported mainly by transient sodium, potassium, chloride and electron fluxes, and perturbations of the electrochemical impedance to these species in the axon cell walls. To a zeroth approximation, these impulses can be viewed as spikes. The size and shape of the spike is largely invariant (and, indeed, probably irrelevant), being determined by local instabilities in the cell membrane current balance, so a spike can be viewed as a unit impulse that conveys information solely in the time at which it occurs. It costs the axon energy to transmit an event, but this is provided by a kind of electrochemical 'gain' distributed along the length of the fibre: the net effect is that, again to a zeroth approximation, the axon can be viewed as a lossless, dispersion-free transmission line, although it has to have a 'rest' just after a pulse has gone by to 'charge itself up' again.
2.3 Point neuron model
SpiNNaker is optimized for what is commonly known as the 'point neuron model' [4], where the details of the dendritic structure of the neuron are ignored and all inputs are effectively applied directly to the soma (the 'body' of the neuron). The inputs arrive in the correct temporal order, more or less, but there is no attempt to model the geometry of the dendritic tree. The abstract synaptic inputs are summed to form a net soma input that drives a system of simple differential equations that compute when an output spike should be issued.

Fig. 2. A biological neuron.
Fig. 3. The corresponding point neuron model.
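To make the point-neuron abstraction concrete, the following minimal sketch (Python, purely illustrative; the leaky integrate-and-fire equations and parameter values are assumptions, not the specific models shipped with SpiNNaker) shows summed synaptic input driving a simple membrane equation that decides when a spike is issued.

def lif_step(v, i_syn, dt=1.0, tau_m=20.0, v_rest=-65.0,
             v_thresh=-50.0, v_reset=-65.0, r_m=1.0):
    # Advance the membrane potential v (mV) by one timestep dt (ms);
    # i_syn is the summed synaptic input for this timestep.
    v = v + (-(v - v_rest) + r_m * i_syn) * (dt / tau_m)
    if v >= v_thresh:            # threshold crossed: emit a spike and reset
        return v_reset, True
    return v, False

def soma_input(weights, spiked_ids):
    # Point neuron model: all inputs are applied directly to the soma, so
    # the net input is just the sum of the weights of the active synapses.
    return sum(weights.get(n, 0.0) for n in spiked_ids)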
2.4 Synapses
A synapse is the 'component' whereby a spike from one neuron couples into the input of another neuron. A spike has unit impulse, but the synapse has a variable efficiency which is often represented by a numerical 'weight' [4]. If the weight is positive, the synapse is excitatory. If the weight is negative, the synapse is inhibitory.
The modeling abstraction is summarized in Fig. 3. In the jargon of electronic circuits, a neural circuit is represented by a devices-on-devices graph. Biology, as one might expect, is vastly more complex than this extreme abstraction. An unresolved issue is how much of the complexity is biological artifact, and how much is necessary for the information processing required to support a viable organism? The performance of electronic circuits is ultimately dictated by the speed and efficiency with which the flow of electrons through silicon can be choreographed by the designer, and there are physical limits. In biology, the information carriers are more diverse (ionic species) and they are controlled by an electrochemical field gradient. Ions are necessarily big, electrochemical fields necessarily small. Nature compensates by utilizing massive parallelism, but there will always be huge functional compromises. It is interesting to note that almost every creature on the planet today utilizes broadly the same structure for its controlling neural system. Comprehensive descriptions of the many types of real neurons and synapses are available elsewhere [6], [7].
2.5 Address Event Representation
The central idea of the standard SpiNNaker execution model is that of Address Event Representation (AER) [8], [9]. The underlying principle of AER, which is well-established in the neuromorphic community, is that when a neuron fires the spike is a pure asynchronous 'event'. All of the information is conveyed solely in the time of the spike and the identity of the neuron that emitted the spike. In a real-time system, time models itself, so in an AER system the identity (the 'address') of a neuron that spikes is simply broadcast, at the time that it spikes, to all neurons to which the spiking neuron connects.

In SpiNNaker, AER is implemented using packet-switched communication and multicast routing. Although the communication system introduces some temporal latency, provided this is small compared with biological time constants (which in practice means provided it is well under 1ms) then the error introduced by this latency is negligible (when modeling biological neural systems).
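A toy software analogue of AER multicast may help fix the idea (hypothetical Python; SpiNNaker's router is hardware and its tables are not organised like this): the only thing transmitted is the identity of the neuron that fired, and a routing table fans that key out to every destination that subscribes to it.

# A spike is just the source neuron's 32-bit ID; its timing is implicit,
# because in a real-time system "time models itself".
routing_table = {
    0x00000101: ["core_A", "core_B"],   # neuron 0x101 projects to two cores
    0x00000102: ["core_B"],
}

def emit_spike(neuron_id, deliver):
    # Broadcast the bare ID to every subscribed destination.
    for destination in routing_table.get(neuron_id, []):
        deliver(destination, neuron_id)

emit_spike(0x00000101, lambda dst, key: print(dst, hex(key)))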
2.6 Topological virtualization
Biological neural systems develop and operate in three dimensions, and both their topologies and geometries are constrained by their physical structures. SpiNNaker employs a two-dimensional physical communication structure, but this in no way limits its capacity to model three- (or higher-) dimensional networks. Because electronic communication is effectively instantaneous on biological time-scales, every neuron in a SpiNNaker system can be connected to any other neuron with a time delay that equates to adjacency in the biological three-dimensional space. Thus the mapping of neurons from the biological 3-D space into the SpiNNaker 2-D network of processors can be arbitrary: any neuron can be mapped to any processor. In practice, the SpiNNaker model will be more efficient if the mapping is chosen carefully, and this, in turn, means mapping physically close neurons onto physically close processors, but this is only a matter of efficiency and is in no way fundamentally constrained by the SpiNNaker implementation.
2.7 Time models itself
Biological systems have no central synchronising clock. Spikes are launched, spikes propagate, spikes arrive (usually), target neurons react. In a conventional electronic synchronous system, data is expected to be at the right place at the right time. If it isn't, the system is broken. In an asynchronous electronic system, data arrives, is processed and passed on, and a non-trivial choreography of request and acknowledge signals ensures that the integrity of the dataflow is maintained. In biology, data is transmitted in the hope that most of it will get to the right place in a timely but strictly undefined manner. Strangely, it is clear by inspection that it is possible to create hugely complex systems (mammals) operating successfully on this principle.

In SpiNNaker, cores react to packets, process packets, and optionally emit further packets. These are transmitted to their target by the routing subsystem, to the best of its ability. If the routing fabric becomes congested (an unpredictable function of the workload), packets will, in the first instance, be re-routed (causing them to arrive late) or even dropped (if there is no space to hold them). A design axiom of SpiNNaker is that nothing can ever prevent a packet from being launched. A consequence of the effects described above is that not only is the arrival time of the packets non-deterministic, the packet ordering is non-transitive.
A single SpiNNaker core is a single ARM9 processor. This is deterministic and is expected to multiplex the behaviour of around 1,000 neurons. The nodes, each containing 18 such cores, are equipped with six bi-directional fast links, and embedded in a communication mesh (see the next section) which intelligently redirects and duplicates packets as necessary. The speed at which packets are transmitted over the network is about 0.2 microseconds per node hop, all of which means that we can reasonably expect the neuron models to react to stimuli on a wall-clock timescale of milliseconds, just like biology.
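A rough back-of-the-envelope check (the square 240 x 240 toroidal layout and the Manhattan-style distance below are assumptions used only for illustration) confirms that even a worst-case path across the machine is far shorter than the ~1ms biological time constants.

nodes_per_side = 240      # assumed: 57,600 nodes laid out as a 240 x 240 torus
hop_time_us = 0.2         # ~0.2 microseconds per node hop

# On a torus, no destination is further than half the side length away in
# each dimension (the diagonal links of the triangular lattice only help).
worst_case_hops = 2 * (nodes_per_side // 2)
print(worst_case_hops * hop_time_us)    # 48.0 microseconds, well under 1 ms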
3 THE TECHNOLOGY LANDSCAPE
There are other approaches to brain modelling with objectives broadly similar to, though methods rather different from, the work described here.

3.1 BlueBrain
The Blue Brain project at EPFL [10] is bringing together wet neuroscience with high-performance computing to deliver high-fidelity computer models of biological neural systems. The computing resource available to the project is an IBM Blue Gene supercomputer [11] with very sophisticated visualisation facilities.
3.2 SyNAPSE
An IBM project, funded under the DARPA SyNAPSE programme [12], claims the successful modelling of a neural network on the scale of a cat cortex (which is around a billion neurons with 10^13 synapses).
3.3 Izhikevich
Eugene Izhikevich, at the Neuroscience Research Institute in San Diego, developed a 100 billion neuron model based on the mammalian thalamo-cortical system [13], [14]. One second of simulation took 50 days on a 27-processor Beowulf cluster.
3.4 Issues
These major projects demonstrate the debate (as yet unresolved within the brain modelling research community): to what extent are the finer details of biological neurons essential to the accurate modelling of the information processing capabilities of the brain, and to what extent can they be ignored as artifacts resulting from the evolutionary development of the biological neuron and its need to grow and find energy?

The SpiNNaker architecture is biased towards the simpler side of this debate: the machine is optimised for simple point neuron models, and it is capable of modelling very complex networks of these simple models.
The principal differentiator of the SpiNNaker project from other large-scale neural models is our objective to run in biological real time. None of the above systems is close to this goal, but we believe this to be essential if the neural experiments are to benefit from 'embodiment' by integration with robotic systems.

Other approaches to large-scale neural modelling are, of course, possible, for example using GPGPUs or FPGAs. It is difficult with such approaches to achieve the balance of computation, memory hierarchy and communication that SpiNNaker achieves, though of course they do avoid the high development cost of the bespoke chip approach.
4 ARCHITECTURE OVERVIEW
4.1 Overview
A block diagram of a single SpiNNaker node is shown in Fig. 4. The six communications links are used to connect the nodes in a triangular lattice; this lattice is then folded onto the surface of a toroid, as in Fig. 5. Other tilings are obviously possible; this design decision was guided by the pragmatics of assembling the system onto a set of two-dimensional printed circuit boards.
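One way to realise a six-neighbour triangular lattice on a 2-D torus is to add two diagonal links to an ordinary square grid; the sketch below uses an assumed coordinate convention purely for illustration and is not a description of the actual board wiring.

def neighbours(x, y, n):
    # Six links per node: the four square-grid neighbours plus the
    # (+1,+1) and (-1,-1) diagonals, wrapped toroidally modulo n.
    deltas = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]
    return [((x + dx) % n, (y + dy) % n) for dx, dy in deltas]

print(neighbours(0, 0, 240))   # e.g. [(1, 0), (239, 0), (0, 1), ...]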
Fig. 4. A SpiNNaker node.
Fig. 6. The SpiNNaker die.

Fig. 6 depicts the individual SpiNNaker die. Each chip contains 18 identical processing subsystems (ARM cores). The die is fabricated by UMC on a 130nm CMOS process, and was designed using Synopsys, Inc., synthesis tools for the clocked subsystems and Silistix Ltd tools and libraries for the self-timed on-chip and inter-chip networks.

At start-up, following self-test, one of the processors is elected to a special role as Monitor Processor (achieved by a deliberate hardware race) and thereafter performs system management tasks. The other processors are available for application processing; normally 16 will be used to support the application and one is reserved as a spare for fault-tolerance and manufacturing yield-enhancement purposes.
The router is responsible for routing neural event packets both between the on-chip processors and from and to other SpiNNaker nodes. The Tx and Rx interface components (Fig. 4) are used to extend the on-chip Communications NoC (Network-on-Chip) to other SpiNNaker chips. Inputs from the various on- and off-chip sources are assembled into a single serial stream which is then passed to the router.

Various resources are accessible from the processor systems via the System NoC. Each of the processors has access to the shared off-die SDRAM, and various system components also connect through the System NoC in order that, whichever processor is the monitor, it will have access to these components.
4.2 Quantitative drivers
The SpiNNaker architecture is driven by the quantitative characteristics of the biological neural systems it is designed to model. The human brain comprises in the region of 10^11 neurons; the objective of the SpiNNaker work is to model 1% of this scale, which amounts to a billion neurons. This corresponds approximately to 10% of the human cortex, or ten complete mouse brains. Each neuron in the brain connects to thousands of other neurons. The mean firing rate of neurons is below 10 Hz, with the peak rate being 100s of Hz. These numerical points of reference can be summarized in the following deductions (the arithmetic is reproduced in the short sketch after the list):
- 10^9 neurons, mean fan in/out 10^3 => 10^12 synapses.
- 10^12 synapses, ~4 bytes/synapse => 4x10^6 Mbytes.
- 10^12 synapses switching at ~10Hz => 10^13 connections/s.
- 10^13 conn/s, 20 instr/conn => 2x10^8 MIPS.
- 2x10^8 MIPS, ~200MHz ARM => 10^6 ARMs.
So 10^9 neurons need 10^6 ARMs, whence:
- 1 ARM at ~200MHz => 10^3 neurons.
- 1 node: 16 ARM968 + 64MB => 1.6x10^4 neurons.
- 6x10^4 nodes, 1.6x10^4 neur/node => 10^9 neurons.
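The following short script simply reproduces the arithmetic above; the 20 instructions per connection and the ~200MHz, roughly one-instruction-per-cycle ARM are the planning figures used in the deductions, not measured values.

neurons = 1e9               # target: 10^9 neurons
fan_in = 1e3                # mean fan in/out per neuron
bytes_per_synapse = 4
mean_rate_hz = 10
instr_per_conn = 20
arm_mips = 200              # ~200 MHz ARM, ~1 instruction per cycle assumed

synapses = neurons * fan_in                                 # 1e12 synapses
synapse_storage_mbytes = synapses * bytes_per_synapse / 1e6 # 4e6 Mbytes
conn_per_s = synapses * mean_rate_hz                        # 1e13 connections/s
total_mips = conn_per_s * instr_per_conn / 1e6              # 2e8 MIPS
arms_needed = total_mips / arm_mips                         # 1e6 ARM cores
print(synapse_storage_mbytes, arms_needed, neurons / arms_needed)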
The above numbers all assume each neuron has 1,000 inputs. In biology, this number varies from 1 to of the order of 10^5, and it is probably most useful to think of each ARM being able to model about 1M synapses, so it can model 100 neurons each with 10,000 inputs, and so on.
The system will be inefficient unless there is some commonality across the inputs to the set of neurons modeled on a processor, so that each input event typically connects to tens or hundreds of neurons modeled by a processor. In biology, connections tend to be sparse, so, for example, a processor could model 1,000 neurons each of which connects to a random 10% of the 10^4 inputs that are routed to the processor. The standard model assumes sparse connectivity.
4.3 Routing
With a billion neurons, a 32-bit address is (more than) sufficient. The AER packets incur a small overhead for control purposes, which amounts to one byte in the current design. This is generally transparent to the software running on the ARM cores and exists only while the packet is in transit. Since spike events are unit impulses, all the packet need carry is the control byte and the 32-bit ID of the neuron that fired. SpiNNaker packet formats support an optional 32-bit data payload in addition, but that is not used for neural system modeling directly. The payload will be used for other applications and for debug and diagnostics. Thus the communication traffic generated by one node is:

1.6x10^4 neurons x 10Hz x 5 bytes => 0.8Mbyte/s.
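The 40-bit spike packet is just the control byte plus the 32-bit AER key; the packing below uses an assumed field layout for illustration (it is not the documented SpiNNaker bit-level encoding), followed by the per-node traffic figure.

def pack_spike(control, neuron_id):
    # Assumed layout: 8-bit control field in the top byte, 32-bit key below.
    return ((control & 0xFF) << 32) | (neuron_id & 0xFFFFFFFF)

packet = pack_spike(0x01, 0x00000101)
assert packet.bit_length() <= 40        # fits a 5-byte (40-bit) packet

neurons_per_node = 1.6e4                # 16 application cores x 1,000 neurons
print(neurons_per_node * 10 * 5)        # 10 Hz x 5 bytes = 800,000 bytes/s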
Each chip incorporates a router that implements AER-based routing of neural spike-event packets. The total traffic from neurons modeled by the processors on the same chip as the router averages 1.6x10^5 packets/s, which is undemanding, although the router also handles incoming and passing traffic:

4-bit symbols @ 60MHz/link => 6x10^6 pkts/s
6 incoming links => 3.6x10^7 pkts/s

So a router operating at 100MHz, processing one packet per clock cycle, can easily handle all local, incoming and passing traffic.
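The link and router headroom quoted above can be checked directly; the calculation assumes 40-bit packets carried as 4-bit symbols, one symbol per 60MHz link clock.

bits_per_packet = 40
bits_per_symbol = 4
link_symbol_rate = 60e6                              # 4-bit symbols at 60 MHz

pkts_per_link = link_symbol_rate / (bits_per_packet / bits_per_symbol)
incoming = 6 * pkts_per_link                         # six incoming links
router_capacity = 100e6                              # one packet per 100 MHz cycle
print(pkts_per_link, incoming, incoming < router_capacity)   # 6e6, 3.6e7, True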
4.4 Bisection bandwidth
If a 57K-node system is organized in such a way that all of the neurons in one half are connected to at least one neuron in the other half, the traffic across the border from one half to the other is 29K x 160K = 4.6G packets/s. The border is 480 nodes long (assuming a square layout, mapped to a toroid), so each node must carry 10M packets/s, which is well within the capacity of the router, and 960 links connect the two halves, each carrying 5M packets/s, which is again within a link's capability.
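The bisection figures can likewise be reproduced from the per-node packet rate of section 4.3, again assuming the square 240 x 240 toroidal layout.

nodes = 240 * 240                        # 57,600 nodes
pkts_per_node = 1.6e5                    # packets/s generated per node

cross_traffic = (nodes // 2) * pkts_per_node   # ~4.6e9 packets/s across the cut
border_nodes = 2 * 240                   # a torus cut crosses 480 border nodes
links_across = 2 * border_nodes          # 960 links span the two halves
print(cross_traffic / border_nodes)      # ~9.6e6 packets/s per border node
print(cross_traffic / links_across)      # ~4.8e6 packets/s per link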
5 SYSTEM COMPONENTS
The routing subsystem, which is a crucial component of
SpiNNaker, is described in section 7.
5.1 ARM968
The ARM subsystem [15] organisation is shown in Fig. 7.
The system is memory-mapped (see section 6), and the
map for the ARM968 spans a number of devices and
