IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 9, SEPTEMBER 2009
CAVIAR: A 45k Neuron, 5M Synapse, 12G
Connects/s AER Hardware Sensory–Processing–
Learning–Actuating System for High-Speed
Visual Object Recognition and Tracking
Rafael Serrano-Gotarredona, Matthias Oster, Patrick Lichtsteiner, Alejandro Linares-Barranco,
Rafael Paz-Vicente, Francisco Gómez-Rodríguez, Luis Camuñas-Mesa, Raphael Berner, Manuel Rivas-Pérez,
Tobi Delbrück, Shih-Chii Liu, Rodney Douglas, Philipp Häfliger, Gabriel Jiménez-Moreno, Anton Civit Ballcels,
Teresa Serrano-Gotarredona, Member, IEEE, Antonio J. Acosta-Jiménez, and Bernabé Linares-Barranco
Abstract—This paper describes CAVIAR, a massively par-
allel hardware implementation of a spike-based sensing–pro-
cessing–learning–actuating system inspired by the physiology of
the nervous system. CAVIAR uses the asynchronous address–event
representation (AER) communication framework and was de-
veloped in the context of a European Union funded project. It
has four custom mixed-signal AER chips, five custom digital
AER interface components, 45k neurons (spiking cells), up to
5M synapses, performs 12G synaptic operations per second, and
achieves millisecond object recognition and tracking latencies.
Manuscript received June 29, 2008; revised November 11, 2008 and April 06,
2009; accepted April 24, 2009. First published July 24, 2009; current version
published September 02, 2009. This work was supported by the European Com-
mission under Grant IST-2001-34124 (CAVIAR). The work of R. Serrano-Go-
tarredona was supported by the Spanish Ministry of Education and Science
under an FPU scholarship. The work of L. Camuñas-Mesa was supported by the
Spanish Ministry of Education and Science under an FPI scholarship. The work of
S.-C. Liu and T. Delbrück was supported by the Institute of Neuroinformatics
(INI), ETH Zürich/University of Zürich, Zürich, Switzerland and some fabrica-
tion costs were paid by Austria Research Corporation.
R. Serrano-Gotarredona was with the Consejo Superior de Investigaciones
Cientificas, Seville Microelectronics Institute, Seville 41012, Spain. He is now
with Austriamicrosystems, Valencia, Spain (e-mail: rserrano@imse.cnm.
es).
M. Oster was with the Institute of Neuroinformatics (INI), ETH Zürich/Uni-
versity of Zürich, Zürich CH-8057, Switzerland. He is now with Varian
Medical Systems, Baden CH-5405, Switzerland (e-mail: matthias.oster@gmail.
com).
P. Lichtsteiner was with the Institute of Neuroinformatics (INI), ETH Zürich/
University of Zürich, Zürich CH-8057, Switzerland. He is now with Espros
Photonics Corporation, Baar CH-6340, Switzerland (e-mail: patrick.licht-
steiner@espros.ch).
A. Linares-Barranco, R. Paz-Vicente, F. Gómez-Rodríguez, M. Rivas-Pérez,
G. Jiménez-Moreno, and A. Civit Ballcels are with the Computer Archi-
tecture and Technology Department, University of Seville, Seville 41012,
Spain (e-mail: alinares@atc.us.es; rpaz@atc.us.es; gomezroz@atc.us.es;
mrivas@us.es; gaji@atc.us.es; civit@atc.us.es).
L. Camuñas-Mesa, T. Serrano-Gotarredona, A. Acosta-Jiménez, and B.
Linares-Barranco are with the Consejo Superior de Investigaciones Ci-
entificas, Seville Microelectronics Institute, Seville 41092, Spain (e-mail:
luiscamu@imse.cnm.es; terese@imse.cnm.es; acojim@imse.cnm.es;
bernabe@imse.cnm.es).
R. Berner, T. Delbrück, S.-C. Liu, and R. Douglas are with the Insti-
tute of Neuroinformatics (INI), ETH Zürich/University of Zürich, Zürich
CH-8057, Switzerland (e-mail: bernerr@ee.ethz.ch; tobi@ini.phys.ethz.ch;
shih@ini.phys.ethz.ch; rjd@ini.phys.ethz.ch).
P. Häfliger is with the Department of Informatics, University of Oslo, Oslo NO-0316,
Norway (e-mail: hafliger@ifi.uio.no).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNN.2009.2023653
Index Terms—Address–event representation (AER), neuromor-
phic chips, neuromorphic systems, vision.
I. INTRODUCTION
Brains perform powerful and fast vision processing in a
way conceptually different from that of machine vision
systems. Machine vision systems process sequences of still
frames from a camera. For performing scale- and rotation-in-
variant 3-D object recognition, for example, sequences of
computationally demanding operations need to be performed
on each acquired frame. The computational power and speed
required for such tasks make it difficult to develop real-time
autonomous systems for such applications.
On the other hand, vision sensing and object recognition in
brains are performed without using the “frame” concept, at least
not in the usual sense of implying a fixed-rate sequence of still
images. Throughout this paper, we intentionally avoid the use
of the expression “image processing,” because in our hardware
technology, there never is an “image” or a “frame,” but rather a
continuous flow of visual information in the form of temporal
spikes.
The visual cortex is structured as a sequence of layers (8–10
layers in the human cortex [1], [13]), starting from the retina,
which does its own preprocessing in a more compact and analog
architecture. Although cortex has massive feedback and recur-
rent connections, it is known that a very fast and purely feed-
forward recognition path exists within the ventral stream of the
visual cortex [1], [2]. Here we exploited this feedforward path
concept to build a fast vision recognition system. A concep-
tual block diagram of such a cortically inspired feedforward
hierarchically structured autonomous system for sensing/pro-
cessing/decision–actuation can be seen in Fig. 1(a) [1]–[11].
The pattern of connectivity in cortex follows a basic structure:
each neuron in a layer connects to a “cluster of neurons” or “pro-
jective field” in the next layer [12], [13].
In most cases, these projective fields can be approximated by
computing 2-D convolutions. A single layer of a single con-
volution kernel can detect and localize a preprogrammed or
prelearned object, independent of its position. Using multiple
kernels of different sizes and rotations can make the compu-
tation scale and rotation invariant. Multilayered convolutional
networks are capable of complex object recognition [3]–[8].
Spiking neurons receive synaptic input from other cells in the
form of electrical spikes, and they autonomously decide when
to generate their own output spikes. Hardware that combines
spike-based multineuron modules to compute projective fields
can enable powerful and fast frame-free vision processing. If the
components generate short-latency, meaningful, nonredundant
spikes, then spike-based systems can efficiently compute “on-
demand” compared to conventional approaches. The processing
delay depends mainly on the number of layers, and not on the
complexity of objects and shapes to be recognized. Their latency
and throughput are not limited by a conventional sampling
rate.
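To make this event-driven style of computation concrete, the following minimal sketch (ours, for illustration only; the class name, threshold, and leak constant are not taken from any CAVIAR chip) shows an integrate-and-fire unit that does work only when an input spike arrives and emits an output spike only when its integrated state crosses a threshold:

```python
class IntegrateAndFireUnit:
    """Minimal integrate-and-fire element: integrates weighted input spikes
    and emits an output spike when the accumulated state crosses a threshold."""

    def __init__(self, threshold=1.0, leak_per_us=0.001):
        self.state = 0.0
        self.threshold = threshold
        self.leak_per_us = leak_per_us   # slow decay toward rest between events
        self.last_t_us = 0

    def on_input_spike(self, weight, t_us):
        # Decay the state for the time elapsed since the previous input event,
        # then integrate the new weighted contribution.
        dt = t_us - self.last_t_us
        self.last_t_us = t_us
        self.state = max(0.0, self.state - self.leak_per_us * dt) + weight
        if self.state >= self.threshold:
            self.state = 0.0     # reset after firing
            return True          # an output spike is emitted now
        return False             # otherwise the unit stays silent
```

Because nothing is evaluated between events, the cost of the computation scales with spike activity rather than with a fixed frame rate.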
In recent years, significant progress has been made towards
the understanding of the computational principles exploited by
visual cortex. Many artificial systems that implement bioin-
spired software models use biological-like (convolution-based)
processing that outperforms more conventionally engineered
machines [3]–[11], [14]–[17]. However, these systems gen-
erally run at extremely low speeds because the models are
implemented as software programs on conventional computers.
For real-time solutions, direct hardware implementations of
these models are required. However, hardware engineers face
a large hurdle when trying to mimic the bioinspired layered
structure and the massive connectivity within and between
layers. A growing number of research groups worldwide are
mapping some of these computational principles onto real-time
spiking hardware through the development and exploitation of
the so-called address–event representation (AER) technology.
In this paper, we report on the results of our European Union
consortium project “Convolution AER Vision Architecture for
Real-Time” (CAVIAR), in which the largest multichip, multilayer
AER real-time frame-free vision system built to date was developed.
The purpose of this paper is to introduce to various commu-
nities, including computational neuroscience and machine vi-
sion, the promising and effective AER hardware technology that
allows the construction of modular, multilayered, hierarchical,
and scalable (visual) sensory–processing–learning–actuating
systems. Throughout this paper, we will illustrate the power and
potential of the AER hardware technology through the demon-
strator assembled in the CAVIAR project.
The AER is a spike-based representation technique for
communicating asynchronous spikes between layers of neurons
in different chips. The spikes in AER are carried as addresses of
sending or receiving neurons on a digital bus. Time “represents
itself” as the asynchronous occurrence of the event. AER was
first proposed in 1991 by Mead’s Lab at California Institute
of Technology (Caltech, Pasadena) [24]–[28], and has been
used since then by a wide community of hardware engineers.
Unarbitrated, simpler event readout schemes have been used [29],
[30], and more elaborate and efficient arbitrated versions have
also been proposed, based on winner-take-all (WTA) [31],
or the use of arbiter trees [32], which have evolved to row
parallel [33] and burst-mode word-serial [34]–[36] readout
schemes by Boahen’s Lab. The AER has been used in image
and vision sensors, for simple light intensity to frequency
transformations [38], time-to-first-spike codings [40]–[42],
foveated sensors [43], [44], spatial contrast sensors [23],
[45], temporal intensity difference [39] and temporal contrast
sensors [19], [20], and motion sensing and computation systems
[46]–[50]. AER has also been used for auditory systems
[51]–[53], competition and WTA networks [54]–[56], and
even for systems distributed over wireless networks [57]. For
AER-based 2-D convolution, Vernier
et al. [58] and Choi et al.
[59] reported on 2-D convolution chips with hard-wired elliptic
or Gabor-shaped kernels for orientation extraction. AER has
made it feasible to emulate large scale neurocortical-like
multilayered realistic structures since the development of
scalable and reprogrammable kernel 2-D convolution chips,
either with some minor restrictions on symmetry [60], or
without any restrictions on shape or size [18]. Of great
importance for the spread and success of AER systems has also
been the availability of open-source reusable silicon IP [37], a
better understanding by the community of asynchronous logic
design, and the development of conventional synchronous
interfacing logic and computer interfaces [61]–[64].
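As a concrete (and purely illustrative) picture of the representation, an address–event is just the address of the emitting neuron placed on a shared digital bus at the instant it fires; the bit layout below is a hypothetical example, not the word format of any particular AER chip:

```python
from dataclasses import dataclass

@dataclass
class AddressEvent:
    x: int          # column address of the emitting pixel/neuron
    y: int          # row address
    polarity: int   # +1 (ON) or -1 (OFF); a single address bit in hardware
    t_us: int       # timestamp attached only when events are monitored/logged;
                    # on the asynchronous bus itself, "time represents itself"

def decode_word(word: int, t_us: int) -> AddressEvent:
    """Decode a hypothetical 15-bit AER word laid out as [y:7][x:7][sign:1]."""
    polarity = 1 if (word & 0x1) else -1
    x = (word >> 1) & 0x7F
    y = (word >> 8) & 0x7F
    return AddressEvent(x, y, polarity, t_us)
```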
In CAVIAR, an AER infrastructure was developed to support
a set of AER modules (chips and interfaces) [Fig. 1(b)] that are
connected in series and parallel to embody the abstract layered
architecture in Fig. 1(a). The following modules were devel-
oped: 1) a temporal contrast retina (motion sensing camera)
chip; 2) a programmable kernel 2-D convolution processing
chip; 3) a 2-D WTA object chip; 4) spatio–temporal processing
and learning chips; 5) AER remapping, splitting, and merging
field-programmable gate array (FPGA)-based modules; and
6) computer–AER interfacing FPGA modules for generating
and/or capturing AER. These modules were then used for
building a multilayer artificial vision demonstrator system for
detecting and tracking balls moving at high speeds.
The overall architecture of the CAVIAR vision system is il-
lustrated in Fig. 1(b) and in more detail in Fig. 13. Moving
objects in the field of view of the retina cause spikes. Each
spike from the retina causes a splat of each convolution chip’s
kernel onto its own integrator array. When the integrator array
pixels exceed positive or negative thresholds they in turn emit
spikes. In the CAVIAR system experiments, we generally used
circular kernels such as the ones in Fig. 3(c) and (d), which de-
tect circular objects of particular sizes. The resulting convolu-
tion spike outputs are noise filtered by the WTA object chip.
The WTA output spikes, whose addresses represent the loca-
tion of the “best” circular object, are fed into a configurable
delay line chip that spreads time into space. This spatial pat-
tern of temporal delayed spikes is then learned by the learning
chip. The WTA spikes also control a mechanical or electronic
tracking system that stabilizes the programmed object in the
field-of-view center.
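Viewed as software, this chain is a cascade of event-stream transformers. The toy generators below (names, thresholds, and window sizes are ours, and the stages are crude stand-ins for the chips rather than models of them) only illustrate how each stage consumes address–events from the previous one and emits fewer, more meaningful events to the next:

```python
from collections import Counter, defaultdict

def feature_stage(events, threshold=4):
    """Crude stand-in for a convolution chip: accumulate signed activity per
    address and emit an event when a local integrator crosses the threshold."""
    acc = defaultdict(int)
    for x, y, sign in events:
        acc[(x, y)] += sign
        if abs(acc[(x, y)]) >= threshold:
            yield x, y, (1 if acc[(x, y)] > 0 else -1)
            acc[(x, y)] = 0

def winner_stage(events, window=64):
    """Crude stand-in for the WTA object chip: every `window` input events,
    emit only the address that accumulated the most activity."""
    counts, n = Counter(), 0
    for x, y, _sign in events:
        counts[(x, y)] += 1
        n += 1
        if n == window:
            (wx, wy), _ = counts.most_common(1)[0]
            yield wx, wy, 1
            counts.clear()
            n = 0

def caviar_like_pipeline(retina_events):
    # retina -> feature extraction -> winner-take-all -> tracker/learning stages
    return winner_stage(feature_stage(retina_events))
```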
The rest of this paper is structured as follows. Section II
describes the temporal contrast retina, Section III the pro-
grammable kernel 2-D convolution chip, Section IV the 2-D
WTA chip, Section V the learning chips, Section VI the
different interfaces, and finally, Section VII describes the com-
plete CAVIAR vision system and shows experimental results.
Section VIII concludes the paper and gives future outlooks.
Fig. 1. CAVIAR system overview. (a) A bioinspired system architecture performing feedforward sensing + processing + actuation tends to have the following
conceptual hierarchical structure: 1) a sensing layer; 2) a set of low-level processing layers, usually implemented through projection fields (convolutions), for feature
extraction and combination; 3) a set of high-level processing layers that operate on “abstractions” and progressively compress information through, for example,
dimension reduction, competition, and learning; and 4) once a reduced set of signals/decisions is obtained, they are conveyed to (usually mechanical) actuators. (b)
The CAVIAR system components and multilayer architecture; an example output of each component is shown in response to the rotating stimulus, and the basic
functionality is illustrated below each chip component.
TABLE I
TEMPORAL CONTRAST VISION SENSOR PROPERTIES ADAPTED FROM [20]

II. AER TEMPORAL CONTRAST RETINA
The temporal contrast silicon retina is an asynchronous vi-
sion sensor that emits spike address–events (AEs) (Fig. 2 and
Table I) [19], [20]. Each AE from the chip is the address of a
pixel and signifies that the log intensity at that pixel has changed
by a fixed threshold amount since the last event from that pixel.
We typically set this global event threshold to about 15% contrast.
In addition, one bit of the address encodes the sign of the change
(ON or OFF). This representation of “change in log intensity” gen-
erally encodes scene reflectance change. The compressive log-
arithmic transformation in each pixel allows for wide dynamic
range operation (120 dB, compared with, for example, 60 dB
for a high-quality traditional image sensor).

TABLE II
CONVOLUTION CHIP PROPERTIES

This wide dynamic range means that the sensor can be used with
uncontrolled natural lighting. The asynchronous response property also means
that the events have a latency down to 15 μs with bright lighting
and typically about 1 ms under indoor illumination, resulting in
an effective frame rate of typically several kilohertz. The tem-
poral redundancy reduction greatly reduces the output data rate
for scenes in which most pixels are not changing. The design of
the pixel also allows for unprecedented uniformity of response:
the mismatch between pixel contrast thresholds is 2.1% con-
trast. The event threshold can be set down to 10% contrast, al-
lowing the device to sense natural scenes rather than only artifi-
cial high-contrast stimuli. The vision sensor also has integrated
digitally controlled biases that greatly reduce chip-to-chip vari-
ation in parameters and temperature sensitivity [21].
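A rough software analogy of this pixel behavior is sketched below (for illustration only: the real sensor is asynchronous and frame-free, and intensity frames are used here merely to drive the simulation):

```python
import numpy as np

def temporal_contrast_events(frames, threshold=0.15):
    """Emulate temporal-contrast pixels on a stack of intensity frames: emit
    (t, x, y, polarity) whenever the log intensity at a pixel has moved by
    more than `threshold` (about 15% contrast) since that pixel's last event."""
    frames = np.asarray(frames, dtype=float)
    log_ref = np.log(frames[0] + 1e-6)        # per-pixel reference at last event
    events = []
    for t in range(1, len(frames)):
        log_i = np.log(frames[t] + 1e-6)
        diff = log_i - log_ref
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1   # ON or OFF event
            events.append((t, int(x), int(y), polarity))
            log_ref[y, x] = log_i[y, x]              # reset reference at the event
    return events
```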
Fig. 2. Temporal contrast silicon retina vision sensor. (a) Silicon retina USB2
system. The vision sensor with its lens and USB2.0 interface. (b) Chip micro-
graph. A die photograph labeled with the row and column from a pixel that
generates an event with (x, y, type) output, where type is ON or OFF. (c) Simpli-
fied pixel core schematic that responds with events to fixed-size changes of log
intensity. (d) Principle of operation. How the ON and OFF events are internally
represented and output in response to an input signal. Figure adapted from [20].
III. AER PROGRAMMABLE KERNEL 2-D CONVOLUTION CHIP
The convolution chip is an AER transceiver with an array
of event integrators, already reported elsewhere [18]. Table II
summarizes the chip performance figures and specifications. For
each incoming event, integrators within a projection field around
the addressed pixel compute a weighted event integration. The
weight of this integration is defined by the convolution kernel
[18], [60]. Each incoming event computation splats the kernel
onto the integrators.
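In software terms, this per-event “splat” could be sketched as follows (array size, threshold, and names are ours; the chip performs the same operation with analog current integrators and a forgetting mechanism rather than floating-point state):

```python
import numpy as np

class EventConvolver:
    """Toy event-driven convolver: for each input event, add the kernel into
    an integrator array around the event address and emit signed output
    events from pixels whose integrators cross a threshold."""

    def __init__(self, kernel, shape=(32, 32), threshold=1.0):
        self.kernel = np.asarray(kernel, dtype=float)
        self.state = np.zeros(shape)
        self.threshold = threshold

    def process_event(self, x, y, sign=1):
        kh, kw = self.kernel.shape
        cy, cx = kh // 2, kw // 2
        h, w = self.state.shape
        # Clip the projection field at the array borders.
        x0, x1 = max(0, x - cx), min(w, x - cx + kw)
        y0, y1 = max(0, y - cy), min(h, y - cy + kh)
        ky0, kx0 = y0 - (y - cy), x0 - (x - cx)
        self.state[y0:y1, x0:x1] += sign * self.kernel[ky0:ky0 + (y1 - y0),
                                                       kx0:kx0 + (x1 - x0)]
        # Threshold-and-reset: emit output events where |state| >= threshold.
        out = []
        for yy, xx in zip(*np.nonzero(np.abs(self.state) >= self.threshold)):
            out.append((int(xx), int(yy), 1 if self.state[yy, xx] > 0 else -1))
            self.state[yy, xx] = 0.0
        return out
```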
Fig. 3(a) shows the block diagram of the convolution chip.
The main parts of the chip are as follows.
1) An array of 32 × 32 pixels, where each pixel contains a
binary weighted signed current source and an integrate-
and-fire signed integrator. The current source is controlled
by the kernel weight read from the RAM and stored in a
dynamic register for each input event.
2) A 32 × 32 kernel static RAM, where each kernel weight
value is stored with signed 4-b resolution.
3) A digital controller that handles the whole sequence of operations.
4) For each incoming event, a monostable generates a pulse of
fixed duration that enables the integration simultaneously
in all active pixels.
Fig. 3. Convolution chip. (a) Architecture of the convolution chip. (b) Mi-
crophotograph of fabricated chip. (c) Kernel for detecting circumferences of
radius close to four pixels and (d) close to nine pixels.
5) An x-neighborhood block that performs a displacement of
the kernel in the x direction.
6) Arbitration and decoding circuitry that generates the output
AEs using Boahen’s burst-mode word-parallel AER [33].
The chip operation sequence is as follows.
1) The digital control block stores the address of an
incoming event and acknowledges reception of the event
through the request and acknowledge handshake signals.
2) The control block computes the x-displacement that has to
be applied to the kernel and the limits in the y addresses
where the kernel has to be copied.
3) The control block copies the kernel from the kernel RAM
row by row to the corresponding rows in the pixel array.
4) The control block activates the generation of a monostable
pulse. This way, in each pixel a current weighted by the
corresponding kernel weight is integrated during a fixed
time interval.
5) Kernel weights in the pixels are erased.
A pixel (Fig. 4) contains two digitally controlled pulsing
current sources (pulsing CDAC) which provide a current pulse
of fixed width [equal to the width of the signal “event pulse”
coming from the monostable in Fig. 3(a)] and amplitude de-
pendent on the kernel weight stored in the dynamic register
Fig. 4. Simplified block diagram of convolution chip pixel.
“weight” in Fig. 4. Depending on the combination of kernel
weight sign and input event sign, the current pulse has to
be positive (provided by CDACp) or negative (provided by
CDACn). The current of each CDAC is proportional to a locally
trimmable calibration current (one per CDAC) to compensate for interpixel
mismatches. Calibration values are loaded from an external
source over a serial interface. Current pulses are integrated
onto a capacitor, whose voltage is monitored by two compara-
tors. If an upper (lower) threshold is reached, the
pixel sends a positive (negative) output event, and resets the
capacitor voltage to the intermediate resting level. This event is
arbitrated and decoded in the periphery of the chip. In parallel,
all pixels receive a periodic signal “forgetting pulse” which
discharges (charges) the capacitor voltage to the intermediate
resting voltage if CapSign is high (low), by generating fixed
amplitude current pulses at CDACn (CDACp).
Both the size of the pixel array and the size of the kernel
storage RAM are 32 × 32. The input address space can be up to
128 × 128 (14 b), and the chip is programmed to receive input
from a part of this space. Fig. 3(b) shows the microphotograph
of the fabricated chip. AER events can be fed-in up to a peak
rate of 50 million events per second (Meps). The chip can gen-
erate output events at a maximum rate of 25 Meps. Input event
throughput depends on kernel size and internal clock frequency.
The event cycle time is approximately (n + 2) · T_clk, where n
is the number of programmed kernel lines (from 1 to 32) and
T_clk is the internal clock period. The internal clock is tunable
and could be set up to 200 MHz (T_clk = 5 ns) before ob-
serving operation degradation, although in our setup we gener-
ally used 100 MHz. Maximum sustained input event throughput
can, therefore, vary between 33 Meps for a one-line kernel down
to 3 Meps for a full 32-line kernel. Further details are given in
Table II and elsewhere [18].
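As a quick sanity check of these figures (a sketch only; the two-cycle overhead term is inferred from the quoted throughput numbers, not taken from the chip documentation):

```python
def sustained_throughput_meps(kernel_lines, f_clk_hz=100e6, overhead_cycles=2):
    """Sustained input event rate, assuming an event cycle of
    (kernel_lines + overhead_cycles) internal clock periods."""
    t_event_s = (kernel_lines + overhead_cycles) / f_clk_hz
    return 1.0 / t_event_s / 1e6

# With a 100-MHz internal clock this reproduces the quoted figures:
print(sustained_throughput_meps(1))    # ~33 Meps for a one-line kernel
print(sustained_throughput_meps(32))   # ~3 Meps for a full 32-line kernel
```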
Each convolution chip can process an input space of up to
128 × 128 pixels, but can produce outputs for only 32 × 32
pixels. This is useful for multichip assembly. For example,
Fig. 5 illustrates how an array of 4 × 4 chips, each with 32 × 32
pixels, could be used to process a visual input of 128 × 128
pixels. Each chip stores into internal registers its own limit
coordinates within the total 128 × 128 pixel space. All chips
share the same input AER bus (this is done in practice using
AER splitters). Maximum kernel size can be 31 × 31 (see Fig. 5),
which means that pixels up to 30 positions apart from a chip
might need to be processed by it.

Fig. 5. Multichip assembly of convolution chips. All chips “see” the same input
space (up to 128 × 128 pixels), but each chip can process only 32 × 32 pixels.
Each chip stores its limit coordinates (its x and y bounds). In general, when an
event is received at coordinate (x, y), up to four chips process it.

For example, in Fig. 5, we can see how an event at one address is processed
simultaneously by four neighboring chips. The output events produced by all chips are
merged on a single AER bus by an external merger.
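In software terms, the address decoding performed by each chip could be sketched as follows (the 4 × 4 tiling is from Fig. 5, but the register handling and the 15-pixel margin of a centered 31 × 31 kernel are our assumptions for illustration; the exact overlap margin depends on how the kernel is anchored):

```python
def chips_processing_event(x, y, tile=32, grid=4, margin=15):
    """Return the (row, col) indices of the chips, in a grid x grid tiling of
    the 128 x 128 input space, whose padded 32 x 32 windows contain (x, y)."""
    hits = []
    for row in range(grid):
        for col in range(grid):
            x_min, x_max = col * tile, (col + 1) * tile - 1   # chip limit registers
            y_min, y_max = row * tile, (row + 1) * tile - 1
            if (x_min - margin <= x <= x_max + margin and
                    y_min - margin <= y <= y_max + margin):
                hits.append((row, col))
    return hits

# An event well inside a tile is handled by one chip; an event near a corner
# shared by four tiles is handled by all four neighbors.
print(len(chips_processing_event(5, 5)))     # -> 1
print(len(chips_processing_event(31, 31)))   # -> 4
```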
For the vision system described in Section VII, we assembled
four convolution chips on a single printed circuit board (PCB).
The PCB has one AER input bus connector and one AER output
bus connector. The input bus goes to a 1-to-4 splitter, imple-
mented on a complex programmable logic device (CPLD) chip,
that feeds the input AER ports of the four chips. The chips’
output AER ports connect to a merger circuit, implemented on
another CPLD circuit, whose output goes to the PCB output
AER connector. The four chips can be programmed to “see” the
same input space and each compute a different 2-D filter (con-
volution) on the same 32 × 32 pixel space, or the four chips can
be programmed to process the same kernel while operating on
an expanded 64 × 64 pixel space. In Section VII, we used this
latter option, so that the PCB would work as one single convolu-
tion processor of array size 64 × 64 and maximum kernel size
of 31 × 31.
IV. AER 2-D WTA CHIP
The AER WTA transceiver chip [66]–[70] is designed to si-
multaneously determine the “what” and “where” of the convo-
lution chip outputs. The “whats” are the best matched features
in the case of multiple convolution chips, each with a different
kernel, and the “wheres” are the spatial locations of these fea-
tures (Fig. 6 and Table III). The WTA chip implements this
feature competition using four populations of spiking neurons
which receive the outputs of individual convolution chips and
computes the winner (strongest input) in two dimensions. First,
it performs a WTA operation on the inputs from a feature map
to determine the strongest input (which codes the location of the
strongest feature in the feature map), and second, it performs a
second level of WTA operation on the sparse feature maps to
determine the strongest feature out of all preprogrammed fea-
tures. The parameters of the network are configured so that it
implements a hard WTA with only one neuron active at a time.
The spike rate of the winning neuron is proportional to its input