
ORIGINAL RESEARCH
published: 08 November 2016
doi: 10.3389/fnins.2016.00508
Edited by: Bernabe Linares-Barranco, Instituto de Microelectrónica de Sevilla, Spain
Reviewed by: Tara Julia Hamilton, Western Sydney University, Australia; Thomas Nowotny, University of Sussex, UK
*Correspondence: Jun Haeng Lee, junhaeng.lee@gmail.com; junhaeng2.lee@samsung.com
Specialty section: This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience
Received: 30 August 2016; Accepted: 24 October 2016; Published: 08 November 2016
Citation: Lee JH, Delbruck T and Pfeiffer M (2016) Training Deep Spiking Neural Networks Using Backpropagation. Front. Neurosci. 10:508. doi: 10.3389/fnins.2016.00508
Training Deep Spiking Neural Networks Using Backpropagation

Jun Haeng Lee 1,2*, Tobi Delbruck 2 and Michael Pfeiffer 2

1 Samsung Advanced Institute of Technology, Samsung Electronics, Suwon, South Korea
2 Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
Deep spiking neural networks (SNNs) hold the potential for improving the latency and
energy efficiency of deep neural networks through data-driven event-based computation.
However, training such networks is difficult due to the non-differentiable nature of spike
events. In this paper, we introduce a novel technique, which treats the membrane
potentials of spiking neurons as differentiable signals, where discontinuities at spike
times are considered as noise. This enables an error backpropagation mechanism for
deep SNNs that follows the same principles as in conventional deep networks, but
works directly on spike signals and membrane potentials. Compared with previous
methods relying on indirect training and conversion, our technique has the potential to
capture the statistics of spikes more precisely. We evaluate the proposed framework
on artificially generated events from the original MNIST handwritten digit benchmark,
and also on the N-MNIST benchmark recorded with an event-based dynamic vision
sensor, in which the proposed method reduces the error rate by a factor of more than
three compared to the best previous SNN, and also achieves a higher accuracy than a
conventional convolutional neural network (CNN) trained and tested on the same data.
We demonstrate in the context of the MNIST task that thanks to their event-driven
operation, deep SNNs (both fully connected and convolutional) trained with our method
achieve accuracy equivalent to conventional neural networks. In the N-MNIST example,
equivalent accuracy is achieved with about five times fewer computational operations.
Keywords: spiking neural network, deep neural network, backpropagation, neuromorphic, DVS, MNIST, N-MNIST
1. INTRODUCTION
Deep learning is achieving outstanding results in various machine learning tasks (He et al., 2015a; LeCun et al., 2015
), but for applications that require real-time interaction with the real
environment, the repeated and often redundant update of large numbers of units becomes a
bottleneck for efficiency. An alternative has been proposed in the form of spiking neural networks
(SNNs), a major research topic in theoretical neuroscience and neuromorphic engineering. SNNs
exploit event-based, data-driven updates to gain efficiency, especially if they are combined with
inputs from event-based sensors, which reduce redundant information based on asynchronous
event processing (Camunas-Mesa et al., 2012; O’Connor et al., 2013; Merolla et al., 2014; Neil and
Liu, 2016). This feature makes spiking systems attractive for real-time applications where speed
and power consumption are important factors, especially once adequate neuromorphic hardware
platforms become more widely available. Even though in theory (
Maass and Markram, 2004) SNNs
have been shown to be as computationally powerful as conventional artificial neural networks
(ANNs; this term will be used to describe conventional deep
neural networks in contrast with SNNs), practically SNNs have
not quite reached the same accuracy levels of ANNs in traditional
machine learning tasks. A major reason for this is the lack of
adequate training algorithms for deep SNNs, since spike signals
(i.e., discrete events produced by a spiking neuron whenever its
internal state crosses a threshold condition) are not differentiable,
but differentiable activation functions are fundamental for using
error backpropagation, which is still by far the most widely used
algorithm for training deep neural networks.
A recently proposed solution is to use different data
representations between training and processing, i.e., training a
conventional ANN and developing conversion algorithms that
transfer the weights into equivalent deep SNNs (O’Connor et al.,
2013; Diehl et al., 2015; Esser et al., 2015; Hunsberger and
Eliasmith, 2015
). However, in these methods, statistics of spike trains that go beyond ideal mean-rate modeling, such as those required for processing practical event-based sensor data, cannot be precisely represented by the signals used for training. It is
therefore desirable to devise learning rules operating directly on
spike trains, but so far it has only been possible to train single
layers, and use unsupervised learning rules, which leads to a
deterioration of accuracy (
Masquelier and Thorpe, 2007; Neftci
et al., 2014; Diehl and Cook, 2015). An alternative approach has
recently been introduced by O’Connor and Welling (2016), in
which a SNN learns from spikes, but requires keeping statistics
for computing stochastic gradient descent (SGD) updates in
order to approximate a conventional ANN.
In this paper we introduce a novel supervised learning method
for SNNs, which closely follows the successful backpropagation
algorithm for deep ANNs, but here is used to train general
forms of deep SNNs directly from spike signals. This framework
includes both fully connected and convolutional SNNs, SNNs
with leaky membrane potential, and layers implementing spiking
winner-takes-all (WTA) circuits. The key idea of our approach
is to generate a continuous and differentiable signal on which
SGD can work, using low-pass filtered spiking signals added
onto the membrane potential and treating abrupt changes of
the membrane potential as noise during error backpropagation.
Additional techniques are presented that address particular
challenges of SNN training: Spiking neurons typically require
large thresholds to achieve stability and reasonable firing rates,
but large thresholds may result in many “dead” neurons, which
do not participate in the optimization during training. Novel
regularization and normalization techniques are proposed that
contribute to stable and balanced learning. Our techniques lay
the foundations for closing the performance gap between SNNs
and ANNs, and promote their use for practical applications.
1.1. Related Work
Gradient descent methods for SNNs have not been deeply
investigated because both spike trains and the underlying
membrane potentials are not differentiable at the time of spikes.
The most successful approaches to date have used indirect
methods, such as training a network in the continuous rate
domain and converting it into a spiking version.
O’Connor et al. (2013) pioneered this area by training a spiking deep belief network based on the Siegert event-rate approximation model. However, on the MNIST handwritten digit classification
task (
LeCun et al., 1998), which is nowadays almost perfectly
solved by ANNs (0.21% error rate in
Wan et al., 2013), their
approach only reached an accuracy around 94.09%.
Hunsberger
and Eliasmith (2015) used the softened rate model, in which a
hard threshold in the response function of the leaky integrate-and-fire (LIF) neuron is replaced with a continuous differentiable
function to make it amenable to use in backpropagation. After
training an ANN with the rate model they converted it into a SNN
consisting of LIF neurons. With the help of pre-training based on
denoising autoencoders they achieved 98.6% in the permutation-
invariant (PI) MNIST task (see Section 3.1). Diehl et al. (2015)
trained deep neural networks with conventional deep learning
techniques and additional constraints necessary for conversion
to SNNs. After training, the ANN units were converted into
non-leaky spiking neurons and the performance was optimized
by normalizing weight parameters. This approach resulted in
the current state-of-the-art accuracy for SNNs of 98.64% in
the PI MNIST task.
Esser et al. (2015) used a differentiable
probabilistic spiking neuron model for training and statistically
sampled the trained network for deployment. In all of these
methods, training was performed indirectly using continuous
signals, which may not capture important statistics of spikes
generated by real sensors used during processing. Even though
SNNs are well-suited for processing signals from event-based sensors such as the Dynamic Vision Sensor (DVS) (Lichtsteiner et al., 2008), the previous SNN training models require removing time information and generating image frames from the event
streams. Instead, in this article we use the same signal format
for training and processing deep SNNs, and can thus train SNNs
directly on spatio-temporal event streams considering non-ideal
factors such as pixel variation in sensors. This is demonstrated
on the neuromorphic N-MNIST benchmark dataset (
Orchard
et al., 2015), achieving higher accuracy with a smaller number of
neurons than all previous attempts that ignored spike timing by
using event-rate approximation models for training.
2. MATERIALS AND METHODS
2.1. Spiking Neural Networks
In this article we study two types of networks: Fully connected
SNNs with multiple hidden layers and convolutional SNNs. Let
M and N be the number of synapses of a neuron and the number
of neurons in a layer, respectively. On the other hand, m and n
are the number of active synapses (i.e., synapses receiving spike
inputs) of a neuron and the number of active neurons (sending
spike outputs) in a layer during the presentation of an input
sample. We will also use the simplified form of indices for active
synapses and neurons throughout the paper as
Active synapses: {v_1, ..., v_m} ≡ {1, ..., m}
Active neurons: {u_1, ..., u_n} ≡ {1, ..., n}

Thus, if an index i, j, or k is used for a synapse over [1, m] or a neuron over [1, n] (e.g., in Equation 5), then it actually represents an index of an active synapse (v_i) or an active neuron (u_j).
2.1.1. Leaky Integrate-and-Fire (LIF) Neuron
The LIF neuron is one of the simplest models used for describing
dynamics of spiking neurons (
Gerstner and Kistler, 2002). Since
the states of LIF neurons can be updated asynchronously based
solely on the timing of input events (i.e., without timestepped
integration), LIF is computationally efficient. For a given input
spike the membrane potential of a LIF neuron can be updated as

$$V_{mp}(t_p) = V_{mp}(t_{p-1})\, e^{\frac{t_{p-1}-t_p}{\tau_{mp}}} + w_i^{(p)} w_{dyn}, \qquad (1)$$

where V_mp is the membrane potential, τ_mp is the membrane time constant, t_p and t_{p-1} are the present and previous input spike times, and w_i^(p) is the synaptic weight of the i-th synapse (through which the present p-th input spike arrives). We introduce here
a dynamic weight w_dyn, which controls the refractory period following

$$w_{dyn} = \begin{cases} (\Delta t / T_{ref})^2 & \text{if } \Delta t < T_{ref} \text{ and } w_{dyn} < 1 \\ 1 & \text{otherwise} \end{cases} \qquad (2)$$

where T_ref is the maximum duration of the refractory period, and Δt = t_p − t_out, where t_out is the time of the latest output spike produced by the neuron or an external trigger signal through lateral inhibition as discussed in Section 2.1.2. Thus, the effect of input spikes on V_mp is suppressed for a short period of time T_ref after an output spike. w_dyn recovers quadratically to 1 after the output spike at t_out. Since w_dyn is a neuron parameter and applied to all synapses identically, it is different from short-term plasticity, which is a synapse-specific mechanism. The
motivation to use dynamic weights instead of simpler refractory
mechanisms, such as simply blocking the generation of output
spikes, is that it allows controlling refractory states by external
mechanisms. One example is the introduction of WTA circuits
in Section 2.1.2, where lateral inhibition simultaneously puts
all neurons competing in a WTA into the refractory state.
This ensures that the winning neuron gets another chance to
win the competition, since otherwise another neuron could fire
while only the winner has to reset its membrane potential after
generating a spike.
When V_mp crosses the threshold value V_th, the LIF neuron generates an output spike and V_mp is decreased by the amount of the threshold:

$$V_{mp}(t_p^{+}) = V_{mp}(t_p) - V_{th}, \qquad (3)$$

where t_p^+ is the time right after the reset. A lower bound for the membrane potential is set at −V_th, and V_mp is clipped whenever it falls below this value. This strategy helps balancing the participation of neurons during training by preventing neurons from having highly negative membrane potentials. We will revisit this issue when we introduce threshold regularization in Section 2.3.2.
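To make the event-driven update of Equations (1)-(3) concrete, the following sketch implements a single LIF neuron updated only at input spike times; the class name and parameter values are illustrative assumptions, not code from the paper.

```python
import numpy as np

class LIFNeuron:
    """Event-driven LIF neuron sketch following Equations (1)-(3)."""

    def __init__(self, n_syn, v_th=1.0, tau_mp=20e-3, t_ref=1e-3):
        self.w = np.zeros(n_syn)   # synaptic weights w_i
        self.v_th = v_th           # threshold V_th
        self.tau_mp = tau_mp       # membrane time constant tau_mp
        self.t_ref = t_ref         # maximum refractory duration T_ref
        self.v_mp = 0.0            # membrane potential V_mp
        self.t_last = 0.0          # previous input spike time t_{p-1}
        self.t_out = -np.inf       # latest output spike time t_out

    def w_dyn(self, t):
        """Dynamic weight of Equation (2): quadratic recovery after t_out."""
        dt = t - self.t_out
        return (dt / self.t_ref) ** 2 if dt < self.t_ref else 1.0

    def on_input_spike(self, t, i):
        """Apply Equation (1) for an input spike arriving at synapse i at time t."""
        self.v_mp *= np.exp((self.t_last - t) / self.tau_mp)  # leak since last input spike
        self.v_mp += self.w[i] * self.w_dyn(t)                # refractory-scaled input
        self.v_mp = max(self.v_mp, -self.v_th)                # lower bound at -V_th
        self.t_last = t
        if self.v_mp >= self.v_th:                            # threshold crossing
            self.v_mp -= self.v_th                            # reset by subtraction (Equation 3)
            self.t_out = t
            return True                                       # output spike emitted
        return False
```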
2.1.2. Winner-Take-All (WTA) Circuit
We found that the accuracy of SNNs could be improved by
introducing a competitive recurrent architecture in the form of
adding WTA circuits in certain layers. In a WTA circuit, multiple
neurons form a group with lateral inhibitory connections. Thus,
as soon as any neuron produces an output spike, it inhibits all
other neurons in the circuit and prevents them from spiking
(
Rozell et al., 2008; Oster et al., 2009). In this work, all lateral
connections in a WTA circuit have the same strength, which
reduces memory and computational costs for implementing
them. The amount of lateral inhibition applied to the membrane
potential is proportional to the inhibited neuron’s membrane
potential threshold (the exact form is defined in Equation 5
in Section 2.2.2). With this scheme, lateral connections inhibit
neurons having small V_th weakly and those having large V_th strongly. This improves the balance of activities among neurons during training since neurons with higher activities have larger V_th due to the threshold regularization scheme described in Section 2.3.2. Furthermore, as described previously in Section 2.1.1, lateral inhibition is used to put the dynamic weights of all inhibited neurons in a WTA circuit into the refractory state. As shown in Figure 3 and discussed later in Section 3.1, we found
that adding WTA circuits both improves classification accuracy,
and improves the stability and speed of convergence during
training.
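As a rough illustration of this WTA mechanism (building on the hypothetical LIFNeuron sketch above; the function name and the shared inhibition strength are our own assumptions), the effect of a winner spike on the other neurons of the circuit can be written as:

```python
def apply_lateral_inhibition(neurons, winner_idx, kappa, t):
    """Inhibit all non-winning neurons of a WTA circuit when the winner spikes at time t.

    kappa is the shared (negative) lateral connection strength; the inhibition applied to
    each neuron is proportional to that neuron's own threshold, and the external trigger
    also puts the inhibited neurons into the refractory state.
    """
    for i, nrn in enumerate(neurons):
        if i == winner_idx:
            continue
        nrn.v_mp = max(nrn.v_mp + kappa * nrn.v_th, -nrn.v_th)  # threshold-proportional inhibition
        nrn.t_out = t                                           # refractory state via dynamic weight
```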
2.2. Using Backpropagation in SNNs
In order to derive and apply the backpropagation equations for
training SNNs, after summarizing the classical backpropagation
method (
Rumelhart and Zipser, 1985) we derive differentiable
transfer functions for spiking neurons in WTA configuration.
Furthermore, we introduce simple methods to initialize
parameters and normalize backpropagating errors to address
vanishing or exploding gradients, and to stabilize training. These
are variations of successful methods used commonly in deep
learning, but adapted to the specific requirements of SNNs.
2.2.1. Backpropagation Revisited
Neural networks are typically optimized by SGD, meaning that
the vector of network parameters or weights θ is moved in
the direction of the negative gradient of some loss function L
according to θ = θ − η ∂L/∂θ, where η is the learning rate.
The backpropagation algorithm uses the chain rule to compute
the partial derivatives ∂L/∂θ. For completeness we provide here
a summary of backprop for conventional fully-connected deep
neural networks:
1. Propagate inputs in the forward direction to compute the pre-activations (z^(l)) and activations (a^(l) = f^(l)(z^(l))) for all the layers up to the output layer l = n_l, where f is the transfer function of the units.
2. Calculate the error at the output layer:

$$\delta^{(n_l)} = \frac{\partial L(a^{(n_l)}, y)}{\partial z^{(n_l)}} = \frac{\partial L(a^{(n_l)}, y)}{\partial a^{(n_l)}} \cdot f'(z^{(n_l)})$$

where y is the label vector indicating the desired output activation and · is element-wise multiplication.
3. Backpropagate the error to lower layers l = n_l − 1, n_l − 2, ..., 2:

$$\delta^{(l)} = \left( (W^{(l)})^{T} \delta^{(l+1)} \right) \cdot f'(z^{(l)})$$
where W^(l) is the weight matrix of the layer l.
4. Compute the partial derivatives for the update:

$$\nabla_{W^{(l)}} L = \delta^{(l+1)} (a^{(l)})^{T}, \qquad \nabla_{b^{(l)}} L = \delta^{(l+1)}$$

where b^(l) is the bias vector of the layer l.
5. Update the parameters:

$$W^{(l)} = W^{(l)} - \eta\, \nabla_{W^{(l)}} L, \qquad b^{(l)} = b^{(l)} - \eta\, \nabla_{b^{(l)}} L$$
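A compact NumPy sketch of these five steps for a fully connected network may be useful as a reference point before the spiking version; the function and argument names are ours, and f, f_prime, and loss_grad stand for the transfer function, its derivative, and ∂L/∂a at the output:

```python
import numpy as np

def sgd_step(weights, biases, x, y, f, f_prime, loss_grad, eta):
    """One conventional backpropagation/SGD step (steps 1-5 above), as a sketch."""
    # 1. forward pass: pre-activations z^(l) and activations a^(l)
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        z = W @ activations[-1] + b
        zs.append(z)
        activations.append(f(z))
    # 2. error at the output layer
    delta = loss_grad(activations[-1], y) * f_prime(zs[-1])
    # 3.-4. backpropagate errors and collect the partial derivatives
    grads_W, grads_b = [], []
    for l in reversed(range(len(weights))):
        grads_W.insert(0, np.outer(delta, activations[l]))
        grads_b.insert(0, delta)
        if l > 0:
            delta = (weights[l].T @ delta) * f_prime(zs[l - 1])
    # 5. gradient-descent update of the parameters
    weights = [W - eta * gW for W, gW in zip(weights, grads_W)]
    biases = [b - eta * gb for b, gb in zip(biases, grads_b)]
    return weights, biases
```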
2.2.2. Transfer Function and Derivatives
Starting from the event-based update of the membrane potentials
in Equation (1), we can define the accumulated effect (normalized by synaptic weight) of the k-th active input synapse onto the membrane potential of a target neuron as x_k(t). Similarly, the generation of spikes in neuron i acts on its own membrane potential via the term a_i, which is due to the reset in Equation (3) (normalized by V_th). Both x_k and a_i can be expressed as sums of exponentially decaying terms

$$x_k(t) = \sum_{p} \exp\!\left(\frac{t_p - t}{\tau_{mp}}\right), \qquad a_i(t) = \sum_{q} \exp\!\left(\frac{t_q - t}{\tau_{mp}}\right), \qquad (4)$$

where the first sum is over all input spike times t_p < t at the k-th input synapse, and the second sum is over the output spike times t_q < t for a_i. The accumulated effects of lateral inhibitory signals in WTA circuits can be expressed analogously to Equation (4). The activities in Equation (4) are real-valued and continuous except for the time points where spikes occur and the activities jump up. We use these numerically computed lowpass-filtered activities for backpropagation instead of directly using spike signals.
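Numerically, the low-pass filtered activities of Equation (4) are just sums of exponential kernels evaluated at the current time; the short sketch below (our own helper, with an assumed τ_mp of 20 ms) shows one way to compute them from recorded spike times:

```python
import numpy as np

def spike_trace(spike_times, t, tau_mp=20e-3):
    """Low-pass filtered spike activity of Equation (4) at time t."""
    past = np.asarray([tp for tp in spike_times if tp < t])  # spikes before t
    return float(np.exp((past - t) / tau_mp).sum())

# x_k(t): trace of the k-th input synapse; a_i(t): trace of the neuron's own output spikes
x_k = spike_trace([0.005, 0.012, 0.020], t=0.025)
a_i = spike_trace([0.015], t=0.025)
```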
Ignoring the effect of refractory periods for now, the membrane potential of the i-th active neuron in a WTA circuit can be written in terms of x_k and a_i defined in Equation (4) as

$$V_{mp,i}(t) = \sum_{k=1}^{m} w_{ik}\, x_k(t) - V_{th,i}\, a_i(t) + \sigma V_{th,i} \sum_{j=1,\, j \neq i}^{n} \kappa_{ij}\, a_j(t). \qquad (5)$$

The terms on the right side represent the input, membrane potential resets, and lateral inhibition, respectively. κ_ij is the strength of lateral inhibition (−1 ≤ κ_ij ≤ 0) from the j-th active neuron to the i-th active neuron, and σ is the expected efficacy of lateral inhibition. σ should be smaller than 1, since lateral inhibition can affect the membrane potential only down to its lower bound (i.e., −V_th). We found a value of σ = 0.5 to work well in practice. Equation (5) reveals the relationship between inputs and outputs of spiking neurons which is not clearly shown in Equations (1) and (3). Nonlinear activation of neurons is considered in Equation (5) by including only active synapses and neurons. Figure 1 shows the relationship between signals presented in Equations (4) and (5). Since the output (a_i) of the current layer becomes the input (x_k) of the next layer if all the neurons have the same τ_mp, Equation (5) provides the basis for deriving the backpropagation algorithm via the chain rule.
Differentiation is not defined in Equation (4) at the moment of
each spike because there is a discontinuous step jump. However,
we propose here to ignore these fluctuations, and treat Equations
(4) and (5) as if they were differentiable continuous signals
to derive the necessary error gradients for backpropagation.
In previous works (
O’Connor et al., 2013; Diehl et al., 2015;
Esser et al., 2015; Hunsberger and Eliasmith, 2015), continuous
variables were introduced as a surrogate for x_k and a_i in Equation
(5) for backpropagation. In this work, however, we directly use
the contribution of spike signals to the membrane potential
as defined in Equation (4). Thus, the real statistics of spike
signals, including temporal effects such as synchrony between
inputs, can influence the training process. Ignoring the step
jumps caused by spikes in the calculation of gradients might of
course introduce errors, but as our results show, in practice this
seems to have very little influence on SNN training. A potential
explanation for this robustness of our training scheme is that
by treating the signals in Equation (4) as continuous signals
that fluctuate suddenly at times of spikes, we achieve a similar
positive effect as the widely used approach of noise injection
during training, which can improve the generalization capability
of neural networks (Vincent et al., 2008). In the case of SNNs,
several papers have used the trick of treating spike-induced
abrupt changes as noise for gradient descent optimization
(Bengio et al., 2015; Hunsberger and Eliasmith, 2015). However,
in these cases the model added Gaussian random noise instead
of spike-induced perturbations. In this work, we directly use the
actual contribution of spike signals to the membrane potential as
described in Equation (4) for training SNNs. Our results show
empirically that this approach works well for learning in SNNs
where information is encoded in spike rates. Importantly, the
presented framework also provides the basis for utilizing specific
spatio-temporal codes, which we demonstrate on a task using
inputs from event-based sensors.
For the backpropagation equations it is necessary to obtain
the transfer functions of LIF neurons in WTA circuits (which generalizes to non-WTA layers by setting κ_ij = 0 for all i and j). For this we set the residual V_mp term in the left side of Equation (5) to zero (since it is not relevant to the transfer function), resulting in the transfer function

$$a_i \approx \frac{s_i}{V_{th,i}} + \sigma \sum_{j=1,\, j \neq i}^{n} \kappa_{ij}\, a_j, \qquad \text{where } s_i = \sum_{k=1}^{m} w_{ik}\, x_k. \qquad (6)$$
Refractory periods are not considered here since the activity of
neurons in SNNs is rarely dominated by refractory periods in
a normal operating regime. For example, we used a refractory
period of 1 ms and the event rates of individual neurons were
kept within a few tens of events per second (eps). Equation (6)
is consistent with (4.9) in
Gerstner and Kistler (2002) without
WTA terms. The equation can also be simplified to a spiking
version of a rectified-linear unit by introducing a unit threshold
and non-leaky membrane potential as in
O’Connor and Welling
(2016)
.
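Because a_i appears on both sides of Equation (6) through the lateral term, the rates of the active neurons in a WTA circuit can be obtained from a small linear solve; the following sketch (our own illustration, with assumed array shapes) makes this explicit:

```python
import numpy as np

def wta_rates(W, x, v_th, kappa, sigma=0.5):
    """Solve Equation (6) for all active neurons of one WTA circuit.

    a = s / v_th + sigma * K a  with  s = W x, so  a = (I - sigma * K)^{-1} (s / v_th).
    Assumed shapes: W is (n, m), x is (m,), v_th is (n,), kappa is (n, n) with zero diagonal.
    For a non-WTA layer, kappa = 0 and this reduces to a = s / v_th.
    """
    s = W @ x
    n = s.shape[0]
    return np.linalg.solve(np.eye(n) - sigma * kappa, s / v_th)
```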
FIGURE 1 | Conceptual diagram showing the relationship between signals in the proposed spiking neural network model. Error gradients are
back-propagated through the components of the membrane potential defined in Equation (4).
Directly differentiating Equation (6) yields the
backpropagation equations
$$\frac{\partial a_i}{\partial s_i} \approx \frac{1}{V_{th,i}}, \qquad \frac{\partial a_i}{\partial w_{ik}} \approx \frac{\partial a_i}{\partial s_i}\, x_k, \qquad \frac{\partial a_i}{\partial V_{th,i}} \approx -\frac{\partial a_i}{\partial s_i}\Big(a_i + \sigma \sum_{j \neq i}^{n} \kappa_{ij}\, a_j\Big), \qquad \frac{\partial a_i}{\partial \kappa_{ih}} \approx \frac{\partial a_i}{\partial s_i}\big(\sigma V_{th,i}\, a_h\big), \qquad (7)$$

$$\begin{pmatrix} \partial a_1/\partial x_k \\ \vdots \\ \partial a_n/\partial x_k \end{pmatrix} \approx -\frac{1}{\sigma} \begin{pmatrix} q & \cdots & \kappa_{1n} \\ \vdots & \ddots & \vdots \\ \kappa_{n1} & \cdots & q \end{pmatrix}^{-1} \begin{pmatrix} w_{1k}/V_{th,1} \\ \vdots \\ w_{nk}/V_{th,n} \end{pmatrix} \qquad (8)$$
where q = −1/σ. When all the lateral inhibitory connections have the same strength (κ_ij = −µ, ∀i, j) and are not learned, ∂a_i/∂κ_ih is not necessary and Equation (8) can be simplified to

$$\frac{\partial a_i}{\partial x_k} \approx \frac{\partial a_i}{\partial s_i}\, \frac{1}{1 - \mu\sigma} \left( w_{ik} - \frac{\mu\sigma V_{th,i}}{1 + \mu\sigma(n-1)} \sum_{j=1}^{n} \frac{w_{jk}}{V_{th,j}} \right). \qquad (9)$$
By inserting the above derivatives in Equations (7) and (9) into
the standard error backpropagation algorithm, we obtain an
effective learning rule for SNNs. We consider only the first-order
effect of the lateral connections in the derivation of gradients.
Higher-order terms propagating back through multiple lateral
connections are neglected for simplicity. This is mainly because
all the lateral connections considered here are inhibitory. For
inhibitory lateral connections, the effect of small parameter
changes decays rapidly with connection distance. Thus, first-
order approximation saves a lot of computational cost without
loss of accuracy.
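As a sanity check of how these derivatives enter the backward pass, the sketch below evaluates ∂a_i/∂s_i from Equation (7) and the shared-strength form of ∂a_i/∂x_k from Equation (9); the function name and argument layout are our own assumptions:

```python
import numpy as np

def wta_backward_terms(w_col, v_th, mu, sigma=0.5):
    """First-order derivative terms for one input index k (Equations 7 and 9).

    w_col = W[:, k] is the weight column for input k, v_th the per-neuron thresholds,
    mu the shared lateral inhibition magnitude (kappa_ij = -mu), sigma its efficacy.
    """
    n = w_col.shape[0]
    da_ds = 1.0 / v_th                                             # da_i/ds_i (Equation 7)
    shared = mu * sigma / (1.0 + mu * sigma * (n - 1)) * np.sum(w_col / v_th)
    da_dx = da_ds * (w_col - v_th * shared) / (1.0 - mu * sigma)   # da_i/dx_k (Equation 9)
    return da_ds, da_dx
```

The weight gradient then follows from ∂a_i/∂w_ik ≈ (∂a_i/∂s_i)·x_k, with x_k taken from the low-pass filtered traces of Equation (4).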
2.2.3. Weight Initialization and Backprop Error
Normalization
Good initialization of weight parameters in supervised learning
is critical to handle the exploding or vanishing gradients problem
in deep neural networks (
Glorot and Bengio, 2010; He et al.,
2015b). The basic idea behind those methods is to maintain
the balance of forward activations and backward propagating
errors among layers. Recently, the batch normalization technique
has been proposed to make sure that such balance is
maintained through the whole training process (Ioffe and
Szegedy, 2015
). However, normalization of activities as in the
batch normalization scheme is difficult for SNNs, because
there is no efficient method for amplifying event rates above
the input rate. The initialization methods proposed in
Glorot
and Bengio (2010) or He et al. (2015b) are not appropriate
for SNNs either, because SNNs have positive thresholds
that are usually much larger than individual weight values.
In this work, we propose simple methods for initializing
parameters and normalizing backprop errors for training
deep SNNs. Even though the proposed technique does not
guarantee the balance of forward activations, it is effective for
addressing the exploding and vanishing gradients problems.
Error normalization is not critical for training SNNs with
a single hidden layer. However, we observed that training
deep SNNs without normalizing backprop errors mostly failed
due to exploding gradients. We describe here the method in
case of fully-connected deep networks for simplicity. However,

References
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. Proc. IEEE CVPR.
Kingma, D. P., and Ba, J. (2015). Adam: a method for stochastic optimization. Proc. ICLR.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958.