Journal ArticleDOI

A million spiking-neuron integrated circuit with a scalable communication network and interface

TL;DR: Inspired by the brain’s structure, an efficient, scalable, and flexible non–von Neumann architecture is developed that leverages contemporary silicon technology and is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification.
Abstract: Inspired by the brain’s structure, we have developed an efficient, scalable, and flexible non–von Neumann architecture that leverages contemporary silicon technology. To demonstrate, we built a 5.4-billion-transistor chip with 4096 neurosynaptic cores interconnected via an intrachip network that integrates 1 million programmable spiking neurons and 256 million configurable synapses. Chips can be tiled in two dimensions via an interchip communication interface, seamlessly scaling the architecture to a cortexlike sheet of arbitrary size. The architecture is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification. With 400-pixel-by-240-pixel video input at 30 frames per second, the chip consumes 63 milliwatts.
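As a quick arithmetic check on the headline figures (a sketch; the per-core split of 256 neurons and a 256 × 256 synapse crossbar is an assumption chosen to reproduce the quoted totals, not a number stated in this abstract):

```python
# Sanity-check the abstract's headline numbers. The per-core split below
# (256 neurons, 256 x 256 synapses) is an assumption that reproduces the
# chip-level totals quoted above.
cores = 4096
neurons_per_core = 256                # assumed
synapses_per_core = 256 * 256         # assumed full crossbar per core

print(cores * neurons_per_core)       # 1,048,576 -> "1 million neurons"
print(cores * synapses_per_core)      # 268,435,456 -> "256 million synapses" (256 * 2**20)
print(63e-3 / 30)                     # ~2.1e-3 J -> energy per 400x240 frame at 30 fps
```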
Citations
Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations


Additional excerpts

  • ...Future energy-efficient hardware for DL in NNs may implement aspects of such models (e.g., Fieres, Schemmel, & Meier, 2008; Glackin, McGinnity, Maguire, Wu, & Belatreche, 2005; Indiveri et al., 2011; Jin et al., 2010; Khan et al., 2008; Liu et al., 2001; Merolla et al., 2014; Neil & Liu, 2014; Roggen, Hofmann, Thoma, & Floreano, 2003; Schemmel, Grubl, Meier, & Mueller, 2006; Serrano-Gotarredona et al., 2009)....


Journal ArticleDOI
20 Nov 2017
TL;DR: In this paper, the authors provide a comprehensive tutorial and survey of recent advances toward the goal of enabling efficient processing of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of deep neural networks either solely via hardware design changes or via joint hardware and DNN algorithm changes.
Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

2,391 citations


Cites background from "A million spiking-neuron integrated..."

  • ...An example of a project that was inspired by the spiking of the brain is the IBM TrueNorth [8]....


Journal ArticleDOI
TL;DR: Loihi is a 60-mm² chip fabricated in Intel's 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon, and can solve LASSO optimization problems with an energy-delay product over three orders of magnitude better than conventional solvers running on a CPU at iso process/voltage/area.
Abstract: Loihi is a 60-mm² chip fabricated in Intel's 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon. It integrates a wide range of novel features for the field, such as hierarchical connectivity, dendritic compartments, synaptic delays, and, most importantly, programmable synaptic learning rules. Running a spiking convolutional form of the Locally Competitive Algorithm, Loihi can solve LASSO optimization problems with an energy-delay product over three orders of magnitude better than conventional solvers running on a CPU at iso process/voltage/area. This provides an unambiguous example of spike-based computation outperforming all known conventional solutions.
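The LASSO result rests on the Locally Competitive Algorithm (LCA); the following NumPy sketch shows a minimal non-spiking version of LCA (the dictionary D, threshold lam, step size tau, and iteration count are illustrative choices, not Loihi parameters):

```python
import numpy as np

# Minimal (non-spiking) Locally Competitive Algorithm for LASSO:
#   min_a 0.5 * ||x - D a||^2 + lam * ||a||_1
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                       # unit-norm dictionary atoms
a_true = rng.random(256) * (rng.random(256) < 0.05)  # sparse ground truth
x = D @ a_true

lam, tau, steps = 0.1, 0.1, 500
b = D.T @ x                                    # constant input drive
G = D.T @ D - np.eye(256)                      # lateral inhibition (competition)
u = np.zeros(256)                              # membrane-like internal state
a = np.zeros(256)
for _ in range(steps):
    a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)  # soft threshold
    u += tau * (b - u - G @ a)                 # LCA dynamics
print("active coefficients:", np.count_nonzero(a))
```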

2,331 citations

Journal ArticleDOI
07 May 2015-Nature
TL;DR: The experimental implementation of transistor-free metal-oxide memristor crossbars, with device variability sufficiently low to allow operation of integrated neural networks, is demonstrated in a simple network: a single-layer perceptron (an algorithm for linear classification).
Abstract: Despite much progress in semiconductor integrated circuit technology, the extreme complexity of the human cerebral cortex, with its approximately 10^14 synapses, makes the hardware implementation of neuromorphic networks with a comparable number of devices exceptionally challenging. To provide comparable complexity while operating much faster and with manageable power dissipation, networks based on circuits combining complementary metal-oxide-semiconductors (CMOSs) and adjustable two-terminal resistive devices (memristors) have been developed. In such circuits, the usual CMOS stack is augmented with one or several crossbar layers, with memristors at each crosspoint. There have recently been notable improvements in the fabrication of such memristive crossbars and their integration with CMOS circuits, including first demonstrations of their vertical integration. Separately, discrete memristors have been used as artificial synapses in neuromorphic networks. Very recently, such experiments have been extended to crossbar arrays of phase-change memristive devices. The adjustment of such devices, however, requires an additional transistor at each crosspoint, and hence these devices are much harder to scale than metal-oxide memristors, whose nonlinear current-voltage curves enable transistor-free operation. Here we report the experimental implementation of transistor-free metal-oxide memristor crossbars, with device variability sufficiently low to allow operation of integrated neural networks, in a simple network: a single-layer perceptron (an algorithm for linear classification). The network can be taught in situ using a coarse-grain variety of the delta rule algorithm to perform the perfect classification of 3 × 3-pixel black/white images into three classes (representing letters). This demonstration is an important step towards much larger and more complex memristive neuromorphic networks.
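For intuition, here is a software sketch of the hardware demonstration: a single-layer perceptron trained on 3 × 3 black/white images with three classes. The pixel patterns and the plain delta rule below are illustrative stand-ins; the paper uses specific letter images and a coarse-grain variant of the rule.

```python
import numpy as np

# Single-layer perceptron, delta-rule training, 3x3 binary images, 3 classes.
# The crossbar computes the equivalent of the dot products Xb @ W.T in analog.
p0 = np.array([[1, 1, 1], [0, 1, 0], [1, 1, 1]])  # hypothetical class-0 pattern
p1 = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0]])  # hypothetical class-1 pattern
p2 = np.array([[1, 0, 1], [1, 1, 1], [1, 0, 1]])  # hypothetical class-2 pattern
X = np.stack([p.ravel() for p in (p0, p1, p2)]).astype(float)  # 3 samples x 9 pixels

T = np.eye(3)                            # one-hot targets
Xb = np.hstack([X, np.ones((3, 1))])     # append a bias input
W = np.zeros((3, 10))                    # 9 pixel weights + 1 bias per output

eta = 0.5
for _ in range(100):
    y = np.tanh(Xb @ W.T)                # analog outputs
    W += eta * (T - y).T @ Xb            # delta rule: dW = eta * error * input
print(np.argmax(np.tanh(Xb @ W.T), axis=1))  # -> [0 1 2] once trained
```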

2,222 citations

Journal ArticleDOI
TL;DR: The Computational Brain provides a broad overview of neuroscience and computational theory, followed by a study of some of the most recent and sophisticated modeling work in the context of relevant neurobiological research.

1,472 citations

References
01 Jan 1996
TL;DR: An analog VLSI “translinear system” with over 590,000 transistors in subthreshold CMOS performs phototransduction, amplification, edge enhancement and local gain control at the pixel level.
Abstract: In this paper we provide an overview of translinear circuit design using MOS transistors operating in the subthreshold region. We contrast the bipolar and MOS subthreshold characteristics and extend the translinear principle to the subthreshold MOS ohmic region through a drain/source current decomposition. A front/back-gate current decomposition is adopted; this facilitates the analysis of translinear loops, including multiple-input floating-gate MOS transistors. Circuit examples drawn from working systems designed and fabricated in a standard digital CMOS-oriented process are used as vehicles to illustrate key design considerations, systematic analysis procedures, and limitations imposed by the structure and physics of MOS transistors. Finally, we present the design of an analog VLSI “translinear system” with over 590,000 transistors in subthreshold CMOS that performs phototransduction, amplification, edge enhancement and local gain control at the pixel level.
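The translinear principle the abstract extends can be checked numerically: with the subthreshold law I_D = I_0 * exp(V_GS / (n * U_T)), a loop with equal numbers of clockwise and counter-clockwise gate-source drops forces a product-of-currents identity such as I1 * I2 = I3 * I4. A sketch with assumed device constants:

```python
import numpy as np

# Numeric check of the subthreshold translinear principle.
# Device constants below are illustrative, not from the paper.
I0, n, UT = 1e-15, 1.5, 0.025   # leakage current (A), slope factor, thermal voltage (V)

def Id(Vgs):                    # subthreshold drain current
    return I0 * np.exp(Vgs / (n * UT))

def Vgs(I):                     # gate-source voltage sustaining current I
    return n * UT * np.log(I / I0)

I1, I2, I3 = 2e-9, 5e-9, 4e-9
# Loop constraint Vgs1 + Vgs2 = Vgs3 + Vgs4 fixes the fourth current:
I4 = Id(Vgs(I1) + Vgs(I2) - Vgs(I3))
print(I4, I1 * I2 / I3)         # both ~2.5e-9 A: I1 * I2 = I3 * I4
```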

170 citations

Journal ArticleDOI
TL;DR: The notion that quasi-periodic orientation maps are established by moiré interference of regularly spaced ON- and OFF-center retinal ganglion cell mosaics is advanced and offers a possible account for the emergence of orientation tuning in single neurons despite the absence of orderly orientation maps in rodent species.
Abstract: This paper demonstrates that orientation maps, as found in the cortex of higher mammals, are likely to arise from the spatial layout of retinal ganglion cell receptive fields in the retina. The predictions of this model are borne out in four different species.

146 citations

Journal ArticleDOI
TL;DR: This design is the first fully implemented wormhole router with packet-branching that can never deadlock, and the design's effectiveness is demonstrated in Neurogrid, a million-neuron neuromorphic system consisting of sixteen chips.
Abstract: We present a tree router for multichip systems that guarantees deadlock-free multicast packet routing without dropping packets or restricting their length. Multicast routing is required to efficiently connect massively parallel systems' computational units when each unit is connected to thousands of others residing on multiple chips, which is the case in neuromorphic systems. Our tree router implements this one-to-many routing by branching recursively, broadcasting the packet within a specified subtree. Within this subtree, the packet is only accepted by chips that have been programmed to do so. This approach boosts throughput because memory look-ups are avoided en route, and keeps the header compact because it only specifies the route to the subtree's root. Deadlock is avoided by routing in two phases, an upward phase and a downward phase, and by restricting branching to the downward phase. This design is the first fully implemented wormhole router with packet branching that can never deadlock. The design's effectiveness is demonstrated in Neurogrid, a million-neuron neuromorphic system consisting of sixteen chips. Each chip has a 256 × 256 silicon-neuron array integrated with a full-custom asynchronous VLSI implementation of the router, which delivers up to 1.17 Gwords/s across the sixteen-chip network with less than 1 μs jitter.
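A minimal Python sketch of the two-phase multicast described above (the Node structure, hop-count header, and accept flag are illustrative; the actual router is a full-custom asynchronous VLSI circuit):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    accepts: bool = False                # programmed to accept this packet?

def multicast(src: Node, hops_up: int, packet: str) -> list[str]:
    # Upward phase: the compact header only encodes the climb to the
    # subtree's root; no branching is allowed on the way up.
    node = src
    for _ in range(hops_up):
        node = node.parent
    # Downward phase: recursive broadcast within the subtree. Branching
    # happens only here, so upward/downward dependency cycles (and hence
    # deadlock) cannot form.
    delivered = []
    def down(n: Node):
        if n.accepts:
            delivered.append(n.name)     # only programmed chips accept the packet
        for child in n.children:
            down(child)
    down(node)
    return delivered

# Usage: a two-level tree; the packet climbs two hops, then floods down.
root = Node("root")
a, b = Node("a", root), Node("b", root, accepts=True)
root.children = [a, b]
leaf = Node("leaf", a, accepts=True)
a.children = [leaf]
print(multicast(leaf, 2, "spike"))       # -> ['leaf', 'b']
```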

78 citations

Journal ArticleDOI
TL;DR: The technical papers program for SC13 received 449 submissions, of which 90 were selected, giving an acceptance rate of 20%.
Abstract: The technical papers program for SC13 received 449 submissions, of which 90 were selected for the program, giving an acceptance rate of 20%. A rigorous peer-review process, including author rebuttals and a 1.5-day face-to-face program committee meeting, ensured that the selected papers were the very best in our field. One of the tasks at the face-to-face meeting was to select finalists for the best paper award, from which one winner was chosen by a committee during the conference. To further highlight their achievement of being selected as the very top tier of all papers submitted to SC13, the authors of these finalist papers were offered the opportunity to publish extended versions of their papers in this special issue; all eight accepted.

9 citations


"A million spiking-neuron integrated..." refers background or methods in this paper

  • ...We used our one-to-one simulator (25) to run our processing system scaled up for the 1,088 × 1,920-pixel images in the dataset....


  • ...Compared with an optimized simulator (25) running the exact same network on a modern general-purpose microprocessor, TrueNorth consumes 176,000 times less energy per event (section S12)....


  • ...TrueNorth's software ecosystem includes Compass (25), a highly-optimized simulator designed to simulate large networks of neurosynaptic cores, and a compositional programming language (33) for developing TrueNorth networks....


  • ...In terms of communication, inter-processor messaging (25) explodes when simulating highly-interconnected networks that do not fit on a single processor....


01 Jan 2007

8 citations


"A million spiking-neuron integrated..." refers background in this paper

  • ...The trend of increasing power densities and clock frequencies of processors (29) is headed away from the brain’s operating point....
