Journal ArticleDOI

A million spiking-neuron integrated circuit with a scalable communication network and interface

TL;DR: Inspired by the brain’s structure, an efficient, scalable, and flexible non–von Neumann architecture is developed that leverages contemporary silicon technology and is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification.
Abstract: Inspired by the brain’s structure, we have developed an efficient, scalable, and flexible non–von Neumann architecture that leverages contemporary silicon technology. To demonstrate, we built a 5.4-billion-transistor chip with 4096 neurosynaptic cores interconnected via an intrachip network that integrates 1 million programmable spiking neurons and 256 million configurable synapses. Chips can be tiled in two dimensions via an interchip communication interface, seamlessly scaling the architecture to a cortexlike sheet of arbitrary size. The architecture is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification. With 400-pixel-by-240-pixel video input at 30 frames per second, the chip consumes 63 milliwatts.
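As a quick arithmetic check on the headline figures (a sketch; the per-core split of 256 neurons and a 256 × 256 synapse crossbar is an assumption chosen to reproduce the quoted totals, not a number stated in this abstract):

```python
# Sanity-check the abstract's headline numbers. The per-core split below
# (256 neurons, 256 x 256 synapses) is an assumption that reproduces the
# chip-level totals quoted above.
cores = 4096
neurons_per_core = 256                # assumed
synapses_per_core = 256 * 256         # assumed full crossbar per core

print(cores * neurons_per_core)       # 1,048,576 -> "1 million neurons"
print(cores * synapses_per_core)      # 268,435,456 -> "256 million synapses" (256 * 2**20)
print(63e-3 / 30)                     # ~2.1e-3 J -> energy per 400x240 frame at 30 fps
```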
Citations
Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations


Additional excerpts

  • ...Future energy-efficient hardware for DL in NNs may implement aspects of such models (e.g., Fieres, Schemmel, & Meier, 2008; Glackin, McGinnity, Maguire, Wu, & Belatreche, 2005; Indiveri et al., 2011; Jin et al., 2010; Khan et al., 2008; Liu et al., 2001; Merolla et al., 2014; Neil & Liu, 2014; Roggen, Hofmann, Thoma, & Floreano, 2003; Schemmel, Grubl, Meier, & Mueller, 2006; Serrano-Gotarredona et al., 2009)....


Journal ArticleDOI
20 Nov 2017
TL;DR: In this paper, the authors provide a comprehensive tutorial and survey of recent advances toward the goal of enabling efficient processing of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of deep neural networks either solely via hardware design changes or via joint hardware and DNN algorithm changes.
Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

2,391 citations


Cites background from "A million spiking-neuron integrated..."

  • ...An example of a project that was inspired by the spiking of the brain is the IBM TrueNorth [8]....


Journal ArticleDOI
TL;DR: Loihi is a 60-mm² chip fabricated in Intel's 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon, and can solve LASSO optimization problems with an energy-delay product over three orders of magnitude better than conventional solvers running on a CPU at iso process/voltage/area.
Abstract: Loihi is a 60-mm² chip fabricated in Intel's 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon. It integrates a wide range of novel features for the field, such as hierarchical connectivity, dendritic compartments, synaptic delays, and, most importantly, programmable synaptic learning rules. Running a spiking convolutional form of the Locally Competitive Algorithm, Loihi can solve LASSO optimization problems with an energy-delay product over three orders of magnitude better than conventional solvers running on a CPU at iso process/voltage/area. This provides an unambiguous example of spike-based computation outperforming all known conventional solutions.
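The LASSO result rests on the Locally Competitive Algorithm (LCA); the following NumPy sketch shows a minimal non-spiking version of LCA (the dictionary D, threshold lam, step size tau, and iteration count are illustrative choices, not Loihi parameters):

```python
import numpy as np

# Minimal (non-spiking) Locally Competitive Algorithm for LASSO:
#   min_a 0.5 * ||x - D a||^2 + lam * ||a||_1
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)                       # unit-norm dictionary atoms
a_true = rng.random(256) * (rng.random(256) < 0.05)  # sparse ground truth
x = D @ a_true

lam, tau, steps = 0.1, 0.1, 500
b = D.T @ x                                    # constant input drive
G = D.T @ D - np.eye(256)                      # lateral inhibition (competition)
u = np.zeros(256)                              # membrane-like internal state
a = np.zeros(256)
for _ in range(steps):
    a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)  # soft threshold
    u += tau * (b - u - G @ a)                 # LCA dynamics
print("active coefficients:", np.count_nonzero(a))
```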

2,331 citations

Journal ArticleDOI
07 May 2015-Nature
TL;DR: The experimental implementation of transistor-free metal-oxide memristor crossbars, with device variability sufficiently low to allow operation of integrated neural networks, is demonstrated in a simple network: a single-layer perceptron (an algorithm for linear classification).
Abstract: Despite much progress in semiconductor integrated circuit technology, the extreme complexity of the human cerebral cortex, with its approximately 10^14 synapses, makes the hardware implementation of neuromorphic networks with a comparable number of devices exceptionally challenging. To provide comparable complexity while operating much faster and with manageable power dissipation, networks based on circuits combining complementary metal-oxide-semiconductors (CMOSs) and adjustable two-terminal resistive devices (memristors) have been developed. In such circuits, the usual CMOS stack is augmented with one or several crossbar layers, with memristors at each crosspoint. There have recently been notable improvements in the fabrication of such memristive crossbars and their integration with CMOS circuits, including first demonstrations of their vertical integration. Separately, discrete memristors have been used as artificial synapses in neuromorphic networks. Very recently, such experiments have been extended to crossbar arrays of phase-change memristive devices. The adjustment of such devices, however, requires an additional transistor at each crosspoint, and hence these devices are much harder to scale than metal-oxide memristors, whose nonlinear current-voltage curves enable transistor-free operation. Here we report the experimental implementation of transistor-free metal-oxide memristor crossbars, with device variability sufficiently low to allow operation of integrated neural networks, in a simple network: a single-layer perceptron (an algorithm for linear classification). The network can be taught in situ using a coarse-grain variety of the delta rule algorithm to perform the perfect classification of 3 × 3-pixel black/white images into three classes (representing letters). This demonstration is an important step towards much larger and more complex memristive neuromorphic networks.
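For intuition, here is a software sketch of the hardware demonstration: a single-layer perceptron trained on 3 × 3 black/white images with three classes. The pixel patterns and the plain delta rule below are illustrative stand-ins; the paper uses specific letter images and a coarse-grain variant of the rule.

```python
import numpy as np

# Single-layer perceptron, delta-rule training, 3x3 binary images, 3 classes.
# The crossbar computes the equivalent of the dot products Xb @ W.T in analog.
p0 = np.array([[1, 1, 1], [0, 1, 0], [1, 1, 1]])  # hypothetical class-0 pattern
p1 = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0]])  # hypothetical class-1 pattern
p2 = np.array([[1, 0, 1], [1, 1, 1], [1, 0, 1]])  # hypothetical class-2 pattern
X = np.stack([p.ravel() for p in (p0, p1, p2)]).astype(float)  # 3 samples x 9 pixels

T = np.eye(3)                            # one-hot targets
Xb = np.hstack([X, np.ones((3, 1))])     # append a bias input
W = np.zeros((3, 10))                    # 9 pixel weights + 1 bias per output

eta = 0.5
for _ in range(100):
    y = np.tanh(Xb @ W.T)                # analog outputs
    W += eta * (T - y).T @ Xb            # delta rule: dW = eta * error * input
print(np.argmax(np.tanh(Xb @ W.T), axis=1))  # -> [0 1 2] once trained
```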

2,222 citations

Journal ArticleDOI
TL;DR: The Computational Brain provides a broad overview of neuroscience and computational theory, followed by a study of some of the most recent and sophisticated modeling work in the context of relevant neurobiological research.

1,472 citations

References
01 Jan 1996
TL;DR: An analog VLSI “translinear system” with over 590,000 transistors in subthreshold CMOS performs phototransduction, amplification, edge enhancement and local gain control at the pixel level.
Abstract: In this paper we provide an overview of translinear circuit design using MOS transistors operating in the subthreshold region. We contrast the bipolar and MOS subthreshold characteristics and extend the translinear principle to the subthreshold MOS ohmic region through a drain/source current decomposition. A front/back-gate current decomposition is adopted; this facilitates the analysis of translinear loops, including multiple-input floating-gate MOS transistors. Circuit examples drawn from working systems designed and fabricated in a standard digital CMOS-oriented process are used as vehicles to illustrate key design considerations, systematic analysis procedures, and limitations imposed by the structure and physics of MOS transistors. Finally, we present the design of an analog VLSI “translinear system” with over 590,000 transistors in subthreshold CMOS that performs phototransduction, amplification, edge enhancement and local gain control at the pixel level.
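The translinear principle the abstract extends can be checked numerically: with the subthreshold law I_D = I_0 * exp(V_GS / (n * U_T)), a loop with equal numbers of clockwise and counter-clockwise gate-source drops forces a product-of-currents identity such as I1 * I2 = I3 * I4. A sketch with assumed device constants:

```python
import numpy as np

# Numeric check of the subthreshold translinear principle.
# Device constants below are illustrative, not from the paper.
I0, n, UT = 1e-15, 1.5, 0.025   # leakage current (A), slope factor, thermal voltage (V)

def Id(Vgs):                    # subthreshold drain current
    return I0 * np.exp(Vgs / (n * UT))

def Vgs(I):                     # gate-source voltage sustaining current I
    return n * UT * np.log(I / I0)

I1, I2, I3 = 2e-9, 5e-9, 4e-9
# Loop constraint Vgs1 + Vgs2 = Vgs3 + Vgs4 fixes the fourth current:
I4 = Id(Vgs(I1) + Vgs(I2) - Vgs(I3))
print(I4, I1 * I2 / I3)         # both ~2.5e-9 A: I1 * I2 = I3 * I4
```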

170 citations

Journal ArticleDOI
TL;DR: The notion that quasi-periodic orientation maps are established by moiré interference of regularly spaced ON- and OFF-center retinal ganglion cell mosaics is advanced and offers a possible account for the emergence of orientation tuning in single neurons despite the absence of orderly orientation maps in rodent species.
Abstract: This paper demonstrates that orientation maps, as found in the cortex of higher mammals, are likely to arise from the spatial layout of retinal ganglion cell receptive fields in the retina. The predictions of this model are borne out in four different species.

146 citations

Journal ArticleDOI
TL;DR: This design is the first fully implemented wormhole router with packet-branching that can never deadlock, and the design's effectiveness is demonstrated in Neurogrid, a million-neuron neuromorphic system consisting of sixteen chips.
Abstract: We present a tree router for multichip systems that guarantees deadlock-free multicast packet routing without dropping packets or restricting their length. Multicast routing is required to efficiently connect massively parallel systems' computational units when each unit is connected to thousands of others residing on multiple chips, which is the case in neuromorphic systems. Our tree router implements this one-to-many routing by branching recursively, broadcasting the packet within a specified subtree. Within this subtree, the packet is only accepted by chips that have been programmed to do so. This approach boosts throughput because memory look-ups are avoided en route, and keeps the header compact because it only specifies the route to the subtree's root. Deadlock is avoided by routing in two phases, an upward phase and a downward phase, and by restricting branching to the downward phase. This design is the first fully implemented wormhole router with packet branching that can never deadlock. The design's effectiveness is demonstrated in Neurogrid, a million-neuron neuromorphic system consisting of sixteen chips. Each chip has a 256 × 256 silicon-neuron array integrated with a full-custom asynchronous VLSI implementation of the router, which delivers up to 1.17 Gwords/s across the sixteen-chip network with less than 1 μs jitter.
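A minimal Python sketch of the two-phase multicast described above (the Node structure, hop-count header, and accept flag are illustrative; the actual router is a full-custom asynchronous VLSI circuit):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    accepts: bool = False                # programmed to accept this packet?

def multicast(src: Node, hops_up: int, packet: str) -> list[str]:
    # Upward phase: the compact header only encodes the climb to the
    # subtree's root; no branching is allowed on the way up.
    node = src
    for _ in range(hops_up):
        node = node.parent
    # Downward phase: recursive broadcast within the subtree. Branching
    # happens only here, so upward/downward dependency cycles (and hence
    # deadlock) cannot form.
    delivered = []
    def down(n: Node):
        if n.accepts:
            delivered.append(n.name)     # only programmed chips accept the packet
        for child in n.children:
            down(child)
    down(node)
    return delivered

# Usage: a two-level tree; the packet climbs two hops, then floods down.
root = Node("root")
a, b = Node("a", root), Node("b", root, accepts=True)
root.children = [a, b]
leaf = Node("leaf", a, accepts=True)
a.children = [leaf]
print(multicast(leaf, 2, "spike"))       # -> ['leaf', 'b']
```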

78 citations

Journal ArticleDOI
TL;DR: The technical papers program for SC13 received 449 submissions, of which 90 were selected, giving an acceptance rate of 20%.
Abstract: The technical papers program for SC13 received 449 submissions, of which 90 were selected for the program, giving an acceptance rate of 20%. A rigorous peer-review process, including author rebuttals and a 1.5-day face-to-face program committee meeting, ensured that the selected papers were the very best in our field. One of the tasks at the face-to-face meeting was to select finalists for the best paper award, from which one winner was chosen by a committee during the conference. To further highlight their achievement of being selected as the very top tier of all papers submitted to SC13, the authors of these finalist papers were offered the opportunity to publish extended versions of their papers in this special issue; all eight accepted.

9 citations


"A million spiking-neuron integrated..." refers background or methods in this paper

  • ...We used our one-to-one simulator (25) to run our processing system scaled up for the 1,088 × 1,920-pixel images in the dataset....


  • ...Compared with an optimized simulator (25) running the exact same network on a modern general-purpose microprocessor, TrueNorth consumes 176,000 times less energy per event (section S12)....


  • ...TrueNorth's software ecosystem includes Compass (25), a highly-optimized simulator designed to simulate large networks of neurosynaptic cores, and a compositional programming language (33) for developing TrueNorth networks....


  • ...In terms of communication, inter-processor messaging (25) explodes when simulating highly-interconnected networks that do not fit on a single processor....


01 Jan 2007

8 citations


"A million spiking-neuron integrated..." refers background in this paper

  • ...The trend of increasing power densities and clock frequencies of processors (29) is headed away from the brain’s operating point....
