scispace - formally typeset
Search or ask a question
Author

Ivan Vo

Bio: Ivan Vo is an academic researcher from IBM. The author has contributed to research in topics: Voltage regulator & CMOS. The author has an hindex of 11, co-authored 22 publications receiving 2976 citations.

Papers
More filters
Journal ArticleDOI
08 Aug 2014-Science
TL;DR: Inspired by the brain’s structure, an efficient, scalable, and flexible non–von Neumann architecture is developed that leverages contemporary silicon technology and is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification.
Abstract: Inspired by the brain’s structure, we have developed an efficient, scalable, and flexible non–von Neumann architecture that leverages contemporary silicon technology. To demonstrate, we built a 5.4-billion-transistor chip with 4096 neurosynaptic cores interconnected via an intrachip network that integrates 1 million programmable spiking neurons and 256 million configurable synapses. Chips can be tiled in two dimensions via an interchip communication interface, seamlessly scaling the architecture to a cortexlike sheet of arbitrary size. The architecture is well suited to many applications that use complex neural networks in real time, for example, multiobject detection and classification. With 400-pixel-by-240-pixel video input at 30 frames per second, the chip consumes 63 milliwatts.

3,253 citations

Patent
01 Feb 2010
TL;DR: In this paper, a bit line connected to each of a first plurality of eDRAM cells is controlled by cell control lines tied to each cell, and during a READ operation, the EDRAM cell releases its charge indicating its digital state.
Abstract: Embedded dynamic random access memory (eDRAM) sense amplifier circuitry in which a bit line connected to each of a first plurality of eDRAM cells is controlled by cell control lines tied to each of the cells. During a READ operation the eDRAM cell releases its charge indicating its digital state. The digital charge propagates through the eDRAM sense amplifier circuitry to a mid-rail amplifier inverter circuit which amplifies the charge and provides it to a latch circuit. The latch circuit, in turn, inverts the charge to correctly represent at its output the logical value stored in the eDRAM cell being read, and returns the charge through the eDRAM sense amplifier circuitry to replenish the eDRAM cell.

124 citations

Proceedings ArticleDOI
16 Nov 2014
TL;DR: True North is a 4,096 core, 1 million neuron, and 256 million synapse brain-inspired neurosynaptic processor, that consumes 65mW of power running at real-time and delivers performance of 46 Giga-Synaptic OPS/Watt.
Abstract: Drawing on neuroscience, we have developed a parallel, event-driven kernel for neurosynaptic computation, that is efficient with respect to computation, memory, and communication. Building on the previously demonstrated highly optimized software expression of the kernel, here, we demonstrate True North, a co-designed silicon expression of the kernel. True North achieves five orders of magnitude reduction in energy to-solution and two orders of magnitude speedup in time-to solution, when running computer vision applications and complex recurrent neural network simulations. Breaking path with the von Neumann architecture, True North is a 4,096 core, 1 million neuron, and 256 million synapse brain-inspired neurosynaptic processor, that consumes 65mW of power running at real-time and delivers performance of 46 Giga-Synaptic OPS/Watt. We demonstrate seamless tiling of True North chips into arrays, forming a foundation for cortex-like scalability. True North's unprecedented time-to-solution, energy-to-solution, size, scalability, and performance combined with the underlying flexibility of the kernel enable a broad range of cognitive applications.

82 citations

Journal ArticleDOI
05 Feb 1998
TL;DR: This 64 b single-issue integer processor, comprised of about one million transistors, is fabricated in a 0.15 /spl mu/m effective channel length, six-metal-layer CMOS technology and intended as a vehicle to explore circuit, clocking, microarchitecture, and methodology options for high-frequency processors.
Abstract: This 64 b single-issue integer processor, comprised of about one million transistors, is fabricated in a 0.15 /spl mu/m effective channel length, six-metal-layer CMOS technology. Intended as a vehicle to explore circuit, clocking, microarchitecture, and methodology options for high-frequency processors, the processor prototype implements 60 fixed-point compare, logical, arithmetic, and rotate-merge-mask instructions of the PowerPC instruction-set architecture with single-cycle latency. The processor executes programs written in this instruction subset from cache with a 1 ns cycle. In addition, the prototype implements 36 PowerPC load/store instructions that execute as single-cycle operations (zero wait cycles) with 1.15 ns latency. Full data forwarding and full at speed scan testing are supported.

75 citations

Proceedings ArticleDOI
05 Oct 1998
TL;DR: The design methodology used to build an experimental 1.0 GigaHertz PowerPC integer microprocessor at IBM's Austin Research Laboratory will cover design and verification tools as well as circuit constraints and microarchitecture philosophy.
Abstract: This paper describes the design methodology used to build an experimental 1.0 GigaHertz PowerPC integer microprocessor at IBM's Austin Research Laboratory. The high frequency requirements dictated the chip composition to be almost entirely custom macros using dynamic circuit techniques. The methodology presented will cover design and verification tools as well as circuit constraints and microarchitecture philosophy. The microarchitecture, circuits and tools were defined by the high frequency requirements of the processor as well as the aggressive design schedule and size of the design team.

49 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, review deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations

Journal ArticleDOI
20 Nov 2017
TL;DR: In this paper, the authors provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs, and discuss various hardware platforms and architectures that support DNN, and highlight key trends in reducing the computation cost of deep neural networks either solely via hardware design changes or via joint hardware and DNN algorithm changes.
Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

2,391 citations

Journal ArticleDOI
TL;DR: Loihi is a 60-mm2 chip fabricated in Intels 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon, and can solve LASSO optimization problems with over three orders of magnitude superior energy-delay-product compared to conventional solvers running on a CPU iso-process/voltage/area.
Abstract: Loihi is a 60-mm2 chip fabricated in Intels 14-nm process that advances the state-of-the-art modeling of spiking neural networks in silicon. It integrates a wide range of novel features for the field, such as hierarchical connectivity, dendritic compartments, synaptic delays, and, most importantly, programmable synaptic learning rules. Running a spiking convolutional form of the Locally Competitive Algorithm, Loihi can solve LASSO optimization problems with over three orders of magnitude superior energy-delay-product compared to conventional solvers running on a CPU iso-process/voltage/area. This provides an unambiguous example of spike-based computation, outperforming all known conventional solutions.

2,331 citations

Journal ArticleDOI
07 May 2015-Nature
TL;DR: The experimental implementation of transistor-free metal-oxide memristor crossbars, with device variability sufficiently low to allow operation of integrated neural networks, in a simple network: a single-layer perceptron (an algorithm for linear classification).
Abstract: Despite much progress in semiconductor integrated circuit technology, the extreme complexity of the human cerebral cortex, with its approximately 10(14) synapses, makes the hardware implementation of neuromorphic networks with a comparable number of devices exceptionally challenging. To provide comparable complexity while operating much faster and with manageable power dissipation, networks based on circuits combining complementary metal-oxide-semiconductors (CMOSs) and adjustable two-terminal resistive devices (memristors) have been developed. In such circuits, the usual CMOS stack is augmented with one or several crossbar layers, with memristors at each crosspoint. There have recently been notable improvements in the fabrication of such memristive crossbars and their integration with CMOS circuits, including first demonstrations of their vertical integration. Separately, discrete memristors have been used as artificial synapses in neuromorphic networks. Very recently, such experiments have been extended to crossbar arrays of phase-change memristive devices. The adjustment of such devices, however, requires an additional transistor at each crosspoint, and hence these devices are much harder to scale than metal-oxide memristors, whose nonlinear current-voltage curves enable transistor-free operation. Here we report the experimental implementation of transistor-free metal-oxide memristor crossbars, with device variability sufficiently low to allow operation of integrated neural networks, in a simple network: a single-layer perceptron (an algorithm for linear classification). The network can be taught in situ using a coarse-grain variety of the delta rule algorithm to perform the perfect classification of 3 × 3-pixel black/white images into three classes (representing letters). This demonstration is an important step towards much larger and more complex memristive neuromorphic networks.

2,222 citations

Journal ArticleDOI
TL;DR: The Computational Brain this paper provides a broad overview of neuroscience and computational theory, followed by a study of some of the most recent and sophisticated modeling work in the context of relevant neurobiological research.

1,472 citations