Neurocube: a programmable digital neuromorphic architecture with high-density 3D memory

doi:10.1145/3007787.3001178

Journal Article

- Vol. 44, Iss: 3, pp 380-392

TLDR

The basic architecture of the Neurocube is presented, along with an analysis of the logic tier synthesized in 28nm and 15nm process technologies; performance is evaluated by mapping a Convolutional Neural Network onto the architecture and estimating the resulting power and performance for both training and inference.

Abstract:

This paper presents a programmable and scalable digital neuromorphic architecture based on 3D high-density memory integrated with a logic tier for efficient neural computing. The proposed architecture consists of clusters of processing engines (PEs), connected by a 2D mesh network as a processing tier, which is integrated in 3D with multiple tiers of DRAM. The PE clusters access multiple memory channels (vaults) in parallel. The operating principle, referred to as memory-centric computing, embeds specialized state machines within the vault controllers of the Hybrid Memory Cube (HMC) to drive data into the PE clusters. The paper presents the basic architecture of the Neurocube and an analysis of the logic tier synthesized in 28nm and 15nm process technologies. The performance of the Neurocube is evaluated and illustrated by mapping a Convolutional Neural Network onto it and estimating the resulting power and performance for both training and inference.
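The memory-centric idea in the abstract (the memory side decides which operands to send, and the processing engine only multiplies and accumulates) can be illustrated with a toy software model. The sketch below is not the paper's design: the class names `VaultController` and `ProcessingEngine`, the `Packet` format, and the choice of a 1D convolution are all assumptions made for illustration, and it models a single vault feeding a single PE rather than parallel vaults joined by a mesh network.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Packet:
    """Operand pair pushed from a vault to a PE (hypothetical format)."""
    out_index: int      # output neuron this pair contributes to
    weight: float
    activation: float


class VaultController:
    """Toy vault: nested counters stand in for the address-generating state machine."""

    def __init__(self, activations: List[float], weights: List[float]):
        self.activations = activations
        self.weights = weights

    def stream(self) -> Iterator[Packet]:
        # Walk every (output, kernel tap) pair of a valid 1D convolution
        # and push the matching operands; the PE never issues an address.
        n_out = len(self.activations) - len(self.weights) + 1
        for out in range(n_out):
            for k, w in enumerate(self.weights):
                yield Packet(out, w, self.activations[out + k])


class ProcessingEngine:
    """Toy PE: one multiply-accumulate per incoming packet."""

    def __init__(self, n_out: int):
        self.acc = [0.0] * n_out

    def consume(self, p: Packet) -> None:
        self.acc[p.out_index] += p.weight * p.activation


def run_layer(activations: List[float], weights: List[float]) -> List[float]:
    """Drive one layer: the vault streams packets, the PE accumulates them."""
    vault = VaultController(activations, weights)
    pe = ProcessingEngine(len(activations) - len(weights) + 1)
    for packet in vault.stream():
        pe.consume(packet)
    return pe.acc


if __name__ == "__main__":
    print(run_layer([1, 2, 3, 4], [1, 0, -1]))
```

In this model the data movement is controlled entirely by `VaultController.stream`, which is the point of the sketch: changing the layer type would mean changing the counters in the vault, not the PE.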

Citations

Posted Content

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, +74 more

- 16 Apr 2017 -

arXiv: Hardware Architecture

TL;DR: This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN), and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.

Proceedings Article

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, +75 more

TL;DR: The Tensor Processing Unit (TPU) as discussed by the authors is a custom ASIC deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN) using a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS).

Journal Article

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Vivienne Sze, +3 more

TL;DR: In this paper, the authors provide a comprehensive tutorial and survey of recent advances toward enabling efficient processing of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of deep neural networks either solely via hardware design changes or via joint hardware and DNN algorithm changes.

Journal Article

ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars

Ali Shafiee, +7 more

TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.

Posted Content

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

Vivienne Sze, +3 more

- 27 Mar 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: In this article, the authors provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs, and discuss various hardware platforms and architectures that support deep neural networks.

References

Journal Article

Learning to Forget: Continual Prediction with LSTM

Felix A. Gers, +2 more

- 01 Oct 2000 -

Neural Computation

TL;DR: This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

Journal Article

Principles of neurodynamics. perceptrons and the theory of brain mechanisms

Frank Rosenblatt

- 15 Mar 1961 -

American Journal of Psychology

TL;DR: The background, basic sources of data, concepts, and methodology to be employed in the study of perceptrons are reviewed, and some of the notation to be used in later sections is presented.

Proceedings Article

DaDianNao: A Machine-Learning Supercomputer

Yunji Chen, +10 more

TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

Book Chapter

Deep Learning, Neural Networks

Ivo D. Dinov

TL;DR: Deep learning is a special branch of machine learning that uses a collage of algorithms to model high-level data motifs, using multiple layers of nodes and many edges linking the nodes to form input/output (I/O) layered grids that represent a multiscale processing network.

Proceedings Article

Decomposing a scene into geometric and semantically consistent regions

Stephen Gould, +2 more

TL;DR: A region-based model which combines appearance and scene geometry to automatically decompose a scene into semantically meaningful regions and which achieves state-of-the-art performance on the tasks of both multi-class image segmentation and geometric reasoning.

Related Papers (5)

DaDianNao: A Machine-Learning Supercomputer

ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars

DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning

EIE: efficient inference engine on compressed deep neural network

In-Datacenter Performance Analysis of a Tensor Processing Unit