Proceedings ArticleDOI

A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors

Abstract
Many artificial intelligence (AI) edge devices use nonvolatile memory (NVM) to store the weights for the neural network (trained off-line on an AI server) and require low-energy, fast I/O accesses. The deep neural networks (DNNs) used by AI processors [1,2] commonly require p layers of a convolutional neural network (CNN) and q layers of a fully-connected network (FCN). Current DNN processors that use a conventional (von Neumann) memory structure are limited by high access latencies, I/O energy consumption, and hardware costs. Large working data sets result in heavy accesses across the memory hierarchy; moreover, large amounts of intermediate data are generated by the large number of multiply-and-accumulate (MAC) operations in both the CNN and FCN. Even when binary DNNs [3] are used, the required CNN and FCN operations create a major memory-I/O bottleneck for AI edge devices.
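As a concrete illustration of the MAC workload described in the abstract, the following is a minimal sketch of how a binary DNN reduces a dot product to XNOR plus popcount, assuming ±1 weights and activations packed into integer bit masks (the helper name `binary_mac` is hypothetical, not from the paper):

```python
def binary_mac(w_bits: int, x_bits: int, n: int) -> int:
    """Multiply-and-accumulate (dot product) of two n-element {-1, +1}
    vectors, each packed into an integer (bit 1 -> +1, bit 0 -> -1).

    With binary weights and activations, the elementwise multiply
    becomes XNOR and the accumulation becomes a popcount, which is
    what makes binary DNNs attractive for in-memory computing.
    """
    mask = (1 << n) - 1
    matches = bin(~(w_bits ^ x_bits) & mask).count("1")  # XNOR + popcount
    return 2 * matches - n  # each match contributes +1, each mismatch -1

# Example: w = (+1, -1, +1, +1) -> 0b1011, x = (+1, +1, -1, +1) -> 0b1101.
# The elementwise products are (+1, -1, -1, +1), so the MAC result is 0.
print(binary_mac(0b1011, 0b1101, 4))  # prints 0
```

Even in this reduced form, a full CNN or FCN layer requires millions of such operations per inference, which is the I/O bottleneck the macro's in-memory computation targets.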


Citations
Journal ArticleDOI

SLIM: Simultaneous Logic-in-Memory Computing Exploiting Bilayer Analog OxRAM Devices.

TL;DR: This paper proposes a novel 'Simultaneous Logic-in-Memory' (SLIM) methodology which is complementary to existing LIM approaches in the literature, and demonstrates novel SLIM bitcells comprising non-filamentary bilayer analog OxRAM devices with NMOS transistors.
Journal ArticleDOI

Neuro-inspired computing chips

TL;DR: The development of neuro-inspired computing chips and their key benchmarking metrics are reviewed, a co-design tool chain is presented, and a roadmap for future large-scale chips and future electronic design automation tools is proposed.
Journal ArticleDOI

Reinforcement learning with analogue memristor arrays

TL;DR: An experimental demonstration of reinforcement learning on a three-layer 1-transistor 1-memristor (1T1R) network using a modified learning algorithm tailored for the authors' hybrid analogue–digital platform, which has the potential to achieve a significant boost in speed and energy efficiency.
Proceedings ArticleDOI

24.1 A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN Based AI Edge Processors

TL;DR: This work proposes a serial-input non-weighted product (SINWP) structure to optimize the tradeoff between area, t_MAC, and E_MAC, together with a down-scaling weighted current translator and positive-negative current-subtractor (PN-ISUB) for short delay, small offset, and a compact read-path area.
Journal ArticleDOI

Three-dimensional memristor circuits as complex neural networks

TL;DR: A three-dimensional circuit composed of eight layers of monolithically integrated memristive devices is built and used to implement complex neural networks, demonstrating accurate MNIST classification and effective edge detection in videos.
References
Journal ArticleDOI

PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory

TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM based main memory, and distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving.
Proceedings ArticleDOI

14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks

TL;DR: A highly reconfigurable CNN-RNN processor with high energy efficiency is presented to support general-purpose deep neural networks (DNNs).
Proceedings ArticleDOI

14.4 A scalable speech recognizer with deep-neural-network acoustic models and voice-activated power gating

TL;DR: IC designs for ASR and VAD are described that improve on the accuracy, programmability, and scalability of previous work.
Proceedings ArticleDOI

An offset-tolerant current-sampling-based sense amplifier for Sub-100nA-cell-current nonvolatile memory

TL;DR: This study proposes a new offset-tolerant current-sampling-based sense amplifier (CSB-SA) that achieves 7× faster read speed than previous SAs for sensing small I_CELL, and a 26ns macro random-access time for reading sub-200nA I_CELL.