Proceedings Article
A Logic Compatible 4T Dual Embedded DRAM Array for In-Memory Computation of Deep Neural Networks
Taegeun Yoo, Hyunjoon Kim, Qian Chen, Tony Tae-Hyoung Kim, Bongjin Kim, et al.
pp. 1-6
TL;DR
This work introduces a dot-product processing macro using an eDRAM array, explores its capability as an in-memory computing processing element, and investigates a method to maximize the retention time in conjunction with an analysis of device mismatch.
Abstract:
Modern deep neural network (DNN) systems have evolved under ever-increasing demands to handle more complex and computation-heavy tasks. Traditional hardware designed for such tasks suffers from a large memory footprint and high power consumption due to extensive on/off-chip memory access. In-memory computing, one of the most promising solutions to this issue, dramatically reduces memory access and improves energy efficiency by using memory cells as both data storage and computing elements. Embedded DRAM (eDRAM) is a potential candidate for in-memory computation: its minimal use of circuit components and low static power consumption offer design advantages, while its relatively short retention time has made it unsuitable for certain applications. This work introduces a dot-product processing macro using an eDRAM array and explores its capability as an in-memory computing processing element. The proposed architecture implements a pair of 2T eDRAM cells as a processing unit that stores and operates on ternary weights using only four transistors. In addition, we investigate a method to maximize the retention time in conjunction with an analysis of device mismatch. The input/weight bit-precision reconfigurable 4T eDRAM processing array achieves an energy efficiency of 1.81 fJ/OP (including refresh energy) when operating with binary inputs and ternary weights.
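As a rough illustration of the arithmetic described in the abstract, the following sketch models the binary-input, ternary-weight dot product in plain Python. The pairing of two 1-bit values per weight mirrors the dual 2T cell idea (w = w+ - w-), but the function and variable names are assumptions for illustration, and circuit-level effects (charge sharing, retention, refresh) are not modeled.

```python
# Behavioral sketch only -- not the silicon implementation.
# Binary inputs {0, 1} multiply ternary weights {-1, 0, +1}; each weight is
# assumed to be held as a pair of 1-bit eDRAM cell values (w_plus, w_minus),
# so the effective weight is w_plus - w_minus.
from typing import Sequence

def ternary_dot_product(inputs: Sequence[int],
                        w_plus: Sequence[int],
                        w_minus: Sequence[int]) -> int:
    """Accumulate x * (w_plus - w_minus) down one column of cell pairs."""
    assert len(inputs) == len(w_plus) == len(w_minus)
    acc = 0
    for x, p, m in zip(inputs, w_plus, w_minus):
        assert x in (0, 1) and p in (0, 1) and m in (0, 1)
        acc += x * (p - m)  # each cell pair contributes +1, 0, or -1
    return acc

# Example: a 4-element column with weights +1, 0, -1, +1
print(ternary_dot_product([1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 0]))  # -> 1
```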
Citations
Journal Article
Compute-in-Memory Chips for Deep Learning: Recent Trends and Prospects
TL;DR: In this paper, the authors survey recent progress in SRAM- and RRAM-based CIM macros that have been demonstrated in silicon and discuss general design challenges of CIM chips, including the analog-to-digital conversion bottleneck, variations in analog compute, and device non-idealities.
Journal Article
Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead
Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique, et al.
TL;DR: This work summarizes and compares state-of-the-art solutions across the four leading platforms for executing such algorithms (CPU, GPU, FPGA, and ASIC), giving particular prominence to the last two since they offer greater design flexibility and the potential for high energy efficiency, especially for inference.
Journal Article
A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks
Chengshuo Yu, Taegeun Yoo, Hyunjoon Kim, Tony Tae-Hyoung Kim, Kevin Chai Tshun Chuan, Bongjin Kim, et al.
TL;DR: A novel 4T2C ternary embedded DRAM (eDRAM) cell is proposed for computing vector-matrix multiplications in the memory array, and a method to mitigate compute accuracy degradation due to device mismatches and variations is presented.
Journal Article
A Reconfigurable 4T2R ReRAM Computing In-Memory Macro for Efficient Edge Applications
TL;DR: This paper presents a reconfigurable ReRAM architecture using a novel 4T2R bit-cell that supports nonvolatile storage and two types of CIM operations: (i) ternary content-addressable memory (TCAM) and (ii) in-memory dot product (IM-DP) for neural networks.
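As a behavioral illustration of the TCAM mode mentioned above (one of the two CIM operations), the sketch below matches a query against stored entries whose bits may be 0, 1, or don't-care; the names and data layout are illustrative assumptions, not details from the paper.

```python
# Behavioral sketch of a ternary CAM search; a real TCAM evaluates all rows
# in parallel inside the array, whereas this model simply loops over them.
from typing import Optional, Sequence

def tcam_match(stored: Sequence[Optional[int]], query: Sequence[int]) -> bool:
    """A row matches if every non-don't-care stored bit equals the query bit."""
    return all(s is None or s == q for s, q in zip(stored, query))

def tcam_search(table: Sequence[Sequence[Optional[int]]],
                query: Sequence[int]) -> list:
    """Return the indices of all matching rows."""
    return [i for i, row in enumerate(table) if tcam_match(row, query)]

table = [[1, 0, None, 1],   # None marks a don't-care bit
         [0, None, 1, 1],
         [1, 0, 1, 1]]
print(tcam_search(table, [1, 0, 1, 1]))  # -> [0, 2]
```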
Proceedings Article
In-Memory Computing: The Next-Generation AI Computing Paradigm
TL;DR: Recent trends in IMC, from device and circuit techniques (SRAM, flash, RRAM, and other types of non-volatile memory) to architectures and applications, are investigated to serve as a guide to future advances in compute-in-memory (CIM).
References
Journal Article
Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
TL;DR: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, by reconfiguring the architecture.
Proceedings Article
DaDianNao: A Machine-Learning Supercomputer
Yunji Chen, Luo Tao, Liu Shaoli, Zhang Shijin, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, Olivier Temam, et al.
TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU and to reduce the energy by 150.31x on average for a 64-chip system.
Journal Article
Cnvlutin: ineffectual-neuron-free deep neural network computing
Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor M. Aamodt, Natalie Enright Jerger, Andreas Moshovos, et al.
TL;DR: Cnvlutin (CNV) is a value-based approach to hardware acceleration that eliminates most ineffectual operations, improving performance and energy over a state-of-the-art accelerator with no accuracy loss.
Journal Article
In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array
TL;DR: This paper presents a machine-learning classifier whose computations are performed in a standard 6T SRAM array that stores the machine-learning model; a training algorithm enables a strong classifier through boosting and also overcomes circuit nonidealities by combining multiple columns.
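As a rough sketch of the boosting idea summarized above, the snippet below combines several weak per-column classifiers by a weighted vote so that no single (nonideal) column decides the result alone; the weights and data are made-up values, and this is not the paper's training algorithm.

```python
# Illustrative ensemble of weak column classifiers; all numbers are made up.
import numpy as np

def column_decision(x: np.ndarray, w: np.ndarray) -> int:
    """One weak classifier: the sign of an in-array dot product."""
    return 1 if float(np.dot(x, w)) >= 0.0 else -1

def boosted_decision(x: np.ndarray, columns, alphas) -> int:
    """Weighted vote over column decisions, as in a boosted ensemble."""
    score = sum(a * column_decision(x, w) for a, w in zip(alphas, columns))
    return 1 if score >= 0.0 else -1

x = np.array([1.0, -0.5, 0.25])
columns = [np.array([0.2, 1.0, -0.3]),
           np.array([-1.0, 0.1, 0.9]),
           np.array([0.5, 0.5, 0.5])]
alphas = [0.7, 0.4, 0.9]  # hypothetical boosting weights
print(boosted_decision(x, columns, alphas))  # -> -1
```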
Journal Article
CONV-SRAM: An Energy-Efficient SRAM With In-Memory Dot-Product Computation for Low-Power Convolutional Neural Networks
TL;DR: This paper presents an energy-efficient static random access memory (SRAM) with embedded dot-product computation capability for binary-weight convolutional neural networks, using a 10T bit-cell-based SRAM array to store the 1-bit filter weights.
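For comparison with the ternary-weight sketch earlier, the following minimal sketch shows the arithmetic of a binary-weight dot product (weights constrained to +1/-1, multi-bit activations); it is a software-only behavioral model with illustrative names, not the paper's circuit.

```python
# Behavioral sketch: dot product with 1-bit (+1/-1) weights, so each term
# simply adds or subtracts the corresponding activation.
from typing import Sequence

def binary_weight_dot(activations: Sequence[int], weights: Sequence[int]) -> int:
    assert all(w in (-1, 1) for w in weights)
    return sum(x if w == 1 else -x for x, w in zip(activations, weights))

print(binary_weight_dot([3, 1, 4, 2], [1, -1, 1, -1]))  # 3 - 1 + 4 - 2 = 4
```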