Proceedings ArticleDOI

A Logic Compatible 4T Dual Embedded DRAM Array for In-Memory Computation of Deep Neural Networks

TLDR
This work introduces a dot-product processing macro built on an eDRAM array, explores its capability as an in-memory computing processing element, and investigates a method to maximize retention time in conjunction with an analysis of device mismatch.
Abstract
Modern deep neural network (DNN) systems have evolved under ever-increasing demands to handle more complex and computation-heavy tasks. Traditional hardware designed for such tasks suffers from large memory footprints and high power consumption due to extensive on/off-chip memory access. In-memory computing, a promising solution to this issue, dramatically reduces memory access and improves energy efficiency by using the memory cell as both a data storage and a computing element. Embedded DRAM (eDRAM) is a potential candidate for in-memory computation: its minimal use of circuit components and low static power consumption offer design advantages, although its relatively short retention time has made it unsuitable for certain applications. This work introduces a dot-product processing macro built on an eDRAM array and explores its capability as an in-memory computing processing element. The proposed architecture implements a pair of 2T eDRAM cells as a processing unit that can store and operate on ternary weights using only four transistors. In addition, we investigate a method to maximize retention time in conjunction with an analysis of device mismatch. The input/weight bit-precision reconfigurable 4T eDRAM processing array achieves an energy efficiency of 1.81 fJ/OP (including refresh energy) when operating with binary inputs and ternary weights.
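To make the compute model concrete, the following is a minimal behavioral sketch (Python) of the dot product described above: binary inputs in {0, 1} against ternary weights in {-1, 0, +1}, with each ternary weight conceptually held as a pair of cells, one for the positive component and one for the negative. The function names and the pair encoding are illustrative assumptions, not the paper's circuit-level scheme.

```python
import numpy as np

def encode_ternary(weights):
    """Split ternary weights {-1, 0, +1} into two binary cell arrays,
    mirroring the idea of one 2T cell pair per ternary weight."""
    w = np.asarray(weights)
    assert np.isin(w, (-1, 0, 1)).all(), "weights must be ternary"
    w_pos = (w == 1).astype(np.int32)   # positive cell holds '+1' entries
    w_neg = (w == -1).astype(np.int32)  # negative cell holds '-1' entries
    return w_pos, w_neg

def edram_dot_product(inputs, w_pos, w_neg):
    """Binary inputs {0, 1} against one ternary weight column.
    Analogous to accumulating cell contributions on two bitlines and
    taking their difference; here it is plain integer arithmetic."""
    x = np.asarray(inputs)
    assert np.isin(x, (0, 1)).all(), "inputs must be binary"
    return int(x @ w_pos) - int(x @ w_neg)

# Example: 4-element column, weights [+1, -1, 0, +1], inputs [1, 1, 0, 1].
w_pos, w_neg = encode_ternary([1, -1, 0, 1])
print(edram_dot_product([1, 1, 0, 1], w_pos, w_neg))  # 1 - 1 + 0 + 1 = 1
```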


Citations
Journal ArticleDOI

Compute-in-Memory Chips for Deep Learning: Recent Trends and Prospects

TL;DR: In this paper, the authors survey recent progress in SRAM- and RRAM-based CIM macros that have been demonstrated in silicon and discuss general design challenges of CIM chips, including the analog-to-digital conversion bottleneck, variations in analog compute, and device non-idealities.
Journal ArticleDOI

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

TL;DR: This work summarizes and compares state-of-the-art solutions across four leading platforms for executing such algorithms (CPU, GPU, FPGA, and ASIC), giving particular prominence to the latter two since they offer greater design flexibility and the potential for high energy efficiency, especially for inference.
Journal ArticleDOI

A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks

TL;DR: A novel 4T2C ternary embedded DRAM (eDRAM) cell is proposed for computing vector-matrix multiplications in the memory array, and a method to mitigate compute-accuracy degradation due to device mismatches and variations is presented.
Journal ArticleDOI

A Reconfigurable 4T2R ReRAM Computing In-Memory Macro for Efficient Edge Applications

TL;DR: In this paper, a reconfigurable ReRAM architecture is proposed based on a novel 4T2R bit-cell that supports nonvolatile storage and two types of CIM operations: (i) ternary content-addressable memory (TCAM) and (ii) in-memory dot product (IM-DP) for neural networks.
Proceedings ArticleDOI

In-Memory Computing: The Next-Generation AI Computing Paradigm

TL;DR: Recent trends in IMC, from techniques (SRAM, flash, RRAM, and other types of non-volatile memory) to architectures and applications, are investigated to serve as a guide to future advances in compute-in-memory (CIM).
References
Journal ArticleDOI

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

TL;DR: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, by reconfiguring its architecture.
Proceedings ArticleDOI

DaDianNao: A Machine-Learning Supercomputer

TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU and reduce energy consumption by 150.31x on average for a 64-chip system.
Journal ArticleDOI

Cnvlutin: ineffectual-neuron-free deep neural network computing

TL;DR: Cnvlutin (CNV) is a value-based approach to hardware acceleration that eliminates most of these ineffectual operations, improving performance and energy efficiency over a state-of-the-art accelerator with no accuracy loss.
Journal ArticleDOI

In-Memory Computation of a Machine-Learning Classifier in a Standard 6T SRAM Array

TL;DR: This paper presents a machine-learning classifier where computations are performed in a standard 6T SRAM array that stores the machine-learning model; a training algorithm enables a strong classifier through boosting and also overcomes circuit nonidealities by combining multiple columns.
Journal ArticleDOI

CONV-SRAM: An Energy-Efficient SRAM With In-Memory Dot-Product Computation for Low-Power Convolutional Neural Networks

TL;DR: This paper presents an energy-efficient static random access memory (SRAM) with embedded dot-product computation capability for binary-weight convolutional neural networks, using a 10T bit-cell-based SRAM array to store the 1-b filter weights.