scispace - formally typeset
Open AccessProceedings ArticleDOI

HieIM: highly flexible in-memory computing using STT MRAM

TLDR
A Highly Flexible InMemory (HieIM) computing platform using STT MRAM, which can be leveraged to implement Boolean logic functions without sacrificing memory functionality, thus overcoming the ‘operand locality’ problem in contemporary in-memory computing platform designs is proposed.
Abstract
In this paper we propose a Highly Flexible In-Memory (HieIM) computing platform using STT MRAM, which can be leveraged to implement Boolean logic functions without sacrificing memory functionality. It could pre-process data within memory to further reduce power hungry long distance communication between memory and processing units as in Von-Neumann computing system. HieIM can implement all the Boolean logic functions (AND/NAND, OR/NOR, XOR/XNOR) between any two cells in the same memory array, thus overcoming the 'operand locality' problem in contemporary in-memory computing platform designs. To investigate the performance of HieIM, we test in-memory bulk bit-wise Boolean logic operations using different vector datasets, which shows ∼ 8× energy saving and ∼ 5× speedup compared to recent DRAM based in-memory computing platform. We further implement an in-memory data encryption engine design based on HieIM as another case study. With AES algorithm, it shows 51.5% and 68.9% lower energy consumption compared to CMOS-ASIC and CMOL based implementations, respectively.

read more

Citations
More filters
Journal ArticleDOI

C3SRAM: An In-Memory-Computing SRAM Macro Based on Robust Capacitive Coupling Computing Mechanism

TL;DR: The macro is an SRAM module with the circuits embedded in bitcells and peripherals to perform hardware acceleration for neural networks with binarized weights and activations and utilizes analog-mixed-signal capacitive-coupling computing to evaluate the main computations of binary neural networks, binary-multiply-and-accumulate operations.
Journal ArticleDOI

MRIMA: An MRAM-Based In-Memory Accelerator

TL;DR: This paper presents practical case studies to demonstrate MRIMA’s acceleration for binary-weight and low bit-width convolutional neural networks (CNNs) as well as data encryption, and shows ~77% and 21% lower energy consumption compared to CMOS-ASIC and recent domain-wall-based design, respectively.
Journal ArticleDOI

A Survey of Spintronic Architectures for Processing-in-Memory and Neural Networks

TL;DR: A survey of spintronic-architectures for PIM and NNs based on main attributes to underscore their similarities and differences will be useful for researchers in the area of artificial intelligence, hardware architecture, chip design and memory system.
Proceedings ArticleDOI

An MRAM-Based Deep In-Memory Architecture for Deep Neural Networks

TL;DR: An MRAM-based deep in-memory architecture (MRAM-DIMA) to efficiently implement multi-bit matrix vector multiplication for deep neural networks using a standard MRAM bitcell array is presented.
Journal ArticleDOI

Vesti: Energy-Efficient In-Memory Computing Accelerator for Deep Neural Networks

TL;DR: A new DNN accelerator is designed to support configurable multibit activations and large-scale DNNs seamlessly while substantially improving the chip-level energy-efficiency with favorable accuracy tradeoff compared to conventional digital ASIC.
References
More filters
Book

The Design of Rijndael: AES - The Advanced Encryption Standard

TL;DR: The underlying mathematics and the wide trail strategy as the basic design idea are explained in detail and the basics of differential and linear cryptanalysis are reworked.
Proceedings ArticleDOI

McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

TL;DR: Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taking into account configuring clusters with 4 cores gives thebest EDA2P and EDAP.
Journal ArticleDOI

Hitting the memory wall: implications of the obvious

TL;DR: This work proposes an exact analysis, removing all remaining uncertainty, based on model checking, using abstract-interpretation results to prune down the model for scalability, and notably improves precision upon classical abstract interpretation at reasonable cost.
Proceedings ArticleDOI

Architecting phase change memory as a scalable dram alternative

TL;DR: This work proposes, crafted from a fundamental understanding of PCM technology parameters, area-neutral architectural enhancements that address these limitations and make PCM competitive with DRAM.
Related Papers (5)