
Showing papers on "Sense amplifier published in 2020"


Journal ArticleDOI
TL;DR: This article proposes a serial-input non-weighted product (SINWP) structure; a down-scaling weighted current translator and positive–negative current-subtractor scheme; a current-aware bitline clamper scheme; and a triple-margin small-offset current-mode sense amplifier (TMCSA).
Abstract: Computing-in-memory (CIM) based on embedded nonvolatile memory is a promising candidate for energy-efficient multiply-and-accumulate (MAC) operations in artificial intelligence (AI) edge devices. However, circuit design for NVM-based CIM (nvCIM) imposes a number of challenges, including an area-latency-energy tradeoff for multibit MAC operations, pattern-dependent degradation in signal margin, and small read margin. To overcome these challenges, this article proposes the following: 1) a serial-input non-weighted product (SINWP) structure; 2) a down-scaling weighted current translator (DSWCT) and positive–negative current-subtractor (PN-ISUB); 3) a current-aware bitline clamper (CABLC) scheme; and 4) a triple-margin small-offset current-mode sense amplifier (TMCSA). A 55-nm 1-Mb ReRAM-CIM macro was fabricated to demonstrate the MAC operation of 2-b-input, 3-b-weight with 4-b-out. This nvCIM macro achieved T_MAC = 14.6 ns at 4-b-out with peak energy efficiency of 53.17 TOPS/W.

76 citations
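The serial-input scheme above can be sketched in software: input bits are applied one at a time, and the per-bit partial sums are recombined with powers of two. This is a toy model of bit-serial MAC under assumed conventions (LSB-first input, saturation to the output width); it is not the paper's SINWP circuit, and the function name is illustrative.

```python
def bit_serial_mac(inputs, weights, in_bits=2, out_bits=4):
    """Toy bit-serial multiply-and-accumulate: multibit inputs are
    applied one bit at a time (LSB first), and each per-bit partial
    sum is weighted by 2^b before accumulation."""
    acc = 0
    for b in range(in_bits):
        # dot product of the b-th input bit-plane with the weights
        partial = sum(((x >> b) & 1) * w for x, w in zip(inputs, weights))
        acc += partial << b            # recombine with weight 2^b
    max_out = (1 << out_bits) - 1
    return min(acc, max_out)           # illustrative saturation to out_bits
```

For small operands the result matches a direct dot product; larger sums saturate at the assumed 4-b output range.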


Proceedings ArticleDOI
22 Mar 2020
TL;DR: A novel 8T SRAM-based bitcell is proposed for current-based compute-in-memory dot-product operations; Monte Carlo simulations and test-chip measurements verify both its linearity and its robustness to process variation.
Abstract: A novel 8T SRAM-based bitcell is proposed for current-based compute-in-memory dot-product operations. The proposed bitcell with two extra NMOS transistors (vs. the standard 6T SRAM) decouples the SRAM read and write operations. A 128×128 8T SRAM bitcell array is built for processing a vector-matrix multiplication (or parallel dot-products) with 64x binary (0 or 1) inputs, 64×128 binary (-1 or +1) weights, and 128x 1-5 bit outputs. Each column (i.e., neuron) of the proposed SRAM compute-in-memory macro consists of 64x bitcells for the dot-product, 32x bitcells for the ADC, and 32x bitcells for calibration. The column-based neuron minimizes the ADC overhead by reusing a sense amplifier for SRAM read. The column-wise ADC converts the analog dot-product results to N-bit output codes (N=1 to 5) by sweeping reference levels using replica bitcells for 2^N-1 cycles per conversion. Monte Carlo simulations and test-chip measurements verify both the linearity and the impact of process variation. The largest variation (σ=2.48%) results in an MNIST classification accuracy of 96.2% (i.e., 0.4% lower than a baseline with no variation). A test chip is fabricated in a 65nm process, and the 16K SRAM bitcell array occupies 0.055 mm². The energy efficiency of the 1-bit operation is 490 to 15.8 TOPS/W at 1-5 bit ADC mode using a 0.45/0.8V core supply at 200MHz.

66 citations
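The column operation above combines a binary dot product with a reference-sweeping ADC that takes 2^N-1 cycles. A minimal sketch, assuming a symmetric reference ramp over a chosen full scale (the function names and scaling are illustrative, not the paper's calibrated replica-bitcell scheme):

```python
def column_dot_product(inputs, weights):
    """Binary (0/1) inputs against binary (-1/+1) weights: the analog
    quantity one column accumulates."""
    return sum(x * w for x, w in zip(inputs, weights))

def ramp_adc(analog, n_bits, full_scale):
    """Sweep 2^N - 1 reference levels, one per cycle, and count how
    many references the analog value meets or exceeds."""
    levels = 2 ** n_bits - 1
    code = 0
    for k in range(1, levels + 1):
        # evenly spaced references across [-full_scale, +full_scale]
        ref = -full_scale + 2 * full_scale * k / (levels + 1)
        if analog >= ref:
            code += 1
    return code
```

Resolution is traded directly for conversion cycles: N=5 costs 31 sweeps per conversion, N=1 only one, which is the energy/precision knob the abstract reports.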


Journal ArticleDOI
TL;DR: A self-timed voltage-mode sense scheme named ST-VSS is proposed that enables optimal timing depending on the bit-cell discharging ability; it is applied to a 32-bit/word MRAM in a 28-nm CMOS process.
Abstract: In Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), the most commonly used timing scheme for the conventional Voltage-mode Sense Amplifier (VSA) is globally activated timing. This method cannot achieve optimal yield because each bit-cell has its own sensing latency. This paper proposes a self-timed voltage-mode sense scheme named ST-VSS which enables optimal timing depending on the bit-cell discharging ability. Two circuit structures are proposed. The single-SA structure uses a multiplexer at the input of the SA; its successive sensing operations are implemented with input offset flipping. The dual-SA structure is reconfigured by a built-in self-test (BIST) method into opposite offset states so that the two SAs monitor each other's sensing results. The sensing operation can be terminated immediately after a successful read. The proposed ST-VSS is applied to a 32-bit/word MRAM using a 28-nm CMOS process. Simulation results show that the successful sensing rate across a wide range of voltages is improved compared with the conventional scheme. The single-SA structure obtains 32%~42% yield improvement, costs 44.1%/26.9%/19.3% energy, and brings 8.3%/5.8%/2.9% layout area penalty at 128/256/512 column depths, respectively. The dual-SA structure achieves 54%~65% yield improvement, costs 66.2%/38.6%/27.5% energy, and brings 26.4%/13.8%/7.1% area penalty at 128/256/512 column depths, respectively.

31 citations


Proceedings ArticleDOI
30 May 2020
TL;DR: This paper proposes Capacity-Latency-Reconfigurable DRAM (CLR-DRAM), a new DRAM architecture that enables a dynamic capacity-latency trade-off at low cost and can improve system performance and reduce DRAM energy consumption with four-core multiprogrammed workloads.
Abstract: DRAM is the prevalent main memory technology, but its long access latency can limit the performance of many workloads. Although prior works provide DRAM designs that reduce DRAM access latency, their reduced storage capacities hinder the performance of workloads that need large memory capacity. Because the capacity-latency trade-off is fixed at design time, previous works cannot achieve maximum performance under very different and dynamic workload demands. This paper proposes Capacity-Latency-Reconfigurable DRAM (CLR-DRAM), a new DRAM architecture that enables dynamic capacity-latency trade-off at low cost. CLR-DRAM allows dynamic reconfiguration of any DRAM row to switch between two operating modes: 1) max-capacity mode, where every DRAM cell operates individually to achieve approximately the same storage density as a density-optimized commodity DRAM chip and 2) high-performance mode, where two adjacent DRAM cells in a DRAM row and their sense amplifiers are coupled to operate as a single low-latency logical cell driven by a single logical sense amplifier. We implement CLR-DRAM by adding isolation transistors in each DRAM subarray. Our evaluations show that CLR-DRAM can improve system performance by 18.6% and reduce DRAM energy consumption by 29.7% on average with four-core multiprogrammed workloads. We believe that CLR-DRAM opens new research directions for a system to adapt to the diverse and dynamically changing memory capacity and access latency demands of workloads.

28 citations


Journal ArticleDOI
TL;DR: A novel processing-in-memory architecture (HPLG-PIM) for highly flexible, efficient, and secure logic computation that exploits a hardware-friendly approach to implement the complex logic functions between multiple operands combining a reconfigurable sense amplifier and an HPLG unit to reduce the latency and the power-hungry data movement further.
Abstract: In this article, we initially present a hybrid spin-CMOS polymorphic logic gate (HPLG) using a novel 5-terminal magnetic domain wall motion device. The proposed HPLG is able to perform a full set of 1- and 2-input Boolean logic functions (i.e., NOT, AND/NAND, OR/NOR, and XOR/XNOR) by configuring the applied keys. We further show that our proposed HPLG could become a promising hardware security primitive to address IC counterfeiting or reverse engineering by logic locking and polymorphic transformation. The experimental results on a set of ISCAS-89, ITC-99, and Ecole Polytechnique Federale de Lausanne (EPFL) benchmarks show that HPLG obtains up to 51.4% and 10% average performance improvements on the power-delay product (PDP) compared with recent non-volatile logic and CMOS-based designs, respectively. We then leverage this gate to realize a novel processing-in-memory architecture (HPLG-PIM) for highly flexible, efficient, and secure logic computation. Instead of integrating complex logic units in cost-sensitive memory, this architecture exploits a hardware-friendly approach to implement the complex logic functions between multiple operands combining a reconfigurable sense amplifier and an HPLG unit to further reduce the latency and the power-hungry data movement. The device-to-architecture co-simulation results for widely used graph processing tasks running on three social network data sets indicate roughly 3.6× higher energy efficiency and 5.3× speedup over recent resistive RAM (ReRAM) accelerators. In addition, HPLG-PIM achieves ~4× higher energy efficiency and 5.1× speedup over recent processing-in-DRAM acceleration methods.

27 citations


Posted Content
TL;DR: In this article, the authors propose a new DRAM architecture that enables dynamic capacity-latency trade-off at low cost, which is called Capacity-Latency-Reconfigurable DRAM (CLR-DRAM).
Abstract: DRAM is the prevalent main memory technology, but its long access latency can limit the performance of many workloads. Although prior works provide DRAM designs that reduce DRAM access latency, their reduced storage capacities hinder the performance of workloads that need large memory capacity. Because the capacity-latency trade-off is fixed at design time, previous works cannot achieve maximum performance under very different and dynamic workload demands. This paper proposes Capacity-Latency-Reconfigurable DRAM (CLR-DRAM), a new DRAM architecture that enables dynamic capacity-latency trade-off at low cost. CLR-DRAM allows dynamic reconfiguration of any DRAM row to switch between two operating modes: 1) max-capacity mode, where every DRAM cell operates individually to achieve approximately the same storage density as a density-optimized commodity DRAM chip and 2) high-performance mode, where two adjacent DRAM cells in a DRAM row and their sense amplifiers are coupled to operate as a single low-latency logical cell driven by a single logical sense amplifier. We implement CLR-DRAM by adding isolation transistors in each DRAM subarray. Our evaluations show that CLR-DRAM can improve system performance by 18.6% and reduce DRAM energy consumption by 29.7% on average with four-core multiprogrammed workloads. We believe that CLR-DRAM opens new research directions for a system to adapt to the diverse and dynamically changing memory capacity and access latency demands of workloads.

26 citations


Journal ArticleDOI
TL;DR: In this article, a bio-inspired variation-aware nonvolatile quaternary latch (Qlatch) is proposed to reduce standby power without extra components or data loss.
Abstract: Quaternary memory and logic circuits have been studied by researchers, as they can provide denser integrated circuits and subsequently lower area and power consumption via succinct interconnects. Using multithreshold gate-all-around carbon nanotube field-effect transistors and the nonvolatile feature of magnetic tunnel junctions (MTJs), this letter proposes a bio-inspired variation-aware nonvolatile quaternary latch (Qlatch). Nonvolatility allows the system to be completely powered off during the idle state, reducing standby power significantly without extra components or data loss. Moreover, thanks to the bio-inspired structure of the Qlatch, and the fact that no sense amplifier is used in this circuit to read the MTJs, the Qlatch is more robust to process variations. Simulation results show that the nonvolatile Qlatch consumes 7% less dynamic power and 12% less static power than the conventional Qlatch, has a 61% lower delay, and has a 68% lower power-delay product.

15 citations


Journal ArticleDOI
TL;DR: Signal integrity (SI) is used to design and analyze 3-D X-Point memory, including a phase-change memory (PCM) cell, ovonic threshold switch (OTS) selector, interconnection lines, and peripheral circuits, including decoder, sense amplifier, and analog-to-digital converter.
Abstract: In this article, we, for the first time, use signal integrity (SI) analysis to design and analyze 3-D X-Point memory, including a phase-change memory (PCM) cell, ovonic threshold switch (OTS) selector, interconnection lines, and peripheral circuits. With the narrow spacing and long interconnection lines that come with 20-nm process technology, crosstalk and IR drop can degrade the voltage margin of the memory cell and affect memory operation. For SI analysis considering crosstalk and IR drop, the unit size of the memory array tile was considered in designing the interconnection lines. Crosstalk and IR drop are analyzed using full 3-D electromagnetic and circuit simulations. To cover practical conditions, the PCM cell and OTS selector are modeled as behavioral Verilog-A modules. The word lines (WLs) and bit lines (BLs) of the 3-D X-Point memory are modeled as resistances and capacitances extracted with the ANSYS Q3D extractor. The core peripheral circuits, such as the decoder, sense amplifier, and analog-to-digital converter, are included in the circuit simulation. To verify the proposed design and analysis, a transient simulation was conducted considering the crosstalk and IR drop of 3-D X-Point memory. A tradeoff between crosstalk and IR drop in the interconnection designs was verified. Additionally, to suppress crosstalk and reduce IR drop, a new interconnection design considering this tradeoff is proposed; it shows a 30% improvement in voltage margin with respect to IR drop and under 10% enhancement of crosstalk noise. The SI analysis and design methodologies are expected to be widely applicable to other new memory developments.

15 citations


Journal ArticleDOI
TL;DR: A technique is proposed to implement a majority gate in a memory array in an energy-efficient manner as a memory READ operation and the proposed logic family disintegrates arithmetic operations to majority and NOT operations which are implemented as memory READ and WRITE operations.
Abstract: The flow of data between processing and memory units in contemporary computing systems is their main performance and energy-efficiency bottleneck, often referred to as the ‘von Neumann bottleneck’ or ‘memory wall’. Emerging resistance switching memories (memristors) show promising signs to overcome the ‘memory wall’ by enabling computation in the memory array. Majority logic is a type of Boolean logic, and in many nanotechnologies, it has been found to be an efficient logic primitive. In this paper, a technique is proposed to implement a majority gate in a memory array. The majority gate is realised in an energy-efficient manner as a memory READ operation. The proposed logic family disintegrates arithmetic operations to majority and NOT operations which are implemented as memory READ and WRITE operations. A 1-bit full adder can be implemented in 6 steps (memory cycles) in a 1T–1R array, which is faster than IMPLY, NAND, NOR and other similar logic primitives.

15 citations
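The decomposition the abstract describes can be checked in plain Python. Below, a 1-bit full adder is built from majority and NOT only, using a standard majority-logic identity (this verifies the Boolean decomposition; the 6-cycle mapping onto READ/WRITE operations is the paper's, not modeled here):

```python
def maj(a, b, c):
    """3-input majority: 1 if at least two inputs are 1."""
    return int(a + b + c >= 2)

def full_adder(a, b, cin):
    """1-bit full adder from MAJ and NOT gates only.
    carry = MAJ(a, b, cin); sum = MAJ(MAJ(a, b, NOT cin), cin, NOT carry)."""
    cout = maj(a, b, cin)
    s = maj(maj(a, b, 1 - cin), cin, 1 - cout)
    return s, cout
```

Exhaustive enumeration over the eight input combinations confirms the identity.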


Proceedings ArticleDOI
16 Jun 2020
TL;DR: A read circuitry that tackles all STT-MRAM read challenges by using a negative temperature coefficient (NTC) reference based on an MTJ in series with an “NTC” resistor circuit emulator is presented.
Abstract: In this paper we present a read circuitry that tackles all STT-MRAM read challenges. First, a negative temperature coefficient (NTC) reference based on an MTJ in series with an “NTC” resistor circuit emulator is described. Then, an offset-cancelled voltage sense amplifier using low read current and reference averaging is discussed. Measurement results show a maximum of 2% reference impedance error (vs. ideal) and 1.7% read error rate degradation (vs. technology intrinsic defectivity rate). A 14.7 Mb/mm² memory density is also achieved, which is the best published STT-MRAM density for embedded applications.

14 citations


Journal ArticleDOI
TL;DR: The proposed compact physical unclonable function (PUF) based on cross-coupled comparator has the lowest native response instability and the unpredictability of the fabricated PUF chips is validated by autocorrelation function and NIST randomness tests.
Abstract: In this article, a compact physical unclonable function (PUF) based on a cross-coupled comparator is presented. Featuring a positive feedback response generation mechanism, the mismatch in analog signals between the cross-coupled transistor pair is quickly amplified to prevent its polarity from flipping by the temporal noise. The rapid enlargement of noise margin by the sense amplifier also contributes to stabilizing the response against supply voltage variations. To improve its temperature stability, the counteracting effect of complementary-to-absolute-temperature (CTAT) and proportional-to-absolute-temperature (PTAT) drives are considered in sizing the bit cell transistors. The proposed design is fabricated in a standard 65-nm CMOS process. The bit cell occupies an area of only 4.38 μm² (i.e., 1036 F²), and the overall PUF chip consumes 2.98 pJ/bit at a throughput of 8 Mb/s, of which only 1.61 pJ/bit is due to the PUF’s core. With the uniqueness measured to be 49.53%, the unpredictability of the fabricated PUF chips is validated by autocorrelation function and NIST randomness tests. Compared with the state-of-the-art implementations, the proposed PUF has the lowest native response instability of 1.46% with 500 repeated PUF readouts at 27 °C and 1.2 V. By varying the operating temperature from −50 °C to 150 °C in a step size of 10 °C and the supply voltage from 1.0 to 1.4 V in a step size of 0.1 V simultaneously, the average reliability of the proposed PUF obtained from the 2-D plot of all operating conditions is found to be 96.87% without correction and 99.31% with spatial majority voting (SMV).

Journal ArticleDOI
Sung-Tae Lee1, Dongseok Kwon1, Hyeongsu Kim1, Honam Yoo1, Jong-Ho Lee1 
TL;DR: The low-variance conductance distribution of the NAND cells achieves a higher inference accuracy compared to that of resistive random access memory (RRAM) devices by 2~7 % and 0.04~0.23 % for CIFAR 10 and MNIST datasets, respectively.
Abstract: We propose a novel synaptic architecture based on a NAND flash memory for highly robust and high-density quantized neural networks (QNN) with 4-bit weights and binary neuron activation, for the first time. The proposed synaptic architecture is fully compatible with the conventional NAND flash memory architecture by adopting a differential sensing scheme and a binary neuron activation of (1, 0). A binary neuron enables using a 1-bit sense amplifier, which significantly reduces the burden of peripheral circuits and power consumption and enables bitwise communication between the layers of neural networks. Operating NAND cells in the saturation region eliminates the effect of metal wire resistance and serial resistance of the NAND cells. With a read-verify-write (RVW) scheme, a low-variance conductance distribution is demonstrated for 8 levels. Vector-matrix multiplication (VMM) of a 4-bit weight and binary activation can be accomplished by only one input pulse, eliminating the need for a multiplier and additional logic operations. In addition, quantization training can minimize the degradation of the inference accuracy compared to post-training quantization. Finally, the low-variance conductance distribution of the NAND cells achieves a higher inference accuracy compared to that of resistive random access memory (RRAM) devices by 2~7% and 0.04~0.23% for CIFAR-10 and MNIST datasets, respectively.
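The differential scheme with a binary output neuron can be sketched as follows: each signed weight contributes to either a positive or a negative accumulation path, and a 1-bit sense amplifier reduces to a threshold on the difference. This is a behavioral toy model under assumed conventions (signed integer weights in place of conductance levels); the function name is illustrative, not from the paper.

```python
def binary_neuron_vmm(activations, weights, threshold=0.0):
    """Toy differential VMM with a binary output neuron: binary (1/0)
    activations gate the cells, positive and negative weight parts are
    accumulated separately, and a 1-bit sense amplifier thresholds the
    difference of the two paths."""
    pos = sum(a * max(w, 0) for a, w in zip(activations, weights))
    neg = sum(a * max(-w, 0) for a, w in zip(activations, weights))
    return 1 if (pos - neg) > threshold else 0
```

Because the activations are binary and the comparison is 1-bit, one "input pulse" per vector suffices in this model, mirroring the single-pulse VMM claim.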

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a timing-speculation (TS) cache to boost the cache frequency and improve energy efficiency under low supply voltages, where the voltage differences of bitlines (BLs) are continuously evaluated twice by a sense amplifier (SA).
Abstract: To mitigate the ever-worsening “power wall” problem, more and more applications need to expand their working voltage to the wide-voltage range including the near-threshold region. However, the read delay distribution of the static random access memory (SRAM) cells under the near-threshold voltage shows a more serious long-tail characteristic than that under the nominal voltage due to the process fluctuation. Such degradation of SRAM delay makes the SRAM-based cache a performance bottleneck of systems as well. To avoid unreliable data reading, circuit-level studies use larger/more transistors in a bitcell by sacrificing chip area and the static power of cache arrays. Architectural studies propose the auxiliary error correction or block disabling/remapping methods in fault-tolerant caches, which worsen both the hit latency and energy efficiency due to the complex accessing logic. This article proposes a timing-speculation (TS) cache to boost the cache frequency and improve energy efficiency under low supply voltages. In the TS cache, the voltage differences of bitlines (BLs) are continuously evaluated twice by a sense amplifier (SA), and the access timing error can be detected much earlier than that in prior methods. According to the measurement results from the fabricated chips, the TS L1 cache aggressively increases its frequency to 1.62× and 1.92× compared with the conventional scheme at 0.5- and 0.6-V supply voltages, respectively.

Journal ArticleDOI
TL;DR: A simple and feasible low power design scheme which can be used as a powerful tool for energy reduction in RRAM circuits is proposed, exclusively based on current control during write and read operations and ensures that write operations are completed without wasted energy.
Abstract: Energy efficiency remains one of the main factors for improving the key performance markers of RRAMs to support IoT edge devices. This paper proposes a simple and feasible low-power design scheme which can be used as a powerful tool for energy reduction in RRAM circuits. The design scheme is based exclusively on current control during write and read operations and ensures that write operations are completed without wasted energy. Self-adaptive write-termination circuits are proposed to control the RRAM current during FORMING, RESET and SET operations. The termination circuits sense the programming current and stop the write pulse as soon as a preferred programming current is reached. Simulation results demonstrate that an appropriate choice of the programming currents can yield a 4.1X improvement in FORMING, 9.1X improvement in SET, and 1.12X improvement in RESET energy. The possibility of tight control over the RESET resistance is also demonstrated. READ energy optimization is also covered by leveraging a differential sense amplifier offering a programmable current reference. Finally, an optimal trade-off between energy consumption during SET/RESET operations and an acceptable read margin is established according to the final application requirements.
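The self-adaptive termination idea can be sketched behaviorally: monitor the cell current during the pulse and cut the pulse at the first sample that reaches the target, saving the remaining tail energy. A toy model with assumed units (one current sample per nanosecond); all names are illustrative.

```python
def terminated_write(pulse_width_ns, i_target_ua, current_trace_ua):
    """Toy model of a self-adaptive write-termination circuit: the write
    pulse is cut as soon as the sensed programming current reaches the
    target, instead of always applying the full-width pulse.
    Returns the effective pulse width in ns."""
    for t, i in enumerate(current_trace_ua[:pulse_width_ns]):
        if i >= i_target_ua:
            return t + 1          # terminate early: energy tail saved
    return pulse_width_ns         # target never reached: full pulse
```

The energy saving is simply proportional to the truncated tail: a cell that switches at 4 ns under a 10 ns nominal pulse wastes none of the remaining 6 ns.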

Proceedings ArticleDOI
20 Jul 2020
TL;DR: An in-memory neural network accelerator architecture called MOSAIC is proposed which uses minimal form of peripheral circuits; 1-bit word line driver to replace DAC and1-bit sense amplifier to replace ADC to achieve an order of magnitude higher energy and area efficiency.
Abstract: We propose an in-memory neural network accelerator architecture called MOSAIC which uses a minimal form of peripheral circuits: a 1-bit word line driver to replace the DAC and a 1-bit sense amplifier to replace the ADC. To map multi-bit neural networks on the MOSAIC architecture, which has 1-bit-precision peripheral circuits, we also propose a bit-splitting method that approximates the original network by separating each bit path of the multi-bit network so that each bit path can propagate independently throughout the network. Thanks to the minimal form of peripheral circuits, MOSAIC can achieve an order of magnitude higher energy and area efficiency than previous in-memory neural network accelerators.
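The basic bit-splitting decomposition can be shown for a single layer: an unsigned multi-bit weight matrix is split into binary planes, each plane is evaluated as a 1-bit VMM (what 1-bit drivers and sense amplifiers can support), and the partial results are recombined with powers of two. For one layer this is exact; MOSAIC's approximation concerns propagating each bit path independently through the whole network, which is not modeled here. Names are illustrative.

```python
def bit_split_vmm(x, W, w_bits=4):
    """Split an unsigned w_bits-bit weight matrix W (rows x cols, list of
    lists) into binary planes, run a 1-bit VMM per plane with binary
    input vector x, and recombine the partial sums digitally with 2^b."""
    cols = len(W[0])
    acc = [0] * cols
    for b in range(w_bits):
        for j in range(cols):
            # 1-bit VMM over the b-th binary weight plane, column j
            partial = sum(xi * ((row[j] >> b) & 1) for xi, row in zip(x, W))
            acc[j] += partial << b
    return acc
```

For a single layer the recombined result equals the full-precision product, which is why the splitting costs cycles rather than accuracy at this level.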

Journal ArticleDOI
TL;DR: An offset-canceled DRAM sense amplifier with coupling capacitors to store and cancel the offset arising from random variations of the threshold voltages of the amplifying transistors, thereby increasing the sensing margin of the overall DRAM design.
Abstract: This article reports an offset-canceled DRAM sense amplifier with coupling capacitors to store and cancel the offset arising from random variations of the threshold voltages of the amplifying transistors. Analytical calculations of the average and standard deviation of the decision threshold voltages, defined as the voltage in the cell capacitor that bifurcates into binary levels when activated, are performed on various DRAM sensing schemes and their comparison results are presented. Based on the analysis, the proposed sense amplifier scheme using coupling capacitors is shown to offer the least amount of variation in the decision threshold, thereby increasing the sensing margin of the overall DRAM design. The coupling capacitors not only compensate for the random offset of the sense amplifiers but also mitigate the effect of the mismatch of the bitline capacitances in the open bitline scheme. Measurement on the experimental chip fabricated in a 65-nm CMOS process validates the analysis and confirms the superior performance of the proposed DRAM sensing scheme.

Journal ArticleDOI
TL;DR: An embedded level-shifting (ELS) dual-rail SRAM is proposed to enhance the availability of dual-Rail SRAMs and achieves low-power operation with 71.4% power consumption compared to single-railSRAM with 72% performance overhead in circuit-level simulation, while the previous hybrid dual- rail SRAM shows 67.8% energy consumption with 270%performance overhead.
Abstract: An embedded level-shifting (ELS) dual-rail SRAM is proposed to enhance the availability of dual-rail SRAMs. Although dual-rail SRAM is a powerful solution for satisfying the increasing demand for low-power applications, the enormous performance degradation at low supply voltages cannot meet the high-performance cache requirement in recent computing systems. The requirement of many level shifters is another drawback of the dual-rail SRAM because it degrades the energy-savings. The proposed ELS dual-rail SRAM achieves energy-savings by using a low supply voltage to precharge bitlines while minimizing the performance overhead by appropriately assigning a high-supply voltage to critical circuit blocks with effective level-shifting circuits. The sense amplifier embeds a level-shifting operation, thereby operating with a high supply voltage for a fast sensing operation. The proposed dynamic output buffer resolves the potential static current problem and improves the read delay. The number of level shifters is reduced using a proposed write driver, which conducts level-shifting and write-driving simultaneously. The proposed ELS dual-rail SRAM achieves low-power operation with 71.4% power consumption compared to single-rail SRAM with 72% performance overhead in circuit-level simulation, while the previous hybrid dual-rail SRAM shows 67.8% energy consumption with 270% performance overhead. In architecture-level simulation using Gem5 simulator with SPEC2006 benchmarks, the system with the ELS dual-rail SRAM caches shows, on average, 29% performance improvement compared to that of the system with the hybrid dual-rail SRAM caches.

Journal ArticleDOI
TL;DR: An offset-canceling zero-sensing-dead-zone sense amplifier (OCZS-SA) combined with the OCDS-SC is proposed to significantly improve the read yield of resistive nonvolatile memories.
Abstract: With technology scaling, achieving a target read yield of resistive nonvolatile memories becomes more difficult due to increased process variation and decreased supply voltage. Recently, an offset-canceling dual-stage sensing circuit (OCDS-SC) has been proposed to improve the read yield by canceling the offset voltage and utilizing a double-sensing-margin structure. In this paper, an offset-canceling zero-sensing-dead-zone sense amplifier (OCZS-SA) combined with the OCDS-SC is proposed to significantly improve the read yield. The OCZS-SA has two major advantages, namely, offset voltage cancellation and a zero sensing dead zone. The Monte Carlo HSPICE simulation results using a 65-nm predictive technology model show that the OCZS-SA achieves 2.1 times smaller offset voltage with a zero sensing dead zone than the conventional latch-type SAs at the cost of an increased area overhead of 1.0% for a subarray size of 128 × 16.

Proceedings ArticleDOI
06 Jul 2020
TL;DR: A method to compute majority while reading from a transistor-accessed RRAM array, which could achieve a latency reduction of 70% and 50% when compared to IMPLY and NAND/NOR logic-based adders, respectively.
Abstract: Efforts to combat the ‘von Neumann bottleneck’ have been strengthened by Resistive RAMs (RRAMs), which enable computation in the memory array. Majority logic can accelerate computation when compared to NAND/NOR/IMPLY logic due to its expressive power. In this work, we propose a method to compute majority while reading from a transistor-accessed RRAM array. The proposed gate was verified by simulations using a physics-based model (for the RRAM) and an industry-standard model (for the CMOS sense amplifier) and found to tolerate reasonable variations in the RRAMs’ resistive states. Together with a NOT gate, which is also implemented in-memory, the proposed gate forms a functionally complete Boolean logic, capable of implementing any digital logic. Computing is simplified to a sequence of READ and WRITE operations and does not require any major modifications to the peripheral circuitry of the array. The parallel-friendly nature of the proposed gate is exploited to implement an eight-bit parallel-prefix adder in the memory array. The proposed in-memory adder could achieve a latency reduction of 70% and 50% when compared to IMPLY and NAND/NOR logic-based adders, respectively.
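Reading majority from a resistive array amounts to summing the currents of the cells selected together and comparing against a reference placed between the "one LRS" and "two LRS" levels. A behavioral sketch with assumed, illustrative values (10 kΩ LRS, 1 MΩ HRS, 0.2 V read voltage); this is not the paper's calibrated circuit.

```python
def read_majority(r_states, r_low=10e3, v_read=0.2, i_ref=None):
    """Toy model of an in-array majority READ: three cells share a
    sense line, their read currents sum, and a sense amplifier compares
    the total against a reference between the 1-LRS and 2-LRS levels.
    r_states: three cell resistances in ohms (LRS = logic 1)."""
    i_total = sum(v_read / r for r in r_states)
    if i_ref is None:
        # reference midway between "one LRS" and "two LRS" currents
        i_ref = 1.5 * v_read / r_low
    return 1 if i_total > i_ref else 0
```

With an HRS two orders of magnitude above the LRS, the HRS contributions barely perturb the sum, which is what gives the gate its variation tolerance.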

Journal ArticleDOI
TL;DR: A low-power 1/4-rate four-level pulse amplitude modulation (PAM4) receiver with an adaptive variable-gain rectifier (AVGR)-based decoder in 28-nm CMOS technology achieves a better power efficiency by employing a 1/ 4-rate topology and merging a variable- gain function into the decoder.
Abstract: This article presents a low-power 1/4-rate four-level pulse amplitude modulation (PAM4) receiver with an adaptive variable-gain rectifier (AVGR)-based decoder in 28-nm CMOS technology. The PAM4 input signal is preconditioned by a continuous-time linear equalizer (CTLE) and then sampled into four branches of decoders by 1/4-rate clocks. The proposed AVGR-based PAM4-to-nonreturn-to-zero (NRZ) decoder performs gain adaptation and amplitude rectification simultaneously for decoding the least significant bit (LSB). The linear sense amplifier in the AVGR is modified from a latch to achieve a high gain and low power. Compared with the full-rate receiver adopting a decoder consisting of three comparators, this design achieves a better power efficiency by employing a 1/4-rate topology and merging a variable-gain function into the decoder. Experimental results demonstrate that the receiver chip can receive and decode a 24-Gb/s 190-mVpp PAM4 signal at a BER of 10^-11 and a bit efficiency of 1.38 pJ/bit.
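The rectifier-based decoding can be illustrated with the standard Gray-coded PAM4 mapping: the MSB is the sign of the sample, and the LSB is recovered by rectifying the amplitude and comparing it with a mid threshold, which is the role the AVGR performs in the analog domain. Levels are normalized to -3/-1/+1/+3 as an assumption; the circuit's actual thresholds are adapted, not fixed as here.

```python
def pam4_decode(sample):
    """Decode one Gray-coded PAM4 sample into (MSB, LSB).
    MSB: sign comparator at the middle level (0).
    LSB: rectify-and-threshold -- inner levels (+/-1) carry LSB = 1."""
    msb = 1 if sample > 0 else 0
    lsb = 1 if abs(sample) < 2 else 0
    return msb, lsb
```

Rectification is what lets a single comparator recover the LSB for both polarities, instead of the three comparators a full-rate thermometer decoder needs.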

Proceedings ArticleDOI
01 Jan 2020
TL;DR: A sense amplifier for split supply SRAMs to enable wide range of Dynamic Voltage and Frequency Scaling (DVFS) and the proposed solution has more than 25% lower offset and almost the same SA reaction time compared to the voltage latch sense amplifier.
Abstract: Embedded memories are an integral part of the design of today's processors and Systems on Chip (SoC). The sense amplifier (SA) plays an important role in determining the performance and yield of memories. In this work, we propose a sense amplifier for split-supply SRAMs to enable a wide range of Dynamic Voltage and Frequency Scaling (DVFS). The proposed solution has more than 25% lower offset and almost the same SA reaction time compared to the voltage latch sense amplifier, and more than 50% lower offset and more than 15% faster SA reaction time compared to the current latch sense amplifier, across 0.45V to 1.0V operation in 22nm HKMG CMOS technology.

Journal ArticleDOI
TL;DR: The role of the sense amplifier in modern computer memory is to sense low-power signals; this paper explores design solutions to improve the performance of the sense amplifier for CMOS SRAM.

Journal ArticleDOI
TL;DR: A sense-amplifier-based physically unclonable function (PUF) with individually embedded non-volatile memory (eNVM), implemented in a 180 nm standard CMOS process, offers 100% stable random bits.
Abstract: This paper presents a sense-amplifier-based physically unclonable function (PUF) with individually embedded non-volatile memory (eNVM) that offers 100% stable random bits. The proposed eNVM, which stores the initially generated random key, biases the sense amplifier through a feedback path so that it always reproduces the same key as the initial value. To verify the performance of the proposed architecture, a 256-bit PUF with a core area of 0.160 mm2 was implemented in a 180 nm standard CMOS process. Measurement results of the implemented PUF show an intra-chip Hamming distance (HD) of 0 (100% stability) and an inter-chip HD of 0.5047.
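The two figures of merit quoted above can be illustrated with a short sketch of the fractional Hamming distance: an intra-chip HD of 0 means repeated reads of the same chip are identical (perfect stability), while an inter-chip HD near 0.5 means roughly half the bits differ between any two chips (good uniqueness). The 8-bit responses below are hypothetical.

```python
# Fractional Hamming distance, as used to evaluate PUF stability
# (intra-chip) and uniqueness (inter-chip).  Responses are hypothetical.
def fractional_hd(bits_a, bits_b):
    assert len(bits_a) == len(bits_b)
    return sum(a != b for a, b in zip(bits_a, bits_b)) / len(bits_a)

chip1_read1 = [1, 0, 1, 1, 0, 0, 1, 0]
chip1_read2 = [1, 0, 1, 1, 0, 0, 1, 0]   # identical re-read: fully stable
chip2_read1 = [0, 0, 1, 0, 1, 0, 1, 1]

print(fractional_hd(chip1_read1, chip1_read2))  # intra-chip HD -> 0.0
print(fractional_hd(chip1_read1, chip2_read1))  # inter-chip HD -> 0.5
```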

Patent
04 Jun 2020
TL;DR: In this paper, a memory cell is coupled to the first digit line in response to activation of a wordline coupled to the memory cell, and the second transistor is disabled to decouple the second digit line from the second gut node.
Abstract: Apparatuses and methods for reducing sense amplifier leakage current during an active power-down are disclosed. An example apparatus includes a memory that includes a memory cell, a first digit line, and a second digit line. The memory cell is coupled to the first digit line in response to activation of a wordline coupled to the memory cell. The example apparatus further includes a sense amplifier comprising a first transistor coupled between the first digit line and a first gut node of the sense amplifier and a second transistor coupled between the second digit line and a second gut node of the sense amplifier. While the wordline is activated, in response to entering a power-down mode, the first transistor is disabled to decouple the first digit line from the first gut node and the second transistor is disabled to decouple the second digit line from the second gut node.

Journal ArticleDOI
TL;DR: A novel sensing approach for spin-based memories is presented that augments the read sense margin and reduces read decision failures, without deteriorating read disturb, by changing the read current dynamically according to the bit-cell state.
Abstract: This brief presents a novel sensing approach for spin-based memories that augments the read sense margin and reduces read decision failures, without deteriorating read disturb, by changing the read current dynamically according to the bit-cell state. The proposed sensing circuit consists of three main sub-circuits: 1) the bit-cell; 2) a bit-line amplifier; and 3) a standard current-latch sense amplifier. The bit-line amplifier is a basic amplifier with a positive feedback connection that achieves a higher sense margin with a dynamic read current. As a result, the significant increase in the sense margin eliminates the effect of the sense amplifier offset on read decision failure. Monte Carlo simulations in a 45 nm technology demonstrate that the proposed sensing scheme improves the read bit error rate (BER) by more than one order of magnitude compared to the conventional voltage sensing scheme, at the cost of 0.3% array area overhead and a 3% energy penalty. Moreover, quantitatively compared with some of the state-of-the-art sensing schemes, the proposed scheme achieves a better area-energy-robustness trade-off.
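The link between sense margin and read decision failure can be sketched with a toy Gaussian offset model (not the paper's circuit): if the sense amplifier's input-referred offset is normally distributed with standard deviation σ, a read decision fails whenever the offset exceeds the sense margin, giving BER ≈ Q(margin/σ). The millivolt numbers below are hypothetical.

```python
import math

# Toy model: a read decision fails when the SA's Gaussian input-referred
# offset exceeds the sense margin, so BER = Q(margin / sigma).
def read_ber(margin_mv, sigma_mv):
    return 0.5 * math.erfc(margin_mv / (sigma_mv * math.sqrt(2)))

sigma = 10.0  # hypothetical SA offset standard deviation, in mV
for margin in (20.0, 30.0, 40.0):  # e.g. margin enlarged by a bit-line amplifier
    print(f"margin {margin:.0f} mV -> BER {read_ber(margin, sigma):.2e}")
```

In this toy model each extra σ of margin buys roughly an order of magnitude in BER, which is consistent in spirit with the abstract's claim that enlarging the sense margin improves read BER by more than one order of magnitude.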

Journal ArticleDOI
10 Mar 2020
TL;DR: In this letter, 1.5-MB and 256-KB 2T-MONOS eFlash macros are developed with 65-nm silicon-on-thin-box (SOTB) technology, adopting low-energy sense amplifier and data transmission circuit techniques which enhance intrinsic advantages of SOTB devices.
Abstract: To expand the application range of the Internet of Things, ultralow active-energy operation is essential in edge devices. In particular, read energy reduction in embedded Flash (eFlash) memory is strongly required to enable real-time sensing with the limited energy generated by energy harvesting (EH). In this letter, 1.5-MB and 256-KB 2T-MONOS eFlash macros are developed in a 65-nm silicon-on-thin-box (SOTB) technology, adopting low-energy sense amplifier and data transmission circuit techniques that enhance the intrinsic advantages of SOTB devices. These macros achieve a read energy of 0.15 pJ/bit with 80-MHz random read access capability, which is low enough to utilize EH technologies as energy sources.
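The reported 0.15-pJ/bit figure translates into a small average read power budget. As a back-of-envelope check, assume a hypothetical 32-bit read word fetched at the full 80-MHz access rate (the word width and 100% duty cycle are assumptions, not values from the letter):

```python
# Back-of-envelope read power from the reported energy-per-bit figure.
# The 32-bit word width and continuous access are assumptions.
energy_per_bit = 0.15e-12   # J/bit (reported)
word_bits = 32              # hypothetical read width
access_rate = 80e6          # Hz (reported random-read rate)

power_w = energy_per_bit * word_bits * access_rate
print(f"{power_w * 1e6:.0f} uW")  # -> 384 uW
```

A few hundred microwatts of continuous read power is within the output range of typical energy-harvesting sources, which supports the letter's claim.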

Patent
30 Apr 2020
TL;DR: In this paper, a memory including a memory cell coupled to a first digit line in response to a wordline being set to an active state and a sense amplifier coupled to the first digit lines and to a second digit line.
Abstract: Apparatuses and methods for reducing row address (RAS) to column address (CAS) delay are disclosed. An example apparatus includes a memory including a memory cell coupled to a first digit line in response to a wordline being set to an active state and a sense amplifier coupled to the first digit line and to a second digit line. The sense amplifier is configured to perform a threshold voltage compensation operation to bias the first digit line and the second digit line based on a threshold voltage difference between at least two circuit components of the sense amplifier. The apparatus further comprises a decoder circuit coupled to the wordline and to the sense amplifier. In response to an activate command, the decoder circuit is configured to initiate the threshold voltage compensation operation and, during the threshold voltage compensation operation, to set the wordline to the active state.

Journal ArticleDOI
TL;DR: An analog bit-counting scheme is proposed to decrease the burden of neuron circuits with a synaptic architecture utilizing NAND flash memory, and a novel binary neuron circuit with a double-gate positive feedback (PF) device is demonstrated to replace the sense amplifier, adder, and comparator.
Abstract: Recent studies have demonstrated that binary neural networks (BNN) can achieve satisfying inference accuracy on representative image datasets. A BNN conducts XNOR and bit-counting operations instead of high-precision vector-matrix multiplication (VMM), significantly reducing the memory storage. In this work, an analog bit-counting scheme is proposed to decrease the burden of neuron circuits with a synaptic architecture utilizing NAND flash memory. A novel binary neuron circuit with a double-gate positive feedback (PF) device is demonstrated to replace the sense amplifier, adder, and comparator, thereby reducing the burden of the complementary metal-oxide-semiconductor (CMOS) circuits and the power consumption. By using the double-gate PF device, the threshold voltage of the neuron circuits can be adaptively matched to the threshold value in the algorithms, eliminating the accuracy degradation introduced by process variation. Thanks to the super-steep subthreshold swing (SS) characteristics of the PF device, the proposed neuron circuit has an off-state current of 1 pA, a 10⁵-fold improvement over a neuron circuit with a conventional metal-oxide-semiconductor field-effect transistor (MOSFET) device. A system simulation of a hardware-based BNN shows that the low-variance conductance distribution (8.4%) of the synaptic device and the adjustable threshold of the neuron circuit implement a highly efficient BNN with high inference accuracy.
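The XNOR and bit-counting operation that replaces VMM in a BNN can be sketched in a few lines: for ±1 vectors, the dot product equals twice the number of matching positions (the XNOR popcount) minus the vector length. The vectors below are arbitrary examples.

```python
# Sketch of the XNOR/bit-counting operation that replaces high-precision
# vector-matrix multiplication in a binary neural network.
def bnn_neuron(activations, weights):
    """Dot product of +/-1 vectors via XNOR matching and bit-counting."""
    n = len(activations)
    matches = sum(1 for a, w in zip(activations, weights) if a == w)  # XNOR popcount
    return 2 * matches - n  # map the match count back to a +/-1 dot product

acts    = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]
print(bnn_neuron(acts, weights))  # -> 2
```

Because the whole computation reduces to counting matches, it maps naturally onto an analog bit-counting scheme: summed cell currents stand in for the popcount, and a thresholding neuron circuit produces the binary activation.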

Posted Content
TL;DR: Neural network simulation on the CIFAR-10 image recognition task shows that going from binary to ternary neural networks significantly increases neural network performance, highlighting that the function of AI circuits may sometimes need to be revisited when they are operated in low-power regimes.
Abstract: The design of systems implementing low-precision neural networks with emerging memories such as resistive random access memory (RRAM) is a promising approach for reducing the energy consumption of artificial intelligence (AI). Multiple works have, for example, proposed in-memory architectures to implement low-power binarized neural networks. These simple neural networks, where synaptic weights and neuronal activations assume binary values, can indeed approach state-of-the-art performance on vision tasks. In this work, we revisit one of these architectures where synapses are implemented in a differential fashion to reduce bit errors, and synaptic weights are read using precharge sense amplifiers. Based on experimental measurements on a hybrid 130 nm CMOS/RRAM chip and on circuit simulation, we show that the same memory array architecture can be used to implement ternary weights instead of binary weights, and that this technique is particularly appropriate if the sense amplifier is operated in the near-threshold regime. We also show, based on neural network simulation on the CIFAR-10 image recognition task, that going from binary to ternary neural networks significantly increases neural network performance. These results highlight that the function of AI circuits may sometimes need to be revisited when they are operated in low-power regimes.
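The differential synapse described above can be modeled abstractly: each weight occupies a device pair on complementary bitlines, with binary weights stored as complementary low-resistance/high-resistance (LRS/HRS) states. A ternary zero can reuse the same array by storing HRS on both devices, which a precharge sense amplifier operated in the near-threshold regime can resolve as a third, "neither side wins" state. The mapping below is an illustrative toy model, not the paper's measured circuit.

```python
# Toy model of a differential RRAM synapse: each weight is a pair of
# device states on complementary bitlines.  The mapping is illustrative.
LRS, HRS = "LRS", "HRS"

def encode_ternary(w):
    """Store a ternary weight as a (BL, BLB) device-state pair."""
    return {+1: (LRS, HRS), -1: (HRS, LRS), 0: (HRS, HRS)}[w]

def sense(pair):
    """Idealized precharge sense amplifier: in near-threshold operation
    the HRS/HRS pair is resolved as a distinct third state (weight 0)."""
    bl, blb = pair
    if bl == LRS and blb == HRS:
        return +1
    if bl == HRS and blb == LRS:
        return -1
    return 0  # both devices high-resistance: neither branch discharges first

print([sense(encode_ternary(w)) for w in (+1, 0, -1)])  # -> [1, 0, -1]
```

The design choice this illustrates is that ternary weights cost no extra devices over the binary differential scheme; only the sense amplifier's operating point changes so that the double-HRS case becomes distinguishable.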

Proceedings ArticleDOI
01 Aug 2020
TL;DR: In this article, it is shown that the same memory array architecture can be used to implement ternary weights instead of binary weights, and that this technique is particularly appropriate if the sense amplifier is operated in the near-threshold regime.
Abstract: The design of systems implementing low-precision neural networks with emerging memories such as resistive random access memory (RRAM) is a promising approach for reducing the energy consumption of artificial intelligence (AI). Multiple works have, for example, proposed in-memory architectures to implement low-power binarized neural networks. These simple neural networks, where synaptic weights and neuronal activations assume binary values, can indeed approach state-of-the-art performance on vision tasks. In this work, we revisit one of these architectures where synapses are implemented in a differential fashion to reduce bit errors, and synaptic weights are read using precharge sense amplifiers. Based on experimental measurements on a hybrid 130 nm CMOS/RRAM chip and on circuit simulation, we show that the same memory array architecture can be used to implement ternary weights instead of binary weights, and that this technique is particularly appropriate if the sense amplifier is operated in the near-threshold regime. We also show, based on neural network simulation on the CIFAR-10 image recognition task, that going from binary to ternary neural networks significantly increases neural network performance. These results highlight that the function of AI circuits may sometimes need to be revisited when they are operated in low-power regimes.