
Showing papers by "Jean-Michel Portal published in 2021"


Journal ArticleDOI
TL;DR: Based on neural network simulations of the CIFAR-10 image recognition task, it is shown that ternary neural networks significantly outperform binary ones, which are often preferred for inference hardware.
Abstract: The design of systems implementing low-precision neural networks with emerging memories such as resistive random access memory (RRAM) is a promising route for reducing the energy consumption of artificial intelligence. To achieve maximum energy efficiency in such systems, logic and memory should be integrated as tightly as possible. In this work, we focus on the case of ternary neural networks, where synaptic weights assume ternary values. We propose a two-transistor/two-resistor memory architecture employing a precharge sense amplifier, where the weight value can be extracted in a single sense operation. Based on experimental measurements on a hybrid 130 nm CMOS/RRAM chip featuring this sense amplifier, we show that this technique is particularly appropriate at low supply voltage and that it is resilient to process, voltage, and temperature variations. We characterize the bit error rate of our scheme. Based on neural network simulations of the CIFAR-10 image recognition task, we show that ternary neural networks significantly outperform binary ones, which are often preferred for inference hardware. Finally, we show that the neural network is immune to the type of bit errors observed in our scheme, which can therefore be used without error correction.

8 citations
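
As an illustration of the weight encoding described in this abstract, here is a minimal PyTorch sketch of ternarizing weights to {-1, 0, +1}. The threshold rule and the mapping of each state onto the two RRAM devices of a 2T2R cell (noted in the comment) are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def ternarize(w, threshold=0.05):
    """Quantize weights to {-1, 0, +1} (illustrative threshold rule).

    In a 2T2R array, each ternary weight could occupy two RRAM devices:
    (+1) -> (LRS, HRS), (-1) -> (HRS, LRS), (0) -> (HRS, HRS),
    so both the sign and the zero state are recovered in one sense operation.
    """
    return torch.where(w.abs() < threshold, torch.zeros_like(w), torch.sign(w))

# Tiny usage example on a random weight tensor.
w = torch.randn(4, 4)
print(ternarize(w))
```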


Journal ArticleDOI
Abstract: The implementation of current deep learning training algorithms is power-hungry, due to data transfer between memory and logic units. Oxide-based resistive random access memories (RRAMs) are outstanding candidates to implement in-memory computing, which is less power-intensive. Their weak RESET regime is particularly attractive for learning, as it allows tuning the resistance of the devices with remarkable endurance. However, the resistive change behavior in this regime suffers from many fluctuations and is particularly challenging to model, especially in a way compatible with tools used for simulating deep learning. In this work, we present a model of the weak RESET process in hafnium oxide RRAM and integrate this model within the PyTorch deep learning framework. Validated on experiments on a hybrid CMOS/RRAM technology, our model reproduces both the noisy progressive behavior and the device-to-device (D2D) variability. We use this tool to train binarized neural networks (BNNs) for the MNIST handwritten digit recognition task and the CIFAR-10 object classification task. We simulate our model with and without various aspects of device imperfections to understand their impact on the training process and identify that the D2D variability is the most detrimental aspect. The framework can be used in the same manner for other types of memories to identify the device imperfections that cause the most degradation, which can, in turn, be used to optimize the devices to reduce the impact of these imperfections.

4 citations
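
To give a rough idea of how such a device model can be driven from Python, the sketch below is a toy weak-RESET synapse with device-to-device and cycle-to-cycle variability. It is not the authors' published PyTorch model; the class name, step size, and noise distributions are illustrative assumptions.

```python
import torch

class WeakResetSynapse:
    """Toy stand-in for a weak-RESET RRAM synapse model (illustrative only).

    Each device gets its own update gain drawn at initialisation (device-to-device
    variability), and every weak-RESET pulse adds a noisy, progressive conductance
    step (cycle-to-cycle fluctuation).
    """
    def __init__(self, shape, d2d_sigma=0.2, c2c_sigma=0.1):
        self.gain = 1.0 + d2d_sigma * torch.randn(shape)  # fixed per device
        self.c2c_sigma = c2c_sigma
        self.g = torch.ones(shape)                        # normalised conductance

    def weak_reset(self, pulses):
        # Progressive, noisy decrease of conductance with each applied pulse.
        step = 0.01 * self.gain * pulses
        noise = self.c2c_sigma * step * torch.randn_like(step)
        self.g = (self.g - step - noise).clamp(min=0.0)
        return self.g

syn = WeakResetSynapse((128, 128))
syn.weak_reset(torch.ones(128, 128))  # one weak-RESET pulse on every device
```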


Journal ArticleDOI
TL;DR: In this article, a model of the weak RESET process in hafnium oxide RRAM is presented and integrated into the PyTorch deep learning framework for training Binarized Neural Networks for handwritten digit recognition and object classification.
Abstract: The implementation of current deep learning training algorithms is power-hungry, owing to data transfer between memory and logic units. Oxide-based RRAMs are outstanding candidates to implement in-memory computing, which is less power-intensive. Their weak RESET regime is particularly attractive for learning, as it allows tuning the resistance of the devices with remarkable endurance. However, the resistive change behavior in this regime suffers from many fluctuations and is particularly challenging to model, especially in a way compatible with tools used for simulating deep learning. In this work, we present a model of the weak RESET process in hafnium oxide RRAM and integrate this model within the PyTorch deep learning framework. Validated on experiments on a hybrid CMOS/RRAM technology, our model reproduces both the noisy progressive behavior and the device-to-device (D2D) variability. We use this tool to train Binarized Neural Networks for the MNIST handwritten digit recognition task and the CIFAR-10 object classification task. We simulate our model with and without various aspects of device imperfections to understand their impact on the training process and identify that the D2D variability is the most detrimental aspect. The framework can be used in the same manner for other types of memories to identify the device imperfections that cause the most degradation, which can, in turn, be used to optimize the devices to reduce the impact of these imperfections.

3 citations


Proceedings ArticleDOI
09 Aug 2021
TL;DR: In this article, the authors propose a Configurable Analog auto-compensate Pop-Count (CAPC) circuit compatible with column-wise neuron mapping, which offers natural configurability through analog switch connections.
Abstract: Currently, a major trend in artificial intelligence is to implement neural networks at the edge, within circuits with limited memory capacity. To reach this goal, the in-memory or near-memory implementation of low-precision neural networks such as Binarized Neural Networks (BNNs) constitutes an appealing solution. However, the configurability of these approaches is a major challenge: in neural networks, the number of neurons per layer varies tremendously depending on the application, limiting the column-wise or row-wise mapping of neurons in memory arrays. To tackle this issue, we propose, for the first time, a Configurable Analog auto-compensate Pop-Count (CAPC) circuit compatible with column-wise neuron mapping. Our circuit offers a very natural configurability through analog switch connections. We demonstrate that our solution saves 18% of area compared to a non-configurable conventional digital solution. Moreover, through extensive Monte-Carlo simulations, we show that the overall error probability remains low, and we highlight, at the network level, the resilience of our configurable solution, with very limited accuracy degradation of 0.15% on the MNIST task and 2.84% on the CIFAR-10 task.

2 citations
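
For reference, the functional operation the CAPC circuit implements is the XNOR/pop-count/threshold step of a binarized neuron. The short NumPy sketch below shows that digital reference behaviour only; the function name and threshold handling are illustrative, and the analog charge-sharing implementation from the paper is not modelled.

```python
import numpy as np

def xnor_popcount_neuron(activations, weights, threshold):
    """Digital reference of the XNOR/pop-count step a binarized neuron performs.

    activations and weights are +/-1 vectors; the analog CAPC circuit realises
    the same accumulation with charge sharing and configurable switch
    connections, which this reference does not attempt to model.
    """
    matches = int((activations == weights).sum())  # pop-count of XNOR hits
    return 1 if matches >= threshold else -1       # sign activation

a = np.random.choice([-1, 1], size=256)
w = np.random.choice([-1, 1], size=256)
print(xnor_popcount_neuron(a, w, threshold=128))
```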


Proceedings ArticleDOI
01 Feb 2021
TL;DR: In this paper, the authors propose a Computing Row Buffer (C-RB), based on a Computing SRAM (C-SRAM) model, in place of the standard Row Buffer (RB) of the Storage Class Memory (SCM).
Abstract: Today, compute-centric von Neumann architectures face strong limitations in the data-intensive context of numerous applications, such as deep learning. One of these limitations corresponds to the well-known von Neumann bottleneck. To overcome this bottleneck, the concepts of In-Memory Computing (IMC) and Near-Memory Computing (NMC) have been proposed. IMC solutions based on volatile memories, such as SRAM and DRAM, with nearly infinite endurance, only partially solve the problem of data transfer from the Storage Class Memory (SCM). Computing in the SCM itself is extremely limited by the intrinsically poor endurance of Non-Volatile Memory (NVM) technologies. In this paper, we propose to take the best of both solutions by introducing a Computing Row Buffer (C-RB), based on a Computing SRAM (C-SRAM) model, in place of the standard Row Buffer (RB) in the SCM. The principle is to keep operations on large vectors in the C-RB of the SCM, minimizing data movement to and from the CPU and thus drastically reducing the energy consumption of the overall system. To evaluate the proposed architecture, we use an instruction-accurate platform based on the Intel Pin software, which instruments binaries at run time to obtain full memory traces of the applications running on our solution. We achieve an energy reduction of 7.9x on average (up to 45x in the best case), a speedup of 3.8x on average (up to 13x in the best case), and a reduction of write accesses to the SCM of up to 18%, compared to a 512-bit SIMD architecture.

1 citation
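
To make the data-movement argument concrete, the sketch below is a toy functional model of a row buffer that performs vector operations in place and only counts bytes that cross the bus when results are read back to the CPU. The class and method names are hypothetical; it does not model the C-SRAM circuit, timing, or the energy figures reported in the paper.

```python
import numpy as np

class ComputingRowBuffer:
    """Toy functional model of an in-row-buffer vector unit (illustrative only).

    Wide vectors stay inside the SCM row buffer and are combined there, so only
    results explicitly read back by the CPU are counted as bus traffic.
    """
    def __init__(self, row_bytes=4096):
        self.row_bytes = row_bytes
        self.rows = {}       # row address -> vector held in the row buffer
        self.bus_bytes = 0   # bytes that would have travelled to/from the CPU

    def load_row(self, addr, data):
        self.rows[addr] = np.asarray(data, dtype=np.uint8)

    def vadd(self, dst, src_a, src_b):
        # Element-wise add performed inside the row buffer: no CPU transfer counted.
        self.rows[dst] = self.rows[src_a] + self.rows[src_b]

    def read_to_cpu(self, addr):
        self.bus_bytes += self.rows[addr].nbytes
        return self.rows[addr]

crb = ComputingRowBuffer()
crb.load_row(0, np.ones(4096))
crb.load_row(1, np.ones(4096))
crb.vadd(2, 0, 1)                # stays inside the row buffer
result = crb.read_to_cpu(2)      # only the final result crosses the bus
print(crb.bus_bytes)
```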