
Showing papers by "Jean-Michel Portal published in 2021"


Journal ArticleDOI
TL;DR: Based on neural network simulations of the CIFAR-10 image recognition task, it is shown that ternary neural networks significantly outperform binary ones, which are often preferred for inference hardware.
Abstract: The design of systems implementing low-precision neural networks with emerging memories such as resistive random access memory (RRAM) is a promising route for reducing the energy consumption of artificial intelligence. To achieve maximum energy efficiency in such systems, logic and memory should be integrated as tightly as possible. In this work, we focus on the case of ternary neural networks, where synaptic weights assume ternary values. We propose a two-transistor/two-resistor memory architecture employing a precharge sense amplifier, where the weight value can be extracted in a single sense operation. Based on experimental measurements on a hybrid 130 nm CMOS/RRAM chip featuring this sense amplifier, we show that this technique is particularly appropriate at low supply voltage and that it is resilient to process, voltage, and temperature variations. We characterize the bit error rate of our scheme. Based on neural network simulations of the CIFAR-10 image recognition task, we show that ternary neural networks significantly outperform binary ones, which are often preferred for inference hardware. Finally, we show that the neural network is immune to the type of bit errors observed in our scheme, which can therefore be used without error correction.

8 citations
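
As an illustration of the weight encoding described in this abstract, here is a minimal PyTorch sketch of ternarizing weights to {-1, 0, +1}. The threshold rule and the mapping of each state onto the two RRAM devices of a 2T2R cell (noted in the comment) are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def ternarize(w, threshold=0.05):
    """Quantize weights to {-1, 0, +1} (illustrative threshold rule).

    In a 2T2R array, each ternary weight could occupy two RRAM devices:
    (+1) -> (LRS, HRS), (-1) -> (HRS, LRS), (0) -> (HRS, HRS),
    so both the sign and the zero state are recovered in one sense operation.
    """
    return torch.where(w.abs() < threshold, torch.zeros_like(w), torch.sign(w))

# Tiny usage example on a random weight tensor.
w = torch.randn(4, 4)
print(ternarize(w))
```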


Journal ArticleDOI
Abstract: The implementation of current deep learning training algorithms is power-hungry, due to data transfer between memory and logic units. Oxide-based resistive random access memories (RRAMs) are outstanding candidates to implement in-memory computing, which is less power-intensive. Their weak RESET regime is particularly attractive for learning, as it allows tuning the resistance of the devices with remarkable endurance. However, the resistive change behavior in this regime suffers from many fluctuations and is particularly challenging to model, especially in a way compatible with tools used for simulating deep learning. In this work, we present a model of the weak RESET process in hafnium oxide RRAM and integrate this model within the PyTorch deep learning framework. Validated on experiments on a hybrid CMOS/RRAM technology, our model reproduces both the noisy progressive behavior and the device-to-device (D2D) variability. We use this tool to train binarized neural networks (BNNs) for the MNIST handwritten digit recognition task and the CIFAR-10 object classification task. We simulate our model with and without various aspects of device imperfections to understand their impact on the training process and identify that the D2D variability is the most detrimental aspect. The framework can be used in the same manner for other types of memories to identify the device imperfections that cause the most degradation, which can, in turn, be used to optimize the devices to reduce the impact of these imperfections.

4 citations
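
To give a rough idea of how such a device model can be driven from Python, the sketch below is a toy weak-RESET synapse with device-to-device and cycle-to-cycle variability. It is not the authors' published PyTorch model; the class name, step size, and noise distributions are illustrative assumptions.

```python
import torch

class WeakResetSynapse:
    """Toy stand-in for a weak-RESET RRAM synapse model (illustrative only).

    Each device gets its own update gain drawn at initialisation (device-to-device
    variability), and every weak-RESET pulse adds a noisy, progressive conductance
    step (cycle-to-cycle fluctuation).
    """
    def __init__(self, shape, d2d_sigma=0.2, c2c_sigma=0.1):
        self.gain = 1.0 + d2d_sigma * torch.randn(shape)  # fixed per device
        self.c2c_sigma = c2c_sigma
        self.g = torch.ones(shape)                        # normalised conductance

    def weak_reset(self, pulses):
        # Progressive, noisy decrease of conductance with each applied pulse.
        step = 0.01 * self.gain * pulses
        noise = self.c2c_sigma * step * torch.randn_like(step)
        self.g = (self.g - step - noise).clamp(min=0.0)
        return self.g

syn = WeakResetSynapse((128, 128))
syn.weak_reset(torch.ones(128, 128))  # one weak-RESET pulse on every device
```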


Journal ArticleDOI
TL;DR: In this article, a model of the weak RESET process in hafnium oxide RRAM is presented and integrated into the PyTorch deep learning framework for training Binarized Neural Networks for handwritten digit recognition and object classification.
Abstract: The implementation of current deep learning training algorithms is power-hungry, owing to data transfer between memory and logic units. Oxide-based RRAMs are outstanding candidates to implement in-memory computing, which is less power-intensive. Their weak RESET regime is particularly attractive for learning, as it allows tuning the resistance of the devices with remarkable endurance. However, the resistive change behavior in this regime suffers from many fluctuations and is particularly challenging to model, especially in a way compatible with tools used for simulating deep learning. In this work, we present a model of the weak RESET process in hafnium oxide RRAM and integrate this model within the PyTorch deep learning framework. Validated on experiments on a hybrid CMOS/RRAM technology, our model reproduces both the noisy progressive behavior and the device-to-device (D2D) variability. We use this tool to train Binarized Neural Networks for the MNIST handwritten digit recognition task and the CIFAR-10 object classification task. We simulate our model with and without various aspects of device imperfections to understand their impact on the training process and identify that the D2D variability is the most detrimental aspect. The framework can be used in the same manner for other types of memories to identify the device imperfections that cause the most degradation, which can, in turn, be used to optimize the devices to reduce the impact of these imperfections.

3 citations


Proceedings ArticleDOI
09 Aug 2021
TL;DR: In this article, the authors propose a Configurable Analog auto-compensate Pop-Count (CAPC) circuit compatible with column-wise neuron mapping, which offers natural configurability through analog switch connections.
Abstract: Currently, a major trend in artificial intelligence is to implement neural networks at the edge, within circuits with limited memory capacity. To reach this goal, the in-memory or near-memory implementation of low-precision neural networks such as Binarized Neural Networks (BNNs) constitutes an appealing solution. However, the configurability of these approaches is a major challenge: in neural networks, the number of neurons per layer varies tremendously depending on the application, limiting the column-wise or row-wise mapping of neurons in memory arrays. To tackle this issue, we propose, for the first time, a Configurable Analog auto-compensate Pop-Count (CAPC) circuit compatible with column-wise neuron mapping. Our circuit offers a very natural configurability through analog switch connections. We demonstrate that our solution saves 18% of area compared to a non-configurable conventional digital solution. Moreover, through extensive Monte-Carlo simulations, we show that the overall error probability remains low, and we highlight, at the network level, the resilience of our configurable solution, with very limited accuracy degradation of 0.15% on the MNIST task and 2.84% on the CIFAR-10 task.

2 citations
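
For reference, the functional operation the CAPC circuit implements is the XNOR/pop-count/threshold step of a binarized neuron. The short NumPy sketch below shows that digital reference behaviour only; the function name and threshold handling are illustrative, and the analog charge-sharing implementation from the paper is not modelled.

```python
import numpy as np

def xnor_popcount_neuron(activations, weights, threshold):
    """Digital reference of the XNOR/pop-count step a binarized neuron performs.

    activations and weights are +/-1 vectors; the analog CAPC circuit realises
    the same accumulation with charge sharing and configurable switch
    connections, which this reference does not attempt to model.
    """
    matches = int((activations == weights).sum())  # pop-count of XNOR hits
    return 1 if matches >= threshold else -1       # sign activation

a = np.random.choice([-1, 1], size=256)
w = np.random.choice([-1, 1], size=256)
print(xnor_popcount_neuron(a, w, threshold=128))
```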


Proceedings ArticleDOI
01 Feb 2021
TL;DR: In this paper, the authors propose a Computing Row Buffer (C-RB), based on a Computing SRAM (C-SRAM) model, in place of the standard Row Buffer (RB) of the Storage Class Memory (SCM).
Abstract: Today, compute-centric von Neumann architectures face strong limitations in the data-intensive context of numerous applications, such as deep learning. One of these limitations corresponds to the well-known von Neumann bottleneck. To overcome this bottleneck, the concepts of In-Memory Computing (IMC) and Near-Memory Computing (NMC) have been proposed. IMC solutions based on volatile memories, such as SRAM and DRAM, with nearly infinite endurance, only partially solve the problem of data transfer from the Storage Class Memory (SCM). Computing in the SCM itself is extremely limited by the intrinsically poor endurance of Non-Volatile Memory (NVM) technologies. In this paper, we propose to take the best of both solutions by introducing a Computing Row Buffer (C-RB), based on a Computing SRAM (C-SRAM) model, in place of the standard Row Buffer (RB) in the SCM. The principle is to keep operations on large vectors in the C-RB of the SCM, minimizing data movement to and from the CPU and thus drastically reducing the energy consumption of the overall system. To evaluate the proposed architecture, we use an instruction-accurate platform based on the Intel Pin software, which instruments binaries at run time to obtain full memory traces of the applications running on our solution. We achieve an energy reduction of 7.9x on average (up to 45x in the best case), a speedup of 3.8x on average (up to 13x in the best case), and a reduction of write accesses to the SCM of up to 18%, compared to a 512-bit SIMD architecture.

1 citation
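
To make the data-movement argument concrete, the sketch below is a toy functional model of a row buffer that performs vector operations in place and only counts bytes that cross the bus when results are read back to the CPU. The class and method names are hypothetical; it does not model the C-SRAM circuit, timing, or the energy figures reported in the paper.

```python
import numpy as np

class ComputingRowBuffer:
    """Toy functional model of an in-row-buffer vector unit (illustrative only).

    Wide vectors stay inside the SCM row buffer and are combined there, so only
    results explicitly read back by the CPU are counted as bus traffic.
    """
    def __init__(self, row_bytes=4096):
        self.row_bytes = row_bytes
        self.rows = {}       # row address -> vector held in the row buffer
        self.bus_bytes = 0   # bytes that would have travelled to/from the CPU

    def load_row(self, addr, data):
        self.rows[addr] = np.asarray(data, dtype=np.uint8)

    def vadd(self, dst, src_a, src_b):
        # Element-wise add performed inside the row buffer: no CPU transfer counted.
        self.rows[dst] = self.rows[src_a] + self.rows[src_b]

    def read_to_cpu(self, addr):
        self.bus_bytes += self.rows[addr].nbytes
        return self.rows[addr]

crb = ComputingRowBuffer()
crb.load_row(0, np.ones(4096))
crb.load_row(1, np.ones(4096))
crb.vadd(2, 0, 1)                # stays inside the row buffer
result = crb.read_to_cpu(2)      # only the final result crosses the bus
print(crb.bus_bytes)
```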