Journal ArticleDOI

An Analog Neural Network Computing Engine Using CMOS-Compatible Charge-Trap-Transistor (CTT)

TL;DR: An analog neural network computing engine based on the CMOS-compatible charge-trap transistor (CTT) is proposed; applied to MNIST digit classification, it achieves performance comparable to state-of-the-art fully connected neural networks using 8-bit fixed-point resolution.
Abstract: An analog neural network computing engine based on the CMOS-compatible charge-trap transistor (CTT) is proposed in this paper. CTT devices are used as analog multipliers. Compared to digital multipliers, the CTT-based analog multiplier shows significant area and power reductions. The proposed computing engine is composed of a scalable CTT multiplier array and energy-efficient analog-digital interfaces. By implementing the sequential analog fabric, the engine's mixed-signal interfaces are simplified, and the hardware overhead remains constant regardless of the size of the array. A proof-of-concept 784-by-784 CTT computing engine is implemented in TSMC 28-nm CMOS technology and occupies 0.68 mm². The simulated performance achieves 76.8 TOPS (8-bit) at a 500-MHz clock frequency while consuming 14.8 mW. As an example, we use this computing engine to address a classic pattern recognition problem, classifying handwritten digits from the MNIST database, and obtain performance comparable to state-of-the-art fully connected neural networks using 8-bit fixed-point resolution.
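For intuition, here is a minimal NumPy sketch of the 8-bit fixed-point multiply-accumulate scheme the abstract describes, with each weight idealized as one of 256 programmable CTT levels. This is not the authors' implementation: the 784-by-784 shape comes from the paper, while the weights, input, and scaling choices are illustrative assumptions.

```python
# Minimal sketch of an 8-bit fixed-point MAC over a 784x784 weight array.
import numpy as np

def quantize_8bit(x, scale):
    """Map real values to signed 8-bit integers, an idealized stand-in
    for programming a CTT cell to one of 256 analog levels."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
W = rng.standard_normal((784, 784)) * 0.05   # trained weights (placeholder)
x = rng.random(784)                          # flattened 28x28 MNIST image

w_scale, x_scale = np.abs(W).max() / 127, np.abs(x).max() / 127
Wq = quantize_8bit(W, w_scale)
xq = quantize_8bit(x, x_scale)

# Integer MAC (what the analog array computes, one column per output),
# then rescale back to real units at the ADC/digital interface.
y = (Wq.astype(np.int32) @ xq.astype(np.int32)) * (w_scale * x_scale)
print(y.shape)   # (784,)
```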
Citations
Journal ArticleDOI
TL;DR: In this article, a multifunctional infrared image sensor based on an array of black phosphorus programmable phototransistors (bP-PPT) is presented, which can receive optical images transmitted over a broad spectral range in the infrared and perform inference computation to process and recognize the images with 92% accuracy.
Abstract: Image sensors with internal computing capability enable in-sensor computing that can significantly reduce the communication latency and power consumption for machine vision in distributed systems and robotics. Two-dimensional semiconductors have many advantages in realizing such intelligent vision sensors because of their tunable electrical and optical properties and amenability to heterogeneous integration. Here, we report a multifunctional infrared image sensor based on an array of black phosphorus programmable phototransistors (bP-PPT). By controlling the stored charges in the gate dielectric layers electrically and optically, the bP-PPT's electrical conductance and photoresponsivity can be locally or remotely programmed with 5-bit precision to implement an in-sensor convolutional neural network (CNN). The sensor array can receive optical images transmitted over a broad spectral range in the infrared and perform inference computation to process and recognize the images with 92% accuracy. The demonstrated bP image sensor array can be scaled up to build a more complex vision-sensory neural network, which will find many promising applications for distributed and remote multispectral sensing.
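As a rough illustration of the in-sensor CNN idea, the sketch below treats each pixel's programmable photoresponsivity as a convolution weight snapped to 5-bit precision (the figure reported in the abstract). The kernel, image, and quantization scheme are assumptions for illustration, not the authors' device model.

```python
import numpy as np

def quantize_levels(w, n_bits=5):
    """Snap weights to 2**n_bits uniformly spaced responsivity levels."""
    levels = 2 ** n_bits
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((w - lo) / step) * step

rng = np.random.default_rng(1)
kernel = quantize_levels(rng.standard_normal((3, 3)))  # 5-bit "responsivities"
image = rng.random((32, 32))                           # stand-in intensity map

# The sensor sums photocurrents (responsivity x intensity) over each patch,
# i.e. a plain valid-mode 2D correlation performed by the pixels themselves:
out = np.empty((30, 30))
for i in range(30):
    for j in range(30):
        out[i, j] = np.sum(kernel * image[i:i+3, j:j+3])
```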

32 citations

Posted Content
TL;DR: In this article, a multispectral infrared image sensor based on an array of black phosphorus programmable phototransistors (bP-PPT) is presented.
Abstract: Image sensors with internal computing capability enable in-sensor computing that can significantly reduce the communication latency and power consumption for machine vision in distributed systems and robotics. Two-dimensional semiconductors are uniquely advantageous in realizing such intelligent vision sensors because of their tunable electrical and optical properties and amenability to heterogeneous integration. Here, we report a multifunctional infrared image sensor based on an array of black phosphorus programmable phototransistors (bP-PPT). By controlling the stored charges in the gate dielectric layers electrically and optically, the bP-PPT's electrical conductance and photoresponsivity can be locally or remotely programmed with high precision to implement an in-sensor convolutional neural network (CNN). The sensor array can receive optical images transmitted over a broad spectral range in the infrared and perform inference computation to process and recognize the images with 92% accuracy. The demonstrated multispectral infrared imaging and in-sensor computing with the black phosphorus optoelectronic sensor array can be scaled up to build a more complex vision neural network, which will find many promising applications for distributed and remote multispectral sensing.

32 citations

Journal ArticleDOI
TL;DR: In this article, the authors present three-terminal synaptic devices based on the ferroelectric field-effect transistor (FeFET) and discuss the switching physics of the intermediate states, back-end-of-line (BEOL) integration, and 3D NAND architecture design.
Abstract: Neuro-inspired deep learning algorithms have shown a promising future for artificial intelligence (AI). Despite the remarkable progress of software-based neural networks, the traditional von Neumann hardware architecture suffers from limited energy efficiency when facing unprecedentedly large amounts of data. To meet the performance requirements of neuro-inspired computing, large-scale vector-matrix multiplication is preferably performed in situ, namely compute-in-memory (CIM). Non-volatile memory devices with different materials have been proposed for weight storage as synaptic devices. Among them, HfO2-based ferroelectric devices have attracted great attention because of their low energy consumption, good CMOS compatibility, and multi-bit-per-cell potential. In this review, recent trends and prospects of ferroelectric synaptic devices are surveyed. First, we present the three-terminal synaptic devices based on the ferroelectric field-effect transistor (FeFET) and discuss the switching physics of the intermediate states, back-end-of-line (BEOL) integration, and 3D NAND architecture design. Then, we introduce the hybrid-precision synapse concept, which leverages the volatile charge on the gate capacitor of the FeFET together with the non-volatile polarization of its gate dielectric. Lastly, we review two-terminal synaptic devices using the ferroelectric tunnel junction (FTJ) and the ferroelectric capacitor (FeCAP), and analyze the design margins of crossbar arrays built from them.
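A toy sketch of the hybrid-precision synapse concept described above: the most significant bits of a weight live in the non-volatile ferroelectric polarization, while the least significant bits live as volatile charge on the FeFET gate capacitor and absorb frequent training updates. The 4/4 bit split and the integer encoding are illustrative assumptions, not figures from the review.

```python
LSB_BITS = 4                         # assumed 4 MSB / 4 LSB split

def split_weight(w_int):
    """Decompose an 8-bit unsigned weight into NV MSBs and volatile LSBs."""
    msb = w_int >> LSB_BITS          # written rarely: polarization switching
    lsb = w_int & (2**LSB_BITS - 1)  # updated every step: capacitor charge
    return msb, lsb

def merge_weight(msb, lsb):
    """Recombine the two fields into the effective synaptic weight."""
    return (msb << LSB_BITS) | lsb

w = 0b10110110                       # example 8-bit weight (182)
msb, lsb = split_weight(w)
assert merge_weight(msb, lsb) == w
```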

18 citations

Journal ArticleDOI
TL;DR: A comprehensive investigation of the programming behavior of CTTs, including analog retention, intra- and inter-device variation, and fine-tuning, both for individual devices and for devices in an integrated array, reveals the promise of the CTT as a CMOS-only analog memory device.
Abstract: Since our demonstration of unsupervised learning using CMOS-only charge-trap transistors (CTTs) as analog synapses, there has been increasing interest in exploiting the device for various other neural network (NN) applications. However, most of these studies are limited to mere simulation due to the absence of detailed experimental device characterization. In this article, we provide a comprehensive investigation of the programming behavior of CTTs, including analog retention, intra- and inter-device variation, and fine-tuning of the device, both for individual devices and for devices in an integrated array. It is found that, after programming, the channel current gradually increases to a higher level, and the shift is larger when the device is programmed to a higher threshold voltage. With this post-programming current increase appropriately accounted for, individual devices can be programmed to an equivalent precision of five bits, and three bits can be achieved for devices in an array. Our results reveal the promising future of using the CTT as a CMOS-only analog memory device.
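The fine-tuning described in the abstract suggests a program-and-verify loop that deliberately undershoots the target so that the post-programming current increase carries the cell onto the target level. The sketch below captures that idea; the drift fraction, tolerance, and pulse interface are invented placeholders, not measured CTT behavior.

```python
def program_verify(read_current, apply_pulse, target, drift_frac=0.03,
                   tol=0.01, max_pulses=50):
    """Program toward target*(1 - drift_frac): the cell is left slightly
    low on purpose so the gradual post-programming current increase
    carries it onto the target level."""
    effective_target = target * (1.0 - drift_frac)
    for _ in range(max_pulses):
        i = read_current()                             # verify step
        if abs(i - effective_target) <= tol * target:
            return True                                # converged
        apply_pulse(positive=(i < effective_target))   # nudge up or down
    return False                                       # did not converge
```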

18 citations


Cites methods from "An Analog Neural Network Computing ..."

  • For example, in [34], an analog inference engine using CTTs as the analog synapses was proposed, and the simulations show a significant performance edge over conventional digital approaches.


Proceedings ArticleDOI
Yufei Ma, Yuan Du, Li Du, Jun Lin, Zhongfeng Wang
07 Sep 2020
TL;DR: Recent trends in IMC, from techniques (SRAM, flash, RRAM, and other types of non-volatile memory) to architectures and applications, are surveyed to serve as a guide to future advances in computing in-memory (CIM).
Abstract: To overcome the memory bottleneck of the von Neumann architecture, various memory-centric computing techniques are emerging to reduce the latency and energy consumption caused by data communication. The great success of artificial intelligence (AI) algorithms, which involve large numbers of computations and data movements, has motivated and accelerated recent research on in-memory computing (IMC) techniques that significantly reduce or even eliminate off-chip data accesses; the memory then not only stores data but can also directly output computation results. For example, the multiply-and-accumulate (MAC) operations in deep learning algorithms can be realized by accessing the memory using the input activations. This paper investigates the recent trends of IMC, from techniques (SRAM, flash, RRAM, and other types of non-volatile memory) to architecture and applications, serving as a guide to future advances in computing in-memory (CIM).
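The MAC-by-memory-access idea reduces to Ohm's and Kirchhoff's laws on a crossbar: stored conductances multiply applied wordline voltages, and each bitline sums the resulting currents into one dot product. A minimal NumPy sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.uniform(1e-6, 1e-4, size=(128, 64))  # cell conductances (siemens)
V = rng.uniform(0.0, 0.2, size=128)          # input activations as voltages

# Each cell contributes I = G*V (Ohm's law) and each bitline sums its
# column's currents (Kirchhoff's current law): one analog MAC per column.
I_bitline = V @ G                            # shape (64,): dot products
```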

11 citations


Cites background or methods from "An Analog Neural Network Computing ..."

  • We then survey recent computing-in-memory research based on different types of NVM devices, including Flash memory, ReRAM, STT-MRAM, PCM, and emerging non-volatile devices [38].


  • KEYWORDS: In-memory computing (IMC), SRAM, NVM, deep learning, convolutional neural networks (CNNs).


  • [Row from the citing paper's comparison table, listing this work (Ref. [34]) alongside Refs. [22], [23], [38], [25], [26], and [30].]


References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks substantially deeper than those used previously; an ensemble of their residual nets won 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
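The residual reformulation is simple to state in code: the block learns F(x) and outputs x + F(x), making the identity mapping the easy default. Below is a minimal dense (non-convolutional) sketch with illustrative shapes; the paper's actual blocks are convolutional with batch normalization.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """Output x + F(x): the layers learn the residual F, and the
    identity shortcut makes 'do nothing' the easy-to-learn default."""
    return relu(x + W2 @ relu(W1 @ x))

rng = np.random.default_rng(3)
x = rng.standard_normal(64)
W1 = rng.standard_normal((64, 64)) * 0.1
W2 = rng.standard_normal((64, 64)) * 0.1
y = residual_block(x, W1, W2)    # same shape as x, so blocks can stack
```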

123,388 citations

Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network achieved state-of-the-art ImageNet classification performance, as discussed by the authors; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
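The dropout regularizer named here is easy to sketch. The version below is the now-common "inverted" variant, which rescales at training time rather than at test time as the original paper did; the rate p=0.5 matches the value used in the paper's fully connected layers.

```python
import numpy as np

def dropout(activations, p=0.5, rng=None):
    """Inverted dropout: zero each unit with probability p at train time
    and rescale the survivors, leaving the test-time pass unchanged."""
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones(8)
print(dropout(h))   # about half the units zeroed, the rest scaled to 2.0
```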

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
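The parameter savings behind stacking small filters are a one-line calculation: for C input and output channels, two 3x3 layers cover a 5x5 receptive field with 18C² weights versus 25C², and three 3x3 layers cover 7x7 with 27C² versus 49C². A quick check with an illustrative channel count:

```python
# Parameter counts for stacked small filters vs. one large filter,
# for C input channels and C output channels (biases ignored).
C = 256                            # illustrative channel count
two_3x3   = 2 * (3 * 3 * C * C)    # 5x5 receptive field: 1,179,648
one_5x5   = 5 * 5 * C * C          # same receptive field: 1,638,400
three_3x3 = 3 * (3 * 3 * C * C)    # 7x7 receptive field: 1,769,472
one_7x7   = 7 * 7 * C * C          # same receptive field: 3,211,264
print(two_3x3 < one_5x5, three_3x3 < one_7x7)   # True True
```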

55,235 citations

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, gradient-based learning is shown to synthesize complex decision surfaces that classify high-dimensional patterns such as handwritten characters, and a graph transformer network (GTN) paradigm is proposed for globally training multi-module recognition systems.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
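The convolution-plus-subsampling pattern credited here for handling 2D shape variability is compact to sketch: a shared kernel slides over the image (translation-equivariant feature detection), then pooling discards small spatial shifts. The kernel and image below are illustrative placeholders, with average pooling standing in for LeNet-style subsampling.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide a shared kernel over the image (valid mode, stride 1)."""
    kh, kw = kernel.shape
    H, W = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def avg_pool2x2(fm):
    """2x2 average pooling (assumes even height and width)."""
    return (fm[0::2, 0::2] + fm[0::2, 1::2]
            + fm[1::2, 0::2] + fm[1::2, 1::2]) / 4.0

img = np.random.default_rng(5).random((28, 28))   # MNIST-sized input
fmap = avg_pool2x2(conv2d_valid(img, np.ones((3, 3)) / 9.0))
print(fmap.shape)                                  # (13, 13)
```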

42,067 citations