Proceedings ArticleDOI
A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors
Wei-Hao Chen, K. C. Li, Wei-Yu Lin, K. C. Hsu, Pin-Yi Li, Cheng-Han Yang, Cheng-Xin Xue, En-Yu Yang, Yen-Kai Chen, Yun-Sheng Chang, Tzu-Hsiang Hsu, Ya-Chin King, Chorng-Jung Lin, Ren-Shuo Liu, Chih-Cheng Hsieh, Kea-Tiong Tang, Meng-Fan Chang
- pp 494-496
TLDR
Many artificial intelligence (AI) edge devices use nonvolatile memory (NVM) to store the weights for the neural network (trained off-line on an AI server), and require low-energy and fast I/O accesses.
Abstract:
Many artificial intelligence (AI) edge devices use nonvolatile memory (NVM) to store the weights for the neural network (trained off-line on an AI server), and require low-energy and fast I/O accesses. The deep neural networks (DNNs) used by AI processors [1,2] commonly require p-layers of a convolutional neural network (CNN) and q-layers of a fully-connected network (FCN). Current DNN processors that use a conventional (von-Neumann) memory structure are limited by high access latencies, I/O energy consumption, and hardware costs. Large working data sets result in heavy accesses across the memory hierarchy; moreover, large amounts of intermediate data are generated due to the large number of multiply-and-accumulate (MAC) operations for both CNN and FCN. Even when binary-based DNNs [3] are used, the required CNN and FCN operations result in a major memory I/O bottleneck for AI edge devices.
Citations
Journal ArticleDOI
SLIM: Simultaneous Logic-in-Memory Computing Exploiting Bilayer Analog OxRAM Devices.
TL;DR: This paper proposes a novel ‘Simultaneous Logic-in-Memory’ (SLIM) methodology that is complementary to existing LIM approaches in the literature, and demonstrates novel SLIM bitcells comprising non-filamentary bilayer analog OxRAM devices with NMOS transistors.
Journal ArticleDOI
Neuro-inspired computing chips
Wenqiang Zhang, Bin Gao, Jianshi Tang, Peng Yao, Shimeng Yu, Meng-Fan Chang, Hoi-Jun Yoo, He Qian, Huaqiang Wu
TL;DR: The development of neuro-inspired computing chips and their key benchmarking metrics are reviewed, a co-design tool chain is provided, and a roadmap for future large-scale chips and a future electronic design automation tool chain are proposed.
Journal ArticleDOI
Reinforcement learning with analogue memristor arrays
Zhongrui Wang, Can Li, Wenhao Song, Mingyi Rao, Daniel Belkin, Yunning Li, Peng Yan, Hao Jiang, Peng Lin, Miao Hu, John Paul Strachan, Ning Ge, Mark Barnell, Qing Wu, Andrew G. Barto, Qinru Qiu, R. Stanley Williams, Qiangfei Xia, Jianhua Yang
TL;DR: An experimental demonstration of reinforcement learning on a three-layer 1-transistor 1-memristor (1T1R) network using a modified learning algorithm tailored for the authors' hybrid analogue–digital platform, which has the potential to achieve a significant boost in speed and energy efficiency.
Proceedings ArticleDOI
24.1 A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN Based AI Edge Processors
Cheng-Xin Xue, Wei-Hao Chen, Je-Syu Liu, Jiafang Li, Wei-Yu Lin, Wei-En Lin, Jing-Hong Wang, Wei-Chen Wei, Ting-Wei Chang, Tung-Cheng Chang, Tsung-Yuan Huang, Hui-Yao Kao, Shih-Ying Wei, Yen-Cheng Chiu, Chun-Ying Lee, Chung-Chuan Lo, Ya-Chin King, Chorng-Jung Lin, Ren-Shuo Liu, Chih-Cheng Hsieh, Kea-Tiong Tang, Meng-Fan Chang
TL;DR: This work proposes a serial-input non-weighted product (SINWP) structure to optimize the tradeoff between area, tMAC and EMAC, together with a down-scaling weighted current translator and a positive-negative current-subtractor (PN-ISUB) for short delay, a small offset and a compact read-path area.
Journal ArticleDOI
Three-dimensional memristor circuits as complex neural networks
Peng Lin, Can Li, Zhongrui Wang, Yunning Li, Hao Jiang, Wenhao Song, Mingyi Rao, Ye Zhuo, Navnidhi K. Upadhyay, Mark Barnell, Qing Wu, Jianhua Yang, Qiangfei Xia
TL;DR: A three-dimensional circuit composed of eight layers of monolithically integrated memristive devices is built and used to implement complex neural networks, demonstrating accurate MNIST classification and effective edge detection in videos.
References
Journal ArticleDOI
PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory
TL;DR: This work proposes a novel PIM architecture, called PRIME, to accelerate NN applications in ReRAM based main memory, and distinguishes itself from prior work on NN acceleration, with significant performance improvement and energy saving.
Proceedings ArticleDOI
14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks
TL;DR: A highly reconfigurable CNN-RNN processor with high energy-efficiency is desirable to support general-purpose deep neural networks (DNNs).
Proceedings ArticleDOI
14.4 A scalable speech recognizer with deep-neural-network acoustic models and voice-activated power gating
TL;DR: IC designs for ASR and VAD are described that improve on the accuracy, programmability, and scalability of previous work.
Proceedings ArticleDOI
A 462GOPs/J RRAM-based nonvolatile intelligent processor for energy harvesting IoE system featuring nonvolatile logics and processing-in-memory
Fang Su, Wei-Hao Chen, Lixue Xia, Chieh-Pu Lo, Tianqi Tang, Zhibo Wang, K. C. Hsu, Ming Cheng, Jun-Yi Li, Yuan Xie, Yu Wang, Meng-Fan Chang, Huazhong Yang, Yongpan Liu
TL;DR: This work presents the first nonvolatile processor capable of general as well as neural network computing in addition to the first integrated chip using RRAM-based PIM.
Proceedings ArticleDOI
An offset-tolerant current-sampling-based sense amplifier for Sub-100nA-cell-current nonvolatile memory
Meng-Fan Chang, Shin-Jang Shen, Chia-Chi Liu, Che-Wei Wu, Yu-Fan Lin, Shang-Chi Wu, Chia-En Huang, Han-Chao Lai, Ya-Chin King, Chorng-Jung Lin, Hung-jen Liao, Yu-Der Chih, Hiroyuki Yamauchi
TL;DR: This study proposes a new offset-tolerant current-sampling-based SA (CSB-SA) that achieves 7× faster read speed than previous SAs when sensing small ICELL, and a 26ns macro random access time for reading sub-200nA ICELL.