Posted Content (Open Access)
EDEN: Enabling Energy-Efficient, High-Performance Deep Neural Network Inference Using Approximate DRAM.
Skanda Koppula, Lois Orosa, Abdullah Giray Yağlıkçı, Roknoddin Azizi, Taha Shahroodi, Konstantinos Kanellopoulos, Onur Mutlu
TL;DR
This paper proposes EDEN, a general framework that reduces DNN energy consumption and DNN evaluation latency by using approximate DRAM devices while strictly meeting a user-specified target DNN accuracy.
Abstract
The effectiveness of deep neural networks (DNN) in vision, speech, and language processing has prompted a tremendous demand for energy-efficient high-performance DNN inference systems. Due to the increasing memory intensity of most DNN workloads, main memory can dominate the system's energy consumption and stall time. One effective way to reduce the energy consumption and increase the performance of DNN inference systems is by using approximate memory, which operates with reduced supply voltage and reduced access latency parameters that violate standard specifications. Using approximate memory reduces reliability, leading to higher bit error rates. Fortunately, neural networks have an intrinsic capacity to tolerate increased bit errors. This can enable energy-efficient and high-performance neural network inference using approximate DRAM devices.
Based on this observation, we propose EDEN, a general framework that reduces DNN energy consumption and DNN evaluation latency by using approximate DRAM devices, while strictly meeting a user-specified target DNN accuracy. EDEN relies on two key ideas: 1) retraining the DNN for a target approximate DRAM device to increase the DNN's error tolerance, and 2) efficient mapping of the error tolerance of each individual DNN data type to a corresponding approximate DRAM partition in a way that meets the user-specified DNN accuracy requirements.
We evaluate EDEN on multi-core CPUs, GPUs, and DNN accelerators with error models obtained from real approximate DRAM devices. For a target accuracy within 1% of the original DNN, our results show that EDEN enables 1) an average DRAM energy reduction of 21%, 37%, 31%, and 32% in CPU, GPU, and two DNN accelerator architectures, respectively, across a variety of DNNs, and 2) an average (maximum) speedup of 8% (17%) and 2.7% (5.5%) in CPU and GPU architectures, respectively, when evaluating latency-bound DNNs.
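To make the paper's first key idea concrete, the sketch below shows one plausible way to emulate error-injection retraining in PyTorch. It is an illustration only, not the authors' implementation: the uniform random bit-flip model, the straight-through-style weight update, the value clamping, the function names, and the toy model and data are all assumptions standing in for the paper's device-derived error models and full training setup.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


def inject_bit_errors(t: torch.Tensor, ber: float) -> torch.Tensor:
    """Return a copy of `t` whose float32 bit pattern has each bit flipped
    independently with probability `ber` (a crude stand-in for the error
    profile of a characterized approximate DRAM device)."""
    bits = t.detach().cpu().numpy().astype(np.float32).view(np.uint32)
    flips = np.random.rand(bits.size, 32) < ber
    mask = (flips.astype(np.uint64) << np.arange(32, dtype=np.uint64)).sum(axis=1)
    mask = mask.astype(np.uint32).reshape(bits.shape)
    return torch.from_numpy((bits ^ mask).view(np.float32).copy()).to(t.device)


def retrain_for_approximate_dram(model: nn.Module, loader, ber: float,
                                 epochs: int = 1, lr: float = 1e-3) -> None:
    """Error-injection retraining: each step runs the forward/backward pass on
    bit-error-injected copies of the weights, then applies the gradient update
    to the clean weights (a straight-through-style approximation)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            clean = [p.detach().clone() for p in model.parameters()]
            with torch.no_grad():
                for p in model.parameters():
                    noisy = inject_bit_errors(p, ber)
                    # Clamp exploded values (e.g., from exponent-bit flips) so a
                    # single flip cannot derail the step; the range is arbitrary.
                    p.copy_(torch.nan_to_num(noisy).clamp_(-10.0, 10.0))
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            with torch.no_grad():
                for p, c in zip(model.parameters(), clean):
                    p.copy_(c)  # restore clean weights before applying the step
            opt.step()


# Hypothetical toy usage; the paper evaluates real DNNs with error models
# measured on real approximate DRAM modules.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
data = [(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))) for _ in range(4)]
retrain_for_approximate_dram(model, data, ber=1e-5)
```

A full EDEN-style flow would additionally characterize the error tolerance of each DNN data type and map it to a correspondingly aggressive (lower-voltage or lower-latency) approximate DRAM partition, as described above.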
Citations
Journal Article
Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead
Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique
TL;DR: This work summarizes and compares DNN acceleration efforts across four leading execution platforms (CPU, GPU, FPGA, and ASIC), describing the main state-of-the-art solutions and giving particular prominence to the last two, since they offer greater design flexibility and the potential for high energy efficiency, especially for inference.
Journal Article
Robust Machine Learning Systems: Challenges, Current Trends, Perspectives, and the Road Ahead
Muhammad Shafique, Mahum Naseer, Theocharis Theocharides, Christos Kyrkou, Onur Mutlu, Lois Orosa, Jungwook Choi
TL;DR: This work discusses various challenges and probable solutions for security attacks on ML-inspired hardware and software techniques in smart cyber-physical systems (CPS) and the Internet of Things (IoT).
Proceedings Article
DSAGEN: synthesizing programmable spatial accelerators
TL;DR: The key insight is that many prior accelerator architectures can be approximated by composing a small number of hardware primitives, specifically those from spatial architectures; this insight underpins the DSAGEN framework, which automates the hardware/software co-design process for reconfigurable accelerators.
Journal Article
FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural Networks
Kai Zhao, Sheng Di, Sihuan Li, Xin Liang, Yujia Zhai, Jieyang Chen, Kaiming Ouyang, Franck Cappello, Zizhong Chen
TL;DR: This article proposes several systematic algorithm-based fault tolerance (ABFT) schemes based on checksum techniques, thoroughly analyzes their fault protection ability and runtime overhead, and designs a novel workflow that integrates all the proposed schemes to achieve high detection/correction ability with limited total runtime overhead.
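For intuition, the snippet below sketches the checksum idea behind ABFT on a plain matrix multiplication (a generic Huang-Abraham-style example in NumPy, not FT-CNN's actual convolution kernels; the function name and tolerance are illustrative assumptions). FT-CNN builds this style of protection into convolution and combines several such schemes to bound the total overhead.

```python
import numpy as np


def abft_check(C_aug: np.ndarray, atol: float = 1e-6):
    """Compare the computed product against its checksum row/column and return
    candidate (row, col) positions of a single corrupted element."""
    C = C_aug[:-1, :-1]
    bad_rows = np.where(~np.isclose(C.sum(axis=1), C_aug[:-1, -1], atol=atol))[0]
    bad_cols = np.where(~np.isclose(C.sum(axis=0), C_aug[-1, :-1], atol=atol))[0]
    return [(int(r), int(c)) for r in bad_rows for c in bad_cols]


# Checksummed multiply: A gains a row of column sums, B a column of row sums,
# so the last row/column of C_aug hold checksums of the true product C.
rng = np.random.default_rng(0)
A, B = rng.random((4, 5)), rng.random((5, 3))
A_aug = np.vstack([A, A.sum(axis=0, keepdims=True)])
B_aug = np.hstack([B, B.sum(axis=1, keepdims=True)])
C_aug = A_aug @ B_aug

C_aug[1, 2] += 0.5           # inject a single fault into the result
print(abft_check(C_aug))     # [(1, 2)] -- the corrupted element is localized
```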
Posted Content
FIGARO: Improving System Performance via Fine-Grained In-DRAM Data Relocation and Caching
Yaohua Wang, Lois Orosa, Xiangjun Peng, Yang Guo, Saugata Ghose, Minesh Patel, Jeremie S. Kim, Juan Gómez Luna, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Onur Mutlu
TL;DR: A new substrate, FIGARO, is proposed that uses existing shared global buffers among subarrays within a DRAM bank to provide support for in-DRAM data relocation across subarrays at the granularity of a single cache block, and it is shown that FIGCache outperforms state-of-the-art in-DRAM caching techniques, and that its performance gains are robust across many system and mechanism parameters.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting networks won 1st place in the ILSVRC 2015 classification task.
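As a quick illustration of the residual idea, a simplified block might look like the PyTorch sketch below (fixed channel count, no downsampling; a distilled example rather than the paper's exact architecture).

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """y = relu(F(x) + x): the convolutions learn a residual F(x) on top of an
    identity shortcut, which eases optimization of very deep networks."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # identity shortcut


block = BasicResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```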
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The authors achieve state-of-the-art ImageNet classification performance with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal Article
Deep learning
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.