Open Access · Posted Content
To prune, or not to prune: exploring the efficacy of pruning for model compression
Michael H. Zhu, Suyog Gupta
TLDR
In this article, the authors investigate two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning.

Abstract:
Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model. Recent reports (Han et al., 2015; Narang et al., 2017) prune deep networks at the cost of only a marginal loss in accuracy and achieve a sizable reduction in model size. This hints at the possibility that the baseline models in these experiments are perhaps severely over-parameterized at the outset, and that a viable alternative for model compression might be to simply reduce the number of hidden units while maintaining the model's dense connection structure, exposing a similar trade-off between model size and accuracy. We investigate these two distinct paths for model compression within the context of energy-efficient inference in resource-constrained environments and propose a new gradual pruning technique that is simple and straightforward to apply across a variety of models/datasets with minimal tuning and can be seamlessly incorporated within the training process. We compare the accuracy of large, but pruned models (large-sparse) and their smaller, but dense (small-dense) counterparts with identical memory footprint. Across a broad range of neural network architectures (deep CNNs, stacked LSTM, and seq2seq LSTM models), we find large-sparse models to consistently outperform small-dense models and achieve up to a 10x reduction in the number of nonzero parameters with minimal loss in accuracy.
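The gradual pruning technique described in the abstract ramps sparsity from an initial to a final target over the course of training, masking out the smallest-magnitude weights at each pruning step. A minimal sketch of this idea follows, assuming the commonly cited cubic schedule s_t = s_f + (s_i − s_f)·(1 − (t − t0)/(n·Δt))³; the function and parameter names here are illustrative, not the authors' implementation:

```python
import numpy as np

def sparsity_at_step(t, s_i=0.0, s_f=0.9, t0=0, n=100, delta_t=1):
    """Cubic sparsity schedule: ramp from s_i to s_f over n pruning
    steps of spacing delta_t, starting at training step t0."""
    if t < t0:
        return s_i
    if t >= t0 + n * delta_t:
        return s_f
    frac = 1.0 - (t - t0) / (n * delta_t)
    return s_f + (s_i - s_f) * frac ** 3

def magnitude_mask(weights, sparsity):
    """Boolean mask that zeroes the smallest-magnitude fraction
    `sparsity` of entries in `weights` (magnitude-based pruning)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return np.ones(weights.shape, dtype=bool)
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > threshold

# During training, one would periodically apply:
#   w *= magnitude_mask(w, sparsity_at_step(t))
```

The cubic shape prunes aggressively early, when the network has redundant capacity to recover, and slows down as sparsity approaches the final target.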
Citations
Journal Article
Deep Learning Models Compression for Agricultural Plants
TL;DR: This work tackles resource limitations by compressing state-of-the-art image-classification models: it applies pruning and quantization to LeNet5, VGG16, and AlexNet, and shows that these models can be compressed by a factor of 38 and the FLOPs of VGG16 reduced without considerable loss of accuracy.
Proceedings Article
Pruning Depthwise Separable Convolutions for MobileNet Compression
TL;DR: A technique to gradually prune depthwise separable convolution networks, such as MobileNet, for improving the speed of this kind of "dense" network; it achieves satisfactory speedup with little accuracy drop for MobileNets.
Journal Article
Eight pruning deep learning models for low storage and high-speed COVID-19 computed tomography lung segmentation and heatmap-based lesion localization: A multicenter study using COVLIAS 2.0
Mohit Agarwal, Sushant Agarwal, Luca Saba, Gian Luca Chabert, Suneet K. Gupta, Alessandro Carriero, Alessio Paschè, Pietro Danna, Armin Mehmedović, Gavino Faa, Saurabh Shrivastava, Kanishka D. Jain, Harsh Jain, Tanay Jujaray, Inder M. Singh, Monika Turk, Paramjit S. Chadha, Amer M. Johri, Narendra N. Khanna, Sophie Mavrogeni, John R. Laird, David W. Sobel, Martin Miner, Antonella Balestrieri, Petros P. Sfikakis, George Tsoulfas, Durga Prasanna Misra, Vikas Agarwal, George D. Kitas, Jagjit S. Teji, Mustafa Al-Maini, Surinder Dhanjil, Andrew Nicolaides, Aditya Sharma, Vijay Rathore, Mostafa Fatemi, Azra Alizad, P. R. Krishnan, Rajanikant R. Yadav, F. Nagy, Zsigmond Tamás Kincses, Zoltán Ruzsa, Subbaram Naidu, Klaudija Višković, Manudeep Kalra, Jasjit S. Suri
TL;DR: In this article, the authors proposed COVLIAS 2.0 using pruned AI (PAI) networks to improve both storage and speed while maintaining high performance on lung segmentation and lesion localization.
Proceedings Article
Rare Gems: Finding Lottery Tickets at Initialization
Kartik K. Sreenivasan, Jy-yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Kangwook Lee, Dimitris S. Papailiopoulos
TL;DR: Gem-Miner is proposed, which finds lottery tickets at initialization that train to better accuracy than current baselines, and does so up to 19x faster.
Proceedings Article
Directional Pruning of Deep Neural Networks
TL;DR: In this article, the authors proposed a directional pruning method which searches for a sparse minimizer in or close to the flat region of the training loss, which does not require retraining or expert knowledge on the sparsity level.
References
More filters
Posted Content
Rethinking the Inception Architecture for Computer Vision
TL;DR: This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, via suitably factorized convolutions and aggressive regularization.
Posted Content
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam
TL;DR: This work introduces two simple global hyper-parameters that efficiently trade off between latency and accuracy, and demonstrates the effectiveness of MobileNets across a wide range of applications and use cases including object detection, fine-grained classification, face attributes, and large-scale geo-localization.
Proceedings Article
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
TL;DR: Deep Compression proposes a three-stage pipeline of pruning, trained quantization, and Huffman coding to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Posted Content
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason A. Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg S. Corrado, Macduff Hughes, Jeffrey Dean
TL;DR: GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Proceedings Article
Learning both weights and connections for efficient neural networks
TL;DR: In this paper, the authors proposed a three-step method that learns only the important connections, reducing the storage and computation required by neural networks by an order of magnitude without affecting their accuracy.