Author

Sajid Anwar

Bio: Sajid Anwar is an academic researcher from the Information Technology Institute. The author has contributed to research on the topics of software systems and deep learning, has an h-index of 16, and has co-authored 67 publications receiving 1,862 citations. Previous affiliations of Sajid Anwar include the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology and Seoul National University.


Papers
Journal ArticleDOI
TL;DR: The proposed work shows that when pruning granularities are applied in combination, the CIFAR-10 network can be pruned by more than 70% with less than a 1% loss in accuracy.
Abstract: Real-time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular network connections that not only demand extra representation efforts but also do not fit well on parallel computation. We introduce structured sparsity at various scales for convolutional neural networks: feature map-wise, kernel-wise, and intra-kernel strided sparsity. This structured sparsity is very advantageous for direct computational resource savings on embedded computers, in parallel computing environments, and in hardware-based systems. To decide the importance of network connections and paths, the proposed method uses a particle filtering approach. The importance weight of each particle is assigned by assessing the misclassification rate with a corresponding connectivity pattern. The pruned network is retrained to compensate for the losses due to pruning. While implementing convolutions as matrix products, we particularly show that intra-kernel strided sparsity with a simple constraint can significantly reduce the size of the kernel and feature map tensors. The proposed work shows that when pruning granularities are applied in combination, we can prune the CIFAR-10 network by more than 70% with less than a 1% loss in accuracy.
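As a concrete illustration, the kernel-wise granularity amounts to zeroing whole 2-D kernels of a convolutional weight tensor. Below is a minimal numpy sketch that ranks kernels with a simple L1-magnitude criterion; the paper itself selects connectivity patterns with a particle filter, and the layer shape and keep ratio here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conv layer: 8 output feature maps, 4 input channels, 3x3 kernels.
W = rng.normal(size=(8, 4, 3, 3))

def prune_kernel_wise(W, keep_ratio=0.3):
    """Zero out entire 2-D kernels, keeping those with the largest L1 norms.

    One of the paper's granularities, but with a plain magnitude heuristic
    standing in for its particle-filter importance estimation.
    """
    norms = np.abs(W).sum(axis=(2, 3))           # (out, in) L1 norm per kernel
    k = max(1, int(norms.size * keep_ratio))     # number of kernels to keep
    thresh = np.sort(norms, axis=None)[-k]       # k-th largest norm
    mask = (norms >= thresh)[:, :, None, None]   # broadcast over kernel dims
    return W * mask, mask

W_pruned, mask = prune_kernel_wise(W)
print("kernels kept:", int(mask.any(axis=(2, 3)).sum()), "of", W.shape[0] * W.shape[1])
```

Because entire kernels (or feature maps, or regular intra-kernel strides) are removed, the surviving computation stays dense and regular, which is what makes the sparsity usable on parallel and embedded hardware.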

476 citations

Posted Content
TL;DR: In this article, the importance weight of each particle is assigned by computing the misclassification rate with the corresponding connectivity pattern, and the pruned network is retrained to compensate for the losses due to pruning.
Abstract: Real-time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular network connections that not only demand extra representation efforts but also do not fit well on parallel computation. We introduce structured sparsity at various scales for convolutional neural networks: channel-wise, kernel-wise, and intra-kernel strided sparsity. This structured sparsity is very advantageous for direct computational resource savings on embedded computers, in parallel computing environments, and in hardware-based systems. To decide the importance of network connections and paths, the proposed method uses a particle filtering approach. The importance weight of each particle is assigned by computing the misclassification rate with the corresponding connectivity pattern. The pruned network is retrained to compensate for the losses due to pruning. While implementing convolutions as matrix products, we particularly show that intra-kernel strided sparsity with a simple constraint can significantly reduce the size of the kernel and feature map matrices. The pruned network is finally fixed-point optimized with reduced word-length precision. This results in a significant reduction in total storage size, providing advantages for on-chip-memory-based implementations of deep neural networks.
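The closing fixed-point step is essentially uniform rounding at a reduced word length. A minimal sketch follows, assuming an 8-bit word with 6 fractional bits; the paper tunes such lengths per layer rather than fixing them globally.

```python
import numpy as np

def to_fixed_point(w, word_length=8, frac_bits=6):
    """Round weights to signed fixed-point values.

    word_length total bits, frac_bits of them fractional; both are
    illustrative defaults, not the paper's per-layer settings.
    """
    step = 2.0 ** -frac_bits
    lo = -(2 ** (word_length - 1))        # most negative integer code
    hi = 2 ** (word_length - 1) - 1       # most positive integer code
    return np.clip(np.round(w / step), lo, hi) * step

w = np.random.default_rng(1).normal(scale=0.1, size=1000)
print("max abs rounding error:", np.abs(w - to_fixed_point(w)).max())
```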

454 citations

Proceedings ArticleDOI
19 Apr 2015
TL;DR: The results indicate that quantization induces sparsity in the network, which reduces the effective number of network parameters and improves generalization; the method reduces the required memory storage to about one tenth and achieves better classification results than the high-precision networks.
Abstract: Deep convolutional neural networks have shown promising results in image and speech recognition applications. The learning capability of the network improves with increasing depth and size of each layer. However, this capability comes at the cost of increased computational complexity. Thus, reduced hardware complexity and faster classification are highly desired. This work proposes an optimization method for fixed-point deep convolutional neural networks. The parameters of a pre-trained high-precision network are first directly quantized using L2 error minimization. We quantize each layer one by one, while the other layers keep computing at high precision, to determine each layer's sensitivity to word-length reduction. Then the network is retrained with quantized weights. Two examples on object recognition, MNIST and CIFAR-10, are presented. Our results indicate that quantization induces sparsity in the network, which reduces the effective number of network parameters and improves generalization. This work reduces the required memory storage to about one tenth and achieves better classification results than the high-precision networks.
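The direct quantization step can be pictured as choosing the uniform step size that minimizes the L2 error between the original and quantized weights. The grid-search sketch below is an illustration under assumed settings (4-bit symmetric quantizer, linear candidate grid), not the paper's exact procedure.

```python
import numpy as np

def quantize(w, step, bits):
    """Symmetric uniform quantizer with the given step size."""
    half = 2 ** (bits - 1) - 1                   # e.g. 7 levels each side for 4 bits
    return np.clip(np.round(w / step), -half, half) * step

def best_step(w, bits=4, n_candidates=200):
    """Grid-search the step size minimizing the L2 quantization error."""
    cands = np.linspace(1e-4, np.abs(w).max(), n_candidates)
    errs = [np.sum((w - quantize(w, s, bits)) ** 2) for s in cands]
    return cands[int(np.argmin(errs))]

w = np.random.default_rng(2).normal(scale=0.05, size=4096)
print("chosen step size:", best_step(w))
```

Repeating this layer by layer, while the other layers stay at high precision, exposes how sensitive each layer is to word-length reduction.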

255 citations

Journal ArticleDOI
TL;DR: The empirical results demonstrate that MTDF and genetic-algorithm-based rule generation delivered the best overall predictive performance among the evaluated oversampling methods and rule-generation algorithms.
Abstract: Customer retention is a major issue for various service-based organizations, particularly the telecom industry, where predictive models for observing customer behavior are one of the key instruments in the customer retention process and in inferring customers' future behavior. However, the performance of predictive models is greatly affected when the real-world data set is highly imbalanced. A data set is called imbalanced if the number of samples in one class is much smaller or larger than in the other classes. Over/under-sampling is the most commonly used technique for handling the class-imbalance problem (CIP) in various domains. In this paper, we survey six well-known sampling techniques and compare their performance: the mega-trend diffusion function (MTDF), the synthetic minority oversampling technique, the adaptive synthetic sampling approach, couples top-N reverse k-nearest neighbor, the majority weighted minority oversampling technique, and the immune centroids oversampling technique. Moreover, this paper also presents an evaluation of four rule-generation algorithms (the learning from example module, version 2 (LEM2), covering, exhaustive, and genetic algorithms) using publicly available data sets. The empirical results demonstrate that MTDF and genetic-algorithm-based rule generation delivered the best overall predictive performance among the evaluated oversampling methods and rule-generation algorithms.
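Among the surveyed methods, the synthetic minority oversampling technique is the simplest to sketch: each synthetic sample is an interpolation between a minority sample and one of its k nearest minority-class neighbors. The numpy sketch below uses invented data and is not any of the benchmarked implementations.

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples, SMOTE-style.

    A bare-bones illustration; real implementations handle edge
    cases (tiny classes, categorical features) that this omits.
    """
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbors per sample
    base = rng.integers(0, len(X_min), n_new)    # seed sample for each new point
    nbr = nn[base, rng.integers(0, k, n_new)]    # one random neighbor each
    lam = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[nbr] - X_min[base])

X_min = np.random.default_rng(3).normal(size=(20, 2))
print(smote_like(X_min, n_new=40).shape)         # (40, 2)
```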

198 citations

Journal ArticleDOI
TL;DR: This study proposes an intelligent rule-based decision-making technique based on rough set theory (RST) to extract important decision rules related to customer churn and non-churn, and shows that RST with genetic algorithms (GA) is the most efficient technique for extracting implicit knowledge, in the form of decision rules, from a publicly available benchmark telecom dataset.
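At a high level, such extracted rules are if-then conditions over customer attributes, judged by how many records they cover (support) and how often the covered records actually churn (confidence). The toy example below uses invented attributes, thresholds, and records purely for illustration; none of it comes from the study.

```python
# Hypothetical churn records; fields and values are invented.
records = [
    {"day_minutes": 310, "service_calls": 5, "churn": True},
    {"day_minutes": 120, "service_calls": 1, "churn": False},
    {"day_minutes": 290, "service_calls": 4, "churn": True},
    {"day_minutes": 150, "service_calls": 0, "churn": False},
]

def rule(r):
    # IF heavy daytime use AND many service calls THEN churn
    return r["day_minutes"] > 250 and r["service_calls"] >= 4

covered = [r for r in records if rule(r)]
support = len(covered) / len(records)
confidence = sum(r["churn"] for r in covered) / max(len(covered), 1)
print(f"support={support:.2f} confidence={confidence:.2f}")
```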

155 citations


Cited by

Proceedings Article
15 Feb 2016
TL;DR: This paper proposes "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that reduces the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Abstract: Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline: pruning, trained quantization, and Huffman coding, which work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing; finally, we apply Huffman coding. After the first two steps we retrain the network to fine-tune the remaining connections and the quantized centroids. Pruning reduces the number of connections by 9x to 13x; quantization then reduces the number of bits that represent each connection from 32 to 5. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35x, from 240MB to 6.9MB, without loss of accuracy. Our method reduced the size of VGG-16 by 49x, from 552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model into on-chip SRAM cache rather than off-chip DRAM memory. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained. Benchmarked on CPU, GPU, and mobile GPU, the compressed network has a 3x to 4x layer-wise speedup and 3x to 7x better energy efficiency.
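The first two stages are straightforward to sketch: magnitude pruning followed by k-means weight sharing over the surviving weights, with 32 centroids corresponding to the 5-bit codes mentioned above. The sketch below assumes an illustrative pruning ratio and a plain Lloyd iteration, and omits the retraining and Huffman-coding stages.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=1024)

# Stage 1: prune small-magnitude connections (keep ratio is illustrative).
mask = np.abs(W) > np.quantile(np.abs(W), 0.9)
w_kept = W[mask]

# Stage 2: weight sharing via k-means over the surviving weights.
k = 32                                           # 32 centroids ~ 5-bit codes
centroids = np.linspace(w_kept.min(), w_kept.max(), k)   # linear init
for _ in range(20):                              # Lloyd iterations
    assign = np.argmin(np.abs(w_kept[:, None] - centroids[None, :]), axis=1)
    for j in range(k):
        if np.any(assign == j):
            centroids[j] = w_kept[assign == j].mean()

W_shared = np.zeros_like(W)
W_shared[mask] = centroids[assign]
print("unique shared weights:", len(np.unique(W_shared[mask])))
# Stage 3 (not shown): Huffman-code the centroid indices and sparse structure.
```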

7,256 citations

Book ChapterDOI
08 Oct 2016
TL;DR: The Binary-Weight-Network version of AlexNet is compared with recent network binarization methods, BinaryConnect and BinaryNets, and outperforms them by large margins on ImageNet, more than 16% in top-1 accuracy.
Abstract: We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values, resulting in 32x memory savings. In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This results in 58x faster convolutional operations (in terms of the number of high-precision operations) and 32x memory savings. XNOR-Nets offer the possibility of running state-of-the-art networks on CPUs (rather than GPUs) in real time. Our binary networks are simple, accurate, efficient, and work on challenging visual tasks. We evaluate our approach on the ImageNet classification task. The classification accuracy with a Binary-Weight-Network version of AlexNet is the same as the full-precision AlexNet. We compare our method with recent network binarization methods, BinaryConnect and BinaryNets, and outperform these methods by large margins on ImageNet, more than 16% in top-1 accuracy. Our code is available at: http://allenai.org/plato/xnornet.
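The Binary-Weight-Network approximation has a simple closed form: each filter W is replaced by alpha * sign(W), where alpha is the mean absolute value of that filter's weights. A minimal numpy sketch with an illustrative tensor shape:

```python
import numpy as np

def binarize_filters(W):
    """Approximate each filter as alpha * sign(W), alpha = mean(|W|).

    The per-filter scaling follows the paper's closed-form solution;
    the tensor shape below is illustrative.
    """
    alpha = np.abs(W).reshape(W.shape[0], -1).mean(axis=1)
    alpha = alpha.reshape(-1, 1, 1, 1)           # one scale per output filter
    B = np.sign(W)
    B[B == 0] = 1.0                              # map sign(0) to +1
    return alpha, B

W = np.random.default_rng(5).normal(size=(16, 8, 3, 3))
alpha, B = binarize_filters(W)
err = np.linalg.norm(W - alpha * B) / np.linalg.norm(W)
print(f"relative approximation error: {err:.3f}")
```

At inference time, convolving with alpha * B reduces to sign-based operations plus one multiply per filter, which is where the memory and speed savings come from.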

3,288 citations

Journal ArticleDOI
20 Nov 2017
TL;DR: In this paper, the authors provide a comprehensive tutorial and survey of recent advances toward enabling efficient processing of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of deep neural networks either solely via hardware design changes or via joint hardware and DNN algorithm changes.
Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey of the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-designs, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

2,391 citations