
Showing papers by "Kapil Ahuja" published in 2020


Journal ArticleDOI
TL;DR: This paper proposes a novel and scalable technique, with two different modes, for quantizing the parameters of pre-trained neural networks. The technique represents parameters as powers of 2, thereby eliminating the need for resource- and computation-intensive multiplier units in hardware accelerators for neural networks.
Abstract: Deep neural networks are machine learning techniques that are increasingly used in a variety of applications. However, their significantly high memory and computation demands often limit their deployment on embedded systems. Many recent works have addressed this problem by proposing different types of data quantization schemes. However, most of these techniques either require post-quantization retraining of deep neural networks or bear a significant loss in output accuracy. In this paper, we propose a novel and scalable technique with two different modes for the quantization of the parameters of pre-trained neural networks. In the first mode, referred to as log_2_lead, we use a single template for the quantization of all parameters. In the second mode, denoted as ALigN, we analyze the trained parameters of each layer and adaptively adjust the quantization template to achieve even higher accuracy. Our technique largely preserves the accuracy of the parameters and does not require retraining of the networks. Moreover, it supports quantization to an arbitrary bit-size. For example, compared to an implementation based on single-precision floating-point numbers, our proposed 8-bit quantization technique incurs only $\sim 0.2\%$ and $\sim 0.1\%$ loss in the Top-1 and Top-5 accuracies, respectively, for the VGG-16 network on the ImageNet dataset. We have observed similarly minimal losses in the Top-1 and Top-5 accuracies for AlexNet and ResNet-18 using the proposed quantization scheme for the 8-bit range. Our proposed quantization technique also provides a higher mean intersection over union for semantic segmentation when compared with state-of-the-art quantization techniques. The proposed technique represents parameters as powers of 2, thereby eliminating the need for resource- and computation-intensive multiplier units in hardware accelerators for neural networks. We also present a design for implementing the multiplication operation using bit-shifts and additions for the proposed quantization technique.
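To make the power-of-2 idea concrete, below is a minimal, illustrative Python sketch of rounding weights to the nearest signed power of two and replacing multiplication by a bit-shift. It is not the paper's log_2_lead or ALigN scheme (which keeps additional leading-bit information and adapts the template per layer); the function names and the fixed-point example are ours.

    import numpy as np

    def quantize_pow2(weights):
        # Round each weight to the nearest signed power of two (illustration only;
        # the paper's templates retain more information than a single exponent).
        signs = np.sign(weights)
        mags = np.abs(weights)
        exps = np.round(np.log2(np.where(mags > 0, mags, 1e-12))).astype(int)
        return signs, exps

    def shift_multiply(activation_fixed, sign, exp):
        # Multiply a fixed-point activation by a power-of-two weight using a shift:
        # positive exponents shift left, negative exponents shift right.
        shifted = activation_fixed << exp if exp >= 0 else activation_fixed >> (-exp)
        return int(sign) * shifted

    # Example: the weight 0.23 is rounded to 2^-2, so activation 48 becomes 48 >> 2 = 12.
    s, e = quantize_pow2(np.array([0.23]))
    print(shift_multiply(48, s[0], e[0]))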

9 citations


Journal ArticleDOI
TL;DR: This work proposes the use of a block variant of the problem-dependent underlying iterative method, together with a technique to cheaply update the SPAI preconditioner, while solving parametrically changing linear systems.
Abstract: The main computational cost of algorithms for computing reduced-order models of parametric dynamical systems is in solving sequences of very large and sparse linear systems of equations, which are ...
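For background (standard SPAI theory, not this paper's specific update technique): a sparse approximate inverse preconditioner $M \approx A^{-1}$ with a prescribed sparsity pattern $\mathcal{S}$ is obtained from a Frobenius-norm minimization that decouples into one small least-squares problem per column,

$$\min_{M \in \mathcal{S}} \|AM - I\|_F^2 \;=\; \sum_{k=1}^{n} \min_{m_k} \|A m_k - e_k\|_2^2,$$

where $m_k$ is the $k$-th column of $M$ (restricted to the pattern) and $e_k$ is the $k$-th unit vector. Because these column problems are small and independent, they can be warm-started or selectively recomputed when the system matrix changes mildly with the parameter, which is what makes cheap SPAI updates attractive in this setting; the paper's exact update rule is not reproduced here.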

6 citations


Proceedings ArticleDOI
09 Mar 2020
TL;DR: This paper proposes a novel quantization technique for the parameters of pre-trained deep neural networks that largely preserves the accuracy of the parameters and does not require retraining of the networks.
Abstract: Deep neural networks are machine learning techniques that are increasingly used in a variety of applications. However, their significantly high memory and computation demands often limit their deployment on embedded systems. Many recent works have addressed this problem by proposing different types of data quantization schemes. However, most of these techniques either require post-quantization retraining of deep neural networks or bear a significant loss in output accuracy. In this paper, we propose a novel quantization technique for the parameters of pre-trained deep neural networks. Our technique largely preserves the accuracy of the parameters and does not require retraining of the networks. Compared to an implementation based on single-precision floating-point numbers, our proposed 8-bit quantization technique incurs only ~1% and ~0.4% loss in the Top-1 and Top-5 accuracies, respectively, for the VGG-16 network on the ImageNet dataset.

5 citations


Journal ArticleDOI
TL;DR: In this article, the impact of link formation between a pair of agents on the resource availability of other agents (that is, externalities) in a social cloud network, a special case of endogenously formed networks, is investigated.
Abstract: This paper investigates the impact of link formation between a pair of agents on the resource availability of other agents (that is, externalities) in a social cloud network, a special case of endo...
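In the standard formalization of such externalities (our notation, not necessarily the paper's): if $u_k(g)$ denotes the utility of agent $k$ in network $g$, then the formation of a link $ij$ imposes a positive externality on an agent $k \notin \{i, j\}$ if $u_k(g + ij) > u_k(g)$, and a negative externality if $u_k(g + ij) < u_k(g)$.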

4 citations


Journal ArticleDOI
TL;DR: This article shows that reusing preconditioners is an art, via detailed algorithmic implementations in multiple model order reduction (MOR) algorithms, and demonstrates that reusing preconditioners while reducing a real-life industrial problem (of size 1.2 million) leads to relative savings of up to 64% in the total computation time (in absolute terms, a saving of 5 days).
Abstract: Dynamical systems are pervasive in almost all engineering and scientific applications. Simulating such systems is computationally very intensive. Hence, Model Order Reduction (MOR) is used to reduce them to a lower dimension. Most MOR algorithms require solving sequences of large sparse linear systems. Since using direct methods for solving such systems does not scale well in time with respect to the increase in the input dimension, efficient preconditioned iterative methods are commonly used. In one of our previous works, we have shown substantial improvements by reusing preconditioners for parametric MOR (Singh et al. 2019). There, we had proposed techniques for both the non-parametric and the parametric cases, but had applied them only to the latter. We have three main contributions here. First, we demonstrate that preconditioners can be reused more effectively in the non-parametric case than in the parametric one. Second, we show that reusing preconditioners is an art, via detailed algorithmic implementations in multiple MOR algorithms. Third and finally, we demonstrate that reusing preconditioners for reducing a real-life industrial problem (of size 1.2 million) leads to relative savings of up to 64% in the total computation time (in absolute terms, a saving of 5 days).
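Below is a minimal Python sketch of the kind of reuse being described, assuming SciPy: one incomplete-LU preconditioner is built for the first system in a sequence of shifted sparse systems (as arise in MOR) and then reused for all later systems. It only illustrates the general idea of reuse; the matrices, the ILU choice, and the solver settings are placeholders, not the paper's algorithmic strategy.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    n = 2000
    A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    E = sp.identity(n, format="csc")
    b = np.ones(n)
    shifts = [1.0, 1.1, 1.2, 1.3]   # e.g. expansion points in a MOR method

    # Build one preconditioner at the first shift and reuse it for every system.
    ilu = spla.spilu((A + shifts[0] * E).tocsc())
    M = spla.LinearOperator((n, n), matvec=ilu.solve)

    for s in shifts:
        x, info = spla.gmres(A + s * E, b, M=M)
        print(f"shift {s}: GMRES exit flag = {info}")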

3 citations


Journal ArticleDOI
TL;DR: The concept of bilateral stability is proposed, which refines the pairwise stability concept defined by Jackson and Wolinsky by requiring mutual consent for both the addition and the deletion of links, rather than mutual consent for link addition only.
Abstract: Social storage systems are a good alternative to existing data backup systems, namely local, centralized, and P2P backup. To date, researchers have mostly focussed either on building such systems on top of existing underlying social networks (exogenously built) or on studying quality-of-service related issues. In this paper, we look at two untouched aspects of social storage systems. The first aspect involves modelling social storage as an endogenous social network, where agents themselves decide with whom they want to build a data backup relation, which is more intuitive than exogenous social networks. The second aspect involves studying the stability of social storage systems, which would help reduce maintenance costs and, further, help build efficient as well as contented networks. We have a four-fold contribution that covers these two aspects. First, we model the social storage system as a strategic network formation game. We define the utility of each agent in the network under two different frameworks: one where the cost to add and maintain links is considered in the utility function, and the other where budget constraints are considered. In the context of social storage and social cloud computing, these utility functions are the first of their kind, and we use them to define and analyse the social storage network game. Second, we propose the concept of bilateral stability, which refines the pairwise stability concept defined by Jackson and Wolinsky (J Econ Theory 71(1):44–74, 1996) by requiring mutual consent for both the addition and the deletion of links, rather than mutual consent for link addition only. Mutual consent for link deletion is especially important in the social storage setting. The notion of bilateral stability subsumes the bilateral equilibrium definition of Goyal and Vega-Redondo (J Econ Theory 137(1):460–492, 2007). Third, we prove necessary and sufficient conditions for the bilateral stability of social storage networks. For symmetric social storage networks, we prove that there exists a unique neighborhood size, independent of the number of agents (for all non-trivial cases), at which no pair of agents has any incentive to increase or decrease their neighborhood size. We call this neighborhood size the stability point. Fourth, given the number of agents and other parameters, we discuss which bilaterally stable networks would evolve and also which of these stable networks are efficient, that is, stable networks with the maximum sum of utilities of all agents. We also discuss ways to build contented networks, where each agent achieves the maximum possible utility.
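One way to formalize the mutual-consent requirement described above (our reading; the paper's precise definition may differ in details): a network $g$ with utilities $u_i$ is bilaterally stable if no pair of agents would jointly agree to add or to delete a link between them, where joint agreement means both weakly gain and at least one gains strictly. In symbols,

$$\begin{aligned} &\forall\, ij \in g:\quad \neg\bigl[\, u_i(g - ij) \ge u_i(g) \ \text{and}\ u_j(g - ij) \ge u_j(g), \ \text{with at least one inequality strict} \,\bigr],\\ &\forall\, ij \notin g:\quad \neg\bigl[\, u_i(g + ij) \ge u_i(g) \ \text{and}\ u_j(g + ij) \ge u_j(g), \ \text{with at least one inequality strict} \,\bigr]. \end{aligned}$$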

2 citations


Posted Content
TL;DR: The Spectral Clustering (SC) algorithm with Pivotal Sampling achieves substantially higher accuracy than all the other proposed competitive clustering-with-sampling algorithms (e.g., SC with VQ), and it outperforms the standard HC algorithm in both accuracy and computational complexity.
Abstract: Clustering genotypes based upon their phenotypic characteristics is used to obtain diverse sets of parents that are useful in breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard for clustering phenotypic data. This algorithm suffers from low accuracy and high computational complexity. To address the accuracy challenge, we propose the use of the Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically Pivotal Sampling, which is probability based. Since the application of sampling to phenotypic data has not been explored much, for effective comparison, another sampling technique called Vector Quantization (VQ) is adapted to this data as well. VQ has recently given promising results for genome data. The novelty of our SC with Pivotal Sampling algorithm lies in constructing the crucial similarity matrix for the clustering algorithm and in defining the probabilities for the sampling technique. Although our algorithm can be applied to any plant genotypes, we test it on phenotypic data obtained from about 2400 Soybean genotypes. SC with Pivotal Sampling achieves substantially higher accuracy (in terms of Silhouette Values) than all the other proposed competitive clustering-with-sampling algorithms (i.e., SC with VQ, HC with Pivotal Sampling, and HC with VQ). The complexities of our SC with Pivotal Sampling algorithm and these three variants are almost the same because of the sampling involved. In addition, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We experimentally show that we are up to 45% more accurate than HC in terms of clustering accuracy, and the computational complexity of our algorithm is more than an order of magnitude lower than that of HC.
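As a minimal, illustrative Python sketch of the overall pipeline (sample first, then spectrally cluster the sample), assuming scikit-learn: the toy feature matrix, the equal inclusion probabilities, and the RBF similarity below are placeholders, and a simple probability-weighted subsample stands in for Pivotal Sampling; the paper's similarity-matrix construction and data-driven probabilities are not reproduced here.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2400, 16))        # toy stand-in for phenotypic traits

    # Probability-weighted subsample (a stand-in for Pivotal Sampling): here every
    # genotype gets an equal inclusion probability, whereas the paper derives the
    # probabilities from the data itself.
    m = 300
    probs = np.full(len(X), 1.0 / len(X))
    idx = rng.choice(len(X), size=m, replace=False, p=probs)
    X_sample = X[idx]

    # Spectral clustering of the sample on a Gaussian (RBF) similarity matrix.
    labels = SpectralClustering(n_clusters=8, affinity="rbf", gamma=0.5,
                                assign_labels="kmeans", random_state=0).fit_predict(X_sample)
    print(np.bincount(labels))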

1 citation


Posted Content
TL;DR: It is demonstrated that preconditioners can be reused more effectively in the non-parametric case than in the parametric one, and it is shown that reusing preconditioners is an art, via detailed algorithmic implementations in multiple MOR algorithms.
Abstract: Dynamical systems are pervasive in almost all engineering and scientific applications. Simulating such systems is computationally very intensive. Hence, Model Order Reduction (MOR) is used to reduce them to a lower dimension. Most MOR algorithms require solving sequences of large sparse linear systems. Since using direct methods for solving such systems does not scale well in time with respect to the increase in the input dimension, efficient preconditioned iterative methods are commonly used. In one of our previous works, we have shown substantial improvements by reusing preconditioners for parametric MOR (Singh et al. 2019). There, we had proposed techniques for both the non-parametric and the parametric cases, but had applied them only to the latter. We have four main contributions here. First, we demonstrate that preconditioners can be reused more effectively in the non-parametric case than in the parametric one because of the lack of parameters in the former. Second, we show that reusing preconditioners is an art and needs to be fine-tuned for the underlying MOR algorithm. Third, we describe the pitfalls in the algorithmic implementation of reusing preconditioners. Fourth, and finally, we demonstrate this theory on a real-life industrial problem (of size 1.2 million), where savings of up to 64% in the total computation time are obtained by reusing preconditioners. In absolute terms, this leads to a saving of 5 days.

1 citation


Posted Content
TL;DR: A set of novel Lagrange heuristics that improve the Lagrange relaxation process is introduced; compared with ParaLarPD, these heuristics lead to a halving of the constraint violations, up to a 10% improvement in the minimum channel width, and up to an 8% reduction in the critical path delay.
Abstract: Routing of the nets in the Field Programmable Gate Array (FPGA) design flow is one of its most time-consuming steps. Although Versatile Place and Route (VPR), a commonly used algorithm for this purpose, routes effectively, it is slow in execution. One way to accelerate this design flow is parallelization. Since VPR is intrinsically sequential, a set of parallel algorithms has recently been proposed for this purpose (ParaLaR and ParaLarPD). These algorithms formulate the routing process as a Linear Program (LP) and solve it using Lagrange relaxation, the sub-gradient method, and a Steiner tree algorithm. Among the many metrics available to check the effectiveness of routing, ParaLarPD, an improved version of ParaLaR, suffers from large violations of the LP constraints (which relate to the minimum channel width metric), and its critical path delay, an easily measurable metric, can be improved further. In this paper, we introduce a set of novel Lagrange heuristics that improve the Lagrange relaxation process. When tested on the MCNC benchmark circuits, this leads, on average, to a halving of the constraint violations, up to a 10% improvement in the minimum channel width, and up to an 8% reduction in the critical path delay as compared with ParaLarPD. We term our new algorithm ParaLarH. Due to the increased work in the Lagrange relaxation process compared to ParaLarPD, ParaLarH slightly reduces the speedup obtained from parallelization; however, this is easily compensated for by using a larger number of threads.
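For context, the generic projected sub-gradient update used in Lagrangian relaxation of such constrained LPs looks as follows (our simplified notation; ParaLarH's routing-specific heuristics modify how the multipliers and step sizes evolve and are not reproduced here). For an LP $\min_x c^\top x$ subject to $Ax \le b$, the constraints are moved into the objective with multipliers $\lambda \ge 0$, and after the relaxed subproblem is solved (in ParaLaR-style algorithms, by routing each net with a Steiner tree algorithm) the multipliers are updated by

$$\lambda^{(k+1)} = \max\!\bigl(0,\ \lambda^{(k)} + \alpha_k\,\bigl(A x^{(k)} - b\bigr)\bigr),$$

where $x^{(k)}$ minimizes the relaxed objective $c^\top x + (\lambda^{(k)})^\top (Ax - b)$ and $\alpha_k$ is the step size; the positive entries of $A x^{(k)} - b$ correspond to constraint violations of the kind reported above.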

1 citation