Author

Sanchari Sen

Other affiliations: IBM
Bio: Sanchari Sen is an academic researcher from Purdue University. The author has contributed to research in topics including Artificial neural network and Spiking neural network. The author has an h-index of 7 and has co-authored 20 publications receiving 143 citations. Previous affiliations of Sanchari Sen include IBM.

Papers
Proceedings ArticleDOI
27 Mar 2017
TL;DR: This work proposes AxSNN, the first effort to apply approximate computing to improve the computational efficiency of evaluating SNNs, and presents SNNAP, a Spiking Neural Network Approximate Processor that embodies the proposed approximation strategy and is synthesized to 45nm technology.
Abstract: Spiking Neural Networks (SNNs) are widely regarded as the third generation of artificial neural networks, and are expected to drive new classes of recognition, data analytics and computer vision applications. However, large-scale SNNs (e.g., of the scale of the human visual cortex) are highly compute and data intensive, requiring new approaches to improve their efficiency. Complementary to prior efforts that focus on parallel software and the design of specialized hardware, we propose AxSNN, the first effort to apply approximate computing to improve the computational efficiency of evaluating SNNs. In SNNs, the inputs and outputs of neurons are encoded as a time series of spikes. A spike at a neuron's output triggers updates to the potentials (internal states) of neurons to which it is connected. AxSNN determines spike-triggered neuron updates that can be skipped with little or no impact on output quality and selectively skips them to improve both compute and memory energy. Neurons that can be approximated are identified by utilizing various static and dynamic parameters such as the average spiking rates and current potentials of neurons, and the weights of synaptic connections. Such a neuron is placed into one of many approximation modes, wherein the neuron is sensitive only to a subset of its inputs and sends spikes only to a subset of its outputs. A controller periodically updates the approximation modes of neurons in the network to achieve energy savings with minimal loss in quality. We apply AxSNN to both hardware and software implementations of SNNs. For hardware evaluation, we designed SNNAP, a Spiking Neural Network Approximate Processor that embodies the proposed approximation strategy, and synthesized it to 45nm technology. The software implementation of AxSNN was evaluated on a 2.7 GHz Intel Xeon server with 128 GB memory. Across a suite of 6 image recognition benchmarks, AxSNN achieves 1.4–5.5x reduction in scalar operations for network evaluation, which translates to 1.2–3.62x and 1.26–3.9x improvement in hardware and software energies respectively, for no loss in application quality. Progressively higher energy savings are achieved with modest reductions in output quality.
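The update-skipping idea can be sketched in a few lines of Python. This is a minimal illustration under assumed data structures (dense per-neuron weight lists, an integer approximation mode per neuron, a hypothetical weight threshold); the paper's actual criteria also use spiking rates and current potentials.

def propagate_spike(src, potentials, weights, approx_mode, weight_threshold=0.05):
    # Update downstream potentials for a spike at neuron `src`, skipping
    # connections deemed insignificant under the destination's approximation mode.
    for dst, w in enumerate(weights[src]):
        if w == 0.0:
            continue
        # In an approximated mode (> 0), only synapses with sufficiently large
        # weights trigger an update; the rest are skipped to save compute and
        # memory accesses.
        if approx_mode[dst] > 0 and abs(w) < weight_threshold * approx_mode[dst]:
            continue
        potentials[dst] += w
    return potentials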

44 citations

Proceedings ArticleDOI
14 Jun 2021
TL;DR: RaPiD, as described in this paper, is a 4-core AI accelerator chip supporting a spectrum of precisions, namely 16- and 8-bit floating-point and 4- and 2-bit fixed-point.
Abstract: The growing prevalence and computational demands of Artificial Intelligence (AI) workloads have led to widespread use of hardware accelerators in their execution. Scaling the performance of AI accelerators across generations is pivotal to their success in commercial deployments. The intrinsic error-resilient nature of AI workloads presents a unique opportunity for performance/energy improvement through precision scaling. Motivated by the recent algorithmic advances in precision scaling for inference and training, we designed RaPiD, a 4-core AI accelerator chip supporting a spectrum of precisions, namely, 16 and 8-bit floating-point and 4 and 2-bit fixed-point. The 36mm2 RaPiD chip fabricated in 7nm EUV technology delivers a peak 3.5 TFLOPS/W in HFP8 mode and 16.5 TOPS/W in INT4 mode at nominal voltage. Using a performance model calibrated to within 1% of the measurement results, we evaluated DNN inference using 4-bit fixed-point representation for a 4-core RaPiD chip system and DNN training using 8-bit floating-point representation for a 768 TFLOPS AI system comprising 4 32-core RaPiD chips. Our results show INT4 inference for batch size of 1 achieves 3 - 13.5 (average 7) TOPS/W and FP8 training for a mini-batch of 512 achieves a sustained 102 - 588 (average 203) TFLOPS across a wide range of applications.
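As a rough illustration of the low-precision representations the chip supports, the sketch below shows symmetric per-tensor INT4 quantization in Python. The rounding and scale-selection choices are generic assumptions for illustration, not RaPiD's actual numerics.

import numpy as np

def quantize_int4(w):
    # Map float weights to signed 4-bit integers in [-8, 7] with one scale per tensor.
    scale = max(np.max(np.abs(w)) / 7.0, 1e-12)
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)  # approximate reconstruction used during low-precision inference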

42 citations

Posted Content
TL;DR: This work proposes EMPIR, ensembles of quantized DNN models with different numerical precisions, as a new approach to increase robustness against adversarial attacks, and indicates that EMPIR boosts the average adversarial accuracies by 42.6%, 15.2% and 10.5% when compared to single full-precision models, without sacrificing accuracy on the unperturbed inputs.
Abstract: Ensuring robustness of Deep Neural Networks (DNNs) is crucial to their adoption in safety-critical applications such as self-driving cars, drones, and healthcare. Notably, DNNs are vulnerable to adversarial attacks in which small input perturbations can produce catastrophic misclassifications. In this work, we propose EMPIR, ensembles of quantized DNN models with different numerical precisions, as a new approach to increase robustness against adversarial attacks. EMPIR is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. EMPIR overcomes this limitation to achieve the 'best of both worlds', i.e., the higher unperturbed accuracies of the full precision models combined with the higher robustness of the low precision models, by composing them in an ensemble. Further, as low precision DNN models have significantly lower computational and storage requirements than full precision models, EMPIR models only incur modest compute and memory overheads compared to a single full-precision model (<25% in our evaluations). We evaluate EMPIR across a suite of DNNs for 3 different image recognition tasks (MNIST, CIFAR-10 and ImageNet) and under 4 different adversarial attacks. Our results indicate that EMPIR boosts the average adversarial accuracies by 42.6%, 15.2% and 10.5% for the DNN models trained on the MNIST, CIFAR-10 and ImageNet datasets respectively, when compared to single full-precision models, without sacrificing accuracy on the unperturbed inputs.
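The ensembling step itself is simple to sketch. The snippet below averages softmax outputs from a mix of full-precision and quantized models; the model interface and averaging rule are assumptions for illustration and may differ from EMPIR's exact combination scheme.

import numpy as np

def ensemble_predict(models, x):
    # `models` mixes one full-precision DNN with low-precision variants,
    # each exposing predict_proba(x) -> (batch, classes) softmax scores.
    probs = np.mean([m.predict_proba(x) for m in models], axis=0)
    return np.argmax(probs, axis=-1)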

39 citations

Journal ArticleDOI
TL;DR: The proposed AxLSTM, an application of approximate computing to improve the execution efficiency of LSTMs, is implemented within the TensorFlow deep learning framework and evaluated on 3 state-of-the-art sequence-to-sequence models.
Abstract: Long Short Term Memory (LSTM) networks are a class of recurrent neural networks that are widely used for machine learning tasks involving sequences, including machine translation, text generation, and speech recognition. Large-scale LSTMs, which are deployed in many real-world applications, are highly compute intensive. To address this challenge, we propose AxLSTM, an application of approximate computing to improve the execution efficiency of LSTMs. An LSTM is composed of cells, each of which contains a cell state along with multiple gating units that control the addition and removal of information from the state. The LSTM execution proceeds in timesteps, with a new symbol of the input sequence processed at each timestep. AxLSTM consists of two techniques - Dynamic Timestep Skipping (DTS) and Dynamic State Reduction (DSR). DTS identifies, at runtime, input symbols that are likely to have little or no impact on the cell state and skips evaluating the corresponding timesteps. In contrast, DSR reduces the size of the cell state in accordance with the complexity of the input sequence, leading to a reduced number of computations per timestep. We describe how AxLSTM can be applied to the most common application of LSTMs, viz., sequence-to-sequence learning. We implement AxLSTM within the TensorFlow deep learning framework and evaluate it on 3 state-of-the-art sequence-to-sequence models. On a 2.7 GHz Intel Xeon server with 128 GB memory and 32 processor cores, AxLSTM achieves 1.08×-1.31× speedups with minimal loss in quality, and 1.12×-1.37× speedups when moderate reductions in quality are acceptable.
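A minimal sketch of the Dynamic Timestep Skipping idea is shown below, assuming a generic LSTM cell callable and an input-norm heuristic as the significance test; the paper's runtime criterion and the Dynamic State Reduction technique are not modeled here.

import numpy as np

def run_axlstm(cell, inputs, state, skip_threshold=0.1):
    # `cell(x_t, (h, c)) -> (h, c)` is a standard LSTM cell update.
    outputs = []
    for x_t in inputs:  # one input symbol per timestep
        if np.linalg.norm(x_t) < skip_threshold:
            outputs.append(state[0])  # skipped timestep: carry the state forward
            continue
        state = cell(x_t, state)      # full cell update
        outputs.append(state[0])
    return outputs, state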

39 citations

Journal ArticleDOI
TL;DR: This work proposes Sparsity-aware Core Extensions (SparCE) - a set of low-overhead micro-architectural and ISA extensions that dynamically detect whether an operand is zero and subsequently skip a set of future instructions that use it, thereby improving the performance of DNNs on general-purpose processor (GPP) cores.
Abstract: Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. The enormous computational demand posed by DNNs is a key challenge for computing system designers and has most commonly been addressed through the design of DNN accelerators. However, these specialized accelerators utilize large quantities of multiply-accumulate units and on-chip memory and are prohibitive in area and cost constrained systems such as wearable devices and IoT sensors. In this work, we take a complementary approach and improve the performance of DNNs on general-purpose processor (GPP) cores. We do so by exploiting a key attribute of DNNs, viz. sparsity or the prevalence of zero values. We propose Sparsity-aware Core Extensions (SparCE) - a set of low-overhead micro-architectural and ISA extensions that dynamically detect whether an operand (e.g., the result of a load instruction) is zero and subsequently skip a set of future instructions that use it. To maximize performance benefits, SparCE ensures that the instructions to be skipped are prevented from even being fetched, as squashing instructions comes with a penalty (e.g., a pipeline stall). SparCE consists of 2 key micro-architectural enhancements. First, a Sparsity Register File (SpRF) is utilized to track registers that are zero. Next, a Sparsity-Aware Skip Address (SASA) Table is used to indicate instruction sequences that can be skipped, and to specify conditions on SpRF registers that trigger instruction skipping. When an instruction is fetched, SparCE dynamically pre-identifies whether the following instruction(s) can be skipped, and if so appropriately modifies the program counter, thereby skipping the redundant instructions and improving performance. We model SparCE using the gem5 architectural simulator, and evaluate our approach on 6 state-of-the-art image-recognition DNNs in the context of both training and inference using the Caffe deep learning framework. On a scalar microprocessor, SparCE achieves 1.11×-1.96× speedups across both convolution and fully-connected layers that exhibit 10-90 percent sparsity. These speedups translate to 19-31 percent reduction in execution time at the overall application-level. We also evaluate SparCE on a 4-way SIMD ARMv8 processor using the OpenBLAS library, and demonstrate that SparCE achieves 8-15 percent reduction in the application-level execution time.
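A software analogue of the sparsity SparCE exploits is sketched below: when a loaded activation is zero, the dependent multiply-accumulate work is skipped entirely. This is a behavioural illustration only, not a model of the SpRF/SASA micro-architecture.

def sparse_dot(activations, weights):
    acc = 0.0
    for a, w in zip(activations, weights):
        if a == 0.0:   # analogous to SparCE detecting a zero operand
            continue   # and skipping the dependent instruction sequence
        acc += a * w
    return acc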

28 citations


Cited by
Posted Content
TL;DR: In this article, the authors demonstrate that thirteen recently published defenses to adversarial examples can be circumvented despite attempting to perform evaluations using adaptive attacks, and provide guidance on how to properly perform adaptive attacks against such defenses, thus allowing the community to make further progress in building more robust models.
Abstract: Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS---and chosen for illustrative and pedagogical purposes---can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result---showing that a defense was ineffective---this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.
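The essence of an adaptive attack is that the perturbation is optimized through the defended model end to end, rather than through the undefended base classifier. A minimal PGD-style sketch is given below in PyTorch; all names and hyperparameters are illustrative assumptions, not the paper's evaluation setup.

import torch
import torch.nn.functional as F

def adaptive_pgd(defended_model, x, y, eps=8/255, alpha=2/255, steps=20):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(defended_model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the loss of the *defended* model
            delta.clamp_(-eps, eps)             # stay within the L-infinity ball
        delta.grad.zero_()
    return (x + delta).detach()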

398 citations

Journal ArticleDOI
TL;DR: In this article, the authors present the state of the art of hardware implementations of spiking neural networks and the current trends in algorithm elaboration from model selection to training mechanisms, and describe the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
Abstract: Neuromorphic computing is henceforth a major research field for both academic and industrial actors. As opposed to Von Neumann machines, brain-inspired processors aim at bringing closer the memory and the computational elements to efficiently evaluate machine learning algorithms. Recently, spiking neural networks, a generation of cognitive algorithms employing computational primitives mimicking neuron and synapse operational principles, have become an important part of deep learning. They are expected to improve the computational performance and efficiency of neural networks, but they are best suited for hardware able to support their temporal dynamics. In this survey, we present the state of the art of hardware implementations of spiking neural networks and the current trends in algorithm elaboration from model selection to training mechanisms. The scope of existing solutions is extensive; we thus present the general framework and study on a case-by-case basis the relevant particularities. We describe the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level and discuss their related advantages and challenges.

112 citations

Journal ArticleDOI
TL;DR: This survey presents the state of the art of hardware implementations of spiking neural networks and the current trends in algorithm elaboration from model selection to training mechanisms and describes the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
Abstract: Neuromorphic computing is henceforth a major research field for both academic and industrial actors. As opposed to Von Neumann machines, brain-inspired processors aim at bringing closer the memory and the computational elements to efficiently evaluate machine-learning algorithms. Recently, Spiking Neural Networks, a generation of cognitive algorithms employing computational primitives mimicking neuron and synapse operational principles, have become an important part of deep learning. They are expected to improve the computational performance and efficiency of neural networks, but are best suited for hardware able to support their temporal dynamics. In this survey, we present the state of the art of hardware implementations of spiking neural networks and the current trends in algorithm elaboration from model selection to training mechanisms. The scope of existing solutions is extensive; we thus present the general framework and study on a case-by-case basis the relevant particularities. We describe the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level and discuss their related advantages and challenges.

73 citations

Journal ArticleDOI
TL;DR: Pipe-it develops a performance-prediction model that utilizes only the convolutional layer descriptors to predict the execution time of each layer individually on all permitted core configurations (type and count), and exploits the predictions to create a balanced pipeline using an efficient design space exploration algorithm.
Abstract: Internet of Things edge intelligence requires convolutional neural network (CNN) inference to take place in the edge devices themselves. ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance tradeoffs. All cores are expected to be simultaneously employed in inference to attain maximal throughput. However, high communication overhead involved in parallelization of computations from convolution kernels across clusters is detrimental to throughput. We present an alternative framework called Pipe-it that employs a pipelined design to split convolutional layers across clusters while limiting parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that utilizes only the convolutional layer descriptors to predict the execution time of each layer individually on all permitted core configurations (type and count). Pipe-it then exploits the predictions to create a balanced pipeline using an efficient design space exploration algorithm. Pipe-it on average results in a 39% higher throughput than the highest antecedent throughput.
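The pipeline-partitioning idea can be illustrated with a brute-force two-stage split: given predicted per-layer latencies on each cluster, choose the split point that minimizes the slowest stage, which bounds pipeline throughput. Pipe-it's actual predictor and design space exploration are more general; this Python sketch and its function name are assumptions for illustration.

def best_two_stage_split(latency_big, latency_little):
    # latency_big / latency_little: predicted per-layer execution times on each cluster.
    best = (float("inf"), None)
    for cut in range(1, len(latency_big)):
        stage_a = sum(latency_big[:cut])      # layers 0..cut-1 mapped to the big cluster
        stage_b = sum(latency_little[cut:])   # remaining layers mapped to the LITTLE cluster
        bottleneck = max(stage_a, stage_b)    # the slower stage limits pipeline throughput
        best = min(best, (bottleneck, cut))
    return best  # (bottleneck latency, split index)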

73 citations

Journal ArticleDOI
TL;DR: In this article, a hierarchical adversarial attack (HAA) generation method is introduced to realize a level-aware black-box adversarial attack strategy, targeting the graph neural network (GNN)-based intrusion detection in IoT systems with a limited budget.
Abstract: The advancement of Internet of Things (IoT) technologies leads to a wide penetration and large-scale deployment of IoT systems across an entire city or even country. While IoT systems are capable of providing intelligent services, the large amount of data collected and processed in IoT systems also raises serious security concerns. Many research efforts have been devoted to designing intelligent network intrusion detection systems (NIDS) to prevent misuse of IoT data across smart applications. However, existing approaches may suffer from limited and imbalanced attack data when training the detection model, which makes the system vulnerable, especially to unknown attack types. In this study, a novel hierarchical adversarial attack (HAA) generation method is introduced to realize a level-aware black-box adversarial attack strategy, targeting the graph neural network (GNN)-based intrusion detection in IoT systems with a limited budget. By constructing a shadow GNN model, an intelligent mechanism based on a saliency map technique is designed to generate adversarial examples by effectively identifying and modifying the critical feature elements with minimal perturbations. A hierarchical node selection algorithm based on random walk with restart (RWR) is developed to select a set of more vulnerable nodes with high attack priority, considering their structural features and overall loss changes within the targeted IoT network. The proposed HAA generation method is evaluated using the open-source data set UNSW-SOSR2019 and compared against three baseline methods. Comparison results demonstrate its ability to degrade classification precision by more than 30% in two state-of-the-art GNN models, GCN and JK-Net, for NIDS in IoT environments.
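Random walk with restart, the primitive used to rank vulnerable nodes, can be sketched as follows in Python; the column-stochastic normalization, restart probability, and convergence test are standard choices assumed for illustration, not necessarily those of the HAA method.

import numpy as np

def rwr_scores(adj, seed, restart=0.15, tol=1e-6, max_iter=100):
    # adj: (n, n) adjacency matrix; seed: index of the restart node.
    n = adj.shape[0]
    P = adj / np.maximum(adj.sum(axis=0, keepdims=True), 1e-12)  # column-stochastic transition matrix
    r = np.zeros(n)
    r[seed] = 1.0            # restart distribution concentrated on the seed node
    p = r.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * P @ p + restart * r
        if np.linalg.norm(p_next - p, 1) < tol:
            break
        p = p_next
    return p  # visiting probabilities; higher values indicate nodes closer to the seed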

62 citations