
Showing papers by "Saibal Mukhopadhyay" published in 2020


Journal ArticleDOI
TL;DR: The proposed DLDO improves SCA resistance using control-loop-induced perturbations in a nominal DLDO, enhanced by a random switching noise injector by power-stage control and a randomized reference voltage generator coupled with all-digital clock modulation (ADCM).
Abstract: This article demonstrates enhanced power (P) and electromagnetic (EM) side-channel analysis (SCA) attack resistance of standard (unprotected) 128-bit advanced encryption standard (AES) engines with parallel (P-AES, 128-bit) and serial (S-AES, 8-bit) datapaths and a 128-bit SIMON engine with the bit-serial (1-bit) datapath by an on-die security-aware all-digital low-dropout (DLDO) regulator. The proposed DLDO improves SCA resistance using control-loop-induced perturbations in a nominal DLDO, enhanced by a random switching noise injector (SNI) by power-stage control and a randomized reference voltage (R-VREF) generator coupled with all-digital clock modulation (ADCM). SCA performed on the measured power/EM signatures acquired from a 130-nm CMOS testchip demonstrates up to 25$\times$ reduction in test vector leakage assessment (TVLA) leakage for P-AES and 3579$\times$, 2182$\times$, and 500$\times$ increase in minimum-traces-to-disclose (MTD) 80% of the subkeys for P-AES, S-AES, and SIMON cores, respectively, with respect to correlation power analysis (CPA) and correlation EM analysis (CEMA).

32 citations
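
TVLA leakage, as referenced in the abstract above, is conventionally quantified with Welch's t-test between a fixed-input and a random-input set of traces, with |t| > 4.5 commonly treated as evidence of leakage. The following is a minimal sketch of that test on synthetic single-point power samples; the names `welch_t` and `leaks`, the trace values, and the sample counts are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)

TVLA_THRESHOLD = 4.5  # commonly used threshold on |t|; an assumption here

def leaks(fixed_traces, random_traces, threshold=TVLA_THRESHOLD):
    """Flag leakage at one time point when |t| exceeds the threshold."""
    return abs(welch_t(fixed_traces, random_traces)) > threshold

# Synthetic (hypothetical) power samples at a single time point.
rng = random.Random(0)
fixed_set = [rng.gauss(1.0, 0.1) for _ in range(2000)]
random_set = [rng.gauss(0.9, 0.1) for _ in range(2000)]  # shifted mean

t_small = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
leaky = leaks(fixed_set, random_set)
```

In a real evaluation the test would be applied at every sample point of the measured traces, and the maximum |t| would be compared across the protected and unprotected designs.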


Journal ArticleDOI
TL;DR: This article presents a highly integrated design flow that encompasses architecture, circuit, and package to build and simulate heterogeneous 2.5-D integrated chip (IC) designs and performs DSE studies for power delivery scheme and interposer technology to investigate the tradeoffs.
Abstract: A new trend in system-on-chip (SoC) design is chiplet-based IP reuse using 2.5-D integration. Complete electronic systems can be created through the integration of chiplets on an interposer, rather than through a monolithic flow. This approach expands access to a large catalog of off-the-shelf intellectual properties (IPs), allows reuse of them, and enables heterogeneous integration of blocks in different technologies. In this article, we present a highly integrated design flow that encompasses architecture, circuit, and package to build and simulate heterogeneous 2.5-D designs. Our target design is a 64-core architecture based on the Reduced Instruction Set Computer (RISC)-V processor. We first chipletize each IP by adding logical protocol translators and physical interface modules. We convert a given register transfer level (RTL) description of the 64-core processor into chiplets, which are enhanced with our centralized network-on-chip. Next, we use our tool to obtain physical layouts, which are subsequently used to synthesize chip-to-chip I/O drivers, and these chiplets are placed/routed on a silicon interposer. Our package models are used to calculate power, performance, and area (PPA) and reliability of the 2.5-D design. Our design space exploration (DSE) study shows that 2.5-D integration incurs $1.29\times$ power and $2.19\times$ area overheads compared with its 2-D counterpart. Moreover, we perform DSE studies for power delivery scheme and interposer technology to investigate the tradeoffs in 2.5-D integrated chip (IC) designs.

22 citations


Journal ArticleDOI
TL;DR: All-digital tuning and dynamic control of the feedback compensator in digital low-dropout regulators to enhance the transient performance under process and passive variations, aging, and load changes is demonstrated.
Abstract: This article demonstrates all-digital tuning and dynamic control of the feedback compensator in digital low-dropout regulators to enhance the transient performance under process and passive variations, aging, and load changes. The measured results from a 130-nm CMOS test chip show 2.1× improvement in transient performance under process variations and 30% improvement for aging-induced degradations. We demonstrate 55-ns settling time for a 5 to 45 mA load step in 100 ps, with 97.8% peak current efficiency.

20 citations


Journal ArticleDOI
TL;DR: This article presents anomaly detection by examining sensor stream statistics (AEGIS), a novel mixed-signal framework for real-time AEGIS that utilizes kernel density estimation (KDE)-based nonparametric density estimation to generate a real-time statistical model of the sensor data stream.
Abstract: This article presents anomaly detection by examining sensor stream statistics (AEGIS), a novel mixed-signal framework for real-time AEGIS. AEGIS utilizes kernel density estimation (KDE)-based nonparametric density estimation to generate a real-time statistical model of the sensor data stream. The likelihood estimate of the sensor data point can be obtained based on the generated statistical model to detect outliers. We present a CMOS Gilbert Gaussian cell-based design to realize Gaussian kernels for KDE. For outlier detection, the decision boundary is defined in terms of kernel standard deviation ($\sigma_{\mathrm{Kernel}}$) and likelihood threshold ($P_{\mathrm{Thres}}$). We adopt a sliding window to update the detection model in real time. We use a time-series data set provided by Yahoo to benchmark the performance of AEGIS. An $F_1$-score higher than 0.87 is achieved by optimizing parameters such as the length of the sliding window and the decision thresholds, which are programmable in AEGIS. The discussed architecture is designed in a 45-nm technology node, and our approach consumes on average ~75 $\mu\text{W}$ of power at a sampling rate of 2 MHz while using ten recent inlier samples for density estimation.

18 citations
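
The detection rule described in the abstract above (Gaussian-kernel KDE over a sliding window of recent inliers, with a point flagged when its estimated likelihood falls below $P_{\mathrm{Thres}}$) can be sketched in software as follows. This is a purely digital illustration, not the mixed-signal implementation; the class name `SlidingKdeDetector`, the parameter values, and the sample stream are assumptions made for the example.

```python
import math
from collections import deque

def gaussian_kernel(x, center, sigma):
    """Gaussian kernel density contribution of one stored sample."""
    z = (x - center) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

class SlidingKdeDetector:
    """KDE outlier detector over a sliding window of recent inliers."""

    def __init__(self, window_len=10, sigma_kernel=1.0, p_thres=0.01):
        self.window = deque(maxlen=window_len)  # oldest inlier drops out first
        self.sigma_kernel = sigma_kernel
        self.p_thres = p_thres

    def likelihood(self, x):
        """Average kernel response; None until the window has data."""
        if not self.window:
            return None
        total = sum(gaussian_kernel(x, c, self.sigma_kernel) for c in self.window)
        return total / len(self.window)

    def update(self, x):
        """Return True if x is flagged as an outlier; inliers enter the window."""
        p = self.likelihood(x)
        is_outlier = p is not None and p < self.p_thres
        if not is_outlier:
            self.window.append(x)
        return is_outlier

# Hypothetical sensor stream with one obvious anomaly (8.0).
stream = [0.0, 0.1, -0.2, 0.05, 0.3, 8.0, 0.1, -0.1]
detector = SlidingKdeDetector(window_len=10, sigma_kernel=0.5, p_thres=0.01)
flags = [detector.update(x) for x in stream]
```

Only inliers are appended to the window, which mirrors the abstract's statement that recent inlier samples are used for density estimation; the window length and both thresholds are the tunable parameters.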


Posted Content
TL;DR: This paper forms the model PhICNet as a convolutional recurrent neural network (RNN) which is end-to-end trainable for spatio-temporal evolution prediction of dynamical systems and learns the source behavior as an internal state of the RNN.
Abstract: Spatio-temporal dynamics of physical processes are generally modeled using partial differential equations (PDEs). Though the core dynamics follows some principles of physics, real-world physical processes are often driven by unknown external sources. In such cases, developing a purely analytical model becomes very difficult and data-driven modeling can be of assistance. In this paper, we present a hybrid framework combining physics-based numerical models with deep learning for source identification and forecasting of spatio-temporal dynamical systems with unobservable time-varying external sources. We formulate our model PhICNet as a convolutional recurrent neural network (RNN) which is end-to-end trainable for spatio-temporal evolution prediction of dynamical systems and learns the source behavior as an internal state of the RNN. Experimental results show that the proposed model can forecast the dynamics for a relatively long time and identify the sources as well.

18 citations


Proceedings ArticleDOI
20 Jul 2020
TL;DR: A genetic algorithm (GA) based training free layer-wise quantization method, named as GAQ, to reduce model complexity of arbitrary DNN architectures and a SRAM based flexible precision all-digital processing-in-memory (PIM) architecture that leverages GAQ to optimally control precision for each DNN layer to enhance efficiency.
Abstract: This paper presents a genetic algorithm (GA) based training-free layer-wise quantization method, named GAQ, to reduce model complexity of arbitrary DNN architectures. The proposed algorithm formulates an optimization problem to determine the quantization level for each DNN layer under the constraint of maximum accuracy degradation and uses a genetic algorithm to solve the problem at the inference stage of any pre-trained DNN model. The experimental results on various DNNs for image classification demonstrate 5x to 17x weight compression rate with insignificant (< 2%) accuracy loss, comparable with existing quantization algorithms which typically require multi-pass retraining and handcrafted tuning. To evaluate the computational benefits of GAQ, we present an SRAM-based flexible-precision all-digital processing-in-memory (PIM) architecture, named Q-PIM, that leverages GAQ to optimally control precision for each DNN layer to enhance efficiency. The simulation in 28nm CMOS shows potential for significant energy and latency advantage over fixed-precision PIM architectures.

16 citations
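
The optimization described in the abstract above (choose a bit width per layer, minimize model size, keep accuracy degradation under a bound) can be illustrated with a small genetic search. This is a sketch under stated assumptions and not the GAQ algorithm itself: the layer sizes, the sensitivities, and the `degradation` function are toy stand-ins for what would really be measured by running inference on a pre-trained model, and all names are hypothetical.

```python
import random

BIT_CHOICES = [2, 3, 4, 5, 6, 8]

# Hypothetical per-layer weight counts and sensitivities (toy stand-ins).
LAYER_SIZES = [1000, 5000, 20000, 50000]
LAYER_SENS = [8.0, 4.0, 2.0, 1.0]
MAX_DEGRADATION = 2.0  # allowed accuracy loss, in percent

def degradation(bits):
    """Toy proxy for accuracy loss; a real flow would run inference instead."""
    return sum(s * 2.0 ** (2 - b) for s, b in zip(LAYER_SENS, bits))

def model_bits(bits):
    """Total storage, in bits, for the quantized weights."""
    return sum(n * b for n, b in zip(LAYER_SIZES, bits))

FULL_BITS = model_bits([max(BIT_CHOICES)] * len(LAYER_SIZES))

def cost(bits):
    """Model size if feasible; otherwise a value above any feasible size."""
    excess = degradation(bits) - MAX_DEGRADATION
    if excess > 0:
        return FULL_BITS * (1.0 + excess)
    return float(model_bits(bits))

def genetic_search(pop_size=30, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(LAYER_SIZES)
    # Seed with the all-max-precision individual so a feasible one always exists.
    population = [[max(BIT_CHOICES)] * n]
    population += [[rng.choice(BIT_CHOICES) for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [rng.choice(BIT_CHOICES) if rng.random() < mutation_rate else g
                     for g in child]  # per-gene mutation
            children.append(child)
        population = parents + children
    return min(population, key=cost)

best = genetic_search()
compression = 32 * sum(LAYER_SIZES) / model_bits(best)  # vs. 32-bit weights
```

Because the best individuals are carried over unchanged each generation, the returned assignment is never worse than the all-8-bit starting point and always satisfies the degradation bound of the toy model.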


Journal ArticleDOI
TL;DR: This work uses task-driven feedback as a reward signal for their reinforcement learning-based multispectral input fusion, which not only improves tracking accuracy but also maximizes modality-specific information as intended by the user.
Abstract: With recent advances in sensor technology, multispectral systems are becoming increasingly attractive for intelligence, surveillance, and reconnaissance applications. Fusing information from multiple imaging modalities is a major task for such systems. Combining feature maps obtained from multiple deep neural network pipelines demonstrates promising performance for object detection and tracking. However, feature fusion using multiple deep networks is computationally intensive and therefore not suitable for resource-constrained IoT edge devices. In this paper, we propose a novel method to fuse the input space to enable processing of multispectral data via a single deep network. We use task-driven feedback as a reward signal for our reinforcement learning-based multispectral input fusion. The proposed approach not only improves tracking accuracy but also maximizes modality-specific information as intended by the user.

14 citations


Journal ArticleDOI
TL;DR: An attention-based feedback for controlling input data that reduces activation maps in early layers of a DNN network which are critical for reducing data movement in real-time AI processing is presented.
Abstract: In state-of-the-art deep neural networks (DNNs), the layer-wise activation maps lead to significant data movement in hardware accelerators operating on real-time streaming inputs. We explore an architecture-aware algorithmic approach to reduce data movement and the resulting latency and power consumption. This article presents an attention-based feedback for controlling input data, referred to as activation pruning, that reduces activation maps in early layers of a DNN, which are critical for reducing data movement in real-time AI processing. The proposed approach is demonstrated for coupling RGB and Lidar images to perform real-time perception and local motion planning in autonomous systems. Lidar data is used to determine "Pixels of Interest" (PoI) in an RGB image depending on their distance from the sensor, prune the RGB image to perform object detection only within the PoI, and use the detected objects to perform local motion planning. Experiments on sequences from the KITTI dataset show that activation pruning maintains the quality of motion planning while increasing the sparsity of the activation maps. Sparsity-aware computing architectures are considered to leverage activation sparsity for improved performance. The simulation results show that the proposed activation pruning algorithm reduces data movement (38.5%), computational load (30.1%), and memory latency (76.3%) in a sparsity-aware compute architecture, leading to faster perception and lower energy consumption.

11 citations
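
The input-pruning step described in the abstract above can be sketched as a depth-thresholded mask applied to the RGB image. This is a minimal illustration under assumptions the abstract does not state: the Lidar depth is taken to be already projected onto the image grid, pixels within `max_distance` are treated as the PoI, and pruned pixels are zeroed. All function names and values are hypothetical.

```python
def pixels_of_interest(depth_map, max_distance):
    """Boolean mask: True where the Lidar depth is within max_distance."""
    return [[d <= max_distance for d in row] for row in depth_map]

def prune_image(rgb, mask, fill=(0, 0, 0)):
    """Keep RGB pixels inside the PoI; replace the rest with a fill value."""
    return [
        [px if keep else fill for px, keep in zip(rgb_row, mask_row)]
        for rgb_row, mask_row in zip(rgb, mask)
    ]

def pruned_fraction(mask):
    """Fraction of pixels removed, i.e. the input sparsity that was created."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(1 for keep in row if keep) for row in mask)
    return (total - kept) / total

# Hypothetical 2x2 frame: depth in meters and matching RGB pixels.
depth = [[5.0, 50.0], [12.0, 80.0]]
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (9, 9, 9)]]

mask = pixels_of_interest(depth, max_distance=20.0)
pruned = prune_image(rgb, mask)
sparsity = pruned_fraction(mask)
```

Zeroed input regions produce zero activations in the earliest layers, which is the kind of sparsity a sparsity-aware accelerator could exploit.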


Journal ArticleDOI
31 Jul 2020
TL;DR: A processing-in-memory (PIM)-based accelerator is presented in 65-nm CMOS for on-chip learning in spiking neural network using timing-based stochastic spike-timing-dependent plasticity (STDP).
Abstract: A processing-in-memory (PIM)-based accelerator is presented in 65-nm CMOS for on-chip learning in spiking neural network using timing-based stochastic spike-timing-dependent plasticity (STDP). The design uses mixed-signal processing in the 8T-SRAM array for spike accumulation and all-digital computation for neuron dynamics and synaptic weight updates. The 0.39-mm$^2$ and 14.83-mW test chip demonstrates 100K images/second learning rate and 148.3 nJ/image learning energy.

10 citations
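
The reported learning energy in the abstract above is consistent with the reported power and learning rate, assuming energy per image is simply average power divided by throughput (an assumption about how the figure was derived, not a statement from the paper):

```latex
E_{\text{image}} = \frac{P}{R}
  = \frac{14.83 \times 10^{-3}\ \text{W}}{100 \times 10^{3}\ \text{images/s}}
  = 1.483 \times 10^{-7}\ \text{J/image}
  = 148.3\ \text{nJ/image}
```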


Journal ArticleDOI
TL;DR: This article presents a complete EDA flow and design strategies targeting such active interposer-based 2.5-D ICs, and key contributions include the coanalysis of power, performance, signal and power integrity, and the related co-optimization of chiplets and the active interposer.
Abstract: Interposer-based 2.5-D integrated circuits (ICs) enable the chip-level reuse of hard intellectual properties (IPs), also known as chiplets. Such system-level integration shortens the design cycle considerably for large-scale and heterogeneous chips. Besides traditional interposers, which only provide passive elements and routing, active interposers additionally comprise logic components. When implemented carefully using a dedicated electronic design automation (EDA) flow, an active interposer can significantly improve the design quality and flexibility for 2.5-D ICs. In this article, we present a complete EDA flow and design strategies targeting such active interposer-based 2.5-D ICs. Our key contributions include the coanalysis of power, performance, signal and power integrity, and the related co-optimization of chiplets and the active interposer. Our benchmark is a 64-core RISC-V architecture, organized into multiple chiplets and interconnected by a system-level network-on-chip (NoC). For efficiency, we embed the NoC routers and integrated voltage regulators (IVRs) into the active interposer. Moreover, we integrate security monitors into the interposer-based NoC to protect the system and its shared memories against adversarial traffic. The simple yet powerful benefit of this implementation is to offer security by construction, as it is based on a clear physical separation between critical and trusted components (the system-level NoC) versus commodity components (the chiplets). We contrast our active, secured design to a passive, unsecured design baseline of the same RISC-V benchmark and find that the active design reduces the silicon area by 18.5%, power by 3.2%, and IR drop by 73.7%.

8 citations


Proceedings ArticleDOI
01 Mar 2020
TL;DR: A fully synthesized integrated inductive buck regulator with flexible precision variable frequency feedback loop implemented in 65nm CMOS process using an automated design and GDSII generation flow is presented.
Abstract: This paper presents a fully synthesized integrated inductive buck regulator with flexible precision variable frequency feedback loop implemented in 65nm CMOS process using an automated design and GDSII generation flow. The design demonstrates 0.52V/us output ramp and 200ns response time to 30mA/75ps load transient in a high precision mode with 120MHz switching frequency, and peak efficiency of 79.3% at 0.78V output and 43mA load current.

Proceedings ArticleDOI
19 Jul 2020
TL;DR: The full-chip simulations show that Flex-PIM can increase computing efficiency of training and inference by 32x and 120x, respectively, over desktop GPUs while maintaining high accuracy over a wide-range of DNN models using flexible precision.
Abstract: This paper presents Flex-PIM, a ferroelectric FET (FeFET) based processing-in-memory (PIM) engine for vector-matrix-multiplication (VMM). With FeFET as the basic memory cell, Flex-PIM features low read latency/programming energy, non-volatility and high density. The core of the Flex-PIM micro-architecture is an all-digital VMM engine integrated with innovative memory array peripherals to realize dynamically controllable bitwidth and floating point precision. The Flex-PIM architecture is simulated in 28nm CMOS technology and shows multiplication-accumulation (MAC) operations from 32-bit floating point (99 GMACS/W) to 4-bit fixed-point (3.3 TMACS/W). A system level design with a specialized instruction set is presented to accelerate training and inference of deep neural networks (DNN) using Flex-PIM. The full-chip simulations show that Flex-PIM can increase computing efficiency of training and inference by 32x and 120x, respectively, over desktop GPUs while maintaining high accuracy over a wide range of DNN models using flexible precision.

Posted Content
14 Apr 2020
TL;DR: This paper formulates the model PhICNet as a convolutional recurrent neural network which is end-to-end trainable for spatiotemporal evolution prediction of dynamical systems and shows the long-term prediction capability of the model.
Abstract: Spatio-temporal dynamics of physical processes are generally modeled using partial differential equations (PDEs). Though the core dynamics follows some principles of physics, real-world physical processes are often driven by unknown external sources. In such cases, developing a purely analytical model becomes very difficult and data-driven modeling can be of assistance. In this paper, we present a hybrid framework combining physics-based numerical models with deep learning for source identification and forecasting of spatio-temporal dynamical systems with unobservable time-varying external sources. We formulate our model PhICNet as a convolutional recurrent neural network (RNN) which is end-to-end trainable for spatio-temporal evolution prediction of dynamical systems and learns the source behavior as an internal state of the RNN. Experimental results show that the proposed model can forecast the dynamics for a relatively long time and identify the sources as well.

Proceedings ArticleDOI
19 Jul 2020
TL;DR: A Deep Neural Network with Spike Assisted Feature Extraction (SAFE-DNN) to improve robustness of classification under stochastic perturbation of inputs and demonstrates improved noise robustness for multiple DNN architectures without sacrificing accuracy on clean images.
Abstract: We present a Deep Neural Network with Spike Assisted Feature Extraction (SAFE-DNN) to improve robustness of classification under stochastic perturbation of inputs. The proposed network augments a DNN with unsupervised learning of low-level features using spiking neural network (SNN) with spike-timing-dependent plasticity (STDP). The complete network learns to ignore local perturbation while performing global feature detection and classification. The experimental results on CIFAR-10 and ImageNet subset demonstrate improved noise robustness for multiple DNN architectures without sacrificing accuracy on clean images.

Proceedings ArticleDOI
02 Nov 2020
TL;DR: A Hessian based sensitivity metric that can be computed without computing or storing the full Hessian to identify and protect the “important” network parameters while allowing large variations in unprotected parameters is proposed.
Abstract: This paper presents an algorithmic approach to design reliable deep neural networks (DNN) in the presence of stochastic variations in the network parameters induced by process variations in the bit-cells in a processing-in-memory (PIM) architecture. We propose and derive a Hessian based sensitivity metric that can be computed without computing or storing the full Hessian to identify and protect the "important" network parameters while allowing large variations in unprotected parameters. Experiments on modern DNNs like ResNet, MobileNetv2, and DenseNet on CIFAR10 demonstrate that by shielding only a small (1% to 5%) fraction of parameters one can achieve less than 1% accuracy degradation even under large (50%) stochastic variations in other parameters.

Journal ArticleDOI
TL;DR: This article presents comprehensive design and test techniques for emerging M3D-enabled circuits and systems for higher transistor density and more flexibility in designing circuits compared to conventional through silicon via (TSV)-based architectures.
Abstract: Monolithic 3-D (M3D) technology enables unprecedented degrees of integration on a single chip. The minuscule monolithic intertier vias (MIVs) in M3D are the key behind higher transistor density and more flexibility in designing circuits compared to conventional through silicon via (TSV)-based architectures. This article presents comprehensive design and test techniques for emerging M3D-enabled circuits and systems.

Proceedings ArticleDOI
02 Nov 2020
TL;DR: This paper proposes RTL-to-GDS design flow for monolithic 3D ICs (M3D) built with carbon nanotube field-effect transistors and resistive memory and provides a post-route optimization flow, which exploits the full potential of the underlying M3D process design kit (PDK) for power, performance and area (PPA) optimization.
Abstract: In this paper, we propose RTL-to-GDS design flow for monolithic 3D ICs (M3D) built with carbon nanotube field-effect transistors and resistive memory. Our tool flow is based on commercial 2D tools and smart ways to extend them to conduct M3D design and simulation. We provide a post-route optimization flow, which exploits the full potential of the underlying M3D process design kit (PDK) for power, performance and area (PPA) optimization. We also conduct IR-drop and thermal analysis on M3D designs to improve the reliability. To enhance the testability of our M3D designs, we develop design-for-test (DFT) methodologies and integrate a low-overhead built-in self-test module into our design for testing inter-layer vias (ILVs) as well as logic circuitries in the individual tiers. Our benchmark design is RISC-V Rocketcore, which is an open source processor. Our experiments show 8.1% of power, 19.6% of wirelength and 55.7% of area savings with M3D designs at iso-performance compared to its 2D counterpart. In addition, our IR-drop and thermal analyses indicate acceptable power and thermal integrity in our M3D design.

Journal ArticleDOI
TL;DR: The telomere length was shorter among young, non-diabetic,non-smoker MI patients as compared with similar young controls without MI in a South Asian cohort, and may be a potential screening tool for young patients who don't have conventional risk factors.
Abstract: BACKGROUND There is a need to identify novel markers that lead to an early occurrence of myocardial infarction (MI) in the young South Asian population. This population has a different risk profile as compared with others. Telomere length is known to be a marker of aging, and shorter telomeres have been reported in cardiovascular diseases (CVDs). We aimed to identify the association of telomere length in young non-smoker and non-diabetic MI patients. METHODS In a case-control study of 154 subjects (n = 77 cases (ages 18-45 years, non-diabetic, non-smoker patients with MI) and n = 77 age- and sex-matched healthy controls), DNA extraction from peripheral blood leukocytes was carried out and the relative telomere length was estimated by quantitative PCR. The results were adjusted for various demographic parameters like age, gender and body mass index (BMI). The correlation studies were carried out between telomere length, sex and type of MI. RESULTS The relative telomere length was significantly shorter in young MI patients (31-45 years) compared with matched healthy controls (p < 0.0001). Interestingly, in a gender-based comparison, the female patients had shorter telomere length (p < 0.01). CONCLUSION In this pilot study, we found that the telomere length was shorter among young, non-diabetic, non-smoker MI patients as compared with similar young controls without MI in a South Asian cohort. Thus, telomere length may be a potential screening tool for young patients who don't have conventional risk factors. Larger studies are needed to confirm these findings.

Journal ArticleDOI
TL;DR: This paper focuses on the image noise derived from device mismatches in digital pixel circuits with 3D integrated and pixel-parallel readout circuits, and studies the effect of the resulting image noise on the accuracy of a DNN.
Abstract: Digital pixel based image sensors with embedded deep neural networks (DNN) enable many mission critical surveillance applications. However, image noise caused by variations and non-idealities in the sensor degrades image quality and, in turn, the performance of a DNN. We propose a digital pixel-DNN cross-layer simulation methodology for accurate training and evaluation of a DNN under image noise induced by sensors. In particular, this paper focuses on the image noise derived from device mismatches in digital pixel circuits with 3D integrated and pixel-parallel readout circuits, and studies the effect of the resulting image noise on the accuracy of a DNN. The simulation results show that the device mismatch in the digital pixel creates a distinct noise structure on the output image and should be accurately considered while training a DNN. We also present design space explorations using our cross-layer simulation.

Proceedings ArticleDOI
20 Jul 2020
TL;DR: Experimental results show that WarningNet can provide early warning of the performance degradation of different tasks within a fraction of the time required for the task to complete, and can be leveraged to improve the task reliability under adverse condition using on-demand input pre-processing.
Abstract: There is a growing interest in deploying complex deep neural networks (DNN) in autonomous systems to extract task-specific information from real-time sensor data and drive critical tasks. The perturbations in sensor data due to noise or environmental conditions can lead to errors in information extraction and degrade the reliability of the entire autonomous system. This paper presents a light-weight deep learning platform, WarningNet, that operates on sensor data to estimate potential task failures due to spatiotemporal input perturbations. Experimental results show that WarningNet can provide early warning of the performance degradation of different tasks within a fraction of the time required for the task to complete. As a case study, we show that the early warning can be leveraged to improve the task reliability under adverse conditions using on-demand input pre-processing.

Journal ArticleDOI
TL;DR: It is demonstrated that automated I/O library cell generation can reduce the maximum die-to-die communication delay or energy.
Abstract: System-in-package (SiP) integration of multiple dies in a single package can achieve much higher performance than onboard integration of integrated circuits (ICs) while reducing the design cost/effort compared to a large system on chips (SoCs). However, a major challenge in the design of SiPs with many dies is automated design and insertion of input/output (I/O) cells to minimize energy and delay of the wire traces. This article presents an automated cell library generation flow for all-digital I/O circuits for SiP integration. Given parameterized models of SiP wire traces, our method automatically designs, optimizes, and generates layouts of I/O cells for delay/energy minimization. The proposed flow is demonstrated on interposer-based SiP integration considering 28-nm CMOS technology and 65-nm BEOL technology. Given a multidie SiP design and associated interposer wire traces, this article demonstrates that automated I/O library cell generation can reduce the maximum die-to-die communication delay or energy. We demonstrate the proposed flow for various interposer parameters and SiP designs to show the feasibility of chip-interposer codesign.

Proceedings ArticleDOI
01 Oct 2020
TL;DR: In this article, the authors conduct a quantitative comparison between two 2.5D IC designs based on silicon vs. liquid crystal polymer (LCP) interposer technologies in the overall system level for the first time.
Abstract: The optimal selection of an interposer substrate is important in 2.5D systems, because its physical, material and electrical characteristics govern the overall system performance, reliability and cost. Several materials have been proposed that offer various tradeoffs, including silicon, organic, and glass. In this paper, we conduct a quantitative comparison between two 2.5D IC designs based on silicon vs. liquid crystal polymer (LCP) interposer technologies at the overall system level for the first time. We also investigate tradeoffs in power, performance and area (PPA), signal integrity (SI) and power integrity (PI) depending on the interposer technologies. Through our flow, we generate a large-scale benchmark architecture with commercial-grade GDS layouts of interposer and chiplets using two different interposer substrates. Then, we model transmission lines and the power delivery network (PDN) of each 2.5D IC design. Finally, we perform PPA, SI, and PI analysis on both 2.5D IC designs to observe the quantitative tradeoffs between the two designs. Our experiment shows that the silicon interposer-based design has 10.46% less power, 0.25× smaller area and 0.57× shorter average wirelength compared to the LCP interposer-based design. However, the LCP-based design has 0.59× smaller PDN DC impedance and 0.75× shorter worst delay of interposer wire while maintaining the power delivery efficiency. Lastly, our cost analysis of 2.5D IC design indicates that the overall cost of organic LCP technology, if both the chiplets and their interposer costs are combined, is 2.69× higher than that of silicon, even though the cost of the LCP interposer is 1.91% of that of the silicon interposer. This indicates that LCP technology is prohibitive unless the interconnect and bump dimensions are dramatically reduced.

Proceedings ArticleDOI
25 Oct 2020
TL;DR: The uncertainty of a DNN in a closed-loop ROI-based imaging system using model and data uncertainty techniques is characterized and a feedback system that uses uncertainty as a decision metric for selecting the ROIs for feedback is proposed.
Abstract: Measuring reliability of a task controller in an active sensor system is an active research problem particularly for black box controllers such as a Deep Neural Network (DNN). In this paper, we characterize the uncertainty of a DNN in a closed-loop ROI-based imaging system using model and data uncertainty techniques. The uncertainty of the DNN system in different feedback configurations is compared. Additionally, a feedback system that uses uncertainty as a decision metric for selecting the ROIs for feedback is proposed. The hybrid system improves object detection and tracking accuracy on the CAMEL dataset by 1.8% and 5.7% respectively.

Proceedings ArticleDOI
01 Aug 2020
TL;DR: A digital pixel-DNN cross-layer simulation methodology is proposed for accurate training and evaluation of DNN under noise induced from process variations and results show that the process variation in the digital pixel creates distinct noise structure and should be accurately considered while training a DNN.
Abstract: Digital pixel based image sensors with 3D integrated and pixel-parallel read-out integrated circuits (ROIC) show potential for high resolution and high frame rate in many mission critical surveillance applications. However, fixed pattern noise (FPN) caused by process variations of the ROIC degrades image quality and, in turn, the performance of a deep neural network (DNN). This paper studies the effect of process variations in digital pixel circuits and the resulting image noise on the accuracy of a DNN. We propose a digital pixel-DNN cross-layer simulation methodology for accurate training and evaluation of a DNN under noise induced by process variations. The simulation results show that the process variation in the digital pixel creates a distinct noise structure and should be accurately considered while training a DNN. We also present design space explorations using our cross-layer simulation.

Proceedings ArticleDOI
10 Aug 2020
TL;DR: A robust detector is proposed which is capable of detecting exploits aimed at undermining resource allocation fairness through malicious use of the DVFS framework, called BiasP exploit, which restricts the allocation of CPU resources to a set of targeted applications, thereby degrading their performance.
Abstract: Dynamic Voltage and Frequency Scaling (DVFS) plays an integral role in reducing the energy consumption of mobile devices while meeting the targeted performance requirements. We examine the security obliviousness of CPUFreq, the DVFS framework in Linux-kernel based systems. Since Linux-kernel based operating systems are present in a wide array of applications, the high-level CPUFreq policies are designed to be platform-independent. Using these policies, we present the BiasP exploit, which restricts the allocation of CPU resources to a set of targeted applications, thereby degrading their performance. The exploit involves detecting the execution of instructions on the CPU core pertinent to the targeted applications, and thereafter using CPUFreq policies to limit the CPU resources available to those instructions. We demonstrate the practicality of the exploit by operating it on a commercial smartphone running an Android OS based on the Linux kernel. We successfully degrade the User Interface (UI) performance of the targeted applications, increasing the frame processing time and the number of dropped frames by up to 200% and 947%, respectively, for the animations belonging to the targeted applications. We see a reduction of up to 66% in the number of retired instructions of the targeted applications. Furthermore, we propose a robust detector capable of detecting exploits aimed at undermining resource allocation fairness through malicious use of the DVFS framework.

Proceedings ArticleDOI
24 Jan 2020
TL;DR: The MagNet is trained to discover the core dynamics of a multi-agent system from observations, and tuned on-line to learn agent-specific parameters of the dynamics to ensure accurate prediction even when physical or relational attributes of agents, or number of agents change.
Abstract: We present MagNet, a neural network-based multi-agent interaction model to discover the governing dynamics and predict the evolution of a complex multi-agent system from observations. We formulate a multi-agent system as a coupled non-linear network with a generic ordinary differential equation (ODE) based state evolution, and develop a neural network-based realization of its time-discretized model. MagNet is trained to discover the core dynamics of a multi-agent system from observations, and tuned on-line to learn agent-specific parameters of the dynamics to ensure accurate prediction even when the physical or relational attributes of agents, or the number of agents, change. We evaluate MagNet on a point-mass system in two-dimensional space, Kuramoto phase synchronization dynamics, and predator-swarm interaction dynamics, demonstrating orders of magnitude improvement in prediction accuracy over traditional deep learning models.
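To make the idea of a time-discretized, coupled ODE concrete, here is a minimal sketch of one of the benchmark systems named above, the standard Kuramoto model, stepped with forward Euler. It is not MagNet itself: nothing is learned here, and the function names, the Euler scheme, and the parameter values are illustrative assumptions.

```python
import cmath
import math


def kuramoto_step(theta, omega, coupling, dt):
    """One forward-Euler step of d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    n = len(theta)
    new_theta = []
    for i in range(n):
        # Pairwise interaction term: every agent j pulls the phase of agent i.
        interaction = sum(math.sin(theta[j] - theta[i]) for j in range(n))
        dtheta = omega[i] + (coupling / n) * interaction
        new_theta.append(theta[i] + dt * dtheta)
    return new_theta


def order_parameter(theta):
    """Phase coherence r in [0, 1]; r = 1 means all phases coincide."""
    return abs(sum(cmath.exp(1j * t) for t in theta)) / len(theta)


def simulate(theta, omega, coupling, dt, steps):
    """Roll the discretized dynamics forward for a fixed number of steps."""
    for _ in range(steps):
        theta = kuramoto_step(theta, omega, coupling, dt)
    return theta


if __name__ == "__main__":
    start = [0.0, 1.0, 2.0]
    final = simulate(start, omega=[0.0, 0.0, 0.0], coupling=2.0, dt=0.01, steps=2000)
    print("initial coherence:", order_parameter(start))
    print("final coherence:", order_parameter(final))
```

The per-agent update is a sum of pairwise interaction terms plus an agent-specific parameter (here `omega[i]`), which is the kind of structure a learned interaction model would need to recover from observed trajectories.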

Proceedings ArticleDOI
01 Apr 2020
TL;DR: The measurement results indicate up to 25% improvement in response time for aging-induced degradations using an auto-tuning algorithm, and demonstrate that an IVR exhibits a higher tolerance to power stage aging than a DLDO.
Abstract: This paper analyzes the degradation of transient performance of on-chip voltage regulators, namely a digital low dropout regulator (DLDO) and an integrated inductive voltage regulator (IVR), due to negative-bias-temperature-instability (NBTI) induced aging. Improvement of the transient response of the aged systems using post-silicon tuning is also explored. The measurement results from 130 nm and 65 nm CMOS test chips demonstrate that an IVR exhibits a higher tolerance to power stage aging than a DLDO. Furthermore, the measurement results indicate up to 25% improvement in response time for aging-induced degradations using an auto-tuning algorithm.

Journal ArticleDOI
14 Sep 2020
TL;DR: An all-digital flexible precision in-memory accelerator for vector matrix multiplication (VMM) is demonstrated in 65 nm CMOS, supporting flexible precision, floating point, and complex numbers and enabling in-memory radio-frequency machine learning and signal processing computation.
Abstract: An all-digital flexible precision in-memory accelerator for vector matrix multiplication (VMM) is demonstrated in 65 nm CMOS. The design supports flexible precision, floating point, and complex numbers enabling in-memory radio-frequency machine learning and signal processing computation. The measured compute efficiency normalized to memory size is 34 GOPS/W/KB.


Journal ArticleDOI
TL;DR: The LV dysfunction is predominantly due to altered hemodynamics from restricted LV filling, with an additional contribution from rheumatic involvement of basal LV myocardial segments; the improvement in LV GLS after percutaneous balloon mitral valvuloplasty is likely due to an increase in preload.
Abstract: Seventy-five patients with isolated severe MS (mitral valve area: 1.10 ± 0.15 cm²) and pulmonary hypertension underwent regional and global longitudinal strain (GLS) measurements of left (LV) and right ventricle (RV) at baseline and within 48 h after percutaneous balloon mitral valvuloplasty (PBMV). PBMV resulted in significant improvement in LV GLS (−16.35 ± 1.67% vs −19.98 ± 2.17%) and RV GLS (−10.34 ± 2.38% vs −13.83 ± 2.04%), p