
Showing papers by "Shahin Nazarian published in 2020"


Proceedings ArticleDOI
20 Apr 2020
TL;DR: The proposed VRoC, a tweet-level variational autoencoder-based rumor classification system, consistently outperforms several state-of-the-art techniques, on both observed and unobserved rumors, by up to 26.9%, in terms of macro-F1 scores.
Abstract: Social media has become popular and has percolated into almost all aspects of our daily lives. While online posting proves very convenient for individual users, it also fosters the fast spreading of various rumors. The rapid and wide percolation of rumors can cause persistent and detrimental impacts. Therefore, researchers invest great effort in reducing the negative impacts of rumors. Towards this end, a rumor classification system aims to detect, track, and verify rumors in social media. Such systems typically include four components: (i) a rumor detector, (ii) a rumor tracker, (iii) a stance classifier, and (iv) a veracity classifier. In order to improve the state of the art in rumor detection, tracking, and verification, we propose VRoC, a tweet-level variational autoencoder-based rumor classification system. VRoC consists of a co-train engine that jointly trains variational autoencoders (VAEs) and rumor classification components. The co-train engine helps the VAEs tune their latent representations to be classifier-friendly. We also show that VRoC is able to classify unseen rumors with high levels of accuracy. For the PHEME dataset, VRoC consistently outperforms several state-of-the-art techniques, on both observed and unobserved rumors, by up to 26.9% in terms of macro-F1 scores.

42 citations
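To make the co-train idea concrete, here is a minimal PyTorch sketch of a VAE whose latent code also feeds a classifier head, so reconstruction, KL, and classification losses are optimized jointly. The layer sizes, bag-of-words style input, and loss weights are illustrative assumptions, not VRoC's actual configuration.

```python
# Minimal sketch of the VAE + classifier co-training idea behind VRoC (PyTorch).
# Layer sizes, loss weights, and the bag-of-words input are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEClassifier(nn.Module):
    def __init__(self, vocab_size=5000, latent_dim=32, num_classes=2):
        super().__init__()
        self.enc = nn.Linear(vocab_size, 256)
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, vocab_size))
        self.clf = nn.Linear(latent_dim, num_classes)  # classifier reads the latent code

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar, self.clf(z)

def co_train_loss(x, y, x_rec, mu, logvar, logits, beta=1.0, gamma=1.0):
    rec = F.mse_loss(x_rec, x)                                      # reconstruction term
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL divergence term
    ce = F.cross_entropy(logits, y)       # classification term nudges the latent space
    return rec + beta * kld + gamma * ce  # to be classifier-friendly
```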


Journal ArticleDOI
31 Jul 2020
TL;DR: DeepTrust identifies proper multi-layered neural network (NN) topologies that have high projected trust probabilities, even when trained with untrusted data, and shows that an uncertain opinion of data is not always malicious when evaluating an NN's opinion and trustworthiness, whereas the disbelief opinion hurts trust the most.
Abstract: Artificial Intelligence (AI) plays a fundamental role in the modern world, especially when used as an autonomous decision maker. One common concern nowadays is "how trustworthy the AIs are." Human operators follow a strict educational curriculum and performance assessment that could be exploited to quantify how much we entrust them. To quantify the trust of AI decision makers, we must go beyond task accuracy, especially when facing limited, incomplete, misleading, controversial, or noisy datasets. Toward addressing these challenges, we describe DeepTrust, a Subjective Logic (SL) inspired framework that constructs a probabilistic logic description of an AI algorithm and takes into account the trustworthiness of both the dataset and the inner algorithmic workings. DeepTrust identifies proper multi-layered neural network (NN) topologies that have high projected trust probabilities, even when trained with untrusted data. We show that an uncertain opinion of data is not always malicious when evaluating the NN's opinion and trustworthiness, whereas the disbelief opinion hurts trust the most. Also, trust probability does not necessarily correlate with accuracy. DeepTrust also provides a projected trust probability of the NN's prediction, which is useful when the NN generates an over-confident output under problematic datasets. These findings open new analytical avenues for designing and improving the NN topology by optimizing opinion and trustworthiness, along with accuracy, in a multi-objective optimization formulation, subject to space and time constraints.

27 citations
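As background for the Subjective Logic terminology in the abstract, the sketch below shows the standard SL opinion tuple and its projected probability, and why high uncertainty damages projected trust less than high disbelief. How DeepTrust actually propagates opinions through an NN topology is not reproduced here.

```python
# A toy illustration of the Subjective Logic (SL) notions DeepTrust builds on.
# The opinion tuple and projected probability follow standard SL definitions.
from dataclasses import dataclass

@dataclass
class Opinion:
    belief: float        # b
    disbelief: float     # d
    uncertainty: float   # u, with b + d + u = 1
    base_rate: float = 0.5  # a

    def projected_probability(self) -> float:
        # P = b + a*u: the probability expected once uncertainty is resolved
        return self.belief + self.base_rate * self.uncertainty

# Highly uncertain data is not necessarily distrusted,
# while strong disbelief drives projected trust toward zero.
uncertain = Opinion(belief=0.2, disbelief=0.0, uncertainty=0.8)
disbelieved = Opinion(belief=0.1, disbelief=0.8, uncertainty=0.1)
print(uncertain.projected_probability())    # 0.6
print(disbelieved.projected_probability())  # 0.15
```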


Journal ArticleDOI
TL;DR: The proposed H2O-Cloud is highly scalable and considers comprehensive information, such as various workload scenarios, cloud platform configurations, user request information, and dynamic pricing models, to improve resource usage effectiveness while maintaining quality of service (QoS).
Abstract: Cloud computing has attracted both end-users and cloud service providers (CSPs) in recent years. Improving resource utilization rate (RUtR), such as CPU and memory usage on servers, while maintaining quality of service (QoS) is one key challenge faced by CSPs with warehouse-scale datacenters. Prior works proposed various algorithms to reduce energy cost or to improve RUtR, which either lack fine-grained task scheduling capabilities or fail to take a comprehensive system model into consideration. This article presents H2O-Cloud, a Hierarchical and Hybrid Online task scheduling framework for warehouse-scale Cloud service providers, to improve resource usage effectiveness while maintaining QoS. H2O-Cloud is highly scalable and considers comprehensive information, such as various workload scenarios, cloud platform configurations, user request information, and dynamic pricing models. The hierarchy and hybridity of the framework, combined with its deep reinforcement learning (DRL) engines, enable H2O-Cloud to efficiently start on-the-go scheduling and learning in an unpredictable environment without pretraining. Our experiments confirm the high efficiency of the proposed H2O-Cloud when compared to baseline approaches, in terms of energy and cost while maintaining QoS. Compared with a state-of-the-art DRL-based algorithm, H2O-Cloud achieves up to 201.17% energy cost efficiency improvement, 47.88% energy efficiency improvement, and 551.76% reward rate improvement.

16 citations
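The loop below is a minimal tabular Q-learning stand-in for the kind of online, pretraining-free decision making H2O-Cloud's DRL engines perform: assign each arriving task to a server and learn from a reward that trades off energy cost against QoS. The state encoding, action space, and reward shape are simplified assumptions, not the paper's hierarchical design.

```python
# Minimal online scheduling loop: epsilon-greedy tabular Q-learning over server choices.
import random
from collections import defaultdict

NUM_SERVERS = 4
Q = defaultdict(float)          # Q[(state, action)] -> estimated value
alpha, gamma, eps = 0.1, 0.9, 0.1

def choose_server(state):
    if random.random() < eps:                                        # explore
        return random.randrange(NUM_SERVERS)
    return max(range(NUM_SERVERS), key=lambda a: Q[(state, a)])      # exploit

def update(state, action, reward, next_state):
    # reward would combine, e.g., negative energy cost and a QoS penalty
    best_next = max(Q[(next_state, a)] for a in range(NUM_SERVERS))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```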


Proceedings ArticleDOI
TL;DR: This paper aims at integrating three powerful techniques namely Deep Learning, Approximate Computing, and Low Power Design into a strategy to optimize logic at the synthesis level to reduce the dynamic power and area of a digital CMOS circuit.
Abstract: This paper aims at integrating three powerful techniques, namely Deep Learning, Approximate Computing, and Low Power Design, into a strategy to optimize logic at the synthesis level. We utilize advances in deep learning to guide an approximate logic synthesis engine to minimize the dynamic power consumption of a given digital CMOS circuit, subject to a predetermined error rate at the primary outputs. Our framework, Deep-PowerX, focuses on replacing or removing gates on a technology-mapped network and uses a Deep Neural Network (DNN) to predict error rates at primary outputs of the circuit when a specific part of the netlist is approximated. The primary goal of Deep-PowerX is to reduce the dynamic power, whereas area reduction serves as a secondary objective. Using the said DNN, Deep-PowerX is able to reduce the exponential time complexity of standard approximate logic synthesis to linear time. Experiments are done on numerous open-source benchmark circuits. Results show significant reductions in power and area, by up to 1.47× and 1.43× compared to exact solutions and by up to 22% and 27% compared to state-of-the-art approximate logic synthesis tools, while having orders of magnitude lower run-time.

6 citations
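The sketch below illustrates the core loop implied by the abstract: candidate gate replacements or removals are accepted greedily as long as a trained DNN predicts that the primary-output error stays within budget, replacing exhaustive error evaluation. The function signature and callables are hypothetical stand-ins, not Deep-PowerX's actual interface.

```python
# Greedy approximation sketch: a DNN error predictor gates which netlist changes are kept.
def approximate_netlist(candidates, predict_error, power_saving, error_budget):
    """Greedily accept gate approximations while the predicted output error stays in budget.

    candidates: iterable of candidate changes (e.g., replace a gate with a constant)
    predict_error: callable, DNN inference on the set of accepted changes (hypothetical)
    power_saving: callable, estimated dynamic-power saving of one change (hypothetical)
    """
    accepted, current_error = [], 0.0
    # Consider the largest power savers first.
    for change in sorted(candidates, key=power_saving, reverse=True):
        trial_error = predict_error(accepted + [change])  # DNN call instead of simulation
        if trial_error <= error_budget:
            accepted.append(change)
            current_error = trial_error
    return accepted, current_error
```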


Book ChapterDOI
06 Oct 2020
TL;DR: SANSCrypt as mentioned in this paper adopts a new temporal dimension to logic encryption, by requiring the user to sporadically perform multiple authentications according to a protocol based on pseudo-random number generation.
Abstract: Sequential logic encryption is a countermeasure against reverse engineering of sequential circuits based on modifying the original finite state machine of the circuit such that the circuit enters a wrong state upon being reset. A user must apply a certain sequence of input patterns, i.e., a key sequence, for the circuit to transition to the correct state. The circuit then remains functional unless it is powered off or reset again. Most sequential encryption methods require the correct key to be applied only once. In this paper, we propose a novel Sporadic-Authentication-Based Sequential Logic Encryption method (SANSCrypt) that circumvents the potential vulnerability associated with a single-authentication mechanism. SANSCrypt adopts a new temporal dimension to logic encryption, by requiring the user to sporadically perform multiple authentications according to a protocol based on pseudo-random number generation. We provide implementation details of SANSCrypt and present a design that is amenable to time-sensitive applications. In SANSCrypt, the authentication task does not significantly disrupt the normal circuit operation, as it can be interrupted or postponed upon request from a high-priority task with minimal impact on the overall performance. Analysis and validation results on a set of benchmark circuits show that SANSCrypt offers a substantial output corruptibility if the key sequences are applied incorrectly. Moreover, it exhibits exponential resilience to existing attacks, including SAT-based attacks, while maintaining a reasonably low overhead.

4 citations
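A behavioral sketch of the sporadic re-authentication protocol follows: a pseudo-random sequence shared between the circuit and the authorized user decides both when the next authentication is due and which key sequence is expected. The seed handling and key derivation here are illustrative assumptions and do not reflect the actual hardware implementation.

```python
# Behavioral sketch of SANSCrypt-style sporadic authentication (not the hardware design).
import random

class SporadicAuth:
    def __init__(self, seed, key_space=2**16, max_gap=1000):
        self.prng = random.Random(seed)   # stands in for the shared on-chip PRNG
        self.key_space = key_space
        self.max_gap = max_gap
        self._schedule_next()

    def _schedule_next(self):
        # Both the re-authentication time and the expected key are pseudo-random.
        self.cycles_until_auth = self.prng.randrange(1, self.max_gap)
        self.expected_key = self.prng.randrange(self.key_space)

    def tick(self, supplied_key=None):
        """Advance one clock cycle; return False when authentication fails."""
        self.cycles_until_auth -= 1
        if self.cycles_until_auth > 0:
            return True                    # normal operation, no authentication needed
        ok = (supplied_key == self.expected_key)
        self._schedule_next()              # next authentication point is again pseudo-random
        return ok                          # a wrong or missing key corrupts subsequent outputs
```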


Proceedings ArticleDOI
01 Oct 2020
TL;DR: WAANSO, a scalable framework that incorporates a Wavelet Clustering based approach to cluster application tasks, is presented, and it is shown that WAANSO can significantly increase the MCS energy and performance efficiencies.
Abstract: System-on-chip (SoC) has migrated from single core to manycore architectures to cope with the increasing complexity of real-life applications. Application task mapping has a significant impact on the efficiency of manycore system (MCS) computation and communication. We present WAANSO, a scalable framework that incorporates a Wavelet Clustering based approach to cluster application tasks. We also introduce Ant Swarm Optimization (ASO) based on iterative execution of Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) for task clustering and mapping to the MCS processing elements. We have shown that WAANSO can significantly increase the MCS energy and performance efficiencies. Based on our experiments on a 64-core system, WAANSO improves energy efficiency by 19%, compared to baseline approaches, namely DPSO, ACO and branch and bound (B&B). Additionally, the performance improves by 65.86% compared to Density-Based Spatial Clustering of Applications with Noise (DBSCAN) baseline.

3 citations
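The skeleton below captures the Ant Swarm Optimization (ASO) idea of iteratively alternating ACO and PSO passes over a task-to-core mapping and keeping the best mapping found. The step operators and cost model are placeholders rather than the paper's actual formulations.

```python
# Skeleton of the alternating ACO/PSO (Ant Swarm Optimization) search over task mappings.
def ant_swarm_optimize(initial_mapping, cost, aco_step, pso_step, iterations=50):
    """cost, aco_step, pso_step are caller-supplied placeholders for the real operators."""
    best, best_cost = initial_mapping, cost(initial_mapping)
    current = initial_mapping
    for _ in range(iterations):
        for step in (aco_step, pso_step):   # iterative execution of ACO then PSO
            current = step(current)
            c = cost(current)               # e.g. weighted energy + communication latency
            if c < best_cost:
                best, best_cost = current, c
    return best, best_cost
```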


Proceedings ArticleDOI
10 Aug 2020
TL;DR: Deep-PowerX as discussed by the authors uses a Deep Neural Network (DNN) to predict error rates at primary outputs of the circuit when a specific part of the netlist is approximated.
Abstract: This paper aims at integrating three powerful techniques, namely Deep Learning, Approximate Computing, and Low Power Design, into a strategy to optimize logic at the synthesis level. We utilize advances in deep learning to guide an approximate logic synthesis engine to minimize the dynamic power consumption of a given digital CMOS circuit, subject to a predetermined error rate at the primary outputs. Our framework, Deep-PowerX, focuses on replacing or removing gates on a technology-mapped network and uses a Deep Neural Network (DNN) to predict error rates at primary outputs of the circuit when a specific part of the netlist is approximated. The primary goal of Deep-PowerX is to reduce the dynamic power, whereas area reduction serves as a secondary objective. Using the said DNN, Deep-PowerX is able to reduce the exponential time complexity of standard approximate logic synthesis to linear time. Experiments are done on numerous open-source benchmark circuits. Results show significant reductions in power and area, by up to 1.47× and 1.43× compared to exact solutions and by up to 22% and 27% compared to state-of-the-art approximate logic synthesis tools, while having orders of magnitude lower run-time.

3 citations


Proceedings ArticleDOI
12 Oct 2020
TL;DR: In this article, the authors propose a framework for the design and optimization of a secure self-optimizing, self-adapting system-on-chip (S4oC) architecture.
Abstract: We propose a framework for the design and optimization of a secure self-optimizing, self-adapting system-on-chip (S4oC) architecture. The goal is to minimize the impact of attacks such as hardware Trojans and side-channel attacks by making real-time adjustments. S4oC learns to reconfigure itself, subject to various security measures and attacks, some of which are possibly unknown at design time. Furthermore, the data types and patterns of the target applications, environmental conditions, and sources of variations are incorporated. S4oC is a manycore system, modeled as a four-layer graph representing the model of computation (MoCp), model of connection (MoCn), model of memory (MoM), and model of storage (MoS), with a large number of elements including heterogeneous reconfigurable processing elements in the MoCp layer and memory elements in the MoM layer. Security-driven community detection and neural networks are utilized for application task clustering, and distributed reinforcement learning (RL) is used for task mapping.

3 citations
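To illustrate the four-layer graph model (MoCp, MoCn, MoM, MoS), here is a toy data structure with layered elements and cross-layer edges. The attributes and example elements are assumptions for illustration; the framework's actual schema is not described in the abstract.

```python
# Toy data model for a four-layer S4oC-style graph (computation, connection, memory, storage).
from collections import defaultdict

class S4oCModel:
    LAYERS = ("MoCp", "MoCn", "MoM", "MoS")

    def __init__(self):
        self.nodes = {layer: {} for layer in self.LAYERS}
        self.edges = defaultdict(list)   # (layer, node) -> list of connected (layer, node)

    def add_element(self, layer, name, **attrs):
        self.nodes[layer][name] = attrs  # e.g. a reconfigurable PE in MoCp, an SRAM bank in MoM

    def connect(self, a, b):
        self.edges[a].append(b)          # cross-layer edge, e.g. PE -> memory bank
        self.edges[b].append(a)

soc = S4oCModel()
soc.add_element("MoCp", "pe0", kind="reconfigurable")   # illustrative attributes
soc.add_element("MoM", "sram0", size_kb=512)
soc.connect(("MoCp", "pe0"), ("MoM", "sram0"))
```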


Posted Content
TL;DR: This work proposes a framework for the design and optimization of a secure self-optimizing, self-adapting system-on-chip (S4oC) architecture, to minimize the impact of attacks such as hardware Trojans and side-channel attacks by making real-time adjustments.
Abstract: We propose a framework for the design and optimization of a secure self-optimizing, self-adapting system-on-chip (S4oC) architecture. The goal is to minimize the impact of attacks such as hardware Trojans and side-channel attacks by making real-time adjustments. S4oC learns to reconfigure itself, subject to various security measures and attacks, some of which are possibly unknown at design time. Furthermore, the data types and patterns of the target applications, environmental conditions, and sources of variations are incorporated. S4oC is a manycore system, modeled as a four-layer graph representing the model of computation (MoCp), model of connection (MoCn), model of memory (MoM), and model of storage (MoS), with a large number of elements including heterogeneous reconfigurable processing elements in the MoCp layer and memory elements in the MoM layer. Security-driven community detection and neural networks are utilized for application task clustering, and distributed reinforcement learning (RL) is used for task mapping.

3 citations


Proceedings ArticleDOI
12 Feb 2020
TL;DR: In this article, the authors propose a method that replaces the augmentation in the raw input space with an approximate one that acts purely in the embedding space, which drastically reduces the computation, while the accuracy of models is negligibly compromised.
Abstract: Recent advances in the field of artificial intelligence have been made possible by deep neural networks. In applications where data are scarce, transfer learning and data augmentation techniques are commonly used to improve the generalization of deep learning models. However, fine-tuning a transfer model with data augmentation in the raw input space has a high computational cost, since the full network must be run for every augmented input. This is particularly critical when large models are implemented on embedded devices with limited computational and energy resources. In this work, we propose a method that replaces the augmentation in the raw input space with an approximate one that acts purely in the embedding space. Our experimental results show that the proposed method drastically reduces the computation, while the accuracy of models is negligibly compromised.

3 citations
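The sketch below shows the basic trick: run the frozen backbone once per sample, cache the embeddings, and fine-tune only the head on cheap perturbations applied directly in embedding space. The Gaussian noise used here is an illustrative stand-in for the paper's approximate augmentation.

```python
# Embedding-space augmentation sketch (PyTorch): the expensive backbone runs once per sample.
import torch
import torch.nn as nn

def finetune_head(backbone, head, images, labels, epochs=5, noise_std=0.1, lr=1e-3):
    backbone.eval()
    with torch.no_grad():
        emb = backbone(images)             # full network runs only once per sample
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        aug = emb + noise_std * torch.randn_like(emb)  # augmentation acts purely in embedding space
        loss = loss_fn(head(aug), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head
```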


Proceedings ArticleDOI
TL;DR: Analysis and validation results show that SANSCrypt offers a substantial output corruptibility if the key sequences are applied incorrectly, and exhibits an exponential resilience to existing attacks, including SAT-based attacks, while maintaining a reasonably low overhead.
Abstract: We propose SANSCrypt, a novel sequential logic encryption scheme to protect integrated circuits against reverse engineering. Previous sequential encryption methods focus on modifying the circuit state machine such that the correct functionality can be accessed by applying the correct key sequence only once. Considering the risk associated with one-time authentication, SANSCrypt adopts a new temporal dimension to logic encryption, by requiring the user to sporadically perform multiple authentications according to a protocol based on pseudo-random number generation. Analysis and validation results on a set of benchmark circuits show that SANSCrypt offers a substantial output corruptibility if the key sequences are applied incorrectly. Moreover, it exhibits an exponential resilience to existing attacks, including SAT-based attacks, while maintaining a reasonably low overhead.

Proceedings ArticleDOI
05 Oct 2020
TL;DR: SANSCrypt as mentioned in this paper adopts a new temporal dimension to logic encryption, by requiring the user to sporadically perform multiple authentications according to a protocol based on pseudorandom number generation.
Abstract: We propose SANSCrypt, a novel sequential logic encryption scheme to protect integrated circuits against reverse engineering. Previous sequential encryption methods focus on modifying the circuit state machine such that the correct functionality can be accessed by applying the correct key sequence only once. Considering the risk associated with one-time authentication, SANSCrypt adopts a new temporal dimension to logic encryption, by requiring the user to sporadically perform multiple authentications according to a protocol based on pseudorandom number generation. Analysis and validation results on a set of benchmark circuits show that SANSCrypt offers a substantial output corruptibility if the key sequences are applied incorrectly. Moreover, it exhibits an exponential resilience to existing attacks, including SAT-based attacks, while maintaining a reasonably low overhead.

Posted Content
TL;DR: A vertex cut framework is designed for partitioning LLVM IR graphs into clusters while taking into consideration the data communication and workload balance among clusters, and a memory-centric run-time mapping algorithm of linear time complexity is proposed to map the resulting clusters onto a multi-core platform.
Abstract: High-level applications, such as machine learning, are evolving from simple models based on multilayer perceptrons for simple image recognition to much deeper and more complex neural networks for self-driving vehicle control systems. The rapid increase in the consumption of memory and computational resources by these models demands the use of multi-core parallel systems to scale the execution of the complex emerging applications that depend on them. However, parallel programs running on high-performance computers often suffer from data communication bottlenecks, limited memory bandwidth, and synchronization overhead due to irregular critical sections. In this paper, we propose a framework to reduce the data communication and improve the scalability and performance of these applications in multi-core systems. We design a vertex cut framework for partitioning LLVM IR graphs into clusters while taking into consideration the data communication and workload balance among clusters. First, we construct LLVM graphs by compiling high-level programs into LLVM IR, instrumenting code to obtain the execution order of basic blocks and the execution time for each memory operation, and analyzing data dependencies in dynamic LLVM traces. Next, we formulate the problem as Weight Balanced $p$-way Vertex Cut, and propose a generic and flexible framework, wherein four different greedy algorithms are proposed for solving this problem. Lastly, we propose a memory-centric run-time mapping algorithm of linear time complexity to map clusters generated from the vertex cut algorithms onto a multi-core platform. We conclude that our best algorithm, WB-Libra, provides performance improvements of 1.56x and 1.86x over existing state-of-the-art approaches for 8 and 1024 clusters running on a multi-core platform, respectively.
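A compact greedy heuristic in the spirit of the weight-balanced p-way vertex cut formulation is sketched below: dependency edges between LLVM IR blocks are assigned to clusters, vertices whose edges span several clusters are replicated, and cluster load is balanced by edge weight. This is a generic vertex-cut greedy, not the WB-Libra algorithm itself.

```python
# Generic greedy p-way vertex cut: assign edges to clusters, replicate shared vertices.
from collections import defaultdict

def greedy_vertex_cut(edges, weights, p):
    load = [0.0] * p
    placed = defaultdict(set)        # vertex -> set of clusters it appears in (replicas)
    assignment = {}
    for (u, v) in edges:
        # Prefer clusters that already hold u or v (fewer replicas); break ties by load.
        candidates = (placed[u] & placed[v]) or (placed[u] | placed[v]) or set(range(p))
        c = min(candidates, key=lambda k: load[k])
        assignment[(u, v)] = c
        placed[u].add(c)
        placed[v].add(c)
        load[c] += weights.get((u, v), 1.0)   # edge weight ~ communication/computation cost
    replication = sum(len(s) for s in placed.values()) / max(len(placed), 1)
    return assignment, load, replication
```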

Posted Content
TL;DR: The Multi-Cycle Input Dependency (MCID) circuit model is presented, a novel design representation that explicitly captures the dependency of a circuit's primary outputs on sequences of internal signals and inputs.
Abstract: Traditional logical equivalence checking (LEC), which plays a major role in the entire chip design process, faces challenges in meeting the requirements demanded by the many emerging technologies that are based on logic models different from standard complementary metal oxide semiconductor (CMOS). In this paper, we propose a LEC framework to be employed in the verification process of beyond-CMOS circuits. Our LEC framework is compatible with existing CMOS technologies, but is also able to check features and capabilities that are unique to beyond-CMOS technologies. For instance, the performance of some emerging technologies benefits from ultra-deep pipelining, and verification of such circuits requires new models and algorithms. We therefore present the Multi-Cycle Input Dependency (MCID) circuit model, a novel design representation that explicitly captures the dependency of a circuit's primary outputs on sequences of internal signals and inputs. Embedding the proposed circuit model and several structural checking modules, the verification process can be independent of the underlying technology and signaling. We benchmark the proposed framework on post-synthesis rapid single-flux-quantum (RSFQ) netlists. Results show verification times for RSFQ benchmark circuits, including a 32-bit Kogge-Stone adder, a 16-bit integer divider, and ISCAS'85 circuits, that are comparable to those of the ABC tool on similar CMOS circuits.
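As a toy rendering of the MCID idea, the snippet below records, for each primary output, the set of inputs and internal signals it depends on together with a relative cycle offset, and compares two such dependency maps. The representation is purely illustrative of the concept, not the framework's data model.

```python
# Toy MCID-style dependency maps: output -> {(signal, cycle_offset)}, compared for equivalence.
def mcid_equivalent(model_a, model_b):
    """Each model maps an output name to a set of (signal, cycle_offset) pairs."""
    if model_a.keys() != model_b.keys():
        return False
    return all(model_a[out] == model_b[out] for out in model_a)

cmos_ref = {"sum": {("a", 0), ("b", 0), ("cin", 0)}}
rsfq_impl = {"sum": {("a", 0), ("b", 0), ("cin", 0)}}   # same dependencies despite deep pipelining
print(mcid_equivalent(cmos_ref, rsfq_impl))             # True
```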

Proceedings ArticleDOI
25 Mar 2020
TL;DR: NN-PARS as discussed by the authors is a neural network-based and parallelized circuit simulation framework with optimized event-driven scheduling of simulation tasks to maximize concurrency, according to the underlying GPU parallel processing capabilities.
Abstract: The shrinking of transistor geometries, as well as the increasing complexity of integrated circuits, significantly aggravates nonlinear design behavior. This demands accurate and fast circuit simulation to meet the design quality and time-to-market constraints. The existing circuit simulators, which utilize lookup tables and/or closed-form expressions, are either slow or inaccurate in analyzing the nonlinear behavior of designs with billions of transistors. To address these shortcomings, we present NN-PARS, a neural network (NN) based and parallelized circuit simulation framework with optimized event-driven scheduling of simulation tasks to maximize concurrency, according to the underlying GPU parallel processing capabilities. NN-PARS replaces the required memory queries in traditional techniques with parallelized NN-based computation tasks. Experimental results show that compared to a state-of-the-art current-based simulation method, NN-PARS reduces the simulation time by over two orders of magnitude in large circuits. NN-PARS also provides high accuracy levels in signal waveform calculations, with less than 2% error compared to HSPICE.
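The event-driven scheduling idea can be sketched as follows: simulation events sit in a time-ordered queue, and all events that become ready at the same timestamp are grouped so that their NN-based evaluations can be issued as one batched, GPU-friendly call. The event fields and the evaluate_batch callable are assumptions for illustration.

```python
# Event-driven scheduling sketch: batch all same-timestamp events for one parallel NN evaluation.
import heapq

def run_event_driven(initial_events, evaluate_batch):
    """initial_events: iterable of (time, gate_id, payload); evaluate_batch returns fan-out events."""
    queue = list(initial_events)
    heapq.heapify(queue)
    while queue:
        t = queue[0][0]
        batch = []
        while queue and queue[0][0] == t:          # gather every event scheduled for time t
            batch.append(heapq.heappop(queue))
        for new_event in evaluate_batch(batch):    # one batched NN inference for the whole group
            heapq.heappush(queue, new_event)       # fan-out events at later timestamps
```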

Posted Content
TL;DR: NN-PARS is presented, a neural network (NN) based and parallelized circuit simulation framework with optimized event-driven scheduling of simulation tasks to maximize concurrency, according to the underlying GPU parallel processing capabilities.
Abstract: The shrinking of transistor geometries, as well as the increasing complexity of integrated circuits, significantly aggravates nonlinear design behavior. This demands accurate and fast circuit simulation to meet the design quality and time-to-market constraints. The existing circuit simulators, which utilize lookup tables and/or closed-form expressions, are either slow or inaccurate in analyzing the nonlinear behavior of designs with billions of transistors. To address these shortcomings, we present NN-PARS, a neural network (NN) based and parallelized circuit simulation framework with optimized event-driven scheduling of simulation tasks to maximize concurrency, according to the underlying GPU parallel processing capabilities. NN-PARS replaces the required memory queries in traditional techniques with parallelized NN-based computation tasks. Experimental results show that compared to a state-of-the-art current-based simulation method, NN-PARS reduces the simulation time by over two orders of magnitude in large circuits. NN-PARS also provides high accuracy levels in signal waveform calculations, with less than $2\%$ error compared to HSPICE.

Posted Content
TL;DR: WAANSO as discussed by the authors is a scalable framework that incorporates a Wavelet clustering based approach to cluster application tasks and introduces Ant Swarm Optimization (ASO) based on iterative execution of ACO and PSO for task clustering and mapping to MCS processing elements.
Abstract: System-on-chip (SoC) has migrated from single core to manycore architectures to cope with the increasing complexity of real-life applications. Application task mapping has a significant impact on the efficiency of manycore system (MCS) computation and communication. We present WAANSO, a scalable framework that incorporates a Wavelet Clustering based approach to cluster application tasks. We also introduce Ant Swarm Optimization (ASO) based on iterative execution of Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) for task clustering and mapping to the MCS processing elements. We have shown that WAANSO can significantly increase the MCS energy and performance efficiencies. Based on our experiments on a 64-core system, WAANSO improves energy efficiency by 19%, compared to baseline approaches, namely DPSO, ACO and branch and bound (B&B). Additionally, the performance improves by 65.86% compared to Density-Based Spatial Clustering of Applications with Noise (DBSCAN) baseline.

Posted Content
TL;DR: This work proposes a method that replaces the augmentation in the raw input space with an approximate one that acts purely in the embedding space, and results show that the proposed method drastically reduces the computation, while the accuracy of models is negligibly compromised.
Abstract: Recent advances in the field of artificial intelligence have been made possible by deep neural networks. In applications where data are scarce, transfer learning and data augmentation techniques are commonly used to improve the generalization of deep learning models. However, fine-tuning a transfer model with data augmentation in the raw input space has a high computational cost, since the full network must be run for every augmented input. This is particularly critical when large models are implemented on embedded devices with limited computational and energy resources. In this work, we propose a method that replaces the augmentation in the raw input space with an approximate one that acts purely in the embedding space. Our experimental results show that the proposed method drastically reduces the computation, while the accuracy of models is negligibly compromised.

Posted Content
TL;DR: CSM-NN as discussed by the authors is a scalable simulation framework with optimized neural network structures and processing algorithms, aimed at optimizing the simulation time by accounting for the latency of the required memory query and computation, given the underlying CPU and GPU parallel processing capabilities.
Abstract: The miniaturization of transistors down to 5nm and beyond, plus the increasing complexity of integrated circuits, significantly aggravates short channel effects and demands analysis and optimization of more design corners and modes. Simulators need to model output variables related to circuit timing, power, noise, etc., which exhibit nonlinear behavior. The existing simulation and sign-off tools, based on a combination of closed-form expressions and lookup tables, are either inaccurate or slow when dealing with circuits with billions of transistors. In this work, we present CSM-NN, a scalable simulation framework with optimized neural network structures and processing algorithms. CSM-NN is aimed at optimizing the simulation time by accounting for the latency of the required memory query and computation, given the underlying CPU and GPU parallel processing capabilities. Experimental results show that CSM-NN reduces the simulation time by up to $6\times$ compared to a state-of-the-art current source model based simulator running on a CPU. This speedup improves by up to $15\times$ when running on a GPU. CSM-NN also provides high accuracy levels, with less than $2\%$ error, compared to HSPICE.
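As a minimal stand-in for the per-cell models such a simulator would use, the sketch below defines a small MLP that maps operating conditions (input slew, output load, supply voltage) to timing quantities instead of interpolating lookup tables. The layer sizes and the choice of inputs and outputs are assumptions for illustration, not CSM-NN's actual network structure.

```python
# Tiny per-cell timing model (PyTorch): an MLP in place of lookup-table interpolation.
import torch
import torch.nn as nn

class CellTimingNN(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),   # inputs: input slew, output load, supply voltage
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),              # outputs: propagation delay, output slew
        )

    def forward(self, x):
        return self.net(x)

model = CellTimingNN()
conditions = torch.tensor([[0.05, 1.2, 0.7]])  # illustrative operating point
delay, out_slew = model(conditions)[0]          # untrained here; training data would come from SPICE
```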