Author

Massoud Pedram

Bio: Massoud Pedram is an academic researcher at the University of Southern California. The author has contributed to research on topics including deep learning and logic gates. The author has an h-index of 14 and has co-authored 36 publications receiving 464 citations.

Papers
Journal ArticleDOI
TL;DR: This paper proposes JointDNN, an efficient, adaptive, and practical engine for collaborative computation between a mobile device and the cloud for DNNs in both the inference and training phases.
Abstract: Deep learning models are being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants, autonomous cars, and smart home services, often employ either simple local models on the mobile device or complex remote models on the cloud. However, recent studies have shown that partitioning the DNN computations between the mobile device and the cloud can improve latency and energy efficiency. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and the cloud for DNNs in both the inference and training phases. JointDNN not only provides an energy- and performance-efficient method of querying DNNs for the mobile side but also benefits the cloud server by reducing its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward- and backward-propagations in DNNs, which can adapt to mobile battery limitations, cloud server load constraints, and quality of service. JointDNN achieves reductions of up to 18 and 32 times in the latency and mobile energy consumption of querying DNNs, respectively, compared to the status-quo approaches.

162 citations
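
The layer-granularity partitioning idea in the JointDNN abstract above can be illustrated with a small sketch. This is not the paper's optimization formulation: it assumes a purely sequential network, a single split point, and hypothetical per-layer timings and output sizes (`mobile_ms`, `cloud_ms`, `out_bytes`, `input_bytes`, and `bytes_per_ms` are all illustrative names). It also ignores the cost of sending the result back and considers latency only, not energy.

```python
from typing import List, Tuple


def best_split(
    mobile_ms: List[float],   # assumed latency of each layer on the mobile device
    cloud_ms: List[float],    # assumed latency of each layer on the cloud server
    out_bytes: List[float],   # assumed output size of each layer
    input_bytes: float,       # assumed size of the raw network input
    bytes_per_ms: float,      # assumed uplink throughput
) -> Tuple[int, float]:
    """Return (k, latency): layers [0, k) run on mobile, layers [k, n) on cloud."""
    n = len(mobile_ms)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        if k == n:
            upload = 0.0               # everything runs locally, nothing is sent
        elif k == 0:
            upload = input_bytes       # cloud-only: send the raw input
        else:
            upload = out_bytes[k - 1]  # send the output of the last mobile layer
        cost = sum(mobile_ms[:k]) + upload / bytes_per_ms + sum(cloud_ms[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


if __name__ == "__main__":
    # Hypothetical three-layer network whose features shrink with depth.
    print(best_split([4, 6, 20], [1, 1, 2], [1000, 100, 10], 5000, 100))
```

With these made-up numbers the cheapest choice is to run the first two layers on the device, because their small output is cheap to upload while the expensive third layer is much faster on the cloud.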

Proceedings ArticleDOI
29 Jul 2019
TL;DR: This paper introduces BottleNet, a deep learning architecture that reduces the feature size sent to the cloud, together with a training method that compensates for the potential accuracy loss due to lossy compression of the features before transmission. BottleNet achieves on average a 5.1× improvement in end-to-end latency and a 6.9× improvement in mobile energy consumption compared with the cloud-only approach, with no accuracy loss.
Abstract: Recent studies have shown the latency and energy consumption of deep neural networks can be significantly improved by splitting the network between the mobile device and cloud. This paper introduces a new deep learning architecture, called BottleNet, for reducing the feature size needed to be sent to the cloud. Furthermore, we propose a training method for compensating for the potential accuracy loss due to the lossy compression of features before transmitting them to the cloud. BottleNet achieves on average 5.1× improvement in end-to-end latency and 6.9× improvement in mobile energy consumption compared to the cloud-only approach with no accuracy loss.

137 citations

Posted Content
TL;DR: This paper proposes JointDNN, an efficient, adaptive, and practical engine for collaborative computation between a mobile device and the cloud for DNNs in both the inference and training phases, and achieves reductions of up to 18 and 32 times in the latency and mobile energy consumption of querying DNNs, respectively, compared to the status-quo approaches.
Abstract: Deep learning models are being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants, autonomous cars, and smart home services, often employ either simple local models on the mobile device or complex remote models on the cloud. However, recent studies have shown that partitioning the DNN computations between the mobile device and the cloud can improve latency and energy efficiency. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and the cloud for DNNs in both the inference and training phases. JointDNN not only provides an energy- and performance-efficient method of querying DNNs for the mobile side but also benefits the cloud server by reducing its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward- and backward-propagations in DNNs, which can adapt to mobile battery limitations, cloud server load constraints, and quality of service. JointDNN achieves reductions of up to 18 and 32 times in the latency and mobile energy consumption of querying DNNs, respectively, compared to the status-quo approaches.

91 citations

Journal ArticleDOI
TL;DR: A dynamic programming based algorithm for path balancing technology mapping is presented, which generates optimal solutions for dc-biased SFQ (e.g., rapid SFQ or RSFQ) circuits with tree structure and acts as an effective heuristic for circuits with general directed acyclic graph structure.
Abstract: This paper presents a path balancing technology mapping algorithm, which is a new algorithm for generating a mapping solution for a given Boolean network such that the average logic level difference among fanin gates of each gate in the network is minimized. Path balancing technology mapping is required in dc-biased single flux quantum (SFQ) circuits to guarantee correct operation, and it is beneficial in CMOS circuits for reducing hazard issues. We present a dynamic programming based algorithm for path balancing technology mapping, which generates optimal solutions for dc-biased SFQ (e.g., rapid SFQ or RSFQ) circuits with tree structure and acts as an effective heuristic for circuits with general directed acyclic graph structure. Experimental results show that our path balancing technology mapper reduces the balancing overhead by up to 2.7× and by an average of 21% compared to the state-of-the-art academic technology mappers.

43 citations
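
To make the notion of balancing overhead in the abstract above concrete, the following sketch counts how many balancing buffers a given mapped netlist would need. It is only a cost-measurement illustration, not the paper's dynamic programming mapper. It assumes that primary inputs sit at logic level 0, that every missing level on a fanin edge costs one buffer, and that buffer chains are not shared between fanouts; `fanins` and both function names are hypothetical.

```python
from typing import Dict, List


def logic_levels(fanins: Dict[str, List[str]]) -> Dict[str, int]:
    """Logic level of each node: 0 for nodes without fanins, else 1 + max fanin level."""
    levels: Dict[str, int] = {}

    def level(node: str) -> int:
        # Recursive with memoization; fine for small illustrative netlists.
        if node not in levels:
            ins = fanins.get(node, [])
            levels[node] = 0 if not ins else 1 + max(level(i) for i in ins)
        return levels[node]

    for node in fanins:
        level(node)
    return levels


def balancing_overhead(fanins: Dict[str, List[str]]) -> int:
    """Total buffers needed so that all fanins of every gate arrive at the same level."""
    levels = logic_levels(fanins)
    return sum(
        levels[gate] - levels[src] - 1  # gap between a fanin and the level just below the gate
        for gate, ins in fanins.items()
        for src in ins
    )


if __name__ == "__main__":
    # g2 reads g1 (level 1) and primary input c (level 0), so c needs one buffer.
    netlist = {"a": [], "b": [], "c": [], "g1": ["a", "b"], "g2": ["g1", "c"]}
    print(balancing_overhead(netlist))
```

A mapper that minimizes this count prefers solutions in which the fanins of each gate already sit at similar logic levels.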

Proceedings ArticleDOI
27 May 2018
TL;DR: A novel technology mapping tool, called SFQmap, is presented, which provides optimization methods for minimizing first the circuit depth and path balancing overhead and then the worst-case stage delay of mapped SFQ circuits.
Abstract: Single flux quantum (SFQ) logic is a promising candidate to replace CMOS logic for high-speed and low-power applications due to its superiority in providing high-performance and energy-efficient circuits. However, developing effective Electronic Design Automation (EDA) tools, which cater to special characteristics and requirements of SFQ circuits such as depth minimization and path balancing, is essential to automate the whole process of designing large SFQ circuits. In this paper, a novel technology mapping tool, called SFQmap, is presented, which provides optimization methods for minimizing first the circuit depth and path balancing overhead and then the worst-case stage delay of mapped SFQ circuits. Compared with the state-of-the-art technology mappers, SFQmap reduces the depth and path balancing overhead by an average of 14% and 31%, respectively.

37 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, the authors propose Edgent, a framework that leverages edge computing for DNN collaborative inference through device-edge synergy, which adaptively partitions computation between device and edge for the purpose of coordinating the powerful cloud resource and the proximal edge resource for real-time DNN inference.
Abstract: As a key technology for enabling Artificial Intelligence (AI) applications in the 5G era, Deep Neural Networks (DNNs) have quickly attracted widespread attention. However, it is challenging to run computation-intensive DNN-based tasks on mobile devices due to their limited computation resources. What's worse, traditional cloud-assisted DNN inference is heavily hindered by the significant wide-area network latency, leading to poor real-time performance as well as low quality of user experience. To address these challenges, in this paper, we propose Edgent, a framework that leverages edge computing for DNN collaborative inference through device-edge synergy. Edgent exploits two design knobs: (1) DNN partitioning that adaptively partitions computation between device and edge for the purpose of coordinating the powerful cloud resource and the proximal edge resource for real-time DNN inference; (2) DNN right-sizing that further reduces computing latency via early exiting inference at an appropriate intermediate DNN layer. In addition, considering the potential network fluctuation in real-world deployment, Edgent is properly designed to specialize for both static and dynamic network environments. Specifically, in a static environment where the bandwidth changes slowly, Edgent derives the best configurations with the assistance of regression-based prediction models, while in a dynamic environment where the bandwidth varies dramatically, Edgent generates the best execution plan through an online change point detection algorithm that maps the current bandwidth state to the optimal configuration. We implement an Edgent prototype based on the Raspberry Pi and a desktop PC, and extensive experimental evaluations demonstrate Edgent's effectiveness in enabling on-demand low-latency edge intelligence.

329 citations

Journal ArticleDOI
01 Dec 2019
TL;DR: A conceptual model for cloud futurology is proposed in this article to explore the influence of emerging paradigms and technologies on the evolution of cloud computing. However, the model is limited to three technologies: Blockchain, IoT, and Artificial Intelligence.
Abstract: Cloud computing plays a critical role in modern society and enables a range of applications from infrastructure to social media. Such systems must cope with varying load and evolving usage, reflecting society's interaction with and dependency on automated computing systems, whilst satisfying Quality of Service (QoS) guarantees. Enabling these systems is a cohort of conceptual technologies, synthesized to meet the demands of evolving computing applications. In order to understand the current and future challenges of such systems, there is a need to identify the key technologies enabling future applications. In this study, we aim to explore how three emerging paradigms (Blockchain, IoT, and Artificial Intelligence) will influence future cloud computing systems. Further, we identify several technologies driving these paradigms and invite international experts to discuss the current status and future directions of cloud computing. Finally, we propose a conceptual model for cloud futurology to explore the influence of emerging paradigms and technologies on the evolution of cloud computing.

247 citations

01 Jun 1961
TL;DR: In this article, the Ashenhurst chart method is generalized to nondisjunctive decompositions by means of the don't care conditions. Such decompositions lead to designs of more economical switching circuits to realize the given switching function.
Abstract: A given switching function of n variables can frequently be decomposed into a composite function of several essentially simpler switching functions. Such decompositions lead to designs of more economical switching circuits to realize the given switching function. Ashenhurst's chart method is generalized to nondisjunctive decompositions by means of the don't care conditions. This extension provides an effective method of constructing all decompositions of switching functions. (Author)

227 citations

Journal ArticleDOI
TL;DR: The overall objective of this survey is to give microprocessor designers a broad perspective on various aspects of designing thermal-aware microprocessors and to guide future thermal management studies.
Abstract: Microprocessor design has recently encountered many constraints such as power, energy, reliability, and temperature. Among these challenging issues, temperature-related issues have become especially important within the past several years. We summarize recent thermal management techniques for microprocessors, focusing on those that affect or rely on the microarchitecture. We categorize thermal management techniques into six main categories: temperature monitoring, microarchitectural techniques, floorplanning, OS/compiler techniques, liquid cooling techniques, and thermal reliability/security. Temperature monitoring, a requirement for Dynamic Thermal Management (DTM), includes temperature estimation and sensor placement techniques for accurate temperature measurement or estimation. Microarchitectural techniques include both static and dynamic thermal management techniques that control hardware structures. Floorplanning covers a range of thermal-aware floorplanning techniques for 2D and 3D microprocessors. OS/compiler techniques include thermal-aware task scheduling and instruction scheduling techniques. Liquid cooling techniques are higher-capacity alternatives to conventional air cooling techniques. Thermal reliability/security issues cover temperature-dependent reliability modeling, Dynamic Reliability Management (DRM), and malicious codes that specifically cause overheating. Temperature-related issues will only become more challenging as process technology continues to evolve and transistor densities scale up faster than power per transistor scales down. The overall objective of this survey is to give microprocessor designers a broad perspective on various aspects of designing thermal-aware microprocessors and to guide future thermal management studies.

201 citations

Proceedings ArticleDOI
05 Nov 2007
TL;DR: It is proved that the problem of performance optimization for a set of periodic tasks with discrete voltage/frequency states under thermal constraints is NP-hard, and a pseudo-polynomial optimal algorithm and a fully polynomial time approximation technique (FPTAS) are presented.
Abstract: The paper addresses the problem of performance optimization for a set of periodic tasks with discrete voltage/frequency states under thermal constraints. We prove that the problem is NP-hard, and present a pseudo-polynomial optimal algorithm and a fully polynomial time approximation technique (FPTAS) for the problem. The FPTAS technique is able to generate solutions in polynomial time that are guaranteed to be within a designer-specified quality bound (QB) (say, within 1% of the optimal). We evaluate our techniques by experimentation with multimedia and synthetic benchmarks mapped on a 70 nm CMOS technology processor. The experimental results demonstrate that our techniques are able to match optimal solutions when QB is set at 5%, and can generate solutions that are quite close to optimal ( 25%) for large task sets with 120 nodes (while the optimal solution takes several hundred seconds). We also analyze the effect of different thermal parameters, such as the initial temperature, the final temperature, and the thermal resistance.

181 citations