Author

Andreas Gerstlauer

Bio: Andreas Gerstlauer is an academic researcher from the University of Texas at Austin. The author has contributed to research in topics: Design space exploration & SpecC. The author has an h-index of 29 and has co-authored 176 publications receiving 3,093 citations. Previous affiliations of Andreas Gerstlauer include the University of California, Irvine.


Papers
Journal ArticleDOI
TL;DR: Proposes DeepThings, a framework for adaptively distributed execution of CNN-based inference applications on tightly resource-constrained IoT edge clusters, which employs a scalable Fused Tile Partitioning of convolutional layers to minimize memory footprint while exposing parallelism.
Abstract: Edge computing has emerged as a trend to improve scalability, reduce overhead, and preserve privacy by processing large-scale data, e.g., in deep learning applications, locally at the source. In IoT networks, edge devices are characterized by tight resource constraints and the often dynamic nature of data sources, where existing approaches for deploying Deep/Convolutional Neural Networks (DNNs/CNNs) can only meet IoT constraints by severely reducing accuracy or by using a static distribution that cannot adapt to dynamic IoT environments. In this paper, we propose DeepThings, a framework for adaptively distributed execution of CNN-based inference applications on tightly resource-constrained IoT edge clusters. DeepThings employs a scalable Fused Tile Partitioning (FTP) of convolutional layers to minimize memory footprint while exposing parallelism. It further realizes a distributed work-stealing approach to enable dynamic workload distribution and balancing at inference runtime. Finally, we employ a novel work-scheduling process to improve data reuse and reduce overall execution latency. Results show that our proposed FTP method can reduce memory footprint by more than 68% without sacrificing accuracy. Furthermore, compared to existing work-sharing methods, our distributed work stealing and work scheduling improve throughput by $1.7\times$–$2.2\times$ with multiple dynamic data sources. When combined, DeepThings provides scalable CNN inference speedups of $1.7\times$–$3.5\times$ on 2–6 edge devices with less than 23 MB of memory each.
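
The core of FTP is a dependency computation: each output tile of the last fused layer is traced backwards through the stack of convolutional layers to find the input region an edge device needs, so devices hold only a strip of the input rather than whole intermediate feature maps. Below is a minimal Python sketch of that backward tile-region computation under assumed layer parameters; the function names are illustrative and not the DeepThings API.

```python
# Minimal sketch of fused-tile dependency computation (illustrative, not the
# DeepThings API): given an output tile of the last fused conv layer, walk the
# layer stack backwards to find the input region each edge device must hold.

def input_range(out_lo, out_hi, kernel, stride, pad):
    """Input index range [lo, hi] needed to compute outputs [out_lo, out_hi]
    of one conv layer (1-D; apply independently to height and width)."""
    lo = out_lo * stride - pad
    hi = out_hi * stride - pad + kernel - 1
    return lo, hi

def fused_tile_region(tile_lo, tile_hi, layers):
    """Propagate an output tile backwards through all fused layers.
    `layers` is ordered first-to-last as (kernel, stride, pad) tuples."""
    lo, hi = tile_lo, tile_hi
    for kernel, stride, pad in reversed(layers):
        lo, hi = input_range(lo, hi, kernel, stride, pad)
    return lo, hi

# Example: three fused 3x3 conv layers, stride 1, pad 1. A 16-pixel-wide
# output tile needs a slightly wider input strip because the receptive
# field grows by (kernel - 1) per fused layer.
layers = [(3, 1, 1)] * 3
print(fused_tile_region(0, 15, layers))  # -> (-3, 18); clamp to image bounds
```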

323 citations

Proceedings ArticleDOI
03 Mar 2003
TL;DR: This paper proposes an RTOS model built on top of existing SLDLs which, by providing the key features typically available in any RTOS, allows the designer to model the dynamic behavior of multi-tasking systems at higher abstraction levels and to incorporate such models into existing design flows.
Abstract: System-level synthesis is widely seen as the solution for closing the productivity gap in system design. High-level system models are used in system-level design for early design exploration. While real-time operating systems (RTOS) are an increasingly important component in system design, specific RTOS implementations cannot be used directly in high-level models. On the other hand, existing system-level design languages (SLDL) lack support for RTOS modeling. In this paper, we propose an RTOS model built on top of existing SLDLs which, by providing the key features typically available in any RTOS, allows the designer to model the dynamic behavior of multi-tasking systems at higher abstraction levels and to incorporate such models into existing design flows. Experimental results show that our RTOS model is easy to use and efficient while providing accurate results.
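
The key idea is to emulate RTOS scheduling semantics on top of an SLDL's simulation kernel rather than running a real RTOS. As a rough analogue (not the paper's SpecC-based model), the Python sketch below models tasks as generators that yield simulated execution delays between scheduling points, with a static-priority scheduler serializing them onto one processor; all names are illustrative.

```python
# Minimal Python analogue (not the paper's SpecC-based model) of an abstract
# RTOS model: tasks yield their computation delays at scheduling points, and
# the scheduler serializes them by static priority, approximating the timing
# effects of multi-tasking on a single processor.

import heapq

class RTOSModel:
    def __init__(self):
        self.ready = []   # priority queue of (priority, seq, task generator)
        self.time = 0     # simulated time
        self._seq = 0     # unique tie-breaker for equal priorities

    def task_create(self, gen, priority):
        heapq.heappush(self.ready, (priority, self._seq, gen))
        self._seq += 1

    def run(self):
        # Always resume the highest-priority ready task (lower number = higher
        # priority); each yielded value is a simulated execution delay.
        while self.ready:
            prio, _, task = heapq.heappop(self.ready)
            try:
                self.time += next(task)
                self.task_create(task, prio)   # task remains ready
            except StopIteration:
                pass                           # task finished

def worker(name, segments):
    for d in segments:
        print(f"{name}: executing for {d} time units")
        yield d

os_model = RTOSModel()
os_model.task_create(worker("high", [2, 2]), priority=0)
os_model.task_create(worker("low", [5]), priority=1)
os_model.run()
print("total simulated time:", os_model.time)
```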

175 citations

Journal ArticleDOI
TL;DR: This paper develops and proposes a novel classification for ESL synthesis tools and presents six different academic approaches in this context, identifying the common principles and needs that are ultimately required for a true ESL synthesis solution.
Abstract: With ever-increasing system complexities, all major semiconductor roadmaps have identified the need for moving to higher levels of abstraction in order to increase productivity in electronic system design. Most recently, many approaches and tools that claim to realize and support a design process at the so-called electronic system level (ESL) have emerged. However, faced with the vast complexity challenges, in most cases at best only partial solutions are available. In this paper, we develop and propose a novel classification for ESL synthesis tools, and we present six different academic approaches in this context. Based on these observations, we identify the common principles and needs that lead toward, and are ultimately required for, a true ESL synthesis solution covering the whole design process from specification to implementation for complete systems across hardware and software boundaries.

174 citations

Journal ArticleDOI
TL;DR: This article presents a comprehensive design framework, the system-on-chip environment (SCE), which is based on the influential SpecC language and methodology and enables rapid design space exploration and efficient MPSoC implementation.
Abstract: The constantly growing complexity of embedded systems is a challenge that drives the development of novel design automation techniques. C-based system-level design addresses the complexity challenge by raising the level of abstraction and integrating the design processes for the heterogeneous system components. In this article, we present a comprehensive design framework, the system-on-chip environment (SCE), which is based on the influential SpecC language and methodology. SCE implements a top-down system design flow based on a specify-explore-refine paradigm with support for heterogeneous target platforms consisting of custom hardware components, embedded software processors, dedicated IP blocks, and complex communication bus architectures. Starting from an abstract specification of the desired system, models at various levels of abstraction are automatically generated through successive step-wise refinement, resulting in a pin- and cycle-accurate system implementation. The seamless integration of automatic model generation, estimation, and verification tools enables rapid design space exploration and efficient MPSoC implementation. Using a large set of industrial-strength examples with a wide range of target architectures, our experimental results demonstrate the effectiveness of our framework and show significant productivity gains in design time.

124 citations

Book
31 May 2001
TL;DR: System Design: A Practical Guide with SpecC will benefit designers and design managers of complex SOCs, or embedded systems in general, by allowing them to develop new methodologies from these results, in order to increase design productivity by orders of magnitude.
Abstract: From the Publisher: "System Design: A Practical Guide with SpecC will benefit designers and design managers of complex SOCs, or embedded systems in general, by allowing them to develop new methodologies from these results, in order to increase design productivity by orders of magnitude. Designers at RTL, logical or physical levels, who are interested in moving up to the system-level, will find a comprehensive overview within. The design models in the book define IP models and functions for IP exchange between IP providers and their users. A well-defined methodology like the one presented in this book will help product planning divisions to quickly develop new products or to derive completely new business models, like e-design or product-on-demand. Finally, researchers and students in the area of system design will find an example of a formal, well-structured design flow in this book."--BOOK JACKET.

123 citations


Cited by
Journal ArticleDOI
12 Jun 2019
TL;DR: A comprehensive survey of the recent research efforts on edge intelligence can be found in this paper, where the authors review the background and motivation for AI running at the network edge and provide an overview of the overarching architectures, frameworks, and emerging key technologies for deep learning models toward training/inference at the edge.
Abstract: With the breakthroughs in deep learning, recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistants to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and the Internet of Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions of bytes of data at the network edge. Driven by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new interdiscipline, edge AI or edge intelligence (EI), is beginning to receive a tremendous amount of interest. However, research on EI is still in its infancy, and a dedicated venue for exchanging the recent advances of EI is highly desired by both the computer systems and AI communities. To this end, we conduct a comprehensive survey of the recent research efforts on EI. Specifically, we first review the background and motivation for AI running at the network edge. We then provide an overview of the overarching architectures, frameworks, and emerging key technologies for deep learning models toward training/inference at the network edge. Finally, we discuss future research opportunities on EI. We believe that this survey will elicit escalating attention, stimulate fruitful discussions, and inspire further research ideas on EI.

977 citations

Proceedings ArticleDOI
27 May 2013
TL;DR: This paper reviews recent progress in the area, including design of approximate arithmetic blocks, pertinent error and quality measures, and algorithm-level techniques for approximate computing.
Abstract: Approximate computing has recently emerged as a promising approach to energy-efficient design of digital systems. Approximate computing relies on the ability of many systems and applications to tolerate some loss of quality or optimality in the computed result. By relaxing the need for fully precise or completely deterministic operations, approximate computing techniques allow substantially improved energy efficiency. This paper reviews recent progress in the area, including design of approximate arithmetic blocks, pertinent error and quality measures, and algorithm-level techniques for approximate computing.
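
As one concrete illustration of the kind of approximate arithmetic block and error measure the survey covers (not taken from the paper itself), the sketch below implements a lower-part-OR adder, a common approximate adder design from the literature, and estimates its mean error distance on random inputs.

```python
# Illustrative approximate arithmetic block and error measure: a lower-part-OR
# adder (LOA) replaces the k least-significant full adders with OR gates,
# trading accuracy for hardware cost; quality is quantified here by the mean
# error distance over random inputs.

import random

def loa_add(a, b, k, width=8):
    """Lower-part-OR adder: exact addition on the upper bits, bitwise OR
    (no carry propagation) on the k least-significant bits."""
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)        # approximate lower part
    high = ((a >> k) + (b >> k)) << k    # exact upper part, no carry-in
    return (high + low) & ((1 << (width + 1)) - 1)

def mean_error_distance(k, trials=10000, width=8):
    """Average |exact - approximate| over uniformly random operand pairs."""
    total = 0
    for _ in range(trials):
        a = random.randrange(1 << width)
        b = random.randrange(1 << width)
        total += abs((a + b) - loa_add(a, b, k, width))
    return total / trials

# Larger k means cheaper hardware but larger error; k=0 is exact.
for k in (0, 2, 4):
    print(f"k={k}: mean error distance ~ {mean_error_distance(k):.2f}")
```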

921 citations

Journal ArticleDOI
15 Jul 2019
TL;DR: This paper will provide an overview of applications where deep learning is used at the network edge, discuss various approaches for quickly executing deep learning inference across a combination of end devices, edge servers, and the cloud, and describe the methods for training deep learning models across multiple edge devices.
Abstract: Deep learning is currently widely used in a variety of applications, including computer vision and natural language processing. End devices, such as smartphones and Internet-of-Things sensors, are generating data that need to be analyzed in real time using deep learning or used to train deep learning models. However, deep learning inference and training require substantial computation resources to run quickly. Edge computing, where a fine mesh of compute nodes is placed close to end devices, is a viable way to meet the high computation and low-latency requirements of deep learning on edge devices and also provides additional benefits in terms of privacy, bandwidth efficiency, and scalability. This paper aims to provide a comprehensive review of the current state of the art at the intersection of deep learning and edge computing. Specifically, it will provide an overview of applications where deep learning is used at the network edge, discuss various approaches for quickly executing deep learning inference across a combination of end devices, edge servers, and the cloud, and describe the methods for training deep learning models across multiple edge devices. It will also discuss open challenges in terms of systems performance, network technologies and management, benchmarks, and privacy. The reader will take away the following concepts from this paper: understanding scenarios where deep learning at the network edge can be useful, understanding common techniques for speeding up deep learning inference and performing distributed training on edge devices, and understanding recent trends and opportunities.
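
One inference technique surveys in this area commonly discuss is computation partitioning: splitting a DNN between an end device and an edge server at the layer whose intermediate activations are cheapest to ship. The toy Python sketch below picks such a split point by minimizing estimated end-to-end latency; all layer timings, sizes, and names are made-up illustrative numbers, not from the paper.

```python
# Toy sketch of DNN split-point selection between an edge device and a server:
# minimize estimated total latency = device compute up to the split + transfer
# of the intermediate activations + server compute for the remaining layers.
# All numbers are illustrative assumptions.

def best_split(device_ms, server_ms, activation_kb, bandwidth_kbps):
    """Return (best split index, latency in ms). Split i runs layers [0, i)
    on the device and layers [i, n) on the server; i == 0 is server-only,
    i == n is device-only (activation_kb[n] is the final output size)."""
    n = len(device_ms)
    best = None
    for i in range(n + 1):
        latency = (sum(device_ms[:i])                          # edge compute
                   + activation_kb[i] / bandwidth_kbps * 1000  # upload time
                   + sum(server_ms[i:]))                       # server compute
        if best is None or latency < best[1]:
            best = (i, latency)
    return best

# Per-layer compute estimates (ms) and inter-layer activation sizes (KB);
# activation_kb[0] is the raw input, activation_kb[-1] the final output.
device_ms = [40, 120, 120, 60]
server_ms = [4, 3, 3, 1]
activation_kb = [600, 800, 200, 20, 1]

split, ms = best_split(device_ms, server_ms, activation_kb, bandwidth_kbps=2000)
print(f"split before layer {split}: ~{ms:.0f} ms end-to-end")
```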

793 citations