Topic

Control reconfiguration

About: Control reconfiguration is a research topic. Over the lifetime, 22423 publications have been published within this topic receiving 334217 citations.


Papers
Journal ArticleDOI
Yunxuan Yu, Chen Wu, Tiandong Zhao, Kun Wang, Lei He
TL;DR: A domain-specific FPGA overlay processor, named OPU, is proposed to accelerate CNN networks. It offers software-like programmability for CNN end users: CNN algorithms are automatically compiled into executable code that is loaded and executed by OPU, with no FPGA reconfiguration needed to switch or update CNN networks.
Abstract: Field-programmable gate arrays (FPGAs) provide rich parallel computing resources with high energy efficiency, making them ideal for deep convolutional neural network (CNN) acceleration. In recent years, automatic compilers have been developed to generate network-specific FPGA accelerators. However, as more cascaded deep CNN algorithms are adopted for various complicated tasks, reconfiguration of FPGA devices at runtime becomes unavoidable when network-specific accelerators are employed. Such reconfiguration can be difficult for edge devices. Moreover, a network-specific accelerator requires regeneration of RTL code and physical implementation whenever the network is updated, which is not easy for CNN end users. In this article, we propose a domain-specific FPGA overlay processor, named OPU, to accelerate CNN networks. It offers software-like programmability for CNN end users: CNN algorithms are automatically compiled into executable code that is loaded and executed by OPU without reconfiguring the FPGA to switch or update CNN networks. OPU instructions have complicated functions with variable runtimes but a uniform length. The instruction granularity is optimized to provide good performance and sufficient flexibility while reducing the complexity of developing the microarchitecture and compiler. Experiments show that OPU achieves an average of 91% runtime multiply-accumulate unit (MAC) efficiency (RME) across nine different networks. Moreover, for VGG and YOLO networks, OPU outperforms automatically compiled network-specific accelerators in the literature. In addition, OPU shows 5.35× better power efficiency than a Titan Xp. For a real-time cascaded CNN scenario, OPU is 2.9× faster than the edge-computing GPU Jetson TX2, which has a similar amount of computing resources.

74 citations

Journal ArticleDOI
01 Jan 1996
TL;DR: The Run-Time Reconfiguration Artificial Neural Network (RRANN), a proof-of-concept system that demonstrates the effectiveness of RTR for implementing neural networks, is tested and shown to increase the functional density of a network by up to 500% compared to FPGA-based implementations that do not use RTR.
Abstract: One way to further exploit the reconfigurable resources of SRAM FPGAs and increase functional density is to reconfigure them during system operation. This process is referred to as Run-Time Reconfiguration (RTR). RTR is an approach to system implementation that divides an application or algorithm into time-exclusive operations that are implemented as separate configurations. The Run-Time Reconfiguration Artificial Neural Network (RRANN) is a proof-of-concept system that demonstrates the effectiveness of RTR for implementing neural networks. It implements the popular backpropagation training algorithm as three distinct time-exclusive FPGA configurations: feed-forward, backpropagation, and update. System operation consists of sequencing through these three configurations at run-time, one configuration at a time. RRANN has been fully implemented with Xilinx FPGAs, tested, and shown to increase the functional density of a network by up to 500% compared to FPGA-based implementations that do not use RTR.
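
The phase-sequencing idea in the abstract above can be sketched in software. This is an illustrative control-flow sketch only; the `Phase` class, the toy update rules, and the learning rate are invented for illustration and are not taken from RRANN:

```python
# Illustrative sketch (invented toy, not the RRANN hardware): RTR splits an
# algorithm into time-exclusive phases; each Phase below stands in for one
# FPGA configuration, and only one is "loaded" (running) at a time.

class Phase:
    """One time-exclusive configuration: a name plus the work it performs."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

def run_rtr(phases, state, epochs):
    """Sequence through the configurations at run time, one at a time."""
    for _ in range(epochs):
        for phase in phases:
            # On real hardware this switch is the (slow) FPGA
            # reconfiguration; here it is just a function call.
            state = phase.fn(state)
    return state

# Toy scalar "backpropagation training" split into RRANN's three phases.
phases = [
    Phase("feed-forward",  lambda s: {**s, "out": s["w"] * s["x"]}),
    Phase("backpropagate", lambda s: {**s, "grad": (s["out"] - s["y"]) * s["x"]}),
    Phase("update",        lambda s: {**s, "w": s["w"] - 0.1 * s["grad"]}),
]

state = run_rtr(phases, {"w": 0.0, "x": 1.0, "y": 2.0}, epochs=20)
print(state["w"])  # converges toward the target y / x = 2.0
```

The point of the sketch is that only one "configuration" is active at any moment, mirroring how RTR time-multiplexes a single FPGA across the feed-forward, backpropagation, and update stages.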

74 citations

Journal ArticleDOI
TL;DR: A new interpretation of control signals as a composite of the residual and reference signals is revealed, which leads to the development of two kinds of schemes: extracting residual signals from an existing control loop and configuring control loops with an integrated residual access.
Abstract: Driven by the increasing needs for the integration of model-based fault diagnosis into the electronic control units (ECUs) with limited computation capacity and motivated by the recent study on the fault tolerant controller architecture, we investigate feedback controller structures aiming at accessing the residuals embedded in the control loops. For this purpose, we first develop an observer-based realization of the Youla parameterization. This result reveals a new interpretation of control signals as a composite of the residual and reference signals. From this viewpoint, different control schemes are studied and useful relationships between the controller structures and embedded residual signals are established. It leads to the development of two kinds of schemes: 1) extracting residual signals from an existing control loop and 2) configuring control loops with an integrated residual access. The achieved results are demonstrated by two examples of the feedback control loops in engine management systems.
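
The structural point of the abstract above, that an observer-based feedback loop already contains a residual generator, can be sketched numerically. All matrices and gains below are invented toy values, not from the paper, and the sketch omits the Youla parameter for brevity:

```python
import numpy as np

# Toy numerical sketch (assumed example): a Luenberger observer inside a
# state-feedback loop generates a residual r = y - C @ xhat as a byproduct,
# so the control signal can be read as a composite of reference and
# residual information, with no separate diagnosis block needed.

A = np.array([[1.0, 0.1], [0.0, 1.0]])  # toy plant dynamics
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
F = np.array([[2.0, 3.0]])              # state-feedback gain (chosen stabilizing)
L = np.array([[0.5], [0.5]])            # observer gain (chosen stabilizing)

def step(x, xhat, ref, fault=0.0):
    y = C @ x + fault                   # measured output (a fault enters additively)
    r = y - C @ xhat                    # embedded residual = observer innovation
    u = -F @ xhat + ref                 # control signal built from xhat (hence from r)
    x_next = A @ x + B @ u
    xhat_next = A @ xhat + B @ u + L @ r  # observer correction driven by the residual
    return x_next, xhat_next, r

x, xhat = np.array([[1.0], [0.0]]), np.zeros((2, 1))
ref = np.array([[0.0]])
for _ in range(50):
    x, xhat, r = step(x, xhat, ref)
print(float(r[0, 0]))  # fault-free: the residual decays toward zero
```

A sensor fault injected via the `fault` argument would show up directly in `r`, which is the sense in which the residual can be "extracted from an existing control loop".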

74 citations

Proceedings ArticleDOI
04 Mar 1998
TL;DR: An architecture-based approach to runtime software reconfiguration is presented, highlighting the role of architectural styles and software connectors in facilitating runtime change.
Abstract: Society's increasing dependence on software-intensive systems is driving the need for dependable, robust, continuously available systems. Runtime system reconfiguration is one aspect of achieving continuous availability. We present an architecture-based approach to runtime software reconfiguration, highlighting the role of architectural styles and software connectors in facilitating runtime change. Finally, we describe the implementation of our tool suite, called ArchStudio, that supports runtime reconfiguration using our architecture-based approach.

73 citations

Proceedings ArticleDOI
17 Nov 2019
TL;DR: This work proposes PruneTrain, a cost-efficient mechanism that gradually reduces the training cost during training by using a structured group-lasso regularization approach that drives the training optimization toward both high accuracy and small weight values.
Abstract: State-of-the-art convolutional neural networks (CNNs) used in vision applications have large models with numerous weights. Training these models is very compute- and memory-resource intensive. Much research has been done on pruning or compressing these models to reduce the cost of inference, but little work has addressed the costs of training. We focus precisely on accelerating training. We propose PruneTrain, a cost-efficient mechanism that gradually reduces the training cost during training. PruneTrain uses a structured group-lasso regularization approach that drives the training optimization toward both high accuracy and small weight values. Small weights can then be periodically removed by reconfiguring the network model to a smaller one. By using a structured-pruning approach and additional reconfiguration techniques we introduce, the pruned model can still be efficiently processed on a GPU accelerator. Overall, PruneTrain achieves a reduction of 39% in the end-to-end training time of ResNet50 for ImageNet by reducing computation cost by 40% in FLOPs, memory accesses by 37% for memory bandwidth bound layers, and the inter-accelerator communication by 55%.
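
The group-lasso-plus-reconfiguration recipe described above can be illustrated with a toy NumPy sketch. Everything here (sizes, the proximal update, the stand-in gradient, the pruning interval) is an invented minimal example, not PruneTrain's actual GPU implementation:

```python
import numpy as np

# Toy sketch (assumed example, not PruneTrain itself): group-lasso
# regularization shrinks each output channel's whole weight group toward
# zero; channels that reach zero are periodically pruned, physically
# shrinking the model during training.

W = np.ones((8, 16))      # 8 output channels, 16 weights per channel
W[4:] *= 0.05             # four channels start small: candidates for pruning
lam, lr = 0.05, 0.1       # group-lasso weight and learning rate

for step in range(1, 501):
    grad_data = 0.01 * W                  # stand-in for the task-loss gradient
    W = W - lr * grad_data
    # Proximal step for lam * sum_g ||W_g||_2: shrinks each channel's
    # group norm and clips it at exactly zero.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W = W * np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    if step % 100 == 0:                   # periodic structured pruning / reconfiguration
        W = W[np.linalg.norm(W, axis=1) > 0.0]

print(W.shape)            # the four small channels have been pruned away
```

Because pruning removes whole rows (channels) rather than scattered individual weights, the surviving matrix stays dense, which is what lets a structurally pruned model remain efficient on a GPU.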

73 citations


Network Information
Related Topics (5)
Control theory
299.6K papers, 3.1M citations
85% related
Software
130.5K papers, 2M citations
85% related
Wireless sensor network
142K papers, 2.4M citations
84% related
Network packet
159.7K papers, 2.2M citations
83% related
Optimization problem
96.4K papers, 2.1M citations
83% related
Performance Metrics
No. of papers in the topic in previous years

Year	Papers
2023	784
2022	1,765
2021	778
2020	958
2019	976
2018	1,060