Minimization of WCRT with Recovery Assurance from Hardware Trojans for Tasks on FPGA-based Cloud

doi:10.1145/3409479

Journal Article

Debasri Saha¹, Susmita Sur-Kolay²•Institutions (2)

Information Technology University¹, Indian Statistical Institute²

07 Dec 2020-ACM Transactions on Embedded Computing Systems (ACM, New York, NY, USA)-Vol. 20, Iss: 1, pp 1-25

TL;DR: A dynamic partial reconfiguration (DPR) enabled FPGA-based Cloud architecture acts as a flexible and efficient shared environment that supports users' application requests at low cost.

Abstract: A dynamic partial reconfiguration (DPR) enabled FPGA-based Cloud architecture acts as a flexible and efficient shared environment that supports users' application requests at low cost. On one hand, we need to handle a variety of tasks (periodic or sporadic, deadline or non-deadline, high or low criticality with respect to producing correct results); on the other hand, we are constrained to use untrusted FPGA-based application IP blocks procured from various third-party vendors, which may contain hardware Trojan horses (HTHs) that affect the throughput and reliability of the Cloud. We propose Trojan-aware processing of tasks: each task undergoes monitored execution on different untrusted cores, and one more execution is performed when hardware Trojan effects are detected. For this stringent scheduling environment, the proposed dynamic scheduling algorithm is also extended to guarantee successful recovery from Trojan effects for all accepted tasks. Experimental results show that, compared with existing Trojan-aware scheduling techniques, our algorithm improves the worst-case response time (WCRT) for all tasks, including non-deadline tasks, and achieves a lower task rejection rate for deadline tasks, through judicious non-uniform partitioning of FPGAs based on supported jobs and the resulting better resource utilization.
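
The Trojan-aware processing described in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical model: each untrusted core is a plain function from task input to output, Trojan effects show up as an output mismatch between two cores, and the one additional execution settles the result by majority. The names `trojan_aware_run`, `clean_core`, and `infected_core`, as well as the voting rule, are illustrative assumptions and are not taken from the paper, whose actual monitoring and scheduling algorithm is not given on this page.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model: a core maps a task input to a task output.
Core = Callable[[int], int]


def trojan_aware_run(task_input: int, cores: Sequence[Core]) -> Optional[int]:
    """Run a task on two different cores; on mismatch, run it once more.

    Returns the agreed output, or None if no two executions agree.
    """
    # Assumption: a third core must be available before the task is accepted,
    # so that the recovery execution is always possible.
    if len(cores) < 3:
        raise ValueError("need two cores for monitoring and one for recovery")

    first = cores[0](task_input)
    second = cores[1](task_input)
    if first == second:
        return first  # no Trojan effect observed

    # Mismatch detected: perform one more execution on a third core.
    third = cores[2](task_input)
    if third == first or third == second:
        return third  # majority of the three executions
    return None  # all three outputs differ; recovery failed in this model


def clean_core(x: int) -> int:
    """Illustrative Trojan-free core: doubles its input."""
    return 2 * x


def infected_core(x: int) -> int:
    """Illustrative infected core: corrupts the output on a rare trigger."""
    return 2 * x + 1 if x == 7 else 2 * x


if __name__ == "__main__":
    cores = [clean_core, infected_core, clean_core]
    print(trojan_aware_run(3, cores))  # trigger inactive, two runs agree
    print(trojan_aware_run(7, cores))  # trigger active, third run decides
```

In this sketch the common case costs two executions and only a detected mismatch pays for a third, which matches the "one more execution upon detection" wording of the abstract. How the paper selects cores and reserves time for the extra run is not modeled here.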

Citations

Worst-Case Execution Time Guarantees for Runtime-Reconfigurable Architectures

Marvin Damschen

01 Jan 2019

TL;DR: This thesis contributes novel co-scheduling approaches to distribute work among CPU and GPU in an extensive analysis of how average-case performance is achieved on fused CPU-GPU architectures, a main trend in current high-performance microarchitectures that combines a CPU and a GPU on a single chip.

Abstract: Real-time systems are ubiquitous in our everyday life, e.g., in safety-critical domains such as automotive, avionics or robotics. The correctness of a real-time system does not only depend on the correctness of its calculations, but also on the non-functional requirement of adhering to deadlines. Failing to meet a deadline may lead to severe malfunctions, therefore worst-case execution times (WCET) need to be guaranteed. Despite significant scientific advances, however, timing analysis of WCET guarantees lags years behind current high-performance microarchitectures with out-of-order scheduling pipelines, several hardware threads and multiple (shared) cache layers. To satisfy the increasing performance demands of real-time systems, analyzable performance features are required. In order to escape the scarcity of timing-analyzable performance features, the main contribution of this thesis is the introduction of runtime reconfiguration of hardware accelerators onto a field-programmable gate array (FPGA) as a novel means to achieve performance that is amenable to WCET guarantees. Instead of designing an architecture for a specific application domain, this approach preserves the flexibility of the system. First, this thesis contributes novel co-scheduling approaches to distribute work among CPU and GPU in an extensive analysis of how (average-case) performance is achieved on fused CPU-GPU architectures, a main trend in current high-performance microarchitectures that combines a CPU and a GPU on a single chip. Being able to employ such architectures in real-time systems would be highly desirable, because they provide high performance within a limited area and power budget. As a result of this analysis, however, a cache coherency bottleneck is uncovered in recent fused CPU-GPU architectures that share the last level cache between CPU and GPU. 
This insight (i) complicates performance predictions and (ii) adds a shared last level cache between CPU and GPU to the growing list of microarchitectural features that benefit average-case performance but render the analysis of WCET guarantees on high-performance architectures virtually infeasible. This further motivates the need for novel microarchitectural features that provide predictable performance and are amenable to timing analysis. Towards this end, a runtime reconfiguration controller called "Command-based Reconfiguration Queue" (CoRQ) is presented that provides guaranteed latencies for its operations, especially for the reconfiguration delay, i.e., the time it takes to reconfigure a hardware accelerator onto a reconfigurable fabric (e.g., FPGA). CoRQ enables the design of timing-analyzable runtime-reconfigurable architectures that support WCET guarantees. Based on the (now feasible) guaranteed reconfiguration delay of accelerators, a WCET analysis is introduced that enables tasks to reconfigure application-specific custom instructions (CIs) at runtime. CIs are executed by a processor pipeline and invoke execution of one or more accelerators. Different measures to deal with reconfiguration delays are compared for their impact on accelerated WCET guarantees and overestimation. The timing anomaly of runtime reconfiguration is identified and safely bounded: a case where executing iterations of a computational kernel faster than in WCET during reconfiguration of CIs can prolong the total execution time of a task. Once tasks that perform runtime reconfiguration of CIs can be analyzed for WCET guarantees, the question of which CIs to configure on a constrained reconfigurable area to optimize the WCET is raised. The question is addressed for systems where multiple CIs, each with different implementations (allowing latency to be traded off against area requirements), can be selected. This is generally the case, e.g., when employing high-level synthesis.
This so-called WCET-optimizing instruction set selection problem is modeled based on the Implicit Path Enumeration Technique (IPET), which is the path analysis technique state-of-the-art timing analyzers rely on. To our knowledge, this is the first approach that enables WCET optimization with support for making use of global program flow information (and information about reconfiguration delay). An optimal algorithm (similar to Branch and Bound) and a fast greedy heuristic algorithm (that achieves the optimal solution in most cases) are presented. Finally, an approach is presented that, for the first time, combines optimized static WCET guarantees and runtime optimization of the average-case execution (maintaining WCET guarantees) using runtime reconfiguration of hardware accelerators by leveraging runtime slack (the amount of time that program parts are executed faster than in WCET). It comprises an analysis of runtime slack bounds that enable safe reconfiguration for average-case performance under WCET guarantees and presents a mechanism to monitor runtime slack using a simple performance counter that is commonly available in many microprocessors. Ultimately, this thesis shows that runtime reconfiguration of accelerators is a key feature to achieve predictable performance.

1 citation

Journal Article

LIGHT: Lightweight Authentication for Intra Embedded Integrated Electronic Systems

Xuru Li, Daojing He, Yun Gao, Ximeng Liu, Sammy Y. N. Chan, Manghan Pan, Kim-Kwang Raymond Choo

01 Mar 2023-IEEE Transactions on Dependable and Secure Computing

TL;DR: In this paper, a lightweight authenticated key exchange (AKE) protocol for embedded integrated electronic systems (EIESs) based on half-duplex and "command/response" bus is proposed.

Abstract: As embedded integrated electronic systems (EIESs) become more pervasive (including in mission-critical applications), the need to ensure the security of data exchange in such a system against various malicious activities becomes more pronounced. However, designing secure and efficient solutions, such as authentication protocols, for the many different embedded systems with varying internal communication modes remains challenging. Therefore, in this paper, we propose a lightweight authenticated key-exchange (AKE) protocol for EIESs based on half-duplex and "command/response" bus. Specifically, the proposed protocol is designed to operate on resource-constrained devices and to require a minimal number of interactions. We then prove the security of the proposed protocol and present the security parameter selection strategy for protocol implementation based on the empirical evaluations. Moreover, efficiency analysis also shows that the protocol can be effectively deployed in the EIESs environment.

1 citation

References

Journal Article

Building Trustworthy Systems Using Untrusted Components: A High-Level Synthesis Approach

Jeyavijayan Rajendran¹, Ozgur Sinanoglu², Ramesh Karri³•Institutions (3)

University of Texas at Dallas¹, New York University Abu Dhabi², New York University³

11 Apr 2016-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper identifies design constraints for achieving Trojan detection, collusion prevention, and isolation of the Trojan-infected 3PIP, and incorporates them during high-level synthesis.

Abstract: Trustworthiness of system-on-chip designs is undermined by malicious logic (Trojans) in third-party intellectual properties (3PIPs). In this paper, duplication, diversity, and isolation principles have been extended to build trustworthy systems using untrusted, potentially Trojan-infected 3PIPs. We use a diverse set of vendors to prevent collusion between 3PIPs from the same vendor. We identify design constraints for achieving Trojan detection, collusion prevention, and isolation of the Trojan-infected 3PIP, and incorporate them during high-level synthesis. In addition, we develop techniques to reduce the number of vendors. The effectiveness of the proposed techniques is validated using the high-level synthesis benchmarks.

64 citations

Journal Article

Security Assurance for System-on-Chip Designs With Untrusted IPs

Abhishek Basak¹, Swarup Bhunia², Thomas Tkacik³, Sandip Ray³•Institutions (3)

Intel¹, University of Florida², NXP Semiconductors³

01 Jul 2017-IEEE Transactions on Information Forensics and Security

TL;DR: This paper proposes a novel, resilient SoC security architecture to ensure trusted SoC operation with untrusted IPs and demonstrates the effectiveness of this framework for system protection using several illustrative practical use cases.

Abstract: Modern system-on-chip (SoC) designs involve integration of a large number of intellectual property (IP) blocks, many of which are acquired from untrusted third-party vendors. An IP containing a security vulnerability, whether inadvertent or malicious, may compromise the trustworthiness of the entire SoC, e.g., by leaking sensitive information or causing execution failures at key points. Existing functional validation approaches, post-manufacturing tests, and IP trust verification techniques are inadequate to accomplish comprehensive system-level security assurance in the presence of untrusted IPs. In this paper, we analyze security issues at the SoC level caused by untrusted IPs. We also propose a novel, resilient SoC security architecture to ensure trusted SoC operation with untrusted IPs. Our architecture realizes fine-grained IP-trust aware security policies in an efficient security policy checker that enables run-time monitoring of security issues arising from untrusted IPs. It also exploits on-chip design-for-debug architecture to ensure trusted information flow from IP blocks to the security policy checker. Unlike existing solutions to the untrusted IP problem, which rely on verification of IP trust before they are integrated into an SoC, the proposed approach follows a fundamentally different architecture-level solution based on run-time resilience. We demonstrate the effectiveness of this framework for system protection using several illustrative practical use cases. We also provide experimental results to show that the overhead of the proposed architecture is modest on representative SoC designs.

57 citations

Journal Article

Floorplanning for Partially Reconfigurable FPGAs

Pritha Banerjee¹, M Sangtani², Susmita Sur-Kolay³•Institutions (3)

University of Calcutta¹, Nvidia², Indian Statistical Institute³

01 Jan 2011-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A global floorplan generation method, PartialHeteroFP, is proposed to obtain the same positions for the common modules across all instances such that the heterogeneous resource requirements of all modules in each instance are satisfied, and the total half-perimeter wirelength over all instances is minimal.

Abstract: Partial reconfiguration on heterogeneous field-programmable gate arrays with millions of gates yields better utilization of their different types of resources by swapping in and out the appropriate modules of one or more applications at any instant of time. Given a schedule of sub-task instances where each instance is specified as a netlist of active modules, reconfiguration overhead can be reduced by fixing the positions and shapes of modules common across all instances. We propose a global floorplan generation method, PartialHeteroFP, to obtain the same positions for the common modules across all instances such that the heterogeneous resource requirements of all modules in each instance are satisfied, and the total half-perimeter wirelength over all instances is minimal. Experimental results establish that the proposed PartialHeteroFP produces floorplans very fast, with a 100% match of common modules, thereby minimizing the partial reconfiguration overhead.

42 citations

Book Chapter

Hardware Task Scheduling for Partially Reconfigurable FPGAs

George Charitopoulos¹, Iosif Koidis¹, Kyprianos Papadimitriou¹, Dionisios Pnevmatikatos¹•Institutions (1)

Foundation for Research & Technology – Hellas¹

13 Apr 2015

TL;DR: This work proposes and implements a run-time system manager (RTSM) for scheduling software (SW) tasks on available processor(s) and hardware (HW) tasks on any number of reconfigurable regions of a partially reconfigurable FPGA, and validates its correctness by using the RTSM to execute an image processing application on a ZedBoard platform.

Abstract: Partial reconfiguration (PR) of FPGAs can be used to dynamically extend and adapt the functionality of computing systems, swapping hardware (HW) tasks in and out. To coordinate the on-demand task execution, we propose and implement a run-time system manager (RTSM) for scheduling software (SW) tasks on available processor(s) and HW tasks on any number of reconfigurable regions (RRs) of a partially reconfigurable FPGA. Fed with the initial partitioning of the application into tasks, the corresponding task graph, and the available task mappings, the RTSM considers the runtime status of each task and region, e.g., busy, idle, or scheduled for reconfiguration/execution, to execute tasks. Our RTSM supports task reuse and configuration prefetching to minimize reconfigurations, task movement among regions to efficiently manage the FPGA area, and RR reservation for future reconfiguration and execution. We validate its correctness using our RTSM to execute an image processing application on a ZedBoard platform. We also evaluate its features within a simulation framework, and find that despite the technology limitations, our approach can give promising results in terms of quality of scheduling.

36 citations

"Minimization of WCRT with Recovery ..." refers to methods in this paper

...Very few works consider the effect of fault and hardware Trojan horse (HTH) on the scheduling techniques for DPR-enabled FPGAs....
[...]
...multiple hardware subtasks on the DPR-enabled FPGA to accelerate that task....
[...]
...Online scheduling on DPR-enabled FPGA [8] searches for appropriate best-fit allocation under reconfiguration port constraint and applies “reuse and partial reuse” policy that saves reconfiguration time by partially or completely implementing the function of the newly arrived task with the logic configured for an already placed task....
[...]
...Proper scheduling of each preceding task in DPR-enabled FPGA platform facilitates minimization of the waiting time for a new task....
[...]
...Dynamic partially reconfigurable Field Programmable Gate Array (DPR-enabled FPGA) is the best suited processing element to respect the required flexibility with high performance at low cost....
[...]

Proceedings Article

High performance in the cloud with FPGA groups

Anca Iordache¹, Guillaume Pierre¹, Peter Sanders, Jose G. F. Coutinho², Mark Stillwell²•Institutions (2)

University of Rennes¹, Imperial College London²

06 Dec 2016

TL;DR: An autoscaling algorithm is presented to maximize FPGA groups' resource utilization and reduce user-perceived computation latencies; it increases resource utilization from 52% to 61% compared to a static resource allocation, while reducing task execution latencies by 61%.

Abstract: Field-programmable gate arrays (FPGAs) can offer invaluable computational performance for many compute-intensive algorithms. However, to justify their purchase and administration costs it is necessary to maximize resource utilization over their expected lifetime. Making FPGAs available in a cloud environment would make them attractive to new types of users and applications and help democratize this increasingly popular technology. However, there currently exists no satisfactory technique for offering FPGAs as cloud resources and sharing them between multiple tenants. We propose FPGA groups, which are seen by their clients as a single virtual FPGA, and which aggregate the computational power of multiple physical FPGAs. FPGA groups are elastic, and they may be shared among multiple tenants. We present an autoscaling algorithm to maximize FPGA groups' resource utilization and reduce user-perceived computation latencies. FPGA groups incur a low overhead in the order of 0.09 ms per submitted task. When faced with a challenging workload, the autoscaling algorithm increases resource utilization from 52% to 61% compared to a static resource allocation, while reducing task execution latencies by 61%.

28 citations