Journal Article

Minimization of WCRT with Recovery Assurance from Hardware Trojans for Tasks on FPGA-based Cloud

07 Dec 2020 - ACM Transactions on Embedded Computing Systems (ACM, New York, NY, USA), Vol. 20, Iss. 1, pp. 1-25
TL;DR: A dynamic partial reconfiguration (DPR)-enabled FPGA-based Cloud architecture acts as a flexible and efficient shared environment that supports the applications users request at low cost.
Abstract: A dynamic partial reconfiguration (DPR)-enabled FPGA-based Cloud architecture acts as a flexible and efficient shared environment that supports the applications users request at low cost. On the one hand, we need to handle a variety of tasks (periodic or sporadic, deadline or non-deadline, and of high or low criticality with respect to producing correct results); on the other hand, we are constrained to use untrusted FPGA-based application IP blocks procured from various third-party vendors, which may contain hardware Trojan horses (HTHs) that affect the throughput and reliability of the Cloud. We propose Trojan-aware processing of tasks: each task is executed under monitoring on different untrusted cores, and one more execution is performed upon detection of hardware Trojan effects. For this stringent scheduling environment, the proposed dynamic scheduling algorithm is extended to guarantee successful recovery from Trojan effects for all accepted tasks. Experimental results show that, compared with existing Trojan-aware scheduling techniques, our algorithm improves the worst-case response time (WCRT) of all tasks, including non-deadline tasks, and achieves a lower task rejection rate for deadline tasks, through judicious non-uniform partitioning of FPGAs based on the supported jobs and the resulting better resource utilization.
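The detect-then-recover idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' scheduler. It assumes that cores are plain Python callables keyed by hypothetical vendor names, that a Trojan only corrupts outputs, that at most one of the three vendors is malicious for a given task, and that the extra execution is resolved by 2-of-3 agreement. Because recovery must always remain possible, the worst case a scheduler would need to budget for in this model is three executions per accepted task.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical model: a "core" is any callable mapping a task input to an output.
# A Trojan-infected core is simply one that may return a wrong value.
Core = Callable[[int], int]


@dataclass
class TrojanAwareResult:
    output: Optional[int]  # accepted result, or None if no two cores agree
    executions: int        # number of cores that actually ran the task
    trojan_detected: bool  # True if the first two outputs disagreed


def run_trojan_aware(task_input: int,
                     cores: Dict[str, Core],
                     vendors: List[str]) -> TrojanAwareResult:
    """Run a task on cores from two distinct vendors; on mismatch, run it once
    more on a third vendor's core and accept the value two cores agree on."""
    if len(set(vendors[:3])) < 3:
        raise ValueError("need three distinct vendors for detection plus recovery")
    v1, v2, v3 = vendors[:3]
    out1, out2 = cores[v1](task_input), cores[v2](task_input)
    if out1 == out2:
        # No Trojan effect observed: two monitored executions suffice.
        return TrojanAwareResult(out1, 2, False)
    # Mismatch detected: one more execution on a core from a third vendor.
    out3 = cores[v3](task_input)
    if out3 == out1 or out3 == out2:
        return TrojanAwareResult(out3, 3, True)
    return TrojanAwareResult(None, 3, True)


# Vendor "B" is modeled as infected: it adds 1 to the correct result.
cores = {"A": lambda x: x * x, "B": lambda x: x * x + 1, "C": lambda x: x * x}
print(run_trojan_aware(3, cores, ["A", "B", "C"]))
# TrojanAwareResult(output=9, executions=3, trojan_detected=True)
```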
Citations
DOI
01 Jan 2019
TL;DR: This thesis contributes novel co-scheduling approaches to distribute work among CPU and GPU in an extensive analysis of how average-case performance is achieved on fused CPU-GPU architectures, a main trend in current high-performance microarchitectures that combines a CPU and a GPU on a single chip.
Abstract: Real-time systems are ubiquitous in our everyday life, e.g., in safety-critical domains such as automotive, avionics or robotics. The correctness of a real-time system depends not only on the correctness of its calculations, but also on the non-functional requirement of adhering to deadlines. Failing to meet a deadline may lead to severe malfunctions; therefore, worst-case execution times (WCETs) need to be guaranteed. Despite significant scientific advances, however, timing analysis for WCET guarantees lags years behind current high-performance microarchitectures with out-of-order scheduling pipelines, several hardware threads and multiple (shared) cache layers. To satisfy the increasing performance demands of real-time systems, analyzable performance features are required. To escape the scarcity of timing-analyzable performance features, the main contribution of this thesis is the introduction of runtime reconfiguration of hardware accelerators onto a field-programmable gate array (FPGA) as a novel means to achieve performance that is amenable to WCET guarantees. Instead of designing an architecture for a specific application domain, this approach preserves the flexibility of the system. First, this thesis contributes novel co-scheduling approaches to distribute work between CPU and GPU, as part of an extensive analysis of how (average-case) performance is achieved on fused CPU-GPU architectures, a main trend in current high-performance microarchitectures that combines a CPU and a GPU on a single chip. Being able to employ such architectures in real-time systems would be highly desirable, because they provide high performance within a limited area and power budget. As a result of this analysis, however, a cache coherency bottleneck is uncovered in recent fused CPU-GPU architectures that share the last level cache between CPU and GPU. 
This insight (i) complicates performance predictions and (ii) adds a shared last level cache between CPU and GPU to the growing list of microarchitectural features that benefit average-case performance but render the analysis of WCET guarantees on high-performance architectures virtually infeasible. This further motivates the need for novel microarchitectural features that provide predictable performance and are amenable to timing analysis. Towards this end, a runtime reconfiguration controller called "Command-based Reconfiguration Queue" (CoRQ) is presented that provides guaranteed latencies for its operations, especially for the reconfiguration delay, i.e., the time it takes to reconfigure a hardware accelerator onto a reconfigurable fabric (e.g., an FPGA). CoRQ enables the design of timing-analyzable runtime-reconfigurable architectures that support WCET guarantees. Based on the (now feasible) guaranteed reconfiguration delay of accelerators, a WCET analysis is introduced that enables tasks to reconfigure application-specific custom instructions (CIs) at runtime. CIs are executed by a processor pipeline and invoke the execution of one or more accelerators. Different measures for dealing with reconfiguration delays are compared for their impact on accelerated WCET guarantees and overestimation. The timing anomaly of runtime reconfiguration is identified and safely bounded: a case where executing iterations of a computational kernel faster than in the WCET case during reconfiguration of CIs can prolong the total execution time of a task. Once tasks that perform runtime reconfiguration of CIs can be analyzed for WCET guarantees, the question arises of which CIs to configure on a constrained reconfigurable area to optimize the WCET. This question is addressed for systems in which multiple CIs, each with different implementations (allowing latency and area requirements to be traded off), can be selected. This is generally the case, e.g., when employing high-level synthesis. 
This so-called WCET-optimizing instruction set selection problem is modeled based on the Implicit Path Enumeration Technique (IPET), which is the path analysis technique state-of-the-art timing analyzers rely on. To our knowledge, this is the first approach that enables WCET optimization with support for making use of global program flow information (and information about reconfiguration delay). An optimal algorithm (similar to Branch and Bound) and a fast greedy heuristic algorithm (that achieves the optimal solution in most cases) are presented. Finally, an approach is presented that, for the first time, combines optimized static WCET guarantees and runtime optimization of the average-case execution (maintaining WCET guarantees) using runtime reconfiguration of hardware accelerators by leveraging runtime slack (the amount of time that program parts are executed faster than in WCET). It comprises an analysis of runtime slack bounds that enable safe reconfiguration for average-case performance under WCET guarantees and presents a mechanism to monitor runtime slack using a simple performance counter that is commonly available in many microprocessors. Ultimately, this thesis shows that runtime reconfiguration of accelerators is a key feature to achieve predictable performance.
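The slack-based condition for safe runtime reconfiguration can be illustrated with a small sketch. It is a simplified model, not the thesis's actual analysis. It assumes that a cycle counter provides the elapsed time, that the WCET bound up to the current checkpoint is known, and that `remaining_wcet_after` is itself a safe bound for the rest of the task after reconfiguring. The timing anomaly described above is exactly why that bound must come from analysis rather than from measured speed. The function names are hypothetical.

```python
def runtime_slack(wcet_to_checkpoint: int, elapsed_cycles: int) -> int:
    """Cycles gained so far: the WCET bound for the code executed up to this
    checkpoint minus the cycles actually spent (e.g. read from a cycle counter)."""
    return wcet_to_checkpoint - elapsed_cycles


def may_reconfigure(total_wcet: int, elapsed_cycles: int,
                    reconf_delay: int, remaining_wcet_after: int) -> bool:
    """Allow an optional reconfiguration only if, even in the worst case,
    the task still completes within its original WCET guarantee."""
    return elapsed_cycles + reconf_delay + remaining_wcet_after <= total_wcet


# WCET bound of 1000 cycles, 400 of which cover the code up to the checkpoint.
total, to_checkpoint, elapsed = 1000, 400, 300
print(runtime_slack(to_checkpoint, elapsed))                         # 100
print(may_reconfigure(total, elapsed, 80, total - to_checkpoint))    # True  (300 + 80 + 600 = 980)
print(may_reconfigure(total, elapsed, 120, total - to_checkpoint))   # False (300 + 120 + 600 = 1020)
```

In this simplified model, when the remaining WCET is unchanged by reconfiguring, the test reduces to "worst-case reconfiguration delay is at most the current slack".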

1 citation

Journal Article
TL;DR: In this paper, a lightweight authenticated key exchange (AKE) protocol is proposed for embedded integrated electronic systems (EIESs) that communicate over a half-duplex "command/response" bus.
Abstract: As embedded integrated electronic systems (EIESs) become more pervasive (including in mission-critical applications), the need to secure data exchange in such systems against various malicious activities becomes more pronounced. However, designing secure and efficient solutions, such as authentication protocols, for the many different embedded systems with varying internal communication modes remains challenging. Therefore, in this paper, we propose a lightweight authenticated key-exchange (AKE) protocol for EIESs based on a half-duplex “command/response” bus. Specifically, the proposed protocol is designed to operate on resource-constrained devices and to require a minimal number of interactions. We then prove the security of the proposed protocol and present a strategy for selecting security parameters for protocol implementation based on empirical evaluations. Moreover, an efficiency analysis shows that the protocol can be effectively deployed in EIES environments.
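For readers unfamiliar with authenticated key exchange on a command/response bus, the sketch below shows a generic three-message, pre-shared-key challenge-response exchange built from HMAC-SHA256. It is not the protocol proposed in the paper, whose construction the abstract does not describe. The class names, message layout, and use of a pre-shared key are assumptions made for illustration. Only the bus controller initiates transfers, which matches the half-duplex command/response pattern. Because every key is derived from one long-term key, this sketch offers no forward secrecy.

```python
import hashlib
import hmac
import secrets


def _encode(*fields: bytes) -> bytes:
    # Length-prefix every field so different field boundaries never collide.
    return b"".join(len(f).to_bytes(2, "big") + f for f in fields)


def _prf(key: bytes, label: bytes, *fields: bytes) -> bytes:
    # HMAC-SHA256 used both as a MAC and as a key-derivation function.
    return hmac.new(key, _encode(label, *fields), hashlib.sha256).digest()


class BusController:  # hypothetical name: the bus master that issues commands
    def __init__(self, ident: bytes, psk: bytes):
        self.ident, self.psk = ident, psk

    def command(self):
        # Message 1 (command): controller identity and a fresh nonce.
        self.n_bc = secrets.token_bytes(16)
        return self.ident, self.n_bc

    def confirm(self, rt_id: bytes, n_rt: bytes, tag_rt: bytes) -> bytes:
        # Verify the terminal's response, derive the session key, return our tag.
        fields = (self.ident, rt_id, self.n_bc, n_rt)
        if not hmac.compare_digest(_prf(self.psk, b"RT", *fields), tag_rt):
            raise ValueError("remote terminal failed authentication")
        self.session_key = _prf(self.psk, b"KEY", *fields)
        # Message 3 (command): controller's confirmation tag.
        return _prf(self.psk, b"BC", *fields)


class RemoteTerminal:  # hypothetical name: a device that only answers commands
    def __init__(self, ident: bytes, psk: bytes):
        self.ident, self.psk = ident, psk

    def respond(self, bc_id: bytes, n_bc: bytes):
        # Message 2 (response): terminal identity, its nonce, and a MAC over both.
        self.bc_id, self.n_bc = bc_id, n_bc
        self.n_rt = secrets.token_bytes(16)
        tag = _prf(self.psk, b"RT", bc_id, self.ident, n_bc, self.n_rt)
        return self.ident, self.n_rt, tag

    def finish(self, tag_bc: bytes) -> None:
        fields = (self.bc_id, self.ident, self.n_bc, self.n_rt)
        if not hmac.compare_digest(_prf(self.psk, b"BC", *fields), tag_bc):
            raise ValueError("bus controller failed authentication")
        self.session_key = _prf(self.psk, b"KEY", *fields)


psk = secrets.token_bytes(32)  # assumed to be provisioned at integration time
bc, rt = BusController(b"BC0", psk), RemoteTerminal(b"RT5", psk)
rt.finish(bc.confirm(*rt.respond(*bc.command())))
print(bc.session_key == rt.session_key)  # True
```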

1 citation

References
Journal Article
TL;DR: This paper identifies design constraints to achieve Trojan detection, collusion prevention, and isolation of the Trojan-infected 3PIP, and incorporates them during high-level synthesis.
Abstract: Trustworthiness of system-on-chip designs is undermined by malicious logic (Trojans) in third-party intellectual properties (3PIPs). In this paper, duplication, diversity, and isolation principles are extended to build trustworthy systems using untrusted, potentially Trojan-infected 3PIPs. We use a diverse set of vendors to prevent collusion between 3PIPs from the same vendor. We identify design constraints to achieve Trojan detection, collusion prevention, and isolation of the Trojan-infected 3PIP, and incorporate them during high-level synthesis. In addition, we develop techniques to reduce the number of vendors required. The effectiveness of the proposed techniques is validated using high-level synthesis benchmarks.
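The vendor-diversity constraints can be expressed as a simple checker over a vendor assignment. This sketch assumes two constraints in the spirit of the abstract rather than the paper's exact formulation. The first is detection: an operation and its duplicate must be bound to IPs from different vendors, so a Trojan in one vendor's IP surfaces as an output mismatch. The second is collusion prevention: a parent operation and its child in the data-flow graph must not use IPs from the same vendor. Isolation is not modeled, and the operation and vendor names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (parent op, child op) in the data-flow graph


def check_assignment(edges: List[Edge],
                     main: Dict[str, str],
                     dup: Dict[str, str]) -> List[str]:
    """Return constraint violations for a vendor assignment.

    main[op] / dup[op]: vendor of the IP core executing op in the normal
    and in the duplicated copy of the computation.
    """
    violations = []
    # Detection: an op and its duplicate must run on different vendors' IPs.
    for op in main:
        if main[op] == dup[op]:
            violations.append(f"detection: {op} uses vendor {main[op]} twice")
    # Collusion prevention: within each copy, a parent and its child must not
    # come from the same vendor, so colluding IPs cannot pass a trigger along.
    for parent, child in edges:
        for name, copy in (("main", main), ("dup", dup)):
            if copy[parent] == copy[child]:
                violations.append(
                    f"collusion: {parent}->{child} in {name} copy share vendor {copy[parent]}")
    return violations


edges = [("mul1", "add1")]
good = check_assignment(edges, {"mul1": "V1", "add1": "V2"}, {"mul1": "V2", "add1": "V1"})
bad = check_assignment(edges, {"mul1": "V1", "add1": "V1"}, {"mul1": "V1", "add1": "V2"})
print(good)      # []
print(len(bad))  # 2
```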

64 citations

Journal Article
TL;DR: This paper proposes a novel, resilient SoC security architecture that ensures trusted SoC operation with untrusted IPs, and demonstrates the effectiveness of this framework for system protection using several illustrative practical use cases.
Abstract: Modern system-on-chip (SoC) designs involve integration of a large number of intellectual property (IP) blocks, many of which are acquired from untrusted third-party vendors. An IP containing a security vulnerability, whether inadvertent or malicious, may compromise the trustworthiness of the entire SoC, e.g., by leaking sensitive information or causing execution failures at key points. Existing functional validation approaches, post-manufacturing tests, and IP trust verification techniques are inadequate to accomplish comprehensive system-level security assurance in the presence of untrusted IPs. In this paper, we analyze security issues at the SoC level caused by untrusted IPs. We also propose a novel, resilient SoC security architecture to ensure trusted SoC operation with untrusted IPs. Our architecture realizes fine-grained IP-trust-aware security policies in an efficient security policy checker that enables run-time monitoring of security issues arising from untrusted IPs. It also exploits the on-chip design-for-debug architecture to ensure trusted information flow from IP blocks to the security policy checker. Unlike existing solutions to the untrusted IP problem, which rely on verifying IP trust before the IPs are integrated into an SoC, the proposed approach follows a fundamentally different, architecture-level solution based on run-time resilience. We demonstrate the effectiveness of this framework for system protection using several illustrative practical use cases. We also provide experimental results showing that the overhead of the proposed architecture is modest on representative SoC designs.
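A run-time security policy checker of this kind can be pictured as a default-deny lookup performed on each bus transaction issued by an IP. The policy format (address windows per IP and operation), the `Transaction` type, and the IP names below are hypothetical. The paper's checker enforces richer, fine-grained, IP-trust-aware policies and relies on design-for-debug hardware to observe traffic; neither of these is modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical policy format: for each IP, the (operation, start, end) address
# windows it may access; anything not listed is denied.
Policy = Dict[str, List[Tuple[str, int, int]]]


@dataclass(frozen=True)
class Transaction:
    ip: str    # initiating IP block
    op: str    # "read" or "write"
    addr: int  # target address


def allowed(tx: Transaction, policy: Policy) -> bool:
    # Default deny: an IP with no matching rule is blocked.
    return any(op == tx.op and lo <= tx.addr <= hi
               for op, lo, hi in policy.get(tx.ip, []))


policy: Policy = {
    "crypto_ip": [("read", 0x1000, 0x1FFF), ("write", 0x2000, 0x20FF)],
    "dsp_ip": [("read", 0x4000, 0x4FFF)],
}
print(allowed(Transaction("crypto_ip", "write", 0x2010), policy))  # True
print(allowed(Transaction("dsp_ip", "write", 0x1000), policy))     # False
```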

57 citations

Journal Article
TL;DR: A global floorplan generation method, PartialHeteroFP, is proposed to obtain the same positions for the common modules across all instances such that the heterogeneous resource requirements of all modules in each instance are satisfied and the total half-perimeter wirelength over all instances is minimized.
Abstract: Partial reconfiguration on heterogeneous field-programmable gate arrays with millions of gates yields better utilization of their different types of resources by swapping the appropriate modules of one or more applications in and out at any instant of time. Given a schedule of sub-task instances, where each instance is specified as a netlist of active modules, reconfiguration overhead can be reduced by fixing the positions and shapes of modules common across all instances. We propose a global floorplan generation method, PartialHeteroFP, to obtain the same positions for the common modules across all instances such that the heterogeneous resource requirements of all modules in each instance are satisfied and the total half-perimeter wirelength over all instances is minimized. Experimental results establish that PartialHeteroFP produces floorplans very fast, with a 100% match of common modules, thereby minimizing the partial reconfiguration overhead.
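The half-perimeter wirelength (HPWL) objective is straightforward to state in code: for each net, take the bounding box of the modules it connects and add the box's width and height. PartialHeteroFP minimizes this sum over all instances while keeping common modules at the same position. The sketch below only evaluates a given floorplan and does not generate one. It assumes each module is represented by a single (x, y) point, such as its center, and the instance and module names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Instance = Tuple[List[List[str]], Dict[str, Point]]  # (nets, module positions)


def net_hpwl(net: List[str], pos: Dict[str, Point]) -> float:
    """Half-perimeter of the bounding box around the modules a net connects."""
    xs = [pos[m][0] for m in net]
    ys = [pos[m][1] for m in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(instances: List[Instance]) -> float:
    """Wirelength objective summed over every net of every sub-task instance."""
    return sum(net_hpwl(net, pos) for nets, pos in instances for net in nets)


def common_modules_fixed(instances: List[Instance]) -> bool:
    """True if every module present in all instances has one and the same position."""
    common = set.intersection(*(set(pos) for _, pos in instances))
    return all(len({pos[m] for _, pos in instances}) == 1 for m in common)


inst1 = ([["A", "B"], ["A", "C"]], {"A": (0, 0), "B": (3, 4), "C": (1, 1)})
inst2 = ([["A", "D"]], {"A": (0, 0), "D": (2, 5)})
print(total_hpwl([inst1, inst2]))            # 16  (7 + 2 + 7)
print(common_modules_fixed([inst1, inst2]))  # True (module A sits at (0, 0) in both)
```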

42 citations

Book Chapter
13 Apr 2015
TL;DR: This work proposes and implements a run-time system manager (RTSM) for scheduling software (SW) tasks on available processor(s) and hardware (HW) tasks on any number of reconfigurable regions of a partially reconfigurable FPGA, and validates its correctness by using the RTSM to execute an image processing application on a ZedBoard platform.
Abstract: Partial reconfiguration (PR) of FPGAs can be used to dynamically extend and adapt the functionality of computing systems by swapping HW tasks in and out. To coordinate on-demand task execution, we propose and implement a run-time system manager (RTSM) for scheduling software (SW) tasks on available processor(s) and hardware (HW) tasks on any number of reconfigurable regions (RRs) of a partially reconfigurable FPGA. Fed with the initial partitioning of the application into tasks, the corresponding task graph, and the available task mappings, the RTSM considers the runtime status of each task and region (e.g., busy, idle, scheduled for reconfiguration/execution) to execute tasks. Our RTSM supports task reuse and configuration prefetching to minimize reconfigurations, task movement among regions to manage the FPGA area efficiently, and RR reservation for future reconfiguration and execution. We validate its correctness by using our RTSM to execute an image processing application on a ZedBoard platform. We also evaluate its features within a simulation framework and find that, despite the technology limitations, our approach can give promising results in terms of scheduling quality.
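The region-status bookkeeping and task reuse described above can be sketched as follows. This is a deliberately simplified model built on stated assumptions, not the RTSM's implementation. Each region carries one of the states mentioned in the abstract plus the HW task currently configured in it. A new HW task first looks for an idle region that already holds its bitstream, and otherwise reconfigures an idle region, preferring an empty one. Prefetching, task movement, and SW tasks are omitted, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Region:
    name: str
    state: str = "idle"           # "idle", "reconfiguring", "busy" or "reserved"
    loaded: Optional[str] = None  # HW task whose bitstream is currently configured


def place_hw_task(task: str, regions: List[Region]) -> Optional[Tuple[Region, bool]]:
    """Choose a region for `task`.

    Returns (region, needs_reconfiguration), or None if the task must wait.
    """
    idle = [r for r in regions if r.state == "idle"]
    # Task reuse: an idle region already configured with this task can start
    # executing at once, with no reconfiguration.
    for r in idle:
        if r.loaded == task:
            r.state = "busy"
            return r, False
    if not idle:
        return None  # every region is busy, reconfiguring or reserved
    # Otherwise reconfigure an idle region, preferring an empty one so that
    # bitstreams that might be reused later are not evicted.
    r = min(idle, key=lambda reg: reg.loaded is not None)
    r.state, r.loaded = "reconfiguring", task
    return r, True


regions = [Region("RR0", loaded="sobel"), Region("RR1"),
           Region("RR2", state="busy", loaded="blur")]
region, reconf = place_hw_task("sobel", regions)
print(region.name, reconf)               # RR0 False
region, reconf = place_hw_task("gauss", regions)
print(region.name, reconf)               # RR1 True
print(place_hw_task("median", regions))  # None
```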

36 citations


"Minimization of WCRT with Recovery ..." refers to methods in this paper

  • ...Very few works consider the effect of fault and hardware Trojan horse (HTH) on the scheduling techniques for DPR-enabled FPGAs....

  • ...multiple hardware subtasks on the DPR-enabled FPGA to accelerate that task....

  • ...Online scheduling on DPR-enabled FPGA [8] searches for appropriate best-fit allocation under reconfiguration port constraint and applies “reuse and partial reuse” policy that saves reconfiguration time by partially or completely implementing the function of the newly arrived task with the logic configured for an already placed task....

  • ...Proper scheduling of each preceding task in DPR-enabled FPGA platform facilitates minimization of the waiting time for a new task....

  • ...Dynamic partially reconfigurable Field Programmable Gate Array (DPR-enabled FPGA) is the best suited processing element to respect the required flexibility with high performance at low cost....

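The best-fit allocation with reuse under a reconfiguration port constraint, summarized in the excerpt for [8], can be sketched as below. The area unit, the `FreeRegion` type, and the timing model (a single shared reconfiguration port; reconfiguration starts once the port is free and takes a fixed time) are assumptions made for illustration. "Partial reuse" of already-placed logic is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FreeRegion:
    name: str
    area: int                     # e.g. number of CLB columns (illustrative unit)
    loaded: Optional[str] = None  # function currently configured, if any


def best_fit_start(task: str, task_area: int, reconf_time: int, now: int,
                   port_free_at: int, regions: List[FreeRegion]) -> Optional[Tuple[str, int]]:
    """Return (region name, earliest start time) for a newly arrived task."""
    fitting = [r for r in regions if r.area >= task_area]
    if not fitting:
        return None  # task must wait (or be rejected) until a region frees up
    # Reuse: a free region already configured with the same function needs
    # no reconfiguration and can start immediately.
    for r in fitting:
        if r.loaded == task:
            return r.name, now
    # Best fit: the smallest fitting region wastes the least area.
    best = min(fitting, key=lambda r: r.area - task_area)
    # Single reconfiguration port: reconfiguration waits until the port is free.
    return best.name, max(now, port_free_at) + reconf_time


regions = [FreeRegion("R0", 40), FreeRegion("R1", 25), FreeRegion("R2", 30, loaded="fir")]
print(best_fit_start("fft", 24, reconf_time=5, now=10, port_free_at=12, regions=regions))  # ('R1', 17)
print(best_fit_start("fir", 28, reconf_time=5, now=10, port_free_at=12, regions=regions))  # ('R2', 10)
```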

Proceedings Article
06 Dec 2016
TL;DR: An autoscaling algorithm is presented to maximize FPGA groups' resource utilization and reduce user-perceived computation latencies; compared to a static resource allocation, it increases resource utilization from 52% to 61% while reducing task execution latencies by 61%.
Abstract: Field-programmable gate arrays (FPGAs) can offer invaluable computational performance for many compute-intensive algorithms. However, to justify their purchase and administration costs, it is necessary to maximize resource utilization over their expected lifetime. Making FPGAs available in a cloud environment would make them attractive to new types of users and applications and help democratize this increasingly popular technology. However, there currently exists no satisfactory technique for offering FPGAs as cloud resources and sharing them between multiple tenants. We propose FPGA groups, which are seen by their clients as a single virtual FPGA and which aggregate the computational power of multiple physical FPGAs. FPGA groups are elastic, and they may be shared among multiple tenants. We present an autoscaling algorithm to maximize FPGA groups' resource utilization and reduce user-perceived computation latencies. FPGA groups incur a low overhead, on the order of 0.09 ms per submitted task. When faced with a challenging workload, the autoscaling algorithm increases resource utilization from 52% to 61% compared to a static resource allocation, while reducing task execution latencies by 61%.
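The abstract does not describe the autoscaling policy itself, so the following is only a generic threshold-based sketch of the kind of decision such an algorithm makes for one FPGA group. It grows the group when utilization is high or tasks are queuing, and shrinks it when the group is mostly idle. The function name, thresholds, and bounds are hypothetical defaults, not values from the paper.

```python
def autoscale(n_fpgas: int, utilization: float, queued_tasks: int,
              min_fpgas: int = 1, max_fpgas: int = 8,
              high: float = 0.8, low: float = 0.3) -> int:
    """Return the new size of an FPGA group after one scaling decision."""
    # Scale out when the group is saturated or tasks are piling up.
    if (utilization > high or queued_tasks > n_fpgas) and n_fpgas < max_fpgas:
        return n_fpgas + 1
    # Scale in when the group is mostly idle and nothing is waiting.
    if utilization < low and queued_tasks == 0 and n_fpgas > min_fpgas:
        return n_fpgas - 1
    return n_fpgas


print(autoscale(2, 0.9, 0))    # 3
print(autoscale(3, 0.1, 0))    # 2
print(autoscale(8, 0.95, 20))  # 8 (already at max_fpgas)
```

Growing or shrinking by one FPGA per decision keeps the sketch simple and avoids oscillation; a real policy would also account for the cost of moving tenants' bitstreams between boards.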

28 citations


"Minimization of WCRT with Recovery ..." refers to methods in this paper

  • ...Tasks with similar requirements are assigned to an FPGA of appropriate group [20]....
