Proceedings ArticleDOI

Design methodology for fault-tolerant heterogeneous MPSoC under real-time constraints

TL;DR: A system level approach for a fault tolerant heterogeneous multi-processor system-on-chip (HMPSoC) platform that can be customized at design phase according to the requirements and the environmental constraints of the target application is proposed.
Abstract: We propose a system-level approach for a fault-tolerant heterogeneous multi-processor system-on-chip (HMPSoC) platform that can be customized at the design phase according to the requirements and the environmental constraints of the target application. This framework can provide optimal tradeoffs for maximizing the reliability of the system under real-time constraints. The proposed heterogeneous platform consists of a mesh-based network-on-chip (NoC) communication architecture, which is equipped with two different types of processing elements: (i) high fault tolerance (FT) and (ii) high performance (HP) processors. For critical applications, the designer can increase the proportion of fault-tolerant processors in the HMPSoC in order to minimize the probability of failure, at the cost of overall performance. Similarly, for less critical applications, the overall performance increases with a higher ratio of high-performance processors. We analyze the proposed HMPSoC architecture under different fault and performance constraints, so that the precise proportion of the two processor types may be adjusted. The design space exploration has been carried out using our system-level design tool, and the resulting schedules have been verified by executing the applications on the cycle-accurate XHiNoC simulator. The objective of this approach is to generate a dependable system under hard real-time constraints with minimized hardware effort related to the adopted processing elements.
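The FT/HP tradeoff described in the abstract can be illustrated with a minimal sketch. This is not the authors' design tool or their reliability model: the failure probabilities, relative speeds, the series reliability model (independent failures, any failure brings the system down), and the names `PEType` and `explore` are all assumptions made for illustration. The aggregate speed threshold is only a stand-in for a real-time constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEType:
    name: str
    failure_prob: float  # hypothetical probability that one PE fails
    speed: float         # hypothetical relative throughput of one PE


def explore(total_pes, ft, hp, required_speed):
    """Return (n_ft, n_hp, reliability, speed) for the most reliable mix
    that still meets the required aggregate speed, or None if none does."""
    best = None
    for n_ft in range(total_pes + 1):
        n_hp = total_pes - n_ft
        speed = n_ft * ft.speed + n_hp * hp.speed
        if speed < required_speed:
            continue  # this mix misses the performance constraint
        # Assumed series model: the system survives only if every PE survives.
        reliability = (1 - ft.failure_prob) ** n_ft * (1 - hp.failure_prob) ** n_hp
        if best is None or reliability > best[2]:
            best = (n_ft, n_hp, reliability, speed)
    return best


# Hypothetical numbers for a 4x4 mesh (16 processing elements).
FT = PEType("FT", failure_prob=0.001, speed=1.0)
HP = PEType("HP", failure_prob=0.01, speed=2.0)
result = explore(16, FT, HP, required_speed=24.0)
print(result)
```

Under these assumed numbers, tightening `required_speed` pushes the mix toward HP processors, and relaxing it lets the search pick more FT processors, which mirrors the qualitative behavior the abstract describes.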
Citations
Book ChapterDOI
13 Apr 2015
TL;DR: Challenges faced when designing NoCs for real-time applications are discussed, and contributions in this area are surveyed on the level of QoS support, fault tolerance and adaptivity.
Abstract: Networks-on-Chip (NoCs) are the backbone of communications in a Multi-Processor System-on-Chip (MPSoC) platform. MPSoCs are becoming an unavoidable trend, especially with the growing complexity of embedded applications requiring massive parallel computation. Real-time applications make up a significant portion of the embedded field and cannot be overlooked. However, the use of NoCs in real-time systems imposes complex constraints on the overall design. In this paper, the challenges faced when designing NoCs for real-time applications are discussed. Contributions in this area are surveyed on the level of QoS support, fault tolerance and adaptivity. The surveyed work provides a comprehensive overview of existing real-time NoC architectures and gives an insight into promising future research directions in this field.

11 citations

Proceedings ArticleDOI
26 Aug 2014
TL;DR: This work addresses the under-utilization problem by proposing a mixed-critical scheduling method such that the overall system performance is increased but all deadlines of SC tasks are met even in the presence of transient faults.
Abstract: There is a lack of mixed-criticality support in system-level design frameworks for dependable Network-on-Chip (NoC)-based multiprocessor systems. Such frameworks should address mixed-criticality in both computation and NoC communication. In Mixed-Critical (MC) systems, only the Safety-Critical (SC) parts have strict predictability and dependability requirements, but conventional methods design the whole system with pessimistic settings to ensure these requirements are satisfied. This, however, results in under-utilization of computation and network resources, and a decrease in performance. In this work, we integrate support of MC applications into an existing system-level design framework of dependable NoC-based multiprocessors. This framework handles failures in both computation and inter-task communication. We address the under-utilization problem by proposing a mixed-critical scheduling method such that the overall system performance is increased but all deadlines of SC tasks are met even in the presence of transient faults. Our approach handles mixed-criticality not only in tasks but also in inter-task messages. Our experiments demonstrate performance improvement in different run-time execution environments and with different MC benchmark applications, including a realistic robot control system. Performance improvement is achieved regardless of task graph size, NoC size or temporal redundancy level.

8 citations


Cites background from "Design methodology for fault-tolera..."

  • ...It can also support dependable applications via fault-tolerant shifting-based scheduling (SBS) methodology which is enhanced for communication scheduling as well [17]....

  • ...network routers and links) in order to ensure that the predictable schedule can accommodate task re-executions and message retransmissions [17], [21]....

Proceedings ArticleDOI
23 Sep 2014
TL;DR: Results demonstrate that the proposed online monitoring strategy is highly scalable due to the compact monitor probe and the ability to reuse the existing NoC communication infrastructure and the traffic heat map generation and throughput display demonstrates benefits in aiding NoC system prototyping and debugging.
Abstract: Modern Networks-on-Chip (NoC) have the capability to tolerate and adapt to faults and failures in the hardware. Monitoring and debugging is a real challenge due to the NoC system complexity and large scale. A key requirement is an evaluation and benchmarking mechanism to quantitatively analyse a NoC system's fault tolerant capability. A novel monitoring mechanism is proposed to evaluate the fault tolerant capability of an NoC by: (1) using a compact monitor probe to detect the events of each NoC node, (2) re-using the existing NoC infrastructure to communicate analysis data back to a terminal PC, which removes the need for additional hardware resources and maintains hardware scalability, and (3) calculating throughput, the number of lost/corrupted packets and generating a heat map of NoC traffic for quantitative analysis. The paper presents results on a case study using an example fault-tolerant routing algorithm and highlights the minimal area overhead of the monitoring mechanism (~6%). Results demonstrate that the proposed online monitoring strategy is highly scalable due to the compact monitor probe and the ability to reuse the existing NoC communication infrastructure. In addition, the traffic heat map generation and throughput display demonstrate benefits in aiding NoC system prototyping and debugging.

5 citations


Cites background from "Design methodology for fault-tolera..."

  • ...…infrastructure to communicate analysis data back to a terminal PC, which removes the need for additional hardware resources and maintains hardware scalability, and (3) calculating throughput, the number of lost/corrupted packets and generating a heat map of NoC traffic for quantitative analysis....

BookDOI
01 Jan 2015
TL;DR: This work proposes a Demand-based Cache Memory Block Manager (DCMBM) that allows the storing of regular instructions and reconfigurable contexts in a single memory structure and shows that the DCMBM-DIM spends, on average, 43.4% less energy while maintaining the same performance as split memory structures with the same storage capacity.
Abstract: Reconfigurable architectures have emerged as an energy-efficient solution to increase the performance of current embedded systems. However, the employment of such architectures causes area and power overhead, mainly due to the mandatory attachment of a memory structure responsible for storing the reconfiguration contexts, named the context memory. Moreover, most reconfigurable architectures employ, besides the context memory, a cache memory to store regular instructions, which causes needless redundancy. In this work, we propose a Demand-based Cache Memory Block Manager (DCMBM) that allows the storing of regular instructions and reconfigurable contexts in a single memory structure. At runtime, depending on the application requirements, the proposed approach manages the ratio of memory blocks that is allocated for each type of information. Results show that the DCMBM-DIM spends, on average, 43.4% less energy while maintaining the same performance as split memory structures with the same storage capacity.

5 citations

Journal ArticleDOI
TL;DR: A design methodology for a fault tolerant homogeneous MPSoC having additional design objectives that include low hardware overhead and performance is proposed and the most relevant scheme is identified in terms of the given design constraints.
Abstract: We propose a design methodology for a fault tolerant homogeneous MPSoC having additional design objectives that include low hardware overhead and performance. We have implemented three different FT methodologies on MPSoCs and compared them against the defined constraints. The comparison of these FT methodologies is carried out by modelling their architectures in VHDL-RTL on a Spartan 3 FPGA. The results obtained through simulations helped us to identify the most relevant scheme in terms of the given design constraints.

2 citations


Cites background or methods from "Design methodology for fault-tolera..."

  • ...In [17], two different self-checking mechanisms were utilized in MPSoC....

  • ...Our long term objective is to design a heterogeneous MPSoC, for which the design objectives are already set in [17]....

References
Journal ArticleDOI
13 May 1983-Science
TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.
Abstract: There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.
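The annealing analogy in this abstract is commonly realized as the simulated annealing loop sketched below. It is a generic textbook-style sketch, not code from the cited article or from the design tool of the citing paper: the geometric cooling schedule, the parameter defaults, and the toy cost function are all assumptions chosen for illustration.

```python
import math
import random


def simulated_annealing(cost, neighbor, start, t0=10.0, alpha=0.995,
                        steps=5000, seed=0):
    """Minimize `cost` starting from `start`; return (best_state, best_cost)."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept a worse move with
        # probability exp(-delta / T) (the Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        # Assumed geometric cooling, floored to avoid division by zero.
        temperature = max(temperature * alpha, 1e-12)
    return best, best_cost


# Toy problem (assumption): minimize (x - 3)^2 over the integers.
def toy_cost(x):
    return (x - 3) ** 2


def toy_neighbor(x, rng):
    return x + rng.choice((-1, 1))


best_x, best_cost = simulated_annealing(toy_cost, toy_neighbor, start=50)
print(best_x, best_cost)
```

At high temperature the loop accepts many uphill moves and explores broadly; as the temperature falls it behaves increasingly like a greedy descent, which is the property that makes the method usable on large combinatorial problems such as task mapping.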

41,772 citations

Journal Article
TL;DR: Dynamic power management (DPM) is a design methodology for dynamically reconfiguring systems to provide the requested services and performance levels with a minimum number of active components or a minimum load on such components.
Abstract: Dynamic power management (DPM) is a design methodology for dynamically reconfiguring systems to provide the requested services and performance levels with a minimum number of active components or a minimum load on such components. DPM encompasses a set of techniques that achieves energy-efficient computation by selectively turning off (or reducing the performance of) system components when they are idle (or partially unexploited). In this paper, we survey several approaches to system-level dynamic power management. We first describe how systems employ power-manageable components and how the use of dynamic reconfiguration can impact the overall power consumption. We then analyze DPM implementation issues in electronic systems, and we survey recent initiatives in standardizing the hardware/software interface to enable software-controlled power management of hardware components.

1,181 citations

Journal ArticleDOI
TL;DR: This paper describes how systems employ power-manageable components and how the use of dynamic reconfiguration can impact the overall power consumption, and survey recent initiatives in standardizing the hardware/software interface to enable software-controlled power management of hardware components.
Abstract: Dynamic power management (DPM) is a design methodology for dynamically reconfiguring systems to provide the requested services and performance levels with a minimum number of active components or a minimum load on such components. DPM encompasses a set of techniques that achieves energy-efficient computation by selectively turning off (or reducing the performance of) system components when they are idle (or partially unexploited). In this paper, we survey several approaches to system-level dynamic power management. We first describe how systems employ power-manageable components and how the use of dynamic reconfiguration can impact the overall power consumption. We then analyze DPM implementation issues in electronic systems, and we survey recent initiatives in standardizing the hardware/software interface to enable software-controlled power management of hardware components.

1,138 citations


"Design methodology for fault-tolera..." refers background in this paper

  • ...At the same time, the on-chip transistor density has increased the overall power consumption per unit area which is another dramatic problem for all integrated circuits [4]....

Journal ArticleDOI
TL;DR: This paper provides a general description of NoC architectures and applications and enumerates several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation.
Abstract: To alleviate the complex communication problems that arise as the number of on-chip components increases, network-on-chip (NoC) architectures have been recently proposed to replace global interconnects. In this paper, we first provide a general description of NoC architectures and applications. Then, we enumerate several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation. Motivation, problem description, proposed approaches, and open issues are discussed for each problem from system, microarchitecture, and circuit perspectives. Finally, we address the interactions among these research problems and put the NoC design process into perspective.

733 citations


"Design methodology for fault-tolera..." refers methods in this paper

  • ...In combination with NoCs being used as communication architectures, the resulting MPSoC platforms represent a flexible, scalable and unified layered communication platform [7]....

Journal ArticleDOI
TL;DR: The history of MPSoCs is surveyed to argue that they represent an important and distinct category of computer architecture and to survey computer-aided design problems relevant to the design of MPSoCs.
Abstract: The multiprocessor system-on-chip (MPSoC) uses multiple CPUs along with other hardware subsystems to implement a system. A wide range of MPSoC architectures have been developed over the past decade. This paper surveys the history of MPSoCs to argue that they represent an important and distinct category of computer architecture. We consider some of the technological trends that have driven the design of MPSoCs. We also survey computer-aided design problems relevant to the design of MPSoCs.

435 citations