An error-detection and self-repairing method for dynamically and partially reconfigurable systems
1 INTRODUCTION
- Technology scaling in the nanometric domain and beyond supports the increasing usage of high-performance and miniaturized embedded systems.
- Among the available technology solutions, the adoption of SRAM-based FPGAs is the most suitable for the realization of dynamically and partially reconfigurable systems; however, when used in harsh environments, SRAM-based FPGAs have to withstand the radiation effects in the form of Single Event Upsets (SEUs) and Multiple Event Upsets (MEUs), especially affecting their configuration memory [2].
- On the contrary, the components in the dynamic region correspond to partially reconfigurable resources that can be configured in different ways depending on the system requirements [8].
- The proposed approach provides significant advantages compared to already developed solutions [9] [10], mainly because it increases the error detection and correction capabilities while introducing comparable area and performance overhead.
- Section 3 describes the proposed method, while the developed design flow is illustrated in Section 4.
2 PREVIOUS WORKS
- State-of-the-art SRAM-based FPGAs are heterogeneous devices containing several macro blocks, like Digital Signal Processors (DSPs), Block RAMs (BRAMs) and IO Blocks (IOBs), along with Configurable Logic Blocks (CLBs) inside the FPGA reconfiguration fabric.
- Each of these resource types is arranged in columns that span from top to bottom of the device realizing a column of CLBs, IOBs and BRAM memories interconnected by a mesh of heterogeneous routing resources.
- In detail, with dynamic reconfiguration, the FPGA configuration memory can be read back continuously without interfering with the circuit functionality, and if any upset is detected it can be selectively rewritten with the correct values, thus avoiding the accumulation of radiation-induced errors [12].
- Spatial redundancy using Triple Modular Redundancy (TMR) is used to complement the read-back and correction techniques: on one side, TMR can tolerate faults, with the limitation of withstanding a single fault per voting group [13]; on the other side, read-back and correction avoids the accumulation of errors within the configuration memory.
- This combined solution is computationally expensive and area hungry.
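- The interplay between voting and read-back described above can be sketched in a few lines of Python. This is only a behavioral illustration: the names `majority` and `scrub`, and the representation of configuration frames as a list of integers, are assumptions made for the example and do not come from the paper.

```python
from typing import List


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote over three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def scrub(config: List[int], golden: List[int]) -> int:
    """Read back every frame and rewrite those that differ from the golden copy.

    Returns the number of frames that were repaired.
    """
    repaired = 0
    for index, (frame, reference) in enumerate(zip(config, golden)):
        if frame != reference:
            config[index] = reference  # selective rewrite of the upset frame
            repaired += 1
    return repaired


CORRECT = 0b1011
UPSET = CORRECT ^ 0b0100  # the same output with one bit flipped

# A single faulty replica is masked by the voter.
single_fault_masked = majority(CORRECT, UPSET, CORRECT) == CORRECT
# Two faulty replicas in the same voting group defeat the voter.
double_fault_masked = majority(UPSET, UPSET, CORRECT) == CORRECT
```

- In this model the voter masks one faulty replica but not two, which is why periodic scrubbing is paired with TMR: it removes upsets before a second one can land in the same voting group.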
2.1 Main contribution
- The main contribution of the present work, which is based on the platform preliminarily presented in [8], is the description of an autonomous recovery approach that can be applied to Partially Reconfigurable Modules (PRMs) when errors are detected inside them.
- The approach is implemented by the static region providing effective capabilities of error detection and correction of faults within the dynamic region.
- In detail, the proposed method is characterized by the ability to detect MEUs in the FPGA's configuration memory, as well as to recover from any number of faults in the dynamic partition, thus improving on previously developed approaches, such as the one presented in [9], that cannot deal with MEUs.
- The authors' solution is adaptable to all modern SRAM-based FPGAs equipped with an Internal Configuration Access Port (ICAP) and based on a LUT-slice architecture.
3 THE PROPOSED METHOD
- The proposed method consists of two flows: one applied to the dynamically reconfigurable region for implementing error detection, the other one for instrumenting the circuit mapped on the FPGA so that it supports the execution of the self-repairing method against single and multiple-bit errors.
- The static region contains the main processor, which is in charge of controlling the partially reconfigurable system operational functionalities: therefore, it is very important to tolerate and recover errors in these modules.
- Each RF contains a different number of "minor frames", each having a height equal to the clock region (row) and numbered from left to right.
- Practically, the F-DWC approach can be adopted by acting at the Hardware Description Language (HDL) level: the combinational functions are duplicated and both copies of the circuit LUTs are placed in a single FPGA slice using two consecutive available LUT positions.
- In the Multiple Bit Error (MBE) region each pair of LUTs generates a check flag and thus the authors have two check flags per slice.
3.1 Error Detection Method
- In order to fully explain their proposal, in this section the authors will specifically refer to the architecture of Xilinx Virtex-5 FPGAs.
- As described in the previous section, the error detection mechanism implemented in the reconfigurable region is based on LUT-based checkers and carry chains for propagating the check flags.
- Please note that the LUT checkers are only deployed when the carry chain is unavailable for comparison purposes.
- This allows reducing the performance degradation of the circuit implemented with their method, although in this case the detection mechanism is implemented at the modular level.
- The authors focus on the method adopted for the error detection using the carry chains for comparison; a more detailed explanation of both the LUT checkers and the carry chains insertion inside the physical place and route description of the circuit will be given in Section 4.3.
3.1.1 Single-bit error detection
- In order to detect single-bit errors, the authors propose to duplicate each original LUT function into two identical LUTs.
- The multiplexer "M2" receives an inverted (through the AMUX_2_BX hardwired connection) and buffered copy of the LUT A output at its "0" and "1" inputs while the selection line is tied to LUT B (which is the copy of LUT A) thus effectively performing the EX-NOR function.
- If the CLB column contains empty slices, the dedicated COUT connection cannot be used to propagate the flag signal upwards along the column.
- Errors affecting flip-flops cannot be directly detected.
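- The duplicate-and-compare idea behind single-bit detection can be modeled behaviorally as follows. This is a simplified sketch, not the actual slice wiring: the helper names (`lut`, `xnor`, `column_flag`), the encoding of a LUT as an integer truth table, and the flag polarity (1 means error) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def lut(truth_table: int, inputs: Sequence[int]) -> int:
    """Evaluate a LUT: the input bits form an index into the truth table."""
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position
    return (truth_table >> index) & 1


def xnor(a: int, b: int) -> int:
    """Return 1 when both bits are equal, 0 otherwise."""
    return 1 - (a ^ b)


def column_flag(pairs: List[Tuple[int, int]], inputs: List[Sequence[int]]) -> int:
    """Propagate an error flag up a column of duplicated LUT pairs.

    Each stage passes the incoming flag through when its two LUT copies
    agree and raises the flag when they differ.
    """
    flag = 0  # no error enters the bottom of the chain
    for (table_a, table_b), stage_inputs in zip(pairs, inputs):
        agree = xnor(lut(table_a, stage_inputs), lut(table_b, stage_inputs))
        flag = flag if agree else 1
    return flag


AND2 = 0b1000        # truth table of a 2-input AND
AND2_UPSET = 0b1001  # the same table with entry 0 flipped by an upset

# The upset is flagged only when the corrupted entry is addressed.
flag_when_excited = column_flag([(AND2, AND2), (AND2, AND2_UPSET)], [(0, 0), (0, 0)])
flag_when_latent = column_flag([(AND2, AND2), (AND2, AND2_UPSET)], [(0, 0), (1, 1)])
```

- In this model a flipped LUT bit raises the flag only when the current inputs address the corrupted entry; until then the two copies produce identical outputs and the chain stays quiet.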
3.1.2 Multiple-Bit error detection
- Multiple bit errors can only be detected if the error detecting carry chain is inserted in a specific pattern that the authors will mention in this section.
- In order to reduce the number of flags the authors propose the usage of 2 slices (out of the available 20) for merging the check flags by OR-ing them.
- As the authors are producing two flags for each clock region (one for odd and one for even slices) they can have a maximum of 72 LUTs (out of 80 LUTs in an even or odd slice column) configured for computations in any slice column location (even or odd) within a single clock region.
- Thus, the MBE regions require an overhead of 11.11 % for flag reduction.
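- The 11.11 % figure can be reproduced from the numbers quoted above. The short Python check below assumes that the overhead is measured relative to the LUTs left for computation (8 reserved LUTs over 72 usable ones); the constant names and the `merge_flags` helper are illustrative only.

```python
from typing import Iterable

SLICES_PER_COLUMN = 20  # slices in one slice column of a clock region (from the text)
LUTS_PER_COLUMN = 80    # LUTs in that column (from the text)
MERGE_SLICES = 2        # slices reserved for OR-ing the check flags (from the text)

LUTS_PER_SLICE = LUTS_PER_COLUMN // SLICES_PER_COLUMN
USABLE_LUTS = (SLICES_PER_COLUMN - MERGE_SLICES) * LUTS_PER_SLICE
# Assumption: overhead is expressed relative to the usable LUTs.
OVERHEAD_PERCENT = 100 * (LUTS_PER_COLUMN - USABLE_LUTS) / USABLE_LUTS


def merge_flags(flags: Iterable[int]) -> int:
    """OR-reduce the per-pair check flags into a single region flag."""
    return int(any(flags))
```

- With these values `USABLE_LUTS` evaluates to 72 and `OVERHEAD_PERCENT` rounds to 11.11, matching the figures in the text.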
3.2 Error Correction Method
- Data errors affecting combinational logic or flip-flops are identified by the error detection scheme previously described.
- Secondly, the clock enabling signals should be de-activated to disable the propagation of errors to the next stages in the design.
- This is possible since both static and dynamic regions have well-defined interfaces with clock enabling registers.
- Lastly, the main processor controller enables the clock to re-start the normal operation in the DUT region involved in the correction.
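- The recovery sequence can be summarized as a small state model. The sketch below is hypothetical: `DutRegion`, `recover` and the list-of-integers configuration are invented for the example, the flag is emulated by comparing against a golden copy (the real flag comes from the duplicated logic), and the rewrite step assumes the region is restored from a stored partial bit-stream.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DutRegion:
    golden: List[int]           # stored partial bit-stream (assumed representation)
    config: List[int]           # current configuration of the region
    clock_enabled: bool = True

    def flag(self) -> bool:
        """Stand-in for the hardware check flag raised by the detection logic."""
        return self.config != self.golden


def recover(region: DutRegion) -> List[str]:
    """Run the detect, freeze, rewrite, resume sequence and log each step."""
    steps: List[str] = []
    if not region.flag():
        return steps
    steps.append("error flagged")
    region.clock_enabled = False        # stop errors propagating to later stages
    steps.append("clock disabled")
    region.config = list(region.golden)  # assumed: rewrite from the stored bit-stream
    steps.append("region rewritten")
    region.clock_enabled = True         # resume normal operation
    steps.append("clock enabled")
    return steps
```

- Disabling the clock before rewriting keeps a faulty region from feeding wrong values to its neighbours while the configuration is being restored.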
4 DESIGN FLOW
- In this section the authors describe the tool flow they developed in order to insert fine-grain duplication with comparison using the built-in slice carry chains.
- A pre-map step generates a number of constraints for directed packing, placement and sites prohibitions, while a post-map step inserts the error detecting carry chains and the convergence logic required to reduce the number of flag signals.
- This post-map modification is implemented by modifying the XDL file (i.e., the Xilinx interface for interacting with the Xilinx CAD flow).
- The tool flow has been developed as a C++-based software environment making heavy use of the Boost library and Tools for Open Reconfigurable Computing (TORC).
4.1 Net-list Extraction
- The flow starts by parsing the net-list description of the circuit implemented in the dynamic region, which was duplicated at the Hardware Description Language (HDL) level.
- Both instances of the design must be labeled "inst1" and "inst2" so that each synthesized element contains the hierarchical information of the top-level instance to which it belongs.
- Global reset/clock signals are not duplicated at the module level, as explained in Section 4.2.
- The post-synthesis Verilog file contains the circuit net-list using the Xilinx primitive cell library elements.
- In detail, each node of the graph corresponds to a data structure with a number of fields including: functional string, instance name, inputs vector, outputs vector and type of primitive element (LUT or FF).
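- A node with the fields listed above could look like the following Python dataclass. The authors' tool is written in C++, so this is only an illustrative sketch: the field names, the `/`-separated hierarchical instance names and the helper functions are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class PrimitiveType(Enum):
    LUT = "LUT"
    FF = "FF"


@dataclass
class Node:
    function: str                 # functional string, e.g. the LUT equation
    instance: str                 # hierarchical name, e.g. "inst1/add/lut0"
    inputs: List[str] = field(default_factory=list)   # nets read by the cell
    outputs: List[str] = field(default_factory=list)  # nets driven by the cell
    kind: PrimitiveType = PrimitiveType.LUT


def copy_of(node: Node) -> str:
    """Return the top-level instance label ("inst1" or "inst2") of a node."""
    return node.instance.split("/", 1)[0]


def build_graph(nodes: List[Node]) -> Dict[str, List[str]]:
    """Map each instance to the instances that read one of its output nets."""
    readers: Dict[str, List[str]] = {}
    for node in nodes:
        for net in node.inputs:
            readers.setdefault(net, []).append(node.instance)
    return {
        node.instance: sorted(r for net in node.outputs for r in readers.get(net, []))
        for node in nodes
    }
```

- Keeping the top-level label in the instance name is what lets later steps match each element of "inst1" with its twin in "inst2".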
4.2 DUT Regions Formation and Constraints Generation
- Once the circuit net-list is created in the form of a graph, it is necessary to generate user constraints, represented within the User Constraints File (UCF) in order to perform the DUT physical space division into regions and for packing the primitive cells into slices.
- Thirdly, LUTs with 6 inputs are grouped to form single bit error detection regions.
- For this reason the global clock and reset signals were not duplicated due to the architectural limitation of state-of-the-art FPGA devices.
- Slices in the single bit region use names like "SBESlice1", "SBESlice2" and so on.
- The algorithm illustrated in Figure 6 performs the generation of the constraints used for the floorplanning of the circuit including the mapping of the SBE and MBE regions.
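- The packing step can be pictured as pairing each "inst1" LUT with its "inst2" twin and assigning the pairs to named slices. The sketch below is not the algorithm of Figure 6: the `pack_pairs` function and the choice of two pairs per slice (derived from four LUTs per slice with each pair in two consecutive positions) are assumptions, and it produces only labels rather than real UCF constraints.

```python
from typing import Dict, Iterable, List, Tuple


def pack_pairs(
    instances: Iterable[str],
    prefix: str = "SBESlice",
    pairs_per_slice: int = 2,
) -> Dict[str, List[Tuple[str, str]]]:
    """Pair each inst1 LUT with its inst2 twin and group the pairs into slices."""
    available = set(instances)
    originals = sorted(name for name in available if name.startswith("inst1/"))
    pairs: List[Tuple[str, str]] = []
    for name in originals:
        twin = "inst2/" + name[len("inst1/"):]
        if twin not in available:
            raise ValueError(f"no duplicate found for {name}")
        pairs.append((name, twin))
    slices: Dict[str, List[Tuple[str, str]]] = {}
    for start in range(0, len(pairs), pairs_per_slice):
        label = f"{prefix}{start // pairs_per_slice + 1}"
        slices[label] = pairs[start:start + pairs_per_slice]
    return slices
```

- For three duplicated LUTs this yields "SBESlice1" holding two pairs and "SBESlice2" holding the remaining one.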
4.3 Low-level Manipulations
- Once the mapping is performed, the insertion of the carry chain and the definition of the comparator resources are implemented by modifying the physical place and route description of the circuit in order to properly use the hardwired combinational gates.
- Each inserted carry chain is labeled with a unique reference to distinguish it from the ones used for arithmetic computation.
- It is also interesting to note that for each OR LUT an automatic procedure searches for an empty slice in the same CLB column and picks up the nearest one in terms of the slice site distance for the OR LUT placement.
- The single bit error region flags are converged resulting in error detection carry chains of varying lengths.
- Therefore, the placement should be such that an optimal balance between the usage of OR LUTs for flag convergence and the routing congestion is achieved.
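- The search for the nearest empty slice in a CLB column reduces to a minimum-distance lookup. The function below is a hypothetical sketch: slice sites are reduced to row indices within one column, and ties are broken in favour of the lower row, which is an assumption not stated in the text.

```python
from typing import Iterable, Optional


def nearest_empty_slice(target_row: int, empty_rows: Iterable[int]) -> Optional[int]:
    """Return the empty slice row closest to target_row, or None if there is none.

    Ties are resolved towards the lower row index (assumed policy).
    """
    candidates = list(empty_rows)
    if not candidates:
        return None
    return min(candidates, key=lambda row: (abs(row - target_row), row))
```

- Returning `None` when the column has no free slice leaves the caller to decide whether to fall back to another column, a choice that interacts with the routing congestion trade-off mentioned above.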
5 EXPERIMENTAL RESULTS
- The authors implemented the proposed method targeting a Xilinx Virtex-5 LX110T SRAM-based FPGA.
- The authors adopted the MicroBlaze processor since it represents a state-of-the-art solution for a dynamically and partially reconfigurable system based on static and dynamic regions [4].
- Moreover, another GPIO port connected to the flags stemming from the DUT region and configured in interrupt mode is responsible for informing the MicroBlaze in case of errors.
- The bit-stream for the C-DWC region is stored as a partial bit-stream by reading it with the ICAP from the start address to the end address.
- In the following sections, the authors present several results mainly related to the ability of quick error detection, localization and repairing.
5.1 Area Overhead
- The circuits include some relevant ITC'99 benchmark circuits of varying complexity, two implementations of the CORDIC arithmetic processor, a miniMIPS processor, a lightweight 8080 SoC, an RS-Decoder and a DCT core from the opencores repository [24] [25].
- Please note that the authors did not include the amount of resources related to the static region within the area count since the static region remains the same in any Dynamically Reconfigurable system, no matter the adopted solution.
- Compared with DMR, their approach requires 10% more resources on average; however, DMR cannot correct errors, while their approach corrects errors and reduces the probability of single points of failure thanks to the developed fine-grain combinational logic infrastructure.
- The authors underline that the area comparison has been performed directly on the basis of LUT and FF counts; if the comparison is made considering the number of FPGA slices, the ratio may be slightly different due to the stringent packing and placement requirements adopted for the fine-grain redundancy with comparison logic.
- In particular, slices are used as a route-through and FFs may be placed in separate slices, since the FFs require different control signals that could not be packed together with LUTs.
5.2 Error Detection Latency
- Measuring the error detection latency is a key factor in building a self-repairing system able to autonomously repair itself while obeying real-time constraints.
- The results the authors obtained are illustrated in Table II, which shows the maximum error detection latency for SBE and MBE regions.
- In detail, the table reports the length of the carry chain detector, the delay latency with routing and logic contributions of the SBE region, as well as the distance from the detector and the delay latency for the MBE region.
- It is notable that the SBE region latency is larger than for the MBE region because all the carry chains in each CLB that resides in the same column have been connected in a unique CLB column.
- Two alternatives have been used in order to reduce the routing delay time.
5.3 Error Correction and Detection
- The effectiveness of the proposed approach concerning the error correction and detection capabilities has been evaluated through the execution of a number of fault injection campaigns.
- The experiments have been performed on the Xilinx Virtex-5 LX110T SRAM-based FPGA by injecting transient faults into the FPGA's configuration memory and evaluating the circuit's response through the execution of circuit-specific workloads.
- Please note that the faulty bitstreams are generated by corrupting the FPGA's configuration memory bits belonging to the dynamic region, while the static region was kept fault free.
- Table III shows the fault injection results, where for each circuit 10,000 Single Event Upsets (SEUs) have been randomly injected into the whole FPGA configuration memory bits related to the reconfigurable region.
- All the circuits have been emulated at 50 MHz and SEUs are practically injected by downloading the corrupted bitstreams into the FPGA configuration memory.
Table III. Fault injection campaign experimental results
- In detail, the Wrong Answer column reports the number of SEUs and MEUs provoking a wrong answer on the circuit outputs; the Corrected column reports the number of SEUs and MEUs properly corrected by their approach.
- Please note that the MEU effect considered in their experiments always occurs in different slice columns involving the modification of two configuration memory bits.
- The results show the effectiveness of their approach, which is able to correct more than 98% of the injected errors provoking wrong answers for all the considered circuits.
- The authors also measured the recovery time; in Table IV they reported the worst recovery time measured for all the circuits during the execution of the fault injection campaigns.
- The authors also computed the recovery time required by the redundancy approaches, such as TMR and DMR, using active configuration memory scrubbing of all the reconfigurable region area, which is about 1.2 ms; their approach shows an improvement of more than one order of magnitude, and the advantage provided by their approach is extremely large on all the considered circuits.
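- Generating a faulty bit-stream for such a campaign amounts to flipping a randomly chosen bit inside the range that belongs to the dynamic region. The sketch below is a simplified model: the bit-stream is treated as a flat byte string, the region is a plain bit range, and the function names and seed handling are assumptions rather than the authors' injection tool.

```python
import random
from typing import List, Tuple


def inject_seu(
    bitstream: bytes, start_bit: int, end_bit: int, rng: random.Random
) -> Tuple[bytes, int]:
    """Flip one random bit in [start_bit, end_bit) and return the copy and position."""
    position = rng.randrange(start_bit, end_bit)
    corrupted = bytearray(bitstream)
    corrupted[position // 8] ^= 1 << (position % 8)
    return bytes(corrupted), position


def campaign(
    bitstream: bytes, start_bit: int, end_bit: int, count: int, seed: int = 0
) -> List[Tuple[bytes, int]]:
    """Produce `count` independently corrupted bit-streams, one upset each."""
    rng = random.Random(seed)
    return [inject_seu(bitstream, start_bit, end_bit, rng) for _ in range(count)]
```

- Restricting the bit range is how this model keeps the static region fault free, mirroring the setup described for the experiments; each corrupted copy would then be downloaded and exercised with the circuit workload.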
5.4 Timing Analysis
- Finally, the authors evaluated the impact on the maximum working frequency of all the benchmark circuits, comparing their approach with the DMR and TMR redundancy-based techniques.
- In order to elaborate the timing data the authors used the static timing analysis tool provided by the Xilinx ISE environment.
- This phenomenon is due to the unconventional block placement of logic resources on slice columns for different circuit regions.
- This aspect affects the timing of the circuit because their technique does not include an optimal floorplan implementation of the different circuit regions.
- In Figure 8, the authors illustrate the obtained results, showing the percentage contribution of each design phase's constraints to the overall circuit delay: LUT blocks, SBE region, MBE region and Detectors.