Proceedings Article•DOI•

Heterogeneous built-in resiliency of application specific programmable processors

TL;DR: A new approach to permanent fault tolerance, Heterogeneous Built-In-Resiliency (HBIR), is developed, and the effectiveness of the overall approach, the synthesis algorithms, and software implementations is demonstrated on a number of designs.
Abstract: Using the flexibility provided by multiple functionalities, we have developed a new approach for permanent fault-tolerance: Heterogeneous Built-In-Resiliency (HBIR). HBIR processor synthesis imposes several unique tasks on the synthesis process: (i) latency determination targeting k-unit fault-tolerance, (ii) application-to-faulty-unit matching, and (iii) HBIR scheduling and assignment algorithms. We address each of them and demonstrate the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of designs.
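
As a rough illustration of task (i), latency determination targeting k-unit fault-tolerance, the sketch below enumerates every set of k failed units and reschedules a small dataflow graph on the surviving ones. It is not the paper's algorithm: the greedy list scheduler, the unit-latency operations, the brute-force enumeration, and all names (`list_schedule`, `k_fault_latency`, the `alu`/`mac` units) are assumptions made for this example only. The input graph is assumed to be acyclic.

```python
from itertools import combinations


def list_schedule(ops, deps, units):
    """Greedy list scheduling with unit-latency operations (illustrative only).

    ops:   {op_name: op_type}
    deps:  {op_name: [predecessor op_names]}  (assumed acyclic)
    units: {unit_name: set of op_types the unit can execute}
    Returns the number of control steps, or None if some op type
    cannot run on any available unit.
    """
    if any(all(t not in caps for caps in units.values()) for t in ops.values()):
        return None
    done, step = set(), 0
    while len(done) < len(ops):
        step += 1
        free = set(units)
        scheduled_now = []
        for op in sorted(ops):  # deterministic priority: name order
            if op in done or any(p not in done for p in deps.get(op, [])):
                continue
            # Prefer the least flexible capable unit, keeping flexible ones free.
            candidates = sorted(free, key=lambda u: (len(units[u]), u))
            unit = next((u for u in candidates if ops[op] in units[u]), None)
            if unit is not None:
                free.remove(unit)
                scheduled_now.append(op)
        done.update(scheduled_now)
    return step


def k_fault_latency(ops, deps, units, k):
    """Worst-case schedule length over all sets of k failed units.

    Returns None if some k-unit fault leaves an operation type unsupported.
    """
    worst = 0
    for failed in combinations(sorted(units), k):
        working = {u: caps for u, caps in units.items() if u not in failed}
        latency = list_schedule(ops, deps, working)
        if latency is None:
            return None
        worst = max(worst, latency)
    return worst


# Hypothetical example: two multiplies feed an add, which feeds another add.
example_ops = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
example_deps = {"a1": ["m1", "m2"], "a2": ["a1"]}
# Heterogeneous units: the MAC units can also perform additions.
example_units = {
    "alu0": {"add"},
    "alu1": {"add"},
    "mac0": {"mul", "add"},
    "mac1": {"mul", "add"},
}

for k in range(3):
    print(k, k_fault_latency(example_ops, example_deps, example_units, k))
```

In this toy setup the heterogeneity is what buys resilience for additions (a MAC unit can stand in for a failed ALU), while losing both MAC units leaves multiplication unsupported, so no latency exists for k = 2. Exhaustive enumeration grows combinatorially with the number of units and is used here only to keep the sketch short.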


Citations
Journal Article•DOI•
TL;DR: The method utilizes the register-transfer level (RTL) circuit description of an ASPP or ASIP to come up with a set of test microcode patterns which can be written into the instruction read-only memory (ROM) of the processor.
Abstract: In this paper, we present design for testability (DFT) and hierarchical test generation techniques for facilitating the testing of application-specific programmable processors (ASPPs) and application-specific instruction processors (ASIPs). The method utilizes the register-transfer level (RTL) circuit description of an ASPP or ASIP to come up with a set of test microcode patterns which can be written into the instruction read-only memory (ROM) of the processor. These lines of microcode dictate a new control/data flow in the circuit and can be used to test modules which are not easily testable. The new control/data flow is used to justify precomputed test sets of a module from the system primary inputs to the module inputs and propagate output responses from the module output to the system primary outputs. The testability analysis, which is based on the relevant control/data flow extracted from the RTL circuit, is symbolic. Thus, it is independent of the bit-width of the data path and is extremely fast. The test microcode patterns are a by-product of this analysis. If the derived test microcode cannot test all untested modules in the circuit, then test multiplexers are added (usually to the off-critical paths of the data path) to test these modules. This is done to guarantee the testability of all modules in the circuit. If the control microcode memory of the processor is erasable, then the test microcode lines can be erased once the testing of the chip is over. In that case, the DFT scheme has very little overhead (typically less than 1%). Otherwise, the test microcode lines remain as an overhead in the control memory. The method requires the addition of only one external test pin. Application of this technique to several examples has resulted in a very high fault coverage (above 99.6%) for all of them. The test generation time is about three orders of magnitude smaller compared to an efficient gate-level sequential test generator. 
The average area overhead (without assuming an erasable ROM) is 3.1% while the delay overheads are negligible. This method does not require any scan in the controller or data path. It is also amenable to at-speed testing.

39 citations

Journal Article•DOI•
TL;DR: Two low-cost approaches to graceful degradation-based permanent fault tolerance of ASPPs are presented and the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of industrial-strength designs are demonstrated.
Abstract: Application Specific Programmable Processors (ASPP) provide efficient implementation for any of m specified functionalities. Due to their flexibility and convenient performance-cost trade-offs, ASPPs are being developed by DSP, video, multimedia, and embedded IC manufacturers. In this paper, we present two low-cost approaches to graceful degradation-based permanent fault tolerance of ASPPs. ASPP fault tolerance constraints are incorporated during scheduling, allocation, and assignment phases of behavioral synthesis. Graceful degradation is supported by implementing multiple schedules of the ASPP applications, each with a different throughput constraint. In this paper, we do not consider concurrent error detection. The first ASPP fault tolerance technique minimizes the hardware resources while guaranteeing that the ASPP remains operational in the presence of all k-unit faults. On the other hand, the second fault tolerance technique maximizes the ASPP fault tolerance subject to constraints on the hardware resources. These ASPP fault tolerance techniques impose several unique tasks, such as fault-tolerant scheduling, hardware allocation, and application-to-faulty-unit assignment. We address each of them and demonstrate the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of industrial-strength designs.

24 citations

Journal Article•DOI•
TL;DR: Techniques to incorporate micropreemption constraints during multitask VLSI system synthesis are presented and algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints are presented.
Abstract: Task preemption is a critical enabling mechanism in multitask very large scale integration (VLSI) systems. On preemption, data in the register files must be preserved for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. In this paper, techniques and algorithms to incorporate micropreemption constraints during multitask VLSI system synthesis are presented. Specifically, algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints, techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks, and a controller-based scheme to preclude the preemption-related performance degradation by: 1) partitioning the states of a task into critical sections; 2) executing the critical sections atomically; and 3) preserving atomicity by rolling forward to the end of the critical sections on preemption have been developed. The effectiveness of all approaches, algorithms, and software implementations is demonstrated on real examples. Validation of all the results is complete in the sense that functional simulation is conducted to complete layout implementation.

10 citations

Proceedings Article•DOI•
01 May 1998
TL;DR: A framework that makes it possible for a designer to rapidly explore the application-specific programmable processor design space under area constraints is reported, which can be valuable in making early design decisions such as area and architectural trade-offs, cache and instruction issue width trade-off under area constraint, and the number of branch units and issue width.
Abstract: In this paper we report a framework that makes it possible for a designer to rapidly explore the application-specific programmable processor design space under area constraints. The framework uses a production-quality compiler and simulation tools to synthesize a high performance machine for an application. Using the framework we evaluate the validity of the fundamental assumption behind the development of application-specific programmable processors. Application-specific processors are based on the idea that applications differ from each other in key architectural parameters, such as the available instruction-level parallelism, demand on various hardware components (e.g. cache memory units, register files) and the need for different number of functional units. We found that the framework introduced in this paper can be valuable in making early design decisions such as area and architectural trade-off, cache and instruction issue width trade-off under area constraint, and the number of branch units and issue width.

9 citations

Proceedings Article•DOI•
13 Nov 1997
TL;DR: In this paper, the authors present techniques and algorithms to incorporate micro-preemption constraints during multi-task VLSI system synthesis, and propose a controller based scheme to preclude preemption related performance degradation.
Abstract: Task preemption is a critical enabling mechanism in multi-task VLSI systems. On preemption, data in the register files must be preserved in order for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. In this paper, we present techniques and algorithms to incorporate micro-preemption constraints during multi-task VLSI system synthesis. Specifically, we have developed: (i) Algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints. (ii) Techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks. (iii) A controller based scheme to preclude preemption related performance degradation. The effectiveness of all approaches, algorithms, and software implementations is demonstrated on real examples.

3 citations

References
Journal Article•DOI•
01 Feb 1990
TL;DR: It is shown how the high-level synthesis task can be decomposed into a number of distinct but not independent subtasks.
Abstract: High-level synthesis systems start with an abstract behavioral specification of a digital system and find a register-transfer level structure that realizes the given behavior. The various tasks involved in developing a register-transfer level structure from an algorithmic level specification are described. In particular, it is shown how the high-level synthesis task can be decomposed into a number of distinct but not independent subtasks. The techniques that have been developed for solving those subtasks are presented. Areas related to high-level synthesis that are still open problems are examined.

639 citations


"Heterogeneous built-in resiliency o..." refers background in this paper

  • ...Behavioral synthesis has been an active area of research for more than two decades [3, 8], and numerous outstanding systems have been built targeting both data path oriented and control oriented applications [15, 8]. Behavioral synthesis traditionally has been addressing synthesis and optimization of a single CDFG for sampling rate, area, and more recently power and test hardware overhead minimization [8]....


Journal Article•DOI•
TL;DR: The discussion covers behavioral specification, module selection, exploring the design space, transformations, scheduling and assignment, and hardware mapping of Hyper, a synthesis environment for real-time systems with datapath-intensive architectures.
Abstract: A description is given of Hyper, a synthesis environment for real-time systems with datapath-intensive architectures. Hyper uses a single, global quality measure throughout the system to drive the exploration of the design space. This approach effectively merges the allocation of hardware, the application of transformations, and the handling of hierarchy in a consistent way. Hyper's modular organization around a central database also allows new software modules to be introduced easily. The discussion covers behavioral specification, module selection, exploring the design space, transformations, scheduling and assignment, and hardware mapping. Four versions of an IIR filter generated using Hyper and Lager IV are compared. It is seen that layouts generated using Hyper are more area efficient than layouts done using the more traditional methods based on one-to-one mapping or the use of multiprocessors.

289 citations

Journal Article•DOI•
TL;DR: An attempt was made to define the algorithmic level of design and to provide the designer with the means to explore various design issues within the framework of the System Architect's Workbench.
Abstract: An attempt was made to define the algorithmic level of design (also known as the behavioral level) and to provide the designer with the means to explore various design issues. Within the framework of the System Architect's Workbench, a new set of behavioral and structural transformations was developed to allow the interactive exploration of algorithmic-level design alternatives. A description is given of these transformations, and a set of examples is presented both to demonstrate the application of the transformations and to further illustrate their effects.

89 citations


"Heterogeneous built-in resiliency o..." refers background in this paper

  • ...Behavioral synthesis has been an active area of research for more than two decades [3, 8], and numerous outstanding systems have been built targeting both data path oriented and control oriented applications [15, 8]....


Journal Article•DOI•
TL;DR: An integrated system for synthesizing self-recovering microarchitectures called SYNCERE incorporates detection constraints by ensuring that two copies of the computation are executed on disjoint hardware.
Abstract: We describe an integrated system for synthesizing self-recovering microarchitectures called SYNCERE. In the SYNCERE model for self-recovery, transient faults are detected using duplication and comparison, while recovery from transient faults is accomplished via checkpointing and rollback. SYNCERE initially inserts checkpoints subject to designer specified recovery time constraints. Subsequently, SYNCERE incorporates detection constraints by ensuring that two copies of the computation are executed on disjoint hardware. Towards ameliorating the dedicated hardware required for the original and duplicate computations, SYNCERE imposes intercopy hardware disjointness at a sub-computation level instead of at the overall computation level. The overhead is further moderated by restructuring the pliable input representation of the computation. SYNCERE has successfully derived numerous self-recovering microarchitectures. Towards validating the methodology for designing fault-tolerant VLSI ICs, we carried out a physical design of a self-recovering 16-point FIR filter.

58 citations


Additional excerpts

  • ...Karri and Orailoglu [9] presented scheduling, assignment...


Journal Article•DOI•
TL;DR: An Integer Linear Programming model for the self-recovering microarchitecture synthesis problem is presented, and the resulting ILP formulation can minimize either the number of voters or the overall hardware, subject to constraints on the number of clock cycles, the retry period, and the number of checkpoints.
Abstract: The growing trend towards VLSI implementation of crucial tasks in critical applications has increased both the demand for and the scope of fault-tolerant VLSI systems. In this paper, we present a self-recovering microarchitecture synthesis system. In a self-recovering microarchitecture, intermediate results are compared at regular intervals and, if correct, saved in registers (checkpointing). On the other hand, on detecting a fault, the self-recovering microarchitecture rolls back to a previous checkpoint and retries. The proposed synthesis system comprises a heuristic and an optimal subsystem. The heuristic synthesis subsystem has two components. Whereas the checkpoint insertion algorithm identifies good checkpoints by successively eliminating clock cycle boundaries that either have a high checkpoint overhead or violate the retry period constraint, the novel edge-based schedule assigns edges to clock cycle boundaries, in addition to scheduling nodes to clock cycles. Also, checkpoint insertion and edge-based scheduling are intertwined using a flexible synthesis methodology. We additionally show an Integer Linear Programming model for the self-recovering microarchitecture synthesis problem. The resulting ILP formulation can minimize either the number of voters or the overall hardware, subject to constraints on the number of clock cycles, the retry period, and the number of checkpoints.

44 citations


"Heterogeneous built-in resiliency o..." refers background in this paper

  • ...Karri and Orailoglu [9] presented scheduling, assignment...
