Proceedings Article•DOI•

Heterogeneous built-in resiliency of application specific programmable processors

TL;DR: A new approach for permanent fault-tolerance, Heterogeneous Built-In-Resiliency (HBIR), is developed, and the effectiveness of the overall approach, the synthesis algorithms, and software implementations is demonstrated on a number of designs.
Abstract: Using the flexibility provided by multiple functionalities, we have developed a new approach for permanent fault-tolerance: Heterogeneous Built-In-Resiliency (HBIR). HBIR processor synthesis imposes several unique tasks on the synthesis process: (i) latency determination targeting k-unit fault-tolerance, (ii) application-to-faulty-unit matching, and (iii) HBIR scheduling and assignment algorithms. We address each of them and demonstrate the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of designs.
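Task (ii), application-to-faulty-unit matching, can be made concrete with a toy feasibility check (a hypothetical sketch, not the paper's algorithm): enumerate every possible k-unit fault and verify that the surviving units still cover each application's demand for unit types.

```python
from itertools import combinations

# Toy HBIR-style feasibility check (illustrative only): verify that for
# every combination of k faulty units, each application's demand for unit
# types can still be met by the surviving units.
def k_fault_tolerant(units, demands, k):
    """units: list of unit types, e.g. ['add', 'add', 'mul'];
    demands: list of {type: count} dicts, one per application;
    returns True if all applications remain schedulable under any k faults."""
    for faulty in combinations(range(len(units)), k):
        surviving = [u for i, u in enumerate(units) if i not in faulty]
        for demand in demands:
            for unit_type, count in demand.items():
                if surviving.count(unit_type) < count:
                    return False
    return True

units = ['add', 'add', 'mul', 'mul']
demands = [{'add': 1, 'mul': 1}]
print(k_fault_tolerant(units, demands, 1))  # True: one spare of each type
print(k_fault_tolerant(units, demands, 2))  # False: both adders may fail
```

A real HBIR synthesis flow couples this condition with scheduling and assignment; the brute-force enumeration here is exponential in k and is only meant to make the k-unit fault-tolerance condition concrete.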


Citations
Journal Article•DOI•
TL;DR: The method utilizes the register-transfer level (RTL) circuit description of an ASPP or ASIP to come up with a set of test microcode patterns which can be written into the instruction read-only memory (ROM) of the processor.
Abstract: In this paper, we present design for testability (DFT) and hierarchical test generation techniques for facilitating the testing of application-specific programmable processors (ASPPs) and application-specific instruction processors (ASIPs). The method utilizes the register-transfer level (RTL) circuit description of an ASPP or ASIP to derive a set of test microcode patterns which can be written into the instruction read-only memory (ROM) of the processor. These lines of microcode dictate a new control/data flow in the circuit and can be used to test modules which are not easily testable. The new control/data flow is used to justify precomputed test sets of a module from the system primary inputs to the module inputs and propagate output responses from the module output to the system primary outputs. The testability analysis, which is based on the relevant control/data flow extracted from the RTL circuit, is symbolic. Thus, it is independent of the bit-width of the data path and is extremely fast. The test microcode patterns are a by-product of this analysis. If the derived test microcode cannot test all untested modules in the circuit, then test multiplexers are added (usually to the off-critical paths of the data path) to test these modules. This is done to guarantee the testability of all modules in the circuit. If the control microcode memory of the processor is erasable, then the test microcode lines can be erased once the testing of the chip is over. In that case, the DFT scheme has very little overhead (typically less than 1%). Otherwise, the test microcode lines remain as an overhead in the control memory. The method requires the addition of only one external test pin. Application of this technique to several examples has resulted in a very high fault coverage (above 99.6%) for all of them. The test generation time is about three orders of magnitude smaller than that of an efficient gate-level sequential test generator. The average area overhead (without assuming an erasable ROM) is 3.1%, while the delay overheads are negligible. This method does not require any scan in the controller or data path. It is also amenable to at-speed testing.

39 citations

Journal Article•DOI•
TL;DR: Two low-cost approaches to graceful degradation-based permanent fault tolerance of ASPPs are presented and the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of industrial-strength designs are demonstrated.
Abstract: Application Specific Programmable Processors (ASPPs) provide efficient implementation for any of m specified functionalities. Due to their flexibility and convenient performance-cost trade-offs, ASPPs are being developed by DSP, video, multimedia, and embedded IC manufacturers. In this paper, we present two low-cost approaches to graceful degradation-based permanent fault tolerance of ASPPs. ASPP fault tolerance constraints are incorporated during scheduling, allocation, and assignment phases of behavioral synthesis: Graceful degradation is supported by implementing multiple schedules of the ASPP applications, each with a different throughput constraint. In this paper, we do not consider concurrent error detection. The first ASPP fault tolerance technique minimizes the hardware resources while guaranteeing that the ASPP remains operational in the presence of all k-unit faults. On the other hand, the second fault tolerance technique maximizes the ASPP fault tolerance subject to constraints on the hardware resources. These ASPP fault tolerance techniques impose several unique tasks, such as fault-tolerant scheduling, hardware allocation, and application-to-faulty-unit assignment. We address each of them and demonstrate the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of industrial-strength designs.

24 citations

Journal Article•DOI•
TL;DR: Techniques to incorporate micropreemption constraints during multitask VLSI system synthesis are presented and algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints are presented.
Abstract: Task preemption is a critical enabling mechanism in multitask very large scale integration (VLSI) systems. On preemption, data in the register files must be preserved for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. In this paper, techniques and algorithms to incorporate micropreemption constraints during multitask VLSI system synthesis are presented. Specifically, algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints, techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks, and a controller-based scheme to preclude the preemption-related performance degradation by: 1) partitioning the states of a task into critical sections; 2) executing the critical sections atomically; and 3) preserving atomicity by rolling forward to the end of the critical sections on preemption have been developed. The effectiveness of all approaches, algorithms, and software implementations is demonstrated on real examples. Validation of the results is complete in the sense that functional simulation is carried down to the complete layout implementation.

10 citations

Proceedings Article•DOI•
01 May 1998
TL;DR: A framework is reported that makes it possible for a designer to rapidly explore the application-specific programmable processor design space under area constraints; it can be valuable in making early design decisions such as area and architectural trade-offs, cache and instruction-issue-width trade-offs under area constraints, and the number of branch units and the issue width.
Abstract: In this paper, we report a framework that makes it possible for a designer to rapidly explore the application-specific programmable processor design space under area constraints. The framework uses a production-quality compiler and simulation tools to synthesize a high performance machine for an application. Using the framework, we evaluate the validity of the fundamental assumption behind the development of application-specific programmable processors. Application-specific processors are based on the idea that applications differ from each other in key architectural parameters, such as the available instruction-level parallelism, demand on various hardware components (e.g. cache memory units, register files), and the need for different numbers of functional units. We found that the framework introduced in this paper can be valuable in making early design decisions such as area and architectural trade-offs, cache and instruction-issue-width trade-offs under area constraints, and the number of branch units and issue width.

9 citations

Proceedings Article•DOI•
13 Nov 1997
TL;DR: In this paper, the authors present techniques and algorithms to incorporate micro-preemption constraints during multi-task VLSI system synthesis, and propose a controller-based scheme to preclude preemption-related performance degradation.
Abstract: Task preemption is a critical enabling mechanism in multi-task VLSI systems. On preemption, data in the register files must be preserved in order for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. In this paper, we present techniques and algorithms to incorporate micro-preemption constraints during multi-task VLSI system synthesis. Specifically, we have developed: (i) Algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints. (ii) Techniques to minimize the context switch overhead by considering the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks. (iii) A controller based scheme to preclude preemption related performance degradation. The effectiveness of all approaches, algorithms, and software implementations is demonstrated on real examples.
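Item (i) above can be made concrete with a minimal sketch. The paper's algorithms operate on scheduled task graphs; the greedy pass below assumes a simplified linear schedule and is only an illustration of bounding preemption latency, not the authors' method.

```python
# Hedged sketch of preemption-point insertion under a latency bound.
# durations[i] is the cycle count of control step i; a preemption point
# before step i bounds the worst-case preemption latency by the cycles
# elapsed since the previous point. A point at index 0 marks the start.
def insert_preemption_points(durations, max_latency):
    points, elapsed = [0], 0
    for i, d in enumerate(durations):
        if elapsed + d > max_latency:
            points.append(i)   # preempt before step i
            elapsed = 0
        elapsed += d
    return points

print(insert_preemption_points([3, 2, 4, 1, 5], 6))  # [0, 2, 4]
```

The refinement step the paper describes would then move or merge points to reduce the context (register state) that must be saved at each one; this sketch only enforces the latency constraint.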

3 citations

References
Journal Article•DOI•
TL;DR: The technology for laser welding and cutting, the design methodology and CAD tools developed for wafer-scale integration, and the integrator itself are described.
Abstract: Wafer-scale integration has been demonstrated by fabricating a digital integrator on a monolithic 20-cm² silicon chip, the first laser-restructured digital logic system. Large-area integration is accomplished by laser programming of metal interconnect for defect avoidance. This paper describes the technology for laser welding and cutting, the design methodology and CAD tools developed for wafer-scale integration, and the integrator itself.

42 citations

Proceedings Article•DOI•
27 Oct 1993
TL;DR: It is shown that in ASIC designs it is possible to enable replacement of modules of different types with the same spare units by exploiting the flexibility of high level synthesis solutions, using a novel statistical methodology for heuristic algorithm development and improvement.
Abstract: Built-in self-repair (BISR) is a hardware redundancy fault tolerance technique, where a set of spare modules is provided in addition to core operational modules. Until now, the application of BISR methodology has been limited to situations where a failed module of one type can only be replaced by a backup module of the same type. It is shown that in ASIC designs it is possible to enable replacement of modules of different types with the same spare units by exploiting the flexibility of high level synthesis solutions. Resource allocation, assignment, and scheduling techniques that support a new BISR methodology are presented. All mentioned high level synthesis algorithms are developed on top of the HYPER high level synthesis system, using a novel statistical methodology for heuristic algorithm development and improvement. The effectiveness of the approach is verified and yield improvement data is presented for numerous real-life examples.

24 citations

Proceedings Article•DOI•
01 Dec 1995
TL;DR: In this article, an area-efficient technique for fabrication-time reconfigurability is presented, which adds extra interconnects to render the resulting microarchitecture reconfigurable in the presence of any functional unit failure.
Abstract: Phantom redundancy, an area-efficient technique for fabrication-time reconfigurability, is presented. Phantom redundancy adds extra interconnect so as to render the resulting microarchitecture reconfigurable in the presence of any (single) functional unit failure. The proposed technique yields partially good chips in addition to perfect chips. A genetic algorithm is used to incorporate phantom redundancy constraints into microarchitecture synthesis. The algorithm minimizes the performance degradation due to any faulty functional unit of the resulting microarchitecture. The effectiveness of the technique is illustrated on benchmark examples.

20 citations


"Heterogeneous built-in resiliency o..." refers methods in this paper

  • ...Recently Iyer et al. [5] introduced a method which explores trade-offs between performance and yield....


Proceedings Article•DOI•
27 Jun 1995
TL;DR: Efficient algorithms that provide provably optimal solutions for the problem of automatic insertion of recovery points in recoverable microarchitectures are presented.
Abstract: The paper considers the problem of automatic insertion of recovery points in recoverable microarchitectures. Previous work on this problem provided heuristic algorithms that attempted either to minimize computation time with a bounded hardware overhead or to minimize hardware overhead with a bounded computation time. We present efficient algorithms that provide provably optimal solutions for both of these formulations of the problem. These algorithms take as their input a scheduled control-data flow graph describing the behavior of the system and they output either a minimum-time or a minimum-cost set of recovery point locations. We demonstrate the performance of our algorithms using some well-known benchmark control-data flow graphs. Over all parameter values for each of these benchmarks, our optimal algorithms are shown to perform as well as, and in many cases better than, the previously proposed heuristics.
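The minimum-cost formulation can be sketched with a small dynamic program, under a simplifying assumption: a linear schedule rather than the scheduled control-data flow graphs the paper actually works on, so this is an illustration of the problem shape, not the authors' algorithm. Each segment between consecutive recovery points must recompute within the latency bound, and the total cost of the chosen points is minimized.

```python
# Illustrative DP for minimum-cost recovery-point insertion on a linear
# schedule (a hypothetical simplification of the paper's CDFG setting).
# times[i]: execution time of step i; costs[i]: cost of a recovery point
# after step i. A fault rolls back to the nearest earlier recovery point,
# so every segment between points must fit within `bound`.
def min_cost_recovery(times, costs, bound):
    n = len(times)
    INF = float('inf')
    best = [INF] * (n + 1)   # best[i]: min cost with a point after step i
    best[0] = 0              # the initial state is a free recovery point
    for i in range(1, n + 1):
        seg = 0
        for j in range(i, 0, -1):    # previous point sits after step j-1
            seg += times[j - 1]      # segment covers steps j..i
            if seg > bound:
                break
            if best[j - 1] + costs[i - 1] < best[i]:
                best[i] = best[j - 1] + costs[i - 1]
    # the tail after the last chosen point must also fit within the bound
    ans, tail = best[n], 0
    for i in range(n - 1, -1, -1):
        tail += times[i]
        if tail > bound:
            break
        ans = min(ans, best[i])
    return ans

# Four steps of 2 time units each, bound 4: one cheap point after step 2
# (cost 1) splits the schedule into two feasible halves.
print(min_cost_recovery([2, 2, 2, 2], [5, 1, 5, 1], 4))  # 1
```

The min-time formulation swaps the roles of the objective and the constraint; both reduce to shortest-path-style recurrences once candidate recovery-point locations are fixed.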

17 citations


"Heterogeneous built-in resiliency o..." refers background in this paper

  • ...More recently, Blough et al. [2] presented an algorithm for recovery point insertion in recoverable microarchitectures....
