Journal ArticleDOI

Computer aided design of fault-tolerant application specific programmable processors

01 Nov 2000-IEEE Transactions on Computers (IEEE Computer Society)-Vol. 49, Iss: 11, pp 1272-1284
TL;DR: Two low-cost approaches to graceful degradation-based permanent fault tolerance of ASPPs are presented and the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of industrial-strength designs are demonstrated.
Abstract: Application Specific Programmable Processors (ASPP) provide efficient implementation for any of m specified functionalities. Due to their flexibility and convenient performance-cost trade-offs, ASPPs are being developed by DSP, video, multimedia, and embedded IC manufacturers. In this paper, we present two low-cost approaches to graceful degradation-based permanent fault tolerance of ASPPs. ASPP fault tolerance constraints are incorporated during scheduling, allocation, and assignment phases of behavioral synthesis. Graceful degradation is supported by implementing multiple schedules of the ASPP applications, each with a different throughput constraint. In this paper, we do not consider concurrent error detection. The first ASPP fault tolerance technique minimizes the hardware resources while guaranteeing that the ASPP remains operational in the presence of all k-unit faults. On the other hand, the second fault tolerance technique maximizes the ASPP fault tolerance subject to constraints on the hardware resources. These ASPP fault tolerance techniques impose several unique tasks, such as fault-tolerant scheduling, hardware allocation, and application-to-faulty-unit assignment. We address each of them and demonstrate the effectiveness of the overall approach, the synthesis algorithms, and software implementations on a number of industrial-strength designs.


Citations
Journal ArticleDOI

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at school: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Proceedings ArticleDOI
14 Apr 2010
TL;DR: This paper describes a purely software-based approach to handle permanent faults in the data path of a statically scheduled superscalar processor architecture; e.g. a VLIW processor.
Abstract: This paper describes a purely software-based approach to handle permanent faults in the data path of a statically scheduled superscalar processor architecture; e.g. a VLIW processor. This approach does not need any hardware in the processor core itself to reconfigure the data path. Rather the reconfiguration is done in software by modifying the binary code in the program memory of the processor. By this, the usage of a faulty component in the data path can be avoided during the execution of the program. The modification of the program is carried out by a repair routine that is executed by the faulty processor itself. It is shown that this is possible even if there is a fault in the data path. The modification of the binary code takes place immediately after a software-based self-test. Both the self-test and the modification are carried out immediately after the start-up of the system. Thus, the detection and repair of a fault can happen in the field. Furthermore, the compiler must generate the program in a special way in order to guarantee that the binary code can be modified under all specified fault situations. A simple scheduling algorithm that produces such fault-tolerant binary code is presented, too.

16 citations


Cites background from "Computer aided design of fault-tole..."

  • ...In [7, 8] several schedules, one for each possible fault situation, are pre-computed off-line and kept in the program memory....

Proceedings ArticleDOI
01 Sep 2007
TL;DR: This paper proposes a new idea for built-in-self-repair of application specific VLIW processors, which relies on a special kind of triple modular redundancy, which it calls Reduced Triple Modular Redundancy (RTMR).
Abstract: In this paper we propose a new idea for built-in-self-repair of application specific VLIW processors, which relies on a special kind of triple modular redundancy, which we call Reduced Triple Modular Redundancy (RTMR). The key idea is to employ the redundancy of operators in the data path of a VLIW processor. I.e., every operation is executed twice by two different operators during normal program execution. Only in case a mismatch between both computed results occurs, the operation is executed by a third operator. Therefore, during most of the execution time, the third operator can be used for executing regular operations of the program. We propose modifications of the VLIW architecture in order to detect a mismatch in computed results. Necessary program transformations are introduced, in order to obtain an internal representation for fault tolerant programs that can be scheduled to the proposed VLIW architecture. Furthermore, we propose the program execution model that is used in case a permanent fault in the data path has been detected and give some preliminary results.

12 citations

Proceedings ArticleDOI
03 Oct 2011
TL;DR: This paper describes a fine-grained software-based self-repair method for statically scheduled super scalar processors, and how the scheduling algorithm bypasses these failure points such that the affected components can be used partially, even if they contain some permanent faults.
Abstract: This paper describes a fine-grained software-based self-repair method for statically scheduled super scalar processors. An important property of this processor type is that for each operation of the executed program it is known in advance which resources of the processor will be used by that operation. A scheduling algorithm is introduced that employs this knowledge in order to rearrange the operations in a VLIW program in the field in such a way that components with permanent faults are no longer used. It is explained how the scheduling algorithm bypasses these failure points such that the affected components can be used partially, even if they contain some permanent faults. The fine-grained self-repair approach is compared with state-of-the-art coarse-grained approaches. It turns out that the number of systems that are still running after injecting 10 faults is about 80%, while less than 1% of these systems will survive if a coarse-grained approach is used.

10 citations

Proceedings ArticleDOI
08 Mar 2010
TL;DR: This paper describes a hardware-/software-based technique to make the data path of a statically scheduled super scalar processor fault tolerant and shows that for medium and large scaled data paths this extension provides an up to 98% better reliability than triple modular redundancy.
Abstract: This paper describes a hardware-/software-based technique to make the data path of a statically scheduled super scalar processor fault tolerant. The results of concurrently executed operations can be compared with little hardware overhead in order to detect a transient or permanent fault. Furthermore, the hardware extension allows to recover from a fault within one to two clock cycles and to distinguish between transient and permanent faults. If a permanent fault was detected, this fault is masked for the rest of the program execution such that no further time is needed for recovering from that fault. The proposed extensions were implemented in the data path of a simple VLIW processor in order to prove the feasibility and to determine the hardware overhead. Finally a reliability analysis is presented. It shows that for medium and large scaled data paths our extension provides an up to 98% better reliability than triple modular redundancy.

10 citations


Cites background or methods from "Computer aided design of fault-tole..."

  • ...In [8] an other precomputed static schedule is used that avoids the usage of the faulty component....

  • ...Some other work on masking a detected permanent fault in the data path of a statically scheduled processor has been published in [10], [2] and [8]....

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column as discussed by the authors provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness,” W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’’ W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Journal ArticleDOI

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at school: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations


"Computer aided design of fault-tole..." refers methods in this paper

  • ...[17] introduced a method which explores trade-offs between performance and yield....

Book
01 Dec 1989
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Abstract: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high-performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and Web technologies, and high-performance computing.

11,671 citations


"Computer aided design of fault-tole..." refers background in this paper

  • ...The one-unit and two-unit fault-tolerant ASPPs have significantly superior hardware utilization characteristics when compared to the less than 5 percent utilization of general purpose processors [ 28 ]....

  • ...ASPP designs. Programmable Controllers [ 28 ] often bring a somewhat large implementation area overhead and a limited degradation in performance....

Book
01 Jan 1990
TL;DR: The new edition of Breuer-Friedman's Diagnosis and Reliable Design of Digital Systems offers comprehensive and state-of-the-art treatment of both testing and testable design.
Abstract: For many years, Breuer-Friedman's Diagnosis and Reliable Design of Digital Systems was the most widely used textbook in digital system testing and testable design. Now, Computer Science Press makes available a new and greatly expanded edition. Incorporating a significant amount of new material related to recently developed technologies, the new edition offers comprehensive and state-of-the-art treatment of both testing and testable design.

2,758 citations


"Computer aided design of fault-tole..." refers methods in this paper

  • ...We assume a widely used single stuck-at fault model [ 1 ]....