Showing papers in "IEEE Transactions on Computers in 1995"


Journal ArticleDOI
TL;DR: EVENODD, a novel method for tolerating up to two disk failures in RAID architectures using only exclusive-OR computations, in contrast to the standard Reed-Solomon approach; the scheme also suits any system requiring large symbols and relatively short codes, for instance multitrack magnetic recording.
Abstract: We present a novel method, which we call EVENODD, for tolerating up to two disk failures in RAID architectures. EVENODD employs the addition of only two redundant disks and consists of simple exclusive-OR computations. This redundant storage is optimal, in the sense that two failed disks cannot be retrieved with fewer than two redundant disks. A major advantage of EVENODD is that it only requires parity hardware, which is typically present in standard RAID-5 controllers; hence, EVENODD can be implemented on standard RAID-5 controllers without any hardware changes. The most commonly used scheme that employs optimal redundant storage (i.e., two extra disks) is based on Reed-Solomon (RS) error-correcting codes. This scheme requires computation over finite fields and results in a more complex implementation. For example, we show that the complexity of implementing EVENODD in a disk array with 15 disks is about 50% of that required by the RS scheme. The new scheme is not limited to RAID architectures: it can be used in any system requiring large symbols and relatively short codes, for instance, in multitrack magnetic recording. To this end, we also present a decoding algorithm for one column (track) in error.
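
To make the parity structure concrete, here is a minimal Python sketch of EVENODD-style encoding: a row-parity column plus a diagonal-parity column adjusted by the parity S of one special diagonal. The (p-1) x p layout follows the abstract, but the indexing convention (diagonal d collects cells with (i + j) mod p = d) and the demo data are our assumptions, and erasure recovery is not shown.

```python
# Hedged sketch of EVENODD-style encoding: p - 1 rows of bits across p data
# disks (p prime), plus a row-parity column and an adjusted diagonal-parity
# column. Recovery from two erasures is not shown.
from functools import reduce
from operator import xor

def evenodd_encode(data, p):
    """data[i][j]: bit in row i (0..p-2), data disk j (0..p-1)."""
    assert len(data) == p - 1 and all(len(row) == p for row in data)
    # Row parity: plain RAID-5 style XOR across each row.
    row_parity = [reduce(xor, row) for row in data]
    # S = parity of the special diagonal (i + j) mod p == p - 1.
    s = reduce(xor, (data[i][p - 1 - i] for i in range(p - 1)))
    # Diagonal parity d (d = 0..p-2): S xor the cells with (i + j) mod p == d.
    diag_parity = []
    for d in range(p - 1):
        acc = s
        for i in range(p - 1):
            acc ^= data[i][(d - i) % p]
        diag_parity.append(acc)
    return row_parity, diag_parity

p = 5
data = [[(3 * i + j) % 2 for j in range(p)] for i in range(p - 1)]
print(evenodd_encode(data, p))
```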

745 citations


Journal ArticleDOI
TL;DR: The three hardware prefetching schemes all yield significant reductions in the data access penalty compared with regular caches; the benefits are greater when the hardware assist augments small on-chip caches; and the lookahead scheme is the preferred one on a cost-performance basis.
Abstract: Memory latency and bandwidth are progressing at a much slower pace than processor performance. In this paper, we describe and evaluate the performance of three variations of a hardware function unit whose goal is to assist a data cache in prefetching data accesses so that memory latency is hidden as often as possible. The basic idea of the prefetching scheme is to keep track of data access patterns in a reference prediction table (RPT) organized as an instruction cache. The three designs differ mostly in the timing of the prefetching. In the simplest scheme (basic), prefetches can be generated one iteration ahead of actual use. The lookahead variation takes advantage of a lookahead program counter that ideally stays one memory latency time ahead of the real program counter and that is used as the control mechanism to generate the prefetches. Finally, the correlated scheme uses a more sophisticated design to detect patterns across loop levels. These designs are evaluated by simulating the ten SPEC benchmarks on a cycle-by-cycle basis. The results show that 1) the three hardware prefetching schemes all yield significant reductions in the data access penalty when compared with regular caches, 2) the benefits are greater when the hardware assist augments small on-chip caches, and 3) the lookahead scheme is preferred on a cost-performance basis.
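
The table-driven idea behind the basic scheme is easy to prototype. The following hedged Python sketch keeps one entry per load PC with the last address and stride, and issues a prefetch one iteration ahead once the same stride is seen twice; the paper's entry state machine, lookahead program counter, and correlated variant are not modeled, and the PC/address values are invented.

```python
# Hedged sketch of a stride-detecting reference prediction table (RPT).
# One entry per load PC: last address, last stride, and whether the stride
# has repeated. Timing, entry states, and the lookahead PC are not modeled.
class RPTEntry:
    def __init__(self, addr):
        self.last_addr = addr
        self.stride = 0
        self.steady = False          # same stride observed twice in a row

def rpt_access(table, pc, addr):
    """Record one load; return an address to prefetch, or None."""
    entry = table.get(pc)
    if entry is None:
        table[pc] = RPTEntry(addr)
        return None
    stride = addr - entry.last_addr
    entry.steady = (stride == entry.stride)
    entry.stride, entry.last_addr = stride, addr
    # Prefetch one iteration ahead of actual use once the pattern is stable.
    return addr + stride if entry.steady and stride else None

table = {}
for i in range(4):                   # a load at PC 0x400 walking stride 8
    addr = 0x1000 + 8 * i
    pf = rpt_access(table, pc=0x400, addr=addr)
    print(hex(addr), "->", hex(pf) if pf else None)
```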

543 citations


Journal ArticleDOI
TL;DR: A priority-based policy for scheduling N real-time streams with (m, k)-firm deadlines on a single server to reduce the probability of dynamic failure; customers from streams that are closer to a dynamic failure receive higher priorities, improving their chances of meeting their deadlines.
Abstract: The problem of scheduling multiple streams of real-time customers is addressed in this paper. The paper first introduces the notion of (m, k)-firm deadlines to better characterize the timing constraints of real-time streams. More specifically, a stream is said to have (m, k)-firm deadlines if at least m out of any k consecutive customers must meet their deadlines. A stream with (m, k)-firm deadlines experiences a dynamic failure if fewer than m out of any k consecutive customers meet their deadlines. The paper then proposes a priority-based policy for scheduling N such streams on a single server to reduce the probability of dynamic failure. The basic idea is to assign higher priorities to customers from streams that are closer to a dynamic failure so as to improve their chances of meeting their deadlines. The paper proposes a heuristic for assigning these priorities. The effectiveness of this approach is evaluated through simulation under various customer arrival and service patterns. The scheme is compared to a conventional scheme where all customers are serviced at the same priority level and to an imprecise computation model approach. The evaluation shows that substantial reductions in the probability of dynamic failure are achieved when the proposed policy is used.
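
A distance-based notion of urgency can be sketched directly. The Python fragment below is our reading of such a heuristic, not a transcription of the paper's: a stream's priority is the number of further consecutive misses it could absorb before some window of k customers contains fewer than m met deadlines, so smaller distances mean more urgent streams.

```python
# Hedged sketch of a distance-based priority for (m, k)-firm streams: the
# distance is how many further consecutive misses the stream can absorb
# before fewer than m of its last k customers have met their deadlines.
def distance_to_failure(history, m, k):
    """history: recent outcomes, True = deadline met, most recent last."""
    recent = list(history)[-k:]
    meet_positions = [i for i, met in enumerate(reversed(recent), 1) if met]
    if len(meet_positions) < m:
        return 0                     # already failing: most urgent
    # Position of the m-th most recent meet determines the remaining slack.
    return k - meet_positions[m - 1] + 1

streams = {"audio": [True, True, False, True], "logging": [True] * 4}
priorities = {s: distance_to_failure(h, m=3, k=4) for s, h in streams.items()}
print(priorities)                    # smaller distance = served first
```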

512 citations


Journal ArticleDOI
TL;DR: A new scheme for built-in test that uses multiple-polynomial linear feedback shift registers (MP-LFSRs); grouping seeds by polynomial with an implicit polynomial identification reduces the number of extra bits per seed to a single bit.
Abstract: We propose a new scheme for built-in test (BIT) that uses multiple-polynomial linear feedback shift registers (MP-LFSRs). The same MP-LFSR that generates random patterns to cover easy-to-test faults is loaded with seeds to generate deterministic vectors for difficult-to-test faults. The seeds are obtained by solving systems of linear equations involving the seed variables for the positions where the test cubes have specified values. We demonstrate that MP-LFSRs produce sequences with significantly reduced probability of linear dependence compared to single-polynomial LFSRs. We present a general method to determine the probability of encoding as a function of the number of specified bits in the test cube, the length of the LFSR, and the number of polynomials. Theoretical analysis and experiments show that the probability of encoding a test cube with s specified bits in an s-stage LFSR with 16 polynomials is 1 - 10^-6. We then present the new BIT scheme that allows for an efficient encoding of the entire test set. Here the seeds are grouped according to the polynomial they use, and an implicit polynomial identification reduces the number of extra bits per seed to one bit. The paper also shows methods of processing the entire test set consisting of test cubes with varying numbers of specified bits. Experimental results show the tradeoffs between test data storage and test application time while maintaining complete fault coverage.
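
The seed computation is linear algebra over GF(2), which the hedged sketch below makes concrete for a single feedback polynomial: every LFSR cell is tracked as a linear combination of seed variables, each specified bit of the test cube becomes one equation, and Gaussian elimination yields a seed (or proves the cube not encodable). The polynomial, register length, and cube are invented, and the multiple-polynomial grouping with its one-bit identifier is not shown.

```python
# Hedged sketch: find an LFSR seed whose output sequence matches the specified
# bits of a test cube, by Gaussian elimination over GF(2). Cell contents are
# tracked symbolically as bitmasks over the seed variables.
def solve_seed(taps, length, cube):
    """taps: feedback cell indices; cube: {output position: required bit}."""
    state = [1 << i for i in range(length)]      # cell i = seed variable i
    rows = []
    for t in range(max(cube) + 1):
        if t in cube:
            rows.append([state[-1], cube[t]])    # output comes from last cell
        fb = 0
        for tap in taps:
            fb ^= state[tap]
        state = [fb] + state[:-1]                # shift; feedback enters cell 0
    pivots = {}                                  # pivot bit -> (mask, rhs)
    for mask, rhs in rows:
        for r in sorted(pivots, reverse=True):   # reduce by existing pivots
            if mask >> r & 1:
                pm, prhs = pivots[r]
                mask ^= pm
                rhs ^= prhs
        if mask == 0:
            if rhs:
                return None                      # inconsistent: not encodable
            continue
        pivots[mask.bit_length() - 1] = (mask, rhs)
    seed = [0] * length                          # free variables default to 0
    for r in sorted(pivots):                     # back-substitute, low to high
        mask, rhs = pivots[r]
        for b in range(r):
            if mask >> b & 1:
                rhs ^= seed[b]
        seed[r] = rhs
    return seed

taps, length = (0, 4, 5, 7), 8                   # arbitrary 8-stage example
cube = {0: 1, 3: 0, 9: 1, 12: 1}                 # specified bits of the cube
seed = solve_seed(taps, length, cube)
assert seed is not None
state = seed[:]                                  # verify by direct simulation
for t in range(max(cube) + 1):
    assert t not in cube or state[-1] == cube[t]
    fb = 0
    for tap in taps:
        fb ^= state[tap]
    state = [fb] + state[:-1]
print("seed:", seed)
```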

439 citations


Journal ArticleDOI
TL;DR: The Deferrable Server (DS) algorithm is introduced and shown to provide improved aperiodic response-time performance over traditional background and polling approaches, significantly reducing the response times of aperiodic tasks while still guaranteeing periodic task deadlines.
Abstract: Most existing scheduling algorithms for hard real-time systems apply either to periodic tasks or aperiodic tasks but not to both. In practice, real-time systems require an integrated, consistent approach to scheduling that is able to simultaneously meet the timing requirements of hard deadline periodic tasks, hard deadline aperiodic (alert-class) tasks, and soft deadline aperiodic tasks. This paper introduces the Deferrable Server (DS) algorithm, which is shown to provide improved aperiodic response time performance over traditional background and polling approaches. Taking advantage of the fact that, typically, there is no benefit in early completion of the periodic tasks, the Deferrable Server (DS) algorithm assigns higher priority to the aperiodic tasks up until the point where the periodic tasks would start to miss their deadlines. Guaranteed alert-class aperiodic service and greatly reduced response times for soft deadline aperiodic tasks are important features of the DS algorithm, and both are obtained with the hard deadlines of the periodic tasks still being guaranteed. The results of a simulation study performed to evaluate the response time performance of the new algorithm against traditional background and polling approaches are presented. In all cases, the response times of aperiodic tasks are significantly reduced (often by an order of magnitude) while still maintaining guaranteed periodic task deadlines.
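
The mechanism is small enough to simulate in a few lines. Below is a hedged discrete-time Python sketch of a deferrable server with period Ts and budget Cs: aperiodic work runs at top priority while budget remains, and the budget is replenished to Cs at each server period. The task parameters are invented and response-time bookkeeping is omitted.

```python
# Hedged discrete-time sketch of a Deferrable Server: budget Cs is refreshed
# every Ts ticks and spent serving aperiodic work at top priority; one
# periodic task (period Tp, cost Cp) runs otherwise. Parameters are invented.
def simulate(horizon, Ts, Cs, Tp, Cp, aperiodic_arrivals):
    budget, periodic_left, aperiodic_queue, schedule = Cs, 0, 0, []
    for t in range(horizon):
        if t % Ts == 0:
            budget = Cs                      # replenish the server budget
        if t % Tp == 0:
            periodic_left += Cp              # release a periodic job
        aperiodic_queue += aperiodic_arrivals.get(t, 0)
        if aperiodic_queue and budget:       # server runs at top priority
            aperiodic_queue -= 1
            budget -= 1
            schedule.append((t, "aperiodic"))
        elif periodic_left:
            periodic_left -= 1
            schedule.append((t, "periodic"))
    return schedule

print(simulate(12, Ts=5, Cs=2, Tp=4, Cp=2, aperiodic_arrivals={3: 2}))
```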

423 citations


Journal ArticleDOI
TL;DR: Methodology and guidelines are described for the design of flexible software-based fault and error injection, and a tool, FERRARI, that incorporates the techniques is presented; experiments demonstrate the effectiveness of software-based error injection in evaluating the dependability properties of complex systems.
Abstract: A major step toward the development of fault-tolerant computer systems is the validation of the dependability properties of these systems. Fault/error injection has been recognized as a powerful approach to validate the fault tolerance mechanisms of a system and to obtain statistics on parameters such as coverages and latencies. This paper describes the methodology and guidelines for the design of flexible software-based fault and error injection, and presents a tool, FERRARI, that incorporates the techniques. The techniques used to emulate transient errors and permanent faults in software are described in detail. Experimental results are presented for several error detection techniques, and they demonstrate the effectiveness of the software-based error injection tool in evaluating the dependability properties of complex systems.
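
The essence of software-implemented fault injection can be shown with a toy: flip one bit of intermediate state and record whether a detector notices. FERRARI does this with traps against real registers and memory; the Python stand-in below uses a deliberately weak parity detector so that some injections go undetected, which is exactly what a coverage experiment quantifies. All values are invented.

```python
# Hedged toy illustrating software-implemented fault injection: flip one bit
# of intermediate state and observe whether a (deliberately weak) detector
# catches it. FERRARI injects into registers/memory of real programs via
# traps; plain Python stands in for that machinery here.
def run_with_injection(values, inject_at, bit):
    reference = sum(values)                     # golden, fault-free result
    total = 0
    for i, v in enumerate(values):
        if i == inject_at:
            v ^= 1 << bit                       # emulated transient bit flip
        total += v
    detected = (total & 1) != (reference & 1)   # parity check as the detector
    return total != reference, detected

values = [17, 42, 99, 7]
outcomes = [run_with_injection(values, i, b) for i in range(4) for b in range(8)]
detections = [detected for erred, detected in outcomes if erred]
print(f"coverage estimate: {sum(detections)}/{len(detections)} injections detected")
```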

370 citations


Journal ArticleDOI
TL;DR: It is shown that, using just one extra virtual channel per physical channel, the well-known e-cube algorithm can provide deadlock-free routing in networks with nonoverlapping fault rings, and it is proved that at most four additional virtual channels suffice to make fully adaptive algorithms tolerant of multiple faulty blocks in n-dimensional meshes.
Abstract: We present simple methods to enhance current minimal wormhole routing algorithms, developed for high-radix, low-dimensional mesh networks, for fault-tolerant routing. We consider arbitrarily located faulty blocks and assume only local knowledge of faults. Messages are routed minimally when not blocked by faults, and this constraint is relaxed to route around faults. The key concept we use is that a fault ring, consisting of fault-free nodes and links, can be formed around each fault region. Our fault-tolerant techniques use these fault rings to route messages around fault regions. We show that, using just one extra virtual channel per physical channel, the well-known e-cube algorithm can be used to provide deadlock-free routing in networks with nonoverlapping fault rings; there is no restriction on the number of faults. For the more complex faults with overlapping fault rings, four virtual channels are used. We also prove that at most four additional virtual channels are sufficient to make fully adaptive algorithms tolerant of multiple faulty blocks in n-dimensional meshes. All these algorithms are deadlock- and livelock-free. Further, we present simulation results for the e-cube and a fully adaptive algorithm fortified with our fault-tolerant routing techniques, and show that good performance may be obtained with as many as 10% of links faulty.

325 citations


Journal ArticleDOI
TL;DR: Stochastic rendezvous networks, queueing networks of a new type proposed as a modelling framework for software with synchronous rendezvous communication, are extended to incorporate different services or entries associated with each task, giving approximate performance estimates.
Abstract: Distributed or parallel software with synchronous communication via rendezvous is found in client-server systems and in proposed open distributed systems, in implementation environments such as Ada, V, and remote procedure call systems, in transputer systems, and in specification techniques such as CSP, CCS, and LOTOS. The delays induced by rendezvous can cause serious performance problems, which are not easy to estimate using conventional models that focus on hardware contention, or on a restricted view of the parallelism which ignores implementation constraints. Stochastic rendezvous networks are queueing networks of a new type which have been proposed as a modelling framework for these systems. They incorporate the two key phenomena of included service and a second phase of service. This paper extends the model to also incorporate different services or entries associated with each task. Approximations to arrival-instant probabilities are employed within a mean-value analysis framework, to give approximate performance estimates. The method has been applied to moderately large industrial software systems.

306 citations


Journal ArticleDOI
TL;DR: New schedulability conditions are presented for homogeneous multiprocessor systems where individual processors execute the rate-monotonic scheduling algorithm; under realistic assumptions it is shown that the processors can be almost fully utilized.
Abstract: Optimal scheduling of real-time tasks on multiprocessor systems is known to be computationally intractable for large task sets. Any practical scheduling algorithm for assigning real-time tasks to a multiprocessor system presents a trade-off between its computational complexity and its performance. In this study, new schedulability conditions are presented for homogeneous multiprocessor systems where individual processors execute the rate-monotonic scheduling algorithm. The conditions are used to develop new strategies for assigning real-time tasks to processors. The performance of the new strategies is shown to be significantly better than suggested by the existing literature. Under the realistic assumption that the load of each real-time task is small compared to the processing speed of each processor, it is shown that the processors can be almost fully utilized.
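
A common baseline for such assignment strategies, shown as a hedged Python sketch below, is first-fit partitioning with the Liu and Layland utilization bound n(2^(1/n) - 1) as the per-processor admission test; the paper's improved conditions are not reproduced here, and the task utilizations are invented.

```python
# Hedged sketch of a baseline partitioning strategy: first-fit assignment with
# the Liu-Layland rate-monotonic utilization bound n(2^(1/n) - 1) as the
# admission test on each processor. Task utilizations are invented.
def rm_bound(n):
    return n * (2 ** (1.0 / n) - 1)

def first_fit(utilizations):
    processors = []                          # each entry: utilizations assigned
    for u in utilizations:
        for tasks in processors:
            if sum(tasks) + u <= rm_bound(len(tasks) + 1):
                tasks.append(u)
                break
        else:
            processors.append([u])           # no processor fits: open a new one
    return processors

print(first_fit([0.3, 0.2, 0.25, 0.4, 0.1, 0.35]))
# As per-task load shrinks, each processor's admitted load approaches ln 2
# under this test; the paper's conditions sharpen this toward full utilization.
```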

248 citations


Journal ArticleDOI
TL;DR: Two single-node broadcasting algorithms are presented that are optimal under single-port and multi-port I/O, and it is shown how Lee distance applies to message routing and single-node broadcasting in a k-ary n-cube Q_n^k.
Abstract: In this paper, we consider various topological properties of a k-ary n-cube (Q_n^k) using Lee distance. We feel that Lee distance is a natural metric for defining and studying a Q_n^k. After defining a Q_n^k graph using Lee distance, we show how to find all disjoint paths between any two nodes. Given a sequence of radix-k numbers, a function mapping the sequence to a Gray code sequence is presented, and this function is used to generate a Hamiltonian cycle. Embedding the graph of a mesh and the graph of a binary hypercube into the graph of a Q_n^k is considered. Using a k-ary Gray code, we show the embedding of a k^(n_1) × k^(n_2) × ... × k^(n_m) mesh into a Q_n^k, where n = n_1 + n_2 + ... + n_m. Then, using a single-digit 4-ary reflective Gray code, we demonstrate embedding the binary hypercube Q_n into a Q_⌈n/2⌉^4. We look at how Lee distance may be applied to the problem of resource placement in a Q_n^k by using a Lee distance error-correcting code. Although the results in this paper are only preliminary, Lee distance error-correcting codes have not been applied previously to this problem. Finally, we consider how Lee distance can be applied to message routing and single-node broadcasting in a Q_n^k, presenting two single-node broadcasting algorithms that are optimal under single-port and multi-port I/O.
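
The metric itself is one line: per digit, the shorter way around the ring of size k. A hedged Python sketch with invented node labels follows; since a minimal route corrects one digit at a time, the distance also counts hops.

```python
# Hedged sketch: Lee distance between nodes of a k-ary n-cube, where a node
# is a vector of n radix-k digits. Per digit the cost is the shorter way
# around the ring of size k; a minimal route corrects one digit per hop.
def lee_distance(a, b, k):
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))

# In a 4-ary 3-cube: digit 0 wraps (cost 1), digit 1 matches, digit 2 costs 2.
print(lee_distance((0, 1, 3), (3, 1, 1), k=4))   # prints 3
```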

234 citations


Journal ArticleDOI
TL;DR: Simulation results show that the 1-Möbius cube has dynamic performance superior to that of the hypercube, contradicting current literature, which implies that twisted-cube variants should have worse dynamic performance.
Abstract: The Möbius cubes are hypercube variants that give better performance with the same number of links and processors. We show that the diameter of the Möbius cubes is about one half the diameter of the equivalent hypercube, and that the average number of steps between processors for a Möbius cube is about two-thirds of the average for a hypercube. We give an efficient routing algorithm for the Möbius cubes. This routing algorithm finds a shortest path and operates in time proportional to the dimension of the cube. We also give efficient broadcast algorithms for the Möbius cubes. We show that the Möbius cubes contain ring networks and other networks. We report results of simulation studies on the dynamic message-passing performance of the hypercube, the Twisted Cube of P.A.J. Hilbers et al. (1987), and the Möbius cubes. Our results are in agreement with those of S. Abraham (1990), showing that the Twisted Cube has worse dynamic performance than the hypercube, but our results show that the 1-Möbius cube has dynamic performance superior to that of the hypercube. This contradicts current literature, which implies that twisted cube variants will have worse dynamic performance.

Journal ArticleDOI
TL;DR: The right-shifting binary algorithm for modular inversion is shown to compute this new inverse naturally, in fewer operations than the ordinary modular inverse.
Abstract: The Montgomery inverse of b modulo a is b^(-1) 2^n mod a, where n is the number of bits in a. The right-shifting binary algorithm for modular inversion is shown to compute the new inverse naturally, in fewer operations than the ordinary modular inverse. The new inverse facilitates recent work by Koç on modular exponentiation and has other applications in cryptography.
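
The right-shifting algorithm referred to here is usually credited to Kaliski. Below is a hedged Python sketch of our understanding of it: phase 1 produces the almost-inverse b^(-1) 2^k mod a with n <= k <= 2n using only shifts, additions, and subtractions, and phase 2 halves modulo a until the exponent is exactly n. The operands are invented; a must be odd and coprime to b.

```python
# Hedged sketch of the right-shifting binary method (usually credited to
# Kaliski): phase 1 yields the almost-inverse b^-1 * 2^k mod a with
# n <= k <= 2n; phase 2 halves modulo a until the exponent is exactly n.
# Requires a odd, 0 < b < a, gcd(a, b) = 1.
def montgomery_inverse(b, a):
    u, v, r, s, k = a, b, 0, 1, 0
    while v > 0:                             # phase 1: almost inverse
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1
    if r >= a:
        r -= a
    x = a - r                                # x = b^-1 * 2^k (mod a)
    for _ in range(k - a.bit_length()):      # phase 2: halve mod a
        x = x // 2 if x % 2 == 0 else (x + a) // 2
    return x                                 # b^-1 * 2^n (mod a)

a, b = 1000003, 123456
x = montgomery_inverse(b, a)
assert (x * b) % a == pow(2, a.bit_length(), a)
print(x)
```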

Journal ArticleDOI
TL;DR: A powerful technique for the evaluation and analysis of both computer systems and their workloads is extended; this methodology is valuable both to computer users and computer system designers.
Abstract: In previous research, we have developed and presented a model for measuring machines and analyzing programs, and for accurately predicting the running time of any analyzed program on any measured machine. That work is extended here by: (1) developing a high-level program to measure the design and performance of the cache and TLB units; (2) using those measurements, along with published miss ratio data, to improve the accuracy of our runtime predictions; (3) using our analysis tools and measurements to study and compare the design of several machines, with particular reference to their cache and TLB performance. As part of this work, we describe the design and performance of the cache and TLB for ten machines. The work presented in this paper extends a powerful technique for the evaluation and analysis of both computer systems and their workloads; this methodology is valuable both to computer users and to computer system designers.
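
The measurement idea reduces to a few lines of pointer chasing: time dependent loads over working sets of growing size and watch the average latency step up past each capacity boundary. The hedged Python sketch below only approximates footprints (list elements are pointers, and interpreter overhead dominates), so it illustrates the shape of the experiment rather than reproducing the paper's compiled micro-benchmarks.

```python
# Hedged sketch of the pointer-chasing idea behind cache/TLB characterization:
# time dependent loads over growing working sets; the average latency steps up
# as each cache level's capacity is exceeded. Only the curve's shape matters.
import random
import time

def avg_access_ns(n_elements, reps=200_000):
    order = list(range(n_elements))
    random.shuffle(order)                    # random cycle defeats prefetching
    nxt = [0] * n_elements
    for i in range(n_elements):
        nxt[order[i]] = order[(i + 1) % n_elements]
    j, t0 = 0, time.perf_counter()
    for _ in range(reps):
        j = nxt[j]                           # dependent loads expose latency
    return (time.perf_counter() - t0) / reps * 1e9

for n in (1 << 9, 1 << 13, 1 << 17, 1 << 21):
    print(f"{n:8d} elements: {avg_access_ns(n):6.1f} ns/access")
```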

Journal ArticleDOI
TL;DR: An efficient algorithm is presented that finds exact (tight) bounds on the separation time of events in an arbitrary process graph without conditional behavior; it will form a basis for exploration of timing-constrained synthesis techniques.
Abstract: Determining the time separation of events is a fundamental problem in the analysis, synthesis, and optimization of concurrent systems. Applications range from logic optimization of asynchronous digital circuits to evaluation of execution times of programs for real-time systems. We present an efficient algorithm to find exact (tight) bounds on the separation time of events in an arbitrary process graph without conditional behavior. This result is more general than the methods presented in several previously published papers, as it handles cyclic graphs and yields the tightest possible bounds on event separations. The algorithm is based on a functional decomposition technique that permits the implicit evaluation of an infinitely unfolded process graph. Examples are presented that demonstrate the utility and efficiency of the solution. The algorithm will form a basis for exploration of timing-constrained synthesis techniques.

Journal ArticleDOI
TL;DR: Architectures obtained from this new design technique are shown to be more area-efficient, with shorter interconnections, than the classical Dadda CC multiplier; the technique is also suitable for the design of two's complement multipliers.
Abstract: In this paper, a new design technique for column-compression (CC) multipliers is presented. Constraints for column compression with full and half adders are analyzed and, under these constraints, considerable flexibility in implementing the CC multiplier, including the allocation of adders and the choice of the length of the final fast adder, is exploited. Using the example of an 8×8-bit CC multiplier, we show that architectures obtained from this new design technique are more area-efficient, and have shorter interconnections, than the classical Dadda CC multiplier. We finally show that our new technique is also suitable for the design of two's complement multipliers.

Journal ArticleDOI
TL;DR: It is shown that the (2^n - 1)-node complete binary tree can be embedded into the n-dimensional crossed cube with dilation 1.
Abstract: The recently introduced interconnection network, the crossed cube, has attracted much attention in the parallel processing area due to its many attractive features. Like the ordinary hypercube, the n-dimensional crossed cube is a regular graph with 2^n vertices and n·2^(n-1) edges. The diameter of the crossed cube is approximately half that of the ordinary hypercube. These advantages of the crossed cube motivated the study of how well it can simulate other networks such as the complete binary tree. We show that the (2^n - 1)-node complete binary tree can be embedded into the n-dimensional crossed cube with dilation 1.

Journal ArticleDOI
TL;DR: It is shown that simulation for verifying the correctness of a circuit with given bounds on the branch delays cannot be relied upon to expose all timing problems, and an example refutes a plausible conjecture that replacing pure delays with inertial delays can never introduce, but only eliminate, glitches.
Abstract: The various modes of failure of asynchronous sequential logic circuits due to timing problems are considered. These are hazards, critical races, and metastable states. It is shown that there is a mechanism common to all forms of hazards and to metastable states. A similar mechanism, with added complications, is shown to characterize critical races. Means for defeating various types of hazards and critical races through the use of one-sided delay constraints are introduced. A method is described for determining from a flow table situations in which metastable states may be entered. A circuit technique is presented for extending a previously known technique for defeating metastability problems in self-timed systems. It is shown that the use of simulation for verifying the correctness of a circuit with given bounds on the branch delays cannot be relied upon to expose all timing problems. An example is presented that refutes a plausible conjecture that replacing pure delays with inertial delays can never introduce, but only eliminate, glitches.

Journal ArticleDOI
TL;DR: A runtime support mechanism is proposed that is used effectively by a compiler to generate efficient code for concurrent loop nests with complicated array references and irregularly distributed arrays.
Abstract: This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Loops may contain array accesses whose reference patterns cannot be precisely determined at compile time. This paper proposes a runtime support mechanism that is used effectively by a compiler to generate efficient code in these situations. The compiler accepts as input a Fortran 77 program enhanced with specifications for distributing data, and outputs a message passing program that runs on the nodes of a distributed memory machine. The runtime support for the compiler consists of a library of primitives designed to support irregular patterns of distributed array accesses and irregularly distributed array partitions. A variety of performance results on the Intel iPSC/860 are presented.

Journal ArticleDOI
TL;DR: It is proven that existing combinatorial or look-up table approaches for RNS are tailored to small designs or special applications, while the pseudo-RNS approach remains competitive for complex systems as well.
Abstract: It is known that RNS VLSI processors can parallelize fixed-point addition and multiplication operations by the use of the Chinese remainder theorem (CRT). The required modular operations, however, must use specialized hardware whose design and implementation can create several problems. In this paper a modified residue arithmetic, called pseudo-RNS, is introduced in order to alleviate some of the RNS problems when digital signal processing (DSP) structures are implemented. Pseudo-RNS requires only the use of modified binary processors and exhibits speed comparable to other traditional RNS approaches. Some applications of the pseudo-RNS to common DSP architectures, such as multipliers and filters, are also presented in this paper. They are compared, in terms of the area-time-squared product, with other RNS and weighted binary structures. It is proven that existing combinatorial or look-up table approaches for RNS are tailored to small designs or special applications, while the pseudo-RNS approach remains competitive for complex systems as well.
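
For background, the hedged sketch below shows the ordinary RNS arithmetic that pseudo-RNS modifies: channelwise addition and multiplication over pairwise coprime moduli, with CRT reconstruction. The moduli and operands are invented, and the pseudo-RNS modification itself is not reproduced.

```python
# Hedged sketch of ordinary RNS arithmetic: add/multiply independently per
# modulus (hence parallel channels), and reconstruct with the Chinese
# remainder theorem. Moduli are invented but pairwise coprime.
from math import prod

MODULI = (13, 15, 16, 17)

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_op(a, b, op):
    return tuple(op(x, y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(residues):
    M = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)         # CRT basis element for channel m
    return x % M

a, b = 1234, 567
total = rns_op(to_rns(a), to_rns(b), lambda x, y: x + y)
product = rns_op(to_rns(a), to_rns(b), lambda x, y: x * y)
assert from_rns(total) == a + b
assert from_rns(product) == (a * b) % prod(MODULI)
print(from_rns(total), from_rns(product))
```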

Journal ArticleDOI
TL;DR: The technique of parametric dispatching is presented to enforce relative timing constraints on a set of tasks; an off-line component produces a calendar that allows the on-line algorithm to generate upper and lower bounds on the start time of each task, based on the start times and execution times of previous tasks.
Abstract: In many real-time systems relative timing constraints are imposed on a set of tasks. Generating a correct ordering for the tasks and deriving their proper start-time assignments is an NP-hard problem; it subsumes the non-preemptive scheduling problem. Even when the application imposes a total order on the tasks, generating proper start-times is still nontrivial if execution times may range between upper and lower bounds. We present the technique of parametric dispatching to enforce such timing constraints. During an off-line component, we check if the constraints can be guaranteed. If so, a calendar is produced that allows our on-line algorithm to generate upper and lower bounds on the start time of each task, based on the start times and execution times of previous tasks. A suitable start time for the task may then be selected taking into account the presence of other non-critical tasks in the system.

Journal ArticleDOI
TL;DR: A triangular basis for representing the field elements is considered, and an architecture for a rate-adaptive RS encoder using a triangular basis multiplication algorithm is presented; it supports pipelined, bit-serial operation and has low circuit complexity.
Abstract: Multiple error-correcting Reed-Solomon (RS) codes have many practical applications. The complexity of an RS encoder depends on multiplications in the finite field over which the code is defined. We consider a triangular basis for representing the field elements, and present an architecture for a rate-adaptive RS encoder using a triangular basis multiplication algorithm. The architecture supports pipelined and bit-serial operation, and has low circuit complexity.

Journal ArticleDOI
TL;DR: It is shown that verifying the timing of a circuit may require tests which can detect the simultaneous presence of more than one path delay fault, and a general framework for examining delay-verifiability is provided.
Abstract: We address the problem of testing circuits for temporal correctness. A circuit is considered delay-verifiable if its timing correctness can be established by applying delay tests. It is shown that verifying the timing of a circuit may require tests which can detect the simultaneous presence of more than one path delay fault. We provide a general framework for examining delay-verifiability by introducing a special class of faults called primitive path delay faults. It is necessary and sufficient to test every fault in this class to ensure the temporal correctness of combinational circuits. Based on this result, we develop a synthesis procedure for combinational circuits that can be tested for correct timing. Experimental data show that such implementations usually require less area than completely delay testable implementations.

Journal ArticleDOI
TL;DR: A distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network, and it is formally proven that the algorithm is correct and that it is optimal in terms of the time required for all of the fault-free processors in the network to learn of a new event.
Abstract: A distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly repaired processor joins the network, this new information is disseminated in parallel throughout the network. It is formally proven that the algorithm is correct, and it is also shown that the algorithm is optimal in terms of the time required for all of the fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.

Journal ArticleDOI
TL;DR: The "no-reply" problem that hampers most practical fault-injection experiments is discussed and an a posteriori stratification technique is proposed that allows the scope of incomplete tests to be widened by accounting for available structural information about the target system.
Abstract: This paper addresses the problem of estimating the coverage of a fault tolerance mechanism through statistical processing of observations collected in fault injection experiments. A formal definition of coverage is given in terms of the fault and system activity sets that characterize the input space. Two categories of sampling techniques are considered for coverage estimation: sampling in the whole space and sampling in a space partitioned into classes. The estimators for each technique are compared by means of hypothetical examples. Techniques for early estimation of coverage are then studied. These techniques allow unbiased estimates of coverage to be made before all classes of the sampling space have been tested. Then, the "no-reply" problem that hampers most practical fault-injection experiments is discussed, and an a posteriori stratification technique is proposed that allows the scope of incomplete tests to be widened by accounting for available structural information about the target system.
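
The partitioned-space estimator has a simple form: estimate coverage within each class, then weight by the classes' occurrence probabilities. A hedged Python sketch with invented numbers:

```python
# Hedged sketch of coverage estimation with a partitioned input space:
# per-class detection rates are combined using the classes' known
# occurrence probabilities. All numbers are invented for illustration.
def stratified_coverage(classes):
    """classes: iterable of (class_probability, detected, injected)."""
    assert abs(sum(p for p, _, _ in classes) - 1.0) < 1e-9
    return sum(p * detected / injected for p, detected, injected in classes)

experiments = [
    (0.70, 196, 200),    # frequent fault class, well covered
    (0.25,  88, 100),
    (0.05,  30,  50),    # rare fault class, poorly covered
]
print(f"coverage estimate: {stratified_coverage(experiments):.4f}")  # 0.9360
```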

Journal ArticleDOI
TL;DR: VLSI implementations of Householder CORDIC processors are presented and their speed and area estimated; the method employed to prove the convergence of these multidimensional algorithms differs from the one used in the 2D case.
Abstract: Matrix computations are often expressed in terms of plane rotations, which may be implemented using COordinate Rotation DIgital Computer (CORDIC) arithmetic. As matrix sizes increase, multiprocessor systems employing traditional CORDIC arithmetic, which operates on two-dimensional (2D) vectors, become unable to achieve sufficient speed. Speed may be increased by expressing the matrix computations in terms of higher-dimensional rotations and implementing these rotations using novel CORDIC algorithms, called Householder CORDIC, that extend CORDIC arithmetic to arbitrary dimensions. The method employed to prove the convergence of these multidimensional algorithms differs from the one used in the 2D case. After a discussion of scaling factor decomposition, range extension, and numerical errors, VLSI implementations of Householder CORDIC processors are presented and their speed and area are estimated. Finally, some applications of the Householder CORDIC algorithms are listed.
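
For reference, the classical 2D CORDIC rotation that Householder CORDIC generalizes can be sketched as below (hedged): each iteration rotates by plus or minus atan(2^-i), which hardware realizes with shifts and adds, and the accumulated scale factor is compensated at the end. Floating point stands in for fixed-point hardware.

```python
# Hedged sketch of the classical 2D CORDIC rotation that Householder CORDIC
# generalizes: each iteration rotates by +/- atan(2^-i), which hardware does
# with shifts and adds; the known scale factor is compensated at the end.
import math

def cordic_rotate(x, y, angle, iters=32):
    scale = 1.0
    for i in range(iters):
        d = 1.0 if angle >= 0 else -1.0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        angle -= d * math.atan(2.0**-i)
        scale *= math.sqrt(1 + 2.0**(-2 * i))    # per-step length growth
    return x / scale, y / scale

x, y = cordic_rotate(1.0, 0.0, math.pi / 3)
print(x, y)                                      # ~ (0.5, 0.866025...)
```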

Journal ArticleDOI
TL;DR: Two preemptive algorithms are described for scheduling n dependent tasks with rational ready times, deadlines, and processing times on a single processor; each schedule found has the minimum total error and minimizes the maximum error.
Abstract: We consider the problem of scheduling tasks in the imprecise computation model to minimize the maximum error. Given a task system and a schedule of it, the maximum error of the task system is equal to the error of the task that has the largest error when the task system is executed according to the schedule. We describe two preemptive algorithms for scheduling n dependent tasks with rational ready times, deadlines, and processing times on a single processor. Each schedule found by our algorithms is an optimal schedule with the minimum total error, and according to this schedule the maximum error is minimized. The run times of our algorithms are O(n^3) and O(n^2).

Journal ArticleDOI
TL;DR: The results demonstrate that the avalanche behavior of encryption networks can be improved by using larger S-boxes; increasing the diffusion properties of the S-boxes or replacing the permutations by diffusive linear transformations is also effective in improving the network avalanche characteristics.
Abstract: This paper develops analytical models for the avalanche characteristics of a class of block ciphers usually referred to as substitution-permutation encryption networks, or SPNs. An SPN is considered to display good avalanche characteristics if a one-bit change in the plaintext input is expected to result in close to half the ciphertext output bits changing. Good avalanche characteristics are important to ensure that a cipher is not susceptible to statistical attacks, and the strength of an SPN's avalanche characteristics may be considered a measure of the randomness of the ciphertext. The results presented in this paper demonstrate that the avalanche behavior of encryption networks can be improved by using larger S-boxes. It is also shown that increasing the diffusion properties of the S-boxes or replacing the permutations by diffusive linear transformations is effective in improving the network avalanche characteristics.
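
Avalanche is straightforward to measure empirically: flip one plaintext bit and count the ciphertext bits that change, averaging over many trials. The hedged Python sketch below does this for an invented, unkeyed 16-bit toy SPN (using the PRESENT cipher's 4-bit S-box as a convenient nonlinear layer and a rotation as the permutation layer); it is not the paper's network.

```python
# Hedged sketch of an empirical avalanche test on an invented, unkeyed 16-bit
# toy SPN (PRESENT's 4-bit S-box as a convenient nonlinear layer, a rotation
# as the permutation layer). Good avalanche: ~half the output bits flip.
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def toy_spn(block, rounds=4):
    for _ in range(rounds):
        block = sum(SBOX[(block >> (4 * i)) & 0xF] << (4 * i) for i in range(4))
        block = ((block << 5) | (block >> 11)) & 0xFFFF   # rotate left by 5
    return block

random.seed(0)
flips = []
for _ in range(2000):
    p = random.getrandbits(16)
    delta = toy_spn(p) ^ toy_spn(p ^ (1 << random.randrange(16)))
    flips.append(bin(delta).count("1"))
print(f"avg bits changed: {sum(flips) / len(flips):.2f} of 16 (ideal 8)")
```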

Journal ArticleDOI
TL;DR: A new method for the fast evaluation of the elementary functions in single precision based on the evaluation of truncated Taylor series using a difference method, which can calculate the basic elementary functions, namely reciprocal, square root, logarithm, exponential, trigonometric and inverse trigonometric functions, within the latency of two to four floating point multiplies.
Abstract: In this paper we introduce a new method for the fast evaluation of the elementary functions in single precision, based on the evaluation of truncated Taylor series using a difference method. We assume the availability of large and fast (at least for read purposes) memory. We call this method the ATA (Add-Table lookup-Add) method. As the name implies, the hardware required for the method consists of adders (both two- and multi-operand adders) and fast tables. For IEEE single precision numbers, our initial estimates indicate that we can calculate the basic elementary functions, namely reciprocal, square root, logarithm, exponential, trigonometric and inverse trigonometric functions, within the latency of two to four floating point multiplies.
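
The flavor of table-plus-Taylor evaluation can be sketched as below (hedged): split the argument into a high part that indexes precomputed tables of the function and its derivatives, and a small low part handled by a truncated Taylor series. Plain Python arithmetic stands in for the paper's difference method and multi-operand adders; the table size and the choice of log are ours.

```python
# Hedged sketch of table-plus-Taylor evaluation for log(x), x in [1, 2):
# the top T fraction bits index tables of f, f', f''/2 at a grid point, and
# a degree-2 Taylor series handles the remaining offset.
import math

T = 10                                       # table index width in bits
GRID = [1.0 + i / 2**T for i in range(2**T)]
TABLE = [(math.log(g), 1.0 / g, -0.5 / (g * g)) for g in GRID]

def log_ata(x):
    i = int((x - 1.0) * 2**T)                # high bits pick the table entry
    dx = x - GRID[i]                         # low part, 0 <= dx < 2^-T
    f, d1, d2 = TABLE[i]
    return f + d1 * dx + d2 * dx * dx        # truncated Taylor series

x = 1.2345678
print(log_ata(x), math.log(x))               # cubic-term error ~ 2^-30 scale
```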

Journal ArticleDOI
TL;DR: A new test generation technique for path delay faults in circuits employing scan/hold type flip-flops is presented, and results show that the algebraic technique is one to two orders of magnitude faster than previously reported methods based on branch-and-bound algorithms.
Abstract: A new test generation technique for path delay faults in circuits employing scan/hold type flip-flops is presented. Reduced ordered binary decision diagrams (ROBDDs) are used to represent Boolean functions realized by all signals in the circuit, as well as to represent the constraints to be satisfied by the delay fault test. Two faults are considered for each path in the circuit. For each fault, a pair of constraint functions, corresponding to the two time frames that constitute a transition, is evaluated. If the constraint function in the second time frame is non-null, robust, hazard-free test generation for the delay fault is attempted. A robust test thus generated belongs either to the class of fully transitional path (FTP) tests or to the class of single input transition (SIT) tests. If a robust test cannot be found, the existence of a non-robust test is checked. Boolean algebraic manipulation of the constraint functions guarantees that if neither robust nor non-robust tests exist, the fault is undetectable. In its present form the method is applicable to all circuits that are amenable to analysis using ROBDDs. An implementation of this technique is used to analyze delay fault testability of ISCAS '89 benchmark circuits. These results show that the algebraic technique is one to two orders of magnitude faster than previously reported methods based on branch-and-bound algorithms.

Journal ArticleDOI
TL;DR: A technique based on a continuous task model is presented that very closely approximates discrete models and tasks with varying characteristics; the goal is to maximize the total performance index, a performance-related reliability measurement.
Abstract: Many real-time systems have both performance requirements and reliability requirements. Performance is usually measured in terms of the value in completing tasks on time. Reliability is evaluated by hardware and software failure models. In many situations, there are trade-offs between task performance and task reliability. Thus, a mathematical assessment of performance-reliability trade-offs is necessary to evaluate the performance of real-time fault-tolerant systems. Assuming that the reliability of task execution is achieved through task replication, we present an approach that mathematically determines the replication factor for tasks. Our approach is novel in that it is a task-schedule-based analysis rather than a state-based analysis as found in other models. Because we use a task-schedule-based analysis, we can provide a fast method to determine optimal redundancy levels, we are not limited to hardware reliability given by constant failure rate functions as in most other models, and we hypothesize that we can more naturally integrate with online real-time scheduling than when state-based techniques are used. In this work, the goal is to maximize the total performance index, which is a performance-related reliability measurement. We present a technique based on a continuous task model and show how it very closely approximates discrete models and tasks with varying characteristics.