Showing papers in "IEEE Transactions on Computers in 1995"


Journal ArticleDOI
TL;DR: EVENODD, a novel method for tolerating up to two disk failures in RAID architectures using only exclusive-OR computations, in contrast to the standard Reed-Solomon approach; the scheme also suits any system requiring large symbols and relatively short codes, for instance multitrack magnetic recording.
Abstract: We present a novel method, which we call EVENODD, for tolerating up to two disk failures in RAID architectures. EVENODD employs the addition of only two redundant disks and consists of simple exclusive-OR computations. This redundant storage is optimal, in the sense that two failed disks cannot be retrieved with fewer than two redundant disks. A major advantage of EVENODD is that it only requires parity hardware, which is typically present in standard RAID-5 controllers; hence, EVENODD can be implemented on standard RAID-5 controllers without any hardware changes. The most commonly used scheme that employs optimal redundant storage (i.e., two extra disks) is based on Reed-Solomon (RS) error-correcting codes. This scheme requires computation over finite fields and results in a more complex implementation. For example, we show that the complexity of implementing EVENODD in a disk array with 15 disks is about 50% of that required by the RS scheme. The new scheme is not limited to RAID architectures: it can be used in any system requiring large symbols and relatively short codes, for instance, in multitrack magnetic recording. To this end, we also present a decoding algorithm for one column (track) in error.
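
To make the parity structure concrete, here is a minimal Python sketch of EVENODD-style encoding: a row-parity column plus a diagonal-parity column adjusted by the parity S of one special diagonal. The (p-1) x p layout follows the abstract, but the indexing convention (diagonal d collects cells with (i + j) mod p = d) and the demo data are our assumptions, and erasure recovery is not shown.

```python
# Hedged sketch of EVENODD-style encoding: p - 1 rows of bits across p data
# disks (p prime), plus a row-parity column and an adjusted diagonal-parity
# column. Recovery from two erasures is not shown.
from functools import reduce
from operator import xor

def evenodd_encode(data, p):
    """data[i][j]: bit in row i (0..p-2), data disk j (0..p-1)."""
    assert len(data) == p - 1 and all(len(row) == p for row in data)
    # Row parity: plain RAID-5 style XOR across each row.
    row_parity = [reduce(xor, row) for row in data]
    # S = parity of the special diagonal (i + j) mod p == p - 1.
    s = reduce(xor, (data[i][p - 1 - i] for i in range(p - 1)))
    # Diagonal parity d (d = 0..p-2): S xor the cells with (i + j) mod p == d.
    diag_parity = []
    for d in range(p - 1):
        acc = s
        for i in range(p - 1):
            acc ^= data[i][(d - i) % p]
        diag_parity.append(acc)
    return row_parity, diag_parity

p = 5
data = [[(3 * i + j) % 2 for j in range(p)] for i in range(p - 1)]
print(evenodd_encode(data, p))
```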

745 citations


Journal ArticleDOI
TL;DR: The three hardware prefetching schemes all yield significant reductions in the data access penalty compared with regular caches; the benefits are greater when the hardware assist augments small on-chip caches; and the lookahead scheme is the preferred one on a cost-performance basis.
Abstract: Memory latency and bandwidth are progressing at a much slower pace than processor performance. In this paper, we describe and evaluate the performance of three variations of a hardware function unit whose goal is to assist a data cache in prefetching data accesses so that memory latency is hidden as often as possible. The basic idea of the prefetching scheme is to keep track of data access patterns in a reference prediction table (RPT) organized as an instruction cache. The three designs differ mostly in the timing of the prefetching. In the simplest scheme (basic), prefetches can be generated one iteration ahead of actual use. The lookahead variation takes advantage of a lookahead program counter that ideally stays one memory latency time ahead of the real program counter and that is used as the control mechanism to generate the prefetches. Finally, the correlated scheme uses a more sophisticated design to detect patterns across loop levels. These designs are evaluated by simulating the ten SPEC benchmarks on a cycle-by-cycle basis. The results show that 1) the three hardware prefetching schemes all yield significant reductions in the data access penalty when compared with regular caches, 2) the benefits are greater when the hardware assist augments small on-chip caches, and 3) the lookahead scheme is preferred on a cost-performance basis.
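
The table-driven idea behind the basic scheme is easy to prototype. The following hedged Python sketch keeps one entry per load PC with the last address and stride, and issues a prefetch one iteration ahead once the same stride is seen twice; the paper's entry state machine, lookahead program counter, and correlated variant are not modeled, and the PC/address values are invented.

```python
# Hedged sketch of a stride-detecting reference prediction table (RPT).
# One entry per load PC: last address, last stride, and whether the stride
# has repeated. Timing, entry states, and the lookahead PC are not modeled.
class RPTEntry:
    def __init__(self, addr):
        self.last_addr = addr
        self.stride = 0
        self.steady = False          # same stride observed twice in a row

def rpt_access(table, pc, addr):
    """Record one load; return an address to prefetch, or None."""
    entry = table.get(pc)
    if entry is None:
        table[pc] = RPTEntry(addr)
        return None
    stride = addr - entry.last_addr
    entry.steady = (stride == entry.stride)
    entry.stride, entry.last_addr = stride, addr
    # Prefetch one iteration ahead of actual use once the pattern is stable.
    return addr + stride if entry.steady and stride else None

table = {}
for i in range(4):                   # a load at PC 0x400 walking stride 8
    addr = 0x1000 + 8 * i
    pf = rpt_access(table, pc=0x400, addr=addr)
    print(hex(addr), "->", hex(pf) if pf else None)
```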

543 citations


Journal ArticleDOI
TL;DR: A priority-based policy for scheduling N real-time streams with (m, k)-firm deadlines on a single server to reduce the probability of dynamic failure; customers from streams that are closer to a dynamic failure receive higher priorities, improving their chances of meeting their deadlines.
Abstract: The problem of scheduling multiple streams of real-time customers is addressed in this paper. The paper first introduces the notion of (m, k)-firm deadlines to better characterize the timing constraints of real-time streams. More specifically, a stream is said to have (m, k)-firm deadlines if at least m out of any k consecutive customers must meet their deadlines. A stream with (m, k)-firm deadlines experiences a dynamic failure if fewer than m out of any k consecutive customers meet their deadlines. The paper then proposes a priority-based policy for scheduling N such streams on a single server to reduce the probability of dynamic failure. The basic idea is to assign higher priorities to customers from streams that are closer to a dynamic failure so as to improve their chances of meeting their deadlines. The paper proposes a heuristic for assigning these priorities. The effectiveness of this approach is evaluated through simulation under various customer arrival and service patterns. The scheme is compared to a conventional scheme where all customers are serviced at the same priority level and to an imprecise computation model approach. The evaluation shows that substantial reductions in the probability of dynamic failure are achieved when the proposed policy is used.
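
A distance-based notion of urgency can be sketched directly. The Python fragment below is our reading of such a heuristic, not a transcription of the paper's: a stream's priority is the number of further consecutive misses it could absorb before some window of k customers contains fewer than m met deadlines, so smaller distances mean more urgent streams.

```python
# Hedged sketch of a distance-based priority for (m, k)-firm streams: the
# distance is how many further consecutive misses the stream can absorb
# before fewer than m of its last k customers have met their deadlines.
def distance_to_failure(history, m, k):
    """history: recent outcomes, True = deadline met, most recent last."""
    recent = list(history)[-k:]
    meet_positions = [i for i, met in enumerate(reversed(recent), 1) if met]
    if len(meet_positions) < m:
        return 0                     # already failing: most urgent
    # Position of the m-th most recent meet determines the remaining slack.
    return k - meet_positions[m - 1] + 1

streams = {"audio": [True, True, False, True], "logging": [True] * 4}
priorities = {s: distance_to_failure(h, m=3, k=4) for s, h in streams.items()}
print(priorities)                    # smaller distance = served first
```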

512 citations


Journal ArticleDOI
TL;DR: A new scheme for built-in test that uses multiple-polynomial linear feedback shift registers (MP-LFSRs); grouping seeds by polynomial with an implicit polynomial identification reduces the number of extra bits per seed to a single bit.
Abstract: We propose a new scheme for built-in test (BIT) that uses multiple-polynomial linear feedback shift registers (MP-LFSRs). The same MP-LFSR that generates random patterns to cover easy-to-test faults is loaded with seeds to generate deterministic vectors for difficult-to-test faults. The seeds are obtained by solving systems of linear equations involving the seed variables for the positions where the test cubes have specified values. We demonstrate that MP-LFSRs produce sequences with significantly reduced probability of linear dependence compared to single-polynomial LFSRs. We present a general method to determine the probability of encoding as a function of the number of specified bits in the test cube, the length of the LFSR, and the number of polynomials. Theoretical analysis and experiments show that the probability of encoding a test cube with s specified bits in an s-stage LFSR with 16 polynomials is 1 - 10^-6. We then present the new BIT scheme that allows for an efficient encoding of the entire test set. Here the seeds are grouped according to the polynomial they use, and an implicit polynomial identification reduces the number of extra bits per seed to one bit. The paper also shows methods of processing the entire test set consisting of test cubes with varying numbers of specified bits. Experimental results show the tradeoffs between test data storage and test application time while maintaining complete fault coverage.
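
The seed computation is linear algebra over GF(2), which the hedged sketch below makes concrete for a single feedback polynomial: every LFSR cell is tracked as a linear combination of seed variables, each specified bit of the test cube becomes one equation, and Gaussian elimination yields a seed (or proves the cube not encodable). The polynomial, register length, and cube are invented, and the multiple-polynomial grouping with its one-bit identifier is not shown.

```python
# Hedged sketch: find an LFSR seed whose output sequence matches the specified
# bits of a test cube, by Gaussian elimination over GF(2). Cell contents are
# tracked symbolically as bitmasks over the seed variables.
def solve_seed(taps, length, cube):
    """taps: feedback cell indices; cube: {output position: required bit}."""
    state = [1 << i for i in range(length)]      # cell i = seed variable i
    rows = []
    for t in range(max(cube) + 1):
        if t in cube:
            rows.append([state[-1], cube[t]])    # output comes from last cell
        fb = 0
        for tap in taps:
            fb ^= state[tap]
        state = [fb] + state[:-1]                # shift; feedback enters cell 0
    pivots = {}                                  # pivot bit -> (mask, rhs)
    for mask, rhs in rows:
        for r in sorted(pivots, reverse=True):   # reduce by existing pivots
            if mask >> r & 1:
                pm, prhs = pivots[r]
                mask ^= pm
                rhs ^= prhs
        if mask == 0:
            if rhs:
                return None                      # inconsistent: not encodable
            continue
        pivots[mask.bit_length() - 1] = (mask, rhs)
    seed = [0] * length                          # free variables default to 0
    for r in sorted(pivots):                     # back-substitute, low to high
        mask, rhs = pivots[r]
        for b in range(r):
            if mask >> b & 1:
                rhs ^= seed[b]
        seed[r] = rhs
    return seed

taps, length = (0, 4, 5, 7), 8                   # arbitrary 8-stage example
cube = {0: 1, 3: 0, 9: 1, 12: 1}                 # specified bits of the cube
seed = solve_seed(taps, length, cube)
assert seed is not None
state = seed[:]                                  # verify by direct simulation
for t in range(max(cube) + 1):
    assert t not in cube or state[-1] == cube[t]
    fb = 0
    for tap in taps:
        fb ^= state[tap]
    state = [fb] + state[:-1]
print("seed:", seed)
```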

439 citations


Journal ArticleDOI
TL;DR: The Deferrable Server (DS) algorithm is introduced and shown to provide improved aperiodic response-time performance over traditional background and polling approaches, significantly reducing the response times of aperiodic tasks while still guaranteeing periodic task deadlines.
Abstract: Most existing scheduling algorithms for hard real-time systems apply either to periodic tasks or aperiodic tasks but not to both. In practice, real-time systems require an integrated, consistent approach to scheduling that is able to simultaneously meet the timing requirements of hard deadline periodic tasks, hard deadline aperiodic (alert-class) tasks, and soft deadline aperiodic tasks. This paper introduces the Deferrable Server (DS) algorithm, which is shown to provide improved aperiodic response time performance over traditional background and polling approaches. Taking advantage of the fact that, typically, there is no benefit in early completion of the periodic tasks, the Deferrable Server (DS) algorithm assigns higher priority to the aperiodic tasks up until the point where the periodic tasks would start to miss their deadlines. Guaranteed alert-class aperiodic service and greatly reduced response times for soft deadline aperiodic tasks are important features of the DS algorithm, and both are obtained with the hard deadlines of the periodic tasks still being guaranteed. The results of a simulation study performed to evaluate the response time performance of the new algorithm against traditional background and polling approaches are presented. In all cases, the response times of aperiodic tasks are significantly reduced (often by an order of magnitude) while still maintaining guaranteed periodic task deadlines.
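
The mechanism is small enough to simulate in a few lines. Below is a hedged discrete-time Python sketch of a deferrable server with period Ts and budget Cs: aperiodic work runs at top priority while budget remains, and the budget is replenished to Cs at each server period. The task parameters are invented and response-time bookkeeping is omitted.

```python
# Hedged discrete-time sketch of a Deferrable Server: budget Cs is refreshed
# every Ts ticks and spent serving aperiodic work at top priority; one
# periodic task (period Tp, cost Cp) runs otherwise. Parameters are invented.
def simulate(horizon, Ts, Cs, Tp, Cp, aperiodic_arrivals):
    budget, periodic_left, aperiodic_queue, schedule = Cs, 0, 0, []
    for t in range(horizon):
        if t % Ts == 0:
            budget = Cs                      # replenish the server budget
        if t % Tp == 0:
            periodic_left += Cp              # release a periodic job
        aperiodic_queue += aperiodic_arrivals.get(t, 0)
        if aperiodic_queue and budget:       # server runs at top priority
            aperiodic_queue -= 1
            budget -= 1
            schedule.append((t, "aperiodic"))
        elif periodic_left:
            periodic_left -= 1
            schedule.append((t, "periodic"))
    return schedule

print(simulate(12, Ts=5, Cs=2, Tp=4, Cp=2, aperiodic_arrivals={3: 2}))
```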

423 citations


Journal ArticleDOI
TL;DR: Methodology and guidelines are described for the design of flexible software-based fault and error injection, and a tool, FERRARI, that incorporates the techniques is presented; experiments demonstrate the effectiveness of software-based error injection in evaluating the dependability properties of complex systems.
Abstract: A major step toward the development of fault-tolerant computer systems is the validation of the dependability properties of these systems. Fault/error injection has been recognized as a powerful approach to validate the fault tolerance mechanisms of a system and to obtain statistics on parameters such as coverages and latencies. This paper describes the methodology and guidelines for the design of flexible software-based fault and error injection, and presents a tool, FERRARI, that incorporates the techniques. The techniques used to emulate transient errors and permanent faults in software are described in detail. Experimental results are presented for several error detection techniques, and they demonstrate the effectiveness of the software-based error injection tool in evaluating the dependability properties of complex systems.
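
The essence of software-implemented fault injection can be shown with a toy: flip one bit of intermediate state and record whether a detector notices. FERRARI does this with traps against real registers and memory; the Python stand-in below uses a deliberately weak parity detector so that some injections go undetected, which is exactly what a coverage experiment quantifies. All values are invented.

```python
# Hedged toy illustrating software-implemented fault injection: flip one bit
# of intermediate state and observe whether a (deliberately weak) detector
# catches it. FERRARI injects into registers/memory of real programs via
# traps; plain Python stands in for that machinery here.
def run_with_injection(values, inject_at, bit):
    reference = sum(values)                     # golden, fault-free result
    total = 0
    for i, v in enumerate(values):
        if i == inject_at:
            v ^= 1 << bit                       # emulated transient bit flip
        total += v
    detected = (total & 1) != (reference & 1)   # parity check as the detector
    return total != reference, detected

values = [17, 42, 99, 7]
outcomes = [run_with_injection(values, i, b) for i in range(4) for b in range(8)]
detections = [detected for erred, detected in outcomes if erred]
print(f"coverage estimate: {sum(detections)}/{len(detections)} injections detected")
```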

370 citations


Journal ArticleDOI
TL;DR: It is shown that, using just one extra virtual channel per physical channel, the well-known e-cube algorithm can provide deadlock-free routing in networks with nonoverlapping fault rings, and it is proved that at most four additional virtual channels suffice to make fully adaptive algorithms tolerant of multiple faulty blocks in n-dimensional meshes.
Abstract: We present simple methods to enhance current minimal wormhole routing algorithms, developed for high-radix, low-dimensional mesh networks, for fault-tolerant routing. We consider arbitrarily located faulty blocks and assume only local knowledge of faults. Messages are routed minimally when not blocked by faults, and this constraint is relaxed to route around faults. The key concept we use is that a fault ring, consisting of fault-free nodes and links, can be formed around each fault region. Our fault-tolerant techniques use these fault rings to route messages around fault regions. We show that, using just one extra virtual channel per physical channel, the well-known e-cube algorithm can be used to provide deadlock-free routing in networks with nonoverlapping fault rings; there is no restriction on the number of faults. For the more complex faults with overlapping fault rings, four virtual channels are used. We also prove that at most four additional virtual channels are sufficient to make fully adaptive algorithms tolerant of multiple faulty blocks in n-dimensional meshes. All these algorithms are deadlock- and livelock-free. Further, we present simulation results for the e-cube and a fully adaptive algorithm fortified with our fault-tolerant routing techniques, and show that good performance may be obtained with as many as 10% of links faulty.

325 citations


Journal ArticleDOI
TL;DR: Stochastic rendezvous networks, queueing networks of a new type proposed as a modelling framework for software with synchronous rendezvous communication, are extended to incorporate different services or entries associated with each task, giving approximate performance estimates.
Abstract: Distributed or parallel software with synchronous communication via rendezvous is found in client-server systems and in proposed open distributed systems, in implementation environments such as Ada, V, and remote procedure call systems, in transputer systems, and in specification techniques such as CSP, CCS, and LOTOS. The delays induced by rendezvous can cause serious performance problems, which are not easy to estimate using conventional models that focus on hardware contention, or on a restricted view of the parallelism which ignores implementation constraints. Stochastic rendezvous networks are queueing networks of a new type which have been proposed as a modelling framework for these systems. They incorporate the two key phenomena of included service and a second phase of service. This paper extends the model to also incorporate different services or entries associated with each task. Approximations to arrival-instant probabilities are employed within a mean-value analysis framework, to give approximate performance estimates. The method has been applied to moderately large industrial software systems.

306 citations


Journal ArticleDOI
TL;DR: New schedulability conditions are presented for homogeneous multiprocessor systems where individual processors execute the rate-monotonic scheduling algorithm; under realistic assumptions it is shown that the processors can be almost fully utilized.
Abstract: Optimal scheduling of real-time tasks on multiprocessor systems is known to be computationally intractable for large task sets. Any practical scheduling algorithm for assigning real-time tasks to a multiprocessor system presents a trade-off between its computational complexity and its performance. In this study, new schedulability conditions are presented for homogeneous multiprocessor systems where individual processors execute the rate-monotonic scheduling algorithm. The conditions are used to develop new strategies for assigning real-time tasks to processors. The performance of the new strategies is shown to be significantly better than suggested by the existing literature. Under the realistic assumption that the load of each real-time task is small compared to the processing speed of each processor, it is shown that the processors can be almost fully utilized.
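
A common baseline for such assignment strategies, shown as a hedged Python sketch below, is first-fit partitioning with the Liu and Layland utilization bound n(2^(1/n) - 1) as the per-processor admission test; the paper's improved conditions are not reproduced here, and the task utilizations are invented.

```python
# Hedged sketch of a baseline partitioning strategy: first-fit assignment with
# the Liu-Layland rate-monotonic utilization bound n(2^(1/n) - 1) as the
# admission test on each processor. Task utilizations are invented.
def rm_bound(n):
    return n * (2 ** (1.0 / n) - 1)

def first_fit(utilizations):
    processors = []                          # each entry: utilizations assigned
    for u in utilizations:
        for tasks in processors:
            if sum(tasks) + u <= rm_bound(len(tasks) + 1):
                tasks.append(u)
                break
        else:
            processors.append([u])           # no processor fits: open a new one
    return processors

print(first_fit([0.3, 0.2, 0.25, 0.4, 0.1, 0.35]))
# As per-task load shrinks, each processor's admitted load approaches ln 2
# under this test; the paper's conditions sharpen this toward full utilization.
```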

248 citations


Journal ArticleDOI
TL;DR: Two single-node broadcasting algorithms are presented that are optimal under single-port and multi-port I/O, and it is shown how Lee distance applies to message routing and single-node broadcasting in a k-ary n-cube Q_n^k.
Abstract: In this paper, we consider various topological properties of a k-ary n-cube (Q_n^k) using Lee distance. We feel that Lee distance is a natural metric for defining and studying a Q_n^k. After defining a Q_n^k graph using Lee distance, we show how to find all disjoint paths between any two nodes. Given a sequence of radix-k numbers, a function mapping the sequence to a Gray code sequence is presented, and this function is used to generate a Hamiltonian cycle. Embedding the graph of a mesh and the graph of a binary hypercube into the graph of a Q_n^k is considered. Using a k-ary Gray code, we show the embedding of a k^(n_1) × k^(n_2) × ... × k^(n_m) mesh into a Q_n^k, where n = n_1 + n_2 + ... + n_m. Then, using a single-digit 4-ary reflective Gray code, we demonstrate embedding the binary hypercube Q_n into a Q_⌈n/2⌉^4. We look at how Lee distance may be applied to the problem of resource placement in a Q_n^k by using a Lee distance error-correcting code. Although the results in this paper are only preliminary, Lee distance error-correcting codes have not been applied previously to this problem. Finally, we consider how Lee distance can be applied to message routing and single-node broadcasting in a Q_n^k, presenting two single-node broadcasting algorithms that are optimal under single-port and multi-port I/O.
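
The metric itself is one line: per digit, the shorter way around the ring of size k. A hedged Python sketch with invented node labels follows; since a minimal route corrects one digit at a time, the distance also counts hops.

```python
# Hedged sketch: Lee distance between nodes of a k-ary n-cube, where a node
# is a vector of n radix-k digits. Per digit the cost is the shorter way
# around the ring of size k; a minimal route corrects one digit per hop.
def lee_distance(a, b, k):
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))

# In a 4-ary 3-cube: digit 0 wraps (cost 1), digit 1 matches, digit 2 costs 2.
print(lee_distance((0, 1, 3), (3, 1, 1), k=4))   # prints 3
```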

234 citations


Journal ArticleDOI
TL;DR: Simulation results show that the 1-Möbius cube has dynamic performance superior to that of the hypercube, contradicting current literature, which implies that twisted-cube variants should have worse dynamic performance.
Abstract: The Möbius cubes are hypercube variants that give better performance with the same number of links and processors. We show that the diameter of the Möbius cubes is about one half the diameter of the equivalent hypercube, and that the average number of steps between processors for a Möbius cube is about two-thirds of the average for a hypercube. We give an efficient routing algorithm for the Möbius cubes. This routing algorithm finds a shortest path and operates in time proportional to the dimension of the cube. We also give efficient broadcast algorithms for the Möbius cubes. We show that the Möbius cubes contain ring networks and other networks. We report results of simulation studies on the dynamic message-passing performance of the hypercube, the Twisted Cube of P.A.J. Hilbers et al. (1987), and the Möbius cubes. Our results are in agreement with those of S. Abraham (1990), showing that the Twisted Cube has worse dynamic performance than the hypercube, but our results show that the 1-Möbius cube has dynamic performance superior to that of the hypercube. This contradicts current literature, which implies that twisted cube variants will have worse dynamic performance.

Journal ArticleDOI
TL;DR: The right-shifting binary algorithm for modular inversion is shown to compute this new inverse naturally, in fewer operations than the ordinary modular inverse.
Abstract: The Montgomery inverse of b modulo a is b^(-1) 2^n mod a, where n is the number of bits in a. The right-shifting binary algorithm for modular inversion is shown to compute the new inverse naturally, in fewer operations than the ordinary modular inverse. The new inverse facilitates recent work by Koç on modular exponentiation and has other applications in cryptography.
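
The right-shifting algorithm referred to here is usually credited to Kaliski. Below is a hedged Python sketch of our understanding of it: phase 1 produces the almost-inverse b^(-1) 2^k mod a with n <= k <= 2n using only shifts, additions, and subtractions, and phase 2 halves modulo a until the exponent is exactly n. The operands are invented; a must be odd and coprime to b.

```python
# Hedged sketch of the right-shifting binary method (usually credited to
# Kaliski): phase 1 yields the almost-inverse b^-1 * 2^k mod a with
# n <= k <= 2n; phase 2 halves modulo a until the exponent is exactly n.
# Requires a odd, 0 < b < a, gcd(a, b) = 1.
def montgomery_inverse(b, a):
    u, v, r, s, k = a, b, 0, 1, 0
    while v > 0:                             # phase 1: almost inverse
        if u % 2 == 0:
            u, s = u // 2, 2 * s
        elif v % 2 == 0:
            v, r = v // 2, 2 * r
        elif u > v:
            u, r, s = (u - v) // 2, r + s, 2 * s
        else:
            v, s, r = (v - u) // 2, s + r, 2 * r
        k += 1
    if r >= a:
        r -= a
    x = a - r                                # x = b^-1 * 2^k (mod a)
    for _ in range(k - a.bit_length()):      # phase 2: halve mod a
        x = x // 2 if x % 2 == 0 else (x + a) // 2
    return x                                 # b^-1 * 2^n (mod a)

a, b = 1000003, 123456
x = montgomery_inverse(b, a)
assert (x * b) % a == pow(2, a.bit_length(), a)
print(x)
```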

Journal ArticleDOI
TL;DR: A powerful technique for the evaluation and analysis of both computer systems and their workloads is extended; this methodology is valuable both to computer users and computer system designers.
Abstract: In previous research, we have developed and presented a model for measuring machines and analyzing programs, and for accurately predicting the running time of any analyzed program on any measured machine. That work is extended here by: (1) developing a high-level program to measure the design and performance of the cache and TLB units; (2) using those measurements, along with published miss ratio data, to improve the accuracy of our runtime predictions; (3) using our analysis tools and measurements to study and compare the design of several machines, with particular reference to their cache and TLB performance. As part of this work, we describe the design and performance of the cache and TLB for ten machines. The work presented in this paper extends a powerful technique for the evaluation and analysis of both computer systems and their workloads; this methodology is valuable both to computer users and to computer system designers.
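
The measurement idea reduces to a few lines of pointer chasing: time dependent loads over working sets of growing size and watch the average latency step up past each capacity boundary. The hedged Python sketch below only approximates footprints (list elements are pointers, and interpreter overhead dominates), so it illustrates the shape of the experiment rather than reproducing the paper's compiled micro-benchmarks.

```python
# Hedged sketch of the pointer-chasing idea behind cache/TLB characterization:
# time dependent loads over growing working sets; the average latency steps up
# as each cache level's capacity is exceeded. Only the curve's shape matters.
import random
import time

def avg_access_ns(n_elements, reps=200_000):
    order = list(range(n_elements))
    random.shuffle(order)                    # random cycle defeats prefetching
    nxt = [0] * n_elements
    for i in range(n_elements):
        nxt[order[i]] = order[(i + 1) % n_elements]
    j, t0 = 0, time.perf_counter()
    for _ in range(reps):
        j = nxt[j]                           # dependent loads expose latency
    return (time.perf_counter() - t0) / reps * 1e9

for n in (1 << 9, 1 << 13, 1 << 17, 1 << 21):
    print(f"{n:8d} elements: {avg_access_ns(n):6.1f} ns/access")
```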

Journal ArticleDOI
TL;DR: An efficient algorithm is presented that finds exact (tight) bounds on the separation time of events in an arbitrary process graph without conditional behavior; it will form a basis for exploration of timing-constrained synthesis techniques.
Abstract: Determining the time separation of events is a fundamental problem in the analysis, synthesis, and optimization of concurrent systems. Applications range from logic optimization of asynchronous digital circuits to evaluation of execution times of programs for real-time systems. We present an efficient algorithm to find exact (tight) bounds on the separation time of events in an arbitrary process graph without conditional behavior. This result is more general than the methods presented in several previously published papers, as it handles cyclic graphs and yields the tightest possible bounds on event separations. The algorithm is based on a functional decomposition technique that permits the implicit evaluation of an infinitely unfolded process graph. Examples are presented that demonstrate the utility and efficiency of the solution. The algorithm will form a basis for exploration of timing-constrained synthesis techniques.

Journal ArticleDOI
TL;DR: Architectures obtained from this new design technique are shown to be more area-efficient, with shorter interconnections, than the classical Dadda CC multiplier; the technique is also suitable for the design of two's complement multipliers.
Abstract: In this paper, a new design technique for column-compression (CC) multipliers is presented. Constraints for column compression with full and half adders are analyzed and, under these constraints, considerable flexibility in implementing the CC multiplier, including the allocation of adders and the choice of the length of the final fast adder, is exploited. Using the example of an 8×8-bit CC multiplier, we show that architectures obtained from this new design technique are more area-efficient, and have shorter interconnections, than the classical Dadda CC multiplier. We finally show that our new technique is also suitable for the design of two's complement multipliers.

Journal ArticleDOI
TL;DR: It is shown that the (2^n - 1)-node complete binary tree can be embedded into the n-dimensional crossed cube with dilation 1.
Abstract: The recently introduced interconnection network, the crossed cube, has attracted much attention in the parallel processing area due to its many attractive features. Like the ordinary hypercube, the n-dimensional crossed cube is a regular graph with 2^n vertices and n·2^(n-1) edges. The diameter of the crossed cube is approximately half that of the ordinary hypercube. These advantages of the crossed cube motivated the study of how well it can simulate other networks such as the complete binary tree. We show that the (2^n - 1)-node complete binary tree can be embedded into the n-dimensional crossed cube with dilation 1.

Journal ArticleDOI
TL;DR: It is shown that simulation for verifying the correctness of a circuit with given bounds on the branch delays cannot be relied upon to expose all timing problems, and an example refutes a plausible conjecture that replacing pure delays with inertial delays can never introduce, but only eliminate, glitches.
Abstract: The various modes of failure of asynchronous sequential logic circuits due to timing problems are considered. These are hazards, critical races, and metastable states. It is shown that there is a mechanism common to all forms of hazards and to metastable states. A similar mechanism, with added complications, is shown to characterize critical races. Means for defeating various types of hazards and critical races through the use of one-sided delay constraints are introduced. A method is described for determining from a flow table situations in which metastable states may be entered. A circuit technique is presented for extending a previously known technique for defeating metastability problems in self-timed systems. It is shown that the use of simulation for verifying the correctness of a circuit with given bounds on the branch delays cannot be relied upon to expose all timing problems. An example is presented that refutes a plausible conjecture that replacing pure delays with inertial delays can never introduce, but only eliminate, glitches.

Journal ArticleDOI
TL;DR: A runtime support mechanism is proposed that is used effectively by a compiler to generate efficient code for concurrent loop nests with complicated array references and irregularly distributed arrays.
Abstract: This paper addresses the issue of compiling concurrent loop nests in the presence of complicated array references and irregularly distributed arrays. Loops may contain array accesses whose reference patterns cannot be precisely determined at compile time. This paper proposes a runtime support mechanism that is used effectively by a compiler to generate efficient code in these situations. The compiler accepts as input a Fortran 77 program enhanced with specifications for distributing data, and outputs a message passing program that runs on the nodes of a distributed memory machine. The runtime support for the compiler consists of a library of primitives designed to support irregular patterns of distributed array accesses and irregularly distributed array partitions. A variety of performance results on the Intel iPSC/860 are presented.

Journal ArticleDOI
TL;DR: It is proven that existing combinatorial or look-up table approaches for RNS are tailored to small designs or special applications, while the pseudo-RNS approach remains competitive for complex systems as well.
Abstract: It is known that RNS VLSI processors can parallelize fixed-point addition and multiplication operations by the use of the Chinese remainder theorem (CRT). The required modular operations, however, must use specialized hardware whose design and implementation can create several problems. In this paper a modified residue arithmetic, called pseudo-RNS, is introduced in order to alleviate some of the RNS problems when digital signal processing (DSP) structures are implemented. Pseudo-RNS requires only the use of modified binary processors and exhibits speed comparable to other traditional RNS approaches. Some applications of the pseudo-RNS to common DSP architectures, such as multipliers and filters, are also presented in this paper. They are compared, in terms of the area-time-squared product, with other RNS and weighted binary structures. It is proven that existing combinatorial or look-up table approaches for RNS are tailored to small designs or special applications, while the pseudo-RNS approach remains competitive for complex systems as well.
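
For background, the hedged sketch below shows the ordinary RNS arithmetic that pseudo-RNS modifies: channelwise addition and multiplication over pairwise coprime moduli, with CRT reconstruction. The moduli and operands are invented, and the pseudo-RNS modification itself is not reproduced.

```python
# Hedged sketch of ordinary RNS arithmetic: add/multiply independently per
# modulus (hence parallel channels), and reconstruct with the Chinese
# remainder theorem. Moduli are invented but pairwise coprime.
from math import prod

MODULI = (13, 15, 16, 17)

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_op(a, b, op):
    return tuple(op(x, y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(residues):
    M = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)         # CRT basis element for channel m
    return x % M

a, b = 1234, 567
total = rns_op(to_rns(a), to_rns(b), lambda x, y: x + y)
product = rns_op(to_rns(a), to_rns(b), lambda x, y: x * y)
assert from_rns(total) == a + b
assert from_rns(product) == (a * b) % prod(MODULI)
print(from_rns(total), from_rns(product))
```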

Journal ArticleDOI
TL;DR: The technique of parametric dispatching is presented to enforce relative timing constraints on a set of tasks; an off-line component produces a calendar that allows the on-line algorithm to generate upper and lower bounds on the start time of each task, based on the start times and execution times of previous tasks.
Abstract: In many real-time systems relative timing constraints are imposed on a set of tasks. Generating a correct ordering for the tasks and deriving their proper start-time assignments is an NP-hard problem; it subsumes the non-preemptive scheduling problem. Even when the application imposes a total order on the tasks, generating proper start-times is still nontrivial if execution times may range between upper and lower bounds. We present the technique of parametric dispatching to enforce such timing constraints. During an off-line component, we check if the constraints can be guaranteed. If so, a calendar is produced that allows our on-line algorithm to generate upper and lower bounds on the start time of each task, based on the start times and execution times of previous tasks. A suitable start time for the task may then be selected taking into account the presence of other non-critical tasks in the system.

Journal ArticleDOI
TL;DR: A triangular basis for representing the field elements is considered, and an architecture for a rate-adaptive RS encoder using a triangular basis multiplication algorithm is presented; it supports pipelined, bit-serial operation and has low circuit complexity.
Abstract: Multiple error-correcting Reed-Solomon (RS) codes have many practical applications. The complexity of an RS encoder depends on multiplications in the finite field over which the code is defined. We consider a triangular basis for representing the field elements, and present an architecture for a rate-adaptive RS encoder using a triangular basis multiplication algorithm. The architecture supports pipelined and bit-serial operation, and has low circuit complexity.

Journal ArticleDOI
TL;DR: It is shown that verifying the timing of a circuit may require tests which can detect the simultaneous presence of more than one path delay fault, and a general framework for examining delay-verifiability is provided.
Abstract: We address the problem of testing circuits for temporal correctness. A circuit is considered delay-verifiable if its timing correctness can be established by applying delay tests. It is shown that verifying the timing of a circuit may require tests which can detect the simultaneous presence of more than one path delay fault. We provide a general framework for examining delay-verifiability by introducing a special class of faults called primitive path delay faults. It is necessary and sufficient to test every fault in this class to ensure the temporal correctness of combinational circuits. Based on this result, we develop a synthesis procedure for combinational circuits that can be tested for correct timing. Experimental data show that such implementations usually require less area than completely delay testable implementations.

Journal ArticleDOI
TL;DR: A distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network, and it is formally proven that the algorithm is correct and that it is optimal in terms of the time required for all of the fault-free processors in the network to learn of a new event.
Abstract: A distributed algorithm is described for detecting and diagnosing faulty processors in an arbitrary network. Fault-free processors perform simple periodic tests on one another; when a fault is detected or a newly repaired processor joins the network, this new information is disseminated in parallel throughout the network. It is formally proven that the algorithm is correct, and it is also shown that the algorithm is optimal in terms of the time required for all of the fault-free processors in the network to learn of a new event. Simulation results are given for arbitrary network topologies.

Journal ArticleDOI
TL;DR: The "no-reply" problem that hampers most practical fault-injection experiments is discussed and an a posteriori stratification technique is proposed that allows the scope of incomplete tests to be widened by accounting for available structural information about the target system.
Abstract: This paper addresses the problem of estimating the coverage of a fault tolerance mechanism through statistical processing of observations collected in fault injection experiments. A formal definition of coverage is given in terms of the fault and system activity sets that characterize the input space. Two categories of sampling techniques are considered for coverage estimation: sampling in the whole space and sampling in a space partitioned into classes. The estimators for each technique are compared by means of hypothetical examples. Techniques for early estimation of coverage are then studied. These techniques allow unbiased estimates of coverage to be made before all classes of the sampling space have been tested. Then, the "no-reply" problem that hampers most practical fault-injection experiments is discussed, and an a posteriori stratification technique is proposed that allows the scope of incomplete tests to be widened by accounting for available structural information about the target system.
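
The partitioned-space estimator has a simple form: estimate coverage within each class, then weight by the classes' occurrence probabilities. A hedged Python sketch with invented numbers:

```python
# Hedged sketch of coverage estimation with a partitioned input space:
# per-class detection rates are combined using the classes' known
# occurrence probabilities. All numbers are invented for illustration.
def stratified_coverage(classes):
    """classes: iterable of (class_probability, detected, injected)."""
    assert abs(sum(p for p, _, _ in classes) - 1.0) < 1e-9
    return sum(p * detected / injected for p, detected, injected in classes)

experiments = [
    (0.70, 196, 200),    # frequent fault class, well covered
    (0.25,  88, 100),
    (0.05,  30,  50),    # rare fault class, poorly covered
]
print(f"coverage estimate: {stratified_coverage(experiments):.4f}")  # 0.9360
```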

Journal ArticleDOI
TL;DR: VLSI implementations of Householder CORDIC processors are presented and their speed and area estimated; the method employed to prove the convergence of these multidimensional algorithms differs from the one used in the 2D case.
Abstract: Matrix computations are often expressed in terms of plane rotations, which may be implemented using COordinate Rotation DIgital Computer (CORDIC) arithmetic. As matrix sizes increase, multiprocessor systems employing traditional CORDIC arithmetic, which operates on two-dimensional (2D) vectors, become unable to achieve sufficient speed. Speed may be increased by expressing the matrix computations in terms of higher-dimensional rotations and implementing these rotations using novel CORDIC algorithms, called Householder CORDIC, that extend CORDIC arithmetic to arbitrary dimensions. The method employed to prove the convergence of these multidimensional algorithms differs from the one used in the 2D case. After a discussion of scaling factor decomposition, range extension, and numerical errors, VLSI implementations of Householder CORDIC processors are presented and their speed and area are estimated. Finally, some applications of the Householder CORDIC algorithms are listed.
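
For reference, the classical 2D CORDIC rotation that Householder CORDIC generalizes can be sketched as below (hedged): each iteration rotates by plus or minus atan(2^-i), which hardware realizes with shifts and adds, and the accumulated scale factor is compensated at the end. Floating point stands in for fixed-point hardware.

```python
# Hedged sketch of the classical 2D CORDIC rotation that Householder CORDIC
# generalizes: each iteration rotates by +/- atan(2^-i), which hardware does
# with shifts and adds; the known scale factor is compensated at the end.
import math

def cordic_rotate(x, y, angle, iters=32):
    scale = 1.0
    for i in range(iters):
        d = 1.0 if angle >= 0 else -1.0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        angle -= d * math.atan(2.0**-i)
        scale *= math.sqrt(1 + 2.0**(-2 * i))    # per-step length growth
    return x / scale, y / scale

x, y = cordic_rotate(1.0, 0.0, math.pi / 3)
print(x, y)                                      # ~ (0.5, 0.866025...)
```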

Journal ArticleDOI
TL;DR: Two preemptive algorithms are described for scheduling n dependent tasks with rational ready times, deadlines, and processing times on a single processor; each schedule found has the minimum total error and minimizes the maximum error.
Abstract: We consider the problem of scheduling tasks in the imprecise computation model to minimize the maximum error. Given a task system and a schedule of it, the maximum error of the task system is equal to the error of the task that has the largest error when the task system is executed according to the schedule. We describe two preemptive algorithms for scheduling n dependent tasks with rational ready times, deadlines, and processing times on a single processor. Each schedule found by our algorithms is an optimal schedule with the minimum total error, and according to this schedule the maximum error is minimized. The run times of our algorithms are O(n^3) and O(n^2).

Journal ArticleDOI
TL;DR: The results demonstrate that the avalanche behavior of encryption networks can be improved by using larger S-boxes; increasing the diffusion properties of the S-boxes or replacing the permutations by diffusive linear transformations is also effective in improving the network avalanche characteristics.
Abstract: This paper develops analytical models for the avalanche characteristics of a class of block ciphers usually referred to as substitution-permutation encryption networks, or SPNs. An SPN is considered to display good avalanche characteristics if a one-bit change in the plaintext input is expected to result in close to half the ciphertext output bits changing. Good avalanche characteristics are important to ensure that a cipher is not susceptible to statistical attacks, and the strength of an SPN's avalanche characteristics may be considered a measure of the randomness of the ciphertext. The results presented in this paper demonstrate that the avalanche behavior of encryption networks can be improved by using larger S-boxes. It is also shown that increasing the diffusion properties of the S-boxes or replacing the permutations by diffusive linear transformations is effective in improving the network avalanche characteristics.
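
Avalanche is straightforward to measure empirically: flip one plaintext bit and count the ciphertext bits that change, averaging over many trials. The hedged Python sketch below does this for an invented, unkeyed 16-bit toy SPN (using the PRESENT cipher's 4-bit S-box as a convenient nonlinear layer and a rotation as the permutation layer); it is not the paper's network.

```python
# Hedged sketch of an empirical avalanche test on an invented, unkeyed 16-bit
# toy SPN (PRESENT's 4-bit S-box as a convenient nonlinear layer, a rotation
# as the permutation layer). Good avalanche: ~half the output bits flip.
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def toy_spn(block, rounds=4):
    for _ in range(rounds):
        block = sum(SBOX[(block >> (4 * i)) & 0xF] << (4 * i) for i in range(4))
        block = ((block << 5) | (block >> 11)) & 0xFFFF   # rotate left by 5
    return block

random.seed(0)
flips = []
for _ in range(2000):
    p = random.getrandbits(16)
    delta = toy_spn(p) ^ toy_spn(p ^ (1 << random.randrange(16)))
    flips.append(bin(delta).count("1"))
print(f"avg bits changed: {sum(flips) / len(flips):.2f} of 16 (ideal 8)")
```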

Journal ArticleDOI
TL;DR: A new method for the fast evaluation of the elementary functions in single precision based on the evaluation of truncated Taylor series using a difference method, which can calculate the basic elementary functions, namely reciprocal, square root, logarithm, exponential, trigonometric and inverse trigonometric functions, within the latency of two to four floating point multiplies.
Abstract: In this paper we introduce a new method for the fast evaluation of the elementary functions in single precision, based on the evaluation of truncated Taylor series using a difference method. We assume the availability of large and fast (at least for read purposes) memory. We call this method the ATA (Add-Table lookup-Add) method. As the name implies, the hardware required for the method consists of adders (both two- and multi-operand adders) and fast tables. For IEEE single precision numbers, our initial estimates indicate that we can calculate the basic elementary functions, namely reciprocal, square root, logarithm, exponential, trigonometric and inverse trigonometric functions, within the latency of two to four floating point multiplies.
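
The flavor of table-plus-Taylor evaluation can be sketched as below (hedged): split the argument into a high part that indexes precomputed tables of the function and its derivatives, and a small low part handled by a truncated Taylor series. Plain Python arithmetic stands in for the paper's difference method and multi-operand adders; the table size and the choice of log are ours.

```python
# Hedged sketch of table-plus-Taylor evaluation for log(x), x in [1, 2):
# the top T fraction bits index tables of f, f', f''/2 at a grid point, and
# a degree-2 Taylor series handles the remaining offset.
import math

T = 10                                       # table index width in bits
GRID = [1.0 + i / 2**T for i in range(2**T)]
TABLE = [(math.log(g), 1.0 / g, -0.5 / (g * g)) for g in GRID]

def log_ata(x):
    i = int((x - 1.0) * 2**T)                # high bits pick the table entry
    dx = x - GRID[i]                         # low part, 0 <= dx < 2^-T
    f, d1, d2 = TABLE[i]
    return f + d1 * dx + d2 * dx * dx        # truncated Taylor series

x = 1.2345678
print(log_ata(x), math.log(x))               # cubic-term error ~ 2^-30 scale
```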

Journal ArticleDOI
TL;DR: A new test generation technique for path delay faults in circuits employing scan/hold type flip-flops is presented, and results show that the algebraic technique is one to two orders of magnitude faster than previously reported methods based on branch-and-bound algorithms.
Abstract: A new test generation technique for path delay faults in circuits employing scan/hold type flip-flops is presented. Reduced ordered binary decision diagrams (ROBDDs) are used to represent Boolean functions realized by all signals in the circuit, as well as to represent the constraints to be satisfied by the delay fault test. Two faults are considered for each path in the circuit. For each fault, a pair of constraint functions, corresponding to the two time frames that constitute a transition, is evaluated. If the constraint function in the second time frame is non-null, robust, hazard-free test generation for the delay fault is attempted. A robust test thus generated belongs either to the class of fully transitional path (FTP) tests or to the class of single input transition (SIT) tests. If a robust test cannot be found, the existence of a non-robust test is checked. Boolean algebraic manipulation of the constraint functions guarantees that if neither robust nor non-robust tests exist, the fault is undetectable. In its present form the method is applicable to all circuits that are amenable to analysis using ROBDDs. An implementation of this technique is used to analyze delay fault testability of ISCAS '89 benchmark circuits. These results show that the algebraic technique is one to two orders of magnitude faster than previously reported methods based on branch-and-bound algorithms.

Journal ArticleDOI
TL;DR: A technique based on a continuous task model is presented that very closely approximates discrete models and tasks with varying characteristics; the goal is to maximize the total performance index, a performance-related reliability measurement.
Abstract: Many real-time systems have both performance requirements and reliability requirements. Performance is usually measured in terms of the value in completing tasks on time. Reliability is evaluated by hardware and software failure models. In many situations, there are trade-offs between task performance and task reliability. Thus, a mathematical assessment of performance-reliability trade-offs is necessary to evaluate the performance of real-time fault-tolerant systems. Assuming that the reliability of task execution is achieved through task replication, we present an approach that mathematically determines the replication factor for tasks. Our approach is novel in that it is a task-schedule-based analysis rather than a state-based analysis as found in other models. Because we use a task-schedule-based analysis, we can provide a fast method to determine optimal redundancy levels, we are not limited to hardware reliability given by constant failure rate functions as in most other models, and we hypothesize that we can more naturally integrate with online real-time scheduling than when state-based techniques are used. In this work, the goal is to maximize the total performance index, which is a performance-related reliability measurement. We present a technique based on a continuous task model and show how it very closely approximates discrete models and tasks with varying characteristics.