
Showing papers in "IEEE Transactions on Computers in 1998"


Journal ArticleDOI
TL;DR: This study compares three consistency approaches (adaptive TTL, polling-every-time, and invalidation) through analysis, implementation, and trace replay in a simulated environment, and shows that strong cache consistency can be maintained for the Web at little or no extra cost compared with current weak consistency approaches.
Abstract: As the Web continues to explode in size, caching becomes increasingly important. With caching comes the problem of cache consistency. Conventional wisdom holds that strong cache consistency is too expensive for the Web and that weak consistency methods, such as Time-To-Live (TTL), are most appropriate. This study compares three consistency approaches: adaptive TTL, polling-every-time, and invalidation, through analysis, implementation, and trace replay in a simulated environment. Our analysis shows that weak consistency methods save network bandwidth mostly at the expense of returning stale documents to users. Our experiments show that invalidation generates an amount of network traffic and server workload comparable to adaptive TTL and has similar average client response times, while polling-every-time results in more control messages, higher server workload, and longer client response times. We show that, contrary to popular belief, strong cache consistency can be maintained for the Web at little or no extra cost compared with the current weak consistency approaches, and that it should be maintained using an invalidation-based protocol.
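The trade-off the abstract describes can be illustrated with a toy single-cache model (our construction, not the paper's trace-driven simulator): under a fixed TTL, updates that land inside the TTL window are served stale, while server-driven invalidation never serves a stale copy.

```python
import random

def simulate(policy, n_requests=10000, update_p=0.01, ttl=50, seed=1):
    """Count stale responses served by a single cache under a given policy.
    Toy model (our assumption, not the paper's trace replay): time advances
    one tick per request; the origin document changes with prob. update_p.
    """
    rng = random.Random(seed)
    version = 0            # current version at the origin server
    cached = None          # (version, fetch_time) held by the cache
    stale = fetches = 0
    for t in range(n_requests):
        if rng.random() < update_p:
            version += 1
            if policy == "invalidation" and cached is not None:
                cached = None          # server invalidates the cached copy
        if policy == "ttl" and cached is not None and t - cached[1] >= ttl:
            cached = None              # TTL expired: revalidate with server
        if cached is None:
            cached = (version, t)      # fetch a fresh copy
            fetches += 1
        if cached[0] != version:
            stale += 1                 # served an out-of-date document
    return stale, fetches

for policy in ("ttl", "invalidation"):
    print(policy, simulate(policy))
```

Under these toy parameters, invalidation serves zero stale documents while TTL trades staleness against fetch traffic, mirroring the paper's qualitative conclusion.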

369 citations


Journal ArticleDOI
TL;DR: The results show that the proposed approach gives predictions of the worst case cache-related preemption delay that are up to 60 percent tighter than those obtained with previous approaches.
Abstract: We propose a technique for analyzing cache-related preemption delays of tasks that cause unpredictable variation in task execution time in the context of fixed-priority preemptive scheduling. The proposed technique consists of two steps. The first step performs a per-task analysis to estimate the cache-related preemption cost for each execution point in a given task. The second step computes the worst case response time of each task, including the cache-related preemption delay, using a response time equation and a linear programming technique. This step takes as its input the preemption cost information of tasks obtained in the first step. This paper also compares the proposed approach with previous approaches. The results show that the proposed approach gives predictions of the worst case cache-related preemption delay that are up to 60 percent tighter than those obtained with previous approaches.
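The second step's response-time computation can be sketched as a standard fixed-point iteration; here a single per-preemption cost `gamma` stands in for the paper's per-execution-point analysis and linear program (a deliberate simplification):

```python
from math import ceil

def response_time(tasks, i, gamma):
    """Fixed-point iteration for the worst-case response time of task i
    under fixed-priority preemptive scheduling.  `tasks` is sorted highest
    priority first, each entry (C, T).  `gamma` is a single cache-related
    preemption cost charged per preemption, a simplification of the
    paper's per-execution-point analysis.
    """
    C_i, T_i = tasks[i]
    R = C_i
    while True:
        R_next = C_i + sum(ceil(R / T_j) * (C_j + gamma)
                           for C_j, T_j in tasks[:i])
        if R_next == R:
            return R
        if R_next > T_i:
            return None   # deadline (= period) miss
        R = R_next

tasks = [(1, 4), (2, 8), (3, 20)]   # hypothetical (C, T) pairs
print([response_time(tasks, i, gamma=0) for i in range(3)])
```

With `gamma=0` this reduces to the classical response-time analysis; increasing `gamma` shows how cache-related preemption costs inflate response times.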

264 citations


Journal ArticleDOI
M. Barbehenn
TL;DR: For directed graphs in which each vertex has a nonnegative weight, it is shown that the time complexity of Dijkstra's algorithm, implemented with a binary heap, is O(|E|+ |V|log|V|).
Abstract: Let G(V, E) be a directed graph in which each vertex has a nonnegative weight. The cost of a path between two vertices in G is the sum of the weights of the vertices on that path. We show that, for such graphs, the time complexity of Dijkstra's algorithm (E.W. Dijkstra, 1959), implemented with a binary heap, is O(|E|+|V|log|V|).
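A vertex-weighted variant of Dijkstra's algorithm with a binary heap is a few lines in Python; entering a vertex adds its weight to the path cost (an illustrative sketch, not the paper's analysis):

```python
import heapq

def vertex_dijkstra(adj, w, s):
    """Shortest paths in a directed graph with nonnegative *vertex* weights:
    the cost of a path is the sum of the weights of its vertices.
    Standard binary-heap Dijkstra with lazy deletion.
    """
    dist = {s: w[s]}
    pq = [(w[s], s)]
    done = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:
            continue
        done.add(u)
        for v in adj.get(u, []):
            nd = d + w[v]                 # entering v costs w[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# hypothetical example graph
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
w = {"a": 1, "b": 5, "c": 2, "d": 1}
print(vertex_dijkstra(adj, w, "a"))
```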

254 citations


Journal ArticleDOI
TL;DR: A novel method is presented to derive a Petri net from any specification model that can be mapped into a state-based representation with arcs labeled with symbols from an alphabet of events (a Transition System, TS), based on the theory of regions and three synthesis mechanisms.
Abstract: This paper presents a novel method to derive a Petri net from any specification model that can be mapped into a state-based representation with arcs labeled with symbols from an alphabet of events (a Transition System, TS). The method is based on the theory of regions for Elementary Transition Systems (ETS). Previous work has shown that, for any ETS, there exists a Petri Net with minimum transition count (one transition for each label) with a reachability graph isomorphic to the original Transition System. Our method extends and implements that theory by using the following three mechanisms that provide a framework for synthesis of safe Petri nets from arbitrary TSs. First, the requirement of isomorphism is relaxed to bisimulation of TSs, thus extending the class of synthesizable TSs to a new class called Excitation-Closed Transition Systems (ECTS). Second, for the first time, we propose a method of PN synthesis for an arbitrary TS based on mapping a TS event into a set of transition labels in a PN. Third, the notion of irredundant region set is exploited, to minimize the number of places in the net without affecting its behavior. The synthesis method can derive different classes of place-irredundant Petri Nets (e.g., pure, free choice, unique choice) from the same TS, depending on the constraints imposed on the synthesis algorithm. This method has been implemented and applied in different frameworks. The results obtained from the experiments have demonstrated the wide applicability of the method.

252 citations


Journal ArticleDOI
TL;DR: The analysis illustrates why the exponential assumption for call hold time results in the underestimation of handoff rate, which then leads to the actual blocking probabilities being higher than the blocking probabilities for MC/PCS networks designed using the exponential distribution approximation for call holding time.
Abstract: This paper presents a study of channel occupancy times and handoff rate for mobile computing in MC (Mobile Computing) and PCS (Personal Communications Services) networks, using general operational assumptions. It is shown that, for exponentially distributed call holding times, a distribution more appropriate for conventional voice telephony, the channel occupancy times are exponentially distributed if and only if the cell residence times are exponentially distributed. It is further shown that the merged traffic from new calls and handoff calls is Poisson if and only if the cell residence times are exponentially distributed, too. When cell residence times follow a general distribution, a more appropriate way to model mobile computing sessions, new formulae for channel occupancy time distributions are obtained. Moreover, when the call holding times and the cell residence times have general (nonlattice) distributions, general formulae for computing the handoff rate during a call connection and handoff call arrival rate to a cell are given. Our analysis illustrates why the exponential assumption for call holding time results in the underestimation of handoff rate, which then leads to the actual blocking probabilities being higher than the blocking probabilities for MC/PCS networks designed using the exponential distribution approximation for call holding time. The analytical results presented in this paper can be expected to play a significant role in teletraffic analysis and system design for MC/PCS networks.

244 citations


Journal ArticleDOI
TL;DR: An analytical model of a software system which serves transactions is presented, and expressions for the resulting steady state availability, the probability that an arriving transaction is lost, and an upper bound on the expected response time of a transaction are derived.
Abstract: Preventive maintenance of operational software systems, a novel technique for software fault tolerance, is used specifically to counteract the phenomenon of software "aging". However, it incurs some overhead. The necessity to do preventive maintenance, not only in general purpose software systems of mass use, but also in safety-critical and highly available systems, clearly indicates the need to follow an analysis-based approach to determine the optimal times to perform preventive maintenance. In this paper, we present an analytical model of a software system which serves transactions. Due to aging, not only does the service rate of the software decrease with time, but the software itself also experiences crash/hang failures which result in its unavailability. Two policies for preventive maintenance are modeled, and expressions for the resulting steady state availability, the probability that an arriving transaction is lost, and an upper bound on the expected response time of a transaction are derived. Numerical examples are presented to illustrate the applicability of the models.

214 citations


Journal ArticleDOI
TL;DR: A new low-complexity bit-parallel canonical basis multiplier for the field GF(2^m) generated by an all-one polynomial is presented and extended to obtain a new bit-parallel normal basis multiplier.
Abstract: We present a new low-complexity bit-parallel canonical basis multiplier for the field GF(2^m) generated by an all-one polynomial. The proposed canonical basis multiplier requires m^2 - 1 XOR gates and m^2 AND gates. We also extend this canonical basis multiplier to obtain a new bit-parallel normal basis multiplier.
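The arithmetic behind the multiplier can be sketched in software: multiply two GF(2^m) elements as polynomials over GF(2), then reduce modulo the all-one polynomial (a bit-serial sketch of what the paper realizes as a bit-parallel circuit):

```python
def gf_mul(a, b, m):
    """Multiply in GF(2^m) generated by the all-one polynomial
    p(x) = x^m + x^{m-1} + ... + x + 1 (irreducible only for certain m,
    e.g. m = 4).  Elements are bit masks (bit i = coefficient of x^i).
    """
    aop = (1 << (m + 1)) - 1          # bits of x^m + ... + x + 1
    # carry-free (XOR) polynomial multiplication
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    # reduce modulo the all-one polynomial: XOR in p(x) * x^{i-m}
    for i in range(prod.bit_length() - 1, m - 1, -1):
        if prod >> i & 1:
            prod ^= aop << (i - m)
    return prod

# in GF(2^4): x * x^3 = x^4 = x^3 + x^2 + x + 1 mod p(x)
print(bin(gf_mul(0b10, 0b1000, 4)))
```

As a sanity check, x^5 = 1 in this field, since x^5 + 1 = (x + 1)p(x) over GF(2).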

205 citations


Journal ArticleDOI
TL;DR: A new RNS modular multiplication algorithm for very large operands is presented; it is based on Montgomery's method adapted to mixed radix and is performed using a residue number system.
Abstract: We present a new RNS modular multiplication for very large operands. The algorithm is based on Montgomery's method adapted to mixed radix, and is performed using a residue number system. By choosing the moduli of the RNS system reasonably large and implementing the system on a ring of fairly simple processors, an effect corresponding to a redundant high-radix implementation is achieved. The algorithm can be implemented to run in O(n) time on O(n) processors, where n is the number of moduli in the RNS system, and the unit of time is a simple residue operation, possibly by table look-up. Two different implementations are proposed, one based on processors attached to a broadcast bus, another on an oriented ring structure.
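For background, classic binary-radix Montgomery multiplication looks as follows; the paper's contribution is recasting this computation in a residue number system, which this sketch does not attempt:

```python
def montgomery_mul(a, b, n, r_bits):
    """Classic (binary-radix) Montgomery multiplication: returns
    a*b*R^{-1} mod n for R = 2^r_bits, with n odd and n < R.
    Background for the paper, which performs the same reduction in RNS.
    """
    R = 1 << r_bits
    # n' with n * n' == -1 (mod R), via Python's modular inverse
    n_prime = (-pow(n, -1, R)) % R
    t = a * b
    m = (t * n_prime) % R
    u = (t + m * n) >> r_bits      # exact division by R
    return u - n if u >= n else u

# check: multiplying Montgomery forms aR, bR yields (a*b)R  (all mod n)
n, r_bits = 97, 8
R = 1 << r_bits
a, b = 29, 53
aR, bR = (a * R) % n, (b * R) % n
assert montgomery_mul(aR, bR, n, r_bits) == (a * b * R) % n
print("ok")
```

The appeal for hardware is that the reduction uses only multiplications, additions, and a shift, with no trial division by n.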

185 citations


Journal ArticleDOI
TL;DR: It is shown, for a fanout-free circuit under test, that the transition test generation cost for a fault is the minimum number of transitions required to test a given stuck-at fault.
Abstract: An automatic test pattern generator (ATPG) algorithm is proposed that reduces switching activity (between successive test vectors) during test application. The main objective is to permit safe and inexpensive testing of low power circuits and bare die that might otherwise require expensive heat removal equipment for testing at high speeds. Three new cost functions, namely transition controllability, observability, and test generation costs, have been defined. It has been shown, for a fanout-free circuit under test, that the transition test generation cost for a fault is the minimum number of transitions required to test a given stuck-at fault. The proposed algorithm has been implemented, and the generated tests are compared with those generated by a standard PODEM implementation for the larger ISCAS85 benchmark circuits. The results clearly demonstrate that the tests generated using the proposed ATPG can decrease the average number of (weighted) transitions between successive test vectors by a factor of 2 to 23.

166 citations


Journal ArticleDOI
TL;DR: This work provides a formal characterization of optimal PPRT circuits and proves a number of properties about them and presents an algorithm that produces a minimum delay circuit in time linear in the size of the inputs.
Abstract: We present new design and analysis techniques for the synthesis of parallel multiplier circuits that have smaller predicted delay than the best current multipliers. V.G. Oklobdzija et al. (1996) suggested a new approach, the Three-Dimensional Method (TDM), for Partial Product Reduction Tree (PPRT) design that produces multipliers that outperform the current best designs. The goal of TDM is to produce a minimum delay PPRT using full adders. This is done by carefully modeling the relationship of the output delays to the input delays in an adder and, then, interconnecting the adders in a globally optimal way. Oklobdzija et al. suggested a good heuristic for finding the optimal PPRT, but no proofs about the performance of this heuristic were given. We provide a formal characterization of optimal PPRT circuits and prove a number of properties about them. For the problem of summing a set of input bits within the minimum delay, we present an algorithm that produces a minimum delay circuit in time linear in the size of the inputs. Our techniques allow us to prove tight lower bounds on multiplier circuit delays. These results are combined to create a program that finds optimal TDM multiplier designs. Using this program, we can show that, while the heuristic used by Oklobdzija et al. does not always find the optimal TDM circuit, it performs very well in terms of overall PPRT circuit delay. However, our search algorithms find better PPRT circuits for reducing the delay of the entire multiplier.

158 citations


Journal ArticleDOI
TL;DR: The FRIENDS software-based architecture, the object-oriented development of metaobjects, the experiments performed, and the advantages and drawbacks of a metaobject approach for building fault-tolerant systems are described.
Abstract: The FRIENDS system developed at LAAS-CNRS is a metalevel architecture providing libraries of metaobjects for fault tolerance, secure communication, and group-based distributed applications. The use of metaobjects provides a nice separation of concerns between mechanisms and applications. Metaobjects can be used transparently by applications and can be composed according to the needs of a given application, a given architecture, and its underlying properties. In FRIENDS, metaobjects are used recursively to add new properties to applications. They are designed using an object oriented design method and implemented on top of basic system services. This paper describes the FRIENDS software-based architecture, the object-oriented development of metaobjects, the experiments that we have done, and summarizes the advantages and drawbacks of a metaobject approach for building fault-tolerant systems.

Journal ArticleDOI
TL;DR: An efficient scheme to compress and decompress in parallel deterministic test patterns for circuits with multiple scan chains, while achieving complete fault coverage for any fault model for which test cubes are obtainable, is presented.
Abstract: The paper presents an efficient scheme to compress and decompress in parallel deterministic test patterns for circuits with multiple scan chains. It employs a boundary-scan-based environment for high quality testing with flexible trade-offs between test data volume and test application time, while achieving complete fault coverage for any fault model for which test cubes are obtainable. It also reduces bandwidth requirements, as all test cube transfers involve compressed data. The test patterns are generated by the reseeding of a two-dimensional hardware structure which comprises a linear feedback shift register (LFSR), a network of exclusive-or (XOR) gates used to scramble the bits of test vectors, and extra feedbacks which allow including internal scan flip-flops into the decompressor structure to minimize the area overhead. The test data decompressor operates in two modes: pseudorandom and deterministic. In the first mode, the pseudorandom pattern generator (PRPG) is used purely as a generator of test vectors. In the latter case, variable-length seeds are serially scanned through the boundary-scan interface into the PRPG and parts of internal scan chains and, subsequently, a decompression is performed in parallel by means of the PRPG and selected scan flip-flops interconnected to form the decompression device. Extensive experiments with the largest ISCAS'89 benchmarks show that the proposed technique greatly reduces the amount of test data in a cost effective manner.
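The PRPG at the heart of the scheme is an LFSR; a minimal software model of reseeding (tap positions chosen for illustration, not taken from the paper) shows how different seeds select different deterministic pattern runs:

```python
def lfsr_stream(seed, taps, width, n):
    """Fibonacci LFSR over GF(2): emits n successive `width`-bit states.
    Illustrates the PRPG underlying the paper's decompressor; `taps`
    lists the feedback bit positions (our illustrative choice).
    """
    state = seed
    out = []
    for _ in range(n):
        out.append(state)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & ((1 << width) - 1)
    return out

# reseeding: each seed deterministically selects a different pattern run
print(lfsr_stream(0b0001, taps=[3, 2], width=4, n=5))
print(lfsr_stream(0b0111, taps=[3, 2], width=4, n=5))
```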

Journal ArticleDOI
TL;DR: Compared to other techniques for fault tolerance in FPGAs, these methods are shown to provide significantly greater yield improvement, and a 35 percent non-FT chip yield for a 16×16 FPGA is more than doubled.
Abstract: The very high levels of integration and submicron device sizes used in current and emerging VLSI technologies for FPGAs lead to higher occurrences of defects and operational faults. Thus, there is a critical need for fault tolerance and reconfiguration techniques for FPGAs to increase chip yields (with factory reconfiguration) and/or system reliability (with field reconfiguration). We first propose techniques utilizing the principle of node-covering to tolerate logic or cell faults in SRAM-based FPGAs. A routing discipline is developed that allows each cell to cover (that is, to be able to replace) its neighbor in a row. Techniques are also proposed for tolerating wiring faults by means of replacement with spare portions. The replaceable portions can be individual segments, or else sets of segments, called "grids". Fault detection in the FPGAs is accomplished by separate testing, either at the factory or by the user. If reconfiguration around faulty cells and wiring is performed at the factory (with laser-burned fuses, for example), it is completely transparent to the user. In other words, user configuration data loaded into the SRAM remains the same, independent of whether the chip is defect-free or whether it has been reconfigured around defective cells or wiring, a major advantage for hardware vendors who design and sell FPGA-based logic (e.g., glue logic in microcontrollers, video cards, DSP cards) in production-scale quantities. Compared to other techniques for fault tolerance in FPGAs, our methods are shown to provide significantly greater yield improvement, and a 35 percent non-FT chip yield for a 16×16 FPGA is more than doubled.

Journal ArticleDOI
TL;DR: This paper presents a simple and automatic method to extract the control flow of a circuit so that the resulting state space can be explored for validation coverage analysis and automatic test generation.
Abstract: The enormous state spaces which must be searched when verifying the correctness of, or generating tests for, complex circuits preclude the use of traditional approaches. Hard-to-find abstractions are often required to simplify the circuits and make the problems tractable. This paper presents a simple and automatic method to extract the control flow of a circuit so that the resulting state space can be explored for validation coverage analysis and automatic test generation. This control flow, capturing the essential "behavior" of the circuit, is represented as a finite state machine called the ECFM (Extracted Control Flow Machine). Simulation is currently the primary means of verifying large circuits, but the definition of a coverage measure for simulation vectors is an open problem. We define functional coverage as the amount of control behavior covered by the test suite. We then combine formal verification techniques, using BDDs as the underlying representation, with traditional ATPG techniques to automatically generate additional sequences which traverse uncovered parts of the control state graph. We also demonstrate how the same abstraction techniques can complement ATPG techniques when attacking hard-to-detect faults in the control part of the design for which conventional ATPG alone proves to be inadequate or inefficient at best. Results on large designs show significant improvement over conventional algorithms.

Journal ArticleDOI
TL;DR: A Hierarchical Adaptive Distributed System-level Diagnosis (Hi-ADSD) algorithm is presented, a fully distributed algorithm that allows every fault-free node to achieve diagnosis in, at most, (log_2 N)^2 testing rounds.
Abstract: Consider a system composed of N nodes that can be faulty or fault-free. The purpose of distributed system-level diagnosis is to have each fault-free node determine the state of all nodes of the system. This paper presents a Hierarchical Adaptive Distributed System-level Diagnosis (Hi-ADSD) algorithm, which is a fully distributed algorithm that allows every fault-free node to achieve diagnosis in, at most, (log_2 N)^2 testing rounds. Nodes are mapped into progressively larger logical clusters, so that tests are run in a hierarchical fashion. Each node executes its tests independently of the other nodes, i.e., tests are run asynchronously. All the information that nodes exchange is diagnostic information. The algorithm assumes no link faults and a fully connected network, and imposes no bounds on the number of faults. Both the worst-case diagnosis latency and the correctness of the algorithm are formally proved. As an example application, the algorithm was implemented on a 37-node Ethernet LAN, integrated with a network management system based on SNMP (Simple Network Management Protocol). Experimental results of fault and repair diagnosis are presented. This implementation by itself is also a significant contribution, for, although fault management is a key functional area of network management systems, currently deployed applications often implement only rudimentary diagnosis mechanisms. Furthermore, experimental results are given through simulation of the algorithm for large systems of 64 nodes and 512 nodes.

Journal ArticleDOI
TL;DR: Although conventional ORBs do not yet provide adequate QoS guarantees to applications, the research results indicate it is possible to implement ORBs that can support high-performance, real-time applications.
Abstract: There is increasing demand to extend object-oriented middleware, such as OMG CORBA, to support applications with stringent quality of service (QoS) requirements. However, conventional CORBA Object Request Broker (ORB) implementations incur high latency and low scalability when used for performance-sensitive applications. These inefficiencies discourage developers from using CORBA for mission/life-critical applications such as real-time avionics, telecom call processing, and medical imaging. This paper provides two contributions to the research on CORBA performance. First, we systematically analyze the latency and scalability of two widely used CORBA ORBs, VisiBroker and Orbix. These results reveal key sources of overhead in conventional ORBs. Second, we describe techniques used to improve latency and scalability in TAO, which is a high-performance, real-time implementation of CORBA. Although conventional ORBs do not yet provide adequate QoS guarantees to applications, our research results indicate it is possible to implement ORBs that can support high-performance, real-time applications.

Journal ArticleDOI
TL;DR: A mechanical theorem prover certified the assertion that the quotient computed by the division microcode program used on the AMD5K86 microprocessor is always correctly rounded to the target precision.
Abstract: We report on the successful application of a mechanical theorem prover to the problem of verifying the division microcode program used on the AMD5K86 microprocessor. The division algorithm is an iterative shift and subtract type. It was implemented using floating point microcode instructions. As a consequence, the floating quotient digits have data dependent precision. This breaks the constraints of conventional SRT division theory. Hence, an important question was whether the algorithm still provided perfectly rounded results at 24, 53, or 64 bits. The mechanically checked proof of this assertion is the central topic of the paper. The proof was constructed in three steps. First, the divide microcode was translated into a formal intermediate language. Then, a manually created proof was transliterated into a series of formal assertions in the ACL2 dialect. After many expansions and modifications to the original proof, the theorem prover certified the assertion that the quotient will always be correctly rounded to the target precision.
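The property being certified, round-to-nearest-even at a target precision, can be stated as an exact-rational reference model (our sketch of the specification, not the microcode algorithm or the ACL2 proof; carry-out when rounding up to a power of two is ignored here):

```python
from fractions import Fraction

def round_quotient(a, b, p):
    """Round a/b (a, b positive integers) to p significant bits,
    round-to-nearest-even: the property the mechanically checked proof
    establishes for the divide microcode.  Exact-rational reference model.
    """
    q = Fraction(a, b)
    e = 0
    while q >= 2:
        q /= 2
        e += 1
    while q < 1:
        q *= 2
        e -= 1                           # normalize to [1, 2)
    scaled = q * 2 ** (p - 1)            # integer part then has p bits
    lo = int(scaled)
    rem = scaled - lo
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and lo % 2 == 1):
        lo += 1                          # round up / break ties to even
    return Fraction(lo, 2 ** (p - 1)) * Fraction(2) ** e

print(round_quotient(1, 3, 24))   # the 24-bit rounding of 1/3
```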

Journal ArticleDOI
TL;DR: This work proposes an effective load balancing algorithm for balancing the load over an entire distributed/parallel system and shows that the proposed algorithm has much faster convergence in terms of computational time than the FD algorithm.
Abstract: Load balancing problems for multiclass jobs in distributed/parallel computer systems with general network configurations are considered. We construct a general model of such a distributed/parallel computer system. The system consists of heterogeneous host computers/processors (nodes) which are interconnected by a generally configured communication/interconnection network, wherein there are several classes of jobs, each of which has its distinct delay function at each host and each communication link. This model is used to formulate the multiclass job load balancing problem as a nonlinear optimization problem in which the goal is to minimize the mean response time of a job. A number of simple and intuitive theoretical results on the solution of the optimization problem are derived. On the basis of these results, we propose an effective load balancing algorithm for balancing the load over an entire distributed/parallel system. The proposed algorithm has two attractive features. One is that it can be implemented in a decentralized fashion. The other is its simple and straightforward structure. Models of nodes and communication networks are described, and a numerical example is given. The proposed algorithm is compared with a well-known standard steepest-descent algorithm, the FD algorithm. Using numerical experiments, we show that the proposed algorithm converges much faster in terms of computational time than the FD algorithm.

Journal ArticleDOI
TL;DR: An upper bound on the size complexity of a bit-parallel multiplier using an arbitrary generating polynomial is given, and new structures of bit-parallel weakly dual basis (WDB) multipliers over the binary ground field are proposed.
Abstract: New structures of bit-parallel weakly dual basis (WDB) multipliers over the binary ground field are proposed. An upper bound on the size complexity of a bit-parallel multiplier using an arbitrary generating polynomial is given. When the generating polynomial is an irreducible trinomial x^m + x^k + 1, 1 ≤ k ≤ ⌊m/2⌋, the structure of the proposed bit-parallel multiplier requires only m^2 two-input AND gates and at most m^2 - 1 XOR gates. The time delay is no greater than T_A + (⌈log_2 m⌉ + 2)T_X, where T_A and T_X are the time delays of an AND gate and an XOR gate, respectively.

Journal ArticleDOI
TL;DR: A widely used bus-encryption microprocessor is vulnerable to a new practical attack that allows easy, unauthorized access to the decrypted memory content.
Abstract: A widely used bus-encryption microprocessor is vulnerable to a new practical attack. This type of processor decrypts on-the-fly while fetching code and data, which are stored in RAM only in encrypted form. The attack allows easy, unauthorized access to the decrypted memory content.

Journal ArticleDOI
TL;DR: This paper presents two new systolic arrays to realize Euclid's algorithm for computing inverses and divisions in finite fields GF(2^m) with the standard basis representation, using parallel-in parallel-out and serial-in serial-out schemes.
Abstract: This paper presents two new systolic arrays to realize Euclid's algorithm for computing inverses and divisions in finite fields GF(2^m) with the standard basis representation. One of these two schemes is parallel-in parallel-out, and the other is serial-in serial-out. The former employs O(m^2) area complexity to provide the maximum throughput in the sense of producing one result every clock cycle, while the latter achieves a throughput of one result per m clock cycles using O(m log_2 m) area complexity. Both of the proposed architectures are highly regular and, thus, well suited to VLSI implementation. As compared to existing related systolic architectures with the same throughput performance, the proposed parallel-in parallel-out scheme reduces the hardware complexity (and, thus, the area-time product) by a factor of O(m), and the proposed serial-in serial-out scheme by a factor of O(m/log_2 m).
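Euclid's algorithm for inversion in GF(2^m) is compact in software; the sketch below (bit masks for polynomials, bit i = coefficient of x^i) computes the same inverses the systolic arrays produce in hardware:

```python
def gf2_mod(a, f):
    """Reduce polynomial a modulo f over GF(2)."""
    while a.bit_length() >= f.bit_length():
        a ^= f << (a.bit_length() - f.bit_length())
    return a

def gf2_inv(a, f):
    """Inverse of a(x) modulo the irreducible polynomial f(x) over GF(2)
    via the extended Euclidean algorithm (a != 0, deg a < deg f).
    Invariants: u == s*a (mod f) and v == t*a (mod f).
    """
    u, v, s, t = a, f, 1, 0
    while u != 1:
        if u.bit_length() < v.bit_length():
            u, v, s, t = v, u, t, s
        shift = u.bit_length() - v.bit_length()
        u ^= v << shift          # cancel the leading term of u
        s ^= t << shift
    return gf2_mod(s, f)

# GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1
f = 0b1_0001_1011
print(hex(gf2_inv(0x53, f)))   # prints 0xca
```

Division a/b then follows as a multiplication of a by the inverse of b.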

Journal ArticleDOI
TL;DR: The architecture of the InfoPad terminal can be viewed as essentially a switch which connects multimedia data sources in the supporting wired network to appropriate InfoPad output devices, and connects InfoPad input devices to remote processing in the backbone network.
Abstract: The architecture of a device that is optimized for wireless information access and display of multimedia data is substantially different than configurations designed for portable stand-alone operation. The requirements to reduce the weight and energy consumption are the same, but the availability of the wireless link, which is needed for the information access, allows utilization of remote resources. A limiting case is when the only computation that is provided in the portable terminal supports the wireless links or the I/O interfaces, and it is this extreme position that is explored in the InfoPad terminal design. The architecture of the InfoPad terminal, therefore, can be viewed as essentially a switch which connects multimedia data sources in the supporting wired network to appropriate InfoPad output devices (e.g., video display), and connects InfoPad input devices to remote processing (e.g., speech recognizer server) in the backbone network.

Journal ArticleDOI
TL;DR: A new class of multipliers for finite fields GF((2^n)^4) is introduced, based on a modified version of the Karatsuba-Ofman algorithm, which leads to architectures with considerably improved gate complexity compared to traditional approaches and reduced delay compared with KOA-based architectures with separate modular reduction.
Abstract: This contribution introduces a new class of multipliers for finite fields GF((2^n)^4). The architecture is based on a modified version of the Karatsuba-Ofman algorithm (KOA). By determining optimized field polynomials of degree four, the last stage of the KOA and the modular reduction can be combined. This saves computation and area in VLSI implementations. The new algorithm leads to architectures which show a considerably improved gate complexity compared to traditional approaches and reduced delay compared with KOA-based architectures with separate modular reduction. The new multipliers lead to highly modular architectures and are, thus, well suited for VLSI implementations. Three types of field polynomials are introduced and conditions for their existence are established. For the small fields, where n = 2, 3, ..., 8, which are of primary technical interest, optimized field polynomials were determined by an exhaustive search. For each field order, exact space and time complexities are provided.
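One Karatsuba-Ofman step over GF(2) polynomials replaces four half-size multiplications with three; the recursive sketch below (n a power of two, plain carry-free multiply as the base case, without the paper's combined modular reduction) shows the idea:

```python
def gf2_kmul(a, b, n):
    """Karatsuba-Ofman multiplication of GF(2) polynomials given as bit
    masks of width n (n a power of two for a clean recursion): three
    half-size multiplications instead of four.  Sketch of the recursion
    the paper combines with modular reduction for GF((2^n)^4).
    """
    if n <= 4:                       # small base case: carry-free multiply
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            b >>= 1
        return r
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    lo = gf2_kmul(a0, b0, h)
    hi = gf2_kmul(a1, b1, h)
    mid = gf2_kmul(a0 ^ a1, b0 ^ b1, h) ^ lo ^ hi    # cross terms
    return (hi << (2 * h)) ^ (mid << h) ^ lo

print(bin(gf2_kmul(0b1011, 0b0110, 4)))
```

Over GF(2), the subtractions of the integer Karatsuba identity become XORs, which is what makes the trick so cheap in hardware.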

Journal ArticleDOI
TL;DR: Evidence that there are no efficient algorithms for locating maximum sets of paths with path independence properties is given and several approximation algorithms for these problems are proposed.
Abstract: Authentication using a path of trusted intermediaries, each able to authenticate the next in the path, is a well-known technique for authenticating channels in a large distributed system. In this paper, we explore the use of multiple paths to redundantly authenticate a channel and focus on two notions of path independence, disjoint paths and connective paths, that seem to increase assurance in the authentication. We give evidence that there are no efficient algorithms for locating maximum sets of paths with these independence properties and propose several approximation algorithms for these problems. We also describe a service we have deployed, called PathServer, that makes use of our algorithms to find such sets of paths to support authentication in PGP applications.
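A simple greedy heuristic for vertex-disjoint paths, repeatedly taking a shortest path and deleting its interior vertices, illustrates the kind of approximation the hardness results motivate (our illustrative sketch, not necessarily one of the paper's algorithms):

```python
from collections import deque

def disjoint_paths(adj, src, dst, k):
    """Greedy approximation: repeatedly take a shortest src-dst path by
    BFS, then delete its interior vertices, until k vertex-disjoint
    paths are found or none remains.
    """
    removed = set()
    paths = []
    while len(paths) < k:
        prev = {src: None}
        q = deque([src])
        while q and dst not in prev:
            u = q.popleft()
            for v in adj.get(u, ()):
                if v not in prev and v not in removed:
                    prev[v] = u
                    q.append(v)
        if dst not in prev:
            break
        path = []
        v = dst
        while v is not None:
            path.append(v)
            v = prev[v]
        path.reverse()
        paths.append(path)
        removed.update(path[1:-1])   # endpoints stay usable
    return paths

# hypothetical trust graph with two disjoint certification paths
adj = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
print(disjoint_paths(adj, "s", "t", 3))
```

Greedy shortest-first is not optimal in general; maximum sets of disjoint paths are classically obtained via flow techniques, which is part of why the exact problems studied in the paper are interesting.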

Journal ArticleDOI
TL;DR: In this paper, the rate monotonic scheduling (RMS) policy is used for real-time systems to guarantee a feasible schedule on a single processor as long as the utilization factor of the task set is below n(2^(1/n) - 1), which converges to 0.69 for large n. The priority order of the RMS policy is maintained even during recovery.
Abstract: The Rate Monotonic Scheduling (RMS) policy is a widely accepted scheduling strategy for real-time systems due to its strong theoretical foundations and features attractive for practical use. For a periodic task set of n tasks with deadlines at the end of task periods, it guarantees a feasible schedule on a single processor as long as the utilization factor of the task set is below n(2^(1/n) - 1), which converges to ln 2 ≈ 0.69 for large n. We analyze the schedulability of a set of periodic tasks that is scheduled by the RMS policy and is susceptible to a single fault. The recovery action is the reexecution of all uncompleted tasks. The priority order of the RMS policy is maintained even during recovery. Under these conditions, we guarantee that no task will miss a single deadline, even in the presence of a fault, if the utilization factor on the processor does not exceed 0.5. Thus, 0.5 is the minimum achievable utilization that permits recovery from faults before the expiration of the deadlines of the tasks. This bound is better than the trivial bound of 0.69/2 = 0.345 that would be obtained if computation times were doubled to provide for reexecutions in the RMS analysis. Our result provides scheduling guarantees for tolerating a variety of intermittent and transient hardware and software faults that can be handled simply by reexecution. In addition, we demonstrate how permanent faults can be tolerated efficiently by maintaining common spares among a set of processors that are independently executing periodic tasks.
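Both bounds are easy to check in code. This sketch (function names chosen here, not from the paper) compares a task set's utilization against the classic Liu-Layland bound and against the paper's 0.5 fault-tolerant bound:

```python
def rms_bound(n):
    # Liu-Layland utilization bound for n tasks: n * (2^(1/n) - 1)
    return n * (2.0 ** (1.0 / n) - 1.0)

def rms_schedulable(tasks, single_fault=False):
    # tasks: list of (computation_time, period) pairs;
    # with single_fault=True, apply the paper's 0.5 recovery bound
    u = sum(c / p for c, p in tasks)
    return u <= (0.5 if single_fault else rms_bound(len(tasks)))
```

For example, a two-task set with utilization 0.45 passes both tests, while utilization 0.55 passes the plain RMS test (the bound is about 0.83 for n = 2) but not the fault-tolerant one.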

Journal ArticleDOI
TL;DR: In this article, a modulo (2^n + 1) carry-save adder (MCSA) is proposed to reduce the number of partial products in the modulo multiplication scheme, which is suitable for VLSI implementation for moderate and large n (n ≥ 16).
Abstract: Modulo (2^n + 1) multiplication is widely used in the computation of convolutions and in RNS arithmetic, so it is important to reduce its calculation delay. This paper presents the concept of a modulo (2^n + 1) carry-save adder (MCSA) and uses two MCSAs to perform the residue reduction. We also apply Booth's algorithm to the modulo (2^n + 1) multiplication scheme in order to reduce the number of partial products. With these techniques, the new architecture reduces the multiplier's calculation delay and is suitable for VLSI implementation for moderate and large n (n ≥ 16).
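The key identity behind modulo (2^n + 1) reduction is 2^n ≡ -1 (mod 2^n + 1): a wide product can be reduced by adding and subtracting its n-bit slices with alternating signs, which is what a tree of carry-save adders exploits in hardware. A minimal software sketch of that identity (not the paper's MCSA/Booth architecture; names are illustrative):

```python
def mul_mod_2n_plus_1(a, b, n):
    # reduce a * b modulo (2^n + 1) using 2^n ≡ -1:
    # slice the product into n-bit chunks, alternate the signs
    m = (1 << n) + 1
    low_mask = (1 << n) - 1
    p, acc, sign = a * b, 0, 1
    while p:
        acc += sign * (p & low_mask)
        p >>= n
        sign = -sign
    return acc % m
```

The alternating-sign sum is congruent to the full product, so only one small final correction modulo 2^n + 1 remains, instead of a wide division.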

Journal ArticleDOI
TL;DR: A system-level simulation model is described and it is shown that it enables accurate predictions of both I/O subsystem and overall system performance, and properly captures the feedback and subsequent performance effects.
Abstract: We describe a system-level simulation model and show that it enables accurate predictions of both I/O subsystem and overall system performance. In contrast, the conventional approach for evaluating the performance of an I/O subsystem design, which is based on standalone subsystem models, is often unable to accurately predict performance changes because it is too narrow in scope. In particular, conventional methodology treats all I/O requests equally, ignoring differences in how individual requests' response times affect system behavior (including both system performance and the subsequent I/O workload). We introduce the concept of request criticality to describe these feedback effects and show that real I/O workloads are not approximated well by either open or closed input models. Because conventional methodology ignores this fact, it often leads to inaccurate performance predictions and can thereby lead to incorrect conclusions and poor design choices. We illustrate these problems with real examples and show that a system-level model, which includes both the I/O subsystem and other important system components (e.g., CPUs and system software), properly captures the feedback and subsequent performance effects.
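The open-versus-closed distinction can be made concrete with a toy calculation (entirely illustrative, not the paper's simulation model): in a closed model a slow I/O subsystem throttles the workload it receives, while an open model keeps submitting at a fixed rate regardless of response times.

```python
def closed_requests(think, service, horizon):
    # closed model: the client thinks, issues a request, and waits
    # for the response before thinking again (feedback on the load)
    t, count = 0.0, 0
    while t + think + service <= horizon:
        t += think + service
        count += 1
    return count

def open_requests(think, horizon):
    # open model: arrivals are independent of response times
    return int(horizon // think)
```

With think = 1 and horizon = 100, the open model always issues 100 requests, while the closed model issues 100 with instantaneous service but only 50 when service time equals think time: response-time feedback halves the offered load, which is exactly the effect a standalone subsystem model misses.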

Journal ArticleDOI
TL;DR: The bounds of the minimum vertex cut set for m-ary n-dimensional hypercubes are studied by requiring each node to have at least k healthy neighbors, a model that can better reflect fault patterns in a real system than the existing ones.
Abstract: In this paper, we study fault tolerance measures for m-ary n-dimensional hypercubes based on the concept of forbidden faulty sets. In a forbidden faulty set, certain nodes cannot be faulty at the same time and this model can better reflect fault patterns in a real system than the existing ones. Specifically, we study the bounds of the minimum vertex cut set for m-ary n-dimensional hypercubes by requiring each node to have at least k healthy neighbors. Our result enhances and generalizes a result by Latifi et al. for binary hypercubes. Our study also shows that the corresponding result based on the traditional fault model (where k is zero) tends to underestimate network resilience of large networks such as m-ary n-dimensional hypercubes.
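For the binary case (m = 2), the neighbor condition is easy to check exhaustively on small cubes. The sketch below (illustrative names; the paper treats general m-ary cubes analytically) tests whether a fault set leaves every healthy node of the binary n-cube with at least k healthy neighbors:

```python
def neighbors(v, n):
    # nodes of the binary n-cube adjacent to v (flip each single bit)
    return [v ^ (1 << i) for i in range(n)]

def respects_k_healthy(faults, n, k):
    # every non-faulty node must keep at least k healthy neighbors
    f = set(faults)
    return all(
        sum(w not in f for w in neighbors(v, n)) >= k
        for v in range(1 << n) if v not in f
    )
```

For example, in the 3-cube the fault set {1, 2, 4} isolates node 0 entirely, so it violates the condition even for k = 1; such a set is exactly what the forbidden faulty set model excludes.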

Journal ArticleDOI
TL;DR: This paper presents recent advances in the design of constant-time up/down counters in the general context of fast counter design and reveals several methods closely related to the designs of fast adders, as well as some techniques that are only valid for counter design.
Abstract: This paper presents recent advances in the design of constant-time up/down counters in the general context of fast counter design. An overview of existing techniques for the design of long and fast counters reveals several methods closely related to the design of fast adders, as well as some techniques that are only valid for counter design. The main idea behind the novel up/down counters is the recognition that the only extra difficulty with an up/down counter (versus an up-only or down-only one) arises when the counter changes direction from counting up to counting down (and vice versa). To deal with this difficulty, the new design uses a "shadow" register that stores the previous counter state. When counting only up or only down, the counter functions like a standard up-only or down-only constant-time counter; when it changes direction, instead of trying to compute the new value (which typically requires carry propagation), it simply uses the contents of the shadow register, which contains the exact desired previous value. An alternative approach for restoring the previous state in constant time is to store the carry bits in a Carry/Borrow register.
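The shadow-register idea can be modeled in a few lines of Python (a behavioral sketch only; the actual design is a constant-time hardware circuit, and all names here are mine). On a direction change, the previous value is already sitting in the shadow register, so the counter swaps it in rather than propagating a carry:

```python
class ShadowUpDownCounter:
    # behavioral model of a constant-time up/down counter that keeps
    # the previous state in a shadow register
    def __init__(self, width):
        self.mask = (1 << width) - 1
        self.state = 0
        self.shadow = self.mask   # value "before" reset, so that a
                                  # first down-count wraps correctly
        self.direction = +1

    def count(self, direction):
        # direction is +1 (count up) or -1 (count down)
        if direction != self.direction:
            # direction change: the desired new value is exactly the
            # previous state, so swap instead of adding/subtracting
            self.state, self.shadow = self.shadow, self.state
            self.direction = direction
        else:
            self.shadow = self.state
            self.state = (self.state + direction) & self.mask
```

Replaying any up/down sequence against this model matches ordinary modular arithmetic on the count, while every step is a register move or a single increment/decrement in a fixed direction.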

Journal ArticleDOI
TL;DR: This short paper summarizes recent results on the construction of low-complexity bit-parallel finite field multipliers using a polynomial basis; the complexity and time delay of the proposed multipliers are lower than those of similar proposals.
Abstract: New implementations of bit-parallel multipliers for a class of finite fields are proposed. The class of finite fields is constructed with irreducible AOPs (all one polynomials) and ESPs (equally spaced polynomials). The size and time complexities of our proposed multipliers are lower than or equal to those of the previously proposed multipliers of the same class.
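The attraction of an AOP is its cheap reduction: for p(x) = x^m + ... + x + 1 we have (x + 1)p(x) = x^(m+1) + 1 over GF(2), so x^(m+1) ≡ 1 (mod p(x)). A small software sketch for GF(2^4) with the irreducible AOP x^4 + x^3 + x^2 + x + 1 follows; it models only this reduction identity, not the paper's bit-parallel hardware, and the names are mine:

```python
AOP = 0b11111  # x^4 + x^3 + x^2 + x + 1, irreducible over GF(2)

def gf16_mul(a, b):
    # multiply two elements of GF(2^4), coefficients as bits
    p = 0
    for i in range(4):          # carry-less (polynomial) product
        if (b >> i) & 1:
            p ^= a << i
    # fold using x^5 ≡ 1: bits for x^5, x^6 wrap to x^0, x^1
    folded = (p & 0b11111) ^ (p >> 5)
    # eliminate the x^4 term: x^4 ≡ x^3 + x^2 + x + 1
    if folded & 0b10000:
        folded ^= AOP
    return folded
```

The fold is a fixed wiring pattern (no carries), which is why AOP-based bit-parallel multipliers achieve low gate and delay complexity.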