
Showing papers in "IEEE Transactions on Computers in 2003"


Journal ArticleDOI
TL;DR: This paper presents SCAMP (Scalable Membership protocol), a novel peer-to-peer membership protocol that operates in a fully decentralized manner, provides each member with a partial view of the group membership, and adds mechanisms to achieve balanced view sizes even under highly unbalanced subscription patterns.
Abstract: Gossip-based protocols for group communication have attractive scalability and reliability properties. The probabilistic gossip schemes studied so far typically assume that each group member has full knowledge of the global membership and chooses gossip targets uniformly at random. The requirement of global knowledge impairs their applicability to very large-scale groups. In this paper, we present SCAMP (Scalable Membership protocol), a novel peer-to-peer membership protocol which operates in a fully decentralized manner and provides each member with a partial view of the group membership. Our protocol is self-organizing in the sense that the size of partial views naturally converges to the value required to support a gossip algorithm reliably. This value is a function of the group size, but is achieved without any node knowing the group size. We propose additional mechanisms to achieve balanced view sizes even with highly unbalanced subscription patterns. We present the design, theoretical analysis, and a detailed evaluation of the basic protocol and its refinements. Simulation results show that the reliability guarantees provided by SCAMP are comparable to previous schemes based on global knowledge. The scale of the experiments attests to the scalability of the protocol.

526 citations
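
Hedged sketch of the core idea: a subscription is forwarded through the overlay, and each recipient keeps it with a probability that shrinks as its own view grows, which is what lets view sizes converge without any node knowing the group size. The names, the constant C, and the hop cap below are illustrative, not the paper's specification.

```python
import random

C = 2  # extra subscription copies forwarded by the contact (assumed constant)

class Node:
    def __init__(self, nid):
        self.nid = nid
        self.view = set()  # partial view: node ids this node gossips to

    def handle_subscription(self, new_id, nodes, hops=0):
        # Keep the new id with probability 1/(1 + |view|) -- shrinking as
        # the view grows -- otherwise forward it to a random view member.
        keep = (not self.view or hops > 50 or
                random.random() < 1.0 / (1 + len(self.view)))
        if keep:
            if new_id != self.nid:
                self.view.add(new_id)
        else:
            fwd = random.choice(tuple(self.view))
            nodes[fwd].handle_subscription(new_id, nodes, hops + 1)

def subscribe(new_id, contact_id, nodes):
    # A node joins via any existing contact; the contact forwards the
    # subscription to all of its view members plus C extra random copies.
    contact = nodes[contact_id]
    targets = list(contact.view)
    targets += [random.choice(targets) for _ in range(C)] if targets else []
    contact.view.add(new_id)
    for t in targets:
        nodes[t].handle_subscription(new_id, nodes)
```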


Journal ArticleDOI
TL;DR: This work provides efficient distributed algorithms that optimally solve the best-coverage problem raised in Meguerdichian et al. (2001) and considers a more general sensing model in which sensing ability diminishes as distance increases.
Abstract: Sensor networks pose a number of challenging conceptual and optimization problems such as location, deployment, and tracking. One of the fundamental problems in sensor networks is the calculation of the coverage. In Meguerdichian et al. (2001), it is assumed that the sensor has uniform sensing ability. We provide efficient distributed algorithms to optimally solve the best-coverage problem raised in the above-mentioned article. In addition, we consider a more general sensing model: the sensing ability diminishes as the distance increases. As energy conservation is a major concern in wireless (or sensor) networks, we also consider how to find an optimum best-coverage-path with the least energy consumption and how to find an optimum best-coverage-path that travels a small distance. In addition, we justify the correctness of the method proposed above that uses the Delaunay triangulation to solve the best coverage problem and show that the search space of the best coverage problem can be confined to the relative neighborhood graph, which can be constructed locally.

483 citations
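
Once the search is confined to a locally constructible proximity graph such as the relative neighborhood graph, finding a best-coverage path reduces to a bottleneck (minimax) shortest path: minimize, over all paths, the worst "sensing gap" of any edge. A Python sketch of that reduction, assuming the graph and per-edge weights have already been computed separately:

```python
import heapq

def best_coverage_path_cost(graph, src, dst):
    """Bottleneck shortest path: minimize the maximum edge weight on the
    path. Here `graph` maps a vertex to a list of (neighbor, weight)
    pairs; in the paper's setting, the vertices/edges come from a
    proximity graph and an edge's weight is the largest distance to the
    closest sensor encountered while traversing it (assumed precomputed)."""
    best = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        cost, u = heapq.heappop(pq)
        if u == dst:
            return cost
        if cost > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u]:
            c = max(cost, w)  # path cost = worst edge seen so far
            if c < best.get(v, float("inf")):
                best[v] = c
                heapq.heappush(pq, (c, v))
    return float("inf")  # dst unreachable
```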


Journal ArticleDOI
TL;DR: Simulation results show that several adaptive schemes, which dynamically adjust thresholds based on local connectivity information, offer better reachability and efficiency than previously proposed fixed-threshold schemes.
Abstract: In a multihop mobile ad hoc network, broadcasting is an elementary operation needed to support many applications. Previous work showed that naive broadcasting by flooding may cause serious redundancy, contention, and collision in the network, referred to as the broadcast storm problem, and that several threshold-based schemes perform better than flooding. However, choosing thresholds poses a dilemma between reachability and efficiency under different host densities. In this paper, we propose several adaptive schemes that dynamically adjust thresholds based on local connectivity information. Simulation results show that these adaptive schemes offer better reachability as well as efficiency compared to the previous results.

462 citations
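
As a concrete illustration of the counter-based flavor of such schemes: a host rebroadcasts only if it overheard fewer than C duplicates during a random assessment delay, and an adaptive variant picks C from the locally observed neighbor count. The specific threshold values below are invented for illustration; the paper derives its own adaptation functions.

```python
def adaptive_counter_threshold(num_neighbors: int) -> int:
    # Illustrative mapping only: sparser neighborhoods get a higher
    # threshold (favoring reachability), denser ones a lower threshold
    # (suppressing redundant rebroadcasts and easing contention).
    if num_neighbors <= 3:
        return 5
    if num_neighbors <= 10:
        return 3
    return 2

def should_rebroadcast(duplicates_heard: int, num_neighbors: int) -> bool:
    # Rebroadcast only if few duplicates were overheard while waiting.
    return duplicates_heard < adaptive_counter_threshold(num_neighbors)
```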


Journal ArticleDOI
TL;DR: A jittered oscillator featuring an amplified thermal noise source is designed to increase the output throughput and the statistical quality of the generated bit sequences; the oscillator feedback loop acts as offset compensation for the noise amplifier, solving one of the major issues in this kind of circuit.
Abstract: The design of a high-speed IC random number source macro-cell, suitable for integration in a smart card microcontroller, is presented. The oscillator sampling technique is exploited and a jittered oscillator featuring an amplified thermal noise source has been designed in order to increase the output throughput and the statistical quality of the generated bit sequences. The oscillator feedback loop acts as an offset compensation for the noise amplifier, thus solving one of the major issues in this kind of circuit. A numerical model of the proposed system has been developed which allows us to derive an analytical expression for the transition probability between successive bits in the output stream. A prototype chip has been fabricated in a standard digital 0.18 µm n-well CMOS process; it delivers a 10 Mbps throughput and fulfills the NIST FIPS and correlation-based tests for randomness. The macro-cell area, excluding pads, is 0.0016 mm² (184 µm × 86 µm) and a 2.3 mW power consumption has been measured.

393 citations


Journal ArticleDOI
TL;DR: Two fault detection schemes are presented: the first is redundancy-based, while the second uses an error detecting code; the latter is a novel scheme that leads to very efficient, high-coverage fault detection.
Abstract: The goal of the Advanced Encryption Standard (AES) is to achieve secure communication. The use of AES does not, however, guarantee reliable communication. Prior work has shown that even a single transient error occurring during the AES encryption (or decryption) process will very likely result in a large number of errors in the encrypted/decrypted data. Such faults must be detected before the data is sent, to avoid the transmission and use of erroneous data. Concurrent fault detection is important not only to protect the encryption/decryption process from random faults; it will also protect the encryption/decryption circuitry from an attacker who may maliciously inject faults in order to find the encryption secret key. In this paper, we first describe some studies of the effects that faults may have on a hardware implementation of AES by analyzing the propagation of such faults to the outputs. We then present two fault detection schemes: the first is a redundancy-based scheme, while the second uses an error detecting code. The latter is a novel scheme which leads to very efficient and high coverage fault detection. Finally, the hardware costs and detection latencies of both schemes are estimated.

379 citations
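
A software analogue of the two ideas, under assumed interfaces: `encrypt`/`decrypt` stand in for the AES datapath and `predict_parity` for the paper's code-based parity prediction logic (all three are placeholders, not the paper's hardware).

```python
def parity_bit(data: bytes) -> int:
    # XOR of all bits: the simplest error detecting code.
    p = 0
    for b in data:
        p ^= bin(b).count("1") & 1
    return p

def detect_by_redundancy(encrypt, decrypt, key, block):
    # Scheme 1 (redundancy-based): run the inverse transformation and
    # compare with the original input; costly, but catches any fault
    # that does not commute with decryption.
    ct = encrypt(key, block)
    return ct if decrypt(key, ct) == block else None  # None = fault detected

def detect_by_code(encrypt, predict_parity, key, block):
    # Scheme 2 (code-based): a parity predicted from the input is checked
    # against the parity of the actual output, catching odd-weight errors
    # at much lower cost than full redundancy.
    ct = encrypt(key, block)
    return ct if parity_bit(ct) == predict_parity(key, block) else None
```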


Journal ArticleDOI
Akashi Satoh, K. Takano
TL;DR: By introducing a dual-field multiplier, an elliptic curve (EC) cryptographic processor architecture is proposed that can support the Galois fields GF(p) and GF(2^n) for arbitrary prime numbers and irreducible polynomials.
Abstract: We propose an elliptic curve (EC) cryptographic processor architecture that can support Galois fields GF(p) and GF(2^n) for arbitrary prime numbers and irreducible polynomials by introducing a dual field multiplier. A Montgomery multiplier with an optimized data bus and an on-the-fly redundant binary converter boost the throughput of the EC scalar multiplication. All popular cryptographic functions such as DSA, EC-DSA, RSA, CRT, and prime generation are also supported. All commands are organized in a hierarchical structure according to their complexity. Our processor has high scalability and flexibility between speed, hardware area, and operand size. In a hardware evaluation using a 0.13 µm CMOS standard cell library, the high-speed design using 117.5 Kgates with a 64-bit multiplier achieved operation times of 1.21 ms and 0.19 ms for a 160-bit EC scalar multiplication in GF(p) and GF(2^n), respectively. A compact version with an 8-bit multiplier requires only 28.3 Kgates and executes the operations in 7.47 ms and 2.79 ms. Not only 160-bit operations but any bit length can be supported by any hardware configuration so long as the memory capacity is sufficient.

272 citations


Journal ArticleDOI
TL;DR: A word-based version of Montgomery multiplication (MM) is presented and used to explain the main concepts of the hardware design; the architecture gives enough freedom to select the word size and the degree of parallelism according to the available area and/or desired performance.
Abstract: This paper presents a scalable architecture for the computation of modular multiplication, based on the Montgomery multiplication (MM) algorithm. A word-based version of MM is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. Its architecture gives enough freedom to select the word size and the degree of parallelism to be used, according to the available area and/or desired performance. Design trade-offs are analyzed in order to identify adequate hardware configurations for a given area or bandwidth requirement.

242 citations
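
For orientation, the word-based Montgomery loop that such a scalable datapath implements can be written in a few lines of Python; the word size W plays the role of the configurable hardware word, and precision is limited only by memory. This is the textbook word-serial algorithm, not the paper's exact dataflow.

```python
W = 32  # word size; the architecture leaves this (and the parallelism) open

def mont_mul(a, b, m, s):
    """Compute a*b*R^-1 mod m with R = 2^(W*s), m odd, a, b < m.
    One W-bit word of `a` is consumed per iteration (word-serial).
    Requires Python >= 3.8 for pow(m, -1, modulus)."""
    mask = (1 << W) - 1
    m_inv = (-pow(m, -1, 1 << W)) & mask   # -m^-1 mod 2^W
    t = 0
    for i in range(s):
        ai = (a >> (W * i)) & mask
        t += ai * b
        u = ((t & mask) * m_inv) & mask    # multiple of m zeroing the low word
        t = (t + u * m) >> W               # exact shift: low W bits are zero
    return t - m if t >= m else t          # result stays below 2m

# Usage: with s = ceil(bits(m)/W) and R = 2^(W*s),
# (a*b) % m == mont_mul(mont_mul(a, b, m, s), (R * R) % m, m, s)
```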


Journal ArticleDOI
TL;DR: This paper presents a new class of variable-to-variable-length compression codes, referred to as frequency-directed run-length (FDR) codes, that are designed using the distributions of runs of 0s in typical test sequences.
Abstract: Test data compression and test resource partitioning (TRP) are necessary to reduce the volume of test data for system-on-a-chip designs. We present a new class of variable-to-variable-length compression codes that are designed using distributions of the runs of 0s in typical test sequences. We refer to these as frequency-directed run-length (FDR) codes. We present experimental results for ISCAS 89 benchmark circuits and two IBM production circuits to show that FDR codes are extremely effective for test data compression and TRP. We derive upper and lower bounds on the compression expected for some generic parameters of the test sequences. These bounds are especially tight when the number of runs is small, thereby showing that FDR codes are robust, i.e., they are insensitive to variations in the input data stream. In order to highlight the inherent superiority of FDR codes, we present a probabilistic analysis of data compression for a memoryless data source. Finally, we derive entropy bounds for the benchmark test sets and show that the compression obtained using FDR codes is close to the entropy bounds.

232 citations
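
The code construction is compact enough to sketch: runs of 0s (each terminated by a 1) fall into groups, with group k covering run lengths 2^k - 2 through 2^(k+1) - 3 and mapping to a 2k-bit codeword, so the frequent short runs get the short codewords. A Python sketch (the end-of-vector convention for a trailing run of 0s is a detail the paper defines and is omitted here):

```python
def fdr_encode_run(run_len: int) -> str:
    """FDR codeword for a run of 0s ended by a 1: group k has a prefix
    of (k-1) ones followed by '0' and a k-bit tail giving the offset of
    the run length within the group."""
    k = 1
    while run_len > (1 << (k + 1)) - 3:   # find the group containing run_len
        k += 1
    prefix = "1" * (k - 1) + "0"
    tail = format(run_len - ((1 << k) - 2), "0{}b".format(k))
    return prefix + tail

def fdr_encode(bits: str) -> str:
    # Split the test vector into runs of 0s, each closed by a single 1.
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            out.append(fdr_encode_run(run))
            run = 0
    return "".join(out)

# Example: fdr_encode("001000001") == "1000" + "1011" (runs of length 2 and 5).
```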


Journal ArticleDOI
TL;DR: A deterministic fault-tolerant and deadlock-free routing protocol in two-dimensional meshes based on dimension-order routing and the odd-even turn model is proposed, called extended X-Y routing.
Abstract: We propose a deterministic fault-tolerant and deadlock-free routing protocol in two-dimensional (2D) meshes based on dimension-order routing and the odd-even turn model. The proposed protocol, called extended X-Y routing, does not use any virtual channels by prohibiting certain locations of faults and destinations. Faults are contained in a set of disjoint rectangular regions called faulty blocks. The number of faults to be tolerated is unbounded as long as nodes outside faulty blocks are connected in the 2D mesh network. The extended X-Y routing can also be used under a special convex fault region called an orthogonal faulty block, which can be derived from a given faulty block by activating some nonfaulty nodes in the block. Extensions to partially adaptive routing, traffic- and adaptivity-balancing using virtual networks, and routing without constraints using virtual channels and virtual networks are also discussed.

225 citations
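
The baseline the protocol extends is plain X-Y dimension-order routing, sketched below; the extended protocol additionally detours around faulty blocks under odd-even turn restrictions, which this fragment deliberately omits.

```python
def xy_route(src, dst):
    """Dimension-order X-Y routing in a 2D mesh: fully correct the X
    coordinate first, then the Y coordinate. Returns the list of hops.
    (Fault handling and odd-even turn restrictions are omitted.)"""
    x, y = src
    dx, dy = dst
    hops = []
    while x != dx:                    # X dimension first...
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:                    # ...then Y
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops

# xy_route((0, 0), (2, 1)) -> [(1, 0), (2, 0), (2, 1)]
```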


Journal ArticleDOI
TL;DR: This article presents a highly regular and scalable AES hardware architecture, suited for full-custom as well as semicustom design flows, that is scalable in terms of both throughput and key size.
Abstract: This article presents a highly regular and scalable AES hardware architecture, suited for full-custom as well as for semicustom design flows. In contrast to other publications, a complete architecture (even including CBC mode) that is scalable in terms of throughput and in terms of the used key size is described. Similarities between encryption and decryption are utilized to provide a high level of performance using only a relatively small area (10,799 gate equivalents for the standard configuration). This performance is reached by balancing the combinational paths of the design. No other published AES hardware architecture provides similar balancing or comparable regularity. Implementations of the fastest configuration of the architecture provide a throughput of 241 Mbits/sec on a 0.6 µm CMOS process using standard cells.

216 citations


Journal ArticleDOI
TL;DR: A novel schedulability analysis is proposed for verifying the feasibility of large periodic task sets under the rate monotonic algorithm when the exact test cannot be applied online due to prohibitively long execution times.
Abstract: We propose a novel schedulability analysis for verifying the feasibility of large periodic task sets under the rate monotonic algorithm when the exact test cannot be applied online due to prohibitively long execution times. The proposed test has the same complexity as the original Liu and Layland (1973) bound, but it is less pessimistic, allowing it to accept task sets that would be rejected using the original approach. The performance of the proposed approach is evaluated with respect to the classical Liu and Layland method, and theoretical bounds are derived as a function of n (the number of tasks) and for the limit case of n tending to infinity. The analysis is also extended to include aperiodic servers and blocking times due to concurrency control protocols. Extensive simulations on synthetic task sets are presented to compare the effectiveness of the proposed test with respect to the Liu and Layland method and the exact response time analysis.
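
To make the comparison concrete, here are the classical Liu and Layland utilization bound and a product-form O(n) test of the kind the abstract describes (the hyperbolic bound, assuming that is the intended test): it accepts some task sets that the additive bound rejects.

```python
def liu_layland(utils):
    # Classical sufficient test: sum(U_i) <= n * (2^(1/n) - 1).
    n = len(utils)
    return sum(utils) <= n * (2 ** (1.0 / n) - 1)

def hyperbolic(utils):
    # Same O(n) complexity, less pessimistic: prod(U_i + 1) <= 2.
    p = 1.0
    for u in utils:
        p *= u + 1.0
    return p <= 2.0

# Example where the improved test is strictly better: U = (0.6, 0.25).
# liu_layland: 0.85 > 0.828 -> rejected; hyperbolic: 1.6 * 1.25 = 2.0 -> accepted.
```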

Journal ArticleDOI
TL;DR: Three physical techniques and one software-implemented technique that have been used to assess the fault tolerance features of the MARS fault-tolerant distributed real-time system are compared and analyzed and the results obtained are discussed.
Abstract: This paper addresses the issue of characterizing the respective impact of fault injection techniques. Three physical techniques and one software-implemented technique that have been used to assess the fault tolerance features of the MARS fault-tolerant distributed real-time system are compared and analyzed. After a short summary of the fault tolerance features of the MARS architecture and especially of the error detection mechanisms that were used to compare the erroneous behaviors induced by the fault injection techniques considered, we describe the common distributed testbed and test scenario implemented to perform a coherent set of fault injection campaigns. The main features of the four fault injection techniques considered are then briefly described and the results obtained are finally presented and discussed. Emphasis is put on the analysis of the specific impact and merit of each injection technique.

Journal ArticleDOI
TL;DR: Unlike existing solutions, no extra hardware is required to run the protocol at each node and there is no need for a designated node to start the scatternet formation process.
Abstract: This paper describes a protocol for the establishment of multihop ad hoc networks based on Bluetooth devices. The protocol proceeds in three phases: device discovery, partitioning of the network into Bluetooth piconets, and interconnection of the piconets into a connected scatternet. The protocol has the following desirable properties: it is executed at each node with no prior knowledge of the network topology, thus being fully distributed. The selection of the Bluetooth masters is driven by the suitability of a node to be the "best fit" for serving as a master. The generated scatternet is a connected mesh with multiple paths between any pair of nodes, thus achieving robustness. Unlike existing solutions, no extra hardware is required to run the protocol at each node and there is no need for a designated node to start the scatternet formation process. Simulation results are provided which evaluate the impact of the Bluetooth device discovery phase on the performance of the protocol.

Journal ArticleDOI
TL;DR: Three unique algorithms are developed and implemented with low-power, fast circuits that reduce the maximum percent errors resulting from binary-to-binary logarithm conversion to 0.9299 percent, 0.4314 percent, and 0.1538 percent.
Abstract: We present a unique 32-bit binary-to-binary logarithm converter, including its CMOS VLSI implementation. The converter is implemented using combinational logic only and it calculates a logarithm approximation in a single clock cycle. Unlike other complex logarithm correcting algorithms, three unique algorithms are developed and implemented with low-power and fast circuits that reduce the maximum percent errors that result from binary-to-binary logarithm conversion to 0.9299 percent, 0.4314 percent, and 0.1538 percent. Fast 4-, 16-, and 32-bit leading-one detector circuits are designed to obtain the leading-one position of an input binary word. A 32-word × 5-bit MOS ROM is used to provide 5-bit integers based on the corresponding leading-one position. Both converter area and speed have been considered in the design approach, resulting in the use of a very efficient 32-bit logarithmic shifter in the 32-bit logarithmic converter. The converter is implemented in 0.6 µm CMOS technology and requires 1,600λ × 2,800λ of chip area. Simulations of the CMOS design for the 32-bit logarithmic converter, operating at V_DD equal to 5 volts, run at 55 MHz, and the converter consumes 20 milliwatts.
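
The conversion underneath is Mitchell-style: the leading-one detector supplies the integer part of log2(x), and the bits below the leading one serve as the fraction; the paper's three correction algorithms then shrink the residual error to the figures quoted above. A sketch of the uncorrected baseline:

```python
def mitchell_log2(x: int) -> float:
    """Uncorrected binary-logarithm approximation: integer part = the
    leading-one position k (what the 4-, 16-, and 32-bit leading-one
    detectors compute in hardware); fraction = x/2^k - 1, i.e., the
    bits below the leading one, taken as-is."""
    assert x > 0
    k = x.bit_length() - 1                    # leading-one position
    return k + (x - (1 << k)) / float(1 << k)

# e.g. mitchell_log2(10) == 3.25, versus log2(10) ~= 3.3219; the paper's
# correction circuits reduce such conversion errors to below 1 percent.
```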

Journal ArticleDOI
TL;DR: A practical DDoS defense system that can protect the availability of web services during severe DDoS attacks and is evaluated based on a novel game theoretical framework, which characterizes the natural adversarial relationship between a DDoS adversary and the proposed system.
Abstract: The recent tide of Distributed Denial of Service (DDoS) attacks against high-profile web sites demonstrates how devastating DDoS attacks are and how defenseless the Internet is under such attacks. We design a practical DDoS defense system that can protect the availability of web services during severe DDoS attacks. The basic idea behind our system is to isolate and protect legitimate traffic from a huge volume of DDoS traffic when an attack occurs. Traffic that needs to be protected can be recognized and protected using efficient cryptographic techniques. Therefore, by provisioning adequate resources (e.g., bandwidth) to legitimate traffic separated by this process, we are able to provide adequate service to a large percentage of clients during DDoS attacks. The worst-case performance (effectiveness) of the system is evaluated based on a novel game-theoretical framework, which characterizes the natural adversarial relationship between a DDoS adversary and the proposed system. We also conduct a simulation study to verify a key assumption used in the game-theoretical analysis and to demonstrate the system dynamics during an attack.

Journal ArticleDOI
TL;DR: This article considers how to protect CRT-based RSA signature and decryption computation from hardware fault cryptanalysis in a highly reliable and efficient way, and proposes two novel protocols that have performance comparable to Shamir's scheme.
Abstract: This article considers the problem of how to prevent the RSA signature and decryption computation with residue number system (CRT-based) speedup from hardware fault cryptanalysis in a highly reliable and efficient approach. CRT-based speedup for an RSA signature has been widely adopted as an implementation standard ranging from large servers to very tiny smart IC cards. However, given a single erroneous computation result, hardware fault cryptanalysis can totally break the RSA system by factoring the public modulus. Countermeasures using a simple verification function (e.g., raising a signature to the power of a public key) or fault detection (e.g., an expanded modulus approach) have been reported in the literature; however, very few of these existing solutions are both sound and efficient. Unreasonably, these methods assume that a comparison instruction will always be fault-free when developing countermeasures against hardware fault cryptanalysis. Research shows that the expanded modulus approach proposed by Shamir (1997, 1999) is superior to the approach using a simple verification function when another physical cryptanalysis (e.g., timing cryptanalysis) is considered, so we intend to improve Shamir's method. In this paper, the new concepts of fault infective CRT computation and fault infective CRT recombination are proposed. Based on these new concepts, two novel protocols are developed with a rigorous proof of security. Two possible parameter settings are provided for the protocols. One setting selects a small public key, and the proposed protocols can have performance comparable to Shamir's scheme. The other setting has better performance than Shamir's scheme (i.e., comparable to conventional CRT speedup), but with a large public key. Most importantly, we wish to emphasize the importance of developing and proving the security of physically secure protocols without relying on unreliable or unreasonable assumptions, e.g., always fault-free instructions. In this paper, related protocols are also considered and carefully examined to point out possible weaknesses.
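
For orientation, here is CRT-based RSA signing with the "simple verification function" countermeasure that the paper criticizes; the final comparison is itself a single instruction a fault could skip, which is exactly what motivates the fault-infective approach. Toy sketch only, never a substitute for a vetted implementation.

```python
def rsa_sign_crt(m, p, q, d, e):
    """RSA signing with the CRT speedup plus a re-encryption check.
    p, q: primes; d, e: private/public exponents; m: padded message
    representative with 0 <= m < p*q. Python >= 3.8 for pow(x, -1, n)."""
    n = p * q
    dp, dq = d % (p - 1), d % (q - 1)
    sp = pow(m % p, dp, p)                 # two half-size exponentiations:
    sq = pow(m % q, dq, q)                 # the ~4x CRT speedup
    q_inv = pow(q, -1, p)
    s = (sq + q * ((q_inv * (sp - sq)) % p)) % n   # Garner recombination
    # Simple verification function: re-encrypt and compare. A fault in sp
    # or sq makes this check fail -- but the check is itself a comparison
    # instruction that hardware faults may also target.
    if pow(s, e, n) != m % n:
        raise ValueError("fault detected; signature withheld")
    return s
```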

Journal ArticleDOI
TL;DR: The authors identify a recursive formula from which their parallel implementation is derived and develop high-level parametric codes that are capable of generating the circuits autonomously when only the generator polynomial is given.
Abstract: This paper presents a theoretical result in the context of realizing high-speed hardware for parallel CRC checksums. Starting from the serial implementation widely reported in the literature, we have identified a recursive formula from which our parallel implementation is derived. In comparison with previous works, the new scheme is faster and more compact and is independent of the technology used in its realization. In our solution, the number of bits processed in parallel can be different from the degree of the polynomial generator. Lastly, we have also developed high-level parametric codes that are capable of generating the circuits autonomously when only the polynomial is given.
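
The same principle carries over to software: because the next CRC state is a linear function of the current state and the next w input bits, a w-bit-at-a-time update can be tabulated mechanically from the bit-serial LFSR, mirroring how the paper derives its parallel circuits from the serial one. A sketch for non-reflected CRCs with w = 8:

```python
def make_crc_bytewise(poly: int, width: int):
    """Tabulate a byte-at-a-time (w = 8) CRC from the serial LFSR step.
    `poly` is the generator polynomial without its top bit, MSB-first
    (e.g. 0x04C11DB7 for CRC-32); width >= 8 assumed, no bit reflection."""
    mask = (1 << width) - 1

    def step(state, bit):               # one tick of the serial LFSR
        fb = ((state >> (width - 1)) & 1) ^ bit
        state = (state << 1) & mask
        return state ^ poly if fb else state

    table = []
    for byte in range(256):             # 8 serial ticks, precomputed once
        s = 0
        for i in range(7, -1, -1):
            s = step(s, (byte >> i) & 1)
        table.append(s)

    def crc(data: bytes, state: int = 0) -> int:
        for b in data:                  # 8 bits consumed per table lookup
            idx = ((state >> (width - 8)) ^ b) & 0xFF
            state = ((state << 8) & mask) ^ table[idx]
        return state
    return crc

# crc32 = make_crc_bytewise(0x04C11DB7, 32)  # non-reflected CRC-32 core
```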

Journal ArticleDOI
TL;DR: An integrated framework for system-on-chip (SOC) test automation based on a new test access mechanism (TAM) architecture consisting of flexible-width test buses that can fork and merge between cores is described.
Abstract: We describe an integrated framework for system-on-chip (SOC) test automation. Our framework is based on a new test access mechanism (TAM) architecture consisting of flexible-width test buses that can fork and merge between cores. Test wrapper and TAM cooptimization for this architecture is performed by representing core tests using rectangles and by employing a novel rectangle packing algorithm for test scheduling. Test scheduling is tightly integrated with TAM optimization and it incorporates precedence and power constraints in the test schedule, while allowing the SOC integrator to designate a group of tests as preemptable. Test preemption helps avoid hardware and power consumption conflicts, thereby leading to a more efficient test schedule. Finally, we study the relationship between TAM width and tester data volume to identify an effective TAM width for the SOC. We present experimental results on our test automation framework for four benchmark SOCs.

Journal ArticleDOI
TL;DR: A deadline mechanism combines software fault tolerance in hard real-time periodic task systems with a scheduling algorithm that guarantees either the primary or the alternate version of each critical task completes in time and attempts to complete as many primaries as possible.
Abstract: A hard real-time system is usually subject to stringent reliability and timing constraints. One way to avoid missing deadlines is to trade the quality of computation results for timeliness, and software fault tolerance is often achieved with the use of redundant programs. A deadline mechanism which combines these two methods is proposed to provide software fault tolerance in hard real-time periodic task systems. We consider the problem of scheduling a set of real-time periodic tasks each of which has two versions: primary and alternate. The primary version contains more functions and produces good quality results, but its correctness is more difficult to verify. The alternate version contains only the minimum required functions and produces less precise results and its correctness is easy to verify. We propose a scheduling algorithm which 1) guarantees either the primary or alternate version of each critical task to be completed in time and 2) attempts to complete as many primaries as possible. Our basic algorithm uses a fixed priority-driven preemptive scheduling scheme to preallocate time intervals to the alternates and, at runtime, attempts to execute primaries first. An alternate will be executed only if necessary because of time or bugs.

Journal ArticleDOI
TL;DR: The AQuA architecture is described, the active replication pass-first scheme is presented in detail, and results from a study of fault detection, recovery, and blocking times are reported.
Abstract: Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects. The AQuA architecture allows application programmers to request desired levels of dependability during applications' runtimes. It provides fault tolerance mechanisms to ensure that a CORBA client can always obtain reliable services, even if the CORBA server object that provides the desired services suffers from crash failures and value faults. AQuA includes a replicated dependability manager that provides dependability management by configuring the system in response to applications' requests and changes in system resources due to faults. It uses Maestro/Ensemble to provide group communication services. It contains a gateway that intercepts standard CORBA IIOP messages to allow any standard CORBA application to use AQuA. It provides different types of replication schemes to forward messages reliably to the remote replicated objects. All of the replication schemes ensure strong data consistency among replicas. This paper describes the AQuA architecture and presents, in detail, the active replication pass-first scheme. In addition, the interface to the dependability manager and the design of the dependability manager replication are also described. Finally, we describe performance measurements that were conducted for the active replication pass-first scheme, and we present results from our study of fault detection, recovery, and blocking times.

Journal ArticleDOI
TL;DR: New Mastrovito and dual basis multiplier architectures based on these special irreducible pentanomials are proposed and rigorous analyses of their space and time complexity are given.
Abstract: State-of-the-art Galois field GF(2^m) multipliers offer advantageous space and time complexities when the field is generated by some special irreducible polynomial. To date, the best complexity results have been obtained when the irreducible polynomial is either a trinomial or an equally spaced polynomial (ESP). Unfortunately, there exist only a few irreducible ESPs in the range of interest for most applications, e.g., error-correcting codes, computer algebra, and elliptic curve cryptography. Furthermore, it is not always possible to find an irreducible trinomial of degree m in this range. For those cases where neither an irreducible trinomial nor an irreducible ESP exists, the use of irreducible pentanomials has been suggested. Irreducible pentanomials are abundant, and there are several eligible candidates for a given m. We promote the use of two special types of irreducible pentanomials. We propose new Mastrovito and dual basis multiplier architectures based on these special irreducible pentanomials and give rigorous analyses of their space and time complexity.

Journal ArticleDOI
TL;DR: The MediaBreeze architecture is proposed, which uses hardware support for efficient address generation, looping, and data reorganization (permute, packing/unpacking, transpose, etc.) and provides better performance than a 16-way processor with current SIMD extensions.
Abstract: Multimedia SIMD extensions such as MMX and AltiVec speed up media processing; however, our characterization shows that the attributes of current general-purpose processors enhanced with SIMD extensions do not match very well with the access patterns and loop structures of media programs. We find that 75 to 85 percent of the dynamic instructions in the processor instruction stream are supporting instructions necessary to feed the SIMD execution units rather than true/useful computations, resulting in the underutilization of SIMD execution units (only 1 to 12 percent of the peak SIMD execution units' throughput is achieved). Rather than focusing on exploiting more data-level parallelism (DLP), we focus on the instructions that support the SIMD computations and exploit both fine- and coarse-grained instruction-level parallelism (ILP) in the supporting instruction stream. We propose the MediaBreeze architecture, which uses hardware support for efficient address generation, looping, and data reorganization (permute, packing/unpacking, transpose, etc.). Our results on multimedia kernels show that a 2-way processor with SIMD extensions enhanced with MediaBreeze provides better performance than a 16-way processor with current SIMD extensions. In the case of application benchmarks, a 2-/4-way processor with SIMD extensions augmented with MediaBreeze outperforms a 4-/8-way processor with SIMD extensions. A first-order approximation using ASIC synthesis tools and cell-based libraries shows that this acceleration is achieved at a 10 percent increase in the area required by MMX and SSE extensions (a 0.3 percent increase in overall chip area) and 1 percent of total processor power consumption.

Journal ArticleDOI
TL;DR: Due to the nature of the BDD, the sum of disjoint products (SDP) can be implicitly represented, which avoids huge storage and high computational complexity for large multistate systems.
Abstract: A new algorithm based on the binary decision diagram (BDD) for the analysis of a system with multistate components is proposed. Each state of a multistate component is represented by a Boolean variable, and a multistate system is represented by a series of multistate fault trees. A Boolean algebra with restrictions on variables is used to address the dependence among the Boolean variables that collectively represent the same component, and a new BDD operation is proposed to realize this Boolean algebra. Due to the nature of the BDD, the sum of disjoint products (SDP) can be represented implicitly, which avoids huge storage requirements and high computational complexity for large multistate systems. Some applications are given to illustrate the use of the new algorithm.

Journal ArticleDOI
TL;DR: This paper analytically derives the average number of location updates during the interservice time for a movement-based location update scheme under fairly realistic assumptions, which is crucial for all trade-off analysis.
Abstract: Mobility management plays a central role in providing ubiquitous communications services in future wireless mobile networks. In mobility management, there are two key operations, location update and paging, commonly used in tracking mobile users on the move. Location update informs the network about a mobile user's current location, while paging is used by the network to locate a mobile user. Both operations incur signaling traffic in resource-limited wireless networks. The more frequent the location updates, the less paging is needed to locate a mobile user; thus, there is a trade-off in terms of signaling cost. Most trade-off analysis in the literature is carried out under the assumption that some time variables are exponentially distributed. However, such assumptions will not be valid, particularly for the wireless Internet. In this paper, we present some general analytical results without these assumptions, which are essential for general trade-off analysis. Specifically, we analytically derive the average number of location updates during the interservice time for a movement-based location update scheme under fairly realistic assumptions, which is crucial for all trade-off analysis. Our general analytical results make possible thorough numerical analysis for finding the optimal mobility management under various network operation scenarios.

Journal ArticleDOI
TL;DR: A design of the multipolling mechanism with the advantages of high channel utilization and low implementation overhead is shown, and the results show that the proposed mechanism is more efficient than the one discussed in the IEEE 802.11e Task Group.
Abstract: To expand support for applications with QoS requirements in wireless local area networks (WLANs), the 802.11e Task Group was formed to enhance the current IEEE 802.11 Medium Access Control (MAC) protocol. The multipolling mechanism was discussed in the task group, but some problems remain unsolved. In this paper, we show a design of the multipolling mechanism with the advantages of high channel utilization and low implementation overhead. In our proposed mechanism, wireless stations use a priority-based contention scheme to coordinate among themselves the transmission order on the channel. Moreover, we propose a polling schedule mechanism for our proposed multipoll to serve real-time traffic with constant and variable bit rates. The bounded delay requirement of the real-time traffic can be satisfied in our scheduling model. We establish an admission test to estimate the system capacity and to determine whether a new connection can be accepted. We study the performance of our proposed mechanism analytically, as well as through simulated experiments. The results show that the proposed mechanism is more efficient than the one discussed in the IEEE 802.11e Task Group.

Journal ArticleDOI
TL;DR: This work introduces the cluster-based failure recovery concept, which determines the best placement of slack within the fault-tolerant (FT) schedule so as to minimize the resulting time overhead, and provides transparent failure recovery in that a processor recovering from task failures does not disrupt the operation of other processors.
Abstract: The time-triggered model, with tasks scheduled in static (off line) fashion, provides a high degree of timing predictability in safety-critical distributed systems. Such systems must also tolerate transient and intermittent failures which occur far more frequently than permanent ones. Software-based recovery methods using temporal redundancy, such as task reexecution and primary/backup, while incurring performance overhead, are cost-effective methods of handling these failures. We present a constructive approach to integrating runtime recovery policies in a time-triggered distributed system. Furthermore, the method provides transparent failure recovery in that a processor recovering from task failures does not disrupt the operation of other processors. Given a general task graph with precedence and timing constraints and a specific fault model, the proposed method constructs the corresponding fault-tolerant (FT) schedule with sufficient slack to accommodate recovery. We introduce the cluster-based failure recovery concept which determines the best placement of slack within the FT schedule so as to minimize the resulting time overhead. Contingency schedules, also generated offline, revise this FT schedule to mask task failures on individual processors while preserving precedence and timing constraints. We present simulation results which show that, for small-scale embedded systems having task graphs of moderate complexity, the proposed approach generates FT schedules which incur about 30-40 percent performance overhead when compared to corresponding non-fault-tolerant ones.

Journal ArticleDOI
TL;DR: First, Lee distance Gray codes in Z_k^n are presented and then it is shown how these codes can directly be used to generate edge disjoint Hamiltonian cycles in k-ary n-cubes.
Abstract: Solutions for decomposing a higher dimensional torus to edge disjoint lower dimensional tori, in particular, edge disjoint Hamiltonian cycles are obtained based on the coding theory approach. First, Lee distance Gray codes in Z_k^n are presented and then it is shown how these codes can directly be used to generate edge disjoint Hamiltonian cycles in k-ary n-cubes. Further, some new classes of binary Gray codes are designed from these Lee distance Gray codes and, using these new classes of binary Gray codes, edge disjoint Hamiltonian cycles in hypercubes are generated.
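
One standard construction conveys the flavor: the modular k-ary Gray code maps consecutive integers to words of Z_k^n that differ in exactly one digit by +1 mod k, i.e., at Lee distance 1, so stepping through the code traces a Hamiltonian cycle in the k-ary n-cube. The paper's contribution is families of such codes whose cycles are mutually edge disjoint; the sketch below generates just one code.

```python
def lee_gray_word(m: int, k: int, n: int):
    """Modular k-ary Gray code: word for integer m in Z_k^n. Consecutive
    m and m+1 (mod k^n) yield words differing in one digit by +1 mod k,
    so the full sequence is a Hamiltonian cycle of the k-ary n-cube."""
    digits = [(m // k**i) % k for i in range(n)] + [0]  # base k, LSD first
    return tuple((digits[i] - digits[i + 1]) % k for i in range(n))

# e.g. k = 4, n = 2: [lee_gray_word(m, 4, 2) for m in range(16)] visits all
# 16 nodes of the 4-ary 2-cube, each step changing one coordinate by +-1 mod 4.
```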

Journal ArticleDOI
TL;DR: This work shows how voltage scaling can be scheduled to reduce energy usage while still meeting real-time deadlines.
Abstract: Many embedded systems operate under severe power and energy constraints. Voltage clock scaling is one mechanism by which energy consumption may be reduced: it is based on the fact that power consumption is a quadratic function of the voltage, while the speed is a linear function. We show how voltage scaling can be scheduled to reduce energy usage while still meeting real-time deadlines.
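
Under the model stated in the abstract (power quadratic in voltage, speed linear in it), the energy-minimal choice for an isolated job is the lowest voltage that still meets its deadline; a toy calculation of that operating point (the paper's contribution is scheduling this across competing real-time tasks):

```python
def lowest_feasible_voltage(cycles: float, deadline: float, s_max: float) -> float:
    """Normalize so voltage V in (0, 1] gives speed V * s_max (linear)
    and power proportional to V**2 (quadratic). Then energy is
    V**2 * cycles / (V * s_max), i.e., proportional to V, so the
    slowest feasible setting minimizes energy."""
    v = cycles / (deadline * s_max)     # smallest V that meets the deadline
    if v > 1.0:
        raise ValueError("infeasible even at maximum voltage")
    return v

# 5e6 cycles due in 10 ms on a core doing 1e9 cycles/s at V = 1:
# lowest_feasible_voltage(5e6, 0.01, 1e9) == 0.5 -> roughly half the energy.
```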

Journal ArticleDOI
TL;DR: A simple, sufficient test is presented for determining whether a given periodic task system will be successfully scheduled by this algorithm upon a particular uniform multiprocessor platform; this test generalizes earlier results concerning rate-monotonic scheduling upon identical multiprocessor platforms.
Abstract: The rate-monotonic algorithm is arguably one of the most popular algorithms for scheduling systems of periodic real-time tasks. The rate-monotonic scheduling of systems of periodic tasks on uniform multiprocessor platforms is considered here. A simple, sufficient test is presented for determining whether a given periodic task system will be successfully scheduled by this algorithm upon a particular uniform multiprocessor platform; this test generalizes earlier results concerning rate-monotonic scheduling upon identical multiprocessor platforms.

Journal ArticleDOI
TL;DR: SimBed, an execution-driven simulation testbed, measures the execution behavior and power consumption of embedded applications and RTOSs by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli; the results show no clear winner in timing accuracy between preemptive and cooperative systems.
Abstract: We present the modelling of embedded systems with SimBed, an execution-driven simulation testbed that measures the execution behavior and power consumption of embedded applications and RTOSs by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli. We briefly describe the simulation environment and present a study that compares three RTOSs: µC/OS-II, a popular public-domain embedded real-time operating system; Echidna, a sophisticated, industrial-strength (commercial) RTOS; and NOS, a bare-bones multirate task scheduler reminiscent of typical "roll-your-own" RTOSs found in many commercial embedded systems. The microcontroller simulated in this study is the Motorola M-CORE processor: a low-power, 32-bit CPU core with 16-bit instructions, running at 20 MHz. Our simulations show what happens when RTOSs are pushed beyond their limits, and they depict situations in which unexpected interrupts or unaccounted-for task invocations disrupt timing, even when the CPU is lightly loaded. In general, there appears to be no clear winner in timing accuracy between preemptive systems and cooperative systems. The power-consumption measurements show that RTOS overhead is a factor of two to four higher than it needs to be, compared to the energy consumption of the minimal scheduler. In addition, poorly designed idle loops can cause the system to double its energy consumption; this energy could be saved by a simple hardware sleep mechanism.