
Showing papers in "IEEE Transactions on Computers in 2003"


Journal ArticleDOI
TL;DR: This paper presents SCAMP (Scalable Membership protocol), a novel peer-to-peer membership protocol that operates in a fully decentralized manner, provides each member with a partial view of the group membership, and adds mechanisms to achieve balanced view sizes even under highly unbalanced subscription patterns.
Abstract: Gossip-based protocols for group communication have attractive scalability and reliability properties. The probabilistic gossip schemes studied so far typically assume that each group member has full knowledge of the global membership and chooses gossip targets uniformly at random. The requirement of global knowledge impairs their applicability to very large-scale groups. In this paper, we present SCAMP (Scalable Membership protocol), a novel peer-to-peer membership protocol which operates in a fully decentralized manner and provides each member with a partial view of the group membership. Our protocol is self-organizing in the sense that the size of partial views naturally converges to the value required to support a gossip algorithm reliably. This value is a function of the group size, but is achieved without any node knowing the group size. We propose additional mechanisms to achieve balanced view sizes even with highly unbalanced subscription patterns. We present the design, theoretical analysis, and a detailed evaluation of the basic protocol and its refinements. Simulation results show that the reliability guarantees provided by SCAMP are comparable to previous schemes based on global knowledge. The scale of the experiments attests to the scalability of the protocol.

526 citations
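
Hedged sketch of the core idea: a subscription is forwarded through the overlay, and each recipient keeps it with a probability that shrinks as its own view grows, which is what lets view sizes converge without any node knowing the group size. The names, the constant C, and the hop cap below are illustrative, not the paper's specification.

```python
import random

C = 2  # extra subscription copies forwarded by the contact (assumed constant)

class Node:
    def __init__(self, nid):
        self.nid = nid
        self.view = set()  # partial view: node ids this node gossips to

    def handle_subscription(self, new_id, nodes, hops=0):
        # Keep the new id with probability 1/(1 + |view|) -- shrinking as
        # the view grows -- otherwise forward it to a random view member.
        keep = (not self.view or hops > 50 or
                random.random() < 1.0 / (1 + len(self.view)))
        if keep:
            if new_id != self.nid:
                self.view.add(new_id)
        else:
            fwd = random.choice(tuple(self.view))
            nodes[fwd].handle_subscription(new_id, nodes, hops + 1)

def subscribe(new_id, contact_id, nodes):
    # A node joins via any existing contact; the contact forwards the
    # subscription to all of its view members plus C extra random copies.
    contact = nodes[contact_id]
    targets = list(contact.view)
    targets += [random.choice(targets) for _ in range(C)] if targets else []
    contact.view.add(new_id)
    for t in targets:
        nodes[t].handle_subscription(new_id, nodes)
```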


Journal ArticleDOI
TL;DR: This work provides efficient distributed algorithms that optimally solve the best-coverage problem raised in Meguerdichian et al. (2001) and considers a more general sensing model in which sensing ability diminishes as distance increases.
Abstract: Sensor networks pose a number of challenging conceptual and optimization problems such as location, deployment, and tracking. One of the fundamental problems in sensor networks is the calculation of the coverage. In Meguerdichian et al. (2001), it is assumed that the sensor has uniform sensing ability. We provide efficient distributed algorithms to optimally solve the best-coverage problem raised in the above-mentioned article. In addition, we consider a more general sensing model: the sensing ability diminishes as the distance increases. As energy conservation is a major concern in wireless (or sensor) networks, we also consider how to find an optimum best-coverage-path with the least energy consumption and how to find an optimum best-coverage-path that travels a small distance. In addition, we justify the correctness of the method proposed above that uses the Delaunay triangulation to solve the best coverage problem and show that the search space of the best coverage problem can be confined to the relative neighborhood graph, which can be constructed locally.

483 citations
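
Once the search is confined to a locally constructible proximity graph such as the relative neighborhood graph, finding a best-coverage path reduces to a bottleneck (minimax) shortest path: minimize, over all paths, the worst "sensing gap" of any edge. A Python sketch of that reduction, assuming the graph and per-edge weights have already been computed separately:

```python
import heapq

def best_coverage_path_cost(graph, src, dst):
    """Bottleneck shortest path: minimize the maximum edge weight on the
    path. Here `graph` maps a vertex to a list of (neighbor, weight)
    pairs; in the paper's setting, the vertices/edges come from a
    proximity graph and an edge's weight is the largest distance to the
    closest sensor encountered while traversing it (assumed precomputed)."""
    best = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        cost, u = heapq.heappop(pq)
        if u == dst:
            return cost
        if cost > best.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u]:
            c = max(cost, w)  # path cost = worst edge seen so far
            if c < best.get(v, float("inf")):
                best[v] = c
                heapq.heappush(pq, (c, v))
    return float("inf")  # dst unreachable
```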


Journal ArticleDOI
TL;DR: Simulation results show that several adaptive schemes, which dynamically adjust thresholds based on local connectivity information, offer better reachability and efficiency than previously proposed fixed-threshold schemes.
Abstract: In a multihop mobile ad hoc network, broadcasting is an elementary operation needed to support many applications. Previous work showed that naive broadcasting by flooding may cause serious redundancy, contention, and collision in the network, referred to as the broadcast storm problem, and that several threshold-based schemes perform better than flooding. However, choosing thresholds poses a dilemma between reachability and efficiency under different host densities. In this paper, we propose several adaptive schemes that dynamically adjust thresholds based on local connectivity information. Simulation results show that these adaptive schemes offer better reachability as well as efficiency compared to the previous results.

462 citations
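
As a concrete illustration of the counter-based flavor of such schemes: a host rebroadcasts only if it overheard fewer than C duplicates during a random assessment delay, and an adaptive variant picks C from the locally observed neighbor count. The specific threshold values below are invented for illustration; the paper derives its own adaptation functions.

```python
def adaptive_counter_threshold(num_neighbors: int) -> int:
    # Illustrative mapping only: sparser neighborhoods get a higher
    # threshold (favoring reachability), denser ones a lower threshold
    # (suppressing redundant rebroadcasts and easing contention).
    if num_neighbors <= 3:
        return 5
    if num_neighbors <= 10:
        return 3
    return 2

def should_rebroadcast(duplicates_heard: int, num_neighbors: int) -> bool:
    # Rebroadcast only if few duplicates were overheard while waiting.
    return duplicates_heard < adaptive_counter_threshold(num_neighbors)
```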


Journal ArticleDOI
TL;DR: A jittered oscillator featuring an amplified thermal noise source is designed to increase the output throughput and the statistical quality of the generated bit sequences; the oscillator feedback loop acts as offset compensation for the noise amplifier, solving one of the major issues in this kind of circuit.
Abstract: The design of a high-speed IC random number source macro-cell, suitable for integration in a smart card microcontroller, is presented. The oscillator sampling technique is exploited and a jittered oscillator featuring an amplified thermal noise source has been designed in order to increase the output throughput and the statistical quality of the generated bit sequences. The oscillator feedback loop acts as an offset compensation for the noise amplifier, thus solving one of the major issues in this kind of circuit. A numerical model of the proposed system has been developed which allows us to derive an analytical expression for the transition probability between successive bits in the output stream. A prototype chip has been fabricated in a standard digital 0.18 µm n-well CMOS process; it delivers a 10 Mbps throughput and fulfills the NIST FIPS and correlation-based tests for randomness. The macro-cell area, excluding pads, is 0.0016 mm² (184 µm × 86 µm) and a 2.3 mW power consumption has been measured.

393 citations


Journal ArticleDOI
TL;DR: Two fault detection schemes are presented: the first is redundancy-based, while the second uses an error detecting code; the latter is a novel scheme that leads to very efficient, high-coverage fault detection.
Abstract: The goal of the Advanced Encryption Standard (AES) is to achieve secure communication. The use of AES does not, however, guarantee reliable communication. Prior work has shown that even a single transient error occurring during the AES encryption (or decryption) process will very likely result in a large number of errors in the encrypted/decrypted data. Such faults must be detected before the data is sent, to avoid the transmission and use of erroneous data. Concurrent fault detection is important not only to protect the encryption/decryption process from random faults; it will also protect the encryption/decryption circuitry from an attacker who may maliciously inject faults in order to find the encryption secret key. In this paper, we first describe some studies of the effects that faults may have on a hardware implementation of AES by analyzing the propagation of such faults to the outputs. We then present two fault detection schemes: the first is a redundancy-based scheme, while the second uses an error detecting code. The latter is a novel scheme which leads to very efficient and high coverage fault detection. Finally, the hardware costs and detection latencies of both schemes are estimated.

379 citations
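
A software analogue of the two ideas, under assumed interfaces: `encrypt`/`decrypt` stand in for the AES datapath and `predict_parity` for the paper's code-based parity prediction logic (all three are placeholders, not the paper's hardware).

```python
def parity_bit(data: bytes) -> int:
    # XOR of all bits: the simplest error detecting code.
    p = 0
    for b in data:
        p ^= bin(b).count("1") & 1
    return p

def detect_by_redundancy(encrypt, decrypt, key, block):
    # Scheme 1 (redundancy-based): run the inverse transformation and
    # compare with the original input; costly, but catches any fault
    # that does not commute with decryption.
    ct = encrypt(key, block)
    return ct if decrypt(key, ct) == block else None  # None = fault detected

def detect_by_code(encrypt, predict_parity, key, block):
    # Scheme 2 (code-based): a parity predicted from the input is checked
    # against the parity of the actual output, catching odd-weight errors
    # at much lower cost than full redundancy.
    ct = encrypt(key, block)
    return ct if parity_bit(ct) == predict_parity(key, block) else None
```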


Journal ArticleDOI
Akashi Satoh, K. Takano
TL;DR: By introducing a dual-field multiplier, an elliptic curve (EC) cryptographic processor architecture is proposed that can support the Galois fields GF(p) and GF(2^n) for arbitrary prime numbers and irreducible polynomials.
Abstract: We propose an elliptic curve (EC) cryptographic processor architecture that can support Galois fields GF(p) and GF(2^n) for arbitrary prime numbers and irreducible polynomials by introducing a dual field multiplier. A Montgomery multiplier with an optimized data bus and an on-the-fly redundant binary converter boost the throughput of the EC scalar multiplication. All popular cryptographic functions such as DSA, EC-DSA, RSA, CRT, and prime generation are also supported. All commands are organized in a hierarchical structure according to their complexity. Our processor has high scalability and flexibility between speed, hardware area, and operand size. In a hardware evaluation using a 0.13 µm CMOS standard cell library, the high-speed design using 117.5 Kgates with a 64-bit multiplier achieved operation times of 1.21 ms and 0.19 ms for a 160-bit EC scalar multiplication in GF(p) and GF(2^n), respectively. A compact version with an 8-bit multiplier requires only 28.3 Kgates and executes the operations in 7.47 ms and 2.79 ms. Not only 160-bit operations but any bit length can be supported by any hardware configuration so long as the memory capacity is sufficient.

272 citations


Journal ArticleDOI
TL;DR: A word-based version of Montgomery multiplication (MM) is presented and used to explain the main concepts of the hardware design; the architecture gives enough freedom to select the word size and the degree of parallelism according to the available area and/or desired performance.
Abstract: This paper presents a scalable architecture for the computation of modular multiplication, based on the Montgomery multiplication (MM) algorithm. A word-based version of MM is presented and used to explain the main concepts in the hardware design. The proposed multiplier is able to work with any precision of the input operands, limited only by memory or control constraints. Its architecture gives enough freedom to select the word size and the degree of parallelism to be used, according to the available area and/or desired performance. Design trade-offs are analyzed in order to identify adequate hardware configurations for a given area or bandwidth requirement.

242 citations
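
For orientation, the word-based Montgomery loop that such a scalable datapath implements can be written in a few lines of Python; the word size W plays the role of the configurable hardware word, and precision is limited only by memory. This is the textbook word-serial algorithm, not the paper's exact dataflow.

```python
W = 32  # word size; the architecture leaves this (and the parallelism) open

def mont_mul(a, b, m, s):
    """Compute a*b*R^-1 mod m with R = 2^(W*s), m odd, a, b < m.
    One W-bit word of `a` is consumed per iteration (word-serial).
    Requires Python >= 3.8 for pow(m, -1, modulus)."""
    mask = (1 << W) - 1
    m_inv = (-pow(m, -1, 1 << W)) & mask   # -m^-1 mod 2^W
    t = 0
    for i in range(s):
        ai = (a >> (W * i)) & mask
        t += ai * b
        u = ((t & mask) * m_inv) & mask    # multiple of m zeroing the low word
        t = (t + u * m) >> W               # exact shift: low W bits are zero
    return t - m if t >= m else t          # result stays below 2m

# Usage: with s = ceil(bits(m)/W) and R = 2^(W*s),
# (a*b) % m == mont_mul(mont_mul(a, b, m, s), (R * R) % m, m, s)
```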


Journal ArticleDOI
TL;DR: This paper presents a new class of variable-to-variable-length compression codes, referred to as frequency-directed run-length (FDR) codes, that are designed using the distributions of runs of 0s in typical test sequences.
Abstract: Test data compression and test resource partitioning (TRP) are necessary to reduce the volume of test data for system-on-a-chip designs. We present a new class of variable-to-variable-length compression codes that are designed using distributions of the runs of 0s in typical test sequences. We refer to these as frequency-directed run-length (FDR) codes. We present experimental results for ISCAS 89 benchmark circuits and two IBM production circuits to show that FDR codes are extremely effective for test data compression and TRP. We derive upper and lower bounds on the compression expected for some generic parameters of the test sequences. These bounds are especially tight when the number of runs is small, thereby showing that FDR codes are robust, i.e., they are insensitive to variations in the input data stream. In order to highlight the inherent superiority of FDR codes, we present a probabilistic analysis of data compression for a memoryless data source. Finally, we derive entropy bounds for the benchmark test sets and show that the compression obtained using FDR codes is close to the entropy bounds.

232 citations
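
The code construction is compact enough to sketch: runs of 0s (each terminated by a 1) fall into groups, with group k covering run lengths 2^k - 2 through 2^(k+1) - 3 and mapping to a 2k-bit codeword, so the frequent short runs get the short codewords. A Python sketch (the end-of-vector convention for a trailing run of 0s is a detail the paper defines and is omitted here):

```python
def fdr_encode_run(run_len: int) -> str:
    """FDR codeword for a run of 0s ended by a 1: group k has a prefix
    of (k-1) ones followed by '0' and a k-bit tail giving the offset of
    the run length within the group."""
    k = 1
    while run_len > (1 << (k + 1)) - 3:   # find the group containing run_len
        k += 1
    prefix = "1" * (k - 1) + "0"
    tail = format(run_len - ((1 << k) - 2), "0{}b".format(k))
    return prefix + tail

def fdr_encode(bits: str) -> str:
    # Split the test vector into runs of 0s, each closed by a single 1.
    out, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            out.append(fdr_encode_run(run))
            run = 0
    return "".join(out)

# Example: fdr_encode("001000001") == "1000" + "1011" (runs of length 2 and 5).
```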


Journal ArticleDOI
TL;DR: A deterministic fault-tolerant and deadlock-free routing protocol in two-dimensional meshes based on dimension-order routing and the odd-even turn model is proposed, called extended X-Y routing.
Abstract: We propose a deterministic fault-tolerant and deadlock-free routing protocol in two-dimensional (2D) meshes based on dimension-order routing and the odd-even turn model. The proposed protocol, called extended X-Y routing, does not use any virtual channels by prohibiting certain locations of faults and destinations. Faults are contained in a set of disjoint rectangular regions called faulty blocks. The number of faults to be tolerated is unbounded as long as nodes outside faulty blocks are connected in the 2D mesh network. The extended X-Y routing can also be used under a special convex fault region called an orthogonal faulty block, which can be derived from a given faulty block by activating some nonfaulty nodes in the block. Extensions to partially adaptive routing, traffic- and adaptivity-balancing using virtual networks, and routing without constraints using virtual channels and virtual networks are also discussed.

225 citations
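
The baseline the protocol extends is plain X-Y dimension-order routing, sketched below; the extended protocol additionally detours around faulty blocks under odd-even turn restrictions, which this fragment deliberately omits.

```python
def xy_route(src, dst):
    """Dimension-order X-Y routing in a 2D mesh: fully correct the X
    coordinate first, then the Y coordinate. Returns the list of hops.
    (Fault handling and odd-even turn restrictions are omitted.)"""
    x, y = src
    dx, dy = dst
    hops = []
    while x != dx:                    # X dimension first...
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:                    # ...then Y
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops

# xy_route((0, 0), (2, 1)) -> [(1, 0), (2, 0), (2, 1)]
```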


Journal ArticleDOI
TL;DR: This article presents a highly regular and scalable AES hardware architecture, suited for full-custom as well as semicustom design flows, that is scalable in terms of both throughput and key size.
Abstract: This article presents a highly regular and scalable AES hardware architecture, suited for full-custom as well as for semicustom design flows. In contrast to other publications, a complete architecture (even including CBC mode) that is scalable in terms of throughput and in terms of the used key size is described. Similarities between encryption and decryption are utilized to provide a high level of performance using only a relatively small area (10,799 gate equivalents for the standard configuration). This performance is reached by balancing the combinational paths of the design. No other published AES hardware architecture provides similar balancing or comparable regularity. Implementations of the fastest configuration of the architecture provide a throughput of 241 Mbits/sec on a 0.6 µm CMOS process using standard cells.

216 citations


Journal ArticleDOI
TL;DR: A novel schedulability analysis is proposed for verifying the feasibility of large periodic task sets under the rate monotonic algorithm when the exact test cannot be applied online due to prohibitively long execution times.
Abstract: We propose a novel schedulability analysis for verifying the feasibility of large periodic task sets under the rate monotonic algorithm when the exact test cannot be applied online due to prohibitively long execution times. The proposed test has the same complexity as the original Liu and Layland (1973) bound, but it is less pessimistic, allowing it to accept task sets that would be rejected using the original approach. The performance of the proposed approach is evaluated with respect to the classical Liu and Layland method, and theoretical bounds are derived as a function of n (the number of tasks) and for the limit case of n tending to infinity. The analysis is also extended to include aperiodic servers and blocking times due to concurrency control protocols. Extensive simulations on synthetic task sets are presented to compare the effectiveness of the proposed test with respect to the Liu and Layland method and the exact response time analysis.
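
To make the comparison concrete, here are the classical Liu and Layland utilization bound and a product-form O(n) test of the kind the abstract describes (the hyperbolic bound, assuming that is the intended test): it accepts some task sets that the additive bound rejects.

```python
def liu_layland(utils):
    # Classical sufficient test: sum(U_i) <= n * (2^(1/n) - 1).
    n = len(utils)
    return sum(utils) <= n * (2 ** (1.0 / n) - 1)

def hyperbolic(utils):
    # Same O(n) complexity, less pessimistic: prod(U_i + 1) <= 2.
    p = 1.0
    for u in utils:
        p *= u + 1.0
    return p <= 2.0

# Example where the improved test is strictly better: U = (0.6, 0.25).
# liu_layland: 0.85 > 0.828 -> rejected; hyperbolic: 1.6 * 1.25 = 2.0 -> accepted.
```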

Journal ArticleDOI
TL;DR: Three physical techniques and one software-implemented technique that have been used to assess the fault tolerance features of the MARS fault-tolerant distributed real-time system are compared and analyzed and the results obtained are discussed.
Abstract: This paper addresses the issue of characterizing the respective impact of fault injection techniques. Three physical techniques and one software-implemented technique that have been used to assess the fault tolerance features of the MARS fault-tolerant distributed real-time system are compared and analyzed. After a short summary of the fault tolerance features of the MARS architecture and especially of the error detection mechanisms that were used to compare the erroneous behaviors induced by the fault injection techniques considered, we describe the common distributed testbed and test scenario implemented to perform a coherent set of fault injection campaigns. The main features of the four fault injection techniques considered are then briefly described and the results obtained are finally presented and discussed. Emphasis is put on the analysis of the specific impact and merit of each injection technique.

Journal ArticleDOI
TL;DR: Unlike existing solutions, no extra hardware is required to run the protocol at each node and there is no need for a designated node to start the scatternet formation process.
Abstract: This paper describes a protocol for the establishment of multihop ad hoc networks based on Bluetooth devices. The protocol proceeds in three phases: device discovery, partitioning of the network into Bluetooth piconets, and interconnection of the piconets into a connected scatternet. The protocol has the following desirable properties: it is executed at each node with no prior knowledge of the network topology, thus being fully distributed. The selection of the Bluetooth masters is driven by the suitability of a node to be the "best fit" for serving as a master. The generated scatternet is a connected mesh with multiple paths between any pair of nodes, thus achieving robustness. Unlike existing solutions, no extra hardware is required to run the protocol at each node and there is no need for a designated node to start the scatternet formation process. Simulation results are provided which evaluate the impact of the Bluetooth device discovery phase on the performance of the protocol.

Journal ArticleDOI
TL;DR: Three unique algorithms are developed and implemented with low-power, fast circuits that reduce the maximum percent errors resulting from binary-to-binary logarithm conversion to 0.9299 percent, 0.4314 percent, and 0.1538 percent.
Abstract: We present a unique 32-bit binary-to-binary logarithm converter, including its CMOS VLSI implementation. The converter is implemented using combinational logic only and it calculates a logarithm approximation in a single clock cycle. Unlike other complex logarithm correcting algorithms, three unique algorithms are developed and implemented with low-power and fast circuits that reduce the maximum percent errors that result from binary-to-binary logarithm conversion to 0.9299 percent, 0.4314 percent, and 0.1538 percent. Fast 4-, 16-, and 32-bit leading-one detector circuits are designed to obtain the leading-one position of an input binary word. A 32-word × 5-bit MOS ROM is used to provide 5-bit integers based on the corresponding leading-one position. Both converter area and speed have been considered in the design approach, resulting in the use of a very efficient 32-bit logarithmic shifter in the 32-bit logarithmic converter. The converter is implemented in 0.6 µm CMOS technology and requires 1,600λ × 2,800λ of chip area. Simulations of the CMOS design for the 32-bit logarithmic converter, operating at V_DD equal to 5 volts, run at 55 MHz, and the converter consumes 20 milliwatts.
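
The conversion underneath is Mitchell-style: the leading-one detector supplies the integer part of log2(x), and the bits below the leading one serve as the fraction; the paper's three correction algorithms then shrink the residual error to the figures quoted above. A sketch of the uncorrected baseline:

```python
def mitchell_log2(x: int) -> float:
    """Uncorrected binary-logarithm approximation: integer part = the
    leading-one position k (what the 4-, 16-, and 32-bit leading-one
    detectors compute in hardware); fraction = x/2^k - 1, i.e., the
    bits below the leading one, taken as-is."""
    assert x > 0
    k = x.bit_length() - 1                    # leading-one position
    return k + (x - (1 << k)) / float(1 << k)

# e.g. mitchell_log2(10) == 3.25, versus log2(10) ~= 3.3219; the paper's
# correction circuits reduce such conversion errors to below 1 percent.
```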

Journal ArticleDOI
TL;DR: A practical DDoS defense system that can protect the availability of web services during severe DDoS attacks and is evaluated based on a novel game theoretical framework, which characterizes the natural adversarial relationship between a DDoS adversary and the proposed system.
Abstract: The recent tide of Distributed Denial of Service (DDoS) attacks against high-profile web sites demonstrates how devastating DDoS attacks are and how defenseless the Internet is under such attacks. We design a practical DDoS defense system that can protect the availability of web services during severe DDoS attacks. The basic idea behind our system is to isolate and protect legitimate traffic from a huge volume of DDoS traffic when an attack occurs. Traffic that needs to be protected can be recognized and protected using efficient cryptographic techniques. Therefore, by provisioning adequate resources (e.g., bandwidth) to legitimate traffic separated by this process, we are able to provide adequate service to a large percentage of clients during DDoS attacks. The worst-case performance (effectiveness) of the system is evaluated based on a novel game-theoretical framework, which characterizes the natural adversarial relationship between a DDoS adversary and the proposed system. We also conduct a simulation study to verify a key assumption used in the game-theoretical analysis and to demonstrate the system dynamics during an attack.

Journal ArticleDOI
TL;DR: This article considers how to protect CRT-based RSA signature and decryption computation from hardware fault cryptanalysis in a highly reliable and efficient way, and proposes two novel protocols that have performance comparable to Shamir's scheme.
Abstract: This article considers the problem of how to prevent the RSA signature and decryption computation with residue number system (CRT-based) speedup from hardware fault cryptanalysis in a highly reliable and efficient approach. CRT-based speedup for an RSA signature has been widely adopted as an implementation standard ranging from large servers to very tiny smart IC cards. However, given a single erroneous computation result, hardware fault cryptanalysis can totally break the RSA system by factoring the public modulus. Countermeasures using a simple verification function (e.g., raising a signature to the power of a public key) or fault detection (e.g., an expanded modulus approach) have been reported in the literature; however, very few of these existing solutions are both sound and efficient. Unreasonably, these methods assume that a comparison instruction will always be fault-free when developing countermeasures against hardware fault cryptanalysis. Research shows that the expanded modulus approach proposed by Shamir (1997, 1999) is superior to the approach using a simple verification function when another physical cryptanalysis (e.g., timing cryptanalysis) is considered, so we intend to improve Shamir's method. In this paper, the new concepts of fault infective CRT computation and fault infective CRT recombination are proposed. Based on these new concepts, two novel protocols are developed with a rigorous proof of security. Two possible parameter settings are provided for the protocols. One setting selects a small public key, and the proposed protocols can have performance comparable to Shamir's scheme. The other setting has better performance than Shamir's scheme (i.e., comparable to conventional CRT speedup), but with a large public key. Most importantly, we wish to emphasize the importance of developing and proving the security of physically secure protocols without relying on unreliable or unreasonable assumptions, e.g., always fault-free instructions. In this paper, related protocols are also considered and carefully examined to point out possible weaknesses.
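
For orientation, here is CRT-based RSA signing with the "simple verification function" countermeasure that the paper criticizes; the final comparison is itself a single instruction a fault could skip, which is exactly what motivates the fault-infective approach. Toy sketch only, never a substitute for a vetted implementation.

```python
def rsa_sign_crt(m, p, q, d, e):
    """RSA signing with the CRT speedup plus a re-encryption check.
    p, q: primes; d, e: private/public exponents; m: padded message
    representative with 0 <= m < p*q. Python >= 3.8 for pow(x, -1, n)."""
    n = p * q
    dp, dq = d % (p - 1), d % (q - 1)
    sp = pow(m % p, dp, p)                 # two half-size exponentiations:
    sq = pow(m % q, dq, q)                 # the ~4x CRT speedup
    q_inv = pow(q, -1, p)
    s = (sq + q * ((q_inv * (sp - sq)) % p)) % n   # Garner recombination
    # Simple verification function: re-encrypt and compare. A fault in sp
    # or sq makes this check fail -- but the check is itself a comparison
    # instruction that hardware faults may also target.
    if pow(s, e, n) != m % n:
        raise ValueError("fault detected; signature withheld")
    return s
```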

Journal ArticleDOI
TL;DR: The authors identify a recursive formula from which their parallel implementation is derived and develop high-level parametric codes that are capable of generating the circuits autonomously when only the generator polynomial is given.
Abstract: This paper presents a theoretical result in the context of realizing high-speed hardware for parallel CRC checksums. Starting from the serial implementation widely reported in the literature, we have identified a recursive formula from which our parallel implementation is derived. In comparison with previous works, the new scheme is faster and more compact and is independent of the technology used in its realization. In our solution, the number of bits processed in parallel can be different from the degree of the polynomial generator. Lastly, we have also developed high-level parametric codes that are capable of generating the circuits autonomously when only the polynomial is given.
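
The same principle carries over to software: because the next CRC state is a linear function of the current state and the next w input bits, a w-bit-at-a-time update can be tabulated mechanically from the bit-serial LFSR, mirroring how the paper derives its parallel circuits from the serial one. A sketch for non-reflected CRCs with w = 8:

```python
def make_crc_bytewise(poly: int, width: int):
    """Tabulate a byte-at-a-time (w = 8) CRC from the serial LFSR step.
    `poly` is the generator polynomial without its top bit, MSB-first
    (e.g. 0x04C11DB7 for CRC-32); width >= 8 assumed, no bit reflection."""
    mask = (1 << width) - 1

    def step(state, bit):               # one tick of the serial LFSR
        fb = ((state >> (width - 1)) & 1) ^ bit
        state = (state << 1) & mask
        return state ^ poly if fb else state

    table = []
    for byte in range(256):             # 8 serial ticks, precomputed once
        s = 0
        for i in range(7, -1, -1):
            s = step(s, (byte >> i) & 1)
        table.append(s)

    def crc(data: bytes, state: int = 0) -> int:
        for b in data:                  # 8 bits consumed per table lookup
            idx = ((state >> (width - 8)) ^ b) & 0xFF
            state = ((state << 8) & mask) ^ table[idx]
        return state
    return crc

# crc32 = make_crc_bytewise(0x04C11DB7, 32)  # non-reflected CRC-32 core
```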

Journal ArticleDOI
TL;DR: An integrated framework for system-on-chip (SOC) test automation based on a new test access mechanism (TAM) architecture consisting of flexible-width test buses that can fork and merge between cores is described.
Abstract: We describe an integrated framework for system-on-chip (SOC) test automation. Our framework is based on a new test access mechanism (TAM) architecture consisting of flexible-width test buses that can fork and merge between cores. Test wrapper and TAM cooptimization for this architecture is performed by representing core tests using rectangles and by employing a novel rectangle packing algorithm for test scheduling. Test scheduling is tightly integrated with TAM optimization and it incorporates precedence and power constraints in the test schedule, while allowing the SOC integrator to designate a group of tests as preemptable. Test preemption helps avoid hardware and power consumption conflicts, thereby leading to a more efficient test schedule. Finally, we study the relationship between TAM width and tester data volume to identify an effective TAM width for the SOC. We present experimental results on our test automation framework for four benchmark SOCs.

Journal ArticleDOI
TL;DR: A deadline mechanism combines software fault tolerance in hard real-time periodic task systems with a scheduling algorithm that guarantees either the primary or the alternate version of each critical task completes in time and attempts to complete as many primaries as possible.
Abstract: A hard real-time system is usually subject to stringent reliability and timing constraints. One way to avoid missing deadlines is to trade the quality of computation results for timeliness, and software fault tolerance is often achieved with the use of redundant programs. A deadline mechanism which combines these two methods is proposed to provide software fault tolerance in hard real-time periodic task systems. We consider the problem of scheduling a set of real-time periodic tasks each of which has two versions: primary and alternate. The primary version contains more functions and produces good quality results, but its correctness is more difficult to verify. The alternate version contains only the minimum required functions and produces less precise results and its correctness is easy to verify. We propose a scheduling algorithm which 1) guarantees either the primary or alternate version of each critical task to be completed in time and 2) attempts to complete as many primaries as possible. Our basic algorithm uses a fixed priority-driven preemptive scheduling scheme to preallocate time intervals to the alternates and, at runtime, attempts to execute primaries first. An alternate will be executed only if necessary because of time or bugs.

Journal ArticleDOI
TL;DR: The AQuA architecture is described, the active replication pass-first scheme is presented in detail, and results from a study of fault detection, recovery, and blocking times are reported.
Abstract: Building dependable distributed systems from commercial off-the-shelf components is of growing practical importance. For both cost and production reasons, there is interest in approaches and architectures that facilitate building such systems. The AQuA architecture is one such approach; its goal is to provide adaptive fault tolerance to CORBA applications by replicating objects. The AQuA architecture allows application programmers to request desired levels of dependability during applications' runtimes. It provides fault tolerance mechanisms to ensure that a CORBA client can always obtain reliable services, even if the CORBA server object that provides the desired services suffers from crash failures and value faults. AQuA includes a replicated dependability manager that provides dependability management by configuring the system in response to applications' requests and changes in system resources due to faults. It uses Maestro/Ensemble to provide group communication services. It contains a gateway that intercepts standard CORBA IIOP messages to allow any standard CORBA application to use AQuA. It provides different types of replication schemes to forward messages reliably to the remote replicated objects. All of the replication schemes ensure strong data consistency among replicas. This paper describes the AQuA architecture and presents, in detail, the active replication pass-first scheme. In addition, the interface to the dependability manager and the design of the dependability manager replication are also described. Finally, we describe performance measurements that were conducted for the active replication pass-first scheme, and we present results from our study of fault detection, recovery, and blocking times.

Journal ArticleDOI
TL;DR: New Mastrovito and dual basis multiplier architectures based on these special irreducible pentanomials are proposed and rigorous analyses of their space and time complexity are given.
Abstract: State-of-the-art Galois field GF(2^m) multipliers offer advantageous space and time complexities when the field is generated by some special irreducible polynomial. To date, the best complexity results have been obtained when the irreducible polynomial is either a trinomial or an equally spaced polynomial (ESP). Unfortunately, there exist only a few irreducible ESPs in the range of interest for most applications, e.g., error-correcting codes, computer algebra, and elliptic curve cryptography. Furthermore, it is not always possible to find an irreducible trinomial of degree m in this range. For those cases where neither an irreducible trinomial nor an irreducible ESP exists, the use of irreducible pentanomials has been suggested. Irreducible pentanomials are abundant, and there are several eligible candidates for a given m. We promote the use of two special types of irreducible pentanomials. We propose new Mastrovito and dual basis multiplier architectures based on these special irreducible pentanomials and give rigorous analyses of their space and time complexity.

Journal ArticleDOI
TL;DR: The MediaBreeze architecture is proposed, which uses hardware support for efficient address generation, looping, and data reorganization (permute, packing/unpacking, transpose, etc.) and provides better performance than a 16-way processor with current SIMD extensions.
Abstract: Multimedia SIMD extensions such as MMX and AltiVec speed up media processing; however, our characterization shows that the attributes of current general-purpose processors enhanced with SIMD extensions do not match very well with the access patterns and loop structures of media programs. We find that 75 to 85 percent of the dynamic instructions in the processor instruction stream are supporting instructions necessary to feed the SIMD execution units rather than true/useful computations, resulting in the underutilization of SIMD execution units (only 1 to 12 percent of the peak SIMD execution units' throughput is achieved). Rather than focusing on exploiting more data-level parallelism (DLP), we focus on the instructions that support the SIMD computations and exploit both fine- and coarse-grained instruction-level parallelism (ILP) in the supporting instruction stream. We propose the MediaBreeze architecture, which uses hardware support for efficient address generation, looping, and data reorganization (permute, packing/unpacking, transpose, etc.). Our results on multimedia kernels show that a 2-way processor with SIMD extensions enhanced with MediaBreeze provides better performance than a 16-way processor with current SIMD extensions. In the case of application benchmarks, a 2-/4-way processor with SIMD extensions augmented with MediaBreeze outperforms a 4-/8-way processor with SIMD extensions. A first-order approximation using ASIC synthesis tools and cell-based libraries shows that this acceleration is achieved at a 10 percent increase in the area required by MMX and SSE extensions (a 0.3 percent increase in overall chip area) and 1 percent of total processor power consumption.

Journal ArticleDOI
TL;DR: Due to the nature of the BDD, the sum of disjoint products (SDP) can be implicitly represented, which avoids huge storage and high computational complexity for large multistate systems.
Abstract: A new algorithm based on the binary decision diagram (BDD) for the analysis of a system with multistate components is proposed. Each state of a multistate component is represented by a Boolean variable, and a multistate system is represented by a series of multistate fault trees. A Boolean algebra with restrictions on variables is used to address the dependence among the Boolean variables that collectively represent the same component, and a new BDD operation is proposed to realize this Boolean algebra. Due to the nature of the BDD, the sum of disjoint products (SDP) can be represented implicitly, which avoids huge storage requirements and high computational complexity for large multistate systems. Some applications are given to illustrate the use of the new algorithm.

Journal ArticleDOI
TL;DR: This paper analytically derives the average number of location updates during the interservice time for a movement-based location update scheme under fairly realistic assumptions, which is crucial for all trade-off analysis.
Abstract: Mobility management plays a central role in providing ubiquitous communications services in future wireless mobile networks. In mobility management, there are two key operations, location update and paging, commonly used in tracking mobile users on the move. Location update informs the network about a mobile user's current location, while paging is used by the network to locate a mobile user. Both operations incur signaling traffic in resource-limited wireless networks. The more frequent the location updates, the less paging is needed to locate a mobile user; thus, there is a trade-off in terms of signaling cost. Most trade-off analysis in the literature is carried out under the assumption that some time variables are exponentially distributed. However, such assumptions will not be valid, particularly for the wireless Internet. In this paper, we present some general analytical results without these assumptions, which are essential for general trade-off analysis. Specifically, we analytically derive the average number of location updates during the interservice time for a movement-based location update scheme under fairly realistic assumptions, which is crucial for all trade-off analysis. Our general analytical results make possible thorough numerical analysis for finding the optimal mobility management under various network operation scenarios.

Journal ArticleDOI
TL;DR: A design of the multipolling mechanism with the advantages of high channel utilization and low implementation overhead is shown, and the results show that the proposed mechanism is more efficient than the one discussed in the IEEE 802.11e Task Group.
Abstract: To expand support for applications with QoS requirements in wireless local area networks (WLANs), the 802.11e Task Group was formed to enhance the current IEEE 802.11 Medium Access Control (MAC) protocol. The multipolling mechanism was discussed in the task group, but some problems remain unsolved. In this paper, we show a design of the multipolling mechanism with the advantages of high channel utilization and low implementation overhead. In our proposed mechanism, wireless stations use a priority-based contention scheme to coordinate among themselves the transmission order on the channel. Moreover, we propose a polling schedule mechanism for our proposed multipoll to serve real-time traffic with constant and variable bit rates. The bounded delay requirement of the real-time traffic can be satisfied in our scheduling model. We establish an admission test to estimate the system capacity and to determine whether a new connection can be accepted. We study the performance of our proposed mechanism analytically, as well as through simulated experiments. The results show that the proposed mechanism is more efficient than the one discussed in the IEEE 802.11e Task Group.

Journal ArticleDOI
TL;DR: This work introduces the cluster-based failure recovery concept, which determines the best placement of slack within the fault-tolerant (FT) schedule so as to minimize the resulting time overhead, and provides transparent failure recovery in that a processor recovering from task failures does not disrupt the operation of other processors.
Abstract: The time-triggered model, with tasks scheduled in static (off line) fashion, provides a high degree of timing predictability in safety-critical distributed systems. Such systems must also tolerate transient and intermittent failures which occur far more frequently than permanent ones. Software-based recovery methods using temporal redundancy, such as task reexecution and primary/backup, while incurring performance overhead, are cost-effective methods of handling these failures. We present a constructive approach to integrating runtime recovery policies in a time-triggered distributed system. Furthermore, the method provides transparent failure recovery in that a processor recovering from task failures does not disrupt the operation of other processors. Given a general task graph with precedence and timing constraints and a specific fault model, the proposed method constructs the corresponding fault-tolerant (FT) schedule with sufficient slack to accommodate recovery. We introduce the cluster-based failure recovery concept which determines the best placement of slack within the FT schedule so as to minimize the resulting time overhead. Contingency schedules, also generated offline, revise this FT schedule to mask task failures on individual processors while preserving precedence and timing constraints. We present simulation results which show that, for small-scale embedded systems having task graphs of moderate complexity, the proposed approach generates FT schedules which incur about 30-40 percent performance overhead when compared to corresponding non-fault-tolerant ones.

Journal ArticleDOI
TL;DR: First, Lee distance Gray codes in Z_k^n are presented and then it is shown how these codes can directly be used to generate edge disjoint Hamiltonian cycles in k-ary n-cubes.
Abstract: Solutions for decomposing a higher dimensional torus to edge disjoint lower dimensional tori, in particular, edge disjoint Hamiltonian cycles are obtained based on the coding theory approach. First, Lee distance Gray codes in Z_k^n are presented and then it is shown how these codes can directly be used to generate edge disjoint Hamiltonian cycles in k-ary n-cubes. Further, some new classes of binary Gray codes are designed from these Lee distance Gray codes and, using these new classes of binary Gray codes, edge disjoint Hamiltonian cycles in hypercubes are generated.
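
One standard construction conveys the flavor: the modular k-ary Gray code maps consecutive integers to words of Z_k^n that differ in exactly one digit by +1 mod k, i.e., at Lee distance 1, so stepping through the code traces a Hamiltonian cycle in the k-ary n-cube. The paper's contribution is families of such codes whose cycles are mutually edge disjoint; the sketch below generates just one code.

```python
def lee_gray_word(m: int, k: int, n: int):
    """Modular k-ary Gray code: word for integer m in Z_k^n. Consecutive
    m and m+1 (mod k^n) yield words differing in one digit by +1 mod k,
    so the full sequence is a Hamiltonian cycle of the k-ary n-cube."""
    digits = [(m // k**i) % k for i in range(n)] + [0]  # base k, LSD first
    return tuple((digits[i] - digits[i + 1]) % k for i in range(n))

# e.g. k = 4, n = 2: [lee_gray_word(m, 4, 2) for m in range(16)] visits all
# 16 nodes of the 4-ary 2-cube, each step changing one coordinate by +-1 mod 4.
```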

Journal ArticleDOI
TL;DR: This work shows how voltage scaling can be scheduled to reduce energy usage while still meeting real-time deadlines.
Abstract: Many embedded systems operate under severe power and energy constraints. Voltage clock scaling is one mechanism by which energy consumption may be reduced: it is based on the fact that power consumption is a quadratic function of the voltage, while the speed is a linear function. We show how voltage scaling can be scheduled to reduce energy usage while still meeting real-time deadlines.
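
Under the model stated in the abstract (power quadratic in voltage, speed linear in it), the energy-minimal choice for an isolated job is the lowest voltage that still meets its deadline; a toy calculation of that operating point (the paper's contribution is scheduling this across competing real-time tasks):

```python
def lowest_feasible_voltage(cycles: float, deadline: float, s_max: float) -> float:
    """Normalize so voltage V in (0, 1] gives speed V * s_max (linear)
    and power proportional to V**2 (quadratic). Then energy is
    V**2 * cycles / (V * s_max), i.e., proportional to V, so the
    slowest feasible setting minimizes energy."""
    v = cycles / (deadline * s_max)     # smallest V that meets the deadline
    if v > 1.0:
        raise ValueError("infeasible even at maximum voltage")
    return v

# 5e6 cycles due in 10 ms on a core doing 1e9 cycles/s at V = 1:
# lowest_feasible_voltage(5e6, 0.01, 1e9) == 0.5 -> roughly half the energy.
```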

Journal ArticleDOI
TL;DR: A simple, sufficient test is presented for determining whether a given periodic task system will be successfully scheduled by this algorithm upon a particular uniform multiprocessor platform; this test generalizes earlier results concerning rate-monotonic scheduling upon identical multiprocessor platforms.
Abstract: The rate-monotonic algorithm is arguably one of the most popular algorithms for scheduling systems of periodic real-time tasks. The rate-monotonic scheduling of systems of periodic tasks on uniform multiprocessor platforms is considered here. A simple, sufficient test is presented for determining whether a given periodic task system will be successfully scheduled by this algorithm upon a particular uniform multiprocessor platform; this test generalizes earlier results concerning rate-monotonic scheduling upon identical multiprocessor platforms.

Journal ArticleDOI
TL;DR: SimBed, an execution-driven simulation testbed, measures the execution behavior and power consumption of embedded applications and RTOSs by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli; the results show no clear winner in timing accuracy between preemptive and cooperative systems.
Abstract: We present the modelling of embedded systems with SimBed, an execution-driven simulation testbed that measures the execution behavior and power consumption of embedded applications and RTOSs by executing them on an accurate architectural model of a microcontroller with simulated real-time stimuli. We briefly describe the simulation environment and present a study that compares three RTOSs: µC/OS-II, a popular public-domain embedded real-time operating system; Echidna, a sophisticated, industrial-strength (commercial) RTOS; and NOS, a bare-bones multirate task scheduler reminiscent of typical "roll-your-own" RTOSs found in many commercial embedded systems. The microcontroller simulated in this study is the Motorola M-CORE processor: a low-power, 32-bit CPU core with 16-bit instructions, running at 20 MHz. Our simulations show what happens when RTOSs are pushed beyond their limits, and they depict situations in which unexpected interrupts or unaccounted-for task invocations disrupt timing, even when the CPU is lightly loaded. In general, there appears to be no clear winner in timing accuracy between preemptive systems and cooperative systems. The power-consumption measurements show that RTOS overhead is a factor of two to four higher than it needs to be, compared to the energy consumption of the minimal scheduler. In addition, poorly designed idle loops can cause the system to double its energy consumption; this energy could be saved by a simple hardware sleep mechanism.