
Showing papers in "IEEE Transactions on Computers in 2000"


Journal ArticleDOI
TL;DR: The MorphoSys architecture is described, including the reconfigurable processor array, the control processor, and the data and configuration memories; the suitability of MorphoSys for the target application domain is illustrated with examples such as video compression, data encryption, and target recognition.
Abstract: This paper introduces MorphoSys, a reconfigurable computing system developed to investigate the effectiveness of combining reconfigurable hardware with general-purpose processors for word-level, computation-intensive applications. MorphoSys is a coarse-grain, integrated, and reconfigurable system-on-chip, targeted at high-throughput and data-parallel applications. It is comprised of a reconfigurable array of processing cells, a modified RISC processor core, and an efficient memory interface unit. This paper describes the MorphoSys architecture, including the reconfigurable processor array, the control processor, and data and configuration memories. The suitability of MorphoSys for the target application domain is then illustrated with examples such as video compression, data encryption and target recognition. Performance evaluation of these applications indicates improvements of up to an order of magnitude (or more) on MorphoSys, in comparison with other systems.

895 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe a potential fault-based attack in which key bits leak only through whether or not the device produces a correct answer after a temporary fault; this information is available to the adversary even if a check is performed before output.
Abstract: In order to avoid fault-based attacks on cryptographic security modules (e.g., smart-cards), some authors suggest that computation results should be checked for faults before being transmitted. In this paper, we describe a potential fault-based attack in which key bits leak only through whether or not the device produces a correct answer after a temporary fault. This information is available to the adversary even if a check is performed before output.

338 citations


Journal ArticleDOI
TL;DR: A new modified Booth encoding (MBE) scheme is proposed to improve on the performance of traditional MBE schemes, and a new algorithm is developed to construct a multiple-level conditional-sum adder (MLCSMA).
Abstract: This paper presents a design methodology for high-speed Booth-encoded parallel multipliers. For partial product generation, we propose a new modified Booth encoding (MBE) scheme to improve on the performance of traditional MBE schemes. For final addition, a new algorithm is developed to construct a multiple-level conditional-sum adder (MLCSMA). The proposed algorithm can optimize the final adder according to the given cell properties and input delay profile. Compared with a binary tree-based conditional-sum adder, the speed improvement is up to 25 percent. On average, the design developed herein reduces the total delay of the parallel multiplier by 8 percent. The whole design has been verified by gate-level simulation.

263 citations
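
To make the partial-product step concrete, here is a minimal Python sketch of classic radix-4 (modified) Booth recoding, which halves the number of partial products by mapping overlapping bit triples to digits in {-2, -1, 0, 1, 2}. It illustrates the textbook scheme the paper improves on, not the paper's new MBE circuit; the function name and bit widths are illustrative.

def booth_radix4_digits(x, n):
    """Radix-4 (modified) Booth recoding of the n-bit two's-complement
    value x: overlapping triples (b[2i+1], b[2i], b[2i-1]) map to
    digits in {-2, -1, 0, 1, 2}, least significant digit first."""
    assert n % 2 == 0
    bits = [(x >> i) & 1 for i in range(n)]
    digits, prev = [], 0                      # prev plays the role of b[-1]
    for i in range(0, n, 2):
        d = bits[i] + prev - 2 * bits[i + 1]  # b[2i] + b[2i-1] - 2*b[2i+1]
        digits.append(d)
        prev = bits[i + 1]
    return digits

# sanity check: the digits reconstruct the signed value of x
n, x = 16, -1234
ds = booth_radix4_digits(x & (2**n - 1), n)
assert sum(d * 4**i for i, d in enumerate(ds)) == x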


Journal ArticleDOI
TL;DR: Experimental results indicate that QoS negotiation, while maintaining real-time guarantees, enables graceful QoS degradation under conditions in which traditional schedulability analysis and admission control schemes fail.
Abstract: Real-time middleware services must guarantee predictable performance under specified load and failure conditions, and ensure graceful degradation when these conditions are violated. Guaranteed predictable performance typically entails reservation of resources and use of admission control. Graceful degradation, on the other hand, requires dynamic reallocation of resources to maximize the application-perceived system utility while coping with unanticipated overload and failures. We propose a model for quality-of-service (QoS) negotiation in building real-time services to meet both of the above requirements. QoS negotiation is shown to 1) outperform "binary" admission control schemes (either guaranteeing the required QoS or rejecting the service request), 2) achieve higher application-perceived system utility, and 3) deal with violations of the load and failure hypotheses. We incorporated the proposed QoS-negotiation model into an example real-time middleware service, called RTPOOL, which manages a distributed pool of shared computing resources (processors) to guarantee timeliness QoS for real-time applications. In order to guarantee timeliness QoS, the resource pool is encapsulated with its own schedulability analysis, admission control, and load-sharing support. This support differs from others in that it adheres to the proposed QoS-negotiation model. The efficacy and power of QoS negotiation are demonstrated for an automated flight control system implemented on a network of PCs running RTPOOL. This system is used to fly an F-16 fighter aircraft modeled using the Aerial Combat (ACM) F-16 Flight Simulator. Experimental results indicate that QoS negotiation, while maintaining real-time guarantees, enables graceful QoS degradation under conditions in which traditional schedulability analysis and admission control schemes fail.

245 citations


Journal ArticleDOI
TL;DR: A novel parallel-prefix architecture for high-speed modulo 2^n - 1 adders is presented, based on the idea of recirculating the generate and propagate signals instead of the traditional end-around carry approach.
Abstract: A novel parallel-prefix architecture for high-speed modulo 2^n - 1 adders is presented. The proposed architecture is based on the idea of recirculating the generate and propagate signals, instead of the traditional end-around carry approach. Static CMOS implementations verify that the proposed architecture compares favorably with the already known parallel-prefix and carry look-ahead structures.

160 citations
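
As a behavioral reference for what such an adder computes, here is a small Python model of addition modulo 2^n - 1 with the end-around carry folded back in; the paper's contribution is realizing exactly this function as a parallel-prefix circuit that recirculates generate/propagate signals instead. The double-representation-of-zero convention below is an assumption of this sketch.

def add_mod_2n_minus_1(a, b, n):
    """One's-complement (modulo 2^n - 1) addition: the carry out of
    bit n-1 is added back in at bit 0 (end-around carry).  Follows the
    common convention that the all-ones word also represents zero."""
    mask = (1 << n) - 1
    s = a + b
    return ((s & mask) + (s >> n)) & mask

assert add_mod_2n_minus_1(9, 8, 4) == (9 + 8) % 15   # 17 mod 15 = 2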


Journal ArticleDOI
TL;DR: This paper employs the cellular programming evolutionary algorithm to automatically generate two-dimensional cellular automata (CA) RNGs, and demonstrates that they rapidly produce high-quality random-number sequences.
Abstract: Finding good random number generators (RNGs) is a hard problem that is of crucial import in several fields, ranging from large-scale statistical physics simulations to hardware self-test. In this paper, we employ the cellular programming evolutionary algorithm to automatically generate two-dimensional cellular automata (CA) RNGs. Applying an extensive suite of randomness tests to the evolved CAs, we demonstrate that they rapidly produce high-quality random-number sequences. Moreover, based on observations of the evolved CAs, we are able to handcraft even better RNGs, which not only outperform previously demonstrated high-quality RNGs, but can be potentially tailored to satisfy given hardware constraints.

160 citations
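
For flavor, the following toy Python model shows the kind of generator involved: a two-dimensional CA whose cells update from their von Neumann neighborhood and whose center cell emits one bit per step. The uniform XOR rule and grid size here are illustrative stand-ins; the paper evolves nonuniform CAs in which each cell may carry a different rule.

import random

def ca2d_rng_bits(steps, size=8, seed=42):
    """Toy 2D cellular-automaton RNG: every cell XORs itself with its
    four von Neumann neighbors (cyclic boundaries) each step, and the
    center cell's state is emitted as one output bit per step."""
    rnd = random.Random(seed)
    grid = [[rnd.randint(0, 1) for _ in range(size)] for _ in range(size)]
    out = []
    for _ in range(steps):
        grid = [[grid[r][c]
                 ^ grid[(r - 1) % size][c] ^ grid[(r + 1) % size][c]
                 ^ grid[r][(c - 1) % size] ^ grid[r][(c + 1) % size]
                 for c in range(size)] for r in range(size)]
        out.append(grid[size // 2][size // 2])
    return out

bits = ca2d_rng_bits(64)          # one output bit per CA step
assert set(bits) <= {0, 1}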


Journal ArticleDOI
TL;DR: New methods are described for producing optimal binary signed-digit representations, useful in the fast computation of exponentiations; contrary to existing algorithms, the digits are produced from left to right.
Abstract: This paper describes new methods for producing optimal binary signed-digit representations. This can be useful in the fast computation of exponentiations. Contrary to existing algorithms, the digits are scanned from left to right (i.e., from the most significant position to the least significant position). This may lead to better performance in both hardware and software.

155 citations
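
The target representation here is the non-adjacent form (NAF), the optimal binary signed-digit recoding with the fewest nonzero digits. The sketch below is the classic right-to-left recoding, shown for contrast: producing an optimal recoding most-significant-digit first, as the paper does, is the harder direction.

def naf(k):
    """Non-adjacent form of k >= 0: digits in {-1, 0, 1}, least
    significant first, with no two adjacent nonzero digits; this is
    the minimal-weight binary signed-digit representation."""
    digits = []
    while k:
        if k & 1:
            d = 2 - (k & 3)      # +1 if k % 4 == 1, -1 if k % 4 == 3
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

# fewer nonzero digits means fewer multiplications in exponentiation
ds = naf(47)                     # 47 = 64 - 16 - 1
assert sum(d * 2**i for i, d in enumerate(ds)) == 47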


Journal ArticleDOI
TL;DR: This work presents a procedure by which additions and subtractions can be performed rapidly and accurately, shows that these operations are thereby competitive with their floating-point equivalents, and presents large-scale case studies showing that the average performance of the LNS exceeds floating-point in terms of both speed and accuracy.
Abstract: A new European research project aims to develop a microprocessor based on the logarithmic number system, in which a real number is represented as a fixed-point logarithm. Multiplication and division therefore proceed in minimal time with no rounding error. However, the system can only offer an overall advantage over floating-point if addition and subtraction can be performed with speed and accuracy at least equal to that of floating-point, but these operations require the interpolation of a nonlinear function which has hitherto been either time-consuming or inaccurate. We present a procedure by which additions and subtractions can be performed rapidly and accurately and show that these operations are thereby competitive with their floating-point equivalents. We then present some large-scale case studies which show that the average performance of the LNS exceeds floating-point, in terms of both speed and accuracy.

148 citations
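
The arithmetic identity behind the hard case is compact enough to show directly. In LNS, multiplication is a fixed-point addition of logarithms, but addition needs the nonlinear function log2(1 + 2^-d) that the paper interpolates; this float model is a sketch of the identity, not of the paper's interpolator.

import math

def lns_add(x, y):
    """Add two positive reals held as base-2 logarithms:
    log2(X + Y) = max + log2(1 + 2^(min - max))."""
    hi, lo = max(x, y), min(x, y)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

a, b = math.log2(3.0), math.log2(5.0)
assert abs(2.0 ** lns_add(a, b) - 8.0) < 1e-9   # 3 + 5 = 8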


Journal ArticleDOI
TL;DR: The most important new result is the space complexity of the Mastrovito multiplier for an equally-spaced polynomial, which is found to be (m^2 - Δ) XOR gates and m^2 AND gates, where Δ is the spacing factor.
Abstract: We present a new formulation of the Mastrovito multiplication matrix for the field GF(2^m) generated by an arbitrary irreducible polynomial. We study in detail several specific types of irreducible polynomials, e.g., trinomials, all-one polynomials, and equally-spaced polynomials, and obtain the time and space complexity of these designs. Particular examples illustrating the properties of the proposed architecture are also given. The complexity results established in this paper match the best complexity results known to date. The most important new result is the space complexity of the Mastrovito multiplier for an equally-spaced polynomial, which is found to be (m^2 - Δ) XOR gates and m^2 AND gates, where Δ is the spacing factor.

141 citations
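
As a behavioral reference for the operation being implemented, here is a shift-and-add multiplier for GF(2^m) modulo a trinomial x^m + x^k + 1, using the irreducible x^7 + x + 1 as a toy field. The Mastrovito construction flattens exactly this computation into a single m x m matrix-vector product over GF(2), whose gate counts the paper derives; the code is a sketch, not the matrix formulation itself.

def gf2m_mul(a, b, m=7, k=1):
    """Multiply field elements a, b (bit-vectors of polynomials) in
    GF(2^m) with field polynomial x^m + x^k + 1; the default
    x^7 + x + 1 is irreducible over GF(2)."""
    acc = 0
    for i in range(m):                     # carry-free schoolbook product
        if (b >> i) & 1:
            acc ^= a << i
    for i in range(2 * m - 2, m - 1, -1):  # reduce: x^i = x^(i-m+k) + x^(i-m)
        if (acc >> i) & 1:
            acc ^= (1 << i) | (1 << (i - m + k)) | (1 << (i - m))
    return acc

assert gf2m_mul(0b10, 0b1000000) == 0b11   # x * x^6 = x^7 = x + 1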


Journal ArticleDOI
TL;DR: A hybrid design combines the faster clock speed of a systolic array with the lower memory requirements of a shift register; a tunable parameter allows switch designers to carefully balance the trade-off between bus loading and chip area.
Abstract: With effective packet-scheduling mechanisms, modern integrated networks can support the diverse quality-of-service requirements of emerging applications. However, arbitrating between a large number of small packets on a high-speed link requires an efficient hardware implementation of a priority queue. To highlight the challenges of building scalable priority queue architectures, this paper includes a detailed comparison of four existing approaches: a binary tree of comparators, priority encoder with multiple first-in-first-out lists, shift register, and systolic array. Based on these comparison results, we propose two new architectures that scale to the large number of packets (N) and large number of priority levels (P) necessary in modern switch designs. The first architecture combines the faster clock speed of a systolic array with the lower memory requirements of a shift register, resulting in a hybrid design; a tunable parameter allows switch designers to carefully balance the trade-off between bus loading and chip area. We then extend this architecture to serve multiple output ports in a shared-memory switch. This significantly decreases complexity over the traditional approach of dedicating a separate priority queue to each outgoing link. Using the Verilog hardware description language and the Epoch silicon compiler, we have designed and simulated these two new architectures, as well as the four existing approaches. The simulation experiments compare the designs across a range of priority queue sizes and performance metrics, including enqueue/dequeue speed, chip area, and number of transistors.

137 citations
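
To ground the comparison, here is a behavioral sketch of the shift-register approach: entries sit in a sorted chain, an insertion shifts lower-priority entries down one slot (done in parallel, in a single cycle, in hardware), and dequeue pops the head. The class and its interface are illustrative; the paper's hybrid arranges blocks of such registers behind a systolic array.

class ShiftRegisterPQ:
    """Behavioral model of a shift-register priority queue: the chain
    is kept sorted, highest priority at the head.  In hardware every
    slot compares and shifts in parallel, so enqueue takes one cycle;
    this software model does the same work sequentially."""
    def __init__(self):
        self.chain = []

    def enqueue(self, prio, pkt):
        i = 0
        while i < len(self.chain) and self.chain[i][0] >= prio:
            i += 1
        self.chain.insert(i, (prio, pkt))   # one-slot shift of the tail

    def dequeue(self):
        return self.chain.pop(0) if self.chain else None

pq = ShiftRegisterPQ()
for p, name in [(3, "a"), (9, "b"), (5, "c")]:
    pq.enqueue(p, name)
assert pq.dequeue() == (9, "b")             # highest priority first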


Journal ArticleDOI
TL;DR: A new definition of the Montgomery inverse is given, and efficient algorithms for computing the classical modular inverse, the Kaliski-Montgomery inverse, and the new Montgomery inverse are introduced.
Abstract: We modify an algorithm given by Kaliski to compute the Montgomery inverse of an integer modulo a prime number. We also give a new definition of the Montgomery inverse, and introduce efficient algorithms for computing the classical modular inverse, the Kaliski-Montgomery inverse, and the new Montgomery inverse. The proposed algorithms are suitable for software implementations on general-purpose microprocessors.
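
A reference model of the new definition may help: the point is that inversion maps a Montgomery residue a·2^m mod p directly to the Montgomery residue of the inverse, so a chain of Montgomery multiplications never has to leave the domain. The sketch below uses Python's pow for clarity; the paper instead builds the operation from Kaliski-style binary phases with no division.

def mont_inverse(a_bar, p, m):
    """Map the Montgomery residue a*2^m mod p to a^(-1)*2^m mod p.
    Reference model only (pow-based, Python 3.8+); the paper derives
    division-free binary algorithms for the same map."""
    r = 1 << m                            # Montgomery radix 2^m
    a = a_bar * pow(r, -1, p) % p         # recover a from its residue
    return pow(a, -1, p) * r % p          # residue of the inverse

p, m, a = 13, 4, 5                        # toy prime and radix
a_bar = a * (1 << m) % p                  # Montgomery residue of a
assert mont_inverse(a_bar, p, m) == pow(a, -1, p) * (1 << m) % p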

Journal ArticleDOI
TL;DR: A class of count-and-threshold mechanisms, collectively named α-count, is presented that is able to discriminate between transient faults and intermittent faults in computing systems; the mechanisms adopt a mathematically defined structure that is simple enough to analyze with standard tools.
Abstract: This paper presents a class of count-and-threshold mechanisms, collectively named α-count, which are able to discriminate between transient faults and intermittent faults in computing systems. For many years, commercial systems have been using transient fault discrimination via threshold-based techniques. We aim to contribute to the utility of count-and-threshold schemes by exploring their effects on the system. We adopt a mathematically defined structure, which is simple enough to analyze by standard tools. α-count is equipped with internal parameters that can be tuned to suit environmental variables (such as the transient fault rate and intermittent fault occurrence patterns). We carried out an extensive behavior analysis for two versions of the count-and-threshold scheme, assuming, first, exponentially distributed fault occurrences and, then, more realistic fault patterns.
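
A minimal sketch of a count-and-threshold mechanism in the α-count style, with arbitrary parameter values assumed here: each observed fault raises a score, each fault-free interval decays it geometrically, and crossing the threshold flags the component as suffering intermittent rather than merely transient faults. Tuning exactly these parameters against fault-rate assumptions is what the paper analyzes.

def alpha_count(observations, decay=0.9, threshold=3.0):
    """Count-and-threshold discrimination: a fault adds 1 to the
    score, a fault-free interval multiplies it by 'decay' (< 1), and
    reaching 'threshold' signals an intermittent fault.  Parameter
    values are illustrative, not the paper's."""
    alpha = 0.0
    for t, faulty in enumerate(observations):
        alpha = alpha + 1.0 if faulty else alpha * decay
        if alpha >= threshold:
            return t            # interval at which the unit is flagged
    return None

# isolated transients decay away; a burst crosses the threshold
assert alpha_count([1, 0, 0, 0, 1, 0, 0, 0]) is None
assert alpha_count([1, 1, 1, 0, 1, 1, 1, 1]) is not None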

Journal ArticleDOI
TL;DR: A method based on argument reduction and series expansion is proposed that allows fast evaluation of reciprocals, square roots, inverse square roots, and some elementary functions in high precision; the strength of this method is that the same scheme allows the computation of all these functions.
Abstract: This paper deals with the computation of reciprocals, square roots, inverse square roots, and some elementary functions using small tables, small multipliers, and, for some functions, a final "large" (almost full-length) multiplication. We propose a method, based on argument reduction and series expansion, that allows fast evaluation of these functions in high precision. The strength of this method is that the same scheme allows the computation of all these functions. We estimate the delay, the size/number of tables, and the size/number of multipliers and compare with other related methods.
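
A float-level sketch of the argument-reduction idea for one of the functions, the reciprocal: a small table indexed by the leading fraction bits gives a rough y ≈ 1/x, making e = xy - 1 tiny, so a short series in e, finished by one near-full-length multiply, recovers high precision. The table size and series order are illustrative assumptions, not the paper's exact design.

def reciprocal(x, table_bits=6):
    """1/x for x in [1, 2) by argument reduction plus a short series:
    a 2^table_bits-entry table gives y ~ 1/x, then e = x*y - 1 is
    small and y*(1 - e + e*e) approximates 1/x with error O(e^3)."""
    assert 1.0 <= x < 2.0
    idx = int((x - 1.0) * (1 << table_bits))           # leading bits of x
    y = 1.0 / (1.0 + (idx + 0.5) / (1 << table_bits))  # table entry
    e = x * y - 1.0                                    # |e| <~ 2^-(table_bits+1)
    return y * (1.0 - e + e * e)                       # series, one big multiply

assert abs(reciprocal(1.2345) - 1 / 1.2345) < 1e-6     # ~2^-21 accuracy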

Journal ArticleDOI
TL;DR: In this article, two file assignment algorithms based on open queuing networks are proposed that aim to simultaneously balance the load across all disks and minimize the variance of the service time at each disk.
Abstract: We address the problem of assigning nonpartitioned files in a parallel I/O system where the file accesses exhibit Poisson arrival rates and fixed service times. We present two new file assignment algorithms based on open queuing networks which aim to simultaneously balance the load across all disks and minimize the variance of the service time at each disk. We first present an off-line algorithm, Sort Partition, which assigns to each disk files with similar service times. Next, we show that, assuming a perfectly balanced file assignment can be found for a given set of files, Sort Partition will find the one with minimal mean response time. We then present an on-line algorithm, Hybrid Partition, that assigns groups of files with similar service times in successive intervals while guaranteeing that the load imbalance at any point does not exceed a certain threshold. We report on synthetic experiments which exhibit skew in file accesses and sizes, and we compare the performance of our new algorithms with the vanilla greedy file allocation algorithm.
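
A simplified sketch of the off-line Sort Partition idea, under the assumption of equal access rates so that load reduces to summed service time: sort the files by service time and cut the sorted order into consecutive groups of roughly equal load, so each disk serves files with similar service times (low per-disk variance).

def sort_partition(service_times, n_disks):
    """Assign file indices to disks: sort by service time, then cut
    the sorted sequence into n_disks consecutive groups of roughly
    equal total load.  Consecutive groups keep per-disk service-time
    variance low, which is the goal the paper formalizes."""
    order = sorted(range(len(service_times)), key=service_times.__getitem__)
    target = sum(service_times) / n_disks
    disks, cur, load = [], [], 0.0
    for f in order:
        st = service_times[f]
        # close the group when stopping is closer to the target load
        # than also taking this file (keep one group per remaining disk)
        if (cur and len(disks) < n_disks - 1
                and abs(load + st - target) > abs(load - target)):
            disks.append(cur)
            cur, load = [], 0.0
        cur.append(f)
        load += st
    disks.append(cur)
    return disks

print(sort_partition([1.0, 1.5, 2.0, 7.0, 7.5, 8.0], 3))
# -> [[0, 1, 2, 3], [4], [5]] with loads 11.5, 7.5, 8.0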

Journal ArticleDOI
TL;DR: A Stream Memory Controller (SMC) system that combines compile-time detection of streams with execution-time selection of the access order and issue and is practical to implement, using existing compiler technology and requiring only a modest amount of special purpose hardware.
Abstract: Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations such as scientific vector processing or multimedia (de)compression. Although these computations lack the temporal locality of reference that makes traditional caching schemes effective, they have predictable access patterns. Since most modern DRAM components support modes that make it possible to perform some access sequences faster than others, the predictability of the stream accesses makes it possible to reorder them to get better memory performance. We describe a Stream Memory Controller (SMC) system that combines compile-time detection of streams with execution-time selection of the access order and issue. The SMC effectively prefetches read-streams, buffers write-streams, and reorders the accesses to exploit the existing memory bandwidth as much as possible. Unlike most other hardware prefetching or stream buffer designs, this system does not increase bandwidth requirements. The SMC is practical to implement, using existing compiler technology and requiring only a modest amount of special purpose hardware. We present simulation results for fast-page mode and Rambus DRAM memory systems and we describe a prototype system with which we have observed performance improvements for inner loops by factors of 13 over traditional access methods.

Journal ArticleDOI
TL;DR: This paper defines a new memory consistency model, called Location Consistency (LC), in which the state of a memory location is modeled as a partially ordered multiset (pomset) of write and synchronization operations.
Abstract: Existing memory models and cache consistency protocols assume the memory coherence property, which requires that all processors observe the same ordering of write operations to the same location. In this paper, we address the problem of defining a memory model that does not rely on the memory coherence assumption, as well as the problem of designing a cache consistency protocol based on such a memory model. We define a new memory consistency model, called Location Consistency (LC), in which the state of a memory location is modeled as a partially ordered multiset (pomset) of write and synchronization operations. We prove that LC is strictly weaker than existing memory models, but is still equivalent to stronger models for the common case of parallel programs that have no data races. We also describe a new multiprocessor cache consistency protocol based on the LC memory model. We prove that this LC protocol obeys the LC memory model. The LC protocol does not need to enforce single write ownership of memory blocks. As a result, the LC protocol is simpler and more scalable than existing snooping and directory-based cache consistency protocols.

Journal ArticleDOI
TL;DR: This work proposes sacrificing some performance in exchange for energy efficiency by filtering cache references through an unusually small first level cache, which results in a 51 percent reduction in the energy-delay product when compared to a conventional design.
Abstract: Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. Caches typically are implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches can consume a significant amount of power. In many applications, such as portable devices, energy efficiency is more important than performance. We propose sacrificing some performance in exchange for energy efficiency by filtering cache references through an unusually small first level cache. We refer to this structure as the filter cache. A second level cache, similar in size and structure to a conventional first level cache, is positioned behind the filter cache and serves to mitigate the performance loss. Extensive experiments indicate that a small filter cache still can achieve a high hit rate and good performance. This approach allows the second level cache to be in a low power mode most of the time, thus resulting in power savings. The filter cache is particularly attractive in low power applications, such as the embedded processors used for communication and multimedia applications. For example, experimental results across a wide range of embedded applications show that a direct mapped 256-byte filter cache achieves a 58 percent power reduction while reducing performance by 21 percent. This trade-off results in a 51 percent reduction in the energy-delay product when compared to a conventional design.
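
The premise is easy to see in a toy trace-driven model: because references in loops have small working sets, even a tiny direct-mapped cache catches most of them, so the larger cache behind it can idle in a low-power mode. The sizes and the trace below are illustrative only, not the paper's experimental setup.

def filter_cache_hit_rate(addrs, lines=8, line_bytes=32):
    """Direct-mapped filter-cache model (8 x 32 B = 256 B here):
    count the fraction of byte addresses served by the tiny cache;
    only the misses need to wake the conventional L1 behind it."""
    tags = [None] * lines
    hits = 0
    for a in addrs:
        line = a // line_bytes
        idx, tag = line % lines, line // lines
        if tags[idx] == tag:
            hits += 1
        else:
            tags[idx] = tag               # refill on miss
    return hits / len(addrs)

trace = [i % 128 for i in range(10000)]   # a loop over a 128-byte buffer
assert filter_cache_hit_rate(trace) > 0.99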

Journal ArticleDOI
TL;DR: This paper proposes and evaluates an approach called delayed addition that reduces the carry-propagation bottleneck and improves the performance of arithmetic calculations and presents both integer and floating-point designs that use the technique.
Abstract: The speed of arithmetic calculations in configurable hardware is limited by carry propagation, even with the dedicated hardware found in recent FPGAs. This paper proposes and evaluates an approach called delayed addition that reduces the carry-propagation bottleneck and improves the performance of arithmetic calculations. Our approach employs the idea used in Wallace trees to store the results in an intermediate form and delay addition until the end of a repeated calculation such as accumulation or dot-product; this effectively removes carry propagation overhead from the calculation's critical path. We present both integer and floating-point designs that use our technique. Our pipelined integer multiply-accumulate (MAC) design is based on a fairly traditional multiplier design, but with delayed addition as well. This design achieves a 72 MHz clock rate on an XC4036xla-9 FPGA and a 170 MHz clock rate on an XCV300epq240-8 FPGA. Next, we present a 32-bit floating-point accumulator based on delayed addition. Here, delayed addition requires a novel alignment technique that decouples the incoming operands from the accumulated result. A conservative version of this design achieves a 40 MHz clock rate on an XC4036xla-9 FPGA and a 97 MHz clock rate on an XCV100epq240-8 FPGA. We also present a 32-bit floating-point accumulator design with compiler-managed overflow avoidance that achieves an 80 MHz clock rate on an XC4036xla-9 FPGA and a 150 MHz clock rate on an XCV100epq240-8 FPGA.
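
The core trick is easy to model in software: keep the running total as a redundant (sum, carry) pair, fold each new operand in with a carry-free 3:2 compression, and perform the only real carry-propagate addition once at the end. This integer model (modulo 2^width) is a sketch of the idea, not of the paper's FPGA pipelines.

def carry_save_accumulate(values, width=64):
    """Delayed addition via a carry-save accumulator: each addend is
    folded in with a 3:2 compression (XOR for sum bits, majority
    shifted left for carry bits), so no carry propagates inside the
    loop; one ordinary add at the end produces the final result."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for v in values:
        # 3:2 compressor: s + c + v == s' + c'  (mod 2^width)
        t = s ^ c ^ v
        c = ((s & c) | (s & v) | (c & v)) << 1 & mask
        s = t
    return (s + c) & mask         # the single delayed carry-propagate add

vals = list(range(1, 1001))
assert carry_save_accumulate(vals) == sum(vals)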

Journal ArticleDOI
TL;DR: In this paper, a new method which uses continuous learning automata to solve the capacity assignment problem is introduced. But, the authors assume that the traffic consists of different classes of packets with different average packet lengths and priorities.
Abstract: The Capacity Assignment (CA) problem focuses on finding the best possible set of capacities for the links that satisfies the traffic requirements in a prioritized network while minimizing the cost. Most approaches consider a single class of packets flowing through the network, but, in reality, different classes of packets with different packet lengths and priorities are transmitted over the networks. In this paper, we assume that the traffic consists of different classes of packets with different average packet lengths and priorities. We shall look at three different solutions to this problem. K. Marayuma and D.T. Tang (1977) proposed a single algorithm composed of several elementary heuristic procedures. A. Levi and C. Ersoy (1994) introduced a simulated annealing approach that produced substantially better results. In this paper, we introduce a new method which uses continuous learning automata to solve the problem. Our new schemes produce superior results when compared with either of the previous solutions and is, to our knowledge, currently the best known solution.

Journal ArticleDOI
TL;DR: Timed messages guarantee deterministic operation by presenting consistent message versions to the replicated tasks; they are very effective since they neither require communication between the local schedulers nor restrict the use of on-line flexible scheduling, preemptions, and nonidentically replicated task sets.
Abstract: Fault-tolerant real-time systems are typically based on active replication, where replicated entities are required to deliver their outputs in an identical order within a given time interval. Distributed scheduling of replicated tasks, however, violates this requirement if on-line scheduling, preemptive scheduling, or scheduling of dissimilar replicated task sets is employed. This problem of inconsistent task outputs has been solved previously by coordinating the decisions of the local schedulers such that replicated tasks are executed in an identical order. Global coordination results either in an extremely high communication effort to agree on each schedule decision or in an overly restrictive execution model where on-line scheduling, arbitrary preemptions, and nonidentically replicated task sets are not allowed. To overcome these restrictions, a new method, called timed messages, is introduced. Timed messages guarantee deterministic operation by presenting consistent message versions to the replicated tasks. This approach is based on simulated common knowledge and a sparse time base. Timed messages are very effective since they neither require communication between the local schedulers nor do they restrict the use of on-line flexible scheduling, preemptions, and nonidentically replicated task sets.

Journal ArticleDOI
TL;DR: It is concluded that the new rounding algorithm is the fastest rounding algorithm, provided that an injection can be added in during the reduction of the partial products into a carry-save encoded digit string.
Abstract: A new IEEE compliant floating-point rounding algorithm for computing the rounded product from a carry-save representation of the product is presented. The new rounding algorithm is compared with the rounding algorithms of Yu and Zyner (1995) and of Quach et al. (1991). For each rounding algorithm, a logical description and a block diagram is given, the correctness is proven, and the latency is analyzed. We conclude that the new rounding algorithm is the fastest rounding algorithm, provided that an injection (which depends only on the rounding mode and the sign) can be added in during the reduction of the partial products into a carry-save encoded digit string. In double precision format, the latency of the new rounding algorithm is 12 logic levels compared to 14 logic levels in the algorithm of Quach et al. and 16 logic levels in the algorithm of Yu and Zyner.

Journal ArticleDOI
TL;DR: A new family of voting algorithms, called Omission Mean Subsequence Reduced (OMSR), implicitly recognizes and exploits omissive behavior in malicious faults while still maintaining full Byzantine fault tolerance; OMSR voting algorithms are shown to be more fault-tolerant than previous voting algorithms whenever any of the currently active faults is omissive.
Abstract: In a fault-tolerant distributed system, it is often necessary for nonfaulty processes to agree on the value of a shared data item. The criterion of Approximate Agreement does not require processes to achieve exact agreement on a value; rather, they need only agree to within a predefined numerical tolerance. Approximate Agreement can be achieved through convergent voting algorithms. Previous research has studied convergent voting algorithms under mixed-mode or hybrid fault models, such as the Thambidurai and Park hybrid fault model, comprised of three fault modes: asymmetric, symmetric, and benign. This paper makes three major contributions to the state of the art in fault-tolerant convergent voting. (1) We partition both the asymmetric and symmetric fault modes into disjoint omissive and transmissive submodes; the resulting five-mode hybrid fault model is a superset of previous hybrid fault models. (2) We present a new family of voting algorithms, called Omission Mean Subsequence Reduced (OMSR), which implicitly recognize and exploit omissive behavior in malicious faults while still maintaining full Byzantine fault tolerance. (3) We show that OMSR voting algorithms are more fault-tolerant than previous voting algorithms if any of the currently active faults is omissive.
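
For orientation, here is the classic Mean Subsequence Reduced (MSR) convergent voting step that the OMSR family builds on: sort the values received, discard the tau largest and tau smallest, and average the rest. How OMSR credits omissive faults (values that simply never arrive) is the paper's refinement and is only hinted at in the comment below.

def msr_vote(values, tau):
    """One Mean Subsequence Reduced (MSR) voting round: trimming the
    tau smallest and tau largest values bounds the influence of up to
    tau asymmetric (Byzantine) faults; the mean of the rest converges
    across rounds.  In the omissive setting the paper studies, a
    sender that stays silent contributes no value at all, shrinking
    the multiset instead of consuming the trimming budget."""
    assert len(values) > 2 * tau
    v = sorted(values)
    kept = v[tau:len(v) - tau]
    return sum(kept) / len(kept)

# one wild value from a Byzantine sender is trimmed away
assert abs(msr_vote([9.8, 10.0, 10.1, 10.2, 999.0], tau=1) - 10.1) < 1e-9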

Journal ArticleDOI
TL;DR: Hardware designs, arithmetic algorithms, and software support for a family of variable-precision, interval arithmetic processors that give the programmer the ability to detect and, if desired, to correct implicit errors in finite precision numerical computations.
Abstract: Traditional computer systems often suffer from roundoff error and catastrophic cancellation in floating point computations. These systems produce apparently high precision results with little or no indication of the accuracy. This paper presents hardware designs, arithmetic algorithms, and software support for a family of variable-precision, interval arithmetic processors. These processors give the programmer the ability to detect and, if desired, to correct implicit errors in finite precision numerical computations. They also provide the ability to solve problems that cannot be solved efficiently using traditional floating point computations. Execution time estimates indicate that these processors are two to three orders of magnitude faster than software packages that provide similar functionality.
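
The guarantee such processors provide is containment: every operation returns an interval certain to enclose the true real result, with endpoints rounded outward. A tiny model in Python, where exact rational arithmetic stands in for the hardware's directed rounding:

from fractions import Fraction as F

def iv_add(a, b):
    """[a.lo + b.lo, a.hi + b.hi] always encloses the true sum."""
    return (a[0] + b[0], a[1] + b[1])

def iv_mul(a, b):
    """The product interval is bounded by the four endpoint products."""
    p = [x * y for x in a for y in b]
    return (min(p), max(p))

x = (F(333, 1000), F(334, 1000))      # an enclosure of 1/3
y = iv_mul(iv_add(x, x), x)           # (x + x) * x encloses 2/9
assert y[0] <= F(2, 9) <= y[1]        # the true value is never lost
print(float(y[1] - y[0]))             # the interval width exposes the error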

Journal ArticleDOI
TL;DR: This paper studies a scheme that guarantees timely recovery from multiple faults within hard real-time constraints in uniprocessor systems and develops a necessary and sufficient feasibility-check algorithm for fault-tolerant scheduling with complexity O(n^2 · κ).
Abstract: Real-time systems are being increasingly used in several applications which are time-critical in nature. Fault tolerance is an essential requirement of such systems, due to the catastrophic consequences of not tolerating faults. In this paper, we study a scheme that guarantees the timely recovery from multiple faults within hard real-time constraints in uniprocessor systems. Assuming earliest-deadline-first scheduling (EDF) for aperiodic preemptive tasks, we develop a necessary and sufficient feasibility-check algorithm for fault-tolerant scheduling with complexity O(n^2 · κ), where n is the number of tasks to be scheduled and κ is the maximum number of faults to be tolerated.

Journal ArticleDOI
TL;DR: The aim of this paper is to accelerate division, square root, and square root reciprocal computations when the Goldschmidt method is used on a pipelined multiplier, by replacing the last iteration with the addition of a correcting term that can be looked up during the early iterations.
Abstract: The aim of this paper is to accelerate division, square root, and square root reciprocal computations when the Goldschmidt method is used on a pipelined multiplier. This is done by replacing the last iteration with the addition of a correcting term that can be looked up during the early iterations. We describe several variants of the Goldschmidt algorithm, assuming a 4-cycle pipelined multiplier, and discuss the number of cycles and the error achieved. Extensions to multipliers other than 4-cycle ones are given. If we call G_m the Goldschmidt algorithm with m iterations, our variants allow us to reach an accuracy that is between that of G_3 and that of G_4, with a number of cycles equal to that of G_3.
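
For reference, a float-level sketch of Goldschmidt division, the iteration the paper accelerates: numerator and denominator are both multiplied by F = 2 - D each step, so D converges quadratically to 1 and N to the quotient; the two multiplies are independent, which is what keeps a pipelined multiplier busy. The iteration count and input scaling below are illustrative.

def goldschmidt_div(a, b, iters=5):
    """Goldschmidt division a/b with b prescaled into [1, 2): after k
    iterations the relative error is (1 - b)^(2^k), i.e., convergence
    is quadratic.  The paper's variants replace the last iteration
    with a table-derived correcting term."""
    assert 1.0 <= b < 2.0
    n, d = a, b
    for _ in range(iters):
        f = 2.0 - d                 # correction factor
        n, d = n * f, d * f         # two independent multiplies
    return n

assert abs(goldschmidt_div(1.0, 1.5) - 2.0 / 3.0) < 1e-9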

Journal ArticleDOI
TL;DR: A new fast and efficient reconfiguration algorithm is proposed and empirical study shows that the new algorithm indeed produces good results in terms of the percentages of harvest and degradation of VLSI/WSI arrays.
Abstract: This paper considers the problem of reconfiguring two-dimensional degradable VLSI/WSI arrays under the constraint of row and column rerouting. The goal of the reconfiguration problem is to derive a fault-free subarray T from the defective host array such that the dimensions of T are larger than some specified minimum. This problem has been shown to be NP-complete under various switching and routing constraints. However, we show that a special case of the reconfiguration problem is optimally solvable in linear time. Using this result, a new fast and efficient reconfiguration algorithm is proposed. Empirical study shows that the new algorithm indeed produces good results in terms of the percentages of harvest and degradation of VLSI/WSI arrays.

Journal ArticleDOI
TL;DR: In this article, a branch-and-bound algorithm for computing an orthogonal grid drawing with the minimum number of bends of a biconnected planar graph is presented.
Abstract: We describe a branch-and-bound algorithm for computing an orthogonal grid drawing with the minimum number of bends of a biconnected planar graph. Such an algorithm is based on an efficient enumeration schema of the embeddings of a planar graph and on several new methods for computing lower bounds on the number of bends. We experiment with this algorithm on a large test suite and compare the results with the state of the art. The experiments show the feasibility of the approach as well as its limitations. Further, the experiments show how minimizing the number of bends has positive effects on other quality measures of the effectiveness of the drawing. We also present a new method for dealing with vertices of degree larger than four.

Journal ArticleDOI
TL;DR: The mixed traffic scheduler (MTS) for CAN is presented, which provides higher schedulability than fixed-priority schemes like deadline-monotonic (DM) while incurring less overhead than dynamic earliest-deadline (ED) scheduling.
Abstract: The Controller Area Network (CAN) is being widely used in real-time control applications such as automobiles, aircraft, and automated factories. In this paper, we present the mixed traffic scheduler (MTS) for CAN, which provides higher schedulability than fixed-priority schemes like deadline-monotonic (DM) while incurring less overhead than dynamic earliest-deadline (ED) scheduling. We also describe how MTS can be implemented on existing CAN network adapters such as Motorola's TouCAN. In previous work, we had shown MTS to be far superior to DM in schedulability performance. In this paper, we present implementation overhead measurements showing that the processing needed to support MTS consumes only about 5 to 6 percent of CPU time. Considering its schedulability advantage, this makes MTS ideal for use in control applications.

Journal ArticleDOI
TL;DR: A new period-based approach to workload partitioning and assignment for very large distributed real-time systems, in which software components are typically organized hierarchically, and hardware components potentially span several shared and/or dedicated links.
Abstract: We propose a new approach to the problem of workload partitioning and assignment for very large distributed real-time systems, in which software components are typically organized hierarchically, and hardware components potentially span several shared and/or dedicated links. Existing approaches for load partitioning and assignment are based on either schedulability or communication. The first category attempts to construct a feasible schedule for various assignments and chooses the one that minimizes task lateness (or other similar criteria), while the second category partitions the workload heuristically in accordance with the amount of intertask communication. We propose, and argue for, a (new) third category based on task periods, which, among others, combines the ability of handling heterogeneity with excellent scalability. Our algorithm is a recursive invocation of two stages: clustering and assignment. The clustering stage partitions tasks and processors into clusters. The assignment stage maps task clusters to processor clusters. A later scheduling stage will compute a feasible schedule, if any, when the size of processor clusters reduces to one at the bottom of the recursion tree. We introduce a new clustering heuristic and evaluate elements of the period-based approach using simulations to verify its suitability for large real-time applications. Also presented is an example application drawn from the field of command and control that has the potential to benefit significantly from the proposed approach.

Journal ArticleDOI
TL;DR: It is shown that, to maintain point-to-point and broadcast connectivity, there must be at least f extra stages to tolerate f switch failures, and it is proved that an n-dimensional multistage interconnection network is optimally fault-tolerant if and only if the mask vectors of every n consecutive stages span the n-dimensional vector space.
Abstract: Adams and Siegel (1982) proposed an extra stage cube interconnection network that tolerates one switch failure with one extra stage. We extend their results and discover a class of extra stage interconnection networks that tolerate multiple switch failures with a minimal number of extra stages. Adopting the same fault model as Adams and Siegel, the faulty switches can be bypassed by a pair of demultiplexer/multiplexer combinations. It is easy to show that, to maintain point-to-point and broadcast connectivity, there must be at least f extra stages to tolerate f switch failures. We present the first known construction of an extra stage interconnection network that meets this lower bound. This n-dimensional multistage interconnection network has n+f stages and tolerates f switch failures. An n-bit label called a mask is used for each stage; it indicates the bit differences between the two inputs coming into a common switch. We designed the fault-tolerant construction such that it repeatedly uses the singleton basis of the n-dimensional vector space as the stage mask vectors. This construction is further generalized, and we prove that an n-dimensional multistage interconnection network is optimally fault-tolerant if and only if the mask vectors of every n consecutive stages span the n-dimensional vector space.
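
The optimality criterion is easy to check mechanically. The sketch below builds an XOR (GF(2)) basis per window of n consecutive stage masks and verifies full rank; cycling the singleton basis e_1, ..., e_n as stage masks, as the construction in the paper does, keeps every window full rank. The mask values here are illustrative.

def rank_gf2(vecs):
    """Rank of integer bit-vectors over GF(2), by leading-bit pivots."""
    pivots = {}
    for v in vecs:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def optimally_fault_tolerant(masks, n):
    """The paper's criterion: every n consecutive stage masks must
    span the n-dimensional vector space GF(2)^n."""
    return all(rank_gf2(masks[i:i + n]) == n
               for i in range(len(masks) - n + 1))

# n = 3 with f = 3 extra stages: cycling the singleton basis works
masks = [0b001, 0b010, 0b100] * 2          # n + f = 6 stage masks
assert optimally_fault_tolerant(masks, 3)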