
Showing papers in "IEEE Transactions on Computers in 1994"


Journal ArticleDOI
TL;DR: An additive fuzzy system can uniformly approximate any real continuous function on a compact domain to any degree of accuracy.
Abstract: An additive fuzzy system can uniformly approximate any real continuous function on a compact domain to any degree of accuracy. An additive fuzzy system approximates the function by covering its graph with fuzzy patches in the input-output state space and averaging patches that overlap. The fuzzy system computes a conditional expectation E[Y|X] if we view the fuzzy sets as random sets. Each fuzzy rule defines a fuzzy patch and connects commonsense knowledge with state-space geometry. Neural or statistical clustering systems can approximate the unknown fuzzy patches from training data. These adaptive fuzzy systems approximate a function at two levels. At the local level the neural system approximates and tunes the fuzzy rules. At the global level the rules or patches approximate the function.

1,282 citations
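
The patch-and-average construction is small enough to sketch in code. The following Python toy is not from the paper: the Gaussian set functions, the patch width, and the target function sin(x) are illustrative choices. It covers the graph with fuzzy patches and outputs the membership-weighted centroid, the conditional-expectation view mentioned above.

import math

def additive_fuzzy(x, rules, width=0.4):
    """Additive fuzzy system: average the rule patches that fire at x."""
    # Gaussian set membership is an assumption for illustration; the
    # approximation theorem covers broad classes of set shapes.
    weights = [math.exp(-((x - cx) / width) ** 2) for cx, _ in rules]
    total = sum(weights)
    # Membership-weighted centroid of the patch outputs: reads as the
    # conditional expectation E[Y|X=x] when patches are viewed as random sets.
    return sum(w * cy for w, (_, cy) in zip(weights, rules)) / total

# Cover the graph of sin on [0, pi] with seven fuzzy patches.
rules = [(c, math.sin(c)) for c in [i * math.pi / 6 for i in range(7)]]
for x in (0.5, 1.0, 2.0):
    print("x=%.1f  approx=%+.3f  sin=%+.3f" % (x, additive_fuzzy(x, rules), math.sin(x)))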


Journal ArticleDOI
TL;DR: High quality pseudorandom pattern generators built around rule 90 and 150 programmable cellular automata with a rule selector are proposed as running key generators in stream ciphers; both schemes provide better security against different types of attacks.
Abstract: This paper deals with the theory and application of Cellular Automata (CA) for a class of block ciphers and stream ciphers. Based on CA state transitions, certain fundamental transformations are defined which are block ciphering functions of the proposed enciphering scheme. These fundamental transformations are found to generate the simple (alternating) group of even permutations, which in turn is a subgroup of the permutation group. These functions are implemented with a class of programmable cellular automata (PCA) built around rules 51, 153, and 195. Further, high quality pseudorandom pattern generators built around rule 90 and 150 programmable cellular automata with a rule selector (i.e., combining function) are proposed as running key generators in stream ciphers. Both schemes provide better security against different types of attacks. With a simple, regular, modular and cascadable structure, CA-based schemes are ideally suited to VLSI implementation.

381 citations
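
A rough sketch of the hybrid rule-90/150 PCA idea in Python: each cell's rule is picked by a per-cell selector bit. The cell count, seed, and rule placement below are made up for illustration and are not one of the paper's maximal-length configurations.

def hybrid_ca_step(state, rule150_mask):
    """One step of a hybrid rule-90/150 cellular automaton (null boundaries).

    Cells where rule150_mask is 1 apply rule 150 (left XOR self XOR right);
    the others apply rule 90 (left XOR right)."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        nxt.append(left ^ (state[i] & rule150_mask[i]) ^ right)
    return nxt

state = [1, 0, 0, 0, 0, 0, 0, 0]   # illustrative 8-cell seed
mask = [0, 1, 0, 0, 1, 0, 1, 0]    # illustrative 90/150 rule selector
for _ in range(5):
    state = hybrid_ca_step(state, mask)
    print("".join(map(str, state)))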


Journal ArticleDOI
TL;DR: A reset subsystem is designed that can be embedded in an arbitrary distributed system in order to allow the system processes to reset the system when necessary, and is very robust: it can tolerate fail-stop failures and repairs of processes and channels, even when a reset is in progress.
Abstract: A reset subsystem is designed that can be embedded in an arbitrary distributed system in order to allow the system processes to reset the system when necessary. Our design is layered, and comprises three main components: a leader election, a spanning tree construction, and a diffusing computation. Each of these components is self-stabilizing in the following sense: if the coordination between the up-processes in the system is ever lost (due to failures or repairs of processes and channels), then each component eventually reaches a state where coordination is regained. This capability makes our reset subsystem very robust: it can tolerate fail-stop failures and repairs of processes and channels, even when a reset is in progress.

313 citations


Journal ArticleDOI
TL;DR: The vertex connectivity for the n-dimensional cube is obtained, and the minimal sets of faulty nodes that disconnect the cube are characterized.
Abstract: Introduces a new measure of conditional connectivity for large regular graphs by requiring each vertex to have at least g good neighbors in the graph. Based on this requirement, the vertex connectivity for the n-dimensional cube is obtained, and the minimal sets of faulty nodes that disconnect the cube are characterized.

270 citations


Journal ArticleDOI
D. Lee, Mihalis Yannakakis
TL;DR: In this paper, the complexity of finite-state machine testing has been studied and it has been shown that it is PSPACE-complete to determine whether a finite state machine has a preset distinguishing sequence.
Abstract: We study the complexity of two fundamental problems in the testing of finite-state machines. 1) Distinguishing sequences (state identification). We show that it is PSPACE-complete to determine whether a finite-state machine has a preset distinguishing sequence. There are machines that have distinguishing sequences, but only of exponential length. We give a polynomial time algorithm that determines whether a finite-state machine has an adaptive distinguishing sequence. (The previous classical algorithms take exponential time.) Furthermore, if there is an adaptive distinguishing sequence, then we give an efficient algorithm that constructs such a sequence of length at most n(n-1)/2 (which is the best possible), where n is the number of states. 2) Unique input/output sequences (state verification). It is PSPACE-complete to determine whether a state of a machine has a unique input/output sequence. There are machines whose states have unique input/output sequences but only of exponential length.

266 citations
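
Deciding whether a preset distinguishing sequence exists is PSPACE-complete, but verifying a candidate sequence is easy: apply it to every start state and check that the output strings are pairwise distinct. A small Python sketch on a hypothetical 3-state Mealy machine (the machine and alphabet are made up for illustration):

def is_preset_distinguishing(machine, inputs):
    """True if `inputs`, applied from every start state, yields a distinct
    output string per state -- the defining property of a preset
    distinguishing sequence.

    machine: {state: {input: (next_state, output)}}, a Mealy machine."""
    responses = set()
    for start in machine:
        s, out = start, []
        for a in inputs:
            s, o = machine[s][a]
            out.append(o)
        resp = tuple(out)
        if resp in responses:
            return False       # two start states answer identically
        responses.add(resp)
    return True

# Hypothetical 3-state machine over inputs {a, b}.
M = {
    0: {"a": (1, 0), "b": (0, 0)},
    1: {"a": (2, 1), "b": (0, 0)},
    2: {"a": (0, 0), "b": (2, 1)},
}
print(is_preset_distinguishing(M, "a"))   # False: states 0 and 2 collide
print(is_preset_distinguishing(M, "aa"))  # True: outputs 01, 10, 00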


Journal ArticleDOI
TL;DR: To mitigate false sharing and to enhance spatial locality, the layout of shared data in cache blocks is optimized in a programmer-transparent manner and it is shown that this approach can reduce the number of misses on shared data by about 10% on average.
Abstract: The performance of the data cache in shared-memory multiprocessors has been shown to be different from that in uniprocessors. In particular, cache miss rates in multiprocessors do not show the sharp drop typical of uniprocessors when the size of the cache block increases. The resulting high cache miss rate is a cause of concern, since it can significantly limit the performance of multiprocessors. Some researchers have speculated that this effect is due to false sharing, the coherence transactions that result when different processors update different words of the same cache block in an interleaved fashion. While the analysis of six applications in the paper confirms that false sharing has a significant impact on the miss rate, the measurements also show that poor spatial locality among accesses to shared data has an even larger impact. To mitigate false sharing and to enhance spatial locality, we optimize the layout of shared data in cache blocks in a programmer-transparent manner. We show that this approach can reduce the number of misses on shared data by about 10% on average.

265 citations
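
The false-sharing effect itself is easy to reproduce in a toy model. The sketch below, a deliberately crude write-invalidate model over a made-up trace (not the paper's methodology), counts coherence misses when two processors update different words of one block, and again when the layout places the words in separate blocks:

def coherence_misses(writes, block_of):
    """Toy write-invalidate model: a write misses unless this processor
    already holds the block, and it invalidates every other copy.

    writes: sequence of (processor, variable); block_of: variable -> block."""
    holder = {}     # block -> processor currently holding a valid copy
    misses = 0
    for proc, var in writes:
        blk = block_of[var]
        if holder.get(blk) != proc:
            misses += 1
        holder[blk] = proc
    return misses

# P0 and P1 update different words in an interleaved fashion.
trace = [(0, "x"), (1, "y")] * 1000
print(coherence_misses(trace, {"x": 0, "y": 0}))  # same block: 2000 misses
print(coherence_misses(trace, {"x": 0, "y": 1}))  # separate blocks: 2 misses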


Journal ArticleDOI
TL;DR: A comprehensive study of new residue generators and MOMA's is presented, and four design schemes for n-input residue generators mod A, best suited for various pairs of n and A, are proposed.
Abstract: A residue generator is an essential building block of encoding/decoding circuitry for arithmetic error detecting codes and of binary-to-residue number system (RNS) converters. In either case, a residue generator is an overhead for a system and as such it should be built with a minimum amount of hardware and should not compromise the speed of the system. The multioperand modular adder (MOMA) is a computational element used to implement various operations in digital signal processing systems using RNS. A comprehensive study of new residue generators and MOMA's is presented. The design methods given here take advantage of the periodicity of the series of powers of 2 taken modulo A (A is a modulus). Four design schemes for the n-input residue generators mod A, which are best suited for various pairs of n and A, are proposed. Their pipelined versions can be clocked with the cycle determined by the delay of a full-adder and a latch. A family of design methods for parallel and word-serial circuits, using similar concepts, is also given. Both classes of circuits employ new highly-parallel schemes using carry-save adders with end-around carry and a minimal amount of ROM, and are well-suited for VLSI implementation. They are faster and use less hardware than similar circuits known to date. One of the MOMA's can be used to build a high-speed residue-to-binary converter based on the Chinese remainder theorem.

224 citations
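
The periodicity these designs rely on can be modeled in a few lines: the weights 2^i mod A eventually cycle, so the n input bits can be folded onto a small number of weight classes before a final modular sum. The Python below is a software model of that folding idea only, not of the paper's carry-save adder trees:

def residue_mod(bits, A):
    """n-input residue generator mod A: fold bit i onto the weight
    2^i mod A, exploiting that these weights repeat with the period of
    the series 2^0, 2^1, ... taken modulo A."""
    seen, powers, p = {}, [], 1 % A
    while p not in seen:                 # find transient + period of 2^i mod A
        seen[p] = len(powers)
        powers.append(p)
        p = (2 * p) % A
    start = seen[p]                      # index where the cycle begins
    period = len(powers) - start
    total = 0
    for i, b in enumerate(bits):
        j = i if i < len(powers) else start + (i - start) % period
        total += b * powers[j]
    return total % A

value = 0b101101101101
bits = [(value >> i) & 1 for i in range(12)]
assert residue_mod(bits, 11) == value % 11
print(residue_mod(bits, 11))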


Journal ArticleDOI
TL;DR: The problem of guaranteeing synchronous message deadlines in token ring networks where the timed token medium access control protocol is employed is studied, and a normalized proportional allocation scheme is proposed that can guarantee synchronous message deadlines for synchronous traffic of up to 33% of available utilization.
Abstract: We study the problem of guaranteeing synchronous message deadlines in token ring networks where the timed token medium access control protocol is employed. Synchronous bandwidth, defined as the maximum time for which a node can transmit its synchronous messages every time it receives the token, is a key parameter in the control of synchronous message transmission. To ensure the transmission of synchronous messages before their deadlines, synchronous capacities must be properly allocated to individual nodes. We address the issue of appropriate allocation of the synchronous capacities. Several synchronous bandwidth allocation schemes are analyzed in terms of their ability to satisfy deadline constraints of synchronous messages. We show that an inappropriate allocation of the synchronous capacities could cause message deadlines to be missed, even if the synchronous traffic is extremely low. We propose a scheme, called the normalized proportional allocation scheme, which can guarantee the synchronous message deadlines for synchronous traffic of up to 33% of available utilization.

177 citations
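
The normalized proportional scheme itself is a one-line formula: node i receives H_i = (U_i / U) * (TTRT - tau), where U_i = C_i / P_i is the stream's utilization, U the total synchronous utilization, TTRT the target token rotation time, and tau the per-rotation overhead. A sketch with made-up parameters:

def normalized_proportional(streams, ttrt, tau):
    """Normalized proportional allocation: H_i = (U_i / U) * (TTRT - tau),
    dividing the usable bandwidth TTRT - tau in proportion to each node's
    synchronous utilization U_i = C_i / P_i."""
    utils = [c / p for c, p in streams]
    total = sum(utils)
    return [u / total * (ttrt - tau) for u in utils]

# Three synchronous streams (C_i, P_i); TTRT and tau values are made up.
streams = [(2.0, 20.0), (1.0, 40.0), (3.0, 60.0)]
H = normalized_proportional(streams, ttrt=8.0, tau=1.0)
print([round(h, 3) for h in H], "sum =", sum(H))  # the H_i sum to TTRT - tau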


Journal ArticleDOI
TL;DR: The paper compares the trace-sampling techniques of set sampling and time sampling using the multi-billion reference traces of A. Borg et al. (1990) and applies both techniques to multi-megabyte caches, where sampling is most valuable, finding that set sampling meets the 10% sampling goal, while time sampling does not.
Abstract: The paper compares the trace-sampling techniques of set sampling and time sampling. Using the multi-billion reference traces of A. Borg et al. (1990), we apply both techniques to multi-megabyte caches, where sampling is most valuable. We evaluate whether either technique meets a 10% sampling goal: a method meets this goal if, at least 90% of the time, it estimates the trace's true misses per instruction with ≤10% relative error using ≤10% of the trace. Results for these traces and caches show that set sampling meets the 10% sampling goal, while time sampling does not. We also find that cold-start bias in time samples is most effectively reduced by the technique of D.A. Wood et al. (1991). Nevertheless, overcoming cold-start bias requires tens of millions of consecutive references.

147 citations


Journal ArticleDOI
TL;DR: An analytical model is presented for the performance evaluation of hypercube computers, aimed at modeling a deadlock-free wormhole routing scheme prevalent on second generation hypercube systems, and extended to virtual cut-through routing and random wormhole routing techniques.
Abstract: We present an analytical model for the performance evaluation of hypercube computers. This analysis is aimed at modeling a deadlock-free wormhole routing scheme prevalent on second generation hypercube systems. Probability of blocking and average message delay are the two performance measures discussed. We start with the communication traffic to find the probability of blocking. The traffic analysis can capture any message destination distribution. Next, we find the average message delay that consists of two parts. The first part is the actual message transfer delay between any source and destination nodes. The second part of the delay is due to blocking caused by the wormhole routing scheme. The analysis is also extended to virtual cut-through routing and random wormhole routing techniques. The validity of the model is demonstrated by comparing analytical results with those from simulation.

127 citations


Journal ArticleDOI
TL;DR: A new method for polynomial interpolation in hardware, based on an interleaved memory function interpolator, with advantages demonstrated by its application to an accurate logarithmic number system (LNS) arithmetic unit.
Abstract: This paper describes a new method for polynomial interpolation in hardware, with advantages demonstrated by its application to an accurate logarithmic number system (LNS) arithmetic unit. The use of an interleaved memory reduces storage requirements by allowing each stored function value to be used in interpolation across several segments. This strategy can be shown to always use fewer words of memory than an optimized polynomial with stored polynomial coefficients. Interleaved memory function interpolators are then applied to the specific goal of an accurate logarithmic number system arithmetic unit. Many accuracy requirements for the LNS arithmetic unit are possible. Although round to nearest would be desirable, it cannot be easily achieved. The goal suggested is to ensure that the worst case LNS relative error is smaller than the worst case floating point (FP) relative error. Using the interleaved memory interpolator, the detailed design of an LNS arithmetic unit is performed using a second order polynomial interpolator including approximately 91K bits of ROM. This arithmetic unit has better accuracy and less complexity than previous LNS units.

Journal ArticleDOI
TL;DR: Hardware designs that produce exactly rounded results for the functions of reciprocal, square root, 2^x, and log2(x) are presented, and delay and area comparisons are made based on the degree of the approximating polynomial and the accuracy of the final result.
Abstract: This paper presents hardware designs that produce exactly rounded results for the functions of reciprocal, square root, 2^x, and log2(x). These designs use polynomial approximation in which the terms in the approximation are generated in parallel, and then summed by using a multi-operand adder. To reduce the number of terms in the approximation, the input interval is partitioned into subintervals of equal size, and different coefficients are used for each subinterval. The coefficients used in the approximation are initially determined based on the Chebyshev series approximation. They are then adjusted to obtain exactly rounded results for all inputs. Hardware designs are presented, and delay and area comparisons are made based on the degree of the approximating polynomial and the accuracy of the final result. For single-precision floating point numbers, a design that produces exactly rounded results for all four functions has an estimated delay of 80 ns and a total chip area of 98 mm^2 in a 1.0-micron CMOS technology. Allowing the results to have a maximum error of one unit in the last place reduces the computational delay by 5% to 30% and the area requirements by 33% to 77%.
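
The starting point of such designs, an independent polynomial per subinterval fitted at Chebyshev nodes, is easy to prototype; the paper's subsequent coefficient adjustment for exact rounding is the hard part and is omitted here. A Python sketch for 1/x on [1, 2) with 16 subintervals (the interval count and degree are illustrative choices):

import math

def cheb_poly(f, a, b, degree=2):
    """Interpolate f at the Chebyshev nodes of [a, b]; returns monomial
    coefficients c[0..degree].  This is only the initial fit, before any
    adjustment of coefficients toward exactly rounded outputs."""
    n = degree + 1
    xs = [(a + b) / 2 + (b - a) / 2 * math.cos((2 * k + 1) * math.pi / (2 * n))
          for k in range(n)]
    coef = [f(x) for x in xs]
    for j in range(1, n):                 # Newton divided differences
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    poly = [0.0] * n                      # expand Newton form via Horner steps
    for i in range(n - 1, -1, -1):
        shifted = [0.0] + poly[:-1]       # poly * x
        poly = [s - xs[i] * p for s, p in zip(shifted, poly)]
        poly[0] += coef[i]
    return poly

# 16 equal subintervals of [1, 2), one small coefficient table per interval.
tables = [cheb_poly(lambda t: 1.0 / t, 1 + i / 16, 1 + (i + 1) / 16)
          for i in range(16)]
x = 1.37
c = tables[int((x - 1) * 16)]
print(c[0] + c[1] * x + c[2] * x * x, 1.0 / x)   # close agreement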

Journal ArticleDOI
TL;DR: It is demonstrated that the verification of the circuit design for the hidden weighted bit function proposed by Bryant can be carried out efficiently in terms of FBDD's, while this is, for principal reasons, impossible in terms of OBDD's.
Abstract: OBDD's are the state-of-the-art data structure for Boolean function manipulation. Basic tasks of Boolean manipulation such as equivalence test, satisfiability test, tautology test and single Boolean synthesis steps can be performed efficiently in terms of fixed ordered OBDD's. The bottleneck of most OBDD-applications is the size of the represented Boolean functions, since the total computation merely remains tractable as long as the OBDD-representations remain of reasonable size. Since OBDD's are known to be restricted FBDD's (free BDD's, i.e., BDD's that test, on each path, each input variable at most once), and FBDD-representations are often much more (sometimes even exponentially more) concise than OBDD-representations, we propose to work with a more general FBDD-based data structure. We show that FBDD's of a fixed type provide, similar to OBDD's of a fixed variable ordering, canonical representations of Boolean functions, and that basic tasks of Boolean manipulation can be performed in terms of fixed typed FBDD's about as efficiently as in terms of fixed ordered OBDD's. In order to demonstrate the power of the FBDD-concept, we show that the verification of the circuit design for the hidden weighted bit function proposed by Bryant can be carried out efficiently in terms of FBDD's, while this is, for principal reasons, impossible in terms of OBDD's.

Journal ArticleDOI
TL;DR: A novel architecture for a fault-tolerant multiprocessor environment that achieves the performance of a triple modular redundant system using only duplex system redundancy and requires no rollbacks for recovering from single faults is proposed.
Abstract: We propose a novel architecture for a fault-tolerant multiprocessor environment. It is assumed that the multiprocessor organization consists of a pool of active processing modules and either a small number of spare modules or active modules with some spare processing capacity. A fault-tolerance scheme is developed for duplex systems using checkpoints. Our scheme, unlike traditional checkpointing schemes, requires no rollbacks for recovering from single faults. The objective is to achieve performance of a triple modular redundant system using duplex system redundancy.

Journal ArticleDOI
TL;DR: A method is described that exploits properties of standing waves to substantially reduce clock skews due to unequal path lengths, for distribution network diameters up to several meters.
Abstract: The design of a synchronous system having a global clock must account for propagation-delay-induced phase shifts experienced by the clock signal (clock skew) in its distribution network. As clock speeds and system diameters increase, this requirement becomes increasingly constraining on system designs. The paper describes a method that exploits properties of standing waves to substantially reduce clock skews due to unequal path lengths, for distribution network diameters up to several meters. The basic principles are developed for a loaded transmission line, and then applied to an arbitrary branching tree of such lines to implement a clock distribution network. The extension of this method to two- and three-dimensional distribution media is also presented, suggesting the feasibility of implementing printed circuit board clock planes exhibiting negligible phase shift over their extents.

Journal ArticleDOI
TL;DR: It is shown that the new approach maintains the high throughput of previous schemes, yet needs lower hardware overhead and achieves higher fault coverage than the previous schemes of J.Y. Jou and D.I. Tao.
Abstract: Algorithm-based fault tolerance (ABFT) is a low-overhead system-level fault tolerance technique. Many ABFT schemes have been proposed in the past for fast Fourier transform (FFT) networks. In this paper, a new ABFT scheme for FFT networks is proposed. We show that the new approach maintains the high throughput of previous schemes, yet needs lower hardware overhead and achieves higher fault coverage than the previous schemes of J.Y. Jou et al. (1988) and D.I. Tao et al. (1990).
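
As a flavor of low-overhead concurrent checking for FFTs, the sketch below uses two DFT identities, X[0] = sum(x) and sum(X) = N*x[0], as checksums around an unmodified radix-2 FFT. This is a generic checksum-style test in the spirit of ABFT, not the specific encoding proposed in the paper:

import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]

def abft_check(x, X, tol=1e-9):
    """Checksum test from two DFT identities: X[0] equals sum(x), and
    sum(X) equals N*x[0].  A cheap concurrent error check."""
    return (abs(X[0] - sum(x)) < tol and
            abs(sum(X) - len(x) * x[0]) < tol)

x = [complex(i % 5, 0) for i in range(16)]
X = fft(x)
print(abft_check(x, X))   # True on the fault-free result
X[3] += 0.5               # inject an error in one output
print(abft_check(x, X))   # False: the checksum flags it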

Journal ArticleDOI
TL;DR: New hardware-based algorithms for computing the common elementary functions (division, logarithm, reciprocal square root, arc tangent, sine and cosine) exploit microscopic parallelism using specialized hardware, with heavy use of truncation based on detailed accuracy analysis.
Abstract: As the name suggests, elementary functions play a vital role in scientific computations. Yet due to their inherent nature, they are a considerable computing task by themselves. Not surprisingly, since the dawn of computing, the goal of speeding up elementary function computation has been pursued. This paper describes new hardware based algorithms for the computation of the common elementary functions, namely division, logarithm, reciprocal square root, arc tangent, sine and cosine. These algorithms exploit microscopic parallelism using specialized hardware with heavy use of truncation based on detailed accuracy analysis. The contribution of this work lies in the fact that these algorithms are very fast and yet accurate. If we let the time to perform an IEEE Standard 754 double precision floating point multiplication be τ, our algorithms achieve roughly 3.68τ, 4.56τ, 5.25τ, 3.69τ, 7.06τ, and 6.5τ for division, logarithm, square root, exponential, arc tangent, and complex exponential (sine and cosine), respectively. The trade-off is the need for tables and some specialized hardware. The total amount of tables required, however, is less than 128 Kbytes. We discuss the hardware, algorithmic and accuracy aspects of these algorithms.

Journal ArticleDOI
TL;DR: The authors present a simple and efficient nonblocking shared FIFO queue algorithm with O(n) system latency, no additional memory requirements, and enqueuing and dequeuing times independent of the size of the queue.
Abstract: Nonblocking algorithms for concurrent objects guarantee that an object is always accessible, in contrast to blocking algorithms in which a slow or halted process can render part or all of the data structure inaccessible to other processes. A number of algorithms have been proposed for shared FIFO queues, but nonblocking implementations are few and either limit the concurrency or provide inefficient solutions. The authors present a simple and efficient nonblocking shared FIFO queue algorithm with O(n) system latency, no additional memory requirements, and enqueuing and dequeuing times independent of the size of the queue. They use the compare & swap operation as the basic synchronization primitive. They model their algorithm analytically and with a simulation, and compare its performance with that of a blocking FIFO queue. They find that the nonblocking queue has better performance if processors are occasionally slow, but worse performance if some processors are always slower than others.
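
The retry discipline behind such algorithms (read the shared state, compute a new value, compare & swap it in, and start over on interference) can be sketched compactly. The Python below simulates CAS with a lock, since Python exposes no hardware CAS, and swaps whole immutable queue states; this is far coarser than the paper's algorithm but shows the nonblocking pattern:

import threading

class AtomicRef:
    """Atomic cell with compare&swap, simulated with a lock; hardware
    provides this as a single instruction."""
    def __init__(self, value):
        self._value, self._lock = value, threading.Lock()
    def load(self):
        return self._value
    def cas(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class NonblockingQueue:
    """FIFO where each operation installs a whole new immutable state via
    CAS, retrying on interference.  Much coarser than the paper's queue,
    but it shows the read/compute/CAS/retry discipline."""
    def __init__(self):
        self._state = AtomicRef(())
    def enqueue(self, item):
        while True:
            old = self._state.load()
            if self._state.cas(old, old + (item,)):
                return
    def dequeue(self):
        while True:
            old = self._state.load()
            if not old:
                return None
            if self._state.cas(old, old[1:]):
                return old[0]

q = NonblockingQueue()
for i in range(3):
    q.enqueue(i)
print(q.dequeue(), q.dequeue(), q.dequeue())   # 0 1 2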

Journal ArticleDOI
Peter Kornerup
TL;DR: It is shown how the multiplier, with some simple back-end connections, can compute modular inverses and perform modular division for a power of two as modulus.
Abstract: A very simple multiplier cell is developed for use in a linear, purely systolic array forming a digit-serial multiplier for unsigned or 2's-complement operands. Each cell produces two digit-product terms and accumulates these into a previous sum of the same weight, developing the product least significant digit first. Grouping two terms per cell, the ratio of active elements to latches is low, and only ⌈n/2⌉ cells are needed for a full n by n multiply. A modular multiplier is then developed by incorporating a Montgomery type of modulo reduction. Two such multipliers interconnect to form a purely systolic modular exponentiator, capable of performing RSA encryption at very high clock frequencies, but with a low gate count and small area. It is also shown how the multiplier, with some simple back-end connections, can compute modular inverses and perform modular division for a power of two as modulus.
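
The Montgomery reduction that the multiplier folds into its systolic array can be modeled arithmetically in a few lines. This software sketch (word sizes and moduli are illustrative; the paper's contribution is the digit-serial hardware mapping, not the mathematics) chains Montgomery products into a modular exponentiation, the RSA core operation:

def mont_mul(a, b, n, r_bits):
    """Montgomery product a*b*R^{-1} mod n, with R = 2^r_bits (n odd, n < R)."""
    R = 1 << r_bits
    n_prime = (-pow(n, -1, R)) % R      # n * n' == -1 (mod R); Python 3.8+
    t = a * b
    m = (t * n_prime) % R
    u = (t + m * n) >> r_bits           # exactly divisible by R
    return u - n if u >= n else u

def mont_exp(base, exp, n, r_bits=16):
    """Left-to-right modular exponentiation via chained Montgomery products."""
    R = 1 << r_bits
    acc = R % n                          # Montgomery form of 1
    x = (base * R) % n                   # convert base into Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r_bits)
        if bit == "1":
            acc = mont_mul(acc, x, n, r_bits)
    return mont_mul(acc, 1, n, r_bits)   # convert back out

assert mont_exp(123, 65537, 46727) == pow(123, 65537, 46727)
print(mont_exp(123, 65537, 46727))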

Journal ArticleDOI
TL;DR: It is proved that enhanced hypercubes, which are already known to improve on regular hypercubes in measurements such as mean internode distance, diameter and traffic density, also achieve improvements in diagnosability.
Abstract: An enhanced hypercube is obtained by adding 2^(n-1) more links to a regular hypercube of 2^n processors. It has been shown that enhanced hypercubes have very good improvements over regular hypercubes in many measurements such as mean internode distance, diameter and traffic density. This paper proves that enhanced hypercubes also achieve improvements in diagnosability. Two diagnosis strategies, both using the well-known PMC diagnostic model, are studied: the precise (one-step) strategy proposed by Preparata, Metze and Chien (1967), and the pessimistic strategy proposed by Friedman (1975). Under the precise strategy, the diagnosability is shown to increase to n+1 in enhanced hypercubes (in regular hypercubes, the diagnosability is n under this strategy). Under the pessimistic strategy, the diagnosability is shown to increase to 2n (in regular hypercubes, the diagnosability under this strategy is 2n-2). Since the failure probability of a single node is fairly low nowadays, an increase of diagnosability by one or two considerably enhances the system's self-diagnostic capability; given also that diagnosability does not "easily" grow as links are added to a network, these improvements are noticeable.

Journal ArticleDOI
TL;DR: The performance of large symmetric tree networks is examined by aggregating the component links and processors into a single equivalent processor, and closed form solutions for the minimum finish time and the optimal data allocation are obtained.
Abstract: Optimal load allocation for load sharing a divisible job over processors interconnected in either a bus or a tree network is considered. The processors are either equipped with front-end processors or not so equipped. Closed form solutions for the minimum finish time and the optimal data allocation for each processor are obtained. The performance of large symmetric tree networks is examined by aggregating the component links and processors into a single equivalent processor. This allows an easy examination of large tree networks. In addition, it becomes possible to find a closed form solution for the optimal amount of data that is to be assigned to each processor in the tree network in order to achieve the minimum finish time.
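
For the bus case with front-end processors, a closed form drops out of the equal-finish-time condition: alpha_{i+1} = alpha_i * w_i*Tcp / (z*Tcm + w_{i+1}*Tcp), normalized so the fractions sum to 1. A sketch with made-up speeds; the variable names follow common divisible-load notation and are not necessarily the paper's:

def bus_load_fractions(w, z, tcp, tcm):
    """Divisible-load allocation on a bus network with front-end processors,
    so node i computes while later nodes still receive data.  Equal finish
    times give alpha_{i+1} = alpha_i * w_i*Tcp / (z*Tcm + w_{i+1}*Tcp).

    w: per-node inverse compute speeds; z: inverse bus speed."""
    alpha = [1.0]
    for i in range(len(w) - 1):
        alpha.append(alpha[-1] * w[i] * tcp / (z * tcm + w[i + 1] * tcp))
    s = sum(alpha)
    alpha = [a / s for a in alpha]       # fractions must sum to 1
    # Finish time of node i: bus time for nodes 1..i, then its own compute.
    finish = [sum(alpha[:i + 1]) * z * tcm + alpha[i] * w[i] * tcp
              for i in range(len(w))]
    return alpha, finish

alpha, finish = bus_load_fractions(w=[1.0, 2.0, 1.5], z=0.2, tcp=1.0, tcm=1.0)
print([round(a, 4) for a in alpha])
print([round(f, 4) for f in finish])   # all equal: the optimality condition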

Journal ArticleDOI
TL;DR: It is shown that the diagonal mesh outperforms the toroidal mesh in all cases, and thus provides an attractive alternative to the toroidal mesh network.
Abstract: Diagonal and toroidal meshes are degree-4 point-to-point interconnection networks suitable for connecting communication elements in parallel computers, particularly multicomputers. The two networks have a similar structure. The toroidal mesh is popular and well-studied, whereas the diagonal mesh is relatively new. In this paper, we show that the diagonal mesh has a smaller diameter and a larger bisection width. It also retains advantages of the toroidal mesh network such as a simple rectangular structure, wirability and scalability. An optimal self-routing algorithm is developed for these networks. Using this algorithm and the existing routing algorithm for the toroidal mesh, we simulate and compare the performance of these two networks with N=35×71=2485, N=49×99=4851, and N=69×139=9591 nodes under a constant system load with a fixed number of messages. Deflection routing is used to resolve conflicts. The effects of various deflection criteria are also investigated. We show that the diagonal mesh outperforms the toroidal mesh in all cases, and thus provides an attractive alternative to the toroidal mesh network.

Journal ArticleDOI
TL;DR: Tasks that require linear chain, ring, mesh, and torus structures, which are quite useful in parallel and pipeline computations, are considered. The techniques are based on a key concept called free dimension, which can be used to partition a cube into subcubes such that each subcube contains at most one faulty node.
Abstract: Fault tolerance in hypercubes is achieved by exploiting inherent redundancy and executing tasks on faulty hypercubes. The authors consider tasks that require linear chain, ring, mesh, and torus structures, which are quite useful in parallel and pipeline computations. They assume the number of faults is on the order of the number of dimensions of the hypercube. The techniques are based on a key concept called free dimension, which can be used to partition a cube into subcubes such that each subcube contains at most one faulty node. Subgraphs are embedded in each subcube and then merged to form the entire graph.
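
A simplified reading of the free-dimension idea can be sketched as a recursive partition: repeatedly pick a dimension that splits the remaining faults between the two half-cubes until every subcube holds at most one fault. The Python below is that simplification only, not the paper's full embedding machinery:

def split_by_free_dimensions(n, faults):
    """Partition an n-cube so each resulting subcube holds at most one
    faulty node, by recursively choosing a splitting dimension that
    separates at least one pair of faults.

    faults: iterable of node labels (ints with n significant bits)."""
    def recurse(dims, faults):
        if len(faults) <= 1:
            return [(dims, faults)]          # subcube with <= 1 fault
        for d in range(n):
            if d in dict(dims):
                continue
            zero = [f for f in faults if not (f >> d) & 1]
            one = [f for f in faults if (f >> d) & 1]
            if zero and one:                 # dimension d separates some faults
                return (recurse(dims + [(d, 0)], zero) +
                        recurse(dims + [(d, 1)], one))
        raise ValueError("duplicate faulty nodes cannot be separated")
    return recurse([], list(set(faults)))

# 4-cube with three faulty nodes.
for fixed, flt in split_by_free_dimensions(4, [0b0011, 0b1011, 0b0110]):
    print("fixed bits", fixed, "-> faults", [bin(f) for f in flt])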

Journal ArticleDOI
TL;DR: A division algorithm in which the quotient-digit selection is performed by rounding the shifted residual in carry-save form is presented, and several convenient values of the radix are selected.
Abstract: A division algorithm in which the quotient-digit selection is performed by rounding the shifted residual in carry-save form is presented. To allow the use of this simple function, the divisor (and dividend) is prescaled to a range close to one. The implementation presented results in a fast iteration because of the use of carry-save forms and suitable recodings. The execution time is calculated and several convenient values of the radix are selected. Comparison with other dividers for radices 2^9 to 2^18 is performed using the same assumptions.
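
The recurrence is short enough to simulate. In the sketch below, floating point stands in for the carry-save residual, prescaling uses an exact reciprocal where hardware would use a small table, and radix 512 is one value from the paper's comparison range; quotient digit j is literally round(residual * radix) because the scaled divisor is close to 1:

def divide_by_rounding(x, d, radix=512, digits=6):
    """Digit-recurrence division with quotient-digit selection by rounding,
    enabled by prescaling the divisor to a range close to one."""
    scale = 1.0 / d
    ds, w = d * scale, x * scale        # scaled divisor is ~1
    q = 0.0
    for j in range(1, digits + 1):
        qj = round(w * radix)           # selection is just a rounding
        w = w * radix - qj * ds         # next shifted residual
        q += qj * radix ** -j
    return q

x, d = 0.71, 0.93
print(divide_by_rounding(x, d), x / d)   # the two values agree closely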

Journal ArticleDOI
TL;DR: Formal results for precedence constrained, real-time scheduling of unit time tasks are extended to arbitrarily timed tasks with preemption, and an exact characterisation of the EDF-like schedulers that can be used to transparently enforce precedence constraints among tasks is shown.
Abstract: Formal results for precedence constrained, real-time scheduling of unit time tasks are extended to arbitrarily timed tasks with preemption. An exact characterisation of the EDF-like schedulers that can be used to transparently enforce precedence constraints among tasks is shown. These extended results are then integrated with a well-known protocol that handles real-time scheduling of tasks with shared resources but does not consider precedence constraints. This results in schedulability formulas for task sets that allow preemption, shared resources, and precedence constraints, and a practical algorithm for many real-time uniprocessor systems.
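
One classical way to make a plain EDF scheduler respect precedence, consistent with the family of EDF-like schedulers characterized here, is to modify releases forward and deadlines backward along the precedence DAG: r_i' = max(r_i, r_pred' + C_pred) and d_i' = min(d_i, d_succ' - C_succ). A Python sketch on a hypothetical three-task chain:

def topo_order(tasks, succ):
    """Depth-first topological order of the precedence DAG."""
    order, seen = [], set()
    def visit(t):
        if t not in seen:
            seen.add(t)
            for s in succ.get(t, []):
                visit(s)
            order.append(t)
    for t in tasks:
        visit(t)
    return order[::-1]

def enforce_precedence(tasks, succ):
    """Release/deadline modification so plain EDF respects precedence.

    tasks: {name: (release, wcet, deadline)}; succ: {name: [successors]}."""
    pred = {t: [] for t in tasks}
    for t, ss in succ.items():
        for s in ss:
            pred[s].append(t)
    order = topo_order(tasks, succ)
    r = {t: tasks[t][0] for t in tasks}
    d = {t: tasks[t][2] for t in tasks}
    for t in order:                       # releases flow forward
        for p in pred[t]:
            r[t] = max(r[t], r[p] + tasks[p][1])
    for t in reversed(order):             # deadlines flow backward
        for s in succ.get(t, []):
            d[t] = min(d[t], d[s] - tasks[s][1])
    return r, d

tasks = {"A": (0, 2, 10), "B": (0, 3, 9), "C": (0, 1, 20)}
r, d = enforce_precedence(tasks, succ={"A": ["B"], "B": ["C"]})
print(r, d)   # A's deadline tightens to 6, so EDF runs A, then B, then C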

Journal ArticleDOI
TL;DR: A new scheme for designing error detecting and error correcting codes around cellular automata (CA) is reported and a CA-based hardware scheme for very fast decoding (and correcting) of the codewords is also reported.
Abstract: A new scheme for designing error detecting and error correcting codes around cellular automata (CA) is reported. A simple and efficient scheme for generating SEC-DED codes is presented which can also be extended for generating codes with higher distances. A CA-based hardware scheme for very fast decoding (and correcting) of the codewords is also reported.

Journal ArticleDOI
TL;DR: For some sizes of shift register, the maximal-length LFSR implementation requires more than a single gate, and for some, the discrete logarithm calculation is hard; the paper proposes the use of certain one-gate LFSR's whose sequence lengths are nearly maximal and which support easy discrete logarithms.
Abstract: A linear feedback shift register, or LFSR, can implement an event counter by shifting whenever an event occurs. A single two-input exclusive-OR gate is often the only additional hardware necessary to allow a shift register to generate, by successive shifts, all of its possible nonzero values. The counting application requires that the number of shifts be recoverable from the LFSR contents so that further processing and analysis may be done. Recovering this number from the shift register value corresponds to a problem from number theory and cryptography known as the discrete logarithm. For some sizes of shift register, the maximal-length LFSR implementation requires more than a single gate, and for some, the discrete logarithm calculation is hard. The paper proposes for such sizes the use of certain one-gate LFSR's whose sequence lengths are nearly maximal, and which support easy discrete logarithms. These LFSR's have a concise mathematical characterization, and are quite common. The paper concludes by describing an application of these ideas in a computer hardware monitor, and by presenting a table that describes efficient LFSR's of size up to 64 bits.
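
A one-gate counter of this kind is a few lines of code. The sketch below uses the primitive trinomial x^7 + x + 1, so a 7-bit LFSR with a single XOR gate cycles through all 127 nonzero states, and recovers the event count with a brute-force discrete-log table; the paper's point is to pick LFSR's whose logarithm can be computed more cleverly than this:

def lfsr_step(state, taps=(6, 5), nbits=7):
    """Shift once; with two taps the feedback is a single XOR gate.
    Taps (6, 5) under this shift convention realize x^7 + x + 1, which
    is primitive, so all 127 nonzero states are visited."""
    fb = ((state >> taps[0]) ^ (state >> taps[1])) & 1
    return ((state << 1) | fb) & ((1 << nbits) - 1)

# Build the discrete-log table by brute force (the expensive route).
log_table, s = {}, 1
for i in range(127):
    log_table[s] = i
    s = lfsr_step(s)

s = 1
for _ in range(83):        # 83 events shift the counter 83 times
    s = lfsr_step(s)
print(log_table[s])        # -> 83, the recovered event count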

Journal ArticleDOI
TL;DR: The paper shows that a decoder implemented using the new power-sum circuit will have less complex circuitry and shorter decoding delay than one implemented using conventional product-sum multipliers.
Abstract: A systolic power-sum circuit designed to perform AB^2+C computations in the finite field GF(2^m) is presented, where A, B, and C are arbitrary elements of GF(2^m). This new circuit is constructed from m^2 identical cells, each of which consists of three 2-input AND gates, one 2-input XOR gate, one 3-input XOR gate, and ten latches. The AB^2+C computation is required in decoding many error-correcting codes. The paper shows that a decoder implemented using the new power-sum circuit will have less complex circuitry and shorter decoding delay than one implemented using conventional product-sum multipliers.
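
The AB^2 + C operation is easy to model in software, which makes the circuit's function concrete. The sketch below works in GF(2^4) with the field polynomial x^4 + x + 1; both choices are illustrative, and the paper's contribution is the systolic array of m^2 cells, not this bit-loop:

def gf_mul(a, b, poly=0b10011, m=4):
    """Multiply in GF(2^m): carry-less product with reduction by the
    field polynomial (x^4 + x + 1 here)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly        # poly includes the x^m term, clearing bit m
    return r

def power_sum(A, B, C):
    """A*B^2 + C in GF(2^m): the operation the systolic circuit computes."""
    return gf_mul(A, gf_mul(B, B)) ^ C

print(bin(power_sum(0b0110, 0b0011, 0b1010)))   # -> 0b111 in GF(2^4)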

Journal ArticleDOI
TL;DR: The Chaos router, a randomizing, nonminimal adaptive packet router, is introduced; it is shown to be deadlock free and probabilistically livelock free, and performance results are presented for a variety of workloads.
Abstract: The Chaos router, a randomizing, nonminimal adaptive packet router, is introduced. Adaptive routers allow messages to dynamically select paths, depending on network traffic, and bypass congested nodes. This flexibility contrasts with oblivious packet routers, where the path of a packet is statically determined at the source node. A key advancement of the Chaos router over previous nonminimal routers is the use of randomization to eliminate the need for livelock protection. This simplifies adaptive routing to be of approximately the same complexity along the critical decision path as an oblivious router. The primary cost is that the Chaos router is probabilistically livelock free rather than deterministically livelock free, but evidence is presented implying that these are equivalent in practice. The principal advantage is excellent performance for nonuniform traffic patterns. The Chaos router is described, it is shown to be deadlock free and probabilistically livelock free, and performance results are presented for a variety of workloads.

Journal ArticleDOI
TL;DR: The authors present a model that can accurately evaluate the performance of single-buffered and multibuffered MIN's (multistage interconnection networks) with 2×2 switching elements (SE's), and show that the proposed model is consistently much more accurate than previous ones, irrespective of the size of the network, the buffers, and the traffic condition.
Abstract: Multistage interconnection networks (MIN's) have a number of applications in the areas of computing and communication. While several analytical models have been proposed for the performance evaluation of MIN's, they are either not very accurate or too complex to be generalized. The authors propose a new model for evaluating multibuffered MIN's with 2×2 switching elements. It effectively and realistically models the correlation of packet movements between two adjacent stages as well as across subsequent network cycles. As a result, the proposed model is very accurate for MIN's of any size and under any traffic conditions. It is also simple and can be easily generalized.