
Showing papers in "IEEE Transactions on Computers in 1985"


Journal ArticleDOI
TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.
Abstract: The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without resorting to a special-purpose architecture. It is proved that a fat-tree of a given size is nearly the best routing network of that size. This universality theorem is established using a three-dimensional VLSI model that incorporates wiring as a direct cost. In this model, hardware size is measured as physical volume. It is proved that for any given amount of communications hardware, a fat-tree built from that amount of hardware can simulate every other network built from the same amount of hardware, using only slightly more time (a polylogarithmic factor greater).

1,147 citations


Journal ArticleDOI
G. F. Pfister, V. A. Norton
TL;DR: Even moderate hot spot traffic was found to cause very significant degradation, severely degrading all memory access, not just access to shared lock locations, due to an effect the authors call tree saturation; the technique of message combining was found to be an effective means of eliminating this problem if it arises due to lock or synchronization contention.
Abstract: The combining of messages within a multistage switching network has been proposed to reduce memory contention in highly parallel shared-memory multiprocessors, especially for shared lock and synchronization data. A quantitative investigation of the performance impact of such contention and the effectiveness of combining in reducing this impact is reported. The effect of a nonuniform traffic pattern consisting of a single hot spot of higher access rate superimposed on a background of uniform traffic was investigated. The potential degradation due to even moderate hot spot traffic was found to be very significant, severely degrading all memory access, not just access to shared lock locations, due to an effect the authors call tree saturation. The technique of message combining was found to be an effective means of eliminating this problem if it arises due to lock or synchronization contention.
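For intuition, the hot-spot effect admits a standard back-of-the-envelope calculation often quoted alongside this result: if each of N processors issues requests at rate lam and a fraction h of all requests target one module, the hot module sees lam·(1 + h·(N-1)) requests per cycle, which caps the sustainable per-processor rate at 1/(1 + h·(N-1)) and the total throughput near 1/h for large N. The sketch below is not code from the paper; the sizes and hot-spot fractions are assumed, and the tree-saturation effect identified in the paper makes real degradation worse than this idealized, queueing-free bound.

```python
# Back-of-the-envelope hot-spot analysis (illustrative, not from the paper).
# Assumptions: N processors, N memory modules, each processor issues requests at
# rate lam (requests per memory cycle); a fraction h of all requests go to a single
# "hot" module, the rest are spread uniformly over all N modules.
def hot_module_load(N, h, lam):
    """Requests per cycle arriving at the hot module."""
    return lam * N * (h + (1.0 - h) / N)   # = lam * (1 + h*(N-1))

def max_rate_per_processor(N, h):
    """Largest lam for which the hot module (capacity 1 request/cycle) keeps up."""
    return 1.0 / (1.0 + h * (N - 1))

if __name__ == "__main__":
    for N in (64, 256, 1024):
        for h in (0.01, 0.05, 0.125):
            lam_max = max_rate_per_processor(N, h)
            print(f"N={N:5d} h={h:5.3f}  max rate/processor={lam_max:.4f}"
                  f"  total throughput={N * lam_max:7.2f}  (limit 1/h = {1/h:.1f})")
```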

610 citations


Journal ArticleDOI
TL;DR: The conclusion from the analysis is that the pseudonoise generator's output sequence and the sequences generated by the linear feedback shift registers should be uncorrelated, which leads to constraints for the nonlinear combining function to be used.
Abstract: Pseudonoise sequences generated by linear feedback shift registers [1] with some nonlinear combining function have been proposed [2]–[5] for cryptographic applications as running key generators in stream ciphers. In this correspondence it will be shown that the number of trials to break these ciphers can be significantly reduced by using correlation methods. By comparison of computer simulations and theoretical results based on a statistical model, the validity of this analysis is demonstrated. Rubin [6] has shown that it is computationally feasible to solve a cipher proposed by Pless [2] in a known plaintext attack, using as few as 15 characters. Here, the number of ciphertext symbols is determined to perform a ciphertext-only attack on the Pless cipher using the correlation attack. Our conclusion from the analysis is that the pseudonoise generator's output sequence and the sequences generated by the linear feedback shift registers should be uncorrelated. This leads to constraints for the nonlinear combining function to be used.
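As an illustration of the correlation idea only (not the Pless cipher nor the exact attack in the correspondence), the sketch below builds a toy Geffe-style generator from three small LFSRs with assumed lengths, taps, and seeds, then recovers the initial state of one register by trying all of its states and keeping the one whose output agrees most often with the keystream: the correct state agrees roughly 75 percent of the time, wrong states only about 50 percent.

```python
# Toy correlation attack on a Geffe-style combiner (illustrative sketch).
# All register lengths, taps, and seeds are arbitrary assumptions.
import itertools

def lfsr(seed_bits, taps, n):
    """Generate n output bits from a Fibonacci LFSR (taps are state indices)."""
    state = list(seed_bits)
    out = []
    for _ in range(n):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = [fb] + state[:-1]
    return out

def geffe(x1, x2, x3):
    """Combiner: output = x2 if x1 == 1 else x3 (correlated ~0.75 with x2 and x3)."""
    return [(a & b) ^ ((1 ^ a) & c) for a, b, c in zip(x1, x2, x3)]

N = 512
taps1, taps2, taps3 = [0, 2], [0, 1, 3, 4], [0, 3]
seed1, seed2, seed3 = [1, 0, 1], [0, 1, 1, 0, 1], [1, 1, 0, 1]
keystream = geffe(lfsr(seed1, taps1, N), lfsr(seed2, taps2, N), lfsr(seed3, taps3, N))

# Attack register 3 alone: try every nonzero seed, measure agreement with keystream.
best = max(
    (seed for seed in itertools.product([0, 1], repeat=len(seed3)) if any(seed)),
    key=lambda seed: sum(a == b for a, b in zip(lfsr(list(seed), taps3, N), keystream)),
)
print("recovered seed for LFSR3:", list(best), " true seed:", seed3)
```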

547 citations


Journal ArticleDOI
Yung-Terng Wang, Morris
TL;DR: A taxonomy of load sharing algorithms is proposed that draws a basic dichotomy between source-initiative and server-initiative approaches, and a performance metric called the Q-factor (quality of load sharing) is defined which summarizes both the overall efficiency and fairness of an algorithm.
Abstract: An important part of a distributed system design is the choice of a load sharing or global scheduling strategy. A comprehensive literature survey on this topic is presented. We propose a taxonomy of load sharing algorithms that draws a basic dichotomy between source-initiative and server-initiative approaches. The taxonomy enables ten representative algorithms to be selected for performance evaluation. A performance metric called the Q-factor (quality of load sharing) is defined which summarizes both overall efficiency and fairness of an algorithm and allows algorithms to be ranked by performance. We then evaluate the algorithms using both mathematical and simulation techniques. The results of the study show that: i) the choice of load sharing algorithm is a critical design decision; ii) for the same level of scheduling information exchange, server-initiative has the potential of outperforming source-initiative algorithms (whether this potential is realized depends on factors such as communication overhead); iii) the Q-factor is a useful yardstick; iv) some algorithms, which have previously received little attention, e.g., multiserver cyclic service, may provide effective solutions.

507 citations


Journal ArticleDOI
TL;DR: Tight upper and lower bounds are proved on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network.
Abstract: In this paper, we prove tight upper and lower bounds on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network. Our most important new results are: 1) the construction of an N-node degree-3 network capable of sorting N numbers in O(log N) word steps; 2) a proof that any network capable of sorting N Θ(log N)-bit numbers in T bit steps requires area A where AT^2 = Ω(N^2 log^2 N); and 3) the construction of a "small-constant-factor" bounded-degree network that sorts N Θ(log N)-bit numbers in T = Θ(log N) bit steps with A = Θ(N^2) area.

395 citations


Journal ArticleDOI
TL;DR: In this article, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2^m) with the simple squaring property of the normal basis representation used together with this multiplier.
Abstract: Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that can be easily realized on VLSI chips. Massey and Omura [1] recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. In this paper, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2^m). With the simple squaring property of the normal basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2^m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable, and therefore, naturally suitable for VLSI implementation.
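The inversion pipeline relies on the identity a^(-1) = a^(2^m - 2) in GF(2^m), which needs only squarings and multiplications (and squaring is a plain cyclic shift of coordinates in a normal basis). The sketch below is not the Massey-Omura circuit; it uses an ordinary polynomial-basis software model of GF(2^8) with an assumed irreducible polynomial purely to check that exponentiation identity.

```python
# Inversion in GF(2^m) via a^(2^m - 2), using squarings and multiplications only.
# Polynomial-basis software model (GF(2^8) with x^8+x^4+x^3+x+1, an assumed choice);
# the normal-basis hardware in the paper evaluates the same exponent, with squaring
# reduced to a cyclic shift of the coordinates.
M = 8
IRRED = 0x11B

def gf_mul(a, b):
    """Carry-less multiply of a and b, reduced modulo IRRED."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= IRRED
    return r

def gf_inv(a):
    """a^(2^m - 2) by square-and-multiply: squarings interleaved with multiplies."""
    assert a != 0
    result, sq, exp = 1, a, (1 << M) - 2      # 2^m - 2 = 0b111...10
    while exp:
        if exp & 1:
            result = gf_mul(result, sq)
        sq = gf_mul(sq, sq)                   # squaring step (cyclic shift in a normal basis)
        exp >>= 1
    return result

if __name__ == "__main__":
    for a in (1, 2, 3, 0x53, 0xCA):
        assert gf_mul(a, gf_inv(a)) == 1
    print("a * a^(2^m - 2) == 1 verified for sample elements")
```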

373 citations


Journal ArticleDOI
TL;DR: A graph matching approach is proposed for solving the task assignment problem encountered in distributed computing systems, with a cost function defined in terms of a single unit, time, and a new optimization criterion, called the minimax criterion, under which both minimization of interprocessor communication and balance of processor loading can be achieved.
Abstract: A graph matching approach is proposed in this paper for solving the task assignment problem encountered in distributed computing systems. A cost function defined in terms of a single unit, time, is proposed for evaluating the effectiveness of task assignment. This cost function represents the maximum time for a task to complete module execution and communication in all the processors. A new optimization criterion, called the minimax criterion, is also proposed, based on which both minimization of interprocessor communication and balance of processor loading can be achieved. The proposed approach allows various system constraints to be included for consideration. With the proposed cost function and the minimax criterion, optimal task assignment is defined. Graphs are then used to represent the module relationship of a given task and the processor structure of a distributed computing system. Module assignment to system processors is transformed into a type of graph matching, called weak homomorphism. The search of optimal weak homomorphism corresponding to optimal task assignment is next formulated as a state-space search problem. It is then solved by the well-known A* algorithm in artificial intelligence after proper heuristic information for speeding up the search is suggested. An illustrative example and some experimental results are also included to show the effectiveness of the heuristic search.
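To make the minimax cost concrete, the toy sketch below enumerates all assignments of four invented modules to two processors and picks the one minimizing the maximum, over processors, of execution time plus interprocessor communication time. Exhaustive search stands in here for the paper's A* search over weak homomorphisms, and all module and communication costs are made up.

```python
# Minimax task assignment, brute force over a toy example (illustrative only).
from itertools import product

MODULES = ["m1", "m2", "m3", "m4"]
PROCS = ["p1", "p2"]
# Assumed execution time of each module on each processor.
EXEC = {("m1", "p1"): 5, ("m1", "p2"): 7,
        ("m2", "p1"): 4, ("m2", "p2"): 3,
        ("m3", "p1"): 6, ("m3", "p2"): 6,
        ("m4", "p1"): 2, ("m4", "p2"): 5}
# Assumed communication cost, paid only when the two modules sit on different processors.
COMM = {("m1", "m2"): 3, ("m2", "m3"): 4, ("m3", "m4"): 1}

def cost(assignment):
    """Minimax cost: max over processors of (execution + interprocessor communication)."""
    per_proc = {p: 0 for p in PROCS}
    for m, p in assignment.items():
        per_proc[p] += EXEC[(m, p)]
    for (a, b), c in COMM.items():
        if assignment[a] != assignment[b]:
            per_proc[assignment[a]] += c
            per_proc[assignment[b]] += c
    return max(per_proc.values())

best = min((dict(zip(MODULES, choice)) for choice in product(PROCS, repeat=len(MODULES))),
           key=cost)
print("optimal assignment:", best, "minimax cost:", cost(best))
```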

358 citations


Journal ArticleDOI
Takagi, Yasuura, Yajima
TL;DR: Since the multiplier has a regular cellular array structure similar to an array multiplier, it is suitable for VLSI implementation and is excellent in both computation speed and regularity in layout.
Abstract: A high-speed VLSI multiplication algorithm internally using redundant binary representation is proposed. In n bit binary integer multiplication, n partial products are first generated and then added up pairwise by means of a binary tree of redundant binary adders. Since parallel addition of two n-digit redundant binary numbers can be performed in a constant time independent of n without carry propagation, n bit multiplication can be performed in a time proportional to log_2 n. The computation time is almost the same as that by a multiplier with a Wallace tree, in which three partial products will be converted into two, in contrast to our two-to-one conversion, and is much shorter than that by an array multiplier for longer operands. The number of computation elements of an n bit multiplier based on the algorithm is proportional to n^2. It is almost the same as those of conventional ones. Furthermore, since the multiplier has a regular cellular array structure similar to an array multiplier, it is suitable for VLSI implementation. Thus, the multiplier is excellent in both computation speed and regularity in layout. It can be implemented on a VLSI chip with an area proportional to n^2 log_2 n. The algorithm can be directly applied to both unsigned and 2's complement binary integer multiplication.
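The property doing the work is that two radix-2 signed-digit (redundant binary) numbers can be added with no carry propagation: each output digit depends only on a constant number of neighboring input digits. The sketch below implements one common formulation of that totally parallel addition rule (digits in {-1, 0, 1}, least significant digit first) and checks it against ordinary integer addition; it illustrates the principle rather than reproducing the paper's multiplier.

```python
# Carry-free addition of radix-2 signed-digit numbers (digits -1, 0, 1), LSD first.
# Two-step rule: position i picks an interim sum w_i and carry c_{i+1} using only
# (x_i, y_i) and the signs at position i-1, so no carry chain ever forms.
import random

def sd_add(x, y):
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    w = [0] * n          # interim sum digits
    c = [0] * (n + 1)    # c[i+1] is the carry produced at position i
    for i in range(n):
        z = x[i] + y[i]
        lower_nonneg = (i == 0) or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if z == 2:
            c[i + 1], w[i] = 1, 0
        elif z == 1:
            c[i + 1], w[i] = (1, -1) if lower_nonneg else (0, 1)
        elif z == 0:
            c[i + 1], w[i] = 0, 0
        elif z == -1:
            c[i + 1], w[i] = (0, -1) if lower_nonneg else (-1, 1)
        else:  # z == -2
            c[i + 1], w[i] = -1, 0
    # Final digits: w_i plus the carry from the position below, always in {-1, 0, 1}.
    return [w[i] + c[i] for i in range(n)] + [c[n]]

def sd_value(digits):
    return sum(d * (1 << i) for i, d in enumerate(digits))

if __name__ == "__main__":
    random.seed(0)
    for _ in range(1000):
        a = [random.choice([-1, 0, 1]) for _ in range(12)]
        b = [random.choice([-1, 0, 1]) for _ in range(12)]
        s = sd_add(a, b)
        assert sd_value(s) == sd_value(a) + sd_value(b)
        assert all(d in (-1, 0, 1) for d in s)
    print("carry-free signed-digit addition verified on 1000 random pairs")
```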

344 citations


Journal ArticleDOI
TL;DR: The procedures are designed to minimize the length of the longest wire in the system, thus minimizing the communication time between cells; although the underlying network problems are NP-complete, the procedures are proved to be reliable by assuming a probabilistic model of cell failure.
Abstract: VLSI technologists are fast developing wafer-scale integration. Rather than partitioning a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is to assemble an entire system (or network of chips) on a single wafer, thus avoiding the costs and performance loss associated with individual packaging of chips. A major problem with assembling a large system of microprocessors on a single wafer, however, is that some of the processors, or cells, on the wafer are likely to be defective. In the paper, we describe practical procedures for integrating "around" such faults. The procedures are designed to minimize the length of the longest wire in the system, thus minimizing the communication time between cells. Although the underlying network problems are NP-complete, we prove that the procedures are reliable by assuming a probabilistic model of cell failure. We also discuss applications of the work to problems in VLSI layout theory, graph theory, fault-tolerant systems, planar geometry, and the probabilistic analysis of algorithms.
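As a toy illustration of the objective only (not one of the paper's procedures), the sketch below scatters random cell failures over a wafer modeled as a grid, threads the surviving cells into a single chain in snake order, and reports the longest wire that this naive patching needs; the point of the paper's algorithms is to do substantially better than such a baseline.

```python
# Naive "integrate around faults" baseline: chain the live cells of a faulty grid
# in snake order and measure the longest wire (Manhattan distance between
# consecutive live cells). Grid size and failure probability are assumed values.
import random

def snake_chain_longest_wire(rows, cols, p_fail, seed=0):
    rng = random.Random(seed)
    alive = [[rng.random() >= p_fail for _ in range(cols)] for _ in range(rows)]
    # Visit cells row by row, alternating direction, keeping only live cells.
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs if alive[r][c])
    longest = max(abs(r1 - r2) + abs(c1 - c2)
                  for (r1, c1), (r2, c2) in zip(order, order[1:]))
    return len(order), longest

if __name__ == "__main__":
    for p in (0.0, 0.1, 0.3):
        n_live, wire = snake_chain_longest_wire(16, 16, p)
        print(f"failure prob {p:.1f}: {n_live:3d} live cells, longest wire = {wire}")
```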

268 citations


Journal ArticleDOI
TL;DR: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes, using a modified Euclidean algorithm for computing the error-locator polynomial.
Abstract: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes. An important ingredient of this design is a modified Euclidean algorithm for computing the error-locator polynomial. The computation of inverse field elements is completely avoided in this modification of Euclid's algorithm. The new decoder is regular and simple, and naturally suitable for VLSI implementation. An example illustrating both the pipeline and systolic array aspects of this decoder structure is given for a (15,9) RS code.

247 citations


Journal ArticleDOI
Guo-Jie Li, Wah
TL;DR: In this paper, a methodology is proposed to systematically search and reduce the design space of systolic arrays and to obtain the optimal design; examples of applying the method, including matrix multiplication, finite impulse response filtering, deconvolution, and triangular matrix inversion, are given.
Abstract: Conventional design of systolic arrays is based on the mapping of an algorithm onto an interconnection of processing elements in a VLSI chip. This mapping is done in an ad hoc manner, and the resulting configuration usually represents a feasible but suboptimal design. In this paper, systolic arrays are characterized by three classes of parameters: the velocities of data flows, the spatial distributions of data, and the periods of computation. By relating these parameters in constraint equations that govern the correctness of the design, the design is formulated into an optimization problem. The size of the search space is a polynomial of the problem size, and a methodology to systematically search and reduce this space and to obtain the optimal design is proposed. Some examples of applying the method, including matrix multiplication, finite impulse response filtering, deconvolution, and triangular-matrix inversion, are given.

Journal ArticleDOI
Mackinnon, Taylor, Meijer, Akl
TL;DR: A cryptographic scheme for controlling access to information within a group of users organized in a hierarchy was proposed in [1].
Abstract: A cryptographic scheme for controlling access to information within a group of users organized in a hierarchy was proposed in [1]. The scheme enables a user at some level to compute from his own cryptographic key the keys of the users below him in the organization.
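A minimal sketch of the scheme of [1] as it is commonly described, with toy parameters throughout (an assumed five-class hierarchy, a deliberately tiny modulus, and an arbitrary secret base key): each class gets a public exponent t_i chosen so that t_i divides t_j exactly when class j is at or below class i, the key of class i is K^(t_i) mod M, and a user derives a descendant's key by raising their own key to t_j / t_i.

```python
# Akl-Taylor style hierarchical key assignment (toy parameters; illustrative only).
# Hierarchy (assumed): director > {manager_a, manager_b}; each manager > its own clerk.
from math import prod

BELOW = {  # strict descendants of each class
    "director":  {"manager_a", "manager_b", "clerk_a", "clerk_b"},
    "manager_a": {"clerk_a"},
    "manager_b": {"clerk_b"},
    "clerk_a":   set(),
    "clerk_b":   set(),
}
PRIMES = {"director": 2, "manager_a": 3, "manager_b": 5, "clerk_a": 7, "clerk_b": 11}

M = 101 * 113            # toy modulus (product of two small primes); not secure
K0 = 1234                # the central authority's secret base key

def down_set(c):
    return BELOW[c] | {c}

# Public exponent t_c: product of the primes of every class NOT at or below c.
T = {c: prod(PRIMES[d] for d in PRIMES if d not in down_set(c)) for c in PRIMES}
KEY = {c: pow(K0, T[c], M) for c in PRIMES}

def derive(own_class, own_key, target_class):
    """A user of own_class derives target_class's key if target is at or below it."""
    if T[target_class] % T[own_class] != 0:
        raise PermissionError("target class is not below this class")
    return pow(own_key, T[target_class] // T[own_class], M)

assert derive("manager_a", KEY["manager_a"], "clerk_a") == KEY["clerk_a"]
assert derive("director", KEY["director"], "clerk_b") == KEY["clerk_b"]
print("hierarchical key derivation works for the toy hierarchy")
```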

Journal ArticleDOI
Atallah
TL;DR: The purpose of this correspondence is to describe an O(n log n) time algorithm for enumerating all the axes of symmetry of a planar figure which is made up of segments, circles, points, etc.
Abstract: A straight line is an axis of symmetry of a planar figure if the figure is invariant to reflection with respect to that line. The purpose of this correspondence is to describe an O(n log n) time algorithm for enumerating all the axes of symmetry of a planar figure which is made up of (possibly intersecting) segments, circles, points, etc. The solution involves a reduction of the problem to a combinatorial question on words. Our algorithm is optimal since we can establish an Ω(n log n) time lower bound for this problem.

Journal ArticleDOI
TL;DR: This study solves the prefix computation problem, where the order of the elements is specified by a linked list, under the weakest PRAM model, in which shared memory locations can only be exclusively read or written (the EREW model).
Abstract: The prefix computation problem is to compute all n initial products a_1 * ... * a_i, i = 1, ..., n, of a set of n elements, where * is an associative operation. An O((log n / log(2n/p)) · (n/p)) time deterministic parallel algorithm using p ≤ n processors is presented to solve the prefix computation problem, when the order of the elements is specified by a linked list. For p ≤ O(n^(1-ε)) (ε > 0 any constant), this algorithm achieves linear speedup. Such optimal speedup was previously achieved only by probabilistic algorithms. This study assumes the weakest PRAM model, where shared memory locations can only be exclusively read or written (the EREW model).
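For intuition, the sketch below simulates the classic pointer-jumping approach to prefix computation on a linked list: in each synchronous round every element combines its value with its predecessor's and then doubles its jump distance, finishing in O(log n) rounds with one (simulated) processor per element. This is the textbook EREW building block, not the processor-efficient algorithm of the paper.

```python
# Prefix computation on a linked list by synchronous pointer jumping (Wyllie-style).
# Every "processor" handles one list element; O(log n) rounds. This illustrates the
# PRAM idea only -- the paper's contribution is achieving it with fewer processors.
import operator

def linked_list_prefix(values, prev, op=operator.add):
    """values[i], prev[i] describe element i; prev[head] is None.
    Returns pref[i] = values combined from the head up to and including i."""
    val = list(values)
    prev = list(prev)
    n = len(val)
    for _ in range(max(1, n.bit_length())):          # O(log n) synchronous rounds
        new_val, new_prev = val[:], prev[:]
        for i in range(n):                            # "in parallel"
            if prev[i] is not None:
                new_val[i] = op(val[prev[i]], val[i])
                new_prev[i] = prev[prev[i]]
        val, prev = new_val, new_prev
    return val

if __name__ == "__main__":
    # List stored in scrambled order: logical order is index 3 -> 0 -> 4 -> 2 -> 1.
    values = [20, 30, 1, 10, 3]
    prev   = [3, 2, 4, None, 0]
    print(linked_list_prefix(values, prev))   # -> [30, 64, 34, 10, 33]
```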

Journal ArticleDOI
TL;DR: The authors present a scheduling algorithm which works dynamically and on loosely coupled distributed systems for tasks with hard real-time constraints; i.e., the tasks must meet their deadlines.
Abstract: Most systems which are required to operate under severe real-time constraints assume that all tasks and their characteristics are known a priori. Scheduling of such tasks can be done statically. Further, scheduling algorithms operating under such conditions are usually limited to multiprocessor configurations. The authors present a scheduling algorithm which works dynamically and on loosely coupled distributed systems for tasks with hard real-time constraints; i.e., the tasks must meet their deadlines. It uses a scheduling component local to every node and a distributed scheduling scheme which is specifically suited to hard real-time constraints and other timing considerations. Periodic tasks, nonperiodic tasks, scheduling overheads, communication overheads due to scheduling and preemption are all accounted for in the algorithm. Simulation studies are used to evaluate the performance of the algorithm.
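The local scheduling component can be pictured with a minimal guarantee test (a simplified sketch assuming nonpreemptive earliest-deadline-first execution on a single node, not the paper's algorithm): a newly arrived task is accepted only if, with the task added, every already-guaranteed task still meets its deadline.

```python
# Minimal local "guarantee" test for hard real-time tasks on one node (sketch only).
# A task is (ready_time, computation_time, deadline); scheduling is nonpreemptive EDF.
def can_guarantee(guaranteed, new_task, now=0):
    tasks = sorted(guaranteed + [new_task], key=lambda t: t[2])   # earliest deadline first
    t = now
    for ready, comp, deadline in tasks:
        t = max(t, ready) + comp          # start when both the CPU and the task are ready
        if t > deadline:
            return False                  # some deadline would be missed: reject the task
    return True

if __name__ == "__main__":
    guaranteed = [(0, 4, 10), (2, 3, 9)]
    print(can_guarantee(guaranteed, (1, 2, 15)))   # True: fits after the others
    print(can_guarantee(guaranteed, (0, 5, 8)))    # False: would force a missed deadline
```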

Journal ArticleDOI
TL;DR: The theory and design of systematic t-unidirectional error-detecting codes are developed, and optimal systematic codes capable of detecting 2, 3, and 6 unidirectional errors using 2, 3, and 4 check bits are given.
Abstract: The theory and design of systematic t-unidirectional error-detecting codes are developed. Optimal systematic codes capable of detecting 2, 3, and 6 unidirectional errors using 2, 3, and 4 check bits, respectively, are given. For r ≥ 5, where r is the number of check bits, the systematic codes described here can detect up to 5·2^(r-4) + r - 4 unidirectional errors. Encoding/decoding methods for these codes are also investigated.
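For context, the classical Berger code detects all unidirectional errors by appending the count of 0s in the information word; the codes in this paper trade that all-error guarantee for far fewer check bits when only t errors must be detected. The sketch below implements the Berger baseline (not the paper's constructions) and shows one multibit unidirectional error being caught.

```python
# Berger code: check symbol = number of 0s in the information word, in binary.
# A unidirectional (all 1->0 or all 0->1) error moves the data's 0-count and the
# stored check value in opposite directions, so it is always detected.
def berger_encode(info_bits):
    zeros = info_bits.count(0)
    r = len(info_bits).bit_length()                 # check bits needed
    check = [(zeros >> i) & 1 for i in range(r)]    # LSB first
    return info_bits + check

def berger_check(codeword, k):
    info, check = codeword[:k], codeword[k:]
    stored = sum(b << i for i, b in enumerate(check))
    return info.count(0) == stored

if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    cw = berger_encode(data)
    assert berger_check(cw, len(data))
    # Inject a unidirectional 1 -> 0 error hitting three data bits: always caught.
    corrupted = [0 if (i in (0, 2, 3) and b == 1) else b for i, b in enumerate(cw)]
    print("error detected:", not berger_check(corrupted, len(data)))
```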

Journal ArticleDOI
TL;DR: The concept of presortedness and its use in sorting is studied, and a new insertion sort algorithm is shown to be optimal with respect to three natural measures.
Abstract: The concept of presortedness and its use in sorting are studied. Natural ways to measure presortedness are given and some general properties necessary for a measure are proposed. A concept of a sorting algorithm optimal with respect to a measure of presortedness is defined, and examples of such algorithms are given. A new insertion sort algorithm is shown to be optimal with respect to three natural measures. The problem of finding an optimal algorithm for an arbitrary measure is studied, and partial results are proven.
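One of the natural measures is the number of inversions, and straight insertion sort adapts to it: the element shifts it performs equal the number of inversions exactly, so nearly sorted inputs are sorted in nearly linear time. The short check below illustrates that fact; it is not the paper's new algorithm, which is optimal with respect to further measures as well.

```python
# Insertion sort does exactly Inv(X) element shifts, where Inv(X) = number of inversions.
import random

def inversions(xs):
    return sum(1 for i in range(len(xs)) for j in range(i + 1, len(xs)) if xs[i] > xs[j])

def insertion_sort_shifts(xs):
    a = list(xs)
    shifts = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]          # shift one element to the right
            shifts += 1
            j -= 1
        a[j + 1] = key
    return a, shifts

if __name__ == "__main__":
    random.seed(1)
    xs = random.sample(range(100), 20)
    sorted_xs, shifts = insertion_sort_shifts(xs)
    assert sorted_xs == sorted(xs) and shifts == inversions(xs)
    print(f"{shifts} shifts == {inversions(xs)} inversions; running time is O(n + Inv)")
```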

Journal ArticleDOI
TL;DR: This paper clarifies the relation between the diameter k and the edge connectivity c_e or node connectivity c_n of digraphs by deriving two inequalities involving the number of nodes, the maximum degree, and the minimum degree.
Abstract: This paper clarifies the relation between the diameter k and the edge connectivity c_e or node connectivity c_n of digraphs. Two inequalities relating these quantities are derived, in which n is the number of nodes, D is the maximum degree, and d is the minimum degree.

Journal ArticleDOI
Thu V. Vu
TL;DR: Two conversion techniques based on the Chinese remainder theorem are developed for use in residue number systems; one is particularly useful for sign detection, while the other is preferable for the full conversion from residues to unsigned or 2's complement integers.
Abstract: Two conversion techniques based on the Chinese remainder theorem are developed for use in residue number systems. The new implementations are fast and simple mainly because adders modulo a large and arbitrary integer M are effectively replaced by binary adders and possibly a lookup table of small address space. Although different in form, both techniques share the same principle that an appropriate representation of the summands must be employed in order to evaluate a sum modulo M efficiently. The first technique reduces the sum modulo M in the conversion formula to a sum modulo 2 through the use of fractional representation, which also exposes the sign bit of numbers. Thus, this technique is particularly useful for sign detection and for any operation requiring a comparison with a binary fraction of M. The other technique is preferable for the full conversion from residues to unsigned or 2's complement integers. By expressing the summands in terms of quotients and remainders with respect to a properly chosen divisor, the second technique systematically replaces the sum modulo M by two binary sums, one accumulating the quotients modulo a power of 2 and the other accumulating the remainders the ordinary way. A final recombination step is required but is easily implemented with a small lookup table and binary adders.
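A software sketch of the two identities underneath (illustrative only, with assumed moduli): the usual Chinese remainder reconstruction X = Σ x_i · M_i · (M_i^(-1) mod m_i) (mod M), and its fractional form X/M = frac(Σ x_i · (M_i^(-1) mod m_i) / m_i), which exposes the most significant, sign-carrying information without a mod-M adder; the hardware techniques in the paper are about evaluating such sums with binary adders and small tables.

```python
# Chinese-remainder reconstruction and its fractional form (illustrative sketch).
from math import prod, floor

MODULI = [7, 11, 13, 15]                # assumed pairwise-coprime moduli
M = prod(MODULI)                        # dynamic range 15015

def to_residues(x):
    return [x % m for m in MODULI]

def crt(residues):
    """Standard CRT: X = sum x_i * M_i * (M_i^{-1} mod m_i) (mod M)."""
    total = 0
    for x_i, m_i in zip(residues, MODULI):
        M_i = M // m_i
        total += x_i * M_i * pow(M_i, -1, m_i)
    return total % M

def fractional_crt(residues):
    """X / M as a fraction in [0, 1): frac( sum x_i * inv_i / m_i )."""
    s = sum(x_i * pow(M // m_i, -1, m_i) / m_i for x_i, m_i in zip(residues, MODULI))
    return s - floor(s)

if __name__ == "__main__":
    for x in (0, 1, 12345, M - 1, M // 3):
        r = to_residues(x)
        assert crt(r) == x
        # Sign detection for a symmetric range [-M/2, M/2): "negative" iff fraction >= 1/2.
        assert (fractional_crt(r) >= 0.5) == (x >= M / 2)
    print("CRT reconstruction and fractional sign detection verified")
```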

Journal ArticleDOI
TL;DR: This paper shows that the node-connectivity of SRG is (2r - 2) and presents routing methods for situations with a certain number of node failures, and the routing algorithms are shown to be computationally efficient.
Abstract: A class of communication networks which is suitable for "multiple processor systems" was studied by Pradhan and Reddy. The underlying graph (to be called Shift and Replace graph or SRG) is based on DeBruijn digraphs and is a function of two parameters r and m. Pradhan and Reddy have shown that the node-connectivity of SRG is at least r. The same authors give a routing algorithm which generally requires 2m hops if the number of node failures is ≤ (r - 1). In this paper we show that the node-connectivity of SRG is (2r - 2). This would immediately imply that the system can tolerate up to (2r - 3) node failures. We then present routing methods for situations with a certain number of node failures. When this number is ≤ (r - 2) our routing algorithm requires at most m + 3 + log_r m hops if 3 + log_r m ≤ m. When the number of node failures is ≤ (2r - 3) our routing algorithm requires at most m + 5 + log_r m hops if 4 + log_r m ≤ m. In all the other situations our routing algorithm requires no more than 2m hops. The routing algorithms are shown to be computationally efficient.

Journal ArticleDOI
TL;DR: A heuristic is presented for the effective cooperation of multiple decentralized components of a job scheduling function; it can dynamically adapt to the quality of the state information being processed and is based on Bayesian decision theory.
Abstract: There is a wide spectrum of techniques that can be aptly named decentralized control. However, certain functions in distributed operating systems, e.g., scheduling, operate under such demanding requirements that no known optimal control solutions exist. It has been shown that heuristics are necessary. This paper presents a heuristic for the effective cooperation of multiple decentralized components of a job scheduling function. An especially useful feature of the heuristic is that it can dynamically adapt to the quality of the state information being processed. Extensive simulation results show the utility of this heuristic. The simulation results are compared to several analytical models and a baseline simulation model. The heuristic itself is based on the application of Bayesian decision theory. Bayesian decision theory was used because its principles can be applied as a systematic approach to complex decision making under conditions of imperfect knowledge, and it can run relatively cheaply in real time.

Journal ArticleDOI
TL;DR: A discrete-time model is presented of memory interference in multiprocessor systems using multiple-bus interconnection networks that differs from earlier models in its ability to model variable connection time and arbitrary inter-request time.
Abstract: A discrete-time model is presented of memory interference in multiprocessor systems using multiple-bus interconnection networks. It differs from earlier models in its ability to model variable connection time and arbitrary inter-request time. The model describes each processing element's behavior by means of a semi-Markov process, taking as input the number of processing elements, the number of memory modules, the number of buses, the mean think time of the processing elements, and the first and second moments of the connection time between processing elements and memories. The model produces as output the memory bandwidth, processing element utilization, memory module utilization, average queue length at a memory, and average waiting time experienced by a processing element while waiting to access a memory. Using the model, it is possible to analyze the interaction of the input parameters on the system performance without using a complex Markov chain; a four-state semi-Markov process is sufficient regardless of the think and connection time distributions. The accuracy and capability of the model are illustrated.

Journal ArticleDOI
TL;DR: This correspondence shows that the DTS problem is NP-complete, and presents a longest-first sequential scheduling algorithm which runs in worst case time O(dm log n) and uses O(m) space to produce a solution of length less than four times optimal.
Abstract: The problem of diagnostic test scheduling (DTS) is to assign to each edge e of a diagnostic graph G a time interval of length l(e) so that intervals corresponding to edges at any given vertex do not overlap and the overall finishing time is minimum. In this correspondence we show that the DTS problem is NP-complete. Then we present a longest-first sequential scheduling algorithm which runs in worst case time O(dm log n) and uses O(m) space to produce a solution of length less than four times optimal. Then we show that the general performance bound can be strengthened to 3·OPT(G) for low-degree graphs and to 2·OPT(G) in some special cases of binomial diagnostic graphs.

Journal ArticleDOI
TL;DR: A highly reliable and efficient double-loop network architecture is proposed, based on a forward loop backward hop topology, with a loop in the forward direction connecting all the neighboring nodes and a backward loop connecting nodes that are separated by a distance of ⌊√N⌋.
Abstract: Single-loop networks tend to become unreliable when the number of nodes in the network becomes large. Reliability can be improved using double loops. In this paper a highly reliable and efficient double-loop network architecture is proposed and analyzed. This network is based on forward loop backward hop topology, with a loop in the forward direction connecting all the neighboring nodes, and a backward loop connecting nodes that are separated by a distance ⌊√N⌋, where N is the number of nodes in the network. It is shown that this topology is optimal, among this class of double-loop networks, in terms of diameter, average hop distance, processing overhead, delay, throughput, and reliability. The paper includes derivation of closed form expressions for diameter and average hop distance, throughput, and number of distinct routes between two farthest nodes. For fault-tolerance study, the effect of node and link failures on the performance of the network is analyzed. A simple distributed routing algorithm for reliable loop network operation is also presented.
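The topology itself is easy to reproduce: node i has a forward link to (i + 1) mod N and a backward link to (i - ⌊√N⌋) mod N. The sketch below builds that digraph for a few assumed sizes and measures diameter and average hop distance by breadth-first search; these are the quantities for which the paper derives closed-form expressions.

```python
# Forward loop, backward hop network: diameter and average hop distance via BFS.
from collections import deque
from math import isqrt

def fb_hop_metrics(N):
    skip = isqrt(N)                      # backward hop distance = floor(sqrt(N))
    neighbors = lambda i: ((i + 1) % N, (i - skip) % N)
    diameter, total = 0, 0
    for src in range(N):                 # directed BFS from every node
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in neighbors(u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
    return diameter, total / (N * (N - 1))

if __name__ == "__main__":
    for N in (16, 64, 100, 256):
        d, avg = fb_hop_metrics(N)
        print(f"N={N:4d}  diameter={d:3d}  average hop distance={avg:.2f}")
```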

Journal ArticleDOI
Fisher, Kung
TL;DR: This paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best possible synchronization schemes for large processor arrays are proposed.
Abstract: Highly parallel VLSI computing structures consist of many processing elements operating simultaneously. In order for such processing elements to communicate among themselves, some provision must be made for synchronization of data transfer. The simplest means of synchronization is the use of a global clock. Unfortunately, large clocked systems can be difficult to implement because of the inevitable problem of clock skews and delays, which can be especially acute in VLSI systems as feature sizes shrink. For the near term, good engineering and technology improvements can be expected to maintain the feasibility of clocking in such systems; however, clock distribution problems crop up in any technology as systems grow. An alternative means of enforcing necessary synchronization is the use of self-timed asynchronous schemes, at the cost of increased design complexity and hardware cost. Realizing that different circumstances call for different synchronization methods, this paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best possible synchronization schemes for large processor arrays are proposed.

Journal ArticleDOI
TL;DR: The effect of failures on the performance of multiple-bus multiprocessors is considered and mathematical models are developed to compute the reliability and the performance-related bandwidth availability.
Abstract: The effect of failures on the performance of multiple-bus multiprocessors is considered. Bandwidth expressions for this architecture are derived for uniform and nonuniform memory references. Mathematical models are developed to compute the reliability and the performance-related bandwidth availability (BA). The results obtained for the multiple-bus interconnection are compared with those of a crossbar. The models are also extended to analyze the partial bus structure, where the memories are divided into groups and each group is connected to a subset of buses. The reliability and the BA of the multiple-bus and partial bus architectures are compared.

Journal ArticleDOI
Abramovici, Menon
TL;DR: This approach is based on extending fault simulation and test generation for stuck faults to cover bridging faults as well, and shows that adequate bridging fault coverage can be obtained in most cases without using sequences of vectors.
Abstract: In this correspondence we present a practical approach to fault simulation and test generation for bridging faults in combinational circuits. Unlike previous work, we consider unrestricted bridging faults, including those that introduce feedback. Our approach is based on extending fault simulation and test generation for stuck faults to cover bridging faults as well. We consider combinational testing only, and show that adequate bridging fault coverage can be obtained in most cases without using sequences of vectors.
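The flavor of (non-feedback) bridging-fault simulation can be shown with a small invented example rather than the authors' tool: a wired-AND bridge between two internal nets is injected by forcing both nets to the AND of their fault-free values, and input vectors are replayed to see which ones make the faulty output differ from the good one.

```python
# Wired-AND bridging-fault simulation on a toy combinational circuit (sketch only).
# Circuit: n1 = a AND b, n2 = c OR d, out = n1 XOR n2.
from itertools import product

def simulate(a, b, c, d, bridge=None):
    n1 = a & b
    n2 = c | d
    if bridge == ("n1", "n2"):           # wired-AND bridge: both nets take n1 AND n2
        n1 = n2 = n1 & n2
    return n1 ^ n2

def detects(vectors, bridge):
    """Vectors that expose the bridge, i.e. good and faulty outputs differ."""
    return [v for v in vectors if simulate(*v) != simulate(*v, bridge=bridge)]

if __name__ == "__main__":
    all_vectors = list(product([0, 1], repeat=4))
    hits = detects(all_vectors, ("n1", "n2"))
    print(f"{len(hits)} of {len(all_vectors)} input vectors detect the n1/n2 wired-AND bridge")
    print("one detecting vector (a, b, c, d):", hits[0])
```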

Journal ArticleDOI
TL;DR: In this paper, the authors present analytical results for the calculation of the resulting effective bandwidth for one and two access streams to a memory system in a vector processor; examples of measurements on a Cray X-MP and corresponding simulations are also presented.
Abstract: Memory interleaving and multiple access ports are the key to a high memory bandwidth in vector processor systems. Each of the active ports supports an independent access stream to memory among which access conflicts may arise. Such conflicts lead to a decrease in memory bandwidth. The authors present some analytical results for the calculation of the resulting effective bandwidth for one and two access streams to a memory system in a vector processor. In particular, conditions for conflict-free access are given together with some conflicting cases that should be avoided. Finally, examples of measurements on a Cray X-MP and corresponding simulations are presented.
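The single-stream case rests on a simple fact that the analysis builds on: with M interleaved banks and stride s, consecutive vector elements fall into M / gcd(M, s) distinct banks, so access is conflict-free when that revisit interval is at least the bank busy time and degrades proportionally otherwise. The sketch below tabulates this for a generic interleaved memory with assumed parameters, not the Cray X-MP's actual memory system.

```python
# Effective bandwidth of one stride-s stream on M interleaved banks (simplified model).
# One access is issued per cycle; a bank stays busy for `busy` cycles after each access.
from math import gcd

def relative_bandwidth(M, stride, busy):
    distinct_banks = M // gcd(M, stride)      # banks actually visited by the stream
    # The same bank is revisited every `distinct_banks` accesses; if that is shorter
    # than its busy time, the stream stalls and bandwidth drops proportionally.
    return min(1.0, distinct_banks / busy)

if __name__ == "__main__":
    M, busy = 16, 4                           # assumed: 16 banks, 4-cycle bank busy time
    for stride in (1, 2, 3, 4, 8, 16, 17):
        bw = relative_bandwidth(M, stride, busy)
        print(f"stride {stride:2d}: uses {M // gcd(M, stride):2d} banks, "
              f"relative bandwidth = {bw:.2f}")
```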

Journal ArticleDOI
TL;DR: A state model is presented which clarifies various coherence mechanisms as well as introduces a new state to enable the multicache system to more efficiently handle the processor writes.
Abstract: A coherence problem may occur in a multicache system as soon as data inconsistency exists in the private caches and the main memory. Without an effective solution to the coherence problem, the effectiveness of a multicache system will be inherently limited. The problem is closely examined in this paper and previous solutions, both centralized approaches and distributed approaches, are analyzed based on the notion of semicritical sections. A state model is then presented which clarifies various coherence mechanisms as well as introduces a new state to enable the multicache system to more efficiently handle the processor writes. Software guidance, for performance and not for integrity, is advocated in a new proposal which in a practical multicache environment explores the benefit of the new state with little cost.

Journal ArticleDOI
TL;DR: The rearrangeability proof and the control algorithm are well known for the Benes network, but there has been little progress for the case of nonsymmetric networks of similar hardware requirements.
Abstract: For any parallel computer systems which consist of many processing elements and memories, interconnection networks provide communication paths among processing elements and memories. Both the rearrangeability proof and the control algorithm are well known for the Benes network, which is intrinsically symmetric. However, there has been little progress for the case of nonsymmetric networks of similar hardware requirements.