# Showing papers in "IEEE Transactions on Computers" in 1983

••

TL;DR: In this article, the authors propose an approach based on characterizing the position and orientation of an object as a single point in a configuration space, in which each coordinate represents a degree of freedom in the position or orientation of the object.

Abstract: This paper presents algorithms for computing constraints on the position of an object due to the presence of other objects. This problem arises in applications that require choosing how to arrange or how to move objects without collisions. The approach presented here is based on characterizing the position and orientation of an object as a single point in a configuration space, in which each coordinate represents a degree of freedom in the position or orientation of the object. The configurations forbidden to this object, due to the presence of other objects, can then be characterized as regions in the configuration space, called configuration space obstacles. The paper presents algorithms for computing these configuration space obstacles when the objects are polygons or polyhedra.

1,996 citations
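
For the simplest case covered by this approach (a convex polygon A that translates but does not rotate, among a convex polygonal obstacle B), the configuration space obstacle is the Minkowski sum B ⊕ (−A), taking A's reference point at its local origin. The Python sketch below assumes exactly that case: it forms every vertex difference b − a and takes the convex hull. Rotation and polyhedra, which the paper also handles, are omitted, and the function names are illustrative rather than the paper's.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise,
    dropping collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def cspace_obstacle(obstacle: List[Point], robot: List[Point]) -> List[Point]:
    """C-obstacle of a translating convex robot among a convex obstacle:
    the Minkowski sum obstacle (+) (-robot), robot reference point at (0, 0)."""
    diffs = [(bx - ax, by - ay) for (bx, by) in obstacle for (ax, ay) in robot]
    return convex_hull(diffs)


square2 = [(0, 0), (2, 0), (2, 2), (0, 2)]
unit = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(cspace_obstacle(square2, unit))  # [(-1, -1), (2, -1), (2, 2), (-1, 2)]
```

Placing the unit square's reference corner strictly inside the resulting 3 × 3 square makes it overlap the obstacle; the path-planning question then reduces to moving a single point around such regions.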

••

Osaka University

TL;DR: The FAN (fan-out-oriented test generation) algorithm is presented, which is faster and more efficient than the PODEM algorithm reported by Goel, together with an automatic test generation system composed of the FAN algorithm and concurrent fault simulation.

Abstract: In order to accelerate an algorithm for test generation, it is necessary to reduce the number of backtracks in the algorithm and to shorten the process time between backtracks. In this paper, we consider several techniques to accelerate test generation and present a new test generation algorithm called FAN (fan-out-oriented test generation algorithm). It is shown that the FAN algorithm is faster and more efficient than the PODEM algorithm reported by Goel. We also present an automatic test generation system composed of the FAN algorithm and the concurrent fault simulation. Experimental results on large combinational circuits of up to 3000 gates demonstrate that the system performs test generation very fast and effectively.

821 citations

••

TL;DR: The design for the NYU Ultracomputer is presented, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements that uses an enhanced message switching network with the geometry of an Omega-network to approximate the ideal behavior of Schwartz's paracomputer model of computation.

Abstract: We present the design for the NYU Ultracomputer, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements. This machine uses an enhanced message switching network with the geometry of an Omega-network to approximate the ideal behavior of Schwartz's paracomputer model of computation and to implement efficiently the important fetch-and-add synchronization primitive. We outline the hardware that would be required to build a 4096 processor system using 1990's technology. We also discuss system software issues, and present analytic studies of the network performance. Finally, we include a sample of our effort to implement and simulate parallel variants of important scientific programs.

708 citations
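
Fetch-and-add F&A(X, e) returns the old value of X and adds e to it atomically. In the Ultracomputer design, network switches can combine two F&A requests to the same address into one and split the reply on the way back. The single-threaded Python sketch below shows only that algebra, assuming the combined result is defined as if request a were served just before request b; queues, switch hardware, and concurrency are not modeled, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FetchAdd:
    """A fetch-and-add request F&A(addr, inc)."""
    addr: int
    inc: int


class Memory:
    """Memory module that serves fetch-and-add requests one at a time."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}

    def fetch_and_add(self, req: FetchAdd) -> int:
        old = self.cells.get(req.addr, 0)
        self.cells[req.addr] = old + req.inc  # atomic in this sequential model
        return old


def combine(a: FetchAdd, b: FetchAdd) -> FetchAdd:
    """Merge two requests to the same address into a single request."""
    if a.addr != b.addr:
        raise ValueError("only requests to the same address can be combined")
    return FetchAdd(a.addr, a.inc + b.inc)


def split_reply(a: FetchAdd, reply: int) -> Tuple[int, int]:
    """Values returned to a and b, as if a had been served just before b."""
    return reply, reply + a.inc


mem = Memory()
mem.cells[7] = 10
a, b = FetchAdd(7, 3), FetchAdd(7, 5)
ra, rb = split_reply(a, mem.fetch_and_add(combine(a, b)))
print(ra, rb, mem.cells[7])  # 10 13 18
```

Both requesters see the same results as the serial order "a then b", yet memory receives only one request, which is why combining helps with many processors contending for one location.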

••

TL;DR: An asymptotic analysis of the performance of unbuffered banyan networks is presented, thereby solving a problem left open by Patel.

Abstract: This paper studies the performance of unbuffered and buffered, packet-switching, multistage interconnection networks. We begin by reviewing the definition of banyan networks and introducing some generalizations of them. We then present an asymptotic analysis of the performance of unbuffered banyan networks, thereby solving a problem left open by Patel. We analyze the performance of the unbuffered generalized banyan networks, and compare networks with approximately equivalent hardware complexity. Finally, we analyze the performance of buffered banyan networks and again compare networks with approximately equivalent hardware complexity.

563 citations
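
For background, the standard per-stage model of an unbuffered network built from k × k crossbar switches (as in Patel's delta-network analysis) assumes independent, uniformly addressed requests: if each switch input carries a request with probability p, each output carries one with probability p' = 1 − (1 − p/k)^k. Expanding to second order gives p' ≈ p − ((k − 1)/(2k))p², so 1/p grows by about (k − 1)/(2k) per stage and p_m ≈ 2k/((k − 1)m) after m stages. That expansion is our own back-of-envelope estimate; the paper gives the rigorous asymptotics. The Python sketch compares the recurrence with the estimate.

```python
def acceptance_probability(k: int, stages: int, p0: float = 1.0) -> float:
    """Probability that a request reaches the output after `stages` stages of
    k x k switches, iterating p' = 1 - (1 - p/k)**k from input load p0."""
    p = p0
    for _ in range(stages):
        p = 1.0 - (1.0 - p / k) ** k
    return p


def asymptotic_estimate(k: int, stages: int) -> float:
    """First-order large-m approximation 2k / ((k - 1) m)."""
    return 2.0 * k / ((k - 1) * stages)


for m in (4, 16, 64, 256, 1024):
    exact = acceptance_probability(2, m)
    approx = asymptotic_estimate(2, m)
    print(f"stages={m:5d}  recurrence={exact:.5f}  estimate={approx:.5f}")
```

The two columns converge slowly (the neglected terms grow like log m), which is consistent with throughput per input in large unbuffered banyans falling off roughly as 1/m.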

••

Brown University

TL;DR: This paper presents an implementation of the bottom-left heuristic for two-dimensional bin-packing that requires linear space and quadratic time; the authors believe that, even for relatively small values of N, it is the most efficient implementation of this heuristic to date.

Abstract: We study implementations of the bottom-left heuristic for two-dimensional bin-packing. To pack N rectangles into an infinite vertical strip of fixed width, the strategy considered here places each rectangle in turn as low as possible in the strip in a left-justified position. For reasons of simplicity and good performance, the bottom-left heuristic has long been a favorite in practical applications; however, the best implementations found so far require $O(N^3)$ steps. In this paper, we present an implementation of the bottom-left heuristic which requires linear space and quadratic time. The algorithm is fairly practical, and we believe that even for relatively small values of N, it gives the most efficient implementation of the heuristic, to date. It proceeds by first determining all the possible locations where the next rectangle can fit, then selecting the lowest of them. It is optimal among all the algorithms based on this exhaustive strategy, and its generality makes it adaptable to different packing heuristics.

304 citations
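
To pin down what the heuristic computes, the exhaustive strategy (find every position where the next rectangle fits, then take the lowest, breaking ties to the left) can be sketched naively. The bottom-left position always has its bottom edge on the floor or on top of a placed rectangle, and its left edge on the wall or against the right side of a placed rectangle, so those coordinates form a sufficient candidate set. This Python sketch tests each candidate against every placed rectangle, roughly $O(N^4)$ overall, which is far slower than the paper's quadratic-time algorithm; the names are illustrative.

```python
from typing import List, Tuple

Rect = Tuple[float, float]                   # (width, height)
Placed = Tuple[float, float, float, float]   # (x, y, width, height)


def overlaps(x: float, y: float, w: float, h: float, other: Placed) -> bool:
    """True if the open interiors of the two rectangles intersect."""
    ox, oy, ow, oh = other
    return x < ox + ow and ox < x + w and y < oy + oh and oy < y + h


def bottom_left(rects: List[Rect], strip_width: float) -> List[Placed]:
    """Place each rectangle at its lowest, then leftmost, feasible position."""
    placed: List[Placed] = []
    for w, h in rects:
        if w > strip_width:
            raise ValueError("rectangle wider than the strip")
        # Candidate left edges: the wall or a right side; bottoms: floor or a top.
        xs = sorted({0.0} | {px + pw for px, _, pw, _ in placed})
        ys = sorted({0.0} | {py + ph for _, py, _, ph in placed})
        best = None
        for y in ys:                      # lowest first
            for x in xs:                  # then leftmost
                if x + w <= strip_width and not any(
                        overlaps(x, y, w, h, p) for p in placed):
                    best = (x, y)
                    break
            if best is not None:
                break
        assert best is not None  # y = highest top, x = 0 always fits
        placed.append((best[0], best[1], w, h))
    return placed


print(bottom_left([(2, 2), (2, 1), (3, 1)], 4))
```

In this example the 3-wide rectangle cannot fit beside the first two, so it lands at height 2 against the left wall.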

••

Duke University

TL;DR: This paper describes by a series of examples a strategy for designing testable fault-tolerant arrays of processors by introducing redundancy in an array's communication links rather than in its processing elements (PE's).

Abstract: This paper describes by a series of examples a strategy for designing testable fault-tolerant arrays of processors. The strategy achieves fault tolerance by introducing redundancy in an array's communication links rather than in its processing elements (PE's). The major characteristics of the designs produced are as follows.

267 citations

••

TL;DR: The area-time complexity of sorting is analyzed under an updated model of VLSI computation, which makes a distinction between "processing" circuits and "memory" circuits; the latter are less important since they are denser and consume less power.

Abstract: The area-time complexity of sorting is analyzed under an updated model of VLSI computation. The new model makes a distinction between "processing" circuits and "memory" circuits; the latter are less important since they are denser and consume less power. Other adjustments to the model make it possible to compare pipelined and nonpipelined designs.

214 citations

••

TL;DR: Two graph-theoretic models are introduced that provide a uniform procedure for analyzing $2^n$-input/$2^n$-output Multistage Interconnection Networks (MIN's), implemented with 2-input/2-output Switching Elements (SE's) and satisfying a characteristic called the "buddy property."

Abstract: This paper introduces two graph-theoretic models that provide a uniform procedure for analyzing $2^n$-input/$2^n$-output Multistage Interconnection Networks (MIN's), implemented with 2-input/2-output Switching Elements (SE's) and satisfying a characteristic called the "buddy property." These models show that all such n-stage MIN's are topologically equivalent and hence prove that one MIN can be implemented from integrated circuits designed for another MIN. The proposed techniques also allow identical modeling of n-stage MIN's and other link-controlled networks, such as the augmented data manipulator and the SW Banyan Network, and hence allow comparison of their permutation capabilities. In the case of any conflict in the MIN, an upper bound for the required number of passes has been obtained.

203 citations

••

TL;DR: This paper surveys nine designs for VLSI circuits that compute N-element Fourier transforms; the largest of the designs requires $O(N^2 \log N)$ units of silicon area; it can start a new Fourier transform every $O(\log N)$ time units.

Abstract: This paper surveys nine designs for VLSI circuits that compute N-element Fourier transforms. The largest of the designs requires $O(N^2 \log N)$ units of silicon area; it can start a new Fourier transform every $O(\log N)$ time units. The smallest designs have about 1/Nth of this throughput, but they require only 1/Nth as much area.

182 citations

••

TL;DR: A merging algorithm is presented that is optimal up to a constant factor when merging two lists of equal size (independent of the number of processors); as a special case, with N processors it merges two lists, each of size N, in 1.893 lg lg N + 4 comparison steps.

Abstract: We study the number of comparison steps required for searching, merging, and sorting with P processors. We present a merging algorithm that is optimal up to a constant factor when merging two lists of equal size (independent of the number of processors); as a special case, with N processors it merges two lists, each of size N, in $1.893 \lg\lg N + 4$ comparison steps. We use the merging algorithm to obtain a sorting algorithm that, in particular, sorts N values with N processors in $1.893 \lg N \lg\lg N / \lg\lg\lg N$ (plus lower order terms) comparison steps. The algorithms can be implemented on a shared memory machine that allows concurrent reads from the same location with constant overhead at each comparison step.

180 citations

••

TL;DR: A simple procedure is proposed that can be used to construct a directed graph whose diameter is less than or equal to that of any previously proposed graph.

Abstract: This paper proposes a simple procedure for the design of small-diameter graphs. It can be used to construct a directed graph whose diameter is less than or equal to that of any previously proposed graph.

••

IBM

TL;DR: The authors consider analytic queueing models of programs with internal concurrency and develop two approximate solution methods for predicting the performance of such systems; the results of the approximations are compared with those of simulations.

Abstract: Analytic queueing models of programs with internal concurrency are considered. The program behavior model allows a process to spawn two or more concurrent tasks at some point during its execution. Except for queueing effects, the tasks execute independently of one another, and at the end of their execution, either wait for all of their siblings to finish execution or merge with the parent if all have finished execution. Two approximate solution methods for the performance prediction of such systems are developed, and results of the approximations are compared to those of simulations. The approximations are both computationally efficient and highly accurate. The gain in performance due to multitasking and multiprocessing is studied with a series of examples.

••

TL;DR: It is shown that augmenting an arbitrary mesh-connected computer with a second communication system called broadcasting significantly decreases the time to do sample problems such as semigroup calculations or finding the median, but it cannot significantly improve sorting.

Abstract: We consider the effects of augmenting an arbitrary mesh-connected computer with a second communication system called broadcasting. In broadcasting, a processor sends a value to all the other processors simultaneously, taking unit time, with the restriction that only one broadcast occurs at any one time. We show that this significantly decreases the time to do sample problems such as semigroup calculations or finding the median, but it cannot significantly improve sorting. For example, in a one-dimensional mesh-connected computer without broadcasting, if there are n numbers, each stored separately in consecutive processors, then $\Theta(n)$ time is needed to find their minimum, find their median, or sort them, while with broadcasting, this can be done in $\Theta(n^{1/2})$, $\Theta((n \log n)^{1/2})$, and $\Theta(n)$ time, respectively.

••

IBM

TL;DR: A simple and efficient solution to the problem of feeding bits into a shift register so that every designated subset of register positions eventually sees all possible bit patterns, derived from the connections between polynomials over finite fields and linear feedback shift registers, is presented, and applications to the problem of VLSI self-testing are discussed and illustrated.

Abstract: One has a shift register of length n and a collection of designated subsets of $\{0, 1, \ldots, n-1\}$. The problem is to devise a method for feeding a string of bits into the shift register in such an order that, for each designated subset $S = \{k_1, \ldots, k_r\}$, if one keeps track of the bit patterns appearing at the corresponding positions $k_1, \ldots, k_r$ of the shift register, all $2^r$ possible bit patterns will ultimately appear at those positions. A simple and efficient solution to this problem, derived from the connections between polynomials over finite fields and linear feedback shift registers, is presented. Applications of this solution to the problem of VLSI self-testing are discussed and illustrated.
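
The underlying LFSR property can be checked directly. If the bit stream comes from a linear feedback shift register whose characteristic polynomial of degree n is primitive, every nonzero n-bit pattern appears in the register during one period of 2^n − 1 bits, so every subset of fewer than n positions sees all of its patterns. The paper's construction is more economical, choosing polynomials matched to the designated subsets; this Python sketch shows only the maximal-length case, and it assumes the convention that register position k holds the k-th bit of the current window.

```python
from typing import List, Sequence, Set, Tuple


def lfsr_bits(taps: Sequence[int], seed: Sequence[int], count: int) -> List[int]:
    """Fibonacci LFSR over GF(2): emits state[0] each step and appends the XOR
    of state[j] for j in taps, i.e. s[t+n] = XOR of s[t+j] over j in taps."""
    state = list(seed)
    out: List[int] = []
    for _ in range(count):
        out.append(state[0])
        feedback = 0
        for j in taps:
            feedback ^= state[j]
        state = state[1:] + [feedback]
    return out


def patterns_seen(bits: Sequence[int], n: int,
                  subset: Sequence[int]) -> Set[Tuple[int, ...]]:
    """Patterns appearing at `subset` positions of an n-bit register as `bits` stream in."""
    seen: Set[Tuple[int, ...]] = set()
    for t in range(len(bits) - n + 1):
        window = bits[t:t + n]
        seen.add(tuple(window[k] for k in subset))
    return seen


# x^4 + x + 1 is primitive over GF(2): s[t+4] = s[t] XOR s[t+1], period 15.
bits = lfsr_bits(taps=(0, 1), seed=(1, 0, 0, 0), count=15 + 3)
print(len(patterns_seen(bits, 4, (0, 1, 3))))     # 8: all 2^3 patterns
print(len(patterns_seen(bits, 4, (0, 1, 2, 3))))  # 15: every pattern except 0000
```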

••

TL;DR: It is possible to find the smallest nonnegative integer R congruent modulo M to the product AB of two nonnegative integers without dividing by M.

Abstract: It is possible to find the smallest nonnegative integer R congruent modulo M to the product AB of two nonnegative integers without dividing by M. In multiple precision arithmetic, doing away with the division cuts the calculation time by varying amounts, depending on machine architecture. It also cuts storage space.
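
One well-known division-free scheme interleaves shift-and-add multiplication with conditional subtraction of M, scanning A one bit at a time from the most significant end. If the partial result R satisfies 0 ≤ R < M, then 2R + B < 3M, so at most two subtractions per bit restore 0 ≤ R < M. The Python sketch below illustrates that idea; it is not claimed to reproduce the paper's exact procedure, and a multiple-precision implementation would work on machine words rather than Python integers.

```python
def modmul_interleaved(a: int, b: int, m: int) -> int:
    """Return (a * b) mod m using only shifts, additions, and subtractions of m.
    Assumes 0 <= a, b < m."""
    if m <= 0:
        raise ValueError("modulus must be positive")
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must already be reduced modulo m")
    r = 0
    for i in reversed(range(a.bit_length())):  # most significant bit of a first
        r <<= 1                                # r = 2r
        if (a >> i) & 1:
            r += b                             # now r < 3m
        if r >= m:                             # at most two subtractions
            r -= m
        if r >= m:
            r -= m
    return r


print(modmul_interleaved(13, 11, 17))  # 7, since 143 = 8 * 17 + 7
```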

••

TL;DR: It is shown that the area of any circuit computing a transitive function grows quadratically with the circuit's maximum data rate, expressed in bits/s, which provides a precise analytic expression of an area-time tradeoff for a wide class of VLSI circuits.

Abstract: We introduce a property of Boolean functions, called transitivity, that is shared by integer, polynomial, and matrix products as well as by many interesting related computational problems. We show that the area of any circuit computing a transitive function grows quadratically with the circuit's maximum data rate, expressed in bits/s. This result provides a precise analytic expression of an area-time tradeoff for a wide class of VLSI circuits. Furthermore (as shown elsewhere), this tradeoff is achievable. We thus have matching (to within a constant multiplicative factor) upper and lower complexity bounds for the three products above, in the VLSI computational model.

••

TL;DR: Applying the analytic results to the slotted ALOHA with single packet messages, it is proved mathematically that a method by Kleinrock and Lam for taking into account the influence of the propagation delay is an excellent approximation.

Abstract: The dynamic behavior of the R-ALOHA packet broadcast system with multipacket messages is analyzed in this paper. It is assumed that each user handles one message at a time and the number of packets in a message is geometrically distributed. A Markovian model of the system is first formulated which explicitly contains the influence of the propagation delay of the broadcast channel. An approximate technique called equilibrium point analysis (EPA) is utilized to analyze the multidimensional Markov chain. The system stability behavior and the throughput-average message delay performance are demonstrated by the EPA. Numerical results from both analysis and simulation are given to assess the accuracy of the analytic results. Applying the analytic results to the slotted ALOHA with single packet messages, we prove mathematically that a method by Kleinrock and Lam for taking into account the influence of the propagation delay is an excellent approximation.

••

TL;DR: The generalized shuffle network (GSN) is based on a new interconnection pattern called a generalized shuffle and is capable of connecting any number of processors M to any number of memory modules N. The technique results in a variety of interconnection networks depending on how M and N are factored.

Abstract: This paper introduces a general class of self-routing interconnection networks for tightly coupled multiprocessor systems. The proposed network, named a "generalized shuffle network (GSN)," is based on a new interconnection pattern called a generalized shuffle and is capable of connecting any number of processors M to any number of memory modules N. The technique results in a variety of interconnection networks depending on how M and N are factored. The network covers a broad spectrum of interconnections, starting from shared bus to crossbar switches and also includes various multistage interconnection networks (MIN's).

••

TL;DR: It is shown that the sign/logarithm approach provides improved arithmetic quantization error performance for a given word size over FFT's implemented with conventional fixed or floating point arithmetic, and that its implementation is faster and less complex than conventional approaches.

Abstract: Sign/logarithm arithmetic is applicable to a variety of numerical applications where wide dynamic range and small wordsize are required. In this paper the basic sign/logarithm arithmetic operations required for signal processing (i.e., addition, subtraction, and multiplication) are reviewed, the computational errors are analyzed for FFT realization, and simulation results are presented which serve to verify the analysis. It is shown that the sign/logarithm approach provides improved arithmetic quantization error performance for a given word size over FFT's implemented with conventional fixed or floating point arithmetic, and that the sign/logarithm implementation is faster and less complex than conventional approaches.
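
In sign/logarithm arithmetic a nonzero x is stored as its sign and log2|x|. Multiplication becomes addition of logarithms, while addition and subtraction need the nonlinear function log2(1 ± 2^d) for d ≤ 0, which hardware typically reads from a table. The floating-point Python sketch below shows only this algebra; it does not model the finite word size or the quantization error the paper analyzes, and the class and function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SignLog:
    """Nonzero real stored as (sign, log2 of magnitude); zero is not representable here."""
    sign: int   # +1 or -1
    log: float  # log2(|x|)


def encode(x: float) -> SignLog:
    if x == 0:
        raise ValueError("zero needs a separate flag in a real sign/log format")
    return SignLog(1 if x > 0 else -1, math.log2(abs(x)))


def decode(v: SignLog) -> float:
    return v.sign * 2.0 ** v.log


def mul(a: SignLog, b: SignLog) -> SignLog:
    # Multiply signs, add logarithms.
    return SignLog(a.sign * b.sign, a.log + b.log)


def add(a: SignLog, b: SignLog) -> SignLog:
    # Order operands so |a| >= |b|; then d = log2|b| - log2|a| <= 0.
    if a.log < b.log:
        a, b = b, a
    d = b.log - a.log
    if a.sign == b.sign:
        # |a| + |b| = |a| * (1 + 2^d)
        return SignLog(a.sign, a.log + math.log2(1.0 + 2.0 ** d))
    if d == 0.0:
        raise ValueError("exact cancellation: the result is zero")
    # |a| - |b| = |a| * (1 - 2^d); the sign follows the larger operand
    return SignLog(a.sign, a.log + math.log2(1.0 - 2.0 ** d))


def sub(a: SignLog, b: SignLog) -> SignLog:
    return add(a, SignLog(-b.sign, b.log))


print(decode(mul(encode(3.0), encode(-4.0))))  # approximately -12.0
print(decode(sub(encode(3.0), encode(5.0))))   # approximately -2.0
```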

••

TL;DR: A general class of fault-tolerant multistage interconnection networks is presented, wherein fault tolerance is achieved by providing multiple disjoint paths between every input and output.

Abstract: A general class of fault-tolerant multistage interconnection networks is presented, wherein fault-tolerance is achieved by providing multiple disjoint paths between every input and output. These networks are derived from the Omega networks and as such retain all the connection properties of the parent networks in the absence of faults. An R-path network in this class can tolerate (R-1) arbitrary faults in the intermediate stages of the network at a cost that is far less than providing R copies of the original network. Different techniques for constructing such networks are presented and relevant properties and control algorithms are investigated.
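
Since these networks are derived from Omega networks, it may help to recall how a fault-free Omega network routes. With N = 2^n terminals and n stages, each stage applies a perfect shuffle (rotate the n-bit line label left by one) and then a 2 × 2 switch that chooses its upper or lower output from the next destination bit, most significant first. The Python sketch below assumes that standard shuffle-then-switch convention; it models neither the extra disjoint paths of the R-path networks nor their control algorithms.

```python
from typing import List


def omega_route(src: int, dst: int, n: int) -> List[int]:
    """Destination-tag routing through an N = 2^n Omega network of 2x2 switches.
    Returns the line label after each stage (index 0 is the source)."""
    size = 1 << n
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("terminal out of range")
    line = src
    path = [line]
    for stage in range(n):
        # Perfect shuffle: rotate the n-bit line label left by one position.
        line = ((line << 1) | (line >> (n - 1))) & (size - 1)
        # Switch output: upper (0) or lower (1) from the next destination bit.
        bit = (dst >> (n - 1 - stage)) & 1
        line = (line & ~1) | bit
        path.append(line)
    return path


print(omega_route(5, 2, 3))  # [5, 2, 5, 2]: the last label is the destination
```

Two messages whose paths need the same line label at the same stage conflict; with a single path per input-output pair, one faulty switch or link on that path disconnects the pair, which is what providing multiple disjoint paths avoids.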

••

TL;DR: The OTC and OTN can be regarded as general-purpose parallel processors, since a number of other problems, such as sorting and the DFT, can be solved on them with an area × time$^2$ performance matching that of other networks.

Abstract: In this paper we describe two interconnection networks for parallel processing, namely the orthogonal trees network and the orthogonal tree cycles (OTN and OTC). Both networks are suitable for VLSI implementation and have been analyzed using Thompson's model of VLSI. While the OTN and OTC have time performances similar to fast networks such as the perfect shuffle network (PSN), the cube connected cycles (CCC), etc., they have substantially better area × time$^2$ performances for a number of matrix and graph problems. For instance, the connected components and a minimal spanning tree of an undirected N-vertex graph can be found in $O(\log^4 N)$ time on the OTC with an area × time$^2$ performance of $O(N^2 \log^8 N)$ and $O(N^2 \log^9 N)$, respectively. This is asymptotically much better than the performances of the CCC, PSN and Mesh. The OTC and OTN can be looked upon as general purpose parallel processors since a number of other problems such as sorting and DFT can be solved on them with an area × time$^2$ performance matching that of other networks. Finally, programming the OTN and OTC is simple and they are also amenable to pipelining a series of problems.

••

TL;DR: Testing of logic networks by verifying the Walsh coefficients of the outputs is explored; measuring one of these coefficients can detect arbitrarily many stuck input leads, and just two measurements can detect any single stuck-at fault in appropriately designed networks.

Abstract: Testing of logic networks by verifying the Walsh coefficients of the outputs is explored. Measurement of one of these can detect arbitrarily many stuck input leads, and just two measurements, requiring little hardware, can detect any single stuck-at fault in appropriately designed networks.
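
One way to see why a single coefficient catches stuck inputs: the Walsh coefficient for an input subset S is the sum over all input vectors of (−1)^(f(x) ⊕ parity of x on S). If a stuck lead makes the output independent of input i, pairing each vector with the one differing only in bit i flips the parity term while leaving f unchanged, so every coefficient whose subset contains i is zero. Hence, if the fault-free coefficient for the full input set is nonzero, any number of stuck input leads drives it to zero. The exhaustive Python sketch below illustrates this on a 2-input AND; the paper's measurement hardware and its design conditions for other faults are not modeled.

```python
from itertools import product
from typing import Callable, Sequence

BoolFn = Callable[[Sequence[int]], int]


def walsh_coefficient(f: BoolFn, n: int, subset: Sequence[int]) -> int:
    """Sum over all 2^n inputs of (-1)^(f(x) XOR parity of x on `subset`)."""
    total = 0
    for x in product((0, 1), repeat=n):
        parity = 0
        for i in subset:
            parity ^= x[i]
        total += 1 if (f(x) ^ parity) == 0 else -1
    return total


def stuck_at(f: BoolFn, lead: int, value: int) -> BoolFn:
    """Model input lead `lead` stuck at the constant `value`."""
    def faulty(x: Sequence[int]) -> int:
        y = list(x)
        y[lead] = value
        return f(y)
    return faulty


def and2(x: Sequence[int]) -> int:
    return x[0] & x[1]


ALL = (0, 1)
print(walsh_coefficient(and2, 2, ALL))                  # -2: fault-free
print(walsh_coefficient(stuck_at(and2, 0, 1), 2, ALL))  # 0: input 0 stuck at 1
```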

••

IBM

TL;DR: The redundancy-removal algorithm described in this paper has been used successfully both to guarantee coverage in stuck-fault testing and to simplify multilevel logic; it saves computer resources at the expense of possibly failing to discover some redundancies.

Abstract: A signal in a logical network is called redundant if it can be replaced by a constant without changing the function of the network. Detecting redundancy is important for two reasons: guaranteeing coverage in stuck-fault testing, and simplifying multilevel logic without converting to two levels. In particular, removing redundancy allows simplification in the presence of don't cares. The algorithm for redundancy removal described in this paper has been used successfully for both of the above purposes. It achieves savings in computer resources at the expense of possibly failing to discover some redundancies.
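
The definition above can be applied literally by brute force: force a signal to 0 or 1 and compare the full truth table with the original. The Python sketch below does exactly that on a tiny netlist. It is exponential in the number of inputs and treats each named net as one signal, so it illustrates the definition rather than the paper's resource-saving algorithm; the netlist format and all names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Gate = Tuple[str, str, List[str]]  # (output signal, operator, input signals)

OPS: Dict[str, Callable[[List[int]], int]] = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "NOT": lambda v: 1 - v[0],
}


def evaluate(gates: List[Gate], inputs: Sequence[str], assignment: Sequence[int],
             forced: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Simulate a topologically ordered netlist; `forced` pins signals to constants."""
    forced = forced or {}
    values = dict(zip(inputs, assignment))
    values.update(forced)
    for out, op, ins in gates:
        if out not in forced:
            values[out] = OPS[op]([values[s] for s in ins])
    return values


def truth_table(gates: List[Gate], inputs: Sequence[str], outputs: Sequence[str],
                forced: Optional[Dict[str, int]] = None) -> List[Tuple[int, ...]]:
    return [tuple(evaluate(gates, inputs, a, forced)[o] for o in outputs)
            for a in product((0, 1), repeat=len(inputs))]


def redundant_signals(gates: List[Gate], inputs: Sequence[str],
                      outputs: Sequence[str]) -> List[Tuple[str, int]]:
    """(signal, constant) pairs whose replacement leaves every output unchanged."""
    reference = truth_table(gates, inputs, outputs)
    found: List[Tuple[str, int]] = []
    for s in list(inputs) + [g[0] for g in gates]:
        for c in (0, 1):
            if truth_table(gates, inputs, outputs, {s: c}) == reference:
                found.append((s, c))
                break
    return found


# y = (a AND b) OR a, which simplifies to y = a
GATES: List[Gate] = [("t", "AND", ["a", "b"]), ("y", "OR", ["t", "a"])]
print(redundant_signals(GATES, ["a", "b"], ["y"]))  # [('b', 0), ('t', 0)]
```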

••

Duke University

TL;DR: A review and a critical evaluation of a representative class of state-of-the-art models for ultrahigh reliability prediction leads to a new model now under development that combines the flexibility and accuracy of simulation with the speed of analytic models.

Abstract: A review and a critical evaluation of a representative class of state-of-the-art models for ultrahigh reliability prediction is presented. This evaluation naturally leads us to a new model for ultrahigh reliability prediction now under development. The new model combines the flexibility and accuracy of simulation with the speed of analytic models.

••

IBM

TL;DR: A simple way is developed of generating a test set which simultaneously provides exhaustive pattern testing with respect to all input subsets of a logic circuit up to a certain size and can be effectively implemented via a scan path type shifter.

Abstract: We develop in this paper a simple way of generating a test set which simultaneously provides exhaustive pattern testing with respect to all input subsets of a logic circuit up to a certain size. It is shown that such a test set may be formed with vectors of a particular set of weights. Main theorems and examples are established and illustrated in the binary case (for 2-value logic circuits) and then generalized to nonbinary cases (for multivalue logic circuits). Such test sets are simple in structure and become optimal in size in certain cases. It is also shown that such a test set can be effectively implemented via a scan path type shifter.
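
A test set is k-exhaustive when every subset of k inputs receives all 2^k patterns. For a single weight there is a simple condition one can verify directly: the n-bit vectors of weight w are k-exhaustive exactly when k ≤ w ≤ n − k, because the all-ones pattern on a subset needs w ≥ k and the all-zeros pattern needs the other n − k positions to hold all w ones. This is our own illustration, not the paper's theorem, and the particular weight sets the paper derives are not reproduced; the Python sketch simply builds constant-weight vectors and checks exhaustiveness.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]


def weight_vectors(n: int, w: int) -> List[Vector]:
    """All n-bit vectors with exactly w ones."""
    vectors: List[Vector] = []
    for ones in combinations(range(n), w):
        v = [0] * n
        for i in ones:
            v[i] = 1
        vectors.append(tuple(v))
    return vectors


def is_k_exhaustive(tests: Sequence[Vector], n: int, k: int) -> bool:
    """True if every k-subset of the n inputs sees all 2^k patterns."""
    for subset in combinations(range(n), k):
        seen = {tuple(t[i] for i in subset) for t in tests}
        if len(seen) < 2 ** k:
            return False
    return True


tests = weight_vectors(6, 2)
print(len(tests), is_k_exhaustive(tests, 6, 2))      # 15 True (2^6 = 64 exhaustive)
print(is_k_exhaustive(weight_vectors(6, 1), 6, 2))   # False: no pair ever sees 11
```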

••

TL;DR: A realistic model for divide-and-conquer based algorithms is postulated; the efficiency of some algorithms is analyzed, taking into account all relevant parameters of the model (time, data movement and number of processors).

Abstract: The well known divide-and-conquer paradigm has proved to be useful for deriving efficient algorithms for many problems. Several researchers have pointed out its usefulness for parallel processing; however, the problem of analyzing such parallel algorithms in a realistic setting has been largely overlooked. In this paper a realistic model for divide-and-conquer based algorithms is postulated; the efficiency of some algorithms is then analyzed, taking into account all relevant parameters of the model (time, data movement and number of processors).

••

TL;DR: A network clock distribution scheme which guarantees equal length clock paths is presented and a comparison between two approaches used in the design of a basic network switching module is developed.

Abstract: A central issue in the design of multiprocessor systems is the interconnection network which provides communication paths between the processors. For large systems, high bandwidth interconnection networks will require numerous "network chips" with each chip implementing some subnetwork of the original larger network. Modularity and growth are important properties for such networks since multiprocessor systems may vary in size. This paper is concerned with the question of timing control of such networks. Two approaches, asynchronous and clocked, are used in the design of a basic network switching module. The modules and the approaches are then modeled and equations for network time delay are developed. These equations form the basis for a comparison between the two approaches. The importance of clock distribution strategies and clock skew is quantified, and a network clock distribution scheme which guarantees equal length clock paths is presented.

••

Rice University

TL;DR: An algorithm is presented to merge two subfiles of size n/2 each, stored in the left and the right halves of a linearly connected processor array, in 3n/2 route steps and log n compare-exchange steps.

Abstract: An algorithm is presented to merge two subfiles of size n/2 each, stored in the left and the right halves of a linearly connected processor array, in 3n/2 route steps and log n compare-exchange steps. This algorithm is extended to merge two horizontally adjacent subfiles of size m × n/2 each, stored in an m × n mesh-connected processor array in row-major order, in m + 2n route steps and log mn compare-exchange steps. These algorithms are faster than their counterparts proposed so far.

••

TL;DR: A new, fully parallel mixed-radix conversion (MRC) algorithm is presented that exploits the maximum parallelism available in residue number system (RNS) to mixed-radix (MR) digit conversion, achieving a high throughput rate and a very short conversion time.

Abstract: A new, fully parallel mixed-radix conversion (MRC) algorithm which utilizes the maximum parallelism that exists in the residues (RNS) to mixed-radix (MR) digits conversion to achieve high throughput rate and very short conversion time is presented. The new algorithm has a conversion time of two table look-up cycles for moduli sets consisting of up to 15 moduli. As a comparison, the classical Szabo and Tanaka MRC algorithm has a conversion time of (n − 1) clock cycles for an n-moduli RNS. This algorithm can be implemented by off-the-shelf ECL IC's to achieve a conversion time of 50 ns and a throughput rate of 40 MHz for a 150-bit RNS.
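
For reference, the classical Szabo and Tanaka conversion computes the mixed-radix digits a_1, …, a_n (with X = a_1 + a_2·m_1 + a_3·m_1·m_2 + …) sequentially: each new digit is subtracted from the remaining residues, which are then multiplied by a modular inverse, so each digit depends on the previous one and an n-moduli RNS needs n − 1 dependent steps. The Python sketch below implements that classical algorithm, not the paper's parallel table-lookup version; it assumes pairwise-coprime moduli, and `pow(m, -1, p)` requires Python 3.8 or later.

```python
from typing import List, Sequence


def mixed_radix_digits(residues: Sequence[int], moduli: Sequence[int]) -> List[int]:
    """Classical sequential RNS -> mixed-radix conversion (pairwise-coprime moduli)."""
    r = list(residues)
    n = len(moduli)
    digits: List[int] = []
    for i in range(n):
        a = r[i] % moduli[i]
        digits.append(a)
        # Remove digit a from the remaining residues and divide by moduli[i].
        for j in range(i + 1, n):
            inv = pow(moduli[i], -1, moduli[j])  # modular inverse of m_i mod m_j
            r[j] = ((r[j] - a) * inv) % moduli[j]
    return digits


def from_mixed_radix(digits: Sequence[int], moduli: Sequence[int]) -> int:
    """X = a1 + a2*m1 + a3*m1*m2 + ..."""
    x, weight = 0, 1
    for a, m in zip(digits, moduli):
        x += a * weight
        weight *= m
    return x


moduli = (3, 5, 7)
digits = mixed_radix_digits([52 % m for m in moduli], moduli)
print(digits, from_mixed_radix(digits, moduli))  # [1, 2, 3] 52
```

Mixed-radix digits are useful because, unlike residues, they support magnitude comparison and sign detection by comparing digits from the most significant end.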

••

TL;DR: Two planar geometric problems relating to a convex n-gon P and a simple nonconvex m-gon Q are considered.

Abstract: Two planar geometric problems relating to a convex n-gon P and a simple nonconvex m-gon Q are considered.