# Showing papers in "IEEE Transactions on Computers in 1980"

••

TL;DR: In this article, the contract net protocol is developed to specify problem-solving communication and control for nodes in a distributed problem solver; task distribution is effected by a negotiation process, a discussion carried on between nodes with tasks to be executed and nodes that may be able to execute those tasks.

Abstract: The contract net protocol has been developed to specify problem-solving communication and control for nodes in a distributed problem solver. Task distribution is effected by a negotiation process, a discussion carried on between nodes with tasks to be executed and nodes that may be able to execute those tasks.

3,612 citations

••

TL;DR: The massively parallel processor (MPP) was designed to process satellite imagery at high rates: on 8-bit integer data, addition can occur at 6553 million operations per second (MOPS) and multiplication at 1861 MOPS.

Abstract: The massively parallel processor (MPP) system is designed to process satellite imagery at high rates. A large number (16 384) of processing elements (PE's) are configured in a square array. For optimum performance on operands of arbitrary length, processing is performed in a bit-serial manner. On 8-bit integer data, addition can occur at 6553 million operations per second (MOPS) and multiplication at 1861 MOPS. On 32-bit floating-point data, addition can occur at 430 MOPS and multiplication at 216 MOPS.

815 citations
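The MPP's high throughput despite bit-serial processing comes from streaming operands through one full adder per processing element, least-significant bit first. The sketch below illustrates that idea in software; the helper names (`to_bits`, `bit_serial_add`) are illustrative, not from the paper.

```python
def to_bits(x, width):
    """Little-endian bit list of x, as a bit-serial PE would stream it."""
    return [(x >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def bit_serial_add(a_bits, b_bits):
    """One full-adder step per 'cycle', LSB first."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                  # sum bit
        carry = (a & b) | (carry & (a ^ b))        # carry out
    out.append(carry)                              # final carry
    return out

assert from_bits(bit_serial_add(to_bits(200, 8), to_bits(99, 8))) == 299
```

An operand of arbitrary length simply takes proportionally more cycles, which is why the MPP's per-operation rates scale with word width.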

••

TL;DR: A baseline network and a configuration concept are introduced to evaluate relationships among some proposed multistage interconnection networks and it is proven that the data manipulator, flip network, omega network, indirect binary n-cube network, and regular SW banyan network are topologically equivalent.

Abstract: A baseline network and a configuration concept are introduced to evaluate relationships among some proposed multistage interconnection networks. It is proven that the data manipulator (modified version), flip network, omega network, indirect binary n-cube network, and regular SW banyan network (S = F = 2) are topologically equivalent. The configuration concept facilitates developing a homogeneous routing algorithm which allows one-to-one and one-to-many connections from an arbitrary side of a network to the other side. This routing algorithm is extended to full communication which allows connections between terminals on the same side of a network. A conflict resolution scheme is also included. Some practical implications of our results are presented for further research.

799 citations
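The equivalence results rest on the self-routing structure these networks share. As a hedged sketch (the `route_omega` helper is hypothetical, not the paper's algorithm), destination-tag routing in an omega network shuffles the line number and then sets its low bit from the next destination bit at each stage:

```python
def route_omega(src, dst, n):
    """Follow one message through an n-stage omega network of size N = 2**n.

    Each stage applies a perfect shuffle (rotate the n-bit line number
    left by one) and then a 2x2 switch that forces the low bit of the
    line number to the next bit of the destination tag, MSB first.
    """
    p = src
    for i in reversed(range(n)):
        p = ((p << 1) | (p >> (n - 1))) & ((1 << n) - 1)   # perfect shuffle
        p = (p & ~1) | ((dst >> i) & 1)                    # switch setting
    return p

# Every (src, dst) pair is delivered in an 8-input network.
assert all(route_omega(s, d, 3) == d for s in range(8) for d in range(8))
```

Because the switch settings depend only on the destination, the same tag works from any source side input, which is what makes a homogeneous routing algorithm possible.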

••

TL;DR: A hierarchical modeling scheme is used to formulate the capability function and capability is used, in turn, to evaluate performability, and techniques are illustrated for a specific application: the performability evaluation of an aircraft computer in the environment of an air transport mission.

Abstract: If the performance of a computing system is "degradable," performance and reliability issues must be dealt with simultaneously in the process of evaluating system effectiveness. For this purpose, a unified measure, called "performability," is introduced and the foundations of performability modeling and evaluation are established. A critical step in the modeling process is the introduction of a "capability function" which relates low-level system behavior to user-oriented performance levels. A hierarchical modeling scheme is used to formulate the capability function and capability is used, in turn, to evaluate performability. These techniques are then illustrated for a specific application: the performability evaluation of an aircraft computer in the environment of an air transport mission.

760 citations

••

TL;DR: In this paper, a general graph-theoretic model is developed at the register transfer level; test generation procedures take the microprocessor organization and the instruction set as parameters and generate tests to detect all the faults in the fault model.

Abstract: The goal of this paper is to develop test generation procedures for testing microprocessors in a user environment. Classical fault detection methods based on the gate and flip-flop level or on the state diagram level description of microprocessors are not suitable for test generation. The problem is further compounded by the availability of a large variety of microprocessors which differ widely in their organization, instruction repertoire, addressing modes, data storage, and manipulation facilities, etc. In this paper, a general graph-theoretic model is developed at the register transfer level. Any microprocessor can be easily modeled using information only about its instruction set and the functions performed. This information is readily available in the user's manual. A fault model is developed on a functional level quite independent of the implementation details. The effects of faults in the fault model are investigated at the level of the graph-theoretic model. Test generation procedures are proposed which take the microprocessor organization and the instruction set as parameters and generate tests to detect all the faults in the fault model. The complexity of the test sequences measured in terms of the number of instructions is given. Our effort in generating tests for a real microprocessor and evaluating their fault coverage is described.

380 citations

••

TL;DR: At the end of an IC production line, integrated circuits are generally submitted to three kinds of tests: 1) parametric tests to check electrical characteristics (voltage, current, power consumption), 2) dynamic tests to check response times under nominal operating conditions, and 3) functional tests to check their logical behavior.

Abstract: At the end of an IC production line, integrated circuits are generally submitted to three kinds of tests: 1) parametric tests to check electrical characteristics (voltage, current, power consumption), 2) dynamic tests to check response times under nominal operating conditions, and 3) functional tests to check their logical behavior.

350 citations

••

TL;DR: Many practically important problems in computational geometry may be regarded as a generalization of "clipping," and may be formulated precisely in terms of a function called "membership classification."

Abstract: Many practically important problems in computational geometry may be regarded as a generalization of "clipping," and may be formulated precisely in terms of a function called "membership classification." This function operates on a pair of point sets called the reference and candidate sets; it segments the candidate into three subsets which are "inside," "outside," and "on the boundary of" the reference. Examples of classification problems include clipping, polygon intersection, point inclusion, and solid interference.

308 citations
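For a rectangular reference set, membership classification reduces to coordinate comparisons. The sketch below is an illustrative special case only (the paper's formulation handles general reference solids), and `classify` is a made-up name:

```python
def classify(points, rect):
    """Segment a candidate point set into inside / outside / boundary
    relative to a rectangular reference set (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    result = {"inside": [], "outside": [], "boundary": []}
    for x, y in points:
        if x0 < x < x1 and y0 < y < y1:
            result["inside"].append((x, y))
        elif x0 <= x <= x1 and y0 <= y <= y1:
            result["boundary"].append((x, y))
        else:
            result["outside"].append((x, y))
    return result

r = classify([(1, 1), (0, 1), (3, 3)], (0, 0, 2, 2))
assert r["inside"] == [(1, 1)]
assert r["boundary"] == [(0, 1)]
assert r["outside"] == [(3, 3)]
```

Clipping, polygon intersection, and point inclusion then become instances of the same function with different reference and candidate sets.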

••

TL;DR: The purpose of this paper is to present some ideas on multiprocessor design and on automatic translation of sequential programs into parallel programs for multip rocessors.

Abstract: The purpose of this paper is to present some ideas on multiprocessor design and on automatic translation of sequential programs into parallel programs for multiprocessors. With respect to machine design, two subjects are discussed. First, a multiprocessor allowing parallelism at a very low level is sketched and then, a brief discussion on the interconnection network is presented.

281 citations

••

TL;DR: This paper investigates the problem of reporting all intersecting pairs in a set of n rectilinearly oriented rectangles in the plane and describes an algorithm that solves this problem in worst case time proportional to n lg n + k, where k is the number of intersecting pairs found.

Abstract: In this paper we investigate the problem of reporting all intersecting pairs in a set of n rectilinearly oriented rectangles in the plane. This problem arises in applications such as design rule checking of very large-scale integrated (VLSI) circuits and architectural databases. We describe an algorithm that solves this problem in worst case time proportional to n lg n + k, where k is the number of intersecting pairs found. This algorithm is optimal to within a constant factor. As an intermediate step of this algorithm, we solve a problem related to the range searching problem that arises in database applications. Although the algorithms that we describe are primarily theoretical devices (being very difficult to code), they suggest other algorithms that are quite practical.

258 citations
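The primitive underlying pair reporting is the axis-projection overlap test: two axis-aligned rectangles intersect iff their projections overlap on both axes. The quadratic `report_pairs` loop below is only the naive baseline that the paper's O(n lg n + k) sweep-line algorithm improves on; both names are illustrative.

```python
def rects_intersect(r, s):
    """Rectangles given as (x0, y0, x1, y1); intersection iff the
    projections overlap on both the x and y axes."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

def report_pairs(rects):
    """Quadratic all-pairs baseline (the paper does better with a sweep)."""
    return [(i, j) for i in range(len(rects))
                   for j in range(i + 1, len(rects))
                   if rects_intersect(rects[i], rects[j])]

assert report_pairs([(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]) == [(0, 1)]
```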

••

TL;DR: A set of algebraic tools is developed and is used to prove that Lawrie's inverse Omega network, Pease's indirect binary n-cube array, and a network related to the 3-stage rearrangeable switching network studied by Clos and Beneš have identical switching capabilities.

Abstract: In this paper a number of properties of Shuffle/Exchange networks are analyzed. A set of algebraic tools is developed and is used to prove that Lawrie's inverse Omega network, Pease's indirect binary n-cube array, and a network related to the 3-stage rearrangeable switching network studied by Clos and Beneš have identical switching capabilities. The approach used leads to a number of insights on the structure of the fast Fourier transform (FFT) algorithm. The inherent permuting power, or "universality," of the networks when used iteratively is then probed, leading to some nonintuitive results which have implications on the optimal control of Shuffle/Exchange-type networks for realizing permutations and broadcast connections.

233 citations
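The two primitives whose algebra the paper studies can be modeled directly on n-bit line numbers: the perfect shuffle is a left rotation of the index bits, and the exchange complements the low-order bit. This toy model (not the paper's formalism) makes one basic identity easy to check — n successive shuffles restore the identity permutation:

```python
def shuffle(i, n):
    """Perfect shuffle on 2**n lines: rotate the n-bit index left by one."""
    return ((i << 1) | (i >> (n - 1))) & ((1 << n) - 1)

def exchange(i):
    """Exchange: complement the low-order bit (the 2x2 switch)."""
    return i ^ 1

n = 3
perm = list(range(1 << n))
for _ in range(n):
    perm = [shuffle(i, n) for i in perm]
assert perm == list(range(1 << n))    # n shuffles = identity
```

Iterating shuffle and exchange stages is exactly the "iterative use" whose permuting power the paper probes.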

••

IBM

TL;DR: Classical testing of combinational circuits requires storing the fault-free response of the circuit to the entire test set; for most practical circuits, this storage requirement makes the procedure very expensive.

Abstract: Classical testing of combinational circuits requires a list of the fault-free response of the circuit to the test set. For most practical circuits implemented today the large storage requirement for such a list makes such a test procedure very expensive. Moreover, the computational cost to generate the test set increases exponentially with the circuit size.

••

TL;DR: A linear feedback shift register can be used to compress a serial stream of test result data and it is possible for an erroneous bit stream and the correct one to result in the same signature.

Abstract: A linear feedback shift register can be used to compress a serial stream of test result data. The compressed erroneous bit stream caused by a fault is said to form the "signature" of the fault. Since the bit stream is compressed, however, it is possible for an erroneous bit stream and the correct one to result in the same signature.
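Signature analysis and its aliasing can be sketched as polynomial division: shift the response stream through an LFSR and keep the remainder as the signature. The 4-bit register and polynomial x^4 + x + 1 below are arbitrary illustrative choices, not the paper's parameters:

```python
def signature(bits, width=4, taps=0b0011):
    """Shift a test-response bit stream through a width-bit LFSR that
    divides by x**4 + x + 1 (taps = its low-order coefficients); the
    final register state is the signature."""
    state = 0
    for b in bits:
        msb = (state >> (width - 1)) & 1
        state = ((state << 1) | b) & ((1 << width) - 1)
        if msb:
            state ^= taps                  # polynomial reduction step
    return state

# Aliasing: streams that differ by a multiple of the polynomial collapse
# to the same signature -- here the all-zero stream and the polynomial
# itself both compress to 0.
assert signature([0, 0, 0, 0, 0]) == signature([1, 0, 0, 1, 1]) == 0
```

The probability of such a collision is what motivates the paper's analysis of error escape through signature registers.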

••

TL;DR: The program has successfully produced sets of delay tests for large logic networks and the average coverage achieved by these tests falls within 95.8 percent to 99.9 percent of optimal.

Abstract: Delay testing is a test procedure to verify the timing performance of manufactured logic networks. When a level-sensitive scan design (LSSD) discipline is used, all networks are combinational. Appropriate test patterns are selected on the basis of certain theoretical criteria. These criteria are embodied in an experimental test generation program. The program has successfully produced sets of delay tests for large logic networks. The average coverage achieved by these tests falls within 95.8 percent to 99.9 percent of optimal.

••

TL;DR: A monolithic processor computes products, quotients, and several common transcendental functions, based on the well-known principles of "CORDIC," but recourse to a subtle novel corollary results in a scale factor of unity.

Abstract: A monolithic processor computes products, quotients, and several common transcendental functions. The algorithms are based on the well-known principles of "CORDIC," but recourse to a subtle novel corollary results in a scale factor of unity. Compared to older machines, the overhead burden is significantly reduced. Also, expansion of the functional repertoire beyond the circular domain, i.e., addition to the menu of hyperbolic and linear operations, is a relatively trivial matter, in terms of both hardware cost and execution time. A bulk CMOS technology with conservative layout rules is used for the sake of high reliability, low-power consumption, and good cycle speed.
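For contrast with the paper's unity-scale variant, here is the classical circular CORDIC iteration, in which the accumulated gain K must be divided out at the end — exactly the overhead the paper's corollary removes. This sketch does not reproduce the paper's technique:

```python
import math

def cordic_sin_cos(theta, iters=24):
    """Classical circular CORDIC: rotate (1, 0) toward angle theta using
    shift-and-add micro-rotations, then divide out the accumulated gain
    K = prod(sqrt(1 + 2**(-2*i))). Valid for |theta| within the CORDIC
    convergence range (about +/-1.74 rad)."""
    x, y, z = 1.0, 0.0, theta
    k = 1.0
    for i in range(iters):
        d = 1 if z >= 0 else -1
        x, y = x - d * y * 2 ** -i, y + d * x * 2 ** -i
        z -= d * math.atan(2 ** -i)
        k *= math.sqrt(1 + 2 ** (-2 * i))
    return x / k, y / k

c, s = cordic_sin_cos(math.pi / 6)
assert abs(c - math.cos(math.pi / 6)) < 1e-6
assert abs(s - math.sin(math.pi / 6)) < 1e-6
```

In hardware the two final divisions are the "overhead burden" the paper eliminates; hyperbolic and linear modes change only the rotation rule and angle table.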

••

TL;DR: The range of application areas to which distributed processing has been applied effectively is limited; in order to extend this range, new models for organizing distributed systems must be developed.

Abstract: The range of application areas to which distributed processing has been applied effectively is limited. In order to extend this range, new models for organizing distributed systems must be developed.

••

TL;DR: The performance of the R-ALOHA protocol for multiple access is studied and numerical results from both analysis and simulation are presented to illustrate the accuracy of the analytic models as well as performance characteristics of the R-ALOHA protocol.

Abstract: In packet broadcast networks, users are interconnected via a broadcast channel. The key problem is multiple access of the shared broadcast channel. The performance of the R-ALOHA protocol for multiple access is studied in this paper. Two user models with Poisson message arrivals are analyzed; each message consists of a group of packets with a general probability distribution for group size. In the first model, each user handles one message at a time. In the second model, each user has infinite buffering capacity for queueing. Analytic models are developed for characterizing message delay and channel utilization. Bounds on channel throughput are established for two slightly different protocols. Numerical results from both analysis and simulation are presented to illustrate the accuracy of the analytic models as well as performance characteristics of the R-ALOHA protocol.

••

TL;DR: The main vehicle of this approach is the deduction of internal line values in a circuit under test N*.

Abstract: In this paper we present a new approach to multiple fault diagnosis in combinational circuits based on an effect-cause analysis. The main vehicle of our approach is the deduction of internal line values in a circuit under test N*. The knowledge of these values allows us to identify fault situations in N* (causes) which are compatible with the applied test and the obtained response (the effect). A fault situation specifies faulty as well as fault-free lines. Other applications include identifying the existence of nonstuck faults in N* and determination of faults not detected by a given test, including redundant faults. The latter application allows for the generation of tests for multiple faults without performing fault enumeration.

••

TL;DR: Properties of the reverse-exchange interconnection network are used to develop a reconfiguration scheme and a two-pass structure for enhancing the efficiency of a class of multistage interconnection networks and it is proved that arbitrary permutations can be realized in two passes.

Abstract: Properties of the reverse-exchange interconnection network are used to develop a reconfiguration scheme and a two-pass structure for enhancing the efficiency of a class of multistage interconnection networks. Functional relationships among a class of multistage interconnection networks are first derived. According to the functional relationships, we propose a reconfiguration scheme which enables a network to accomplish various interconnection functions of other networks. Then the admissible permutations along with related recursive control algorithms of the reverse-exchange interconnection network are specified through a set of theorems. Using the reverse-exchange property, we also prove that the algorithms actually work. Finally, we prove that arbitrary permutations can be realized in two passes (or 2 · log2 N switching steps, where N is the network size). By taking advantage of Beneš network control algorithms, a way to control the two-pass structure is also developed.

••

TL;DR: It is demonstrated that minimum-length SPSF tests can be inherently asymmetric and that RAM neighborhoods can be interpreted as polyominoes.

Abstract: The design of minimum-length test sequences for pattern sensitivity in random-access memory (RAM) arrays is examined. The single pattern-sensitive fault (SPSF) model is used in which operations addressed to at most one memory cell are allowed to be faulty at any time. The influence of an SPSF affecting cell Ci is restricted to a fixed set of cells called the neighborhood of Ci. A new method is presented for efficiently generating the sequence of writes required in an SPSF test. This method yields optimal sequences for a useful class of neighborhoods called tiling neighborhoods. It is observed that RAM neighborhoods can be interpreted as polyominoes. A general procedure is given for constructing an SPSF test containing the minimum number of writes but a nonminimum number of reads. The difficult problem of minimizing the number of reads in an SPSF test is investigated for the 2-cell memory M2. A test of length 36 for M2 is derived which is optimal under certain reasonable restrictions. It is demonstrated that minimum-length SPSF tests can be inherently asymmetric.

••

Bell Labs

TL;DR: A class of pattern-sensitive faults in semiconductor random-access memories is studied and efficient test procedures to detect and locate the modeled faults are presented.

Abstract: A class of pattern-sensitive faults in semiconductor random-access memories is studied. Efficient test procedures to detect and locate modeled faults are presented.

••

TL;DR: In this paper, a technique for multiplying numbers, modulo a prime number, using look-up tables stored in read-only memories is discussed, and the application is in the computation of number theoretic transforms implemented in a ring which is isomorphic to a direct sum of several Galois fields, parallel computations being performed in each field.

Abstract: This paper discusses a technique for multiplying numbers, modulo a prime number, using look-up tables stored in read-only memories. The application is in the computation of number theoretic transforms implemented in a ring which is isomorphic to a direct sum of several Galois fields, parallel computations being performed in each field.
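The look-up-table idea can be sketched with discrete-log (index) tables: since the nonzero elements of GF(p) form a cyclic group, a product becomes two table reads and an index addition mod p - 1. The prime p = 17 and primitive root g = 3 below are illustrative choices, not the paper's parameters:

```python
def build_tables(p, g):
    """Antilog and log tables for GF(p), as would be burned into ROMs.
    Requires p prime and g a primitive root mod p (g = 3 works for p = 17)."""
    exp = [pow(g, k, p) for k in range(p - 1)]
    log = {v: k for k, v in enumerate(exp)}
    return exp, log

def rom_multiply(a, b, p, exp, log):
    """a * b mod p via two table lookups and one index addition mod p - 1."""
    if a == 0 or b == 0:
        return 0
    return exp[(log[a] + log[b]) % (p - 1)]

p, g = 17, 3
exp, log = build_tables(p, g)
assert all(rom_multiply(a, b, p, exp, log) == (a * b) % p
           for a in range(p) for b in range(p))
```

For a direct sum of several small Galois fields, one such table pair per field suffices, and the per-field multiplications proceed in parallel.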

••

TL;DR: Separable error-correcting/detecting codes are developed that provide protection against combinations of both unidirectional and random errors.

Abstract: Separable error-correcting/detecting codes are developed that provide protection against combinations of both unidirectional and random errors. Specifically, codes are presented which can both: 1) correct (detect) some t random errors, and 2) detect any number of unidirectional errors which may also contain t or fewer random errors. Necessary and sufficient conditions for the existence of these codes are also developed. Decoding algorithms for these codes are presented, and implementations of the algorithms are also discussed.
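The t = 0 corner of this code family is the classical Berger code, which detects any number of unidirectional errors but corrects none. As a hedged illustration (this is not the paper's t-error-correcting construction):

```python
import math

def berger_encode(data):
    """Separable code: append the zero count of the data bits in binary.
    1 -> 0 errors in the data raise the count while 1 -> 0 errors in the
    check can only lower the stored value, so any all-unidirectional
    error pattern is detected."""
    w = math.ceil(math.log2(len(data) + 1))
    zeros = data.count(0)
    return data + [(zeros >> i) & 1 for i in reversed(range(w))]

def berger_valid(word, k):
    data, check = word[:k], word[k:]
    count = 0
    for b in check:
        count = (count << 1) | b
    return data.count(0) == count

word = berger_encode([1, 0, 1, 1])
assert berger_valid(word, 4)
bad = word.copy()
bad[0] = 0                      # one 1 -> 0 unidirectional error
assert not berger_valid(bad, 4)
```

The paper's codes strengthen this scheme so that up to t random (bidirectional) errors can also be corrected or detected.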

••

TL;DR: The theory underlying the partitioning of MSIMD system permutation networks into independent subnetworks is explored and the use of the theory is demonstrated by applying it to the Cube, Illiac, PM2I, and Shuffle-Exchange SIMD machine interconnection networks.

Abstract: The age of the microcomputer has made feasible large-scale multiprocessor systems. In order to use this parallel processing power in the form of a flexible multiple-SIMD (MSIMD) system, the interconnection network must be partitionable and dynamically reconfigurable. The theory underlying the partitioning of MSIMD system permutation networks into independent subnetworks is explored. Conditions for determining if a network can be partitioned into independent subnetworks and the ways in which it can be partitioned are presented. The use of the theory is demonstrated by applying it to the Cube, Illiac, PM2I, and Shuffle-Exchange SIMD machine interconnection networks. Both recirculating (single stage) and multistage network implementations are considered.

••

TL;DR: Two solutions for concurrent search and insertion in AVL trees are developed: the first allows several readers to share nodes with a writer process, and the second introduces additional concurrency among writers by applying various parallelization techniques.

Abstract: This paper addresses the problem of concurrent access to dynamically balanced binary search trees. Specifically, two solutions for concurrent search and insertion in AVL trees are developed. The first solution is relatively simple and is intended to allow several readers to share nodes with a writer process. The second solution uses the first as a starting point and introduces additional concurrency among writers by applying various parallelization techniques. Simulation results used to evaluate the parallel performance of these algorithms with regard to the amount of concurrency achieved and the parallel overhead incurred are summarized.

••

TL;DR: An algorithm, based on the "bounded branch and bound" integer programming technique, has been developed to obtain the optimal solution of the model, and it is found to be more efficient than several existing general nonlinear integer programming algorithms.

Abstract: In this paper a model is developed for the optimization of distributed information systems. Compared with the previous work in this area, the model is more complete, since it considers simultaneously the distribution of processing power, the allocation of programs and databases, and the assignment of communication line capacities. It also considers the return flow of information, as well as the dependencies between programs and databases. In addition, an algorithm, based on the "bounded branch and bound" integer programming technique, has been developed to obtain the optimal solution of the model. The algorithm is more efficient than several existing general nonlinear integer programming algorithms. Also, it avoids some of the disadvantages of heuristic and decomposition algorithms which are used widely in the optimization of computer networks and distributed databases. The algorithm has been implemented in Fortran, and the computation times of the algorithm for several test problems have been found very reasonable.

••

Bell Labs

TL;DR: The results of an extended effort to develop a unified approach to reliability modeling of fault-tolerant computers which strikes a good compromise between generality and practicality are summarized.

Abstract: The diversified nature of fault-tolerant computers led to the development of a multiplicity of reliability models which are seemingly unrelated to each other. As a result, it becomes difficult to develop automated tools for reliability analysis which are both general and efficient. Thus, the potential of reliability modeling as a practical and useful tool in the design process of fault-tolerant computers has not been fully realized. This paper summarizes the results of an extended effort to develop a unified approach to reliability modeling of fault-tolerant computers which strikes a good compromise between generality and practicality. The unified model developed encompasses repairable and nonrepairable systems and models, transient as well as permanent faults, and their recovery. Based on the unified model, a powerful and efficient reliability estimation program ARIES has been developed.

••

TL;DR: Hu's level scheduling strategy is applied to examples of sparse matrix equations with surprisingly good results.

Abstract: The solution process of Ax = b is modeled by an acyclic directed graph in which the nodes represent the arithmetic operations applied to the elements of A, and the arcs represent the precedence relations that exist among the operations in the solution process. Operations that can be done in parallel are identified in the model and the absolute minimum completion time and lower bounds on the minimum number of processors required to solve the equations in minimal time can be found from it. Properties of the model are derived. Hu's level scheduling strategy is applied to examples of sparse matrix equations with surprisingly good results. Speed-up using parallel processing is found to be proportional to the number of processors when it is 10-20 percent of the order of A.
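Hu's level scheduling strategy assigns each task a priority equal to the length of the longest path from it to a sink of the precedence graph and then schedules greedily. A compact sketch for unit-time tasks (illustrative, not the paper's implementation):

```python
def hu_schedule(succ, nodes, m):
    """Greedy list scheduling of unit-time tasks on m processors, highest
    level first; a node's level is the longest path from it to a sink.
    Optimal for in-tree precedence graphs (Hu, 1961); used as a heuristic
    on general DAGs such as the operation graph of solving Ax = b."""
    level = {}
    def lvl(v):
        if v not in level:
            level[v] = 1 + max((lvl(s) for s in succ.get(v, [])), default=0)
        return level[v]
    for v in nodes:
        lvl(v)
    preds = {v: 0 for v in nodes}
    for ss in succ.values():
        for s in ss:
            preds[s] += 1
    ready = sorted((v for v in nodes if preds[v] == 0), key=lambda v: -level[v])
    steps = []
    while ready:
        batch, ready = ready[:m], ready[m:]   # run up to m tasks this step
        steps.append(batch)
        for v in batch:
            for s in succ.get(v, []):
                preds[s] -= 1
                if preds[s] == 0:
                    ready.append(s)
        ready.sort(key=lambda v: -level[v])
    return steps

# Two independent operations feed a third, which feeds a fourth:
succ = {"a": ["c"], "b": ["c"], "c": ["d"]}
assert hu_schedule(succ, ["a", "b", "c", "d"], 2) == [["a", "b"], ["c"], ["d"]]
```

The number of steps returned is the completion time; comparing it against the critical-path length gives the kind of speed-up figures the paper reports.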

•

[...]

TL;DR: The concept of merged arithmetic is introduced and demonstrated in the context of multiterm multiplication/addition: a composite arithmetic function (such as an inner product) is synthesized directly instead of being decomposed into discrete multiplication and addition operations.

Abstract: The concept of merged arithmetic is introduced and demonstrated in the context of multiterm multiplication/addition. The merged approach involves synthesizing a composite arithmetic function (such as an inner product) directly instead of decomposing the function into discrete multiplication and addition operations. This approach provides equivalent arithmetic throughput with lower implementation complexity than conventional fast multipliers and carry look-ahead adder trees.
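Behaviorally, merging amounts to forming every partial-product bit of the whole inner product and reducing them together; in hardware the paper does this with a single counter/adder tree rather than completing each multiplication separately. A bit-level sketch of that behavior (illustrative only):

```python
def merged_inner_product(xs, ys, width=8):
    """Gather all partial-product bits of sum(x*y) into one running
    reduction, never materializing the individual products -- a software
    analogue of merged arithmetic for width-bit unsigned operands."""
    total = 0
    for x, y in zip(xs, ys):
        for i in range(width):
            for j in range(width):
                total += ((x >> i) & (y >> j) & 1) << (i + j)
    return total

assert merged_inner_product([3, 5], [7, 2]) == 3 * 7 + 5 * 2
```

The hardware saving comes from sharing one carry-propagate stage across all terms instead of one per multiplier plus an adder tree.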

••

General Electric

TL;DR: It is shown that a dataflow machine can automatically unfold the nested loops of n × n matrix multiply to reduce its time complexity from O(n^3) to O(n), so long as sufficient processors and communication capacity are available.

Abstract: Our goal is to devise a computer comprising large numbers of cooperating processors (LSI). In doing so we reject the sequential and memory cell semantics of the von Neumann model, and instead adopt the asynchronous and functional semantics of dataflow. We briefly describe the high-level dataflow programming language Id, as well as an initial design for a dataflow machine and the results of detailed deterministic simulation experiments on a part of that machine. For example, we show that a dataflow machine can automatically unfold the nested loops of n × n matrix multiply to reduce its time complexity from O(n^3) to O(n), so long as sufficient processors and communication capacity are available. Similarly, quicksort executes in O(n) average time, demanding O(n) processors. Also discussed are the use of processor and communication time complexity analysis and "flow analysis" as aids in understanding the behavior of the machine.

••

TL;DR: Performability modeling and evaluation methods are applied to the SIFT computer in the computational environment of an air transport mission to determine the performability of the total system for various choices of computer and environment parameter values.

Abstract: Performability modeling and evaluation methods are applied to the SIFT computer in the computational environment of an air transport mission. User-visible performance of the "total system" (SIFT plus its environment) is modeled as a random variable taking values in a set of "accomplishment levels." These levels are defined in terms of four attributes of total system behavior: safety, no change in mission profile, no operational penalties, and no economic penalties. The "base model" of the total system is a stochastic process whose states describe the internal structure of SIFT as well as relevant conditions of its environment. Base model state trajectories are related to accomplishment levels via a "capability function" which is formulated in terms of a three-level model hierarchy. Solution methods are then applied to determine the performability of the total system for various choices of computer and environment parameter values.