
Showing papers in "IEEE Transactions on Computers in 1984"


Journal ArticleDOI
TL;DR: Algorithm-based fault tolerance schemes are proposed to detect and correct errors when matrix operations such as addition, multiplication, scalar product, LU-decomposition, and transposition are performed using multiple processor systems.
Abstract: The rapid progress in VLSI technology has reduced the cost of hardware, allowing multiple copies of low-cost processors to provide a large amount of computational capability for a small cost. In addition to achieving high performance, high reliability is also important to ensure that the results of long computations are valid. This paper proposes a novel system-level method of achieving high reliability, called algorithm-based fault tolerance. The technique encodes data at a high level, and algorithms are designed to operate on encoded data and produce encoded output data. The computation tasks within an algorithm are appropriately distributed among multiple computation units for fault tolerance. The technique is applied to matrix computations which form the heart of many computation-intensive tasks. Algorithm-based fault tolerance schemes are proposed to detect and correct errors when matrix operations such as addition, multiplication, scalar product, LU-decomposition, and transposition are performed using multiple processor systems. The method proposed can detect and correct any failure within a single processor in a multiple processor system. The number of processors needed to just detect errors in matrix multiplication is also studied.

1,312 citations
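A minimal sketch of the checksum idea described above, assuming NumPy and exact integer arithmetic: the inputs are extended with column and row checksums, the full-checksum product is formed, and a single corrupted element is located at the intersection of the inconsistent row and column and repaired. The paper's actual schemes cover more operations and must cope with floating-point round-off.

```python
# Sketch of algorithm-based fault tolerance for matrix multiplication using
# row/column checksums. Illustrative only; integer data keeps the checks exact.
import numpy as np

def column_checksum(a):
    """Append a row holding each column's sum."""
    return np.vstack([a, a.sum(axis=0)])

def row_checksum(b):
    """Append a column holding each row's sum."""
    return np.hstack([b, b.sum(axis=1, keepdims=True)])

def detect_and_correct(c_full):
    """Locate and repair a single corrupted element of the full-checksum product."""
    data = c_full[:-1, :-1]
    bad_rows = np.where(data.sum(axis=1) != c_full[:-1, -1])[0]
    bad_cols = np.where(data.sum(axis=0) != c_full[-1, :-1])[0]
    if len(bad_rows) == 1 and len(bad_cols) == 1:     # single-error case
        i, j = bad_rows[0], bad_cols[0]
        data[i, j] += c_full[i, -1] - data[i].sum()   # restore from the row checksum
    return data

rng = np.random.default_rng(0)
A = rng.integers(0, 10, (4, 4))
B = rng.integers(0, 10, (4, 4))
C = column_checksum(A) @ row_checksum(B)   # full-checksum product
C[2, 1] += 7                               # inject a single faulty-processor error
assert np.array_equal(detect_and_correct(C), A @ B)
print("single error detected and corrected")
```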


Journal ArticleDOI
TL;DR: A general class of hypercube structures is presented in this paper for interconnecting a network of microcomputers in parallel and distributed environments and the performance is compared to that of other existing hypercube structures such as Boolean n-cube and nearest neighbor mesh computers.
Abstract: A general class of hypercube structures is presented in this paper for interconnecting a network of microcomputers in parallel and distributed environments. The interconnection is based on a mixed radix number system and the technique results in a variety of hypercube structures for a given number of processors N, depending on the desired diameter of the network. A cost optimal realization is obtained through a process of discrete optimization. The performance of such a structure is compared to that of other existing hypercube structures such as Boolean n-cube and nearest neighbor mesh computers.

786 citations
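The mixed-radix addressing can be illustrated with a short Python sketch (the radices and node numbers below are invented for illustration): each processor's index is expanded into mixed-radix digits, and its neighbors are all nodes whose addresses differ in exactly one digit.

```python
# Mixed-radix addressing for a generalized hypercube: N = product of the radices,
# and two nodes are linked iff their digit vectors differ in exactly one position.
from math import prod

def to_mixed_radix(x, radices):
    """Expand a node number into mixed-radix digits, least significant first."""
    digits = []
    for m in radices:
        digits.append(x % m)
        x //= m
    return digits

def neighbors(x, radices):
    """All nodes whose mixed-radix address differs from x in exactly one digit."""
    digits = to_mixed_radix(x, radices)
    weight, result = 1, []
    for pos, m in enumerate(radices):
        for d in range(m):
            if d != digits[pos]:
                result.append(x + (d - digits[pos]) * weight)
        weight *= m
    return result

radices = [3, 2, 4]                       # N = 3 * 2 * 4 = 24 processors
print(prod(radices))                      # 24
print(to_mixed_radix(17, radices))        # [2, 1, 2] because 17 = 2 + 1*3 + 2*6
print(sorted(neighbors(0, radices)))      # (3-1)+(2-1)+(4-1) = 6 neighbors of node 0
# the diameter equals the number of digit positions: at most one hop per digit
```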


Journal ArticleDOI
Hurst
TL;DR: This tutorial/survey paper will review the historical developments in this field, both in circuit realizations and in methods of handling multiple-valued design data, and consider the present state-of-the-art and future expectations.
Abstract: Multiple-valued logic, in which the number of discrete logic levels is not confined to two, has been the subject of much research over many years. The practical objective of this work has been to increase the information content of the digital signals in a system to a higher value than that provided by binary operation. In this tutorial/survey paper we will review the historical developments in this field, both in circuit realizations and in methods of handling multiple-valued design data, and consider the present state-of-the-art and future expectations.

489 citations


Journal ArticleDOI
Kasahara, Narita
TL;DR: This paper proposes a heuristic algorithm named CP/MISF (critical path/most immediate successors first) and an optimization/approximation algorithm named DF/IHS (depth-first/implicit heuristic search) which can markedly reduce space complexity and average computation time.

Abstract: This paper describes practical optimization/approximation algorithms for scheduling a set of partially ordered computational tasks onto a multiprocessor system so that the schedule length will be minimized. Since this problem belongs to the class of "strong" NP-hard problems, we must foreclose the possibility of constructing not only pseudopolynomial time optimization algorithms but also fully polynomial time approximation schemes unless P = NP. This paper proposes a heuristic algorithm named CP/MISF (critical path/most immediate successors first) and an optimization/approximation algorithm named DF/IHS (depth-first/implicit heuristic search). DF/IHS is an excellent scheduling method which can markedly reduce the space complexity and average computation time by combining the branch-and-bound method with CP/MISF; it allows us to solve very large scale problems with a few hundred tasks. Numerical examples are included to demonstrate the effectiveness of the proposed algorithms.

480 citations
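A hedged sketch of the CP/MISF priority rule in Python (the task graph and processing times below are invented): tasks are ranked by critical-path level, ties are broken by the number of immediate successors, and a simple list scheduler dispatches them to the earliest-free processor. The DF/IHS branch-and-bound refinement is not shown.

```python
# CP/MISF-style list scheduling sketch. Priority = (critical-path level,
# number of immediate successors); precedence constraints are respected.
from functools import lru_cache

# task -> (processing time, immediate successors); a small invented DAG
tasks = {
    "a": (2, ["c", "d"]),
    "b": (3, ["d"]),
    "c": (2, ["e"]),
    "d": (4, ["e"]),
    "e": (1, []),
}

@lru_cache(maxsize=None)
def level(t):
    """Critical-path level: longest path from t (inclusive) to an exit task."""
    time, succs = tasks[t]
    return time + max((level(s) for s in succs), default=0)

def cp_misf(num_procs):
    order = sorted(tasks, key=lambda t: (level(t), len(tasks[t][1])), reverse=True)
    proc_free = [0] * num_procs
    finish = {}
    for t in order:   # predecessors always have a higher level, so they are already placed
        ready = max((finish[p] for p, (_, ss) in tasks.items() if t in ss), default=0)
        k = min(range(num_procs), key=proc_free.__getitem__)
        finish[t] = max(proc_free[k], ready) + tasks[t][0]
        proc_free[k] = finish[t]
    return finish

print(cp_misf(2))   # schedule length = max finish time = 8 (critical path b-d-e)
```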


Journal ArticleDOI
TL;DR: A formal theory of MOS logic circuits is developed starting from a description of circuit behavior in terms of switch graphs and an algorithm for a logic simulator based on the switch-level model which computes the new state of the network by solving a set of equations in a simple, discrete algebra.
Abstract: The switch-level model describes the logical behavior of digital systems implemented in metal oxide semiconductor (MOS) technology. In this model a network consists of a set of nodes connected by transistor "switches" with each node having a state 0, 1, or X (for invalid or uninitialized), and each transistor having a state "open," "closed," or "indeterminate." Many characteristics of MOS circuits can be modeled accurately, including: ratioed, complementary, and precharged logic; dynamic and static storage; (bidirectional) pass transistors; buses; charge sharing; and sneak paths. In this paper we present a formal development of the switch-level model starting from a description of circuit behavior in terms of switch graphs. Then we describe an algorithm for a logic simulator based on the switch-level model which computes the new state of the network by solving a set of equations in a simple, discrete algebra. This algorithm has been implemented in the simulator MOSSIM II and operates at speeds approaching those of conventional logic gate simulators. By developing a formal theory of MOS logic circuits, we have achieved a greater degree of generality and accuracy than is found in other logic simulators for MOS.

386 citations


Journal ArticleDOI
TL;DR: It is shown that tp-diagnosable systems, due to their robust interconnection structure, possess heretofore unknown graph theoretic properties relative to vertex cover sets and maximum matchings.
Abstract: Consider a system composed of n independent processors, each of which tests a subset of the others. It is assumed that at most tp of these processors are permanently faulty and that the outcome of a test is reliable if and only if the processor which performed the test is fault free. Such a system is said to be tp-diagnosable if, given any complete collection of test results, the set of faulty processors can be uniquely identified. In this paper, it is shown that tp-diagnosable systems, due to their robust interconnection structure, possess heretofore unknown graph theoretic properties relative to vertex cover sets and maximum matchings. An O(n^2.5) algorithm is given which exploits these properties to identify the set of faulty processors in a tp-diagnosable system. The algorithm is shown to be correct, complete, not based on any conjecture, and superior to any other known fault identification algorithm for the general class of tp-diagnosable systems.

377 citations
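The paper's O(n^2.5) matching-based algorithm is not reproduced here; the brute-force sketch below only illustrates what diagnosis under this test model means, by searching for the unique fault set of size at most t that is consistent with a syndrome. The testing graph and syndrome are made up for illustration.

```python
# Exhaustive syndrome decoding for a small system: each unit tests some others,
# a fault-free tester reports correctly, a faulty tester's results are unreliable.
from itertools import combinations

def consistent(fault_set, tests, syndrome):
    """Does the hypothesised fault set explain every test outcome?"""
    for i, tested in tests.items():
        if i in fault_set:
            continue                                   # a faulty tester is unreliable
        for j in tested:
            if syndrome[(i, j)] != int(j in fault_set):
                return False
    return True

def diagnose(n, tests, syndrome, t):
    """Unique fault set of size <= t consistent with the syndrome (exhaustive search)."""
    hits = [set(fs) for k in range(t + 1) for fs in combinations(range(n), k)
            if consistent(set(fs), tests, syndrome)]
    assert len(hits) == 1, "ambiguous syndrome: not t-diagnosable for this t"
    return hits[0]

n, t, faulty = 5, 1, {2}
tests = {i: [(i + 1) % n] for i in range(n)}           # a 5-unit testing ring
syndrome = {(i, j): int(j in faulty) if i not in faulty else 1
            for i, tested in tests.items() for j in tested}   # the faulty unit lies here
print(diagnose(n, tests, syndrome, t))                 # -> {2}
```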


Journal ArticleDOI
Krishnamurthy
TL;DR: This paper generalizes the ideas of Fiduccia and Mattheyses and suggests a class of increasingly sophisticated heuristics, and shows that the computational complexity of any specific heuristic in the suggested class remains linear in the size of the network.

Abstract: Recently, a fast (linear) heuristic for improving min-cut partitions of VLSI networks was suggested by Fiduccia and Mattheyses [6]. In this paper we generalize their ideas and suggest a class of increasingly sophisticated heuristics. We then show, by exploiting the data structures originally suggested by them, that the computational complexity of any specific heuristic in the suggested class remains linear in the size of the network.

373 citations


Journal ArticleDOI
TL;DR: VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at some of the typical FFT algorithms in the light of these constraints.

Abstract: In some signal processing applications, it is desirable to build very high performance fast Fourier transform (FFT) processors. To meet the performance requirements, these processors are typically highly pipelined. Until the advent of VLSI, it was not possible to build a single chip which could be used to construct pipeline FFT processors of a reasonable size. However, VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at some of the typical FFT algorithms in the light of these constraints.

327 citations
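For reference, a textbook radix-2 decimation-in-time FFT in Python; a pipeline FFT processor of the kind considered here typically maps each of the log2 N recursion levels onto a hardware stage. This is only the software view of the butterfly structure, not the paper's VLSI design.

```python
# Textbook radix-2 decimation-in-time FFT (input length must be a power of two).
import cmath

def fft(x):
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

samples = [1, 2, 3, 4, 0, 0, 0, 0]
print([round(abs(c), 3) for c in fft(samples)])
```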


Journal ArticleDOI
TL;DR: A new analytical method of computing the fault coverage that is fast compared with simulation is described; it makes it possible to identify the "random-pattern-resistant" faults, modify the logic to make them easy to detect, and thus increase the fault coverage of the random test.

Abstract: A major problem in self testing with random inputs is verification of the test quality, i.e., the computation of the fault coverage. The brute-force approach of using full-fault simulation does not seem attractive because of the volume of the logic structure and the CPU time involved. A new approach is therefore necessary. This paper describes a new analytical method of computing the fault coverage that is fast compared with simulation. If the fault coverage falls below a certain threshold, it is possible to identify the "random-pattern-resistant" faults, modify the logic to make them easy to detect, and thus increase the fault coverage of the random test.

296 citations
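The basic relation such analytical approaches rest on can be sketched as follows (the fault list and probabilities below are invented): given an estimate of each fault's per-pattern detection probability, the expected coverage of N random patterns follows without fault simulation, and the random-pattern-resistant faults show up as the ones that keep coverage low.

```python
# Expected fault coverage of a random test from per-pattern detection probabilities.
def expected_coverage(detect_probs, num_patterns):
    """Mean fraction of faults detected by num_patterns independent random patterns."""
    covered = sum(1 - (1 - p) ** num_patterns for p in detect_probs)
    return covered / len(detect_probs)

# Illustrative fault list: most faults are easy, a few are random-pattern resistant.
probs = [0.2] * 90 + [1e-5] * 10
for n in (100, 10_000, 1_000_000):
    print(n, round(expected_coverage(probs, n), 4))
```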


Journal ArticleDOI
TL;DR: The state of the art of computational geometry is surveyed, a discipline that deals with the complexity of geometric problems within the framework of the analysis of algorithms.
Abstract: We survey the state of the art of computational geometry, a discipline that deals with the complexity of geometric problems within the framework of the analysis of algorithms. This newly emerged area of activities has found numerous applications in various other disciplines, such as computer-aided design, computer graphics, operations research, pattern recognition, robotics, and statistics. Five major problem areas—convex hulls, intersections, searching, proximity, and combinatorial optimizations—are discussed. Seven algorithmic techniques—incremental construction, plane-sweep, locus, divide-and-conquer, geometric transformation, prune-and-search, and dynamization—are each illustrated with an example. A collection of problem transformations to establish lower bounds for geometric problems in the algebraic computation/decision model is also included.

271 citations
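As a concrete instance of one of the five problem areas, here is a short Python implementation of a planar convex hull (Andrew's monotone chain, O(n log n)). It is illustrative only and is not drawn from the survey itself.

```python
# Monotone-chain convex hull: sort the points, then build lower and upper chains.
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()               # drop points that would create a right turn
            chain.append(p)
        return chain[:-1]                 # endpoint repeats in the other chain
    return half(pts) + half(reversed(pts))

print(convex_hull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 3)]))
# -> counterclockwise hull; the interior point (1, 1) is discarded
```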


Journal ArticleDOI
TL;DR: This tutorial paper addresses some of the principles and provides examples of concurrent architectures and designs that have been inspired by VLSI technology.
Abstract: This tutorial paper addresses some of the principles and provides examples of concurrent architectures and designs that have been inspired by VLSI technology. The circuit density offered by VLSI provides the means for implementing systems with very large numbers of computing elements, while its physical characteristics provide an incentive to organize systems so that the elements are relatively loosely coupled. One class of computer architectures that evolves from this reasoning comprises an interesting and varied set of concurrent machines that adhere to a structural model based on the repetition of regularly connected elements. The systems included under this structural model range from 1) systems that combine storage and logic at a fine grain size, and are typically aimed at computations with images or storage retrieval, to 2) systems that combine registers and arithmetic at a medium grain size to form computational or systolic arrays for signal processing and matrix computations, to 3) arrays of instruction interpreting computers that use teamwork to perform many of the same demanding computations for which we use high-performance single process computers today.

Journal ArticleDOI
McCluskey
TL;DR: A new approach to test pattern generation which is particularly suitable for self-test is described; it requires much less computation time and achieves much higher fault coverage: all irredundant multiple as well as single stuck faults are detected.
Abstract: A new approach to test pattern generation which is particularly suitable for self-test is described. Required computation time is much less than for present day automatic test pattern generation (ATPG) programs. Fault simulation or fault modeling is not required. More patterns may be obtained than from standard ATPG programs. However, fault coverage is much higher—all irredundant multiple as well as single stuck faults are detected. The test patterns are easily generated algorithmically either by program or hardware.

Journal ArticleDOI
TL;DR: Two systolic architectures are developed for performing the product–sum computation AB + C in the finite field GF(2^m) of 2^m elements, where A, B, and C are arbitrary elements of GF(2^m).

Abstract: Two systolic architectures are developed for performing the product–sum computation AB + C in the finite field GF(2^m) of 2^m elements, where A, B, and C are arbitrary elements of GF(2^m). The first multiplier is a serial-in, serial-out one-dimensional systolic array, while the second multiplier is a parallel-in, parallel-out two-dimensional systolic array. The first multiplier requires a smaller number of basic cells than the second multiplier. The second multiplier needs less average time per computation than the first multiplier if a number of computations are performed consecutively. To perform single computations both multipliers require the same computational time. In both cases the architectures are simple and regular and possess the properties of concurrency and modularity. As a consequence they are well suited for use in VLSI systems.
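The arithmetic both arrays implement can be sketched in software; the field GF(2^4) and its irreducible polynomial below are chosen only for illustration, whereas the paper's designs handle general m.

```python
# A*B + C in GF(2^m) with a polynomial-basis representation: elements are packed
# into ints (bit i = coefficient of x^i); here m = 4 and x^4 + x + 1 = 0b10011.
def gf_mul_add(a, b, c, m=4, poly=0b10011):
    """Return a*b + c over GF(2^m)."""
    acc = c
    for i in range(m):                     # shift-and-add (XOR) multiplication
        if (b >> i) & 1:
            acc ^= a << i
    for i in range(2 * m - 2, m - 1, -1):  # reduce modulo the field polynomial
        if (acc >> i) & 1:
            acc ^= poly << (i - m)
    return acc

# (x^3 + 1)(x + 1) + x^2 = x^4 + x^3 + x^2 + x + 1; reducing x^4 -> x + 1 leaves x^3 + x^2
print(bin(gf_mul_add(0b1001, 0b0011, 0b0100)))   # 0b1100
```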

Journal ArticleDOI
Hennessy
TL;DR: In a VLSI implementation of an architecture, many problems can arise from the base technology and its limitations, so the architects must be aware of these limitations and understand their implications at the instruction set level.
Abstract: A processor architecture attempts to compromise between the needs of programs hosted on the architecture and the performance attainable in implementing the architecture. The needs of programs are most accurately reflected by the dynamic use of the instruction set as the target for a high level language compiler. In VLSI, the issue of implementation of an instruction set architecture is significant in determining the features of the architecture. Recent processor architectures have focused on two major trends: large microcoded instruction sets and simplified, or reduced, instruction sets. The attractiveness of these two approaches is affected by the choice of a single-chip implementation. The two different styles require different tradeoffs to attain an implementation in silicon with a reasonable area. The two styles consume the chip area for different purposes, thus achieving performance by different strategies. In a VLSI implementation of an architecture, many problems can arise from the base technology and its limitations. Although circuit design techniques can help alleviate many of these problems, the architects must be aware of these limitations and understand their implications at the instruction set level.

Journal ArticleDOI
TL;DR: By line digraph iterations it is possible to construct digraphs with a number of vertices larger than (d^2 - 1)/d^2 times the (nonattainable) Moore bound, which solves the (d, k) digraph problem for k = 2.

Abstract: This paper studies the behavior of the diameter and the average distance between vertices of the line digraph of a given digraph. The results obtained are then applied to the so-called (d, k) digraph problem, that is, to maximize the number of vertices in a digraph of maximum out-degree d and diameter k. By line digraph iterations it is possible to construct digraphs with a number of vertices larger than (d^2 - 1)/d^2 times the (nonattainable) Moore bound. In particular, this solves the (d, k) digraph problem for k = 2. Also, the line digraph technique provides us with a simple local routing algorithm for the corresponding networks.
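A short sketch of the iteration, assuming the standard line digraph construction (vertices of L(G) are the arcs of G, with an arc from (u, v) to (v, w)): starting from the complete digraph on d + 1 vertices and iterating multiplies the vertex count by d while the diameter grows by only one.

```python
# Line digraph iteration and brute-force diameter check on small digraphs.
from collections import deque

def line_digraph(adj):
    """Vertices of L(G) are the arcs of G; (u, v) -> (v, w) for every arc (v, w)."""
    arcs = [(u, v) for u, vs in adj.items() for v in vs]
    return {(u, v): [(v, w) for w in adj[v]] for (u, v) in arcs}

def diameter(adj):
    def ecc(src):                      # BFS eccentricity of one vertex
        dist, q = {src: 0}, deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(ecc(v) for v in adj)

d = 3
g = {u: [v for v in range(d + 1) if v != u] for u in range(d + 1)}  # K4: out-degree 3, diameter 1
for _ in range(3):
    print(len(g), "vertices, diameter", diameter(g))   # 4/1, then 12/2, then 36/3
    g = line_digraph(g)
# for d = 3, k = 3: 36 vertices, versus a Moore bound of 40 and (d^2 - 1)/d^2 * 40 = 35.6
```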

Journal ArticleDOI
TL;DR: A new and comprehensive model of the instruction execution process is developed and it is shown that with the use of appropriate codewords all faults can be classified into three types.
Abstract: This paper presents a new and systematic method to generate tests for microprocessors. A functional level model for the microprocessor is used and it is represented by a reduced graph. A new and comprehensive model of the instruction execution process is developed. Various types of faults are analyzed and it is shown that with the use of appropriate codewords all faults can be classified into three types. This gives rise to a systematic procedure to generate tests which is independent of the microprocessor implementation details. Tests are given to detect faults in any microprocessor, first for the READ register instructions, and then for the remaining instructions. These tests can be executed by the microprocessor in a self-test mode, thus dispensing with the need for an external tester.

Journal ArticleDOI
Nicolau, Fisher
TL;DR: This paper focuses on long instruction word architectures, such as attached scientific processors and horizontally microcoded CPU's, and argues that even if the authors had infinite hardware, these architectures could not provide a speedup of more than a factor of 2 or 3 on real programs.
Abstract: Long instruction word architectures, such as attached scientific processors and horizontally microcoded CPU's, are a popular means of obtaining code speedup via fine-grained parallelism. The falling cost of hardware holds out the hope of using these architectures for much more parallelism. But this hope has been diminished by experiments measuring how much parallelism is available in the code to start with. These experiments implied that even if we had infinite hardware, long instruction word architectures could not provide a speedup of more than a factor of 2 or 3 on real programs.
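The kind of "infinite hardware" measurement referred to can be sketched as follows (the trace and register names below are invented, and anti- and output dependences are assumed renamed away): each instruction issues as soon as its operands are available, so the speedup bound is the trace length divided by the data-dependence critical path.

```python
# Toy oracle-parallelism measurement over a dynamic instruction trace.
def available_parallelism(trace):
    ready = {}                      # register -> cycle in which its value is produced
    depth = 0
    for dst, srcs in trace:         # one (destination, sources) pair per dynamic instruction
        cycle = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        ready[dst] = cycle
        depth = max(depth, cycle)
    return len(trace) / depth

trace = [("r1", []), ("r2", []), ("r3", ["r1", "r2"]),
         ("r4", ["r3"]), ("r5", []), ("r6", ["r5"])]
print(available_parallelism(trace))   # 6 instructions, critical path 3 -> speedup bound 2.0
```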

Journal ArticleDOI
Heidelberger, Lavenberg
TL;DR: This survey of the major quantitative methods used in computer performance evaluation, focusing on post-1970 developments and emphasizing trends and challenges, divides the methods into three main areas, namely performance measurement, analytic performance modeling, and simulation performance modeling.
Abstract: The quantitative evaluation of computer performance is needed during the entire life cycle of a computer system. We survey the major quantitative methods used in computer performance evaluation, focusing on post-1970 developments and emphasizing trends and challenges. We divide the methods used into three main areas, namely performance measurement, analytic performance modeling, and simulation performance modeling, which we survey in the three main sections of the paper. Although we concentrate on the methods per se, rather than on the results of applying the methods, numerous application examples are cited. The methods to be covered have been applied across the entire spectrum of computer systems from personal computers to large mainframes and supercomputers, including both centralized and distributed systems. The application of these methods has certainly not decreased over the years and we anticipate their continued use as well as their enhancement when needed to evaluate future systems.

Journal ArticleDOI
Savir, Bardell
TL;DR: In this article, the authors examine fault detection in the presence of nonmasking multiple faults as well as the question of distinguishing between them, showing that a test that merely exposes each fault has a high probability of distinguishing between the faults.
Abstract: The testing of large logic networks with random patterns is examined. Work by previous workers for single faults is extended to a class of multiple fault situations. Not only is the problem of fault detection in the presence of nonmasking multiple faults treated, but the question of distinguishing between them is also examined. It is shown that a test that merely exposes each fault has a high probability of distinguishing between the faults. The relationships between quality, diagnostic resolution, and random pattern test length are developed. The results have application to self-test schemes that use random patterns as stimuli.
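One standard relation of this kind, between a fault's per-pattern detection probability, the desired confidence, and the random test length, can be written down directly; this is the generic estimate, not the paper's specific derivations.

```python
# Random-pattern test length: N independent patterns detect a fault of per-pattern
# detection probability p with probability 1 - (1 - p)^N; solve for N.
import math

def test_length(p, confidence):
    """Patterns needed to detect a fault of detectability p with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

for p in (1e-2, 1e-4, 1e-6):
    print(f"p = {p:g}: N = {test_length(p, 0.999):,}")
```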

Journal ArticleDOI
Fine, Tobagi
TL;DR: In this paper, the authors present many implicit-token DAMA schemes in a unified manner grouped according to their basic access mechanisms, and compare them in terms of performance and other important attributes.
Abstract: Local area communications networks based on packet broadcasting techniques provide simple architectures and efficient and flexible operation. Various ring systems and CSMA contention bus systems have been in operation for several years. More recently, a number of distributed demand assignment multiple access (DAMA) schemes suitable for broadcast bus networks have emerged which provide conflict-free broadcast communications by means of various scheduling techniques. Among these schemes, the Token-Passing Bus Access method uses explicit tokens, i.e., control messages, to provide the required scheduling. Others use implicit tokens, whereby stations in the network rely on information deduced from the activity on the bus to schedule their transmissions. In this paper we present many implicit-token DAMA schemes in a unified manner grouped according to their basic access mechanisms, and compare them in terms of performance and other important attributes.

Journal ArticleDOI
TL;DR: A two-phase algorithm for finding the maximum of a set of values stored one per processor on an n × n array of processors that uses conventional links during the first phase and the global bus during the second is presented.

Abstract: The problem of finding the maximum of a set of values stored one per processor on an n × n array of processors is analyzed. The array has a time-shared global bus in addition to conventional processor-processor links. A two-phase algorithm for finding the maximum is presented that uses conventional links during the first phase and the global bus during the second. This algorithm is faster than algorithms that use either only the global bus or only the conventional links.
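A rough sketch of the two-phase structure (the step-count model below is a simplification, not the paper's analysis): each b-by-b block reduces to one representative using only neighbor links, after which the block representatives are resolved one at a time over the shared bus. Varying b shows the tradeoff the algorithm exploits.

```python
# Two-phase maximum on an n-by-n mesh with a single time-shared global bus.
import numpy as np

def mesh_max(values, b):
    """Return the maximum and a crude step count for block size b."""
    n = values.shape[0]
    local_steps = 2 * (b - 1)                    # fold each block's rows, then its column
    reps = [values[i:i + b, j:j + b].max()       # phase 1: one representative per block
            for i in range(0, n, b) for j in range(0, n, b)]
    bus_steps = len(reps)                        # phase 2: one bus broadcast per block
    return max(reps), local_steps + bus_steps

rng = np.random.default_rng(1)
grid = rng.integers(0, 10**6, (16, 16))
for b in (1, 4, 8, 16):
    value, steps = mesh_max(grid, b)
    print(f"block size {b:2d}: max = {value}, modelled steps = {steps}")
assert mesh_max(grid, 4)[0] == grid.max()
```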

Journal ArticleDOI
Sasao
TL;DR: A PLA minimization system having the following features is presented: minimization of both two-level PLA's and PLA's with two-bit decoders, and essential prime implicants detection without generating all the prime implicants.
Abstract: A PLA minimization system having the following features is presented: 1) minimization of both two-level PLA's and PLA's with two-bit decoders; 2) optimal input variable assignment to the decoders; 3) optimal output phase assignment; and 4) essential prime implicants detection without generating all the prime implicants.

Journal ArticleDOI
TL;DR: This paper identifies six fundamental distributed computer system research issues, points out open research problems in these areas, and describes how these six issues and solutions to problems associated with them transect the communications subnet, the distributed operating system, and the distributed database areas.
Abstract: Distributed computer systems have been the subject of a vast amount of research. Many prototype distributed computer systems have been built at university, industrial, commercial, and government research laboratories, and production systems of all sizes and types have proliferated. It is impossible to survey all distributed computing system research. Instead, this paper identifies six fundamental distributed computer system research issues, points out open research problems in these areas, and describes how these six issues and solutions to problems associated with them transect the communications subnet, the distributed operating system, and the distributed database areas. It is intended that this perspective on distributed computer system research serve as a form of survey, but more importantly to illustrate and encourage a better integration and exchange of ideas from various subareas of distributed computer system research.

Journal ArticleDOI
TL;DR: A survey of architectural approaches to fault-tolerant design is conducted, emphasizing the basic concepts employed in the design of these systems, and the tradeoffs and alternatives available to the system designer in attempting to meet applications requirements.
Abstract: This paper presents a brief history of fault-tolerant computing. This is followed by a survey of architectural approaches to fault-tolerant design, emphasizing the basic concepts employed in the design of these systems, and the tradeoffs and alternatives available to the system designer in attempting to meet applications requirements. Classes of fault-tolerance applications are identified, along with design approaches which are applicable, and several problem areas are identified in which new research results are badly needed.

Journal ArticleDOI
TL;DR: In this correspondence, the reliability and fault tolerance issues in binary tree architecture with spares are considered and two different fault-tolerance mechanisms are described and studied.
Abstract: Binary tree network architectures are applicable in the design of hierarchical computing systems and in specialized high-performance computers. In this correspondence, the reliability and fault tolerance issues in binary tree architecture with spares are considered. Two different fault-tolerance mechanisms are described and studied, namely: 1) scheme with spares; and 2) scheme with performance degradation. Reliability analysis and estimation of the fault-tolerant binary tree structures are performed using the interactive ARIES 82 program. The discussion is restricted to the topological level, and certain extensions of the schemes are also discussed.

Journal ArticleDOI
TL;DR: New systolic arrays that can lead to efficient VLSI solutions to both the GCD problem and the extended GCD problem are described.

Abstract: The problem of finding a greatest common divisor (GCD) of any two nonzero polynomials is fundamental to algebraic and symbolic computations, as well as to the decoder implementation for a variety of error-correcting codes. This paper describes new systolic arrays that can lead to efficient VLSI solutions to both the GCD problem and the extended GCD problem.
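The underlying computation can be sketched sequentially; this is plain Euclid over GF(2) with polynomials packed into integers, not the systolic arrays themselves.

```python
# Polynomial GCD over GF(2); bit i of an int is the coefficient of x^i.
def poly_mod(a, b):
    """Remainder of a divided by b over GF(2)."""
    deg_b = b.bit_length() - 1
    while a.bit_length() > deg_b:
        a ^= b << (a.bit_length() - 1 - deg_b)   # cancel a's leading term
    return a

def poly_gcd(a, b):
    while b:
        a, b = b, poly_mod(a, b)
    return a

# gcd(x^3 + 1, x^2 + x) = x + 1, since x^3 + 1 = (x + 1)(x^2 + x + 1) over GF(2)
print(bin(poly_gcd(0b1001, 0b110)))   # 0b11
```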

Journal ArticleDOI
TL;DR: Sufficient conditions are obtained for designing a distributed fault-tolerant system by employing the given algorithm, which has the interesting property that it lets as many as all of the nodes and internode communication facilities fail, but upon repair or replacement of faulty facilities, the system can converge to normal operation if no more than a certain number of facilities remain faulty.
Abstract: The problem of designing distributed fault-tolerant computing systems is considered. A model in which the network nodes are assumed to possess the ability to "test" certain other network facilities for the presence of failures is employed. Using this model, a distributed algorithm is presented which allows all the network nodes to correctly reach independent diagnoses of the condition (faulty or fault-free) of all the network nodes and internode communication facilities, provided the total number of failures does not exceed a given bound. The proposed algorithm allows for the reentry of repaired or replaced faulty facilities back into the network, and it also has provisions for adding new nodes to the system. Sufficient conditions are obtained for designing a distributed fault-tolerant system by employing the given algorithm. The algorithm has the interesting property that it lets as many as all of the nodes and internode communication facilities fail, but upon repair or replacement of faulty facilities, the system can converge to normal operation if no more than a certain number of facilities remain faulty.

Journal ArticleDOI
TL;DR: The testability of two well-known array multiplier structures is studied in detail and it is shown that, with appropriate cell design, array multipliers can be designed to be very easily testable.
Abstract: Array multipliers are well suited for VLSI implementation because of the regularity in their iterative structure. However, most VLSI circuits are difficult to test. This correspondence shows that, with appropriate cell design, array multipliers can be designed to be very easily testable. An array multiplier is called C-testable if all its adder cells can be exhaustively tested while requiring only a constant number of test patterns. The testability of two well-known array multiplier structures is studied in detail. The conventional design of the carry–save array multiplier is modified. The modified design is shown to be C-testable and requires only 16 test patterns. Similar results are obtained for the Baugh–Wooley two's complement array multiplier. A modified design of the Baugh–Wooley array multiplier is shown to be C-testable and requires 55 test patterns. The C-testability of two other array multipliers, namely the carry–propagate and the TRW designs, is also presented.

Journal ArticleDOI
TL;DR: A detailed terminal reliability analysis of the Gamma network is performed, deriving expressions for the reliability between an input and output terminal and the permuting capabilities of the gamma network.
Abstract: The Gamma network is an interconnection network connecting N = 2^n inputs to N outputs. It is a multistage network with N switches per stage, each of which is a 3-input, 3-output crossbar. The stages are linked via "power of two" and identity connections in such a way that redundant paths exist between the input and output terminals. In this network, a path from a source to a destination may be represented using one of the redundant forms of the difference between the source and destination numbers. The redundancy in paths may thus be studied using the theory of redundant number systems. Results are obtained on the distribution of paths connecting inputs and outputs, and the permuting capabilities of the Gamma network. Frequently used permutations and control mechanisms are discussed briefly. We also perform a detailed terminal reliability analysis of the Gamma network, deriving expressions for the reliability between an input and output terminal.
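A simplified counting model based on this description (assuming stage i offers "minus 2^i", "straight", and "plus 2^i" links): a path corresponds to a radix-2 signed-digit representation of the source-destination difference, so enumerating those representations counts the redundant paths.

```python
# Count Gamma-network paths as signed-digit representations of (dst - src) mod 2^n.
from itertools import product

def gamma_paths(src, dst, n):
    """All digit vectors (d_0 .. d_{n-1}), d_i in {-1, 0, 1}, realising dst - src mod 2^n."""
    N = 1 << n
    target = (dst - src) % N
    return [digits for digits in product((-1, 0, 1), repeat=n)
            if sum(d << i for i, d in enumerate(digits)) % N == target]

n = 3                                            # N = 8 terminals
for dst in range(4):
    print(f"0 -> {dst}: {len(gamma_paths(0, dst, n))} path(s)")
# the source-to-itself path is unique; other terminal pairs have several redundant paths
```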

Journal ArticleDOI
TL;DR: This work focuses on register-register architectures like the CRAY-1 where pipeline control logic is localized to one or two pipeline stages and is referred to as "instruction issue logic."
Abstract: Basic principles and design tradeoffs for control of pipelined processors are first discussed. We concentrate on register-register architectures like the CRAY-1 where pipeline control logic is localized to one or two pipeline stages and is referred to as "instruction issue logic." Design tradeoffs are explored by giving designs for a variety of instruction issue methods that represent a range of complexity and sophistication. These vary from the CRAY-1 issue logic to a version of Tomasulo's algorithm, first used in the IBM 360/91 floating point unit. Also studied are Thornton's "scoreboard" algorithm used on the CDC 6600 and an algorithm we have devised. To provide a standard for comparison, all the issue methods are used to implement the CRAY-1 scalar architecture. Then, using a simulation model and the Lawrence Livermore Loops compiled with the CRAY Fortran compiler, performance results for the various issue methods are given and discussed.