
Showing papers in "IEEE Transactions on Computers in 1982"


Journal ArticleDOI
TL;DR: It is shown that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n and with area proportional to n.
Abstract: With VLSI architecture, the chip area and design regularity represent a better measure of cost than the conventional gate count. We show that addition of n-bit binary numbers can be performed on a chip with a regular layout in time proportional to log n and with area proportional to n.

1,147 citations
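The log-time, linear-area addition result rests on computing all carries as a parallel prefix over per-bit generate/propagate signals. A minimal Python sketch of that prefix idea, using a Kogge-Stone-style combining schedule (an illustration of the principle, not the paper's layout):

```python
def prefix_add(a, b, n=8):
    """Add two n-bit integers (mod 2**n) via parallel-prefix carry computation.

    g[i]/p[i] are the per-bit generate/propagate signals; O(log n) combining
    rounds turn g[i] into the group-generate (= carry out) of bits 0..i.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    s = p[:]                          # keep the raw xor for the sum bits
    d = 1
    while d < n:                      # ceil(log2 n) rounds
        ng, np2 = g[:], p[:]
        for i in range(d, n):
            ng[i] = g[i] | (p[i] & g[i - d])
            np2[i] = p[i] & p[i - d]
        g, p = ng, np2
        d *= 2
    carry_in = [0] + g[:-1]           # overall carry-in is 0
    return sum((s[i] ^ carry_in[i]) << i for i in range(n))
```

In hardware the same combining tree gives depth proportional to log n with a regular layout, which is the point of the paper.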


Journal ArticleDOI
TL;DR: An isomorphism between the behavior of Petri nets with exponentially distributed transition rates and Markov processes is presented and this work solves for the steady state average message delay and throughput on a communication link when the alternating bit protocol is used for error recovery.
Abstract: An isomorphism between the behavior of Petri nets with exponentially distributed transition rates and Markov processes is presented. In particular, k-bounded Petri nets are isomorphic to finite Markov processes and can be solved by standard techniques if k is not too large. As a practical example, we solve for the steady state average message delay and throughput on a communication link when the alternating bit protocol is used for error recovery.

1,090 citations
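The isomorphism reduces a k-bounded stochastic Petri net to a finite continuous-time Markov chain over its reachability graph, whose steady state solves πQ = 0 with Σπᵢ = 1. A stdlib-only sketch of that last step; the generator matrix in the usage below is a made-up two-state example, not the alternating-bit model:

```python
def steady_state(Q):
    """Solve pi @ Q = 0 with sum(pi) = 1 by Gaussian elimination.

    Q is the generator matrix (rows sum to zero) of a finite CTMC.
    """
    n = len(Q)
    # Transpose Q to get the balance equations; replace the last
    # (redundant) equation with the normalization constraint.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):              # elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):    # back substitution
        pi[r] = (b[r] - sum(A[r][c] * pi[c] for c in range(r + 1, n))) / A[r][r]
    return pi

# Two markings, rate 2 from state 0 to 1 and rate 3 back: pi = [0.6, 0.4].
pi = steady_state([[-2.0, 2.0], [3.0, -3.0]])
```

Quantities such as average delay and throughput then follow as rewards over π, which is how the paper treats the alternating bit protocol.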


Journal ArticleDOI
TL;DR: This paper discusses elections and reorganizations of active nodes in a distributed computing system after a failure, and two types of reasonable failure environments are studied.
Abstract: After a failure occurs in a distributed computing system, it is often necessary to reorganize the active nodes so that they can continue to perform a useful task. The first step in such a reorganization or reconfiguration is to elect a coordinator node to manage the operation. This paper discusses such elections and reorganizations. Two types of reasonable failure environments are studied. For each environment assertions which define the meaning of an election are presented. An election algorithm which satisfies the assertions is presented for each environment.

647 citations
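This is the paper that introduced the bully election: a node starting an election challenges all higher-numbered nodes, and whichever live node finds no live node above it declares itself coordinator. A toy sketch under the strong assumption of perfect failure detection; the real algorithm also needs the message exchanges and state assertions the paper formalizes:

```python
def bully_election(initiator, alive):
    """Toy bully-style election, assuming perfect failure detection.

    alive maps node id -> whether the node currently responds. Control
    passes upward until some live node sees no live node above it.
    """
    node = initiator
    while True:
        higher = [n for n in alive if n > node and alive[n]]
        if not higher:
            return node              # nobody outranks this node: it wins
        node = min(higher)           # a live higher node takes over
```

With nodes 1..5 and nodes 3 and 5 down, an election started at node 1 ends with node 4 as coordinator.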


Journal ArticleDOI
Williams, Parker
TL;DR: The different techniques of design for testability are discussed in detail, including techniques which can be applied to today's technologies and techniques which have been recently introduced and will soon appear in new designs.
Abstract: This paper discusses the basics of design for testability. A short review of testing is given along with some reasons why one should test. The different techniques of design for testability are discussed in detail. These include techniques which can be applied to today's technologies and techniques which have been recently introduced and will soon appear in new designs.

428 citations


Journal ArticleDOI
TL;DR: It is shown that the k-nearest neighbor problem and other seemingly unrelated problems can be solved efficiently with the Voronoi diagram.
Abstract: The notion of Voronoi diagram for a set of N points in the Euclidean plane is generalized to the Voronoi diagram of order k, and an iterative algorithm to construct the generalized diagram in O(k²N log N) time using O(k²(N − k)) space is presented. It is shown that the k-nearest neighbor problem and other seemingly unrelated problems can be solved efficiently with the diagram.

361 citations


Journal ArticleDOI
TL;DR: It is shown that for most practical ALU implementations, including the carry-lookahead adders, the RESO technique will detect all errors caused by faults in a bit-slice or a specific subcircuit of the bit slice.
Abstract: A new method of concurrent error detection in the Arithmetic and Logic Units (ALU's) is proposed. This method, called "Recomputing with Shifted Operands" (RESO), can detect errors in both the arithmetic and logic operations. RESO uses the principle of time redundancy in detecting the errors and achieves its error detection capability through the use of the already existing replicated hardware in the form of identical bit slices. It is shown that for most practical ALU implementations, including the carry-lookahead adders, the RESO technique will detect all errors caused by faults in a bit slice or a specific subcircuit of the bit slice. The fault model used is more general than the commonly assumed stuck-at fault model. Our fault model assumes that the faults are confined to a small area of the circuit and that the precise nature of the faults is not known. This model is very appropriate for VLSI circuits.

344 citations
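The RESO principle is simple to state: perform the operation once normally and once with both operands shifted, then unshift and compare; a fault confined to one bit slice touches different operand bits in the two runs, so it cannot corrupt both results identically. A hedged sketch for addition, with `alu` as any black-box adder and a shift of one bit:

```python
def reso_add(a, b, alu):
    """Recomputing with Shifted Operands (shift-by-1 variant) for addition.

    alu is a black-box two-operand adder. A fault fixed to one bit slice
    affects different result positions in the two runs, so the compared
    (unshifted) results disagree and the error is flagged.
    """
    r1 = alu(a, b)                    # normal computation
    r2 = alu(a << 1, b << 1) >> 1     # recompute with shifted operands
    if r1 != r2:
        raise RuntimeError("ALU fault detected by RESO")
    return r1
```

A healthy adder passes both runs; an adder that, say, forces result bit 2 high produces mismatching results and is caught.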


Journal ArticleDOI
TL;DR: In this paper, a task allocation model that allocates application tasks among processors in distributed computing systems satisfying minimum interprocessor communication cost, balanced utilization of each processor, and all engineering application requirements is presented.
Abstract: This paper presents a task allocation model that allocates application tasks among processors in distributed computing systems satisfying: 1) minimum interprocessor communication cost, 2) balanced utilization of each processor, and 3) all engineering application requirements.

328 citations


Journal ArticleDOI
Adams, Siegel
TL;DR: It is shown that the ESC provides fault tolerance for any single failure, and the network can be controlled even when it has a failure, using a simple modification of a routing tag scheme proposed for the Generalized Cube.
Abstract: The Extra Stage Cube (ESC) interconnection network, a fault-tolerant structure, is proposed for use in large-scale parallel and distributed supercomputer systems. It has all of the interconnecting capabilities of the multistage cube-type networks that have been proposed for many supersystems. The ESC is derived from the Generalized Cube network by the addition of one stage of interchange boxes and a bypass capability for two stages. It is shown that the ESC provides fault tolerance for any single failure. Further, the network can be controlled even when it has a failure, using a simple modification of a routing tag scheme proposed for the Generalized Cube. Both one-to-one and broadcast connections under routing tag control are performable by the faulted ESC. The ability of the ESC to operate with multiple faults is examined. The ways in which the ESC can be partitioned and permute data are described.

328 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe the development of a wavefront-based language and architecture for a programmable special-purpose multiprocessor array, based on the notion of a computational wavefront.
Abstract: This paper describes the development of a wavefront-based language and architecture for a programmable special-purpose multiprocessor array. Based on the notion of computational wavefront, the hardware of the processor array is designed to provide a computing medium that preserves the key properties of the wavefront. In conjunction, a wavefront language (MDFL) is introduced that drastically reduces the complexity of the description of parallel algorithms and simulates the wavefront propagation across the computing network. Together, the hardware and the language lead to a programmable wavefront array processor (WAP). The WAP blends the advantages of the dedicated systolic array and the general-purpose data-flow machine, and provides a powerful tool for the high-speed execution of a large class of matrix operations and related algorithms which have widespread applications.

263 citations


Journal ArticleDOI
Meyer
TL;DR: This paper considers the modeling of a degradable buffer/multiprocessor system whose performance Y is the (normalized) average throughput rate realized during a bounded interval of time and shows that a closed-form solution of performability can indeed be obtained.
Abstract: If computing system performance is degradable, then as recognized in a number of recent studies, system evaluation must deal simultaneously with aspects of both performance and reliability. One approach is the evaluation of a system's "performability," which relative to a specified performance variable Y, generally requires solution of the probability distribution function of Y. In this paper we examine the feasibility of closed-form solutions of performability when Y is continuous. In particular, we consider the modeling of a degradable buffer/multiprocessor system whose performance Y is the (normalized) average throughput rate realized during a bounded interval of time. Employing an approximate decomposition of the model, we show that a closed-form solution can indeed be obtained.

213 citations


Journal ArticleDOI
Cristian
TL;DR: A unified point of view on programmed exception handling and default exception handling based on automatic backward recovery is constructed, and a class of faults for which default exception handling can provide effective fault tolerance is characterized.
Abstract: Some basic concepts underlying the issue of fault-tolerant software design are investigated. Relying on these concepts, a unified point of view on programmed exception handling and default exception handling based on automatic backward recovery is constructed. The cause–effect relationship between software design faults and failure occurrences is explored and a class of faults for which default exception handling can provide effective fault tolerance is characterized. It is also shown that there exists a second class of design faults which cannot be tolerated by using default exception handling. The role that software verification methods can play in avoiding the production of such faults is discussed.

Journal ArticleDOI
Pradhan, Reddy
TL;DR: A communication architecture for distributed processors is presented, based on a new topology which interconnects n nodes using rn links, where the maximum internode distance is log_r n and each node has at most 2r I/O ports.
Abstract: A communication architecture for distributed processors is presented here. This architecture is based on a new topology we have developed, one which interconnects n nodes by using rn links, where the maximum internode distance is log_r n, and where each node has, at most, 2r I/O ports. It is also shown that this network is fault-tolerant, being able to tolerate up to (r − 1) node failures.

Journal ArticleDOI
TL;DR: A memory system designed for parallel array access based on the use of a prime number of memories and a powerful combination of indexing hardware and data alignment switches is described.
Abstract: In this paper we describe a memory system designed for parallel array access. The system is based on the use of a prime number of memories and a powerful combination of indexing hardware and data alignment switches. Particular emphasis is placed on the indexing equations and their implementation.
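The reason for a prime module count p is number-theoretic: a vector accessed with stride s touches modules (base + i·s) mod p, and when p is prime any stride that is not a multiple of p cycles through all p modules, so rows, columns, and diagonals are all conflict-free. A sketch of the indexing equations under the simplest address mapping; the actual hardware described in such systems uses more refined offset equations:

```python
P = 17  # a prime number of memory modules (assumed for illustration)

def module(addr):
    """Which memory module holds this address (simplest mapping)."""
    return addr % P

def offset(addr):
    """Location within the module (simplest row-address function)."""
    return addr // P

def modules_touched(start, stride, count):
    """Set of modules hit by an access pattern; full size P means
    the access is conflict-free across all modules."""
    return {module(start + i * stride) for i in range(count)}
```

Any stride coprime to 17 (here, any stride not a multiple of 17) spreads 17 consecutive accesses across all 17 modules, while a stride equal to 17 degenerates to a single module.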

Journal ArticleDOI
TL;DR: A parallel algorithm to determine the switch settings for a Benes permutation network is developed; it runs in O(N½) time on an N½ × N½ mesh-connected computer and in O(log⁴N) time on both a cube-connected and a perfect-shuffle computer with N processing elements.
Abstract: A parallel algorithm to determine the switch settings for a Benes permutation network is developed. This algorithm can determine the switch settings for an N input/output Benes network in O(log²N) time when a fully interconnected parallel computer with N processing elements is used. The algorithm runs in O(N½) time on an N½ × N½ mesh-connected computer and in O(log⁴N) time on both a cube-connected and a perfect-shuffle computer with N processing elements. It runs in O(k log³N) time on cube-connected and perfect-shuffle computers with N^(1+1/k) processing elements.

Journal ArticleDOI
Lu
TL;DR: The use of watchdog processors in the implementation of Structural Integrity Checking (SIC) is described and a model for ideal SIC is given in terms of formal languages and automata.
Abstract: The use of watchdog processors in the implementation of Structural Integrity Checking (SIC) is described. A model for ideal SIC is given in terms of formal languages and automata. Techniques for use in implementing SIC are presented. The modification of a Pascal compiler into an SIC Pascal preprocessor is summarized.

Journal ArticleDOI
TL;DR: This correspondence is concerned with the development of algorithms for special-purpose VLSI arrays; the approach used is to identify algorithm transformations which favorably modify the index set and the data dependences but preserve the ordering imposed on the index set by the data dependences.
Abstract: This correspondence is concerned with the development of algorithms for special-purpose VLSI arrays. The approach used in this correspondence is to identify algorithm transformations which favorably modify the index set and the data dependences, but preserve the ordering imposed on the index set by the data dependences. Conditions for the existence of such transformations are given for a class of algorithms. Also, a methodology is proposed for the synthesis of VLSI algorithms.

Journal ArticleDOI
TL;DR: A new modeling methodology to characterize failure processes in digital computers due to hardware transients is presented, and models of common fault-tolerant redundant structures are developed using decreasing hazard function distributions.
Abstract: In this paper a new modeling methodology to characterize failure processes in digital computers due to hardware transients is presented. The basic assumption made is that system sensitivity to hardware transient errors is a function of critical resources usage. The failure rate of a given resource is approximated by a deterministic function of time, depending on the average workload of that resource, plus a Gaussian process. The probability density function of the time to failure obtained under this assumption has a decreasing hazard function, explaining why decreasing hazard function densities such as the Weibull fit experimental data so well. Data on transient errors obtained from several systems are analyzed. Statistical tests confirm the good fit between decreasing hazard distributions and actual data. Finally, models of common fault-tolerant redundant structures are developed using decreasing hazard function distributions. The analysis indicates significant differences between reliability predictions based on the exponential distribution and those based on decreasing hazard function distributions. Reliability differences of 0.2 and factors greater than 2 in Mission Time Improvement are seen in model results. System designers should be aware of these differences.
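The distributional point at issue is the shape of the hazard function h(t): the exponential's is constant, while a Weibull with shape β < 1 has h(t) decreasing in t, which is what the transient-error data favor. A small sketch of the two functions for the common parameterization R(t) = exp(−(λt)^β); the paper's workload-dependent model is richer than this:

```python
import math

def weibull_reliability(t, lam, beta):
    """R(t) = exp(-(lam*t)**beta); beta = 1 is the exponential special
    case, beta < 1 gives a decreasing hazard rate."""
    return math.exp(-((lam * t) ** beta))

def weibull_hazard(t, lam, beta):
    """h(t) = beta * lam**beta * t**(beta - 1); decreasing in t iff beta < 1."""
    return beta * lam ** beta * t ** (beta - 1)
```

With β = 0.5 the hazard at t = 4 is half the hazard at t = 1, illustrating why predictions diverge from exponential-based ones over long missions.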

Journal ArticleDOI
TL;DR: This correspondence develops an algorithm to perform BPC permutations on a cube connected SIMD computer that is shown to be optimal in the sense that it uses the fewest possible number of unit routes to accomplish any B PC permutation.
Abstract: In this correspondence we develop an algorithm to perform BPC permutations on a cube connected SIMD computer. The class of BPC permutations includes many of the frequently occurring permutations such as matrix transpose, vector reversal, bit shuffle, and perfect shuffle. Our algorithm is shown to be optimal in the sense that it uses the fewest possible number of unit routes to accomplish any BPC permutation.
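A BPC (bit-permute-complement) permutation sends the element at source address s to the destination whose address bits are a fixed permutation of s's bits, each optionally complemented. A sketch of the address map; the encoding here (perm[i] names the source bit feeding destination bit i, comp[i] complements it) is one common convention, not necessarily the correspondence's:

```python
def bpc(src, perm, comp, n):
    """Destination address for source address src under a BPC permutation
    on n address bits. Matrix transpose, vector reversal, bit shuffle,
    and perfect shuffle are all instances of this family."""
    dest = 0
    for i in range(n):
        bit = (src >> perm[i]) & 1    # source bit feeding destination bit i
        dest |= (bit ^ comp[i]) << i  # optionally complemented
    return dest
```

For example, vector reversal is the identity permutation with every bit complemented, and the perfect shuffle is a left rotation of the address bits.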

Journal ArticleDOI
TL;DR: This correspondence analyzes the computational complexity of fault detection problems for combinational circuits and proposes an approach to design for testability, and shows that for k-level (k ≥ 3) monotone/unate circuits these problems are still NP-complete, but that these are solvable in polynomial time for 2-level monot one/ unate circuits.
Abstract: In this correspondence we analyze the computational complexity of fault detection problems for combinational circuits and propose an approach to design for testability. Although major fault detection problems have long been known to be NP-complete in general, the proofs were given for rather complex circuits. In this correspondence we show that these problems remain NP-complete even for monotone circuits, and thus for unate circuits. We show that for k-level (k ≥ 3) monotone/unate circuits these problems are still NP-complete, but that they are solvable in polynomial time for 2-level monotone/unate circuits. A class of circuits for which these fault detection problems are solvable in polynomial time is presented. Ripple-carry adders, decoder circuits, linear circuits, etc., belong to this class. A design approach is also presented in which an arbitrary given circuit is transformed into such an easily testable circuit by inserting a few additional test points.

Journal ArticleDOI
Bose, Rao
TL;DR: This paper defines symmetric, asymmetric, and unidirectional error classes and derives the necessary and sufficient conditions for a binary code to be unidirectional error correcting/detecting.
Abstract: In this paper we present some basic theory on unidirectional error correcting/detecting codes. We define symmetric, asymmetric, and unidirectional error classes and proceed to derive the necessary and sufficient conditions for a binary code to be unidirectional error correcting/detecting.
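The best-known unidirectional-error-detecting construction is the Berger code, which appends the count of 0's among the information bits: a purely 1→0 error pattern lowers the weight without lowering the appended count, and a purely 0→1 pattern does the reverse, so either is caught. A sketch; the Berger code predates this paper and serves only to illustrate the error class, not the paper's conditions or new codes:

```python
def berger_encode(word, k):
    """Return (information word, check symbol) where the check symbol is
    the number of 0's among the k information bits."""
    zeros = k - bin(word).count("1")
    return word, zeros

def berger_check(word, check, k):
    """A unidirectional error (all flips 1->0, or all 0->1) always breaks
    this equality, so it is detected."""
    return check == k - bin(word).count("1")
```

Encoding 1011 over k = 4 bits yields check symbol 1; dropping a 1 (a 1→0 error) changes the zero count and fails the check.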

Journal ArticleDOI
Feuer
TL;DR: This paper develops a relation between the partitioning properties of computer logic and the distribution of connection lengths and finds that an exponential partitioning function leads to an inverse power law length distribution.
Abstract: This paper develops a relation between the partitioning properties of computer logic and the distribution of connection lengths. The computation of length distributions is important for wirability analysis and delay estimation. The principal result is that an exponential partitioning function leads to an inverse power law length distribution.

Journal ArticleDOI
TL;DR: In this article, an analytical model for the program behavior of a multitasked system is introduced, including the behavior of each process and the interactions between processes with regard to the sharing of data blocks.
Abstract: In many commercial multiprocessor systems, each processor accesses the memory through a private cache. One problem that could limit the extensibility of the system and its performance is the enforcement of cache coherence. A mechanism must exist which prevents the existence of several different copies of the same data block in different private caches. In this paper, we present an in-depth analysis of the effects of cache coherency in multiprocessors. A novel analytical model for the program behavior of a multitasked system is introduced. The model includes the behavior of each process and the interactions between processes with regard to the sharing of data blocks. An approximation is developed to derive the main effects of the cache coherency contributing to degradations in system performance.

Journal ArticleDOI
TL;DR: Two bit-serial parallel processing systems are developed: an airborne associative processor and a ground based massively parallel processor.
Abstract: About a decade ago, a bit-serial parallel processing system, STARAN®, was developed. It used standard integrated circuits that were available at that time. Now, with the availability of VLSI, a much greater processing capability can be packed in a unit volume. This has led to the recent development of two bit-serial parallel processing systems: an airborne associative processor and a ground-based massively parallel processor.

Journal ArticleDOI
TL;DR: The effective bandwidth in a multiprocessor with shared memory with N processors and N memory modules is compared using as interconnection networks the crossbar or the multiple-bus.
Abstract: In this paper we compare the effective bandwidth in a multiprocessor with shared memory using as interconnection networks the crossbar or the multiple-bus. We consider a system with N processors and N memory modules, in which the processor requests to the memory modules are independent and uniformly distributed random variables. We consider two cases: in the first the processor makes another request immediately after a memory service, and in the second there is some internal processing time.
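Under the uniform-and-independent request assumption stated here, the classic crossbar estimate is that each of m modules is requested by at least one of n processors with probability 1 − (1 − 1/m)ⁿ, giving an expected bandwidth of m(1 − (1 − 1/m)ⁿ) busy modules per cycle. A one-line sketch of that baseline; the paper's multiple-bus analysis refines it with bus limits and internal processing time:

```python
def crossbar_bandwidth(n, m):
    """Expected number of busy memory modules per cycle when each of n
    processors independently addresses one of m modules uniformly at
    random (the standard synchronous-crossbar approximation)."""
    return m * (1 - (1 - 1 / m) ** n)
```

For n = m = 2 this gives 1.5: on average half a module is idled per cycle by request collisions, and the loss fraction approaches 1 − 1/e ≈ 0.37 of one module's worth as n = m grows.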

Journal ArticleDOI
TL;DR: The Burroughs Scientific Processor (BSP) was a high-performance computer system that performed the Department of Energy LLL loops at roughly the speed of the CRAY-1.
Abstract: The Burroughs Scientific Processor (BSP), a high-performance computer system, performed the Department of Energy LLL loops at roughly the speed of the CRAY-1. The BSP combined parallelism and pipelining, performing memory-to-memory operations. Seventeen memory units and two crossbar switch data alignment networks provided conflict-free access to most indexed arrays. Fast linear recurrence algorithms provided good performance on constructs that some machines execute serially. A system manager computer ran the operating system and a vectorizing Fortran compiler. An MOS file memory system served as a high bandwidth secondary memory.

Journal ArticleDOI
TL;DR: Markovian models are developed for the performance analysis of multiprocessor systems intercommunicating via a set of buses and are found to be surprisingly accurate for a wide range of configurations.
Abstract: Markovian models are developed for the performance analysis of multiprocessor systems intercommunicating via a set of buses. The performance index is the average number of active processors, called processing power. From processing power a variety of other performance measures can be derived as dictated by the specific processor application. Exact models are first introduced and are illustrated with a simple example. The computational complexity of the exact models is shown to increase very rapidly with system size, thus making the exact analysis impractical even for medium size systems. To overcome the complexity of computation, several approximate models are introduced. The approximate results are compared with the exact ones and found to be surprisingly accurate for a wide range of configurations. Simulation is used to validate the analytic models and to test their robustness.

Journal ArticleDOI
TL;DR: Two types of efficient algorithms for fast implementation of the 2-D discrete cosine transform (2-D DCT) are developed; they significantly reduce the number of multiplications compared to the fast algorithm developed by Chen et al.
Abstract: Two types of efficient algorithms for fast implementation of the 2-D discrete cosine transform (2-D DCT) are developed. One has a recursive structure, which implies that the algorithm for an (M/2 × N/2) block can be extended to (M × N/2), (M/2 × N), and (M × N) blocks (M and N are integer powers of two). The second algorithm is nonrecursive and therefore has to be tailored to each block size. Both algorithms involve only real arithmetic, and they significantly reduce the number of multiplications compared to the fast algorithm developed by Chen et al. [8], while the number of additions remains unchanged.
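Fast 2-D DCT algorithms improve on the obvious row-column factorization, but that factorization is the natural baseline: a 2-D DCT is a 1-D DCT applied to every row and then to every column. A naive sketch of that baseline with an un-normalized DCT-II; the paper's algorithms reduce the multiplication count well below this:

```python
import math

def dct1d(x):
    """Un-normalized 1-D DCT-II of a sequence x."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def dct2d(block):
    """Row-column 2-D DCT: transform rows, then columns."""
    t = [dct1d(row) for row in block]            # row pass
    t = [list(col) for col in zip(*t)]           # transpose
    t = [dct1d(row) for row in t]                # column pass
    return [list(col) for col in zip(*t)]        # transpose back
```

For a constant 2 × 2 block all the energy lands in the DC coefficient, as expected.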

Journal ArticleDOI
TL;DR: While traditional logic is useful for specifying combinational circuits, it is shown how the extensions of temporal logic apply to the specification of memory, as well as the safeness and liveness properties of active circuits representing processes.
Abstract: The use of temporal logic for the specification of hardware modules is explored. Temporal logic is an extension of conventional logic. While traditional logic is useful for specifying combinational circuits, it is shown how the extensions of temporal logic apply to the specification of memory, as well as the safeness and liveness properties of active circuits representing processes. These ideas are demonstrated by the example of a self-timed arbiter. An implementation of the arbiter is also given, and its formal verification by a kind of reachability analysis is discussed. This verification approach is also useful for finding design errors, as demonstrated by an example.

Journal ArticleDOI
TL;DR: The main results in this paper demonstrate that there exist pairs of integers 〈E, D〉 such that any n-vertex rectangular grid can be embedded into a square grid having at most En vertices, in such a way that images in the square grid of vertices that are adjacent in the rectangular grid are at most distance D apart.
Abstract: The main results in this paper demonstrate that there exist pairs of integers 〈E, D〉 (for "area Expansion" and "edge Dilation," respectively) such that any n-vertex rectangular grid can be embedded into a square grid having at most En vertices, in such a way that images in the square grid of vertices that are adjacent in the rectangular grid are at most distance D apart. Several techniques for "squaring up" rectangular grids are presented; sample values for the parameter pair 〈E, D〉 are: 〈E = 1.2, D = 15〉, 〈E = 1.45, D = 9〉, and 〈E = 1.8, D = 3〉. Note that these values of E and D hold for all rectangular grids, independent of the number of vertices. The quest for these results was motivated by the question of whether or not one could automatically "square up" circuit layouts having aspect ratios very far from unity, without compromising the efficiency of the layout (in terms of area and length of the longest run of wire). The results reported here yield an affirmative answer to this question, at least in an idealized setting. One corollary of the embeddings presented here is that the square "king's-move" grid of side 2n½ contains as a subgraph every n-vertex rectangular grid. Another way to think of this result is that this embellished grid can be "programmed," or "personalized," by setting switches, to represent any n-vertex rectangular grid.

Journal ArticleDOI
Heidelberger, Trivedi
TL;DR: Computer performance models of parallel processing systems in which a job subdivides into two or more tasks at some point during its execution are considered and an approximate solution method is developed.
Abstract: Computer performance models of parallel processing systems in which a job subdivides into two or more tasks at some point during its execution are considered. Except for queueing effects, the tasks execute independently of one another and do not require synchronization. An approximate solution method is developed and results of the approximation are compared to those of simulations. Bounds on the performance improvement due to overlap are derived.